We consider the problem of wisely using a limited budget to label a small subset of a large unlabeled dataset. For example, consider the NLP problem of word sense disambiguation. For any word, we have a set of candidate labels from a knowledge base, but this label set is not necessarily representative of what occurs in the data: some labels in the knowledge base may occur very rarely in the corpus because the sense is rare in modern English, and, conversely, some true labels may be missing from the knowledge base. Our aim is to obtain a classifier that performs as well as possible on examples of each “common class”, i.e., each class whose frequency in the unlabeled set exceeds a given threshold, while annotating as few examples as possible from “rare classes”, whose frequency falls below this threshold. The challenge is that we are not told which labels are common and which are rare, and the true label distribution may be extremely skewed. We describe an active learning approach that (1) explicitly searches for rare classes by leveraging the contextual embedding spaces provided by modern language models, and (2) incorporates a stopping rule that ignores a class once we prove that it occurs below our target threshold with high probability. We prove that our algorithm costs only logarithmically more than a hypothetical approach that knows all true label frequencies, and we show experimentally that incorporating automated search can significantly reduce the number of samples needed to reach target accuracy levels.
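The stopping rule in (2) can be illustrated with a generic concentration-bound test. The abstract does not state which bound the authors use, so the following is a minimal sketch under stated assumptions: labels are treated as i.i.d. uniform draws from the unlabeled pool (an assumption that actively selected samples do not satisfy, so in practice a separate uniform sample would be needed), a one-sided Hoeffding bound stands in for the authors' test, and a union bound makes all per-class decisions hold jointly. The helper names `hoeffding_upper_bound` and `classes_to_ignore` are hypothetical.

```python
import math
from collections import Counter


def hoeffding_upper_bound(count: int, n: int, delta: float) -> float:
    """One-sided Hoeffding upper confidence bound on a class frequency.

    ASSUMPTION: the n labels are i.i.d. uniform draws from the unlabeled
    pool. The bound then holds with probability at least 1 - delta.
    """
    if n == 0:
        return 1.0  # no evidence yet: the frequency could be anything
    p_hat = count / n
    return min(1.0, p_hat + math.sqrt(math.log(1.0 / delta) / (2.0 * n)))


def classes_to_ignore(labels, candidate_classes, threshold, delta):
    """Return candidate classes whose frequency is provably below threshold.

    A union bound splits delta across classes, so all "rare" decisions hold
    jointly with probability at least 1 - delta.
    """
    counts = Counter(labels)  # missing classes count as 0
    n = len(labels)
    per_class_delta = delta / max(1, len(candidate_classes))
    return {
        c for c in candidate_classes
        if hoeffding_upper_bound(counts[c], n, per_class_delta) < threshold
    }
```

For example, with 1,000 uniformly sampled labels (900 of class "a", 100 of class "b"), a threshold of 0.1 and delta of 0.05, an unseen candidate class "c" is ignored, while "b" is kept because its upper bound (about 0.145) does not fall below the threshold.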
We address the ad hoc document retrieval task by devising novel types of entity-based language models. The models utilize information about single terms in the query and documents as well as term sequences marked as entities by an entity-linking tool. The key principle of the language models is to account simultaneously for the uncertainty inherent in the entity-markup process and for the balance between entity-based and term-based information. Empirical evaluation demonstrates the merits of using the language models for retrieval; for example, their performance surpasses that of a state-of-the-art term proximity method. We also show that the language models can be effectively used for cluster-based document retrieval and query expansion.
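To make the two ideas in this abstract concrete (markup uncertainty, and balancing entity-based against term-based evidence), here is a minimal sketch that is not the authors' model. It assumes Dirichlet-smoothed query likelihood for both components, treats each entity annotation's linker confidence as a fractional ("soft") count, weights each query entity by its own confidence, and linearly interpolates the two log-likelihoods with a hypothetical weight `lam`. The function names, `mu`, and `floor` are likewise illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def _dirichlet_log_prob(count, length, coll_p, mu, floor):
    # Dirichlet-smoothed log p(x | D); `floor` guards items unseen in the collection.
    return math.log((count + mu * max(coll_p, floor)) / (length + mu))


def entity_term_score(query_terms, query_entities, doc_terms, doc_entities,
                      coll_term_prob, coll_entity_prob,
                      lam=0.7, mu=1000.0, floor=1e-9):
    """Hypothetical interpolated term/entity query-likelihood score.

    query_entities and doc_entities are lists of (entity_id, confidence)
    pairs, as an entity linker might output.
    """
    # Term-based component: ordinary query likelihood over surface terms.
    tf = Counter(doc_terms)
    term_part = sum(
        _dirichlet_log_prob(tf[t], len(doc_terms), coll_term_prob.get(t, 0.0), mu, floor)
        for t in query_terms)
    # Entity-based component: each annotation contributes its linker
    # confidence as a soft count (one way to model markup uncertainty).
    soft_counts = defaultdict(float)
    for ent, conf in doc_entities:
        soft_counts[ent] += conf
    ent_len = sum(soft_counts.values())
    entity_part = sum(
        conf * _dirichlet_log_prob(soft_counts[e], ent_len,
                                   coll_entity_prob.get(e, 0.0), mu, floor)
        for e, conf in query_entities)
    # Linear interpolation balances term-based and entity-based evidence.
    return lam * term_part + (1.0 - lam) * entity_part
```

Setting `lam=1.0` reduces the score to plain term-based query likelihood, which gives a simple sanity check that the entity component is the only source of difference between documents with identical text.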
We address the query-performance-prediction task for entity retrieval; that is, retrieval effectiveness is estimated with no relevance judgments. First, we show how to adapt state-of-the-art query-performance predictors proposed for document retrieval to the entity retrieval domain. We then present a novel predictor based on the cluster hypothesis. Evaluation performed with the INEX entity ranking track collections shows that our predictor can often outperform the most effective predictors we experimented with.
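The intuition behind a cluster-hypothesis-based predictor can be sketched as follows. This is not the predictor proposed in the abstract, whose definition is not given here. It is a hypothetical stand-in that assumes each retrieved entity is represented by a vector and scores a ranking by the mean pairwise cosine similarity of its top-k entities: if relevant entities tend to resemble each other, a tightly knit top-k set suggests that many of them are relevant. The names `topk_coherence` and `cosine` are illustrative.

```python
import math
from itertools import combinations


def cosine(u, v):
    # Cosine similarity; returns 0.0 when either vector has zero norm.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def topk_coherence(ranked_vectors, k=10):
    """Hypothetical predictor: mean pairwise similarity of the top-k results.

    Higher values are read as predicting higher retrieval effectiveness,
    under the cluster-hypothesis assumption that relevant entities cluster.
    """
    top = ranked_vectors[:k]
    pairs = list(combinations(top, 2))
    if not pairs:
        return 0.0  # fewer than two results: no evidence either way
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)
```

A predictor like this needs no relevance judgments, only the ranked list and entity representations, which is what makes it usable for query-performance prediction.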
We address the core challenge of the entity retrieval task: ranking entities in response to a query by their presumed relevance to the information need that the query represents. As an initial research direction, we explored two models for entity ranking, which we evaluated on the INEX entity ranking dataset and which showed promising performance. A natural future direction is to generalize these models to address the various types of information needs associated with entities.
In this work, we study the cluster hypothesis for entity-oriented search (EOS). Specifically, we show that the hypothesis can hold to a substantial extent for several entity similarity measures. We also demonstrate the merits, in terms of retrieval effectiveness, of using clusters of similar entities for EOS.
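One common way to measure the extent to which the cluster hypothesis holds is the nearest-neighbor test from document retrieval: for each relevant item, count how many of its k nearest neighbors are also relevant. The abstract does not say which test or which entity similarity measures were used, so this is a minimal sketch that assumes entities are represented as vectors and uses cosine similarity as a stand-in measure. The function names are hypothetical.

```python
import math


def cosine(u, v):
    # Cosine similarity; returns 0.0 when either vector has zero norm.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nn_cluster_test(vectors, relevant, k=5):
    """Average fraction of relevant entities among the k nearest neighbors
    of each relevant entity (higher = stronger cluster hypothesis).

    vectors: dict mapping entity id -> feature vector (assumed representation).
    relevant: set of entity ids judged relevant for one query.
    """
    scores = []
    for r in relevant:
        others = [e for e in vectors if e != r]
        if not others:
            continue
        # Rank all other entities by similarity to the relevant entity r.
        neighbors = sorted(others, key=lambda e: cosine(vectors[r], vectors[e]),
                           reverse=True)[:k]
        scores.append(sum(1 for e in neighbors if e in relevant) / len(neighbors))
    return sum(scores) / len(scores) if scores else 0.0
```

Swapping `cosine` for another similarity function is how one would compare several entity similarity measures under the same test.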
In this work, we present a general model for entity ranking that is based on the Markov Random Field approach for modeling various types of dependencies between the query and the entity. We show that this model extends existing approaches for entity ranking while aggregating all pieces of relevance evidence in a unified way. We evaluated the performance of our model using the INEX datasets. Our results show that our ranking model significantly outperforms leading INEX systems in the 2007 and 2008 tracks and matches the best results achieved in the 2009 track.
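In MRF-based retrieval models, the ranking function is typically rank-equivalent to a weighted sum of feature functions over the cliques of the graph connecting query nodes and the item being ranked. The sketch below shows that shape for entities, under assumptions that go beyond the abstract: entities are dicts with hypothetical `tokens` and `types` fields, there are three hypothetical clique types (single query terms, ordered query-term bigrams, and a query-target-type clique), and the weights are illustrative rather than learned.

```python
import math


def term_cliques(query_terms):
    # One clique per query term (term node + entity node).
    return [(t,) for t in query_terms]


def ordered_bigram_cliques(query_terms):
    # One clique per adjacent query-term pair (sequential dependence).
    return [tuple(query_terms[i:i + 2]) for i in range(len(query_terms) - 1)]


def phrase_feature(clique, entity_tokens):
    # log(1 + number of times the clique occurs as a contiguous phrase).
    n = len(clique)
    count = sum(1 for i in range(len(entity_tokens) - n + 1)
                if tuple(entity_tokens[i:i + n]) == clique)
    return math.log1p(count)


def mrf_score(query_terms, target_type, entity, weights):
    """Rank-equivalent MRF score: weighted sum of clique feature functions."""
    tokens = entity["tokens"]
    term = sum(phrase_feature(c, tokens) for c in term_cliques(query_terms))
    bigram = sum(phrase_feature(c, tokens)
                 for c in ordered_bigram_cliques(query_terms))
    # Hypothetical clique linking the query's target type to the entity's types.
    type_match = 1.0 if target_type in entity["types"] else 0.0
    return (weights["term"] * term
            + weights["bigram"] * bigram
            + weights["type"] * type_match)
```

Because each evidence source is just another clique with its own weight, adding a new kind of dependency (for example, category or link information) only adds a term to the sum, which is one way to read the abstract's claim of aggregating relevance evidence in a unified way.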
We developed 224Ra-loaded wires that, when inserted into solid tumors, release radioactive atoms that spread in the tumor and irradiate it effectively with alpha particles (diffusing alpha-emitters radiation therapy [DaRT]). In this study, we tested the ability of intratumoral 224Ra-loaded wires to control the local growth of pancreatic tumors, and whether chemotherapy enhances this effect. Pancreatic mouse tumors (Panc02) were treated with 224Ra-loaded wire(s) with or without gemcitabine. Tumor size and survival were monitored, and autoradiography was performed to evaluate the spread of radioactive atoms inside the tumor. Mouse and human pancreatic cancer cells, irradiated in vitro by alpha particles with or without chemotherapy, were evaluated for cell growth inhibition. The insertion of 224Ra-loaded wires into pancreatic tumors in combination with gemcitabine achieved significant local control and was superior to each treatment alone. A dosimetric analysis showed the spread of radioactive atoms in the tumor around the wires. Alpha particles combined with gemcitabine or 5-FU killed mouse and human cells in vitro more effectively than each treatment alone. DaRT in combination with gemcitabine was shown to be effective against pancreatic tumors in vivo and in vitro, and this approach may be applicable as a palliative treatment for patients with pancreatic cancer.
This study investigates the local control of lung-derived tumors through the use of diffusing alpha-emitting atoms released from intratumoral wires loaded with radium-224. Experimental results demonstrate effective control of tumor growth through localized irradiation, with implications for novel cancer treatments.
This research explores the efficacy of intratumoral Ra-224-loaded sources in inhibiting tumor growth and inducing necrosis in experimental solid malignant tumors. The findings highlight the therapeutic potential of alpha-emitting atom diffusion as a targeted cancer treatment strategy.
The study examines the effect of interstitial Ra-224-loaded wires on tumor growth retardation and survival prolongation in experimental lung carcinoma. Diffusing alpha-emitting atoms represent a promising approach to localized cancer therapy, with significant improvements in survival rates.