How a cell's gene expression responds to a genetic perturbation reveals basic information about gene and cell function. Perturb-seq is a recent method for pooled genetic screens that uses single-cell RNA sequencing (scRNA-seq) to read out the expression response to each perturbation. Perturb-seq can help engineer cells toward a desired state, shed light on gene regulatory networks, and identify target genes for therapeutic intervention.
Recent technological developments have improved the efficiency, scalability, and comprehensiveness of Perturb-seq. However, the number of experiments needed to cover the perturbation space grows combinatorially. This growth has two causes. Biological contexts, cell types, states, and stimuli vary widely. Genetic interactions can also be non-additive, so the effect of a combination of perturbations cannot simply be inferred from the single perturbations. When there are billions of possible configurations, running every experiment directly is impractical.
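A quick count shows how fast the space grows. The sketch below assumes roughly 20,000 targetable genes (about the number of human protein-coding genes) and a hypothetical 10 biological contexts. Both numbers are illustrative assumptions, not figures from the study.

```python
from math import comb

N_GENES = 20_000   # assumption: roughly the number of human protein-coding genes
N_CONTEXTS = 10    # hypothetical number of cell types / states / stimuli

single = N_GENES                 # one perturbation per gene
pairs = comb(N_GENES, 2)         # unordered two-gene combinations
triples = comb(N_GENES, 3)       # unordered three-gene combinations
pairs_in_contexts = pairs * N_CONTEXTS  # every pair tested in every context

print(f"single-gene perturbations:     {single:,}")
print(f"two-gene combinations:         {pairs:,}")
print(f"pairs across {N_CONTEXTS} contexts:      {pairs_in_contexts:,}")
print(f"three-gene combinations:       {triples:,}")
```

Under these assumptions, pairwise combinations in just ten contexts already reach about two billion configurations, and three-gene combinations exceed a trillion.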
Recent research suggests that the outcomes of perturbations can be predicted with machine learning models. These models are trained on existing Perturb-seq datasets and predict the expression responses to unseen perturbations, whether of individual genes or of gene combinations. Although promising, the models suffer from selection bias: the biological contexts and perturbations available for training were fixed by the design of the original experiments.
Researchers at Genentech and Stanford University introduce a new framework for running a series of Perturb-seq experiments to explore a perturbation space. In this framework, wet-lab Perturb-seq assays are interleaved with machine learning in a sequential optimal design loop, and at each round new data are acquired and the model is retrained. The researchers then use an optimal design technique to choose the next set of perturbation experiments, so that the model can accurately predict perturbations that have not yet been profiled. Sampling the perturbation space intelligently means selecting the perturbations that are most informative and representative for the model while maintaining diversity. The goal is a model that has adequately explored the perturbation space with as few perturbation experiments as possible.
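The interleaved loop can be sketched in Python. This is a minimal, hypothetical sketch, not ITERPERT itself. It assumes that each candidate perturbation is described by a feature vector. It uses a Gaussian-process (kernel ridge) model with an RBF kernel as the predictor. In place of the paper's optimal design objective, it swaps in greedy posterior-variance selection, a standard active learning heuristic. The `run_experiment` callback stands in for a wet-lab Perturb-seq round, and all function names and data are illustrative.

```python
import numpy as np

def rbf_kernel(X, Y, length_scale=1.0):
    # Squared-exponential (RBF) kernel between rows of X and rows of Y
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * length_scale ** 2))

def posterior_variance(K, train_idx, noise=1e-2):
    # GP posterior variance at every candidate, given the profiled set
    if not train_idx:
        return np.diag(K).copy()
    Ktt = K[np.ix_(train_idx, train_idx)] + noise * np.eye(len(train_idx))
    Kat = K[:, train_idx]
    sol = np.linalg.solve(Ktt, Kat.T)
    return np.diag(K) - np.einsum("ij,ji->i", Kat, sol)

def select_batch(K, profiled, batch_size, noise=1e-2):
    # Greedy batch: pick the most uncertain candidate, treat it as profiled, repeat.
    # Each pick lowers the variance of its neighbours, which spreads the batch out.
    chosen = list(profiled)
    batch = []
    for _ in range(batch_size):
        var = posterior_variance(K, chosen, noise)
        if chosen:
            var[chosen] = -np.inf  # never re-select a profiled perturbation
        best = int(np.argmax(var))
        chosen.append(best)
        batch.append(best)
    return batch

def fit_predict(K, train_idx, y_train, noise=1e-2):
    # Kernel ridge / GP posterior mean for every candidate perturbation
    Ktt = K[np.ix_(train_idx, train_idx)] + noise * np.eye(len(train_idx))
    alpha = np.linalg.solve(Ktt, y_train)
    return K[:, train_idx] @ alpha

def run_campaign(X, run_experiment, n_rounds=3, batch_size=5, noise=1e-2):
    K = rbf_kernel(X, X)
    profiled, y, preds = [], {}, None
    for _ in range(n_rounds):
        batch = select_batch(K, profiled, batch_size, noise)  # design step
        for i in batch:
            y[i] = run_experiment(i)  # stands in for a wet-lab Perturb-seq round
        profiled.extend(batch)
        y_train = np.array([y[i] for i in profiled])
        preds = fit_predict(K, profiled, y_train, noise)      # retrain step
    return profiled, preds

# Usage with toy data: hypothetical perturbation embeddings and a scalar readout
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
true_effect = np.sin(X[:, 0]) + X[:, 1]
profiled, preds = run_campaign(X, lambda i: true_effect[i])
print(len(profiled), "perturbations profiled over 3 rounds")
```

GP posterior variance does not depend on the measured values, so the greedy step can commit to a whole batch before any new measurements arrive. That property is convenient when each round is a single pooled experiment.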
This principle underlies active learning, a widely studied area of machine learning that has been applied in domains such as document classification, medical imaging, and speech recognition. However, the findings show that active learning methods that perform well typically require a large initial set of labeled examples (here, profiled perturbations) plus multiple batches totaling tens of thousands of labeled data points. The team's economic analysis shows that such conditions are infeasible given the time and cost constraints of iterative Perturb-seq in the lab.
To address budget-constrained active learning for Perturb-seq data, the team proposes a new approach called ITERPERT (ITERative PERTurb-seq). Inspired by data-driven research, the work's main insight is that publicly available prior knowledge can usefully complement experimental data, particularly in the early rounds and when budgets are tight. Examples of such prior knowledge include data on physical molecular interactions, such as protein complexes, Perturb-seq data from related systems, and large-scale genetic screens in other modalities, such as genome-scale optical pooled screens. This prior knowledge comes in many forms of representation, including networks, text, images, and three-dimensional structures, which makes it hard to use directly in active learning. To address this, the team represents every modality in a reproducing kernel Hilbert space (RKHS) and uses a kernel fusion approach to merge the sources.
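Kernel fusion can be illustrated with a weighted sum of kernels. This hypothetical sketch assumes that each prior-knowledge source has already been turned into a kernel matrix over the same genes. The two sources are a regularized-Laplacian kernel on a toy protein-complex network and a linear kernel on toy expression profiles from a related screen. Each kernel is cosine-normalized, and the two are combined with fixed, hand-picked weights. ITERPERT's actual kernels and weighting scheme may differ. A non-negative combination of positive semidefinite (PSD) kernels is still PSD, so the fused matrix remains a valid kernel for a downstream model.

```python
import numpy as np

def normalize_kernel(K):
    # Cosine normalization K_ij / sqrt(K_ii * K_jj): every source gets unit self-similarity
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

def fuse_kernels(kernels, weights):
    # Weighted sum of normalized kernels; non-negative weights keep the result PSD
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()
    return sum(wi * normalize_kernel(K) for wi, K in zip(w, kernels))

# Usage with toy, hypothetical prior-knowledge sources over 6 genes
rng = np.random.default_rng(1)
n = 6

# Source 1: hypothetical protein-complex network (genes 0-2 and 3-5 form two complexes)
A = np.zeros((n, n))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                A[i, j] = 1.0
L = np.diag(A.sum(axis=1)) - A              # graph Laplacian
K_net = np.linalg.inv(np.eye(n) + 0.5 * L)  # regularized-Laplacian kernel

# Source 2: hypothetical expression profiles from a related Perturb-seq screen
Z = rng.normal(size=(n, 10))
K_expr = Z @ Z.T                             # linear kernel

K = fuse_kernels([K_net, K_expr], weights=[0.7, 0.3])
print(np.round(K, 3))
print("smallest eigenvalue:", np.linalg.eigvalsh(K).min())
```

Normalizing before fusing keeps one source from dominating because of its scale. The weights here are fixed by hand, but they could instead be tuned or learned.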
They carried out an extensive empirical study on a large-scale single-gene CRISPRi Perturb-seq dataset from a cancer cell line (K562 cells), benchmarking ITERPERT against eight recent active learning methods. ITERPERT reached accuracy comparable to the best-performing active learning method while using training data with one third as many perturbations. ITERPERT also performed robustly when batch effects across iterations were taken into account, in both essential-gene and genome-scale screens.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
Dhanshree Shenwai is a Computer Science Engineer with experience at FinTech companies spanning finance, cards and payments, and banking, and a keen interest in AI applications. He is excited to explore new technologies and advancements in today's evolving world that make life easier for everyone.