Perfecting PDD in 2023: Motivation
This essay sets the stage for my series to come, Perfecting PDD in 2023.
If you are going to do AI/ML-enabled phenotypic drug discovery (PDD) in 2023, you will need to design a high-throughput experimental workflow that connects novel perturbations (small molecules, siRNA, CRISPR) to sub-cellular imaging capabilities. Large-deck screens will follow this workflow to find “hits” (perturbations that induce desirable phenotypes that are “different enough” from DMSO) for secondary assay triaging, while exploratory teams will work at a smaller scale to prepare their disease models (cell line and phenotype) to eventually run the same way. To sustain reproducibility along all research fronts, and to give your models the best chance of success, a company may choose to standardize the following components of its experimental setup:

1) Plates
2) Cells
3) Media
4) Compounds (management and dispensing)
5) Incubation conditions (robotics)
6) Staining (robotics)
7) Imaging (microscopy)
8) Analysis
In the essays that follow, I will discuss best practices and opportunities for improvement in each of these domains. While reproducibility is very important, reducing cycle times is also paramount. Biotech should model the tech industry’s incredibly low-resistance environment, which allows new ideas to be tested quickly with high-resolution feedback on errors (for biologists, that feedback could mean easier troubleshooting, but in my mind it would really be designed so that medicinal chemists benefit the most). Drug discovery, and PDD HTS campaigns especially, should be thought of as needle-in-a-haystack profiling projects. My essays will weave these two themes of reproducibility and efficiency together to reveal a tapestry of a map (mapestry??) for PDD companies to follow going forward, giving them the best chance of finding as many needles as possible!
In my mind, images integrate complex biological processes that result from on- and off-target interactions between compounds and DNA/RNA/protein. Covalent and non-covalent interactions play out over the duration of the incubation before cells are chemically frozen (fixed with PFA/FA), stained (IF/FISH), and imaged. I sometimes think of compounds as wrenches rattling around an engine (cells) until they catch on a piston or chain and cause the engine to seize, which then triggers behavioral changes (some of which are described in Inflection Points). Traditional biochemical assays and reporter cell lines, commonly used in target-based drug discovery (TDD), are reductive: readouts are light-based and report on single points of complicated and extended signaling pathways. Cell-engineering arguments can also be made, but they are ultimately straw men, because all drug discovery companies depend on workhorse cell lines that have been transformed/engineered to express ‘artificial’ and beneficial features.
Even if you believe that your endpoint is representative of the biology of interest, background (unmeasured) RNA and protein expression can vary greatly over time, and possibly even more so if your program relies on engineered cell lines. For traditionally difficult pathways, which also serve the largest patient populations, monitoring these “background effects” will probably provide new insights to the medicinal chemists tasked with creating the compounds. This is further support for the idea that images integrate complex biological processes.
In 2023, with the advent of powerful AI/ML models for image analysis, it is up to the data scientist to position these models (many of which already exist and are freely available) optimally to extract biological measurements that are chemically relevant. There is also an argument to be made that images are the richest biological information space besides single-cell RNA-seq from 10x Genomics. On top of that, imaging assays (especially those run through time) have the fastest turnaround times! These two reasons are the main source of my obsession with cellular imaging approaches in drug discovery. In my mind, hesitancy from senior computational biologists to approach target deconvolution and MOA (re-)annotation via image analysis comes down to human-generated noise (reagent degradation, temperature, time) and low-resolution whole-well averages of features or embeddings.
In traditional drug discovery, it is up to the biologist to do the “biological due diligence,” so to speak, and essentially guarantee to the chemist that the measured phenotype of interest is both biologically relevant and will stand up to the eventual cytotoxicity of lead-like compounds (assay robustness). Biologists in the most conventional sense will also be responsible for all manual cell culture and liquid-handling activities upstream of PDD screening campaigns. While Director-level biologists are currently responsible for triggering in vivo experiments and analyzing data, this could be handled by cross-trained medicinal chemists, or by a separate team focused on translational biology/development.
In the future, biologists will name the features in the biology, in the same way that training a tumor-biopsy analysis model requires a doctor to sit down and annotate images before computation can proceed. Biodock is an example of a tool that brings humans into the loop in this way, and the fact that it only became commercially available in 2022 is a testament to the fact that we are in the middle of an image analysis revolution. PDD companies will want to experience the benefits of such a period, and those that embrace the following principles will. Good luck to the others.