Technique of the Week: LOWE
Recursion’s LLM workflow orchestration tool, LOWE, could pave the way for uniquely lean drug discovery teams and new ways of collaborating across biotech/pharma
I am Sidharth Sirdeshmukh, a drug discovery researcher at Dewpoint Therapeutics in Boston, MA. I spent high school and undergrad mostly doing synthetic biology research, briefly learned a variety of mass spectrometry techniques, and am now neck-deep in the wonderful world of cells.
Welcome to my new series. Most of my posts will summarize emerging research methods and describe their potential upsides and downsides. I’m mainly interested in new tools and how they could be used in hardcore discovery environments. I hope these posts will be engaging and informative for students, scientists, and members of the investor community interested in tech-enabled discovery strategies.
Recursion’s LOWE:
Demo:
This essay focuses on the possible applications of LLM agents in biotech rather than on the technical details of LLM agents, or of LOWE in particular.
Description:
Workflow orchestration is a computational method for automating sequential data processes. While traditional orchestrators have been around since at least 2014 (e.g. Kubernetes, which orchestrates software containers such as those built with Docker), LLMs bring unique benefits such as constant refreshing and optimized efficiency (link, link). With an LLM orchestrator, users can ask questions in plain text, and the model will probe all of the available data resources (raw numerical data, foundation models, etc.) to return quantitative answers in plain text. The excitement and promise of generative AI is driving interest in more sophisticated orchestration techniques as a way to improve the quality and volume of insights organizations draw from their own data (link). Biotech/pharma companies likely see publicly accessible workflow orchestration as a way to empower research teams and extract maximum value from their rich, proprietary internal experimental datasets.
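To make the orchestration idea concrete, here is a minimal sketch of the routing pattern, assuming a toy keyword matcher in place of the LLM planner. The tool names (`find_compounds`, `predict_adme`) and the registry are hypothetical illustrations, not LOWE’s actual API; a real LLM orchestrator would choose and order tools from the meaning of the question and then summarize the results in plain text.

```python
from typing import Callable, Dict, List

# Hypothetical tool registry: each "tool" stands in for a data resource
# (a compound database, a foundation model, an ADME predictor, ...).
Tool = Callable[[str], str]


def find_compounds(query: str) -> str:
    # Hypothetical stand-in for a compound database lookup.
    return "compound search results for: " + query


def predict_adme(query: str) -> str:
    # Hypothetical stand-in for an ADME prediction model.
    return "ADME predictions for: " + query


TOOLS: Dict[str, Tool] = {
    "compound": find_compounds,
    "adme": predict_adme,
}


def plan(query: str) -> List[str]:
    """Toy stand-in for the LLM planner: select every tool whose keyword
    appears in the query, in registry order."""
    q = query.lower()
    return [name for name in TOOLS if name in q]


def orchestrate(query: str) -> str:
    steps = plan(query)
    if not steps:
        return "No matching tool for this request."
    # Run each selected tool; an LLM would then turn these into a plain-text answer.
    results = [TOOLS[name](query) for name in steps]
    return "\n".join(results)
```

For example, `orchestrate("Find compound starting points and ADME data for target X")` calls both tools and joins their outputs, while a question with no matching keyword falls through to the "no matching tool" message.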
Recursion’s LOWE, an LLM-Orchestrated Workflow Engine, is the first example of such a model in biotech/pharma being made available to the public. Their 13-minute demo (linked above) shows how the platform can be prompted in plain language to find small molecule starting points for lesser-known protein targets in indications with high unmet need. Beyond finding starting points, LOWE can call generative chemistry algorithms (Matchmaker, acquired from Cyclica) to create chemical analogs that are predicted to bind the chosen target, and then send them for testing in Recursion’s phenotypically focused automated wet lab. The wet lab confirms whether new compounds are ‘actually’ active against the target in a real cell, where ‘active’ means reproducing the morphological change induced by a CRISPR knockout of the target (a phenotypic pseudo-target). This process can be repeated until compounds with sufficiently desirable properties are created.
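The repeat-until-good-enough cycle described above can be sketched as a simple design-make-test loop. This is a toy under loud assumptions: `generate_analogs` and `wet_lab_assay` are hypothetical random stand-ins for Matchmaker and Recursion’s phenotypic assay, the scores are unitless numbers in [0, 1], and the sketch does not model any learning between cycles.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Compound:
    name: str
    predicted_affinity: float               # hypothetical model score in [0, 1]
    assay_activity: Optional[float] = None  # set after the simulated wet-lab test


def generate_analogs(seed: str, n: int, rng: random.Random) -> List[Compound]:
    # Hypothetical stand-in for a generative chemistry model such as Matchmaker.
    return [Compound(f"{seed}_analog{i}", rng.random()) for i in range(n)]


def wet_lab_assay(c: Compound, rng: random.Random) -> float:
    # Hypothetical stand-in for a phenotypic assay: noisy, loosely tracks the prediction.
    return max(0.0, min(1.0, c.predicted_affinity + rng.gauss(0, 0.1)))


def design_make_test(seed: str, target_activity: float = 0.9, max_cycles: int = 10,
                     batch: int = 20, top_k: int = 5, rng_seed: int = 0) -> Optional[Compound]:
    rng = random.Random(rng_seed)
    for _ in range(max_cycles):
        candidates = generate_analogs(seed, batch, rng)
        # Only the top-k predicted binders go to the (expensive) wet lab.
        shortlist = sorted(candidates, key=lambda c: c.predicted_affinity, reverse=True)[:top_k]
        for c in shortlist:
            c.assay_activity = wet_lab_assay(c, rng)
        best = max(shortlist, key=lambda c: c.assay_activity)
        if best.assay_activity >= target_activity:
            return best
        # Name the next round after the best tested compound
        # (a real generative model would condition on its structure).
        seed = best.name
    return None  # no compound met the bar within the cycle budget
```

The key design point the sketch tries to capture is that cheap predictions filter many candidates down to a few expensive experiments per cycle.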
Assessment:
The positives:
Tools like LOWE can improve parallelization and, ultimately, the efficiency of drug discovery. For companies with their own phenotypes of interest (e.g. Eikon and Cellarity), where target deconvolution based on ligand binding to proteins comes much later in the process, LOWE actually gives them a substantial head start. This is because they can prioritize testing and refining compounds that are both probable binders of a desirable target and inducers of a desired, proprietary cellular phenotype, rather than having just one or the other.
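The "both, rather than one or the other" prioritization could look something like the following sketch. It assumes each compound has a hypothetical binding score and a hypothetical phenotype-similarity score, both scaled to [0, 1]; the cutoffs and compound names are illustrative only, not values from LOWE or any company’s pipeline.

```python
from typing import Dict, List


def prioritize(binding: Dict[str, float], phenotype: Dict[str, float],
               bind_cut: float = 0.7, pheno_cut: float = 0.7) -> List[str]:
    """Return compounds predicted to bind the target AND to reproduce the
    desired phenotype, ranked by the weaker of their two scores."""
    # Only compounds scored on both axes can qualify.
    both = [c for c in binding.keys() & phenotype.keys()
            if binding[c] >= bind_cut and phenotype[c] >= pheno_cut]
    # Ranking by the minimum favors compounds that are strong on both axes.
    return sorted(both, key=lambda c: min(binding[c], phenotype[c]), reverse=True)


# Hypothetical scores for illustration.
binding = {"cpd1": 0.90, "cpd2": 0.80, "cpd3": 0.95}
phenotype = {"cpd1": 0.75, "cpd2": 0.90, "cpd4": 0.99}
ranked = prioritize(binding, phenotype)
```

Here `cpd3` (binder with no phenotype data) and `cpd4` (phenotype hit with no binding prediction) both drop out; ranking by the weaker score is one reasonable choice among several, such as a weighted sum.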
Suppose Cellarity has run a CRISPR screen in a proprietary engineered cell line and clustered single or combinatorial knockouts according to their transcriptomics data. The team can now plug their own ‘lesser known gene associations’ into LOWE and cyclically query the Recursion compound database and generative algorithms until they find suitable starting points. It is worth noting that any existing phenotypic company could follow the steps of 1) running a CRISPR screen to find novel targets, 2) using a structure-based method to find starting points against those targets, and 3) running HT-ADME and phenotype confirmation without using LOWE, but we have not seen any examples of this yet.
LLM workflow orchestrators also empower non-chemists and lower-ranking team members by reducing the friction between hypothesis and experiment. LOWE allows scientists without computational chemistry backgrounds to pick targets, apply ADME/PK filters, and generate new compound structures using the built-in Matchmaker feature. Most drug discovery companies will likely have access to similar generative methods; however, biologists usually need the support of higher-ranking peers before compounds are ordered. So, in traditional drug discovery, where only medicinal chemists can design and order new compounds, LOWE (and LLM workflow orchestrators in general) may level the playing field and actually accelerate programs, because now everyone is a capable designer.
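As an example of the kind of property filter a non-chemist might apply, here is a sketch using Lipinski’s rule of five (molecular weight ≤ 500 Da, cLogP ≤ 5, ≤ 5 hydrogen-bond donors, ≤ 10 hydrogen-bond acceptors, with at most one violation tolerated). This is an assumption for illustration: I do not know which ADME/PK filters LOWE actually applies, and the property values below are made up rather than computed from real structures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Props:
    name: str
    mol_weight: float  # Da
    clogp: float       # calculated octanol-water partition coefficient
    h_donors: int
    h_acceptors: int


def lipinski_violations(p: Props) -> int:
    # Each comparison is a bool; summing them counts the rules broken.
    return sum([
        p.mol_weight > 500,
        p.clogp > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ])


def passes_filter(p: Props, max_violations: int = 1) -> bool:
    return lipinski_violations(p) <= max_violations


def apply_filter(compounds: List[Props]) -> List[str]:
    return [p.name for p in compounds if passes_filter(p)]


# Hypothetical compounds with made-up property values.
a = Props("A", 350.0, 2.1, 2, 5)
b = Props("B", 720.0, 6.3, 4, 12)
kept = apply_filter([a, b])
```

Compound B breaks three rules (weight, cLogP, acceptors) and is filtered out, leaving `["A"]`. In practice these properties would be computed from structures with a cheminformatics toolkit rather than typed in.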
A related point is that if these tools can be publicly accessed, then increasing the number of prompts increases the probability of finding valuable chemical matter, akin to bringing more miners to the mine. Beyond empowering teammates (and increasing their number), perhaps these new tools will lead to novel company structures in drug discovery. If LOWE can find lesser-known protein targets from genetics and cell morphology associations, could there be lean, virtual teams who solve the targets’ structures (or pay a CRO to), and then manage cheaper target-based computational programs to refine and develop NMEs? In my eyes, finding new targets and target associations is the power of phenotypic drug discovery, so perhaps crystallographer and computational chemist duos could become a popular founding-team archetype around this platform.
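The "more miners" intuition can be written down under a strong simplifying assumption: each of n prompts independently has the same small probability p of surfacing valuable chemical matter. Then the chance that at least one prompt succeeds is:

```latex
P(\text{at least one hit}) = 1 - (1 - p)^{n}
```

With an illustrative p = 0.01, ten prompts give roughly a 9.6% chance of at least one hit, while a hundred give roughly 63%. Real prompts against the same databases and models are correlated rather than independent, so this picture is likely optimistic.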
A future for meta-orchestrators:
Finally, as other biotech/pharma companies make their own tools available, could we see new founders developing orchestrators of their own (meta-orchestration) to oversee and interact with numerous existing orchestrators? Perhaps the meta-orchestrator charges users a hefty subscription fee and develops an automated process for splitting royalties among the companies whose orchestrators were used during a search. Could phenotypic drug discovery companies substitute their own phenotypic data for Recursion’s in the meta-orchestrator (because they own the data or have patented the features)?
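One simple way such an automated royalty split could work is pro rata by how often each company’s orchestrator was called during a search. This is a hypothetical scheme of my own for illustration (no such system exists that I know of); it works in integer cents and hands leftover rounding cents to the companies with the largest fractional remainders, so the shares always add up to the fee.

```python
from typing import Dict


def split_royalties(fee_cents: int, calls: Dict[str, int]) -> Dict[str, int]:
    """Split a fee pro rata by the number of calls made to each company's
    orchestrator, using integer cents so the shares sum exactly to the fee."""
    total_calls = sum(calls.values())
    if total_calls == 0:
        # No calls recorded: nothing to split.
        return {name: 0 for name in calls}
    shares: Dict[str, int] = {}
    remainders: Dict[str, int] = {}
    for name, n in calls.items():
        q, r = divmod(fee_cents * n, total_calls)
        shares[name] = q
        remainders[name] = r
    # Distribute the cents lost to rounding, largest remainder first.
    leftover = fee_cents - sum(shares.values())
    for name in sorted(remainders, key=lambda k: remainders[k], reverse=True)[:leftover]:
        shares[name] += 1
    return shares
```

For instance, a $100.00 fee with hypothetical call counts `{"CompanyA": 3, "CompanyB": 1}` splits into 7,500 and 2,500 cents. Weighting by call count is only one option; a real system might weight by data value or by which results led to a hit.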
In 10 years, could we imagine high schoolers spending their weekends querying meta-orchestrators to find novel therapeutic structures? Maybe prompters will depend heavily on just a few datasets or models under the umbrella, leading those companies to consolidate while others with less valuable data assets fade away. Meta-orchestrators could integrate learnings from real experimental outcomes, eventually planning and running wet-lab experiments without any human input. What kind of unique power would the most popular meta-orchestrator have? And would it need its own internal data to encapsulate everyone else’s?
Challenges/concerns:
At the highest level, it is not clear whether LOWE’s value comes from its ability to find new molecular entities (NMEs) or from generally accelerating time to lead (TTL). Given recent attention on NVIDIA’s dependence on the data center revenues (link) that support generative AI models, it may be that NVIDIA, an investor in Recursion (and, notably, a recent collaborator of Terray and Genentech), is more focused on infrastructure development than on any initial outcomes. Perhaps this is another attempt to ‘make a market’ for virtual lab access among its pharma and biotech partners to increase data center use.
Given that many of Recursion’s foundation models have strict licenses (link), developing a small academic user base may allow Recursion to optimize its orchestrator and make it usable before it is useful in any commercial sense. Academics are not driven by generating clinical assets, so demonstrating downstream value with this user base could be a challenge. The upside is that Recursion seems like the right team to pilot this experiment, given its intermediate size (relative to Terray and Genentech), which allows it to scale up but still pivot according to ongoing customer input.
Question 1: What are the limits of generalizability for foundation models developed from a single cell line, concentration, and time?
Since Recursion (like most drug discovery companies) follows the protein dependency paradigm, it seems counterintuitive to rely solely on HUVEC (human umbilical vein endothelial cell) phenotypes for all diseases rather than on more disease-relevant cell types. Also, given that this CRISPR screen is run at a single time point and concentration, the phenotypic reference is highly conditional. Changing concentration and time would likely change the target associations, which would affect every downstream step in this process.
Question 2: What do medicinal chemists think about Matchmaker, and what other functional data does LOWE have access to?
LOWE uses Matchmaker to predict compound structures against chosen protein targets. Can this platform interchangeably incorporate experimental crystal structures or predicted structures (e.g. from AlphaFold 1/2)? It is unclear whether Matchmaker is better at generating new compounds than the tools computational chemistry teams already use for similar tasks (e.g. StarDrop, Schrödinger). And while ADME/PK are valuable filters, what about other functional readouts (e.g. viability, qPCR) typically required to move compounds down the value chain?
Question 3: What logistical hurdles stand in the way of seamless adoption of LOWE and other LLM workflow orchestrators in biotech/pharma?
The demo mentioned that compound ordering takes 6-8 weeks, and running tests in the wet lab would likely lengthen timelines further. More complex hypothesized molecules from Matchmaker may be harder to synthesize and modify, which would make cycle times even longer. Because of this, while LOWE may add some diversity to ongoing internal chemistry approaches, I do not think it can fully drive optimization efforts at present.
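To put rough numbers on the timeline concern, assume optimization takes k back-to-back design cycles, each consisting of compound ordering (t_order) followed by wet-lab testing (t_test):

```latex
T_{\text{total}} = k\,(t_{\text{order}} + t_{\text{test}})
```

Taking the demo’s 6 to 8 weeks for t_order, plus a hypothetical 2 weeks for t_test and a hypothetical 5 cycles, gives 5 × (8 to 10 weeks) = 40 to 50 weeks, before accounting for harder syntheses. The cycle count and testing time here are illustrative assumptions, not figures from the demo.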