New AI Model by Google and ZebiAI
In a study published this week in the Journal of Medicinal Chemistry, researchers at Google, together with X-Chem Pharmaceuticals, demonstrated an AI approach for identifying biologically active molecules using a combination of physical and virtual screening. The work led to the creation of the Chemome Initiative, which launches today: a collaboration between Google’s Accelerated Science team and the startup ZebiAI that aims to enable the discovery of many more small-molecule chemical probes for scientific research.
As part of the Chemome Initiative, Google says ZebiAI will work with researchers to identify proteins of interest and source the screening data that the Accelerated Science team will use to train AI models. These models will make predictions on commercially available libraries of small molecules to find chemical probes (molecules that aren’t useful as drugs, but that selectively inhibit or promote the function of specific proteins). The predicted molecules will be provided to researchers for activity testing to advance some programs through discovery.
Making sense of the biological networks that support life and produce disease is a complex task. One approach is to use small molecules: in a biological system (e.g., cancer cells growing in a dish), they can be added at a chosen time to observe how the system responds when a protein has increased or decreased activity.
Despite how useful chemical probes are for this type of biomedical research, only 4% of human proteins have a known chemical probe available. To find new ones, Google and X-Chem Pharmaceuticals turned to AI and machine learning.
As the coauthors of the study explain, chemical probes are identified by screening the space of small molecules against a target protein to find “hit” molecules that can be tested further. The physical part of the method uses DNA-encoded small-molecule libraries (DELs), which contain many distinct small molecules in one pool, each attached to a fragment of DNA that serves as a “barcode” for that molecule. To build a DEL, one first generates many chemical fragments, each with a common chemical handle. The results are pooled and split into separate reactions, where a set of distinct fragments with another common chemical handle is added.
The chemical fragments from the two steps react and fuse at the common chemical handles, and their DNA fragments are joined to form one continuous barcode for each molecule. Once a library has been generated, it can be used to find the small molecules that bind to the protein of interest: the DEL is mixed with the protein, and the small molecules that don’t attach are washed away. Sequencing the remaining DNA barcodes produces many individual reads of DNA fragments, which can then be processed to estimate which of the billions of molecules in the original DEL interact with the protein.
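The split-and-pool construction and the barcode readout can be sketched in a few lines. This is a toy illustration, not X-Chem’s pipeline: the fragment names (`A1`, `B2`, …), the 4-letter DNA tags, and the helper names `build_library` and `count_binders` are all hypothetical, and raw read counts stand in for the more careful statistical processing a real DEL analysis would apply.

```python
from collections import Counter
from itertools import product

# Hypothetical building blocks: (fragment name, DNA tag). Toy values only.
STEP1 = [("A1", "ACGT"), ("A2", "TTGA"), ("A3", "GGCA")]
STEP2 = [("B1", "CATG"), ("B2", "AGTC")]


def build_library(step1, step2):
    """Split-and-pool sketch: every step-1 fragment is fused with every
    step-2 fragment, and their DNA tags are concatenated into one barcode."""
    library = {}
    for (frag1, tag1), (frag2, tag2) in product(step1, step2):
        library[tag1 + tag2] = f"{frag1}-{frag2}"
    return library


def count_binders(reads, library):
    """Map each sequencing read back to its molecule and count reads.
    Reads that match no barcode (e.g. sequencing errors) are skipped."""
    counts = Counter()
    for read in reads:
        molecule = library.get(read)
        if molecule is not None:
            counts[molecule] += 1
    return counts


library = build_library(STEP1, STEP2)
# Hypothetical reads from barcodes that remained after washing.
reads = ["ACGTCATG", "ACGTCATG", "GGCAAGTC", "ACGTCATG", "NNNNNNNN"]
print(len(library))                                 # 6
print(count_binders(reads, library).most_common(2))  # [('A1-B1', 3), ('A3-B2', 1)]
```

Because the library size is the product of the fragment counts at each step, adding a third split-and-pool round would multiply it again; this is how DELs reach very large sizes from a modest number of building blocks.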
Above: The fraction of tested molecules showing various levels of activity, comparing predictions from the graph convolutional classifier and from random forests on three protein targets.
To predict whether an arbitrarily chosen small molecule will bind to a target protein, the researchers built a machine learning model, specifically a graph convolutional neural network, a type of model designed for graph-structured inputs such as small molecules. The physical DEL screen provides positive and negative examples for this classifier: the small molecules remaining at the end of the screening process are positive examples, and everything else is a negative example.
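A minimal sketch of the two ideas in this paragraph follows: turning a DEL screen into binary labels, and one graph-convolution layer that treats atoms as nodes and bonds as edges. It assumes a simple sum-of-neighbours convolution with sum pooling and a logistic output; this is not the architecture from the study, the weights are untrained placeholders, and the training loop (fitting weights to the labels with a cross-entropy loss) is omitted. All function names are hypothetical.

```python
import math


def label_from_screen(library_ids, retained_ids):
    """Positive label (1) for molecules that survived the DEL screen, else 0."""
    retained = set(retained_ids)
    return {mol: int(mol in retained) for mol in library_ids}


def graph_conv(features, adjacency, weights):
    """One graph-convolution layer over a molecular graph.

    features:  per-atom feature vectors (n_atoms x d_in)
    adjacency: 0/1 bond matrix (n_atoms x n_atoms), no self-loops
    weights:   linear map (d_in x d_out)
    Each atom sums its own features with its bonded neighbours' features,
    applies the linear map, then a ReLU.
    """
    n_atoms = len(features)
    d_out = len(weights[0])
    out = []
    for i in range(n_atoms):
        agg = list(features[i])  # start with the atom itself (self-loop)
        for j in range(n_atoms):
            if adjacency[i][j]:
                agg = [a + b for a, b in zip(agg, features[j])]
        h = [sum(agg[k] * weights[k][c] for k in range(len(agg))) for c in range(d_out)]
        out.append([max(0.0, v) for v in h])  # ReLU
    return out


def predict_binding(features, adjacency, weights, readout):
    """Sum-pool atom embeddings into one molecule vector, then apply a logistic output."""
    atoms = graph_conv(features, adjacency, weights)
    pooled = [sum(column) for column in zip(*atoms)]
    logit = sum(p * w for p, w in zip(pooled, readout))
    return 1.0 / (1.0 + math.exp(-logit))


# Toy C-C-O chain (hydrogens omitted) with one-hot features [is_C, is_O].
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # untrained placeholder weights
readout = [0.1, -0.2]               # untrained placeholder readout

labels = label_from_screen(["m1", "m2", "m3"], retained_ids=["m2"])
print(labels)  # {'m1': 0, 'm2': 1, 'm3': 0}
print(round(predict_binding(features, adjacency, weights, readout), 3))  # 0.525
```

Working on the bond graph directly is what distinguishes this family of models from the fingerprint-based random forest baseline: the convolution learns which local atom neighbourhoods matter instead of relying on a fixed, hand-designed encoding.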
The team physically screened three diverse proteins using DELs: sEH (a hydrolase), ERα (a nuclear receptor), and c-KIT (a kinase). Using the DEL-trained models, they then virtually screened large make-on-demand libraries from the drug discovery platform Mcule and an in-house molecule library at X-Chem to identify a set of molecules predicted to show affinity for each protein target. Finally, they compared their classifier with a random forest (RF) model, a common virtual screening method that uses standard chemical fingerprints. They report that the classifier significantly outperformed the RF model in discovering potent candidates.
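The virtual-screening and evaluation steps can be sketched as ranking a library by predicted binding probability, then measuring what fraction of the tested picks clear each activity threshold (the quantity plotted in the figure above). This is a sketch under stated assumptions: the helper names are hypothetical, activity values are assumed to be IC50-style potencies in µM (lower is more potent), and every score and measurement below is invented for illustration, not data from the study.

```python
import heapq


def select_top_k(scores, k):
    """Virtual-screening step: keep the k molecule IDs with the highest
    predicted binding probability."""
    top = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return [mol for mol, _ in top]


def hit_rates(activities_um, thresholds_um):
    """Fraction of tested molecules whose activity value is at or below each
    threshold. Values are assumed IC50-style potencies in µM; None marks a
    molecule with no measurable activity."""
    n = len(activities_um)
    rates = {}
    for t in thresholds_um:
        hits = sum(1 for a in activities_um if a is not None and a <= t)
        rates[t] = hits / n if n else 0.0
    return rates


# Invented values, for illustration only (not study data).
predicted = {"m1": 0.91, "m2": 0.15, "m3": 0.74, "m4": 0.62}
print(select_top_k(predicted, k=2))  # ['m1', 'm3']

model_a = [0.5, 3.0, None, 20.0]
model_b = [8.0, None, None, 40.0]
print(hit_rates(model_a, [1, 10]))  # {1: 0.25, 10: 0.5}
print(hit_rates(model_b, [1, 10]))  # {1: 0.0, 10: 0.25}
```

Reporting the hit rate at several thresholds, rather than a single cutoff, shows whether one model finds more molecules overall or specifically more highly potent ones.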
The team tested almost 2,000 molecules across the three targets, which it says is the largest published prospective study of virtual screening to date.
“We’re excited to be part of the Chemome Initiative enabled by the effective ML techniques described here and look forward to its discovery of many new chemical probes. We expect the Chemome will spur significant new biological discoveries and ultimately accelerate new therapeutic discovery for the world,” Google wrote in a blog post. “While more validation must be done to make the hit molecules useful as chemical probes, especially for specifically targeting the protein of interest and the ability to function correctly in common assays, having potent hits is a big breakthrough in the process.”