Rapid ID of Potential Anti-COVID-19 Agents Powered by Bridges-AI
AI Identifies More Than 20,000 Compounds with Possible Antiviral Activity, Thousands of Times Faster than Earlier Methods
by Ken Chiacchia
The COVID-19 pandemic has shown that speed can be as important as quality in medical research. A team from Carnegie Mellon University has developed a new computational pipeline that greatly speeds up identification of possible anti-COVID candidates using artificial intelligence (AI) on PSC’s Bridges-AI system. They used this tool to screen about 5 billion chemical compounds and select a small number of candidates for combating the disease, thousands of times faster than previous methods allow.
COVID protease with an inhibitor molecule (light blue) in the active site
Why It’s Important
The success, so far, of the rapid COVID-19 immunization effort hammers home a lesson we’d all already learned: in some medical research scenarios, speed is every bit as important as accuracy.
To help people who are infected before they can get vaccinated, as well as those who medically can’t be vaccinated, scientists are still searching for medications that can disrupt the SARS-CoV-2 virus’s life cycle. That effort, too, needs to be fast as well as good. One way to find new COVID-19 drugs is to simulate how candidate molecules interact with the target proteins the virus needs to infect people. This spares scientists the prohibitive time and expense of lab-testing every candidate, because they only need to test the most promising ones. But the standard method of simulating large proteins with candidate drugs depends on the complex rules of quantum chemistry, which takes enormous computing power. Testing a library of molecules typically requires weeks.
“As computational chemists, when COVID happened we tried to think of what we could do to help. One idea, and what we’ve been doing in the past few years, is using AI for drug discovery. The traditional way is mostly a physics-based method to predict the binding between small molecules and proteins. And this is relatively slow. It still takes hours per compound. You use extremely large machines and it takes you days or weeks to test a library of compounds; so, you have a limited throughput.”—Olexandr Isayev, Carnegie Mellon University
Olexandr Isayev of Carnegie Mellon University wondered whether AI could supercharge that search. Working with colleagues at the University of North Carolina at Chapel Hill, where he began the effort, and the University of Florida, he turned to the Bridges-AI supercomputer at PSC to make it work.
How PSC Helped
A flavor of AI called convolutional neural networks (CNNs) has been incredibly successful in some fields. CNNs running on graphics processing units, or GPUs, have fueled a revolution in the ability of AI to recognize objects in images. But the quantum chemistry of large molecules depends on far more than what they look like. The information needed is much more complicated than a simple image and strains CNNs’ capabilities. Other scientists have created quantum-based neural network potentials (NNPs) that can make accurate predictions for specific combinations of molecules. These tools are tens of thousands of times faster than conventional quantum-chemistry calculations. But they cannot generalize: present them with a different set of molecules, and they are nearly useless.
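Neural network potentials of the Behler-Parrinello type, the family ANI belongs to, predict a molecule’s energy as a sum of per-atom contributions, each computed from a numerical fingerprint of that atom’s neighborhood inside a cutoff radius. The Python sketch below illustrates only that structure, under stated assumptions: it keeps just radial (distance-based) descriptors, whereas ANI also uses angular terms, and the cutoff, Gaussian parameters, and per-element weights are made-up values, not a trained model and not the TorchANI API.

```python
import math

# Toy atom-wise neural network potential (illustrative only).
# ASSUMPTIONS: radial descriptors only; all numeric parameters and the
# per-element weights below are hypothetical, not ANI's real values.

R_CUT = 5.2                      # cutoff radius in angstroms (assumed)
SHIFTS = [0.9, 1.5, 2.5, 3.5]    # Gaussian centers R_s (assumed)
ETA = 4.0                        # Gaussian width parameter (assumed)


def cutoff(r):
    """Smooth cosine cutoff: 1 at r = 0, falling to 0 at R_CUT, 0 beyond."""
    if r > R_CUT:
        return 0.0
    return 0.5 * math.cos(math.pi * r / R_CUT) + 0.5


def radial_descriptor(i, coords):
    """Fingerprint of atom i: one cutoff-weighted Gaussian sum per shift."""
    g = [0.0] * len(SHIFTS)
    for j, xyz in enumerate(coords):
        if j == i:
            continue
        r = math.dist(coords[i], xyz)
        fc = cutoff(r)
        for k, rs in enumerate(SHIFTS):
            g[k] += math.exp(-ETA * (r - rs) ** 2) * fc
    return g


# Hypothetical per-element "networks": (hidden weights, output weights, bias).
WEIGHTS = {
    "H": ([[0.3, -0.2, 0.1, 0.05], [-0.1, 0.4, -0.3, 0.2]], [0.7, -0.5], -0.5),
    "C": ([[0.2, 0.1, -0.4, 0.3], [0.5, -0.2, 0.1, -0.1]], [-0.8, 0.6], -1.0),
    "O": ([[-0.3, 0.2, 0.2, -0.1], [0.1, 0.3, -0.2, 0.4]], [0.4, 0.9], -1.5),
}


def atomic_energy(element, g):
    """One hidden tanh layer, then a linear output, chosen by element type."""
    hidden_w, out_w, bias = WEIGHTS[element]
    hidden = [math.tanh(sum(w * x for w, x in zip(row, g))) for row in hidden_w]
    return sum(w * h for w, h in zip(out_w, hidden)) + bias


def total_energy(elements, coords):
    """Molecular energy = sum of per-atom contributions."""
    return sum(
        atomic_energy(el, radial_descriptor(i, coords))
        for i, el in enumerate(elements)
    )


water = ["O", "H", "H"]
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
print(total_energy(water, coords))
```

Because each fingerprint depends only on interatomic distances, shifting the whole molecule or swapping two identical atoms leaves the predicted energy unchanged, and the per-atom sum lets the same networks be applied to molecules of any size.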
Isayev and his colleagues had an idea for simulating the molecules in a way that avoided massive quantum computations. Their plan would speed the computation by offloading the complexity onto a large database of molecular characteristics. It would require both the speed of GPUs and massive data-handling capability. The National Science Foundation-funded Bridges-AI was perfect for the work. For one thing, it was designed from the outset for leading-edge speed in GPU-based AI training. It also offers Big Data capabilities through its confederation with the larger Bridges platform. The scientists gained access to Bridges-AI through an allocation from the COVID-19 High Performance Computing Consortium.
“Bridges-AI has these state-of-the-art NVIDIA Tesla V100 chips. Those are critical. Also, in terms of data passing, the architecture of Bridges-AI is perfect. We interacted with Shawn Brown, PSC’s Director, and a couple of admins at PSC; the technical challenge was to orchestrate these workflows. The fantastic relationship we have with PSC was extremely important to the success of this project.”—Olexandr Isayev, Carnegie Mellon University
The scientists used Bridges-AI to train and test their NNP tool, called ANI (short for ANAKIN-ME, Accurate NeurAl networK engINe for Molecular Energies). The AI first learned to predict how drugs known to be effective against SARS-CoV-2 interact with three target proteins, two from the virus and one from human cells. By trial and error, ANI adjusted the strength of the connections between “layers” of simulated neurons, each responding to specific characteristics of the molecules, until it reproduced the known interactions. The team then tested ANI against another set of known drugs, this time without ANI “knowing” the answers ahead of time. Going back and forth between training and testing, the scientists honed the program’s ability to predict interactions accurately.
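The train-then-test cycle described above can be sketched in a few lines of Python. Everything here is hypothetical: a one-feature linear model and synthetic, noise-free data stand in for ANI and for the real drug-protein interaction data, so the sketch shows only the pattern of fitting on one set of molecules and scoring the fit on a held-out set the model never trained on.

```python
import random

# HYPOTHETICAL toy data: one numeric feature per "molecule" and a made-up
# target value. Real NNP training fits many parameters to quantum-chemistry
# data; this only illustrates the train / held-out-test loop.


def make_toy_data(n, seed):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 2.0 * x - 0.5))  # invented "true" relationship
    return data


def mean_abs_error(w, b, data):
    return sum(abs(w * x + b - y) for x, y in data) / len(data)


def train(train_set, test_set, lr=0.2, epochs=500):
    w, b = 0.0, 0.0
    history = []
    for _ in range(epochs):
        # Gradient-descent step on mean squared error, using training data only.
        gw = sum(2 * (w * x + b - y) * x for x, y in train_set) / len(train_set)
        gb = sum(2 * (w * x + b - y) for x, y in train_set) / len(train_set)
        w -= lr * gw
        b -= lr * gb
        # Score against held-out examples the model never trained on.
        history.append(mean_abs_error(w, b, test_set))
    return w, b, history


train_set = make_toy_data(50, seed=1)
test_set = make_toy_data(20, seed=2)
w, b, history = train(train_set, test_set)
print(f"w={w:.3f}, b={b:.3f}, held-out MAE={history[-1]:.6f}")
```

With real, noisy data the held-out error would not fall to zero; it is the number to watch when deciding whether more training or a different setup actually improves predictions on new molecules.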
In a final computational step, the team used their AI on several databases containing about 5 billion chemical compounds, including antiviral compounds and FDA-approved or investigational drugs. In just a day, ANI narrowed the field by predicting which were most likely to interfere with the target proteins in a way that would block infection.
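At the screening stage, the expensive part is scoring each compound; keeping only the best few is cheap. The sketch below assumes a hypothetical `toy_score` function standing in for the trained model and uses SMILES strings as compound identifiers; it is not the team’s pipeline. `heapq.nlargest` holds only the current top `keep` results while streaming through the library, so the library could just as well be a generator reading from disk.

```python
import heapq
from collections.abc import Callable, Iterable


def screen(compounds: Iterable[str], score: Callable[[str], float],
           keep: int) -> list[tuple[float, str]]:
    """Score every compound and return the `keep` highest (score, id) pairs.

    heapq.nlargest maintains a small heap of the current leaders, so the
    library can be an arbitrarily long stream rather than an in-memory list.
    """
    return heapq.nlargest(keep, ((score(c), c) for c in compounds))


def toy_score(smiles: str) -> float:
    """HYPOTHETICAL stand-in for a model prediction; no chemical meaning."""
    return smiles.count("N") + 0.5 * smiles.count("O") - 0.05 * len(smiles)


library = [
    "CCO",
    "c1ccccc1",
    "CC(=O)Nc1ccc(O)cc1",
    "NCCN",
    "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
]
for s, smiles in screen(library, toy_score, keep=2):
    print(f"{s:6.2f}  {smiles}")
```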
Isayev and his colleagues made datasets of the structures and properties of the most promising anti-COVID agents, 20,000 antiviral compounds and FDA-approved drugs, freely available to the research community. They also entered the compounds in the European Union’s COVID Challenge project. In the next step of that competition, several groups of lab scientists are synthesizing and testing the compounds in virus-related tests. The work was also recognized with an Editors’ Choice Award from HPCwire, a leading publication in the high performance computing field, for “Best Use of High Performance Data Analytics & Artificial Intelligence” during the virtual 2020 International Conference for High Performance Computing, Networking, Storage and Analysis (SC20).
Isayev’s team is also preparing a paper on the work for submission to a peer-reviewed journal. An earlier paper describes their development of ANI.
Olexandr Isayev, Carnegie Mellon University