How supercomputers are being used in COVID-19 research
Alongside the White House Office of Science and Technology Policy (OSTP), IBM announced in March that it would help coordinate an effort to supply hundreds of petaflops of computing to scientists researching the coronavirus. As part of the newly launched COVID-19 High-Performance Computing (HPC) Consortium, IBM pledged to help evaluate proposals and to supply access to resources for projects that "make the most immediate impact."
Much work remains, but a number of the Consortium’s most prominent members — among them Microsoft, Intel, and Nvidia — claim that progress is being made.
Petaflops of compute
Powerful supercomputers allow researchers to run high volumes of calculations in epidemiology, bioinformatics, and molecular modeling, many of which would take months on traditional computing platforms (or years if done by hand). Moreover, because these computers are available in the cloud, they allow teams to collaborate from anywhere in the world.
Insights generated by the experiments can help advance our understanding of key aspects of COVID-19, like viral-human interaction, viral structure and function, small molecule design, drug repurposing, and patient trajectory and outcomes. "Technology is a critical part of the COVID-19 research happening right now all over the world," Dell Technologies VP Thierry Pellegrino told VentureBeat. (Dell Technologies is a member of the Consortium.) "It's crucial to the population of our planet that researchers have the tools to understand, treat, and fight this virus. Researchers around the world are true heroes doing important work under extreme and unfamiliar circumstances, and we couldn't be prouder to support their efforts."
Companies and institutions have matched 62 projects in the U.S., Germany, India, South Africa, Saudi Arabia, Croatia, Spain, the U.K., and other countries with supercomputers from Google Cloud, Amazon Web Services (AWS), Microsoft Azure, IBM, and dozens of academic and nonprofit research institutions, free of charge. These are running on over 136,000 nodes containing 5 million processor cores and more than 50,000 graphics cards, which together deliver over 483 petaflops (483 quadrillion floating-point operations per second) of computing across hardware maintained by the Consortium's 40 partners.
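As a quick sanity check on those totals, the aggregate figure can be unpacked with a little arithmetic. The per-unit averages below are naive, since the GPUs account for most of the flops, but they put the headline number in perspective:

```python
# Back-of-envelope view of the Consortium's quoted aggregate capacity.
# Totals come from the text; per-unit averages are illustrative only.
PETAFLOP = 1e15  # floating-point operations per second

total_flops = 483 * PETAFLOP
cores = 5_000_000
gpus = 50_000

print(f"Total: {total_flops:.3g} FLOPS")
print(f"Naive average per CPU core: {total_flops / cores / 1e9:.1f} GFLOPS")
print(f"Naive average per GPU:      {total_flops / gpus / 1e12:.2f} TFLOPS")
```
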
In addition to supercomputing infrastructure built atop its Azure cloud computing platform, Microsoft is providing researchers with networking and storage resources integrated with workload orchestration through Azure HPC. Concurrent with this is the company's AI for Health program, which in April allocated $20 million to developments in five key areas — data and insights, treatment and diagnostics, allocation of resources, dissemination of accurate information, and scientific research — to bolster work associated with COVID-19.
As a part of its work with the Consortium, Microsoft says it's providing teams access to its scientists spanning AI, HPC, quantum computing, and other areas of computing at Microsoft Research and elsewhere. Much of these researchers' work so far has entailed basic scientific discovery about COVID-19 itself and the way it interacts with the human host, including the design of therapeutics, through:
- Research simulations.
- Molecular dynamics modeling.
- 3D mapping of virus protein structures.
- Compound screening to ascertain whether existing drug molecules are able to inhibit cellular entry of the virus.
Microsoft says each organization it collaborates with receives a full Azure HPC environment, including Azure CycleCloud with the Slurm workload manager, best-fit Azure Virtual Machines, and storage. These are configured to scale on demand and meet compute needs as necessary, and they're tailored to the specific research needs of the grantee.
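For illustration, a grantee's workload on such an environment would typically be submitted as a Slurm batch script. The partition, module, binary, and file names below are hypothetical, not taken from any actual Consortium project:

```shell
#!/bin/bash
# Hypothetical Slurm batch script of the kind a grantee might submit
# through an Azure CycleCloud-managed cluster. All names are illustrative.
#SBATCH --job-name=md-screen
#SBATCH --partition=hpc            # CycleCloud-managed node array
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16
#SBATCH --time=12:00:00
#SBATCH --output=md-screen-%j.log

module load mpi                    # site-specific environment module
srun ./molecular_dynamics --input target_protein.pdb --steps 1000000
```

CycleCloud's role is to grow and shrink the `hpc` node array as jobs like this queue up, which is what "scale on demand" means in practice.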
Nepali modeling and ventilator splitting
Through the Consortium, Microsoft's AI for Health is supporting the nonprofit research institute Nepal Applied Mathematics and Informatics Institute for Research (NAAMII), which is using simulation to model how COVID-19 would spread among the Nepali population under different scenarios. These models, Microsoft says, can reveal patterns that may potentially save lives and livelihoods.
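A minimal sketch of the kind of scenario modeling described above is a compartmental SIR simulation. The parameters and population figure below are illustrative assumptions, not NAAMII's actual model or Nepal's real contact data:

```python
# Toy SIR (susceptible/infected/recovered) epidemic model, stepped
# forward one day at a time with Euler integration.
def sir_step(s, i, r, beta, gamma, dt=1.0):
    """Advance the three compartments by one time step."""
    new_infections = beta * s * i / (s + i + r)
    new_recoveries = gamma * i
    return (s - new_infections * dt,
            i + (new_infections - new_recoveries) * dt,
            r + new_recoveries * dt)

def simulate(population=30_000_000, initially_infected=100,
             beta=0.3, gamma=0.1, days=180):
    """Run one scenario and report the final state plus the peak."""
    s, i, r = population - initially_infected, initially_infected, 0.0
    peak = i
    for _ in range(days):
        s, i, r = sir_step(s, i, r, beta, gamma)
        peak = max(peak, i)
    return s, i, r, peak

s, i, r, peak = simulate()
print(f"Peak simultaneous infections: {peak:,.0f}")
```

Changing `beta` (contact/transmission rate) is how "different scenarios" — lockdowns, distancing — are compared in models of this family.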
Duke University, another grantee, is leveraging Azure to research ventilator splitting, a technique that allows multiple patients to share a single ventilator. The Matlab division of MathWorks teamed up with Microsoft to optimize the researchers' analysis for distributed computing environments.
Google continues to supply compute, storage, and workload management services to Consortium grantees through Google Cloud Platform, and it recently set aside $20 million in computing credits for academic institutions and researchers studying COVID-19 treatments, therapies, and vaccines. As a part of its work with the Consortium, the company is collaborating on epidemiological modeling with Northeastern University researchers and applying AI to medical imaging with the Complutense University of Madrid.
Google also partnered with the Harvard Global Health Institute to fund companies, government agencies, nonprofit organizations, and institutions working on COVID-19 research. The tech giant — alongside Microsoft — also began a program with Microsoft-backed cloud company Rescale to supply HPC resources at no cost to groups working to develop COVID-19 testing and vaccines. Rescale provides the platform on which researchers launch experiments and record results, while Google and Microsoft supply the backend computing resources.
Amazon, like Google, is supplying compute and tools to researchers matched through the Consortium. Over 11 teams are currently using its infrastructure, and dedicated Amazon Web Services solution architects confer with the scientists weekly.
As a part of its AWS Diagnostic Development Initiative, Amazon is also providing $20 million in computing credits to over 35 institutions and private companies that are leveraging AWS to further the development of COVID-19 point-of-care diagnostics — i.e., testing that can be done at home or at a clinic with same-day results. "This is a global health emergency that can only be resolved by governments, businesses, academia, and individuals working together to better understand this virus and ultimately find a cure," said Teresa Carlson, VP of worldwide public sector at AWS, in a statement.
Developing protein decoys
At the MIT Media Lab, inspired by a researcher at Johns Hopkins University, a team is identifying "decoy" proteins of ACE2 receptors (the receptors coronaviruses bind to inside the human body) that might render COVID-19 inert. Using a machine learning model trained on data about the ACE2 receptor and running on AWS, the researchers are trying to predict which variants of the decoy won't interact with other proteins in the body and cause harmful side effects. If all goes well, tests in mice will commence soon, with clinical trials beginning toward the end of summer.
In separate efforts, AWS is empowering researchers at the Children's National Hospital to combine numerous data sets to identify genes that might be targeted to treat COVID-19. A team at Iowa State University is tapping evolutionary models with public genomic data sets to study the relationships between strains of COVID-19 and understand how they mutate and spread. And scientists at Emory University are developing a web-based tool — tmCOVID — to extract and summarize key concepts in scientific studies on COVID-19.
Nvidia says that 14 of the Consortium's projects have consumed over 3 million GPU hours on the Nvidia-powered Summit supercomputer at Oak Ridge National Laboratory. Summit is the world's fastest supercomputer, as ranked by the Top500 list of supercomputers. Nvidia also offers its own 20,000-GPU infrastructure — SaturnV — which the company's researchers are primarily using to optimize COVID-19 research applications.
Nvidia has been using excess cycles on SaturnV to run Folding@home, a distributed computing project that simulates protein dynamics in an attempt to help develop therapeutics for various diseases, including COVID-19. The company has also assisted in matching researchers to supercomputers based on each researcher's specific requirements.
Quantum chemistry and virtual screening
In partnership with Microsoft, Nvidia is working with the University of California, Riverside on quantum chemistry solutions that benefit from GPU optimization. The number of possible COVID-19 inhibitors is immense, and completing experimental studies on all the candidates is both infeasible and cost-inefficient. The hope is that the project's predictive, GPU-enabled simulations — which will consume up to 800,000 GPU hours on Azure — will provide guidance for efforts to narrow in on the most promising candidates.
According to Nvidia, in less than a week its experts helped project lead Bryan Wong package his research code using HPC Container Maker, the company's open-source tool that ships with 30 containerized HPC applications. And they tapped Nvidia's Nsight debugging tool to develop a fix for an onerous bug — making it possible to accomplish work scheduled to require 800,000 GPU-hours in 300,000 GPU-hours, for a savings of $500,000.
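The quoted figures imply a simple effective rate, which is easy to check with arithmetic:

```python
# Sanity check on the quoted numbers: a 500,000-GPU-hour reduction
# producing a $500,000 saving implies an effective rate of roughly
# $1 per GPU-hour, a plausible cloud-pricing ballpark.
scheduled_gpu_hours = 800_000
actual_gpu_hours = 300_000
savings_dollars = 500_000

hours_saved = scheduled_gpu_hours - actual_gpu_hours
implied_rate = savings_dollars / hours_saved
print(f"Hours saved: {hours_saved:,}; implied rate: ${implied_rate:.2f}/GPU-hour")
```
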
At Carnegie Mellon University, a team led by Olexandr Isayev worked with Nvidia to apply AI approaches to the task of high-throughput virtual screening, which uses algorithms to identify bioactive molecules. Unlike traditional scientific simulations, which take a brute-force approach by attempting to simulate every possible molecular interaction, AI makes educated guesses that reduce the number of combinations to be simulated. This results in theoretically faster candidate drug discovery (and quicker field trials). Isayev estimates that it might be as much as a million times faster than conventional quantum mechanical calculations.
The first step in the process is using AI to analyze a library of molecules that can be purchased from chemical companies, preparing them for screening in simulation. The best candidates from the screening will then be simulated using AI-enhanced molecular dynamics, and top hits from the final screening will be tested in partner laboratories.
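The funnel described above — a cheap learned score triaging a large library so only the best candidates reach expensive simulation — can be sketched as follows. The "affinity" and "surrogate" functions are toy stand-ins, not the team's actual models:

```python
# Toy virtual-screening funnel: a cheap, noisy surrogate model ranks a
# large library; only the shortlist goes on to expensive simulation.
import random

random.seed(0)

def true_affinity(molecule_id):
    """Expensive ground truth (stand-in for a physics simulation)."""
    random.seed(molecule_id)          # deterministic per molecule
    return random.random()

def surrogate_score(molecule_id):
    """Cheap, noisy ML-style estimate of the same quantity."""
    return true_affinity(molecule_id) + random.uniform(-0.1, 0.1)

library = list(range(100_000))        # purchasable-molecule library
library.sort(key=surrogate_score, reverse=True)
shortlist = library[:100]             # only these get simulated

simulated = sorted(shortlist, key=true_affinity, reverse=True)
print("Top candidate after simulation:", simulated[0])
```

The payoff is that the expensive step runs on 100 molecules instead of 100,000 — the same economics, at vastly larger scale, behind Isayev's "million times faster" estimate.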
After their work, Isayev and colleagues plan to deposit their data sets in the open-source COVID-19 data lake, a centralized repository of curated data sets maintained by Amazon's AWS division, in hopes that other researchers will benefit from them.
IBM VP of technical computing Dave Turek says COVID-19 research continues with partners across the spectrum — on machines powered by its hardware and within laboratories and institutions it has relationships with. "Without any large contracts or anything of the type, [the Consortium] came together in a way to both share resources and manage a process of expediting the scientific proposals that came into the consortia and match them to the best resources," he said in a statement. "The teams are making rapid progress, and these supercomputing-powered projects are using novel approaches to understanding the virus."
For example, IBM researchers at the Hartree Centre in Daresbury, England partnered with University of Oxford scientists to combine molecular simulations with AI in discovering compounds that could be repurposed as anti-COVID-19 drugs. Using Summit and the Texas Advanced Computing Center's (TACC) Frontera, the fifth-fastest system per the Top500, the team says it's accomplishing months of research in a matter of hours.
Generating molecular compounds
With the assistance of IBM, researchers at the University of Utah tapped the National Center for Supercomputing Applications' Blue Waters and TACC's Longhorn and Frontera to generate more than 2,000 molecular models of compounds relevant to COVID-19. They ranked the models based on the molecules' force field energy estimates, which they theorized could help scientists design better peptide inhibitors of an enzyme to stop COVID-19.
The team investigated the structure of the virus's main protease, an enzyme that breaks down proteins and peptides, in complex with a peptide inhibitor called N3. They then applied an approach originally developed to identify Ebola-stopping molecules, one that involves molecular dynamics simulations and optimization of specific structures. This enabled the COVID-19 protease to break down a series of similar, easy-to-detect probes that had already been designed, serving as the basis for assays that test the inhibitors' effectiveness.
The work built on a body of knowledge about how the potential energy generated by atoms can give a molecule a positively or negatively charged "force field" that attracts or repels other molecules. Using AMBER, a molecular dynamics code, the researchers observed experimental results within one hundred-millionth of a centimeter, a measure imperceptible to all but the strongest microscopes.
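For reference, the potential energy a classical molecular dynamics code like AMBER evaluates has the standard textbook form below — this is the generic functional form of such force fields, not the Utah team's specific parameterization:

```latex
E_{\text{total}} =
 \sum_{\text{bonds}} k_b (r - r_0)^2
 + \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2
 + \sum_{\text{dihedrals}} \frac{V_n}{2}\left[1 + \cos(n\phi - \gamma)\right]
 + \sum_{i<j} \left[ \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{\varepsilon\, r_{ij}} \right]
```

The last sum is the nonbonded term: the Lennard-Jones part models attraction and repulsion between atoms, and the Coulomb part captures the charge interactions the passage describes.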
The University of Utah's Schmidt lab will later transform the peptide leads into biopharmaceutical scaffolds called circular modified peptides. "Our hope is that we find a new peptide inhibitor that can be experimentally verified in the next few weeks. Then we'll engage in further design to make the peptide cyclic, making it more stable as a potential drug," University of Utah professor and research lead Thomas Cheatham said in a statement.
Mapping how COVID-19 spreads
It's well understood that COVID-19 spreads via virus-laden droplets, which are transported around environments by air conditioning units, wind, and other sorts of turbulence. But airborne transmission rates remain a subject of contention, and some experts say gathering useful evidence of airborne transmission could take years and cost many lives.
In a safer pursuit of clarity, scientists at Utah State University, Lawrence Livermore National Lab, and the University of Illinois plan to use the Consortium's supercomputing resources to study person-to-person transmission of airborne respiratory infectious diseases like COVID-19. They're working from the hypothesis that aerosolized droplets from human airways contaminate rooms more quickly than initially assumed. They'll leverage high-fidelity multiphase large-eddy simulations (LES) — mathematical models for turbulence used in computational fluid dynamics — running on IBM hardware to map droplet cloud paths in typical hospital settings.
The short-term aim will be to understand how long a cloud persists and where the particles settle, which could inform non-pharmacological techniques to reduce the spread. "The [goal] of this study is to fundamentally improve our understanding of the person-to-person transmission of airborne respiratory infectious diseases," wrote the researchers in a statement. "Our findings will [make] it safer for health care professionals."
Studying genetic susceptibility
Beyond isolating COVID-19-killing compounds and mapping the spread of the virus, researchers are trying to define risk groups by performing genome analysis and IBM supercomputer-enhanced DNA sequencing.
A team of scientists affiliated with NASA has observed that COVID-19 appears to cause pneumonia, triggering an inflammatory response in the lungs called acute respiratory distress syndrome (ARDS). To test this, they plan to use the supercomputer at NASA's Ames Research Center to sequence the genomes of patients who develop ARDS and those who don't.
If all goes well, the team believes their study will result in practical tools for predicting which COVID-19 patients are likely to develop ARDS, and thus which patients are likely to need intensive support before severe symptoms emerge. Such tools could help guide medical care resources toward the sickest patients and enable health care workers to better manage ongoing treatment.
Intel is actively involved in the design, development, and deployment of several Consortium-affiliated supercomputers, as well as the upcoming Aurora at Argonne National Laboratory near Chicago. The company says it has a staff of engineers working on code optimizations for HPC applications including LAMMPS (a molecular dynamics code), Gromacs (a package for simulating proteins, lipids, and nucleic acids), NAMD (another molecular dynamics code), AMBER, and others. Intel is also sharing tools, architecture knowledge, and software with partners to enhance COVID-19 applications and scale their performance on Intel-based hardware.
One specific area of focus for Intel is a collaboration with the NAMD team to release a version of the code that delivers faster simulations on Xeon processors supporting AVX-512. The company says the significant performance boost will allow researchers to reach longer timescales in simulating molecules relevant to COVID-19, by extension enabling them to better understand aspects of viral infection in "atom-level" detail. The update is expected to be made public for early use in June.
Hewlett Packard Enterprise
Some of Hewlett Packard Enterprise's (HPE) work is done through the Consortium, while the rest is focused on a variety of customers and partners. As a result of its acquisition of Cray in September 2019 for about $1.3 billion, HPE claims it now has more supercomputers and HPC systems in use at leading research facilities than any other vendor.
"High-performance computing is more powerful today than it has ever been, and its massive computing power — alongside other advanced capabilities — has significantly transformed drug discovery," said Peter Ungaro, former Cray CEO and head of HPE's HPC and mission-critical systems group, in a statement. "Supercomputing and HPC systems unlock greater potential for AI and machine learning applications, and when applied to 3D modeling and simulations, dramatically [accelerate] time-to-insight and [increase] scientific outcomes. Our work in the consortium provides researchers with HPC capabilities they wouldn't normally have access to independently, helping fast-track the discovery of a cure for the pandemic."
Drug design research
In partnership with Microsoft, HPE is working with a team at the University of Alabama in Huntsville (UAH) to provide its Sentinel supercomputer through the Azure cloud. With the supercomputer, alongside a team of dedicated HPE experts, it's supporting various stages of the drug design process at UAH.
The researchers are using a molecular docking approach, a type of bioinformatic modeling that simulates the interaction of two or more molecules to yield a stable combination. Drawing on a large, open set of natural products found in plants, animals, fungi, and the sea, Sentinel is performing calculations to determine how natural compounds interact with COVID-19's protein. Whereas 20,000 molecular dockings against a protein target used to take a full 24 hours, they can now be completed in seven or eight minutes. In all, the research team can perform as many as 1.2 million molecular dockings per day.
Elsewhere, HPE is supporting work at Lawrence Livermore National Laboratory with the Theta supercomputer, which is housed at the Argonne Leadership Computing Facility. The researchers' goal is to use AI to accelerate the process of simulating billions of molecules from a database of drug candidates. They've narrowed the pool of potential candidates from 10^40 down to a set of 20, and they've tapped Catalyst — an HPE-powered HPC cluster that generates predictions to compare against experimental and structural biology data — to improve outcomes and speed up discovery.
HPE is also collaborating with France's National Center for Scientific Research (CNRS) and GENCI to arm scientists at Sorbonne University in Paris with GENCI's Jean Zay supercomputer, which HPE designed. The team is using Jean Zay to optimize the Tinker-HP software, an approach to parallel computing enabled by multiple graphics cards and designed to simulate large biological molecules at the level of individual atoms. Tinker-HP simultaneously performs a variety of data-intensive calculations to produce 3D simulations of molecular interactions faster and at higher resolutions than would otherwise be possible.
Contributions from the private sector
The nature of the Consortium’s work isn’t strictly academic. Startups hope to use the group’s vast computational resources to develop treatments, molecular designs, and medicines targeting COVID-19.
Kolkata-based Novel Techsciences is identifying phytochemicals from the more than 3,000 medicinal plants and antiviral plant extracts in India that might act as natural drugs against COVID-19. The team also plans to isolate plant-derived compounds that could help tackle the multi-drug resistance that arises as the coronavirus mutates, with the goal of developing a comprehensive prophylactic treatment regimen.
In London, Y Combinator-backed PostEra is overseeing the Moonshot Project, which aims to produce inhibitors based on over 60 fragment hits (i.e., molecules validated to bind to a target protein, making them a chemical starting point for drug discovery) that were isolated in experiments to determine the molecular structure of COVID-19. By running machine learning algorithms in the background to triage suggestions and generate synthesis plans, PostEra has identified around 21 highly effective volunteer-submitted molecular designs, which will be synthesized by chemical company Enamine. The results of this project will be tested on animals within months.
If successful, PostEra's would be one of the first drugs developed in an open-source fashion. "[Machine learning] can reduce the time to determine optimal ways to make these compounds from weeks to days," the company said in a statement. "[We believe] the worldwide scientific community [can suggest] drug candidates that might bind to, and neutralize, [COVID-19]."
Another private sector project is led by London-based AI startup Kumano. This team's intention is to gain insights from diseases that are similar to COVID-19 — mainly other coronaviruses — to design an effective COVID-19 drug. The effort relies on a genetic algorithm that searches the chemical space surrounding existing antiviral drugs, plus a deep learning-based classification model built on existing binding data. The company is combining those tools with docking and molecular dynamics simulations to refine the results and yield machine learning models that can be used to score molecular designs for synthesis as antiviral compounds.
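The genetic-algorithm component of such a pipeline can be sketched in miniature. The bit-string "molecules" and fitness function below are toy stand-ins for real chemical representations and a learned scoring model:

```python
# Minimal genetic algorithm: mutate candidate "molecules" (toy
# bit-strings) and keep those a scoring function rates highest.
import random

random.seed(42)
NUM_FEATURES = 32

def fitness(candidate):
    """Stand-in scorer: count of 'favorable' features present."""
    return sum(candidate)

def mutate(candidate, rate=0.05):
    """Flip each feature bit with a small probability."""
    return [bit ^ (random.random() < rate) for bit in candidate]

def evolve(pop_size=50, generations=200):
    population = [[random.randint(0, 1) for _ in range(NUM_FEATURES)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]   # keep the best half
        population = survivors + [mutate(c) for c in survivors]
    return max(population, key=fitness)

best = evolve()
print("Best fitness:", fitness(best))
```

In a real pipeline, the candidates would be molecular graphs or SMILES strings, and `fitness` would be the deep learning binding-affinity model; docking and molecular dynamics then validate the top scorers.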
As for AI and drug development startup Innoplexus, it's also using the Consortium's supercomputers to accelerate the discovery of molecules that could lead to a drug to combat COVID-19. It expects to run permutations on five promising candidates — specifically, candidates that are potent, non-toxic, and can be manufactured.
Though much of the work remains in the early stages, momentum around the Consortium appears to be accelerating.
Last month, IBM announced that UK Research and Innovation (UKRI) and the Swiss National Supercomputing Centre (CSCS) will join the Consortium, making available machines that include the University of Edinburgh's ARCHER; the Science and Technology Facilities Council's DIRAC; the Biotechnology and Biological Sciences Research Council's Earlham Institute; and Piz Daint, the sixth-ranked supercomputer in the world, according to the Top500. The new additions have brought the total available petaflops up to 483, from 437 in May and 300 in mid-March.
"The COVID-19 HPC Consortium … is the largest public-private computing partnership ever created. What started as a series of phone calls […] five days later, more than twenty-four partners came on board, many of them typically rivals," said IBM's Turek. "Without any large contracts or anything of the type, this group [is coming] together in a way to both share resources and manage a process of expediting the scientific proposals that came into the consortia and match them to the best resources."