With the Shelter-in-Place order came the shutdown of many Stanford research labs on campus, disrupting the in-lab work many faculty and their research teams were conducting and limiting their access to critical tools and instruments.
As a result, many researchers shifted their energy into computational analysis, work they could do remotely with the help of Stanford Research Computing Center (SRCC) resources and support.
Since mid-March, SRCC has experienced a significant spike in requests from researchers to use shared and secure compute clusters, as well as to receive support from the technical and research data architects and consultants who make up the Research Computing team.
Specifically, SRCC has handled a more than 30 percent increase in support tickets, seen a six-fold increase in attendance at a virtual training session on shared compute clusters, and collaborated with more third parties on data transfers – and with record speed – than ever before.
Additionally, the SRCC team has supported the Stanford research community on 32 new COVID-19 research projects, half using the Sherlock shared compute cluster and half using Nero, a compute environment which enables research on High Risk and Protected Health Information (PHI) data.
“Many researchers now at home pivoted to doing analysis; those who are newer to computation have required more assistance and training from team members,” said Ruth Marinshaw, chief technology officer of Research Computing. “But even when life returns to ‘normal,’ we anticipate the demand for computation will remain because the game for many researchers has changed.”
Benefits of shared compute clusters
For Stanford researchers, there are numerous benefits to leveraging shared compute clusters, which can be thought of as a set of servers or computers all linked together. These clusters reduce stress for researchers by allowing them to focus on their science while leaving the IT administration of operating, maintaining, patching, and updating the servers to the SRCC team. Sherlock and Nero are the two shared compute clusters designed specifically for Stanford faculty and their research teams, and they come with ongoing support from the SRCC team, including onboarding, training, and weekly office hours.
“These pre-built compute environments make it easier for researchers to run large data analysis projects in a collaborative way, utilizing computing power and storage space that far exceed the limits of their individual laptops,” said Mark Piercy, technical liaison for Research Computing.
Together, Sherlock and Nero are supporting over 900 faculty members and their research teams – which includes more than 5,000 users – on countless projects. Among those are new projects focused on COVID-19 research that span a wide range of scientific areas.
“What’s cool about these projects is that they cut across disciplines and involve people from different parts of the research community, who are all coming together around the virus,” said Kilian Cavalotti, technical lead and architect for High Performance Computing.
Here’s a look at two of those projects.
When the pandemic first began to take hold in the United States, the Sherlock team quickly set aside servers on the cluster for COVID-19 specific research, and offered researchers these computational resources for free. Within a few weeks of notifying the research community of these dedicated resources, the Sherlock team received requests from principal investigators (PIs) for about 15 new COVID-19 research projects.
“When the magnitude of the pandemic started to become apparent, the first questions that came to mind were: How can we help? How can we contribute to the worldwide effort that is taking place to combat this disease?” said Cavalotti. “The one thing we had at our disposal was computing power, so it quickly became clear that we had to dedicate at least some of that power to COVID-19 research.”
Cavalotti added: “Making sure these researchers could access our dedicated resources without having to get in line with other computing tasks on Sherlock has been a tremendous help in giving their critical work the priority that the situation imposes.”
Researchers in the School of Medicine were among those who used the processing power of Sherlock for a COVID-19-specific study. They investigated how COVID-19 affects the immune system cells of severely affected patients by analyzing – on Sherlock – single-cell RNA sequencing of patient blood samples. This helped to provide a cell atlas of the peripheral immune response to severe COVID-19 cases. The study, led by Associate Professor Catherine Blish, was published last month in the Nature Medicine journal.
“Our lab relied on Sherlock’s dedicated reservations for COVID-19 research to analyze next-generation sequencing data that helped us better understand how the immune system responds to COVID-19,” said Aaron Wilk, a researcher on the project. “These resources enabled us to perform these analyses faster, and ultimately communicate our results to the public much quicker. In the midst of this urgent health crisis, this time has never been so valuable.”
Since the start of the outbreak, the Nero team has been busy responding to an increasing need to facilitate and automate data transfers with third-party vendors on behalf of researchers who need to work together in a secure environment.
“Collaborations with public health officials and outside agencies that normally would have taken months and even years, have taken just weeks,” said Valerie Meausoone, research data architect and consultant with Stanford Research Computing, who focuses on Nero. “Through this experience, we’ve navigated – at an accelerated pace – the challenges of automating data transfers with cloud platforms, solved problems and bug issues quickly, and created secure pipelines for data flows that can be reused and repurposed for future research projects.”
One such project that the Nero team has supported is the National Daily Health Survey for COVID-19, which was launched by Stanford Professor of Radiology Dr. Lawrence “Rusty” Hofmann to predict surges in the virus in the United States. The project’s goal is to learn and predict which geographical areas will be most impacted by the virus based on data that survey takers provide. The survey is open to everyone, and participants are encouraged to take it daily.
The SRCC team, and specifically the four staff who work on the Nero compute cluster, worked closely with Dr. Hofmann – and in record time – to facilitate data transfers of the survey responses, which contain high-risk data, between a third-party vendor and Stanford.
Specifically, the Nero team set up a Nero Google Cloud Platform (GCP) account for Dr. Hofmann, enabling his team to store and compute on high-risk data compliant with Stanford’s minimum security standards for handling such data.
In order to safely transmit the data between the third-party vendor’s Amazon Web Services (AWS) cloud storage solution and the Nero GCP solution, as well as to facilitate a data pipeline for daily updates on survey data, the Nero team developed a reusable workflow that performs the series of operations needed for this pipeline. The Nero team used Google Cloud Functions because they are event-triggered, serverless functions that allow data to be moved and processed as soon as it becomes available.
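The article does not publish the team’s actual workflow, but an event-triggered transfer step of this kind might look something like the following minimal Python sketch. The bucket names, object layout, and event shape here are illustrative assumptions, not Stanford’s real configuration; the sketch assumes the standard boto3 and google-cloud-storage client libraries.

```python
# Hypothetical sketch of one step in an AWS-to-GCP data pipeline:
# a Cloud Function that copies a newly available vendor export from
# S3 into a Google Cloud Storage bucket. All names are illustrative.

SOURCE_BUCKET = "vendor-survey-exports"   # hypothetical vendor S3 bucket
DEST_BUCKET = "nero-survey-responses"     # hypothetical Nero GCS bucket


def dest_object_name(source_key: str) -> str:
    """Map a vendor export key to its destination path in the GCS bucket,
    e.g. 'exports/responses-2020-05-01.csv' -> 'survey/responses-2020-05-01.csv'."""
    filename = source_key.rsplit("/", 1)[-1]
    return f"survey/{filename}"


def transfer_export(event, context):
    """Cloud Function entry point: fired when a new daily export is
    announced, it copies the object from S3 into the GCS bucket.
    SDK imports are kept inside the function so the pure path-mapping
    helper above has no cloud dependencies."""
    import boto3
    from google.cloud import storage

    key = event["key"]  # object key carried by the trigger (assumed shape)
    body = boto3.client("s3").get_object(Bucket=SOURCE_BUCKET, Key=key)["Body"].read()
    blob = storage.Client().bucket(DEST_BUCKET).blob(dest_object_name(key))
    blob.upload_from_string(body, content_type="text/csv")
```

Because the function only runs when triggered, new survey responses can flow from the vendor into the secure environment as soon as they appear, which matches the event-driven design the Nero team describes.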