The Stanford Research Computing Center (SRCC) is a joint effort of the Dean of Research and IT Services to build and support a comprehensive program to advance computational research at Stanford. That includes offering and supporting traditional high-performance computing (HPC) systems, as well as systems for high-throughput and data-intensive computing. The SRCC also helps researchers transition their analyses and models from the desktop to more capable and plentiful resources, providing the opportunity to explore their data and answer research questions (on-premises or in the cloud) at a scale typically not possible on desktops or departmental servers. Partnering with units like ICME as well as the NSF XSEDE program and select vendors, the SRCC offers training and learning opportunities around high-end computing tools and technologies.
In addition, we offer for-fee bulk research storage (the Oak platform - currently at 5 PB), and systems engineering, administration and support for faculty research systems, servers, and clusters on an annual contract basis.
Stanford Research Computing Resources
Need access to compute resources beyond your desktop to support your sponsored or departmental research? You may want to try out the Stanford Sherlock cluster.
Purchased and supported with seed funding from the Provost, Sherlock comprises 127 compute servers and associated storage. Those 127 servers are available to run researchers' computational codes and programs, with resources managed through a fair-share algorithm using SLURM as the resource manager/job scheduler.
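As a rough illustration of what fair-share scheduling means in practice, the sketch below uses the classic fair-share formula documented for Slurm's multifactor priority plugin, F = 2^(-(U/S)/d), where U is a group's normalized recent usage, S its normalized share, and d a dampening factor. How Sherlock is actually configured is not stated here, so the group names, usage figures, and the choice of formula are all assumptions made for illustration.

```python
def fair_share_factor(usage: float, shares: float, dampening: float = 1.0) -> float:
    """Classic fair-share factor: 2 ** (-(usage / shares) / dampening).

    usage  -- fraction of recent cluster usage consumed by the group (0..1)
    shares -- fraction of the cluster the group is entitled to (0..1]
    Returns a value in (0, 1]; higher means higher scheduling priority.
    """
    if shares <= 0:
        raise ValueError("shares must be positive")
    return 2.0 ** (-(usage / shares) / dampening)


# Hypothetical groups: (recent usage fraction, entitled share fraction).
groups = {
    "lab_a": (0.10, 0.25),  # used less than its share
    "lab_b": (0.25, 0.25),  # used exactly its share
    "lab_c": (0.50, 0.25),  # used twice its share
}

factors = {name: fair_share_factor(u, s) for name, (u, s) in groups.items()}

# Groups that have used less than their share are ranked first.
ranking = sorted(factors, key=factors.get, reverse=True)

for name in ranking:
    print(f"{name}: fair-share factor {factors[name]:.3f}")
```

The point of the sketch is only the direction of the effect: the more a group has recently consumed relative to its share, the lower its jobs rank in the queue.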
Faculty can also purchase additional dedicated resources to augment Sherlock by becoming Sherlock "owners". Choosing from a standard set of server configurations supported by the SRCC staff, owners' servers are "joined" to the base Sherlock cluster. Owners still have access to the base cluster through fair-share, as before, but they also have priority access to the resources they purchased, whenever they want. When an owner's servers aren't in use, other owners can use them, but non-owners cannot. The base Sherlock configuration of 125 servers was set up in June 2014. Since then, Sherlock has grown to 1,200 compute nodes, 21,200 CPU cores, 628 GPUs and 1,290 TFlops of computing power, used by 500 Principal Investigators and their 3,000 research team members.
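The access rules described above can be sketched as a small eligibility check. This is a minimal model, not the scheduler's actual configuration: the `Node` type, the `may_run` function, and the group names are hypothetical, and owner priority is simplified to "other owners may use a node only while its owner is not using it".

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Node:
    name: str
    owner: Optional[str] = None   # None marks a base (shared) node
    in_use_by_owner: bool = False


def may_run(group: str, owner_groups: Set[str], node: Node) -> bool:
    """Return True if `group` may run a job on `node` under the rules above."""
    if node.owner is None:
        return True                      # base nodes: everyone, via fair-share
    if group == node.owner:
        return True                      # owners always reach their own nodes
    if group in owner_groups:
        return not node.in_use_by_owner  # other owners: only when idle
    return False                         # non-owners never use owner nodes


# Hypothetical example data.
owners = {"lab_a", "lab_b"}
base = Node("base-001")
lab_a_idle = Node("own-a-001", owner="lab_a")
lab_a_busy = Node("own-a-002", owner="lab_a", in_use_by_owner=True)

print(may_run("lab_c", owners, base))        # non-owner on a base node
print(may_run("lab_b", owners, lab_a_idle))  # another owner on an idle owner node
print(may_run("lab_b", owners, lab_a_busy))  # another owner on a busy owner node
print(may_run("lab_c", owners, lab_a_idle))  # non-owner on an owner node
```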
You can learn more about Sherlock by visiting http://www.sherlock.stanford.edu .
But what else might you use, beyond Sherlock?
There are a variety of compute clusters run by Stanford schools and departments. For example, the Stanford Research Computing Center (SRCC) manages HPC clusters for the Stanford Center for Genomics and Personalized Medicine, the Army HPC Research Center, and the School of Humanities & Sciences, as well as for individual PIs or labs. If you are from one of those units, drop us a note at email@example.com and we can get you started. If you are from another school or group at Stanford and need help, we can suggest options and talk to you about our services.
The SRCC also offers access to a shared campus HPC resource, FarmShare. Open to anyone with a SUNet ID, FarmShare is intended to be a short-term, low-intensity computational resource for students, courses, and researchers who are just getting started with computing. FarmShare is the resource to use for classes or instruction; Sherlock is limited to research only. See farmshare.stanford.edu for the details on how to try it out.
To provide a home for your research compute servers and disk storage arrays, Stanford offers a modern, state-of-the-art data center, the Stanford Research Computing Facility (SRCF). A Stanford building located on SLAC’s land, the SRCF provides a highly efficient hub for the physical hosting of high density compute and storage equipment, along with systems administration and support services. The SRCF opened for production use in November 2013. For more information on the new facility see the section below.
In addition, Research Computing currently hosts and provides system administration services in a smaller, secure, centrally-managed data center in Forsythe Hall (RCF). As equipment in the RCF is life-cycled, replacement servers will be housed at the SRCF, returning the Forsythe space to non-research computing use.
Contact us at firstname.lastname@example.org if you would like to explore hosting your new equipment at the SRCF and/or if you want to know more about our services and offerings.
Many program announcements for grant proposals require you to provide a description of local compute capabilities and facilities. We can help you out! Until we get that information posted, drop us a note at email@example.com and we can provide the needed text, tailored to your specific proposal.
The Stanford Research Computing Facility (SRCF) provides the campus research community with data center facilities designed specifically to host high-performance computing equipment. Supplementing the renovated area of the Forsythe data center, the SRCF is intended to meet Stanford’s research computing needs for the coming years. A Stanford building located on the SLAC campus, the SRCF was completed in the fall of 2013, with production HPC services being offered as of December 2013. The facility and services therein are managed by the Stanford Research Computing Center (SRCC).
Space and Power: The SRCF has 3 megawatts of power and can host 150 racks. While this implies an average rack density of 20kW, the infrastructure can support higher-density compute racks with power consumption requirements from 20 to 100 kW each. Of the estimated 150 racks, 25 compute racks will be for SLAC, 50 for the School of Medicine, and 75 for Stanford’s non-formula schools.
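The figures above can be checked with a few lines of arithmetic. The sketch below assumes, for simplicity, that the full 3 megawatts is available to racks; the constant and function names are illustrative only.

```python
TOTAL_POWER_KW = 3_000   # 3 megawatts
TOTAL_RACKS = 150

# Average density if power were spread evenly across all racks.
average_kw_per_rack = TOTAL_POWER_KW / TOTAL_RACKS

# Planned rack allocation, as listed above.
allocation = {
    "SLAC": 25,
    "School of Medicine": 50,
    "Non-formula schools": 75,
}


def max_racks_at_density(kw_per_rack: float) -> int:
    """How many racks of a given density the total power budget could feed."""
    return int(TOTAL_POWER_KW // kw_per_rack)


print(f"Average density: {average_kw_per_rack:.0f} kW per rack")
print(f"Allocated racks: {sum(allocation.values())} of {TOTAL_RACKS}")
for density in (20, 50, 100):
    print(f"At {density} kW per rack: up to {max_racks_at_density(density)} racks")
```

Under that simplifying assumption, racks above 20 kW are balanced by lower-density racks elsewhere: the same power budget that feeds 150 racks at 20 kW feeds only 30 racks at 100 kW.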
The SRCF has a resilient but not redundant power infrastructure. The transmission-grade power delivered to SLAC and the SRCF is UPS- and generator-protected, providing significant assurance should there be a regional power outage.
Cooling: The building’s design is non-traditional and especially energy efficient. The facility is cooled with ambient-air fan systems for 90% of the year. For the hotter days and for equipment needing chilled water, high-efficiency air-cooled chillers are available.
Network Connectivity: The SRCF has multiple redundant 10 gigabit networks linking it to the campus backbone, the Internet, Internet2 and other national research networks. In the fall of 2014, 100 gigabit network connectivity was added between the SRCF and external networks. That bandwidth, coupled with the use of the OpenFlow communications protocol (developed at Stanford), provides considerable flexibility and capability in meeting the network transport needs of the research communities using the facility.
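To give a sense of what the move from 10 to 100 gigabit connectivity means for data transfers, the sketch below computes idealized transfer times. The 1 TB dataset and the `efficiency` parameter are assumptions for illustration; real throughput depends on protocol overhead, storage speed, and host tuning.

```python
def transfer_seconds(size_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Idealized time to move `size_bytes` over a link of `link_gbps` gigabits per second.

    `efficiency` (0..1] scales the nominal line rate to an achievable rate.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9 * efficiency)


ONE_TB = 1e12  # bytes (decimal terabyte), a hypothetical dataset size

for gbps in (10, 100):
    seconds = transfer_seconds(ONE_TB, gbps)
    print(f"1 TB over {gbps} Gb/s: about {seconds:.0f} s at full line rate")
```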
Three service models are supported at the SRCF.
- Hosting: a researcher purchases his/her own rack, PDUs and equipment and works with the SRCC to coordinate installation timing and access. The researcher is responsible for the management and system administration of the equipment. Equipment must be replaced with new equipment, or removed from the facility, before or when the equipment is 5 years old. Note that some schools, such as H&S, have purchased empty racks and PDUs on behalf of their faculty, recognizing that not all researchers will purchase entire racks of equipment at one time.
- Supported cluster: a researcher purchases his/her own rack, PDUs and equipment, and works with the SRCC to coordinate installation timing and access. The researcher pays the SRCC to provide system administration and support. Equipment must be replaced with new equipment, or removed from the facility, before or when the equipment is 5 years old.
- Shared cluster: The Provost provided the SRCC with capital funding to purchase computing equipment to encourage faculty to use the SRCF and the shared SRCC cluster model. This incentive represents access to additional HPC resources beyond those funded by grants and may greatly expand researchers’ computing capacity. The cluster purchased with those funds, Sherlock, is available for the use of any Stanford faculty member, and associated research teams, for his/her sponsored or departmental research. The base configuration of 125 servers is shared by all. Beyond using the base Sherlock platform, researchers can use their grant funds to add more servers and storage, choosing from a standard set of configurations. Purchased and managed by the SRCC, these PI-funded servers become part of the Sherlock cluster, but are not available to the entire user base. PIs who follow this model are referred to as "owners". Owners have access to the servers they purchased, and they can also use other owners' servers when those servers are idle. At the present time, system administration and support of all components of the Sherlock cluster, whether base servers or owners' servers, is funded by the Dean of Research and Provost. In the future, modest fees may be charged for system administration and support.
Note that the SRCF has been designed for hosting high-density racks. Toward this end, vendor pre-racked equipment is the preferred method for deployment. Hosting preference will be given to those researchers with high density, full racks of equipment, in order to make the best use of the resources.
SRCC Service and Facility Features
- Assistance in specifying equipment, negotiating pricing, coordinating purchases and planning deployment into the data center
- Technical specifications and boiler-plate facility descriptions for inclusion in proposals
- Secured 24x7 entry
- Monitored temperature and environmental control systems
- Fire detection and fire suppression
For more information, contact Ruth Marinshaw, firstname.lastname@example.org
The Stanford Research Computing Center (SRCC) partners with ICME and the NSF XSEDE project to offer a variety of training opportunities around HPC technologies, methods, and tools. Some of the areas covered in previous training are SAP HANA, CUDA, GPU basics, Python, Introduction to Stanford HPC resources, MPI, and Intel HPC Tools.
For more information, contact us at email@example.com .
100G Network Adapter Tuning
This article offers suggestions for taking a default CentOS 7.2 system to a tuned, 100G-enabled system. It currently covers both flavors of the Mellanox 100G NIC: the Mellanox ConnectX®-4 VPI, MCX455A-ECAT (1 port) or MCX456A-ECAT (2 port), and the Mellanox ConnectX®-4 EN, MCX415A-CCAT (1 port) or MCX416A-CCAT (2 port).
External High Performance Computing Resources
Campus resources are varied and are growing. They may well meet your needs. But what if you need more compute, need to run larger jobs, or need to manipulate more data than local systems can accommodate? The SRCC can help link you to national compute resources, such as the Open Science Grid and the NSF-funded XSEDE program, or help you decide whether cloud computing should be in your portfolio of computing platforms. Contact us at firstname.lastname@example.org and let's start the conversation.