SRN Data Center Network Connectivity
This page contains information for system administrators and LNAs who have (or are supporting) equipment in the SRCF or RCF. The Stanford Research Network (SRN) at the core of the research data centers is a major leap forward in campus network architecture, one not yet present on the rest of campus. It has been multiple years in the making, and represents a multi-million dollar investment in both equipment and expertise, all towards the goal of keeping Stanford's research networks at the forefront of performance. As a user of this core network, it is worth knowing about some of the technologies in use in the SRN, and how they affect your uplinks and your traffic.
This page starts with a description of the overall network architecture, and how your top-of-rack switch will connect to it. Next, it describes the two options you have for connecting your equipment to the network. If you decide to provide your own network equipment, this page explains what bandwidth options are available, the hardware you need to provide, and how you need to configure your top-of-rack switch.
New Network Architecture: VXLAN and EVPN
The Stanford University Network (SUNet) has gone through multiple iterations through the years, with the goal of meeting the needs of Teaching and Research. These needs have resulted in situations—like having a VLAN span multiple buildings—that would not normally be seen in most corporate environments. In the more-recent past, this need was met with proprietary Cisco™ FabricPath™ technology. Unfortunately, this led to campus network problems, as the proprietary nature of FabricPath meant that issues were harder to diagnose and took longer to resolve.
By contrast, the SRCF and RCF core networks have only had one architecture, in which the SRCF and RCF core switch pairs existed as their own network, separate from campus. VLANs were created within the SRCF/RCF core, and routed by a pair of routers, also distributed between SRCF and RCF. If a VLAN needed to be stretched between the research data centers and campus, it was sent through a pair of 10-Gigabit connections to the MOA (School of Medicine) network area. Any networks that needed to be firewalled were routed by an on-campus firewall. The initial iteration saw only a few changes, chiefly a refresh of the SRCF/RCF firewalls that brought them on par with the campus network firewalls and allowed production research traffic to move off of campus.
Now, the Stanford Research Network (SRN) provides the SRCF (both SRCF1 and SRCF2) and RCF with the newest in redundant, load-balanced network connectivity. This is accomplished using three technologies:
- VXLAN (Virtual Extensible LAN) creates a dedicated VLAN space for the SRN. In the core network, "leaf switches" connect to the top-of-rack switch, and send & receive network traffic (Ethernet frames) on one or more VLANs. By wrapping all received network traffic inside UDP packets and attaching a VXLAN header, a VLAN's traffic is turned into routable IP traffic, and the number of available VLANs is extended. Instead of 4,096 VLANs (of which over 2,500 are in use on campus), the theoretical limit is approximately 16 million. With Network Engineering's conventions, the SRN has approximately 4,000 VLANs available to use.
- EVPN (Ethernet VPN) provides the resilience originally intended by FabricPath™. Once Ethernet frames have been 'promoted' to IP traffic, they are routed from the leaf switch to the destination (either another leaf switch serving the VLAN, or a router or firewall). Participating core network devices (core switches and routers) are connected to each other with point-to-point links, and use multiprotocol BGP (MP-BGP) to exchange information on what leaf switch (or router or firewall) has which MAC address on which VLAN. EVPN also includes loop-prevention measures, so that Spanning-Tree Protocol (STP)—and its associated delays—are not present in the core.
- EVPN Multihoming is what allows a top-of-rack switch to connect to multiple core switches, for redundancy. In the core, the ports connected to the top-of-rack switch are assigned an identical Ethernet Segment Identifier (ESI), which is communicated to the other core switches serving the VLAN. The top-of-rack switch and the core switches speak LACP to each other, and traffic going to a particular ESI (that is, a top-of-rack switch) will be routed to whichever leaf switch is available. When there are multiple leaf switches available to reach a top-of-rack switch (which is the case in normal operations), Equal-Cost Multi-Path (ECMP) routing distributes traffic across the available links, typically by hashing each flow so that its packets stay on one link.
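To make the VLAN-versus-VXLAN ID arithmetic concrete, here is a short Python sketch that computes the two ID spaces and builds the 8-byte VXLAN header defined in RFC 7348. The VNI value used is an arbitrary example, not a real SRN assignment.

```python
# Sketch of VXLAN's ID-space arithmetic and header layout (RFC 7348).
import struct

VLAN_ID_BITS = 12    # 802.1Q VLAN ID field
VXLAN_VNI_BITS = 24  # VXLAN Network Identifier (VNI)

assert 2 ** VLAN_ID_BITS == 4_096          # classic VLAN limit
assert 2 ** VXLAN_VNI_BITS == 16_777_216   # ~16 million VNIs

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: a flags byte of 0x08 (the
    VNI-valid bit), three reserved bytes, the 24-bit VNI, and one
    final reserved byte. This header, plus the original Ethernet
    frame, rides inside a UDP datagram (destination port 4789),
    which is what makes the VLAN's traffic routable IP traffic."""
    if not 0 <= vni < 2 ** VXLAN_VNI_BITS:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!B3xI", 0x08, vni << 8)

assert len(vxlan_header(5010)) == 8
```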
With those three terms defined, we can take the diagram from the top of this page and drill down into the "SRCF/RCF Core". It looks like this:
As soon as your network traffic leaves your top-of-rack switch and enters a leaf switch, it is 'wrapped' in a VXLAN packet and routed to its destination:
- Traffic going to another leaf switch in the same data center travels to a spine switch, then to the leaf switch, for unwrapping and delivery to the appropriate top-of-rack switch.
- Traffic going to a leaf switch in another research data center travels up to an SRN router, then to the other data center, for unwrapping and delivery to the appropriate top-of-rack switch.
- For VLANs that are being routed on campus, traffic travels to the SRN router, which unwraps it and sends it to an Interconnect Gateway (IGW). The IGW sends it to a corresponding IGW on campus, towards its destination.
- Traffic leaving your firewalled subnet travels to the SRN router, which unwraps it and sends it to the RC firewall. The traffic then goes back through an SRN router, towards its destination via an aggregation switch.
- Traffic leaving your unfirewalled subnet travels to the SRN router, which unwraps it and sends it towards its destination via an aggregation switch.
From this, we can see how measures have been taken to reduce the impact of a single piece of equipment failing. It is also clear that traffic is routed most efficiently when the SRN router (for unfirewalled networks) or the RC firewall (for firewalled networks) is responsible for 'owning' a particular subnet.
For traffic reaching the SRN routers, the next hop depends on the destination:
- Traffic bound for the campus side of a "stretched" VLAN is directed towards one of the two SRN IGWs, for forwarding to the campus side of the VLAN.
- Traffic bound for campus goes to the two crossbar routers on campus, for routing to the appropriate campus area.
- If the traffic is coming from a shady (NATed) network, and is going to the Internet, it goes to campus for NATing.
- If the traffic is going to a user on the VPN, it goes to campus and the VPN gateways.
- For all other traffic, it goes directly to the border routers, to one of Stanford's Internet Service Providers.
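The next-hop list above can be sketched as a simple decision function. This is an illustrative model only: the destination labels and return values below are made-up names for this sketch, not real router or device identifiers.

```python
# Hypothetical model of the SRN routers' next-hop choice; the string
# labels are invented for illustration, not actual device names.
def srn_next_hop(destination: str, source_is_nat: bool = False) -> str:
    if destination == "stretched-vlan-campus-side":
        return "srn-igw"          # forwarded to the campus side of the VLAN
    if destination == "campus":
        return "crossbar-router"  # routed to the appropriate campus area
    if destination == "internet" and source_is_nat:
        return "crossbar-router"  # NAT gateways live on campus
    if destination == "vpn-user":
        return "crossbar-router"  # VPN gateways live on campus
    return "border-router"        # everything else goes straight to an ISP

# Unfirewalled, non-NATed Internet traffic skips campus entirely:
assert srn_next_hop("internet") == "border-router"
```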
The aggregation switches connect the SRN routers to the crossbar routers on campus, and also to the ISP routers. With this, we can see how the network has been architected to send outgoing traffic to the Internet as quickly as possible. The only traffic sent back to campus is traffic which needs to go back to campus, either because it is VPN or NAT traffic (and the VPN & NAT gateways are on campus), or because the recipient of the traffic is on campus.
All network equipment is active-active, with the exception of the firewalls: Since the firewalls need to inspect connections (including through the entire TCP setup and teardown process), the firewalls are active-passive. However, the firewalls have a separate out-of-band link for state synchronization, so most IP connections remain up during a firewall failover.
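As a toy model of why connections survive a failover (the class, method, and flow names here are invented for illustration): the active unit tracks connection state and copies it to the passive unit over the state-sync link, so when the passive unit takes over, it already knows about established connections.

```python
# Toy model of active-passive firewalls with out-of-band state sync.
# All names here are invented for this sketch.
class Firewall:
    def __init__(self, name: str):
        self.name = name
        self.active = False
        self.sessions = {}  # tracked connection state

    def track(self, flow: tuple, state: str) -> None:
        self.sessions[flow] = state

    def sync_to(self, peer: "Firewall") -> None:
        # The out-of-band link copies session state to the passive unit.
        peer.sessions = dict(self.sessions)

fw_a, fw_b = Firewall("fw-a"), Firewall("fw-b")
fw_a.active = True
flow = ("10.0.0.1", "171.64.0.1", 443)   # illustrative flow tuple
fw_a.track(flow, "ESTABLISHED")
fw_a.sync_to(fw_b)

# Failover: the passive unit takes over already knowing the session,
# so the established connection is not dropped.
fw_a.active, fw_b.active = False, True
assert fw_b.sessions[flow] == "ESTABLISHED"
```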
All of these protocols and routing loads require appropriate hardware to handle them. Juniper QFX10002 switches are used for the spine and leaf switches, along with Juniper MX10003 routers, Palo Alto Networks 5250 firewalls, and Cisco 9336C aggregation switches. This architecture also required careful examination during the design & implementation phases, as well as training for ongoing operation and maintenance. University IT Network Engineering collaborated with Juniper Professional Services on the architecture and design, as well as reviewing the implementation and providing training on the technologies in use.
From this summary, we can see how the SRN architecture has leapfrogged the problematic proprietary FabricPath architecture to a modern, standards-based architecture.
Your Network Architecture
Even a single rack, with a single top-of-rack switch, has a network architecture. It does not need to be complicated, but you do need to think about how the network will work in your rack. The Client Network Architecture Guide exists to guide data center clients through both the switch requirements and network design. The rest of this page talks solely about switch requirements.
Connection Options: Rent a Switch, or Bring Your Own
When you get a rack at SRCF or RCF, you will be allocated two ports on the core network. You may either provide your own "top-of-rack" switch, or you may rent one from University IT.
The available rental options and their prices are described on the Net-To-Switch Rates page. All of the "Data center switches" options are available for you to rent. The standalone switches provide 1, 2.5, 5, and 10 Gigabit Ethernet over Category 6 twisted-pair cable. The Infrastructure switches provide 10 Gigabit Ethernet over fiber or DAC.
When you rent a switch from University IT, all of the work is handled for you. The only thing you have to do is buy network cables and plug them in. The LAN Engineering and Installation & Maintenance groups will take care of installing the switch, connecting it to the core, configuring it, and monitoring it. The monthly cost includes business-hours configuration, and 24x7 monitoring and replacement.
If you are interested in renting a switch, talk to your LNA to place an order. If you are looking for a higher-bandwidth switch (25, 40, or 100 GbE), talk to your LNA to see what non-public options are available.
If you decide to bring your own top-of-rack switch, read this page and the architecture guide completely. Your top-of-rack switch must be able to support the following functionality:
- LLDP for port information, including at least the Chassis ID, Port ID, Port Description, and System Name TLVs.
- LACP (802.3ad) port aggregation. Not static aggregation! LACP active mode is required, and support for LACP Fast is preferred.
- 802.1q VLAN tagging/trunking, even if you are only using a single VLAN.
- Fiber transceiver diagnostics, including reading the transmit & receive power (in dBm) for all four wavelengths.
Also, if your switch supports the following features, it must be possible to disable them:
- Forward Error Correction (FEC)
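On the transceiver-diagnostics requirement above: optical power is reported in dBm, a logarithmic scale where 0 dBm equals 1 mW. If you need to compare readings linearly, the conversion is straightforward; this is a general-purpose sketch, not tied to any particular switch's CLI.

```python
# Convert between dBm (logarithmic) and milliwatts (linear) for
# interpreting transceiver transmit/receive power readings.
import math

def dbm_to_mw(dbm: float) -> float:
    """Optical power: dBm to milliwatts, P(mW) = 10 ** (dBm / 10)."""
    return 10 ** (dbm / 10)

def mw_to_dbm(mw: float) -> float:
    """Milliwatts to dBm: 10 * log10(P / 1 mW)."""
    return 10 * math.log10(mw)

# A receive reading of -3 dBm is roughly half a milliwatt:
assert abs(dbm_to_mw(-3.0) - 0.501) < 0.001
assert mw_to_dbm(1.0) == 0.0   # 1 mW is 0 dBm by definition
```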
Your switch model also needs to support the appropriate optical transceiver module for your given bandwidth. Specific transceiver types are described in the next section.
If you decide to bring your own top-of-rack switch, it is your responsibility to ensure that your switch and optics meet the requirements (and 'non-requirements') above. Simple unmanaged switches will not work at SRCF. Some lower-end managed switches will also have problems, but others are fine.
Bandwidth and Fiber
If you choose to provide your own top-of-rack switch, you will need two ports on it to connect to the core switches, and you will need to provide two transceiver modules.
You will be provided with two 100 Gigabit ports:
- 100 Gigabit: Your switch will need two QSFP28 ports, and you will need to purchase two 100GBASE-LR4 QSFP28 modules.
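As background (these figures come from the IEEE 802.3 100GBASE-LR4 specification, not from this page): an LR4 module carries four wavelengths, each signaling at 25.78125 Gbit/s, which is where the "4" in the name and the four per-lane power readings come from. A quick sanity check on the arithmetic:

```python
# 100GBASE-LR4: four WDM lanes, each at 25.78125 Gbit/s. The per-lane
# rate exceeds 25 Gbit/s because of 64b/66b line-encoding overhead on
# top of the 100 Gbit/s payload rate.
LANES = 4
LANE_RATE_GBPS = 25.78125

aggregate = LANES * LANE_RATE_GBPS
assert aggregate == 103.125          # total line rate, Gbit/s
assert 100 * 66 / 64 == 103.125      # 64b/66b encoding accounts for it
```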
We suggest purchasing switches which support either 100- or 400-Gigabit uplinks, with the ability to run 100-Gigabit now, and 400-Gigabit later. This will prepare you for the future version of the Stanford Research Network. More details are in the Client Network Architecture Guide. As always, check your switch's documentation to ensure you are using supported transceivers. We cannot help with parts support issues.
What happened to 10 & 40 Gigabit?
As UIT Networking prepares for the next generation of the Stanford Research Network, the focus is moving from 10-Gigabit to 100-Gigabit and beyond. For that reason, slower-speed connections to the SRN are being deprecated, and new connections must be 100-Gigabit. You are free to continue using 10-, 25-, and 40-Gigabit connections inside your rack, but connections to the Stanford Research Network will need to be 100-Gigabit.
If you rent a switch from UIT Networking, and that switch's uplinks are not 100-Gigabit, UIT Networking will take responsibility for updating the switch, when the time comes.
Fiber
Two OS2 singlemode fiber-optic cables will need to be run from the core switches to your top-of-rack switch.
At SRCF, you purchase the fiber, and we will run and label it for free. At RCF, Hosting Services will supply the appropriate cable, run it, and label it for approximately $250 (charged to a PTA).
Next Steps
Assuming you have committed to getting rack space in one of our Research Data Centers, here are the next steps:
1. Decide how many switches you need, and whether you will rent from University IT or purchase your own.
2. If purchasing your own switches, pick something which meets the top-of-rack switch requirements, and order them. Remember to consult the Client Network Architecture Guide.
3. Using our guide as a reference, decide on a network architecture. That includes the number of VLANs, and the number and size of subnets. This is also where you decide if you want to stretch one or more VLANs to Campus.
4. For LNAs: When submitting a request for an un-firewalled network or VLAN, remember to ask that it be placed into the Research VLAN Area. Firewalled networks should go on the rc-srtr firewall. If the VLAN will need to be stretched back to campus, talk to us first.
5. Once you know your rack ID, send a fresh email to SRCC Support, asking for the allocation of two SRN leaf switch ports. In your request, let us know the Rack ID, the bandwidth, and the VLANs.
6. Create/Update NetDB Nodes, submit firewall rule requests, and then move in to your new rack space and network!
If you are not able to complete these steps yourself, and your School cannot help, we may be able to provide assistance. The Stanford Research Computing Center and the Technology Consulting Group have experience in network & cluster design and implementation. Assistance with Steps 1 through 3 will typically involve charges at the University IT Time and Materials rate. Assistance with Step 5 would normally be provided as part of an ongoing support contract.
Assistance with Step 4 is provided free of charge.