The standard practice for building Data Center Infrastructure in the old days
Although the popularity and adoption of public cloud have kept increasing in recent years, most enterprises still have demands and necessities to build and maintain their own on-premises data center infrastructure for specific reasons.
In present-day on-premises data center design, building the infrastructure with a primary data center and a secondary data center is still a common industry practice around the globe.
For years, the traditional Active/Standby network design, with the primary data center as the production site (PROD) and the secondary data center as the disaster recovery site (DR), has been one of the most common architectures in the industry, even though technologies now exist that fundamentally transform this traditional architecture. For example, Software-Defined Networking (SDN) introduced an entirely different network infrastructure design.
In a traditional Active/Standby network design, the DR site is usually a clone of the PROD site with a very similar or even identical equipment list and infrastructure architecture, except for a different IP address range. Usually, the PROD site and DR site are interconnected by a WAN link that separates the two data centers into discrete Layer-3 routing domains and Layer-2 switching domains. The network segments of the two data centers are usually linked by interior routing protocols such as OSPF, EIGRP, or IS-IS. Generally, scaling a Layer-2 switching domain across geographical locations may create potential network loops because of the behavior of the industry-standard Spanning Tree Protocol. Cisco has provided a very clear elaboration on this topic:
Spanning Tree Scalability
https://www.cisco.com/c/en/us/td/docs/solutions/Enterprise/Data_Center/DC_Infra2_5/DCInfra_5.html
Why is Data Center InterConnect (DCI) needed?
Throughout the recent decade, many stunning technologies and new applications, such as virtual machines, containers, and Docker, have been introduced, and the demand for extending or stretching individual Layer-2 switching domains keeps increasing because of requirements for application availability and resiliency. Some tech giants, such as Google and Alibaba, have even broadened the topic of availability into Site Reliability Engineering. For example, VMware has introduced various virtualization technologies such as vMotion in the ESXi and vCenter solution framework for seamless workload failover across hypervisors or even across geographical locations. The vMotion technology requires spanning a Layer-2 domain between hypervisors to achieve seamless workload mobility.
Another use case for stretching a Layer-2 switching domain across data centers is to ensure a smooth and seamless application and infrastructure migration during a data center relocation. This requires an extended Layer-2 switching domain that allows a common IP address range to be stretched. With these stretched segments, workloads can gracefully migrate across multiple maintenance windows without any IP address change on the server NICs, minimizing business impact because the corresponding security policy elements and network routing remain unchanged.
The term Data Center InterConnect refers to the technology that connects two or more data centers together over short, medium, or long distances using high-speed network connectivity.
Of course, having dedicated connectivity for each system that needs interconnection between DCs is optimal, but it would be costly. Instead, building an additional network layer between DCs for LAN extension between sub-systems is a more cost-effective solution. In the industry, this layer is called Data Center InterConnect. Throughout the decades, different techniques for Data Center InterConnect (DCI) have evolved, and this document will go through the different types of DCI in the market with brief elaborations.
The following illustrates the different options for Data Center InterConnect.
1st Generation Technique — Extension of VLAN (Before 2008)
1. VLAN Trunk Extension by Standard Layer-2 Switch
This is a low-cost solution and the easiest to implement. Normally, it requires deploying a pair of stackable switches in each DC, or a pair of switches that support Cisco Virtual Port Channel (vPC) or Multi-Chassis Link Aggregation (MLAG) from Arista, Huawei, or Juniper, with a pair of DWDM / Metro Ethernet links aggregated into an EtherChannel for a loop-free architecture.
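As a rough illustration, below is a minimal NX-OS-style sketch of one side of such a setup, assuming a vPC pair in DC-A, two DWDM links toward DC-B bundled into a port-channel, and VLANs 100-110 as the stretched segments; all names and numbers are illustrative only.

    ! DC-A vPC switch (NX-OS style), illustrative values only
    feature vpc
    feature lacp
    vpc domain 10
      peer-keepalive destination 192.168.0.2 source 192.168.0.1
      ! vPC peer-link configuration omitted for brevity
    !
    interface Ethernet1/1
      description DWDM link toward DC-B (repeat for the second member link)
      channel-group 20 mode active
    !
    interface port-channel20
      description DCI trunk toward DC-B
      switchport
      switchport mode trunk
      switchport trunk allowed vlan 100-110
      vpc 20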
However, the drawback of this solution is that sharing a common Layer-2 switching domain means sharing a common Spanning Tree domain. Therefore, only one Layer-2 switch across the two data centers can be elected as the Root Bridge of the Spanning Tree domain, and the election of a new Root Bridge may interrupt the entire Layer-2 switching domain. In addition, only one Layer-3 gateway device can claim the Active role of a particular HSRP group.
Another drawback is a scalability issue. When stretching the Layer-2 switching domain across three data centers, by the nature of the Spanning Tree Protocol, one of the paths, as shown in the following diagram, will be blocked. Therefore, traffic between the secondary data center and the tertiary data center will be switched through a sub-optimal path.
2. Eliminating overlapping VLAN IDs over DCI with Dot1Q Tunneling (QinQ)
Having duplicated VLAN IDs within a corporate network infrastructure may not be best practice. However, in certain situations, having the same VLAN ID is inevitable, since some of these network segments may be dedicated platforms that link the customer's on-premises network infrastructure with service providers' equipment, such as Bloomberg or Thomson Reuters.
In the following example, VLAN 100 from discrete segments in the DMZ and the internal zone would be mixed together in the common Layer-2 switching domain. This may violate certain regulations or standards for government-related or financial institutions.
To alleviate such a potential compliance issue, an additional VLAN tagging technique called 802.1Q tunneling can be implemented between the DCI WAN switches to further segregate the original VLAN ID with an outer VLAN ID. Before the dominance of Layer-2 VPN solutions like VPLS or EVPN, 802.1Q tunneling was a technique often used by Metro Ethernet providers as a Layer-2 VPN service offering. The technical basis of 802.1Q tunneling is very simple: the provider puts an outer tag (also called the 802.1Q metro tag) on all Ethernet frames to segregate traffic between different Layer-2 VPN customers.
When operating WAN connectivity with 802.1Q tunneling, the system MTU and system jumbo MTU values do not include the IEEE 802.1Q header. Because the IEEE 802.1Q tunneling feature increases the frame size by 4 bytes when the metro tag is added, all devices in the service-provider network must be configured to process maximum frames with an additional 4 bytes added to the system MTU size.
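Below is a hedged sketch of such a QinQ tunnel port on a Catalyst-style DCI WAN switch, assuming VLAN 200 as the outer (metro) tag and VLAN 100 as the customer-facing inner VLAN; MTU syntax and limits vary by platform.

    ! DCI WAN switch (Catalyst IOS style), illustrative values only
    system mtu 1504                      ! 4 extra bytes for the outer 802.1Q tag
    vlan 200
     name QINQ-OUTER-TAG
    !
    interface GigabitEthernet1/0/1
     description Tunnel port facing the inner VLAN 100 segment
     switchport access vlan 200          ! outer tag pushed on ingress
     switchport mode dot1q-tunnel
     l2protocol-tunnel stp               ! optionally tunnel Layer-2 control protocols
    !
    interface GigabitEthernet1/0/24
     description Trunk toward the remote DCI WAN switch
     switchport mode trunk
     switchport trunk allowed vlan 200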
However, as with ordinary Layer-2 switching domain stretching, the drawback of this approach is a scalability issue. Again, when stretching the Layer-2 switching domain across three data centers, by the nature of the Spanning Tree Protocol, one of the paths, as shown in the following diagram, will be blocked. Therefore, traffic between the secondary data center and the tertiary data center will be switched through a sub-optimal path.
3. Ethernet over MPLS (EoMPLS)
To recall the history of telecommunications and data communications, before the popularity of TCP/IP over Ethernet technology, other connectivity options such as T1/E1/V.35/ATM/SDH/Packet-over-SONET had been in the market for years, and these legacy connectivity options dominated WAN connections, especially cross-border long-haul connections.
Any Transport over MPLS (AToM) is a technique running on edge devices such as routers (Cisco 7200, ASR 1000, GSR 12000, etc.), sizeable switches (e.g. Cisco Catalyst 6500), or even edge CPE devices such as the RAD Communications Megaplex-4100, to transport any Layer-2 packets over an MPLS backbone regardless of the backhaul connectivity.
EoMPLS evolved from the AToM transport types. EoMPLS works by encapsulating Ethernet PDUs in MPLS packets and forwarding them across the MPLS network. Each PDU is transported as a single packet. Cisco introduced EoMPLS around 2001 as an initial framework for extending a Layer-2 switching domain, with EoMPLS acting as an overlay tunnel on top of the underlay network.
One can imagine EoMPLS as a virtual patch cord between sites. EoMPLS uses the concept of a “pseudowire” (“PWE”) to connect tunnel interfaces between sites. The usual CLI command on Cisco devices is “xconnect”. EoMPLS is a simple LAN extension over IP-routed networks; when it was initially invented, there was no multisite framework and no Layer-2 loop-prevention mechanism.
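A minimal IOS-style sketch of a port-mode pseudowire between two PE devices is shown below; the VC ID 100 and the loopback addresses are assumed values, and MPLS/LDP reachability between the two loopbacks is taken as already in place.

    ! PE-1 (classic IOS / IOS XE syntax), illustrative values only
    mpls label protocol ldp
    mpls ldp router-id Loopback0 force
    !
    interface Loopback0
     ip address 10.255.255.1 255.255.255.255
    !
    interface GigabitEthernet0/0/1
     description Attachment circuit toward the local DC LAN
     no ip address
     xconnect 10.255.255.2 100 encapsulation mpls   ! peer PE loopback and VC ID
    ! PE-2 mirrors this with: xconnect 10.255.255.1 100 encapsulation mpls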
New protocol standards were later introduced to improve on the limitations of EoMPLS in the Metro Ethernet industry. Virtual Private LAN Service (VPLS) is one of the standards that offers better resiliency and availability, but the use cases of VPLS are more in Metro Ethernet access networks than in DCI.
2nd Generation Technique — Introducing Network Overlay Concept (From 2008)
Throughout the last decade, server virtualization technology has matured, and application workload or application platform owners demand more robust, highly scalable, and elastic DCI solutions than the traditional DCI technologies introduced in the earlier part of this paper.
To cope with this market demand, networking giants such as Cisco, Juniper, and Huawei have each developed their own DCI standards and protocols.
1. Cisco Overlay Transport Virtualization (OTV)
Cisco OTV is a Cisco proprietary protocol that goes beyond traditional Layer-2 VLAN stretching by introducing new concepts called MAC-in-IP encapsulation and MAC routing. OTV extends Layer-2 connectivity across any transport network infrastructure, provided there is Layer-3 connectivity. OTV uses MAC-address-based routing and IP-encapsulated packet forwarding across a transport network to support applications that require Layer-2 adjacency, such as clusters and virtualization technologies like VMware vMotion or Linux KVM live migration.
OTV Support
OTV uses Ethernet over Generic Routing Encapsulation (GRE) and adds an OTV shim header to encode VLAN information. At the time of writing this whitepaper, the Cisco ASR 1000 series routers, the ISR4451 and ISR4461 routers, the Cisco CSR 1000v virtual router, and Nexus 7000 switches with M-series modules support the OTV feature. The OTV encapsulation adds 42 bytes. On some platforms, such as the Cisco ASR 1000 series, ISR4451, ISR4461, and CSR 1000v, OTV packets can be fragmented, so the underlay transport network does not need an extended MTU size. However, other platforms, such as the Nexus 7000 with M-series modules, cannot fragment OTV packets, so the underlay transport network must be configured with a larger MTU. The encapsulation of packets sent to the overlay is performed dynamically based on a Layer-2 destination address lookup. OTV can run on any Layer-3 underlay network regardless of the routing protocol in use.
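As a rough sketch, the NX-OS-style fragment below brings up a single OTV edge device in multicast mode; the site VLAN, site identifier, join interface, group addresses, and extended VLAN range are all assumed values for illustration.

    ! OTV edge device (NX-OS style), illustrative values only
    feature otv
    otv site-vlan 99                      ! internal VLAN used to discover the local site
    otv site-identifier 0x1               ! must be unique per data center
    !
    interface Ethernet1/1
      description Routed join interface toward the transport network
      ip address 10.1.1.1/30
      ip igmp version 3
      no shutdown
    !
    interface Overlay1
      otv join-interface Ethernet1/1
      otv control-group 239.1.1.1         ! multicast group used for adjacency
      otv data-group 232.1.1.0/28         ! SSM range for multicast data traffic
      otv extend-vlan 100-110             ! VLANs stretched across sites
      no shutdown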
The following document illustrates more details about Cisco OTV.
Benefits of OTV
- No effect on existing network design
By nature, OTV is a MAC-in-IP protocol, so it can be transported over any IP-based network, including MPLS, Metro Ethernet, and DWDM.
- Failure isolation and site independence
OTV has a built-in mechanism to isolate broadcast messages across OTV sites, so flooding caused by a network loop in one site is contained and does not spread to other OTV sites.
- Optimized operations
OTV nodes can establish adjacency through two modes: unicast mode or multicast mode. Unicast mode requires a static adjacency setting for each OTV neighbor on every OTV node. Cisco recommends multicast mode because of its flexibility and smaller overhead when communicating with multiple sites. If you are planning only two or three sites, unicast mode works just as well without losing any features or functions.
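For the unicast mode described above, a hedged NX-OS-style sketch follows, assuming 10.1.1.1 is the join-interface address of the edge device acting as adjacency server; all values are illustrative.

    ! OTV edge device acting as adjacency server (unicast mode), illustrative only
    interface Overlay1
      otv join-interface Ethernet1/1
      otv adjacency-server unicast-only
      otv extend-vlan 100-110
    !
    ! OTV edge device in another site pointing at the adjacency server
    interface Overlay1
      otv join-interface Ethernet1/1
      otv use-adjacency-server 10.1.1.1 unicast-only
      otv extend-vlan 100-110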
- Optimal bandwidth utilization, resiliency, and scalability
By nature, OTV is a MAC-in-IP protocol and uses the MAC-routing concept for DCI between sites. OTV can work with the Cisco Virtual Port Channel (vPC) technique, so both OTV nodes in a site can forward traffic to other sites. With support for Equal-Cost Multi-Path (ECMP) routing, OTV nodes utilize all available paths between DCI sites over the OTV tunnel.
OTV is an overlay technique that requires an underlay routing protocol to interconnect the tunnel interface IP addresses between sites. With the Bidirectional Forwarding Detection (BFD) technique, OTV can support sub-second failover by enabling BFD on the routed join interface used by the OTV overlay.
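As a hedged sketch of that idea, the fragment below registers the underlay OSPF process with BFD and enables BFD on the routed join interface; the process tag, timers, and addressing are assumptions, and platform defaults vary.

    ! Fast failure detection on the OTV join interface (NX-OS style), illustrative only
    feature bfd
    feature ospf
    !
    router ospf UNDERLAY
      bfd                                 ! run BFD on all OSPF-enabled interfaces
    !
    interface Ethernet1/1
      description OTV join interface toward the transport network
      ip address 10.1.1.1/30
      ip router ospf UNDERLAY area 0.0.0.0
      bfd interval 250 min_rx 250 multiplier 3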
- Scalability
OTV suppresses ARP flooding across sites through its MAC-routing technique.
For scalability, OTV can connect a maximum of 12 sites on the Nexus 7000 platform as of NX-OS version 8.4(2).
- Transparent migration path
Unlike Software-Defined Networking protocols, OTV is fundamentally designed for Data Center InterConnect, extending a Layer-2 domain across data centers. Therefore, OTV can be deployed in a traditional Layer-2 VLAN-based infrastructure seamlessly, without an entire architectural transformation to end-to-end overlay techniques.
In addition, the OTV overlay leverages the IS-IS routing protocol as its control plane and can use multicast for adjacency with OTV nodes in other sites. Therefore, insertion or removal of a site can be seamless to the other sites.
2. Huawei EVN and H3C EVI — Alternatives to Cisco OTV
Huawei and H3C have produced techniques similar to OTV, called Ethernet Virtual Network (EVN) and Ethernet Virtual Interconnect (EVI) respectively, yet there are some major differences in protocol design. However, the outcomes of these overlay techniques are actually very similar. Keep in mind that all of these overlay technologies are proprietary and cannot interoperate between equipment from different vendors.
A whitepaper from sdnlab.com, written in Simplified Chinese, provides a very good comparison between Cisco OTV, Huawei EVN, and H3C EVI.
3rd Generation Technique — Transforming the 3-Tier Network Design into an End-To-End DC Fabric with VXLAN (From 2014)
Wikipedia elaborates on VXLAN as follows:
“Virtual Extensible LAN (VXLAN) is a network virtualization technology that attempts to address the scalability problems associated with large cloud computing deployments. It uses a VLAN-like encapsulation technique to encapsulate OSI layer 2 Ethernet frames within layer 4 UDP datagrams, using 4789 as the default IANA-assigned destination UDP port number.[1] VXLAN endpoints, which terminate VXLAN tunnels and may be either virtual or physical switch ports, are known as VXLAN tunnel endpoints (VTEPs).[2][3]”
VXLAN introduces a new concept to overcome the limitation of 4K VLAN IDs by providing 16 million VXLAN Network Identifiers (VNIDs) through the 24-bit address space in the VXLAN header. The concept of VXLAN is similar to other overlay techniques such as OTV; VXLAN uses MAC-in-UDP packet encapsulation.
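To make the VNID concept concrete, the NX-OS-style fragment below maps an ordinary 12-bit VLAN into a 24-bit VXLAN segment and binds it to a VTEP interface; the VLAN, VNI, and loopback are assumed values, and BUM handling is added in the examples that follow.

    ! Map a local VLAN to a VXLAN Network Identifier (NX-OS style), illustrative only
    feature nv overlay
    feature vn-segment-vlan-based
    !
    vlan 100
      vn-segment 10100                    ! 24-bit VNI carried in the VXLAN header
    !
    interface nve1
      no shutdown
      source-interface loopback0          ! VTEP source address
      member vni 10100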
From Wikipedia
“The VXLAN specification was originally created by VMware, Arista Networks and Cisco.[5][6] Other backers of the VXLAN technology include Huawei,[7] Broadcom, Citrix, Pica8, Big Switch Networks, Cumulus Networks, Dell EMC, Ericsson, Mellanox,[8] FreeBSD,[9] OpenBSD,[10] Red Hat,[11] Joyent, and Juniper Networks.
VXLAN was officially documented by the IETF in RFC 7348. VXLAN uses the MAC-in-UDP packet encapsulation mode.”
VXLAN was not primarily designed as a solution for overlays between data centers. Instead, it was designed as a standard for transforming the data center from a siloed Layer-3 / Layer-2 architecture into a highly scalable spine/leaf architecture. In most use cases, VXLAN is the foundation infrastructure of Software-Defined Networking (SDN).
The VXLAN protocol itself does not govern a control plane for MAC address learning across the fabric. In a basic VXLAN setup, the VXLAN flood-and-learn technique is usually adopted for endpoint address learning.
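In flood-and-learn mode, broadcast, unknown unicast, and multicast (BUM) traffic is typically flooded to a per-VNI multicast group, and remote MAC addresses are learned from the data plane. A minimal sketch, reusing the VLAN/VNI mapping above and an assumed group address, looks like this:

    ! Flood-and-learn VXLAN (NX-OS style), illustrative values only
    interface nve1
      no shutdown
      source-interface loopback0
      member vni 10100
        mcast-group 239.1.1.100           ! BUM traffic flooded here; MACs learned from the data plane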
In a highly scalable VXLAN fabric design, VXLAN works with a protocol standard called Ethernet VPN (EVPN). According to Juniper, Ethernet VPN (EVPN) is a standards-based technology that provides virtual multipoint bridged connectivity between different Layer-2 domains over an IP or IP/MPLS backbone network. Like other VPN technologies, such as IP VPN and Virtual Private LAN Service (VPLS), EVPN instances are configured on provider edge (PE) routers to maintain logical service separation between customers. EVPN leverages Multiprotocol BGP (MP-BGP), and encapsulated traffic is forwarded between PE routers.
With MP-BGP EVPN, VXLAN can leverage the unique characteristic of EVPN that MAC address learning between PE routers occurs in the control plane. The local PE router detects a new MAC address from a CE device and then, using MP-BGP, advertises the address to all remote PE routers. This method differs from existing Layer-2 VPN solutions such as VPLS, which learn by flooding unknown unicast in the data plane. This control-plane MAC learning method is the key enabler of the many useful features that EVPN provides.
Without MP-BGP EVPN, VXLAN uses the concept of head-end replication, which requires the network administrator to manually configure fully meshed static VTEPs on every VXLAN leaf node. With MP-BGP EVPN, VTEP tunnel setup can leverage BGP and route reflector nodes to establish fully meshed VTEP tunnels dynamically. This simplifies the insertion or removal of VXLAN leaf nodes during maintenance work.
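The difference shows up directly in the VTEP configuration. In the hedged NX-OS-style sketch below, the first fragment statically lists every remote VTEP (head-end replication), while the second lets MP-BGP EVPN discover peers and advertise MAC routes in the control plane; every address, VNI, and tag is an assumed value.

    ! Option 1: static head-end replication, every remote VTEP configured by hand
    interface nve1
      source-interface loopback0
      member vni 10100
        ingress-replication protocol static
          peer-ip 10.2.2.2
          peer-ip 10.3.3.3
    !
    ! Option 2: MP-BGP EVPN control plane, peers and MAC routes learned dynamically
    nv overlay evpn
    interface nve1
      source-interface loopback0
      host-reachability protocol bgp
      member vni 10100
        ingress-replication protocol bgp
    !
    evpn
      vni 10100 l2
        rd auto
        route-target import auto
        route-target export auto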
How does VXLAN compare with Cisco OTV?
VXLAN is usually compared with vendor-proprietary DCI overlay protocols like Cisco OTV. Of course, VXLAN can serve the purpose of Data Center InterConnect. However, the network architect who designs the data center network must be aware that VXLAN by itself does not provide flood isolation; VXLAN has to work with MP-BGP EVPN for optimal MAC address / endpoint learning instead of flooding. In contrast, Cisco OTV has a unified protocol stack for the overlay plus a MAC-routing mechanism, with its built-in IS-IS routing as the control plane.