Sunday, April 29, 2012

vSphere 5 Host Network Design - 10GbE vDS Design



This design represents the highest-performance, most redundant and also most costly option for a vSphere 5 environment. It is entirely feasible to lose three of the four uplink paths and still be running without interruption, and most likely with no performance impact either. When looking for a bulletproof and highly scalable configuration within the data centre, this would be a great way to go.

The physical switch configuration might be slightly confusing to look at without explanation. Essentially what we have here are four Nexus 2000 series fabric extenders that are uplinked into two Nexus 5000 series switches. The green uplink ports in the design show that each 2K has 40GbE of uplink capacity to the 5Ks. Layer 3 routing daughter cards are installed in the Nexus 5Ks, so traffic can be routed within the switched environment instead of going out to an external router. In other words, traffic from a host will travel up through a 2K, hit a 5K and then come back down where required. It isn't apparent from the design picture, but keep-alive traffic runs between the management ports of the two 5K switches.
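
As a minimal sketch of what that interconnect looks like in NX-OS (the vPC domain ID, IP addresses and port-channel number below are illustrative assumptions, not values from the diagram), the two 5Ks would run something along these lines:

    feature vpc
    !
    vpc domain 10
      ! keep-alive between the two 5Ks over their management ports
      peer-keepalive destination 10.0.0.2 source 10.0.0.1 vrf management
    !
    interface port-channel1
      switchport mode trunk
      vpc peer-link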

It is assumed that each host has four 10GbE NICs provided by 2 x PCIe dual port expansion cards. All NICs are assigned to a single virtual distributed switch and bandwidth control is performed using Load Based Teaming (LBT) in conjunction with Network IO Control (NIOC) and Storage IO Control (SIOC). A good writeup on how to configure NIOC shares can be found on the VMware Networking Blog, and whilst that information is specific to 2 x 10GbE uplinks it also holds true when using four 10GbE connections. LBT is a teaming policy that is only available when using a virtual Distributed Switch (vDS).
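
To illustrate how NIOC shares translate into bandwidth, suppose the shares are set to VM = 50, Storage = 50, vMotion = 25 and Management = 25 (purely illustrative values, not recommendations). Shares only matter when an uplink is saturated, at which point each traffic type is guaranteed its proportion of the total active shares: VM traffic on a contended 10GbE uplink would get 50/150 x 10Gb, or roughly 3.3Gb/s, and any traffic type that is idle simply frees its allocation for the others.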

LACP is not used as it would not be a good design choice for this configuration. There are very few implementations where LACP/Etherchannel would be valid; for a comprehensive writeup on the reasons why, please check out this blog post. A valid use case for LACP could be made when using the Nexus 1000V, as LBT is not available on that type of switch.

In order to gain the performance increase of Jumbo Frames for the storage layer, all networking components will need to have Jumbo Frames enabled. This requires end-to-end configuration from the hosts, through the network, to the storage arrays. There is definitely a performance increase from incorporating Jumbo Frames, and this is outlined in the following blog post. It is important to note that enabling Jumbo Frames on the single virtual switch will allow all traffic to transmit at an MTU of 9000. This means that Management, vMotion, FT and Storage will all use Jumbo Frames. VMs will not use Jumbo Frames unless this feature is enabled on the network adapter inside the guest OS of the VM.
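
As a sketch of the host side of that change (vmk2 is an assumed VMkernel name; the vDS MTU itself is raised under the switch properties in the vSphere Client), the VMkernel interfaces can be set with esxcli:

    # raise the MTU on the storage VMkernel interface (vmk2 is assumed)
    esxcli network ip interface set --interface-name=vmk2 --mtu=9000
    # confirm the new MTU is in place
    esxcli network ip interface list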

Trunking needs to be configured on all physical switch to ESXi host uplinks to allow all VLAN traffic, including Management, vMotion, FT, VM Networking and Storage. Trunking at the physical switch enables the definition of multiple allowed VLANs at the virtual switch layer. All VLANs used must be able to traverse all uplinks simultaneously.
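
A host-facing trunk port on one of the 2Ks might look something like the following sketch, where the interface and VLAN IDs are assumptions for illustration only:

    interface Ethernet100/1/1
      description ESXi host uplink
      switchport mode trunk
      switchport trunk allowed vlan 10,20,30,40,50
      spanning-tree port type edge trunk
      no shutdown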

When running Cisco equipment there is the potential to use the Rapid Spanning Tree Protocol (802.1w) standard. This means there is no requirement to configure trunk ports with Portfast or to disable STP, as the physical switches will automatically identify edge ports correctly. If running any other type of equipment, the safest option would probably be to disable STP or enable Portfast on each trunk port, but please refer to the switch manufacturer's manual for confirmation.

Running vCenter and the vCenter database on the same cluster that they manage creates a dangerous circular dependency. It is therefore strongly recommended to give the environment a management cluster dedicated to vCenter and other high-level management VMs, where the management cluster uses virtual standard switches (vSS). One alternative to a dedicated management cluster would be to run vCenter and its database as physical servers outside of vSphere.


*** Updates ***

05/05/2012 - Minor update to Jumbo Frames paragraph. Thanks to Eric Singer for his observations.

07/05/2012 - Moved diagram to top of article so that visitors wanting to reference design do not need to scroll down the article to view the diagram. Fixed IP address and VMkernel typos.


Thursday, April 26, 2012

vSphere 5 Host Network Design - 10GbE vSS Design



Going forward, my 10GbE designs will do more to answer questions around physical network setup and configuration. Therefore you will see more detail in the diagram than I normally give, especially in regard to how the physical switches are uplinked and interconnected. I had some network design input from Cisco engineers on this to ensure that redundancy and throughput are not compromised once vSphere traffic gets onto the physical switches.

There were many discussions around the use of Load Based Teaming and Etherchannel, neither of which is used in the following design. LBT is not used because the licensing level assumed here does not allow for it; LBT requires a virtual Distributed Switch, which is an Enterprise Plus feature. For more information on LBT please check out this link.
LACP is not used as it would not be good design practice. There are very few implementations where LACP/Etherchannel would be valid. For a comprehensive writeup on the reasons why, please check out this link.

The following design is based around a segmented 10GbE networking infrastructure where multiple physical switches are interconnected using high speed links. All traffic is separated using VLAN tagging for logical network segmentation.

It is assumed that each host has four 10GbE NICs provided by 2 x PCIe dual port expansion cards. All NICs are assigned to a single virtual standard switch and traffic segregation is performed by pinning each VMkernel port to a specific uplink using explicit active/standby failover orders. This is where design for 10GbE diverges from standard design in 1GbE configurations. Typically, a 1GbE setup would need at least two virtual switches, or three when using iSCSI storage: switch1 for Management and vMotion, switch2 for VMs and switch3 for storage.
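
A minimal sketch of that pinning with esxcli, assuming port group names, VLAN IDs and vmnic numbering that are not taken from the diagram:

    # tag the vMotion port group (VLAN 20 is an assumed value)
    esxcli network vswitch standard portgroup set -p "vMotion" --vlan-id=20
    # pin vMotion to vmnic1 and leave the remaining uplinks as standby for failover
    esxcli network vswitch standard portgroup policy failover set -p "vMotion" --active-uplinks=vmnic1 --standby-uplinks=vmnic0,vmnic2,vmnic3

The same pattern is repeated for each VMkernel port group, rotating which vmnic is active so that the traffic types land on different uplinks under normal operation.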

In order to gain the performance increase of Jumbo Frames for the storage layer, all networking components will need to have Jumbo Frames enabled end-to-end, from the hosts through the network to the storage arrays. There is definitely a performance increase from incorporating Jumbo Frames, and this is outlined in the following link. It is important to note that enabling Jumbo Frames on the single virtual switch will allow all traffic to transmit at an MTU of 9000. This means that Management, vMotion, FT and Storage will all use Jumbo Frames. VMs will not use Jumbo Frames unless this feature is enabled on the network adapter inside the guest OS of the VM.
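
A quick way to verify the end-to-end configuration is a non-fragmenting jumbo ping from the host to the array (the target address below is an assumption):

    # 8972 bytes of payload + 28 bytes of ICMP/IP headers = a 9000 byte frame; -d sets do-not-fragment
    vmkping -d -s 8972 192.168.50.10

If any device in the path is not configured for Jumbo Frames the ping will fail, which makes this a useful check before pointing storage traffic at the new network.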

Trunking needs to be configured on all uplinks, with the host-facing ports on the physical switch allowing all of the required VLANs through. Trunking at the physical switch enables the definition of multiple allowed VLANs at the virtual switch layer. It is important to note that the colours used in the diagram show how traffic will flow under normal circumstances; however, all VLANs need to be able to use all uplinks in the event of a NIC failure.

If you are running Cisco equipment then you might be able to use the Rapid Spanning Tree Protocol (802.1w) standard. This means that you do not need to configure trunk ports with Portfast, as the physical switches will automatically identify these ports correctly. If running any other type of equipment, the safest option would probably be to disable STP or enable Portfast on each trunk port, but please refer to your manufacturer's manual.
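
On a Catalyst-style IOS switch the equivalent host-facing configuration might look like this sketch (the interface and VLAN IDs are assumptions):

    interface TenGigabitEthernet1/0/1
     description ESXi host uplink
     switchport mode trunk
     switchport trunk allowed vlan 10,20,30,40,50
     spanning-tree portfast trunk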

Running vCenter and the vCenter database on the same hosts that it manages is not a problem in this design, so you do not need to run a management cluster, although I would say that it is usually a good design decision to do so. If you built this solution and then upgraded to Enterprise Plus and started using virtual distributed switches, a management cluster would be required.

This design is based around scenarios where Enterprise Plus licensing, and therefore network-based bandwidth limiting/control, is not available. Because SIOC and NIOC are not available in this design there is no way to guarantee bandwidth for particular traffic types. vMotion in vSphere 5 is quite happy to consume 8Gb of an uplink, and in situations where other traffic is running on that uplink, that traffic would be constricted by vMotion.


*** Updates ***

05/05/2012 - Minor update to Jumbo Frames paragraph. Thanks to Eric Singer for his observations.