Rick Mur

The Cloud Stitcher

Category: Home lab

Nested Virtualization

A typical Network Virtualization demo is difficult to build, as you need quite a few hypervisor hosts to run VMs on and to interconnect using overlays. I solve this with nested virtualization, which means running a hypervisor inside a VM on top of another hypervisor. This gives me the flexibility to scale my physical nodes, or “hypervisor underlay” if you will, easily, and it keeps the lab independent of them.

My physical cluster consists of 2 nodes running ESXi with vCenter. On top of that I’m running 4 other ESXi hosts divided into 2 “virtual” clusters, and 4 KVM hosts as Contrail Compute Nodes.

How does this work?

This technology works using Intel’s VT-x (hardware-assisted virtualization) and EPT (which virtualises memory address translation). This combination has been available since the “Nehalem” architecture (released in 2008). The technology has been ported to the more “desktop” oriented CPUs as well, so there is a good chance your notebook supports it too. Since the Haswell architecture nested virtualization works even better, as Intel now supports VMCS Shadowing. The VMCS is a data structure in memory that the CPU keeps per VM; with shadowing the hardware handles this structure for nested VMs as well, which used to be a software effort.
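On a Linux machine (for example a KVM host) both features show up as CPU flags, so a quick check is possible before building anything. The sketch below is a minimal example and assumes the kernel reports VT-x as `vmx` and EPT as `ept`, either on the `flags` line or, on newer kernels, on a separate `vmx flags` line. The sample text is made up; on a real host you would read `/proc/cpuinfo` instead.

```python
def virtualization_flags(cpuinfo_text):
    """Collect the flags listed on 'flags' and 'vmx flags' lines."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("flags", "vmx flags"):
            flags.update(value.split())
    return flags


def supports_nested_virtualization(cpuinfo_text):
    """True when both VT-x ('vmx') and EPT ('ept') are reported."""
    flags = virtualization_flags(cpuinfo_text)
    return "vmx" in flags and "ept" in flags


# Abbreviated, made-up sample (assumption); real output is much longer.
SAMPLE = """\
processor\t: 0
model name\t: Example CPU
flags\t\t: fpu vme sse2 vmx ept vpid
"""

if __name__ == "__main__":
    print(supports_nested_virtualization(SAMPLE))
```

This only tells you what the CPU advertises to the operating system; whether a hypervisor then exposes it to a guest is a separate setting.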

Memory is the biggest burden in these nested set-ups. CPU virtualization is always hardware assisted, so nested VMs feel almost the same as regular VMs. The problem is that ESXi itself requires 4GB of memory to install, and it needs around 2GB for itself to run properly. So get a server with at least 32GB, otherwise you will run out of memory very fast!

The depth of nested virtualization is technically unlimited, so you could go 2, 3 or 4 levels deep, although at some point the use case becomes very small of course. The same goes for 32-bit and 64-bit machines: as long as your physical CPU is a 64-bit one (32-bit CPUs do not support VT-x or EPT), you are able to run 32-bit and 64-bit guests inside a nested ESXi installation running on top of ESXi.

 

Set-up

I’m using ESXi as my main hypervisor as it has the best management tools currently available and most of my customers are using it. This gives me a stable foundation, so this set-up is based on ESXi. I’m using the latest version of ESXi as of this writing (5.5u2). To set up nested virtualisation properly you have to use Virtual Machine hardware version 9 or 10. This means you have to configure it using the vSphere Web Client, and therefore vCenter Server (Appliance) also has to be running in your lab. I would also recommend giving your virtual ESXi at least 2 vCPUs (there is no performance difference between allocating cores or sockets) and sufficient memory. Remember that ESXi will consume about 2GB for itself; the rest you can allocate to VMs. How much you need depends on the number of nested VMs you want to run: I gave 6GB of memory to my nested ESXi, as I’m only running 3-4 guest VMs on it.
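The memory arithmetic is simple but worth writing down when planning how many nested hosts fit on a physical box. This is a rough sketch only: the 2GB overhead is the approximate figure quoted above, the function names are made up for illustration, and it ignores memory overcommitment and per-VM overhead.

```python
ESXI_OVERHEAD_GB = 2  # approximate figure from the text, not a measured value


def guest_memory_gb(nested_host_gb, overhead_gb=ESXI_OVERHEAD_GB):
    """Memory left for guest VMs on one nested ESXi host."""
    return max(nested_host_gb - overhead_gb, 0)


def physical_memory_needed_gb(nested_hosts_gb, overhead_gb=ESXI_OVERHEAD_GB):
    """Physical ESXi overhead plus the memory given to every nested host."""
    return overhead_gb + sum(nested_hosts_gb)


if __name__ == "__main__":
    # One nested ESXi with 6GB leaves about 4GB for its guests.
    print(guest_memory_gb(6))
    # Four such nested hosts on one physical ESXi host.
    print(physical_memory_needed_gb([6, 6, 6, 6]))
```

With 6GB per nested host, four of them already claim most of a 32GB server before any other VM is started.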

CPU

The most important setting is when you open up the “CPU” panel in VM settings. There you have to enable “Expose hardware assisted virtualisation to the guest OS”. This will enable the exposure of the Intel VT-x feature towards the VM and will enable ESXi to recognize a virtualization capable host. If this checkbox is greyed out, it means your CPU is not suitable for nested virtualization.
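To my knowledge this checkbox ends up in the VM’s .vmx file as `vhv.enable = "TRUE"`. If you want to verify a batch of nested hosts without clicking through the Web Client, a small parser is enough. This is a sketch: the sample .vmx content below is illustrative and far shorter than a real file, and the function names are my own.

```python
def parse_vmx(text):
    """Parse 'key = "value"' lines of a .vmx file into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip().strip('"')
    return settings


def hardware_virtualization_exposed(vmx_text):
    """True when the VM is configured to expose VT-x to the guest OS."""
    value = parse_vmx(vmx_text).get("vhv.enable", "FALSE")
    return value.upper() == "TRUE"


# Illustrative sample (assumption), not a complete .vmx file.
SAMPLE_VMX = """\
virtualHW.version = "10"
numvcpus = "2"
memSize = "6144"
vhv.enable = "TRUE"
"""

if __name__ == "__main__":
    print(hardware_virtualization_exposed(SAMPLE_VMX))
```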

(The same steps are required to install nested KVM inside ESXi, Hyper-V may require some additional tweaking)


Networking

After installing ESXi inside the VM you are almost up and running! First of all you probably want to allow VLAN tags to pass towards the nested ESXi and to come back from the vSwitch running inside it. This requires a port group that is set for VLAN trunking. On my physical cluster I’m using a Standard vSwitch, not a Distributed vSwitch (DVS), because vCenter Server has to be up and running to be able to power on a VM on a DVS. As I will be experimenting with networking features on the nested ESXi installations, I can’t rely on vCenter being up all the time. To create a VLAN trunking capable port group, you have to create it on each physical host separately and set the VLAN ID to “all” or “4095”.


This will make sure that my network inside the nested ESXi installation is VLAN tagging capable!
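Why “4095”? On a Standard vSwitch port group the VLAN ID field has three meanings, and 4095 is the special value that passes tags through untouched. The helper below just spells that out; the function name and the wording of the return values are mine.

```python
def port_group_vlan_mode(vlan_id):
    """Describe how a Standard vSwitch port group treats a given VLAN ID."""
    if vlan_id == 0:
        return "none: frames are passed untagged"
    if 1 <= vlan_id <= 4094:
        return "tagged by the vSwitch: the VM sees untagged frames"
    if vlan_id == 4095:
        return "trunk: all VLAN tags are passed through to the VM"
    raise ValueError(f"invalid VLAN ID: {vlan_id}")


if __name__ == "__main__":
    for vlan_id in (0, 100, 4095):
        print(vlan_id, "->", port_group_vlan_mode(vlan_id))
```

For a nested ESXi host you want the trunk case, so that the vSwitch inside the nested host can do its own tagging.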

Before this will work properly, we need to change one more setting on the ESXi installations of the physical cluster. ESXi by default considers itself to be the master of the network (at least inside the ESXi environment). Because ESXi generates the MAC addresses that the VMs use, it doesn’t allow any other MAC addresses to enter the vSwitch from a virtual interface. This is a great security feature and definitely helps a lot in preventing all kinds of network attacks launched from inside a VM. Because we will have a whole new ESXi installation that generates its own MAC addresses, we do want to allow unknown MAC addresses on the virtual interfaces connecting to the nested ESXi VMs. To enable this you have to allow “Promiscuous mode” on the vSwitch or on the port groups you applied to the ESXi VM. I enabled it on the vSwitch itself, so I don’t have to worry about it when I create a new port group. Remember to change this setting on all of your physical hosts when using a Standard vSwitch. Make sure to set “Promiscuous mode” to “accept” and check that “Forged transmits” and “MAC address changes” are also set to “accept”, which is the default.


Now Promiscuous mode does impact performance on your ESXi server. ESXi has no concept of MAC learning like a regular Ethernet Switch. This means that when we disable the security control of MAC addresses being assigned to the virtual interfaces (promiscuous mode) we will flood all traffic to all of the ports inside the vSwitch. With regular feature testing you won’t generate much data traffic, but when you do get to a few Mbps of traffic, this could impact CPU performance quite a lot. There is a plug-in that will enable MAC learning on interfaces that you configure it for. This does require a Distributed Virtual Switch to be running in the physical cluster. You can find it, together with implementation details, here: https://labs.vmware.com/flings/esxi-mac-learning-dvfilter
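The difference between flooding and MAC learning can be illustrated with a toy model that counts how many frame copies a switch has to deliver. This is not how ESXi or the fling is implemented; it is a simplified sketch in which a frame is a hypothetical `(in_port, src_mac, dst_mac)` tuple and every port counts as promiscuous.

```python
def deliveries_flooding(frames, ports):
    """Every frame is copied to every port except the one it arrived on."""
    return sum(len(ports) - 1 for _ in frames)


def deliveries_learning(frames, ports):
    """Learn source MACs per port; flood only while the destination is unknown."""
    table = {}
    delivered = 0
    for in_port, src, dst in frames:
        table[src] = in_port
        if dst in table:
            # Known destination: one copy, unless it lives on the ingress port.
            delivered += 0 if table[dst] == in_port else 1
        else:
            delivered += len(ports) - 1
    return delivered


if __name__ == "__main__":
    ports = [1, 2, 3, 4]
    # Two hosts behind ports 1 and 2 exchanging four frames.
    frames = [(1, "aa", "bb"), (2, "bb", "aa"), (1, "aa", "bb"), (2, "bb", "aa")]
    print(deliveries_flooding(frames, ports))  # 12 copies
    print(deliveries_learning(frames, ports))  # 6 copies
```

Only the first frame is flooded in the learning case; after that each frame reaches exactly one port. The gap grows with the number of ports and frames, which is where the CPU cost of promiscuous mode comes from.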

I would not recommend using it in tests with a lot of NFV appliances (virtual firewalls, virtual routers, etc.), as the entries learnt through this filter do not age out. This means that when your nested guest VMs move around a lot, their MAC addresses get registered on each port and the performance improvement goes away. If you have a stable environment then this MAC learning feature will be beneficial. More details are found in this excellent blog: http://www.virtuallyghetto.com/2014/08/new-vmware-fling-to-improve-networkcpu-performance-when-using-promiscuous-mode-for-nested-esxi.html

VMtools

The final optimization when running ESXi as a VM is to install VMware Tools. This makes sure that a graceful shutdown can be done and makes your life a bit easier when it comes to IP addressing and visibility inside vCenter. The default VMware Tools do not support ESXi; you need to install the version found here: https://labs.vmware.com/flings/vmware-tools-for-nested-esxi

Summary

When everything is installed I highly recommend creating a separate cluster for these nested hypervisors and keeping your physical cluster clean. Now you are ready to deploy VMs just as you are used to, but inside a safe environment where you can start breaking things!

 

Happy Labbing!

Home Lab Server

Currently I’m doing a lot of testing at home on Network Virtualization solutions, like VMware NSX, Juniper Contrail, etc., and I was stressing my single home server quite a lot. That server is a custom-built Xeon E3-1230 quad core with 32GB of RAM and a 128GB SSD, built according to the specifications found at: http://packetpushers.net/vmware-vcdx-lab-the-hardware/ . This has been a great investment, as I’m running nested virtualization for both KVM and ESXi hypervisors and do my testing in there. A decent Network Virtualization (NV) set-up needs quite some memory, especially if you look at the memory utilisation of the NV Controller VMs, so I had to expand my lab. I chose to extend it with an additional server so I would be physically redundant as well, making it easier to run upgrades on the physical machines.

Requirements

My requirements aren’t difficult: as I mainly perform feature testing in my lab, I don’t need a lot of CPU performance. There are no “Production” VMs running; everything is there to play around with, so downtime is not a problem if necessary.
Other requirements:

  • Average CPU performance
  • Nested virtualization support
  • At least 32GB of RAM, preferably (or upgradable to) 64GB
  • 4 or more SATA3 connections (to grow to a VSA set-up)
  • 2 or more 1Gbps Ethernet NICs
  • Out of band management (IPMI)
  • Low power
  • Small footprint
  • Low noise

Especially the last 2 requirements are important to me. I run the lab on a shelf in a large closet, so I barely want to hear fans and I want to keep the footprint small to make sure I can expand the lab further, without having to sacrifice another shelf.

Bill of materials

The bill of materials was as follows. I will explain the reasoning behind each component in detail. You can click the SKU to purchase this item on Amazon.

| Component | Description | SKU |
| --- | --- | --- |
| Motherboard | ASRock | C2750D4I |
| Memory (4x) | Kingston DDR3 1600MHz 8GB non-ECC | KVR16N11H/8 |
| Storage | Samsung 850 Pro SSD (128GB) | MZ-7KE128BW |
| Case | LC Power | LC-1410mi |
| Fan | Noctua | NF-R8 PWM |

I ordered this bill of materials at the Dutch webshop Azerty.nl, take a look at the screenshot below to find the exact part numbers.

[Screenshot: HomeServerKit]

Processor

I was looking at various options for a good home lab CPU. I first looked at an Apple Mac Mini: it has a powerful processor, low power usage and a small footprint, but the 16GB memory limit of the system ruled it out for me. The same goes for the Intel NUC boards. I continued the search for a decent multi-core mini-ITX motherboard that could hold a lot of memory. Going for a Xeon seemed to be the only way to get the option of more than 32GB, until I found the Intel Atom Avoton chip. This next generation of Intel Atom processors is a very interesting one for home lab servers. You will find that the latest generation of Synology NAS systems also runs on this processor. The chip comes in 4-core and 8-core versions which, when looking at single-core benchmarks, are not the fastest ever, but the multi-core performance makes up for a lot! Especially highly virtualised environments make very good use of the multi-core architecture. I looked at various benchmark tests (http://www.servethehome.com/Server-detail/intel-atom-c2750-8-core-avoton-rangeley-benchmarks-fast-power/) and found that this CPU would give me more than enough performance for the tests that I’m running in my home lab, while still being very quiet and low power. The performance averages about half of what my existing set-up with a Xeon E3-1200 V3 gives me.
Feature-wise this CPU gives you everything you want for a virtualization lab:

  • 64-bit CPU
  • Supports VT
  • Supports nested virtualization

The next best thing was that the CPU only comes soldered to the motherboard and can be passively cooled! Which brings us to the next topic regarding the motherboard.

Motherboard

There are 2 good options for a mini-ITX motherboard that features the Intel Avoton C2750. I only looked at the 8-core model, which is quite a bit more expensive, but will give you double the CPU power (especially in VM environments). There is also a C2758 model available, which does not feature Turbo Boost. I thought Turbo Boost would be beneficial in my lab, as I need as much performance as I can get. The C2758 does feature QuickAssist, which is used for integrating accelerators as used on embedded systems (like a NAS).

I narrowed my choice down to 2 mini-ITX motherboards I found that also feature all my other requirements for networking and out of band management.

Supermicro A1SAi-2750F (http://www.supermicro.co.uk/products/motherboard/Atom/X10/A1SAi-2750F.cfm)
+ This board features a passively cooled C2750 (some have a fan on board)
+ 4x 1GE LAN on-board
- Uses SO-DIMM modules (notebook memory)
- Marvell NIC, requiring an additional driver in ESXi

I’m usually a fan of Supermicro as all my previous home servers had a Supermicro motherboard. Their support for ESXi is excellent and they have a decent price. The big downside of this board is the use of SO-DIMM modules, which are more expensive than the regular DDR3 DIMM’s.

ASRock C2750D4I (http://www.asrockrack.com/general/productdetail.asp?Model=C2750D4I)
+ Passively cooled C2750
+ Regular DDR3 DIMMs
+ A ton of SATA ports to have the option to build a storage heavy server in the future
+ Intel NICs that have their driver built-in to ESXi 5.5 update 2
- Only 2x 1GE LAN on-board

I chose the ASRock based on the many benefits this board has over the Supermicro. It was cheaper, supported cheaper DIMMs and didn’t require an additional driver installed in ESXi 5.5 update 2, making upgrades easier. The many SATA ports on the system make it an excellent board to grow to a VSA appliance in the future when required.

NOTE! Even though the CPU is passively cooled, the board requires you to connect a fan to CPU_FAN1 otherwise the board will not power up.


Memory

I first tried to find 16GB DIMMs that were affordable. Unfortunately a single 16GB DIMM currently costs as much as 4x 8GB DIMMs. I therefore went back to 4x 8GB DIMMs, for a total of 32GB of memory in my next server.
I cannot stress enough that you should purchase from the Memory QVL that your motherboard supplier publishes. Any other DIMM may work, may be unstable or may not work at all. Fortunately a couple of modules from Kingston’s affordable memory line were tested by ASRock, so I didn’t look further and got those. The server has been rock solid on these Kingston DIMMs, running for weeks already.

Current Memory QVL for the ASRock C2750D4I

http://www.asrockrack.com/general/productdetail.asp?Model=C2750D4I#Memory%20QVL

I omitted the 4GB DIMMs from the table below as I need at least 32GB of RAM in the server.

| Type | Speed | DIMM | Size | Vendor | Module |
| --- | --- | --- | --- | --- | --- |
| DDR3 | 1600 | non-ECC | 8GB | ADATA | AD3U1600C8G11-B |
| DDR3 | 1600 | ECC | 8GB | Apacer | 78.C1GER.AT30C |
| DDR3 | 1600 | ECC | 8GB | Crucial | CT102472BD160B.18FED |
| DDR3 | 1600 | ECC | 8GB | innodisk | M3C0-8GSS3LPC |
| DDR3 | 1600 | non-ECC | 8GB | Kingston | KVR16N11H/8 |
| DDR3 | 1333 | ECC | 16GB | Memphis | IMM2G72D3DUD8AG-B15E |
| DDR3 | 1333 | non-ECC | 16GB | Memphis | IMM2G64D3DUD8AG-B15E |
| DDR3 | 1333 | ECC | 16GB | Memphis | IMM2G72D3LDUD8AG-B15E |

If you would like to purchase the 16GB DIMMs mentioned on this Memory QVL please contact ASRock Sales  ([email protected]) for a quote. They sell and ship these Memphis DIMMs worldwide.


Storage

I’ve been happily using a Synology DS713+ 2-bay NAS with 2 Western Digital RED 3TB disks for over a year. It is my primary source of shared storage for everything, including all of the VMs. Therefore I don’t need a ton of storage in my server. I may want to play around with VMware VSAN or other VSA options, but for now I’ll keep everything stored on NFS shares on my Synology DS713+. The disks in the NAS are mirrored, meaning I only get the IOPS of a single disk. With 20-30 VMs running on this NAS, I notice that performance is going down. Therefore I chose to put a small 128GB SSD in both of my servers and use VMware Flash Read Cache. This technology lives inside ESXi and caches all the disk reads performed by a VM (when Flash Read Cache is enabled for it). ESXi will also use the SSD for the swap file instead of a file located in the folder where the VM is stored. This enhances performance a lot in my lab, as my VMs are not storage heavy and usually consist of OS system files and some database files. Once these have been read from the NAS for the first time they are stored on my server’s SSD, and especially Windows and Linux VMs benefit a lot from this!
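The effect of a read cache on a slow NAS is easy to see in a toy model. The sketch below uses a plain least-recently-used policy purely for illustration; I am not claiming this is the algorithm Flash Read Cache uses internally. The class name, the block IDs and the `backend` callable standing in for the NAS are all made up.

```python
from collections import OrderedDict


class ReadCache:
    """Tiny LRU read cache in front of a slow backend (illustrative only)."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_id, backend):
        if block_id in self.blocks:
            # Served from the SSD: mark as most recently used.
            self.blocks.move_to_end(block_id)
            self.hits += 1
            return self.blocks[block_id]
        # First read goes to the NAS and is then kept on the SSD.
        self.misses += 1
        data = backend(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data


if __name__ == "__main__":
    cache = ReadCache(capacity_blocks=2)
    for block in [1, 2, 1, 3, 2]:
        cache.read(block, backend=lambda b: f"data-{b}")
    print(cache.hits, cache.misses)
```

VMs that keep rereading the same OS and database files get a high hit rate, which is why a small SSD goes a long way in a lab like this.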

In the screenshots below you can see my current usage of the Flash Read Cache.

[Screenshots: FlashCache1, FlashCache2]

Case and cooling

The case I chose is not a very exciting one. It’s much bigger than required for a mini-ITX board, but I wanted a low profile case that would fit on top of my existing server and had a decent power supply built in (for cost savings). The case I chose ended up being a great one, as it features a fan that sucks air in to the chassis through the power supply.

As mentioned before the CPU is passively cooled, but as it’s being heavily used by ESXi running 10 or so VMs at any time, I needed additional cooling. In my existing server I’m using Noctua fans as they are amazingly quiet and perform very well! I chose a simple 80mm fan (the Noctua NF-R8 PWM from the bill of materials) and mounted it right above the CPU heat sink to suck air out of the chassis again. This creates a great airflow through the case and my temperatures are very low for a server that is constantly using quite some CPU resources, while still being almost completely silent.


ESXi and vMotion

As the internal SSD is used only for caching I installed ESXi 5.5 update 2 on a USB key. As no other storage is required to run ESXi, the USB key works fine, as long as logging can be exported to an external datastore (NFS for example).
When I added the new server to the existing ESXi cluster there was of course a big gap in CPU generation and features, as the other server is a Xeon. VMware has a great feature called “EVC”. It limits the CPU features a VM can use so that the VM is compatible with all the different generations of CPUs in your cluster, while you can still use Live/Hot VM Migration (vMotion). The Avoton CPU has features that are equal to the “Intel Westmere” generation of CPUs. This means that changing the EVC setting to “Westmere” enabled live vMotion for all the VMs in my “Physical” cluster.
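The idea behind EVC can be sketched as a set intersection: a VM can only move freely if it uses nothing beyond what every host in the cluster offers. In reality EVC modes are predefined baselines that you pick from a list, not a computed intersection, and the feature names below are simplified examples rather than the full feature lists of these CPUs.

```python
def common_baseline(hosts):
    """Features that every host in the cluster supports."""
    feature_sets = list(hosts.values())
    return set.intersection(*feature_sets) if feature_sets else set()


def can_migrate(vm_features, host_features):
    """A VM can land on a host only if the host offers everything it uses."""
    return vm_features <= host_features


# Simplified, illustrative feature sets (assumption).
HOSTS = {
    "xeon": {"sse4.2", "aes", "avx", "avx2"},
    "avoton": {"sse4.2", "aes"},
}

if __name__ == "__main__":
    baseline = common_baseline(HOSTS)
    print(sorted(baseline))
    # A VM limited to the baseline can move in both directions.
    print(all(can_migrate(baseline, features) for features in HOSTS.values()))
    # A VM that picked up AVX on the Xeon cannot move to the Avoton.
    print(can_migrate({"sse4.2", "aes", "avx"}, HOSTS["avoton"]))
```

The baseline can never be higher than what the weakest host supports, which is why the cluster ends up at the level of the Avoton.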


Summary

As I have DRS enabled in my cluster, the system automatically moves load between the different servers. When I’m working on my lab set-up now, the only moment I can tell which CPU I’m running on is at boot-up of a VM: the Xeon processor has more power and that’s noticeable during boot. Once the system is running it’s very hard for me to tell which CPU it’s running on, and that is exactly what I wanted to achieve. The Avoton CPU makes an amazing home lab system when you are purely feature testing. Again, this is not a performance beast and you should not run any tests that are meant to compare to a production system. This system is meant for playing around with many different features.

Currently both of my hosts are again running at 80-85% memory utilisation, so it’s heavily used and I couldn’t be happier with it.

If you have any questions on my set-up please comment below!


© 2017 Rick Mur
