Rick Mur

Limitless Networking


What is P4?

Recently I attended a workshop organised by the Open NFP organisation about Data Plane acceleration. The primary goal of the workshop was to get students and researchers (why was I there you may think) familiar with the P4 programming language.

P4 is a programming language created to simplify writing data planes for networking use cases. Recently the P4-16 spec was released, which can be considered a mature version of the language.

Now I’m not a hardcore developer. I know my way around in Python, GoLang and C#, but I have never written anything more low-level like C. P4 reminds me a little of GoLang, not as a language comparison, but in its architecture. P4 is designed so you only need to focus on the actual networking features that you want to make available on the hardware you are programming it for. Then, when you compile it, it generates runtime code for your hardware. Or as the creators explain it:

At one level, P4 is just a simple language for declaring how packets are to be processed. At another level, P4 turns network system design on its head.

There is no need to worry about low-level memory management or other things that would slow down development of that specific feature that nobody else has in their platforms.

Now this is a major advantage. Full freedom to write anything you’d like, right? The downside is that almost no (basic) functionality is present when you start pushing code to a piece of hardware. You really have to do everything yourself, which I think is underestimated. P4 does help you by defining some standard functionality in the spec that vendors should implement when they offer P4 capabilities, like basic switching or load balancing (ECMP) functionality.
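To give a feel for the language, here is a minimal P4-16 sketch of an L2 forwarding pipeline. This is my own illustrative example (all names are made up) and it is trimmed for brevity; a complete v1model program would also need checksum, egress and deparser blocks:

```p4
#include <core.p4>
#include <v1model.p4>

// A single Ethernet header is all we parse in this sketch.
header ethernet_t {
    bit<48> dstAddr;
    bit<48> srcAddr;
    bit<16> etherType;
}

struct headers_t  { ethernet_t ethernet; }
struct metadata_t { }

parser MyParser(packet_in pkt, out headers_t hdr,
                inout metadata_t meta, inout standard_metadata_t std_meta) {
    state start {
        pkt.extract(hdr.ethernet);
        transition accept;
    }
}

control MyIngress(inout headers_t hdr, inout metadata_t meta,
                  inout standard_metadata_t std_meta) {
    action forward(bit<9> port) { std_meta.egress_spec = port; }
    action drop()               { mark_to_drop(std_meta); }

    // The table only declares its match key and actions;
    // the entries themselves are installed at runtime by a control plane.
    table l2_fwd {
        key = { hdr.ethernet.dstAddr : exact; }
        actions = { forward; drop; }
        default_action = drop();
    }
    apply { l2_fwd.apply(); }
}
```

Note how the table declares only match keys and actions; filling in the entries is left to a control plane, which is exactly the point the OpenFlow discussion further down makes.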

Who supports it?

P4 is relatively new and therefore hardware support is not very common. I was able to find two products that support the full P4 spec at this moment.

Barefoot Tofino chip

Barefoot is the inventor of the P4 language and has created a very fast ASIC-like chipset. The chipset does not use a proprietary SDK like Broadcom, Cavium or silicon from large network vendors, but is fully open thanks to its P4 support.

They are partnering with white-box vendors to create switches (32-port 100GbE) that are currently shipping.

Netronome Agilio SmartNIC

Netronome was one of the organizers of the workshop I attended and they have a shipping product. Their SmartNIC products are standard PCIe NICs with various connector options that support offloading data-plane functionality to the NIC rather than letting applications or the kernel spend a lot of CPU cycles on it.

These products are not your typical switch or NIC; they basically come with no default feature set. The Barefoot switch, for example, does not come with a network operating system (NOS), and the NIC always needs to be programmed before any functionality works.

What about OpenFlow?

The most frequently asked question is: is P4 the successor of OpenFlow? The answer is really simple: no! In some cases P4 hardware may even use OpenFlow to program its forwarding table(s). Diagram courtesy of P4.org:

The P4 language is designed to program your networking hardware (or even software components). P4 does not contain a control-plane protocol to learn forwarding table entries (those could be learned through a traditional or home-grown routing protocol). In other words: you have to write your own forwarding-table programming protocol/application or use an existing one (like OpenFlow).

What about existing vendors?

There are rumours that major networking vendors (Cisco, Juniper, Nokia, etc.) could start supporting (parts of) P4 on their own silicon or implement the Barefoot Tofino chip in products, but no announcements have been made at the time of writing. Vendors are starting to open up their operating systems via programmability features (see Juniper’s Extension Toolkit, JET, for example), but this does not allow you to directly program new features into the hardware (which is programmable in some cases, such as the Juniper MX).

Summary

P4 will open up a lot of features that are currently unknown or don’t exist. It allows for a fully open and programmable data plane. The primary use case for this technology will, as always, be the web-scale companies, who want to free up as many CPU resources as possible for other tasks and leave the networking hardware to perform all networking functions.

For smaller-scale deployments, my opinion is that tools like DPDK, XDP or eBPF already free up sufficient CPU resources and optimise user-space or kernel networking enough for most people.

For more information on all the mentioned acronyms and tools, click the links in this blog.

One fun fact: during the workshop I attended, I spotted one person on a Mac, two corporate Windows machines and everyone else running Linux 🙂

VMware NSX 101 – Components

Today I want to explain the basic components and setup of VMware NSX. In this case I’m referring to NSX for vSphere, or NSX-V for short. I want to explain which components are involved, how you set them up for an initial deployment and what the requirements are.

Versioning

At the time of writing the latest release is NSX 6.1.4. This version added support for vSphere 6, although you cannot use any vSphere 6 features yet; only the platform itself is supported.

vSphere

The first step is of course deploying your vSphere cluster with ESXi 5.5 or 6.0 and vCenter 5.5 or 6.0. I recommend using the vCenter Server Appliance (VCSA) instead of the Windows version. You will also need a Windows VM where vSphere Update Manager is installed; this is not available as a virtual appliance, only as a Windows application. I also highly recommend installing an Active Directory server to manage all of your passwords. You will be installing a large number of machines with all different usernames and possibly passwords. I recommend picking a very long and complex one, as all VMware appliances seem to require a different password complexity.

 

After deploying the initial cluster it’s also essential to deploy a Distributed vSwitch across the nodes where you will be deploying virtual networks. For this feature you will need the Enterprise license (or the trial will suffice) when running ESXi 5.5 Update 3 and NSX (before that, Enterprise Plus was required).

NSX Manager

The only component you download is the NSX Manager. This OVA file contains everything you need for deploying NSX. After deploying the OVA through vCenter you can log in to the web GUI of the NSX Manager. The only settings available are connecting to the vSphere Single Sign-On server and to vCenter. When deploying all components, make sure you set NTP servers on everything in the lab. There are many integrations, and they all rely on authentication that is time-sensitive.

 

After connecting the NSX Manager you will no longer need that GUI. Within the vSphere Web Client a “Networking & Security” tab is now available.

NSX Controller

The next step is to deploy at least three NSX Controllers in the cluster. This is done automatically by the Manager after you fill in the required information. You need three for redundancy, as the cluster selects a master controller by majority vote (quorum). With only two nodes a single failure leaves no majority, so at least three are required for redundancy (think of RAID with parity disks). The controller is the critical piece of the software. All ARP requests are handled here, and all information is shared with the ESXi hosts. The Controller will also take care of multicast replication when unicast mode is chosen for transport.

Prepare hosts

The following step is to install three kernel modules on all hosts. This is done using vSphere Update Manager (the Windows application) and requires just a single click in the GUI. It installs the NSX module to communicate with the controllers, the Distributed Logical Router (DLR) and the Distributed Firewall. No configuration is necessary, as all changes are pushed by the controllers.

NSX Edge

To get traffic to and from the virtual environment you currently require an NSX Edge; as soon as OVSDB support is available, this could be your MX router as well! The NSX Edge comes in two forms. The first is the Distributed Logical Router, which offers basic routing, bridging and firewall features. The second is a VM that traffic is hairpinned through, like a software router. The second form also offers VPN and load-balancing features.

Pools

To support the VXLAN transport, the hosts need an IP address to use as source address. This can be assigned using DHCP or an IP pool; more on this in a future blog! Next, a Segment ID pool needs to be created to allocate VNIs for the virtual networks (called logical switches). Today it’s not supported to have more than 4,000 segments on NSX, a limit that combines the regular port groups and NSX port groups on the Distributed vSwitch.

Transport Zone

The final step is to create a transport zone. This defines a “data center” or a “site” where NSX runs and limits the broadcast domain. Per transport zone you can select whether the NSX Controller should handle multicast replication (unicast mode) or whether the network should handle it (multicast or hybrid mode). The recommended choice is unicast mode, so you won’t require multicast routing protocols or IGMP from the network. VMware eliminates the need for smart network hardware even further with this, but of course the solution runs best on optimized hardware, running as an IP Clos fabric offering “Layer 2 as a service” through network virtualization.

Logical Switches

Now the deployment is ready for the first VXLAN segments, which are called Logical Switches. These networks are created as distributed port groups, so you can use the standard tools to connect VMs to these virtual networks, and suddenly you are using VXLAN transport!

 

I hope this article gave you some insight into how NSX is configured and what components it consists of.

 

Happy Labbing!

 

Rick Mur

EVPN Configuration

In my last blog I explained the features and use cases of the EVPN technology. In this blog I want to show how easy it is to configure, enable and expand EVPN. The configuration is focused on the Juniper MX platform, but as Junos is the single operating system across the entire Juniper portfolio, the configuration on other platforms (like the EX9200) is identical.

Design

The topology is really simple. I’m using two routers in this example, so multi-homing is not in scope. Each router has an Ethernet segment connected that consists of multiple VLANs. One VLAN ID is not equal on both sides, so this has to be taken care of.

[Diagram: EVPN topology]

Prepare for EVPN

To make sure we can start creating our VPN, we have to ensure the foundation is in place. This means we need IP reachability to the other data center router’s loopback address, and we need BGP with the EVPN address family enabled. The Junos release I’m testing with needs a special knob to be enabled to ensure packet lookups are done in the right way. In later releases this knob will disappear, as it becomes the default.

routing-options {
    autonomous-system 64999;
    forwarding-table {
        chained-composite-next-hop {
            ingress {
                evpn;
            }
        }
    }
}
protocols {
    mpls {
        interface ge-0/0/1.0;
    }
    bgp {
        group INTERNAL {
            type internal;
            local-address 172.22.250.152;
            family inet-vpn {
                unicast;
            }
            family evpn {
                signaling;
            }
            neighbor 172.22.250.153;
        }
    }
    ospf {
        area 0.0.0.0 {
            interface lo0.0 {
                passive;
            }
            interface ge-0/0/1.0;       
        }
    }
    ldp {
        interface ge-0/0/1.0;
    }
}

Interface ge-0/0/1 is the one connected to the MPLS network, and we configured an iBGP neighbor with both the Layer 3 VPN and EVPN address families to exchange layer 2 and layer 3 routes.

We use OSPF as the internal routing protocol, but in this simple topology it could’ve been a static route as well.
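Before moving on to the VPN itself, it’s worth checking that the underlay and the BGP session are up. Standard Junos show commands along these lines will do (output omitted here):

```
show ospf neighbor
show ldp session
show bgp summary
show route table bgp.evpn.0
```

The last command lists the EVPN routes received over BGP; once the VPN is configured you should see type 2 (MAC) and type 3 (inclusive multicast) routes appear there.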

VLAN Based Service

There are two options to configure EVPN. The first option is to configure a separate EVPN Instance (EVI) for each VLAN/bridge domain. This brings the best separation of traffic and MAC address advertisements, and gives full control over flooding of broadcast traffic on a per-VLAN/bridge-domain basis. The VLAN ID of traffic sent between the two PE routers will be set to 0.

[Diagram: EVPN option 1 – VLAN Based Service]

A Route Target and Route Distinguisher are created for each EVI, and therefore each VLAN has its own MPLS label.

This could potentially cause scaling issues, as PE routers have a limit on the number of VPNs that can be configured.

The following is an example of the configuration for our topology.

interfaces {
    ge-0/0/2 {
        flexible-vlan-tagging;
        encapsulation flexible-ethernet-services;
        unit 101 {
            encapsulation vlan-bridge;
            vlan-id 101;
        }
        unit 102 {
            encapsulation vlan-bridge;
            vlan-id 102;
        }
        unit 103 {
            encapsulation vlan-bridge;
            vlan-id 103;
        }
    }
}
routing-instances {
    BD1 {
        instance-type evpn;
        interface ge-0/0/2.101;
        route-distinguisher 1001:1001;
        vrf-target target:1001:1001;
    }
    BD2 {
        instance-type evpn;
        interface ge-0/0/2.102;
        route-distinguisher 1002:1002;
        vrf-target target:1002:1002;
    }
    BD3 {
        instance-type evpn;
        interface ge-0/0/2.103;
        route-distinguisher 1003:1003;
        vrf-target target:1003:1003;
    }
}

The “flexible-ethernet-services” and “flexible-vlan-tagging” statements on the interface allow us to use this interface for a variety of services. This means we can use single- or double-tagged interfaces, and we can configure both layer 2 bridging and layer 3 sub-interfaces on this single physical interface. One of the very strong features of the MX platform!

As you can see, the VPN configuration is very easy! We just make sure the right interface is in the VPN, assign RD and RT values to it, and we’re done!

As we have a single-homed scenario, no further configuration is necessary. The ESI (as discussed in my previous blog) is set to 0 by default on single-homed routers and doesn’t have to be different on the other end, because the single-homing setting is enabled by default.
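To verify the VLAN Based service, Junos offers a few useful show commands; for instance, for the BD1 instance from the example above (output omitted):

```
show evpn instance BD1 extensive
show evpn database
show route table BD1.evpn.0
```

The EVPN database shows the local and remote MAC addresses learned per instance, which is a quick way to confirm that MAC advertisements are flowing between the two PE routers.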

VLAN Aware Service

The second option, the VLAN Aware service, is more flexible to use. It allows multiple VLANs and bridge domains to share a single EVPN Instance (EVI). This improves scaling and still allows for proper separation. The VLAN ID is now encapsulated in the packet. To ensure proper ECMP forwarding in the MPLS network, an MPLS label is assigned per VLAN ID, so that traffic is load-balanced per VLAN and not per EVI.

[Diagram: EVPN option 2 – VLAN Aware Service]

The configuration for this service is a bit more complicated, but still fairly simple, and it also still allows for VLAN translation! Even MAC address overlap between different VLANs/bridge domains is allowed within this service.

interfaces {
    ge-0/0/2 {
        flexible-vlan-tagging;
        encapsulation flexible-ethernet-services;
        unit 0 {
            family bridge {
                interface-mode trunk;
                vlan-id-list [ 101 102 103 ];
                vlan-rewrite {
                    translate 211 101;
                }
            }
        }
    }
}
routing-instances {
    EVPN {
        instance-type virtual-switch;
        interface ge-0/0/2.0;
        route-distinguisher 1000:1000;
        vrf-target target:1000:1000;
        protocols {
            evpn {
                extended-vlan-list [ 101 102 103 ];
            }
        }
        bridge-domains {
            NETWORK1 {
                domain-type bridge;
                vlan-id 101;
            }
            NETWORK2 {
                domain-type bridge;
                vlan-id 102;
            }
            NETWORK3 {
                domain-type bridge;
                vlan-id 103;
            }                           
        }
    }
}

As you can see, we also take care of the VLAN translation here. We could use VLAN normalization in this case, which means we use a different “unique” VLAN ID within the EVPN service and use VLAN translation on each side to ensure the local side has the right encapsulation.
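The same kind of verification applies to the VLAN Aware service. Since the instance is a virtual switch, the MAC table can also be inspected per bridge domain; on the MX, commands along these lines work (output omitted):

```
show evpn instance EVPN extensive
show bridge mac-table instance EVPN
show route table EVPN.evpn.0
```

The extensive output also lists the per-VLAN labels mentioned above, so you can confirm that each bridge domain received its own MPLS label.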

Summary

This blog showed you how easy and flexible EVPN configuration on Junos is, and how fast we can set up an optimal Layer 2 Data Center Interconnect. In the following blogs I will discuss multi-homing and Layer 3 integration to ensure all-active routing across data centers.


© 2020 Rick Mur
