
vSRX policy-based IPsec VPN over GRE (part 2/2): the workaround

After discussing the issue I ran into in my home lab set-up in the previous post, this post outlines the configuration and some final testing to confirm a successful workaround.

The issue outlined in the previous post comes down to the GRE tunnel endpoint not being the same as the destination IP of the IPsec VPN. The policy engine has trouble with the double encapsulation: first in ESP packets and then in the GRE tunnel. As shown in the packet capture, the ESP encapsulation is not performed and packets are sent over the GRE tunnel unencrypted (the behavior you would expect from GRE over IPsec, which is a much more ‘normal’ use case for this combination).

My solution should ensure that the GRE tunnel is not seen as the next-hop, so the SRX has a chance to encrypt the traffic first.

The goal of the solution is to build a set-up like the diagram below, where the inet.0 routing table, in which the IPsec VPN resides, does not contain the GRE tunnel that connects to the Colo router.

The set-up is still the same: 2 vSRX firewalls connect to each other over 2 vMX routers with a policy-based IPsec VPN. This detail is important, as the behavior of not encrypting packets is not seen when deploying a route-based VPN (with an st0 interface).

Virtual Router

To separate some traffic from the rest, we need to create another routing table inside the system. Junos calls this concept a routing-instance, which can be many things. One of them is a virtual router instance type, which is similar to the VRF-lite concept on Cisco platforms.

Let’s first set up this virtual-router instance and move the GRE interface and the relevant BGP configuration into it.

routing-instances {
    EDGE {
        routing-options {
            static {
                route 2.2.2.2/32 next-hop 10.0.2.1;
            }
        }
        protocols {
            bgp {
                group COLO {
                    type external;
                    export colo-export;
                    peer-as 65000;
                    neighbor 10.0.22.1;
                }
            }
        }
        interface ge-0/0/0.0;
        interface gr-0/0/0.0;
        instance-type virtual-router;
    }
}

The physical WAN interface and the GRE interface are now moved into the routing-instance and the BGP session now establishes inside the instance.
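
To confirm that the session comes up inside the new instance, the usual operational commands take the instance or table name as a qualifier:

show bgp summary instance EDGE
show route table EDGE.inet.0 protocol bgp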

Keep in mind that the security policy and zoning configuration should be adjusted to this new set-up so traffic is allowed to flow between the interfaces in the virtual-router (an intra-zone, from zone X to zone X, policy).
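
As a minimal sketch, assuming the WAN and GRE interfaces sit together in the untrust zone (the zone and policy names here are assumptions, adjust them to your own zoning):

security {
    zones {
        security-zone untrust {
            interfaces {
                ge-0/0/0.0;
                gr-0/0/0.0 {
                    host-inbound-traffic {
                        protocols {
                            bgp;
                        }
                    }
                }
            }
        }
    }
    policies {
        from-zone untrust to-zone untrust {
            policy intra-untrust {
                match {
                    source-address any;
                    destination-address any;
                    application any;
                }
                then {
                    permit;
                }
            }
        }
    }
}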

The second step is to allow traffic to go from the newly created virtual-router to the default global routing table (inet.0). With route leaking the next-hop, and therefore the egress interface, would not change, so we need something else. The best tool for this job is a logical-tunnel interface.

Logical tunnels

The concept of logical tunnel interfaces has been around for a long time. I heavily used them to connect many logical-systems (an early version of slicing in routers) together on an MX480 to build out a JNCIE lab setup with only 1 physical MX.

Technically the logical tunnel interface is a loopback function inside the system that simulates a hairpin link without having to use physical ports. The logical tunnel works by defining 2 units. These units can each be placed in separate VRFs, Logical Systems, etc. to bring traffic between these segmented areas, and each unit is treated just like any other physical interface.

On platforms with hardware PFEs (Packet Forwarding Engines), the looping of the traffic is done in hardware. On Trio-based MPC linecards this requires enabling tunnel-services, as it consumes forwarding bandwidth (the same is needed to enable GRE tunneling).
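
On a physical MX this is enabled per PIC under the chassis hierarchy, roughly like below (the FPC/PIC numbers and reserved bandwidth are just examples):

chassis {
    fpc 0 {
        pic 0 {
            tunnel-services {
                bandwidth 10g;
            }
        }
    }
}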

More details can be found in the official Juniper documentation.

In the case of the vSRX, and also the smaller physical SRX platforms, the system uses dedicated CPU cores as the data-plane, and that is where the logical tunnel traffic is handled.

Let’s set up the logical tunnel to allow traffic between the virtual-router and the global table.

interfaces {
    /* Loopback VR to Global */
    lt-0/0/0 {
        unit 1 {
            encapsulation ethernet;
            peer-unit 2;
            family inet {
                address 10.0.222.2/30;
            }
        }
        unit 2 {
            encapsulation ethernet;
            peer-unit 1;
            family inet {
                address 10.0.222.1/30;
            }
        }
    }
}
routing-instances {
    EDGE {
        routing-options {
            static {
                route 192.0.2.0/24 next-hop 10.0.222.2;
            }
        }
        interface lt-0/0/0.2;
    }
}
routing-options {
    static {
        route 0.0.0.0/0 next-hop 10.0.222.1;
    }
}

As seen in the configuration, the logical tunnel interface has unit 1 and unit 2, which are connected to each other using the ‘peer-unit’ statement. One unit is placed in the virtual router through the routing-instance interface statement, while its peer stays in inet.0.

As the BGP session has moved to the virtual-router, we first need to make sure that all traffic from inet.0 is sent towards the virtual-router using a static default route. Secondly, traffic towards the public IP prefix has to end up back in inet.0, so another static route for 192.0.2.0/24 is required in the virtual-router to send it across the logical tunnel towards inet.0.
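
A quick sanity check is to ping across the logical tunnel in both directions (this assumes ping is allowed as host-inbound-traffic on the zones that hold the lt units):

ping 10.0.222.1 count 3
ping 10.0.222.2 routing-instance EDGE count 3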

This results in the following routing tables.

[email protected]> show route | no-more 

inet.0: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

0.0.0.0/0          *[Static/5] 00:10:58
                    >  to 10.0.222.1 via lt-0/0/0.1
10.0.222.0/30      *[Direct/0] 00:10:58
                    >  via lt-0/0/0.2
10.0.222.2/32      *[Local/0] 00:10:58
                       Local via lt-0/0/0.2
192.0.2.1/32       *[Direct/0] 02:20:32
                    >  via lo0.0
192.168.1.0/24     *[Direct/0] 02:19:46
                    >  via ge-0/0/1.0
192.168.1.1/32     *[Local/0] 02:19:46
                       Local via ge-0/0/1.0

EDGE.inet.0: 9 destinations, 9 routes (9 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

0.0.0.0/0          *[BGP/170] 00:06:37, localpref 100
                      AS path: 65000 I, validation-state: unverified
                    >  to 10.0.22.1 via gr-0/0/0.0
2.2.2.2/32         *[Static/5] 00:10:58
                    >  to 10.0.2.1 via ge-0/0/0.0
10.0.2.0/30        *[Direct/0] 00:10:58
                    >  via ge-0/0/0.0
10.0.2.2/32        *[Local/0] 00:10:58
                       Local via ge-0/0/0.0
10.0.22.0/30       *[Direct/0] 00:06:39
                    >  via gr-0/0/0.0
10.0.22.2/32       *[Local/0] 00:06:39
                       Local via gr-0/0/0.0
10.0.222.0/30      *[Direct/0] 00:10:58
                    >  via lt-0/0/0.1
10.0.222.1/32      *[Local/0] 00:10:58
                       Local via lt-0/0/0.1
192.0.2.0/24       *[Static/5] 00:10:58
                    >  to 10.0.222.2 via lt-0/0/0.2

We see pretty much the same routes as in the previous post, with only the addition of 10.0.222.0/30 as the transit subnet on the logical tunnel, and the routes are now split across 2 separate tables.

Verification

Now let’s see if we can finally reach host2 from host1 as that did not work in the previous post.

host1:/# ping 192.168.2.2
PING 192.168.2.2 (192.168.2.2) 56(84) bytes of data.
64 bytes from 192.168.2.2: icmp_seq=1 ttl=62 time=4.46 ms
64 bytes from 192.168.2.2: icmp_seq=2 ttl=62 time=3.42 ms
64 bytes from 192.168.2.2: icmp_seq=3 ttl=62 time=3.20 ms
64 bytes from 192.168.2.2: icmp_seq=4 ttl=62 time=2.67 ms
64 bytes from 192.168.2.2: icmp_seq=5 ttl=62 time=3.21 ms
64 bytes from 192.168.2.2: icmp_seq=6 ttl=62 time=2.85 ms
64 bytes from 192.168.2.2: icmp_seq=7 ttl=62 time=2.93 ms
64 bytes from 192.168.2.2: icmp_seq=8 ttl=62 time=3.14 ms
64 bytes from 192.168.2.2: icmp_seq=9 ttl=62 time=3.28 ms
64 bytes from 192.168.2.2: icmp_seq=10 ttl=62 time=2.50 ms
^C
--- 192.168.2.2 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9013ms
rtt min/avg/max/mdev = 2.501/3.166/4.459/0.509 ms
host1:/# 

Finally! Let’s check if the packet capture also shows the same expected result.

With the ESP packets showing up correctly when monitoring the ge-0/0/0 WAN interface on the Home vSRX, we can confirm that the workaround works!
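
Another way to confirm this from the SRX itself, without a capture, is to check that the encrypted packet counters keep increasing while the ping is running:

show security ipsec statistics
show security flow session destination-prefix 192.168.2.2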

Conclusion

This solution seems a bit far-fetched, but it works quite well for my use case at home. I have not had any stability issues and am very happy with it. Still, this is quite a complex set-up to troubleshoot, so please avoid deploying things like this in production. But if you do run into this corner case of having to use a policy-based VPN on a vSRX with a GRE tunnel as underlay, you now know how to solve it!

vSRX policy-based IPsec VPN over GRE (part 1/2)

During the implementation of my own public IP space on my home server, I ran into a weird issue where traffic would not pass between hosts on either side of the VPN. After investigation it seems to be caused by a combination of factors: GRE tunneling, a policy-based site-to-site VPN rather than a route-based one, and the (v)SRX platform. Please note that I only tested this on the virtual SRX platform, not on physical appliances. I did test multiple releases of Junos and found that it affects all the ones I tried (the latest being Junos 20.1R1.11).

As described in the previous blog post about running your own public IP space in a home lab behind a consumer internet connection, I run a GRE tunnel to a router that is connected via multiple peering and transit connections, which allows me to advertise my own AS and IP prefixes. On top of that, going across the internet, I want to run a site-to-site VPN towards another location where back-ups of my homelab are stored.

IPsec over GRE or GRE over IPsec

The 2 variants seem to be used interchangeably if you search for this deployment online, but my use case is very much the first option: I’m running a site-to-site VPN over a GRE tunnel across the internet towards another VPN endpoint. The latter option is much more widely deployed in real life, as it lets the user carry types of network traffic that are traditionally not possible over an IPsec VPN, such as multicast or MPLS.

The design I’m trying to implement is not very typical and (should) not (be) found in production networks (as it’s really a workaround). What I’m trying to say is that when GRE and IPsec are deployed together, it is typically between the same pair of endpoints. In my case the GRE tunnel terminates before the IPsec VPN endpoint.

Initial configuration

Let’s first set a baseline in a lab environment to test this out. I’m using my lab server running EVE-NG to set up a lab with 2 vSRX firewalls, one at my Home and one at my Remote site, a vMX router simulating my ISP (just transit for the GRE traffic), a vMX router terminating my GRE tunnel and advertising the public IP prefix, and finally 2 Docker containers in the LAN subnets behind both vSRX firewalls, simulating end hosts to generate some test pings. To keep the blog readable, only the relevant parts of the configurations are shown.

The set-up is as shown in the diagram below.

The ISP and COLO routers have full reachability with each other using OSPF, so the loopbacks are reachable as if they were internet hosts. The GRE tunnel is set up between the Home SRX and the COLO router and, just like in the previous blog, BGP is used to exchange a public IP prefix so my home network is reachable via it. The Home SRX has a loopback IP of 192.0.2.1.

interfaces {
    /* ISP Uplink */
    ge-0/0/0 {
        unit 0 {
            family inet {
                address 10.0.2.2/30;
            }
        }
    }
    /* GRE to COLO router */
    gr-0/0/0 {
        unit 0 {
            clear-dont-fragment-bit;
            tunnel {
                source 10.0.2.2;
                destination 2.2.2.2;
                allow-fragmentation;
            }
            family inet {
                mtu 1476;
                address 10.0.22.2/30;
            }
        }
    }
    /* Home LAN */
    ge-0/0/1 {
        unit 0 {
            family inet {
                address 192.168.1.1/24;
            }
        }
    }
    /* Public IP address */
    lo0 {
        unit 0 {
            family inet {
                address 192.0.2.1/32;
            }
        }
    }
}
protocols {
    bgp {
        group COLO {
            type external;
            /* Export public IP prefix */
            export colo-export;
            peer-as 65000;
            neighbor 10.0.22.1;
        }

    }
}
routing-options {
    static {
        route 2.2.2.2/32 next-hop 10.0.2.1;
        route 192.0.2.0/24 discard;
    }
    router-id 3.3.3.3;
    autonomous-system 65001;
}

As shown in the static routes above, the Home SRX only knows how to reach the COLO router via the regular ISP uplink and relies on a default route learned via BGP over the GRE tunnel for all other destinations.

[email protected]> show route 

inet.0: 10 destinations, 10 routes (10 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

0.0.0.0/0          *[BGP/170] 05:13:34, MED 0, localpref 100
                      AS path: 65000 I, validation-state: unverified
                    >  to 10.0.22.1 via gr-0/0/0.0
2.2.2.2/32         *[Static/5] 05:13:37
                    >  to 10.0.2.1 via ge-0/0/0.0
10.0.2.0/30        *[Direct/0] 05:13:37
                    >  via ge-0/0/0.0
10.0.2.2/32        *[Local/0] 05:13:37
                       Local via ge-0/0/0.0
10.0.22.0/30       *[Direct/0] 05:13:37
                    >  via gr-0/0/0.0
10.0.22.2/32       *[Local/0] 05:13:37
                       Local via gr-0/0/0.0
192.0.2.0/24       *[Static/5] 05:14:53
                       Discard
192.0.2.1/32       *[Direct/0] 05:14:53
                    >  via lo0.0
192.168.1.0/24     *[Direct/0] 05:13:37
                    >  via ge-0/0/1.0
192.168.1.1/32     *[Local/0] 05:13:37  
                       Local via ge-0/0/1.0

The host containers on the home and remote LAN subnets are now able to reach the internet via a simple source NAT configuration: on the remote subnet this is done with the interface IP address (a sketch of that rule follows the ping tests below) and on the home subnet with the public IP address (192.0.2.1). Both hosts are able to reach the loopback IP of the ISP router (remember, the Home SRX is directly connected to the ISP, but only knows how to reach that loopback via the default route received over BGP through the GRE tunnel).

host1:/# ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=62 time=4.17 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=62 time=2.83 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=62 time=3.18 ms

Host2:/# ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=62 time=12.8 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=62 time=2.02 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=62 time=1.57 ms
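
For reference, the remote vSRX simply translates to its interface address; a minimal sketch of such a rule (zone and rule-set names are assumptions):

security {
    nat {
        source {
            rule-set trust-to-untrust {
                from zone trust;
                to zone untrust;
                rule all {
                    match {
                        source-address 0.0.0.0/0;
                    }
                    then {
                        source-nat {
                            interface;
                        }
                    }
                }
            }
        }
    }
}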

IPsec configuration

As this set-up replicates my own home set-up, I am using a policy-based VPN, because the remote side uses a device which does not support a route-based VPN (using an st0 interface on Junos, or similar on other vendors).

The most important part in the config is the exclusion of the traffic destined for the IPsec VPN (192.168.1.0/24 going to 192.168.2.0/24) from the NAT rule, as I just want the 2 subnets to communicate directly without translation.

Second is the policy part, where the security policy is used to trigger the encapsulation of the traffic between the 2 subnets in ESP packets.

security {
    ike {
        proposal ike-vpnProposal {
            authentication-method pre-shared-keys;
            dh-group group5;
            authentication-algorithm sha1;
            encryption-algorithm aes-128-cbc;
            lifetime-seconds 28800;
        }
        policy ike-vpnPolicy {
            mode main;
            proposals ike-vpnProposal;
            pre-shared-key ascii-text blablabla;
        }
        gateway gw-blog {
            ike-policy ike-vpnPolicy;
            address 10.0.3.2;
            external-interface lo0.0;
            local-address 192.0.2.1;
        }
    }
    ipsec {
        proposal ipsec-vpnProposal {
            protocol esp;
            authentication-algorithm hmac-sha1-96;
            encryption-algorithm aes-128-cbc;
            lifetime-seconds 3600;
        }
        policy ipsec-vpnPolicy {
            perfect-forward-secrecy {
                keys group5;
            }
            proposals ipsec-vpnProposal;
        }
        vpn vpn-blog {
            ike {
                gateway gw-blog;
                ipsec-policy ipsec-vpnPolicy;
            }
            establish-tunnels immediately;
        }
    }

    address-book {
        untrust {
            address remote-network 192.168.2.0/24;
            attach {
                zone untrust;
            }
        }
        trust {
            address home-network 192.168.1.0/24;
            attach {
                zone trust;
            }
        }
    }
    nat {
        source {
            pool internet {
                address {
                    192.0.2.1/32;
                }
            }
            rule-set trust-to-untrust {
                from zone trust;
                to zone untrust;
                rule vpn {
                    match {
                        source-address 192.168.1.0/24;
                        destination-address 192.168.2.0/24;
                    }
                    then {
                        source-nat {
                            off;
                        }
                    }
                }
                rule all {
                    match {
                        source-address 0.0.0.0/0;
                    }
                    then {
                        source-nat {
                            pool {
                                internet;
                            }
                        }
                    }
                }
            }
        }
    }
    policies {
        from-zone trust to-zone untrust {
            policy vpn2 {
                match {
                    source-address home-network;
                    destination-address remote-network;
                    application any;
                }
                then {
                    permit {
                        tunnel {
                            ipsec-vpn vpn-blog;
                            pair-policy vpn;
                        }
                    }
                }
            }
            policy permit-all {
                match {
                    source-address any;
                    destination-address any;
                    application any;
                }
                then {
                    permit;
                }
            }
        }
        from-zone untrust to-zone trust {
            policy vpn {
                match {
                    source-address remote-network;
                    destination-address home-network;
                    application any;
                }
                then {
                    permit {
                        tunnel {
                            ipsec-vpn vpn-blog;
                            pair-policy vpn2;
                        }
                    }
                }
            }
        }
    }
}

Now that the VPN is configured and active, we can test reachability.

[email protected]> show security ipsec security-associations 
  Total active tunnels: 1     Total Ipsec sas: 1
  ID    Algorithm       SPI      Life:sec/kb  Mon lsys Port  Gateway   
  <2      ESP:aes-cbc-128/sha1 7df48801 2520/ unlim - root 500 10.0.3.2        
  >2      ESP:aes-cbc-128/sha1 eea8273a 2520/ unlim - root 500 10.0.3.2  

The behavior seen is very strange. The Home LAN host cannot access the Remote LAN host, but from the Remote to Home all seems to work fine!

host1:/# ping 192.168.2.2
PING 192.168.2.2 (192.168.2.2) 56(84) bytes of data.
^C
--- 192.168.2.2 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2044ms


Host2:/# ping 192.168.1.2
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=62 time=3.27 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=62 time=3.36 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=62 time=3.67 ms

Weird IPsec over GRE issue

The problem seems to be with how the SRX policy engine encapsulates traffic. The ‘double encapsulation’ of both ESP and GRE does not seem to work when the GRE and ESP endpoints are not the same IP address. In this case the GRE tunnel terminates well before the IPsec endpoint.

In the flow session output everything seems to be okay, with the next-hop correctly showing the GRE tunnel, but no traffic shows up on the other side.

[email protected]> show security flow session    

Session ID: 162, Policy name: vpn2/4, Timeout: 58, Valid
  In: 192.168.1.2/152 --> 192.168.2.2/1;icmp, Conn Tag: 0x0, If: ge-0/0/1.0, Pkts: 1, Bytes: 84, 
  Out: 192.168.2.2/1 --> 192.168.1.2/152;icmp, Conn Tag: 0x0, If: gr-0/0/0.0, Pkts: 0, Bytes: 0, 

This is clearly seen in a packet capture on the Home SRX ISP uplink interface. Only GRE packets are seen, but they are unencrypted!
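
To dig into this on the SRX itself, a security flow trace scoped to the test traffic helps; a sketch (file and filter names are assumptions):

security {
    flow {
        traceoptions {
            file flow-trace size 1m files 3;
            flag basic-datapath;
            packet-filter icmp-test {
                source-prefix 192.168.1.2/32;
                destination-prefix 192.168.2.2/32;
            }
        }
    }
}

The resulting trace (viewable with show log flow-trace) shows which egress interface the flow engine selects for the session.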

When the ping is initiated from the remote side, the ping is working fine. Even the return traffic is correctly encapsulated in the IPsec tunnel with ESP.

Solution(s)

The limitation seems to be the combination of a policy-based VPN with a GRE tunnel as underlay that does not terminate on the same device/IP as the IPsec tunnel. Again, the use case for this is limited and I don’t expect a lot of people to run into it. Of course there are always workarounds!

1. Route Based

The first solution would be to use a route-based VPN. When traffic is routed across the IPsec VPN, the next-hop interface in the flow engine changes from the gr-0/0/0 tunnel interface to the st0 interface, and the traffic is correctly encrypted and encapsulated in ESP before being sent over the GRE tunnel.
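
For reference, a route-based variant on the Home vSRX would look roughly like this, reusing the IKE gateway and IPsec policy from above (a sketch only; the zone placement of st0.0 and the static route are assumptions):

interfaces {
    st0 {
        unit 0 {
            family inet;
        }
    }
}
security {
    zones {
        security-zone untrust {
            interfaces {
                st0.0;
            }
        }
    }
    ipsec {
        vpn vpn-blog {
            bind-interface st0.0;
            ike {
                gateway gw-blog;
                ipsec-policy ipsec-vpnPolicy;
            }
            establish-tunnels immediately;
        }
    }
}
routing-options {
    static {
        route 192.168.2.0/24 next-hop st0.0;
    }
}

With the VPN bound to st0.0, the tunnel pair-policies are no longer needed and a plain permit policy between the zones is enough.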

As mentioned before, my remote side does not support a route-based VPN, so this does not solve my problem.

2. Looping through the vSRX with a logical tunnel

This is a bit of a far-fetched solution that I would not typically recommend in a production environment, but it works fine and I’ve successfully used this set-up for a few months now.

My goal is to put another device, or virtual device, in front of the GRE tunnel, so the next-hop interface changes from the GRE tunnel to something else. I ultimately settled on creating a virtual-router (VRF-lite-like) routing-instance on the vSRX, setting up the GRE tunnel there and looping the traffic back into the default routing-instance using a logical tunnel.

I will explain the configuration steps involved for this in the next post!

Using my own public IP space on a regular internet connection with a Juniper SRX

Recently I purchased new servers for my home lab (a topic for more posts to follow when everything is fully done). I want to use these servers to run a number of workloads (mainly virtual machines) using the public IP space I recently got. Of course my consumer internet connection does not support setting up BGP and I did not want to spend a lot more money on a ‘business internet’ option.

I was discussing this with a friend and he offered to let me use the router in a co-location he has. This allowed me to set up a GRE tunnel to that router and advertise my prefix, resulting in the following topology:

In the initial set-up I’m using a virtual firewall, as I’m moving my home lab in a few months and want to deploy this on a physical firewall then. My first attempt was to use pfSense. I’ve used pfSense a lot in the past with great pleasure: it has great firewall, NAT and VPN features, but routing is not its strongest point. I was successful in setting up OpenBGPd, but I found the GUI integration limited, and the GRE tunnelling support was where I got stuck. One important missing piece on the pfSense side is that it is not possible to configure IPv6 addresses on the GRE tunnel interface (the overlay IP addresses). I managed to get it working with link-local addresses, but this required special static routes to be added and I didn’t think it was an elegant solution. Without a proper overview of the functioning of the system (both on the GRE tunnel and the BGP part) I abandoned the pfSense option.

vSRX

I then deployed a Juniper vSRX firewall. Now this is not a free option, but as I work for Juniper, I really liked having Junos at this critical place in my own home network. I’m using the 60-day trial license that anyone can get, which includes the latest virtual appliances, on the Juniper website (https://www.juniper.net/us/en/dm/free-vsrx-trial/). In the final set-up I’m probably going to use an SRX300 physical appliance, which should give me enough forwarding performance for my 500/40Mbps cable connection.

In the past the vSRX was based on a FreeBSD VM running nested on top of a Linux OS, where the data-plane processes ran on Linux to utilise DPDK for accelerated packet processing.

vSRX2 architecture

This enabled the use of the much better multi-core processing support in the Linux kernel, while still using the FreeBSD-based Junos OS. The downside was that booting the appliance took a lot of time (7-10 minutes). It also required ‘Promiscuous mode’ to be enabled to reach the fxp0 out-of-band management interface, as this was a virtual NIC with a different MAC address inside the FreeBSD VM.

The latest version of the vSRX, called vSRX3 or vSRX 3.0, surprisingly does not use Linux anymore, but is fully based on FreeBSD 11, making use of the improvements in FreeBSD around multi-core processing and DPDK support. Performance did not change with this transition (it may even have improved in some cases), while boot times improved greatly (about a minute now). Even with only 2 CPU cores (still separating control-plane and data-plane processes) the vSRX performs great and is well suited for my use case.

vSRX3 architecture

As there is no nested virtual machine involved in this architecture, the fxp0 interface is reachable again without having to enable Promiscuous mode in ESXi environments.

My set-up did not require the use of more advanced features like SR-IOV, because my bandwidth requirements are quite low (1Gbps is more than enough for my home network!).

Tip: To tell whether you are running vSRX3 or a previous generation, take a look at the output of ‘show version’. If ‘vSRX’ is written with uppercase letters you are running vSRX3; if ‘vsrx’ is written in all lowercase, you are running a previous generation.

Fun fact: Did you know the first version of this product was called Firefly Perimeter?

Diagrams and more details can be found at: https://www.juniper.net/documentation/en_US/vsrx/information-products/topic-collections/release-notes/19.1/topic-98044.html#jd0e101

GRE configuration

The set-up for GRE and BGP is rather simple if you are used to Junos configuration. The GRE tunnel interface is available on the vSRX by default and no configuration is necessary to enable it.

First I configured the GRE interface. The important part here is that I ran into some MTU/fragmentation issues, as I am transiting an infrastructure with a default 1500-byte MTU (also known as the Internet ;), which is why I enabled the allow-fragmentation and clear-dont-fragment-bit knobs. As I want all my internet-facing traffic to use the GRE tunnel, the only routing configuration I need is a route to the GRE tunnel destination, which is why a /32 static route is configured.

All IP addresses used in these examples are taken from prefixes reserved for documentation.

interfaces {
    gr-0/0/0 {
        unit 0 {
            clear-dont-fragment-bit;
            description "GRE to Colo";
            tunnel {
                source 203.0.113.1;
                destination 198.51.100.1;
                allow-fragmentation;
            }
            family inet {
                mtu 1476;
                address 192.0.2.253/31;
            }
            family inet6 {
                mtu 1476;
                address 2001:db8:1::1/64;
            }
        }
    }
}
routing-options {
    static {
        route 198.51.100.1/32 next-hop 203.0.113.2;
    }
}

As the GRE interface is just a regular interface, we should configure it to belong to a security zone. In this case that is my untrust zone, as it connects directly to the Internet. I allow BGP traffic to reach this interface, as we will configure BGP later on. I do not include any security policy configuration here, as it is just the default (allow all traffic from trust to untrust).

security {
    flow {
        tcp-mss {
            all-tcp {
                mss 1340;
            }
        }
    }
    zones {
        security-zone untrust {
            interfaces {
                gr-0/0/0.0 {
                    host-inbound-traffic {
                        protocols {
                            bgp;
                        }
                    }
                }
            }
        }
    }
}

One key knob here is the TCP MSS value, which I have to set much lower than normal. This is because I need to account for the GRE+IP header overhead, but I also have an IPsec tunnel configured that adds overhead of its own. There are commands to change the MSS only for GRE tunnel traffic, but they are designed for the GRE over IPsec use case; this is IPsec over GRE.

In my case an MSS of 1366 works, but this is not easy to determine, as the ESP overhead varies with the packet size. I tried ICMP pings with the DF-bit set, and with a payload of up to 1378 bytes my ping reply came back. This means 1378 + ICMP (8b) + IP (20b) = 1406 bytes, or 94 bytes of total overhead.
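
From a Linux host behind the SRX that test looks something like this, where -M do sets the DF-bit and -s the ICMP payload size (the destination is a placeholder for a host on the far side of the VPN):

ping -M do -s 1378 <remote-host>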

MSS: MTU – IP (20b) – TCP (20b) = 1460 bytes – All-overhead (94b) = 1366 bytes

Cisco has a great IPsec overhead calculator https://cway.cisco.com/ipsec-overhead-calculator/

Online I found varying safe numbers. I’ve set it to 1340 to also accommodate the additional 20 bytes on an IPv6 header. Please comment below your recommendation for MSS size!

BGP configuration

Finally it’s time to set up some dynamic routing. As I want to advertise my own Autonomous System (AS) as well, I want to originate BGP from my house and have the router in the co-location act as a transit provider for me. So let’s set up a pretty standard EBGP configuration. BGP is included in any vSRX license.

protocols {
    bgp {
        group colo-v4 {
            type external;
            import colo-ipv4-in;
            family inet {
                unicast;
            }
            export colo-ipv4-out;
            peer-as 65001;
            neighbor 192.0.2.254;
        }
        group colo-v6 {
            type external;
            import colo-ipv6-in;
            family inet6 {
                unicast;
            }
            export colo-ipv6-out;
            peer-as 65001;
            neighbor 2001:db8:1::2;
        }
        log-updown;
    }
}
routing-options {
    router-id 192.0.2.1;
    autonomous-system 64999;
}

As BGP is already specified as host-inbound-traffic in the security zones configuration, no additional security policy is required. This does mean that BGP traffic from any source is allowed. Of course it’s best practice to limit BGP traffic to only the configured neighbors. For that, the vSRX supports regular control-plane protection by applying a firewall filter to the loopback interface (no IP address required on lo0 if you don’t want one).
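
A minimal sketch of such a filter, limiting BGP to the IPv4 neighbor configured above (filter and term names are assumptions; a similar family inet6 filter would be needed for the IPv6 session):

firewall {
    family inet {
        filter protect-re {
            term bgp-peers {
                from {
                    source-address {
                        192.0.2.254/32;
                    }
                    protocol tcp;
                    port bgp;
                }
                then accept;
            }
            term no-other-bgp {
                from {
                    protocol tcp;
                    port bgp;
                }
                then discard;
            }
            term accept-rest {
                then accept;
            }
        }
    }
}
interfaces {
    lo0 {
        unit 0 {
            family inet {
                filter {
                    input protect-re;
                }
            }
        }
    }
}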

Of course policies are the most important part of any BGP configuration. I’m not interested in receiving a full BGP table, so a default route will do fine for both IPv4 and IPv6. Second, I need to advertise my own IP space from my own AS. To originate the prefixes I created 2 discard static routes covering the entire subnets, which the export policies below match on.

policy-options {
    policy-statement colo-ipv4-in {
        term allow-default {
            from {
                route-filter 0.0.0.0/0 exact;
            }
            then accept;
        }
        then reject;
    }
    policy-statement colo-ipv4-out {
        term my-subnet {
            from {
                protocol static;
                route-filter 192.0.2.0/24 exact;
            }
            then accept;
        }
        then reject;
    }
    policy-statement colo-ipv6-in {
        term allow-default {
            from {
                route-filter ::/0 exact;
            }
            then accept;
        }
        then reject;
    }
    policy-statement colo-ipv6-out {
        term my-subnet {
            from {
                protocol static;
                route-filter 2001:db8:abc::/48 exact;
            }
            then accept;
        }
        then reject;
    }
}
routing-options {
    rib inet6.0 {
        static {
            route 2001:db8:abc::/48 discard;
        }
    }
    static {
        route 192.0.2.0/24 discard;
    }
}

RIPE

Now, establishing a dynamic routing configuration is one thing; to have your prefix accepted by any upstream networks, you have to make sure your administration is in place. As I live in Europe, my IP resources come from RIPE.

By the way: if you want to learn more about RIPE, RIPE NCC and routing security, make sure you listen to the episode of the Routing Table podcast we recorded with Nathalie Trenaman on these topics!

You have to make sure there is a ROUTE object created for your prefixes, which also specifies what the origin AS will be. While you are there and signed in anyway, make sure you also create ROA objects, so RPKI will mark your prefixes as valid when they are originated from your AS!

Most providers will have some form of filtering applied to their peers, which means that I also had to make sure that my upstream provider accepted my AS and prefixes. This meant that I had to be included in the AS-SET object of my upstream provider.

I typically use the bgpq3 tool to verify that my objects are correctly set (no guarantees, as your providers can use different tools to collect this information).

NAT

Now, to finally make use of the prefix now that it’s advertised and accepted, I set up a simple source NAT rule so my egress traffic uses an IP address in my own subnet as the source when browsing the web.

The SRX has a default source NAT rule in place that takes all traffic from the trust zone going to untrust and uses the interface IP address as the source IP. I want to change this to an IP from my own subnet, which requires a pool to be created (even if it’s only 1 IP that everything is translated to).

security {
    nat {
        source {
            pool internet {
                address {
                    192.0.2.1/32;
                }
            }
            rule-set trust-to-untrust {
                from zone trust;
                to zone untrust;
                rule all-trust {
                    match {
                        source-address 0.0.0.0/0;
                    }
                    then {
                        source-nat {
                            pool {
                                internet;
                            }
                        }
                    }
                }
            }
        }
    }
}

Done!?

Now I can browse the internet from my own IP space from my own home internet connection!

Stay tuned for another blog about the funny limitation I ran into when I configured an IPsec VPN, which required reconstructing quite a bit of this set-up to work around it!

 
