OCVS - To the Internet and beyond…

Series: OCVS Internet Access

Okay, so we have our Oracle Cloud VMware Solution (OCVS) up and running, we can connect to the vCenter, NSX Manager etc. and have started to deploy workloads inside the SDDC, that’s awesome! But, now we want to be able to connect to the Internet from those workloads, and, just maybe, connect to them from the Internet. In this post, we’ll look at the steps needed to enable our workload VMs to access the Internet. Then, in the second post in this series, we’ll err… break that, hurriedly fix it (while hoping nobody noticed it was us), and enhance the Internet access to allow us to publish services to the Internet from the SDDC.

Strap in, lift-off can be bumpy!

Introduction

In VMware Cloud on AWS (VMC), the hosted SDDC lives in its own part of the AWS world and has fairly direct access to the Internet. This makes access to the Internet from VMs in the SDDC pretty straightforward. Unfortunately, it can make accessing native AWS services more complicated. In contrast, OCVS is a first-class citizen in Oracle Cloud Infrastructure (OCI) and has direct access to all the native services, which is great. The flip side of that “great” is that OCVS needs to use the native OCI Internet access services to reach the err… Internet. This makes the task of giving general Internet access to our workload VMs slightly more complex than in VMC, and the task of delivering inbound Internet connections to a workload VM even more so. Don’t worry, it’s not difficult when you know the steps; it’s just not super obvious how you go about it.

In this short series, we’ll take a look at native OCI Internet access, then we’ll transfer that into our OCVS SDDC model and finally we’ll look at converting the Internet access to allow inbound connections too.

TL;DR - Just take me to the how-to!

OCI Internet access

OCI offers two types of Internet access, the NAT Gateway and the Internet Gateway. We’ll look at the NAT Gateway first. We’ll illustrate its use with a simple diagram, then call out a couple of important notes. In the diagram below we see an OCI virtual machine on an OCI subnet. Its route to the Internet follows the flow shown.

In order to reach the Internet, we will need:

  1. An Outbound Rule in the Subnet’s associated Security List which permits our flow.
  2. A Route Rule in the Subnet’s associated Route Table matching our Internet destination and pointing to:
  3. An OCI “NAT Gateway” which provides our Internet Access.

If you’re following along at home, your Security Rule will need to match your specific requirements. It could be as simple as allowing any outbound traffic, or as specific as only allowing connections to a single destination.

Subnets currently use Security Lists, which are like perimeter or “Edge” firewalls, so anything already inside the Subnet, between its hosts is allowed. VLANs, which we’ll look at later, use Network Security Groups, which are more like the NSX Distributed Firewall and can filter traffic between hosts controlled by the NSG.

The way the NAT Gateway works is nice and simple. We use a Route Table rule, like the one below, to direct traffic heading for the Internet using the 0.0.0.0/0 destination. We set the Target Type to “NAT Gateway” and select the NAT Gateway deployed in our VCN. Here’s an example of a Route Rule like that, in a Route Table, targeting a NAT Gateway (which is helpfully named “NAT Gateway”).

Workloads on any Subnet or VLAN using a Route Table with this rule in it will (subject to the Security List or Network Security Group rules) be able to access the Internet.
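To make the Route Rule idea concrete, here’s a minimal Python sketch of how a destination gets matched against a Route Table. It is only a model: the ROUTE_RULES list and the lookup_target function are hypothetical names, and it assumes the most specific matching rule wins. It says nothing about how OCI implements routing internally.

```python
import ipaddress

# Hypothetical model of a Route Table: (destination CIDR, target) pairs.
# The single rule mirrors the one described in the post.
ROUTE_RULES = [
    ("0.0.0.0/0", "NAT Gateway"),
]


def lookup_target(dest_ip, rules=ROUTE_RULES):
    """Return the target of the most specific rule matching dest_ip, or None."""
    addr = ipaddress.ip_address(dest_ip)
    best = None
    for cidr, target in rules:
        net = ipaddress.ip_network(cidr)
        # Keep the matching rule with the longest prefix (assumed behaviour).
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, target)
    return best[1] if best else None


print(lookup_target("8.8.8.8"))
```

With only the quad-zero rule present, every destination that reaches the Route Table resolves to the NAT Gateway, which is exactly why one rule is enough for general Internet access.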

Here’s a list of NAT Gateways in one of our test OCVS tenancies, again showing the helpfully (or confusingly) named “NAT Gateway”. As you can see from the conveniently highlighted area, the NAT Gateway has its own Public IP Address, in this case 132.226.122.232.

Let’s take another look at our flow diagram, but this time, we’ll add the IP addresses to network packets as they whiz from left to right…

In (4) we can see the packet has the VM’s private source address of 10.1.1.1 and the destination public address of 8.8.8.8. As packets pass through the NAT Gateway in (5), it swaps the Source Address (using Source NAT, or SNAT of course) in the packets to that of its own Public IP address, 132.226.122.232. The NAT Gateway keeps track of which private source addresses are talking to which public addresses so that it can reverse this process on packets returning from the Internet.
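The “keeps track” part is the interesting bit, so here’s a small Python sketch of Source NAT with connection tracking. The SourceNat class is purely illustrative, and the use of port numbers to tell flows apart is our assumption (the post only talks about addresses); it is not a description of how the OCI NAT Gateway is actually built.

```python
class SourceNat:
    """Toy Source NAT that remembers flows so replies can be reversed."""

    def __init__(self, public_ip, first_port=1024):
        self.public_ip = public_ip
        self.next_port = first_port  # hypothetical port allocation scheme
        self.out = {}   # (src_ip, src_port, dst_ip, dst_port) -> public port
        self.back = {}  # (public port, dst_ip, dst_port) -> (src_ip, src_port)

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite the source of an outbound packet and record the flow."""
        key = (src_ip, src_port, dst_ip, dst_port)
        if key not in self.out:
            port = self.next_port
            self.next_port += 1
            self.out[key] = port
            self.back[(port, dst_ip, dst_port)] = (src_ip, src_port)
        return (self.public_ip, self.out[key], dst_ip, dst_port)

    def inbound(self, src_ip, src_port, dst_ip, dst_port):
        """Reverse the translation for a reply, or return None if unknown."""
        if dst_ip != self.public_ip:
            return None
        original = self.back.get((dst_port, src_ip, src_port))
        if original is None:
            return None  # unsolicited traffic: no flow was recorded
        return (src_ip, src_port, original[0], original[1])


nat = SourceNat("132.226.122.232")
print(nat.outbound("10.1.1.1", 40000, "8.8.8.8", 53))
print(nat.inbound("8.8.8.8", 53, "132.226.122.232", 1024))
```

Note that a reply from a host we never contacted finds no recorded flow and is dropped, which is why this kind of NAT on its own does not allow unsolicited inbound connections.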

Okay, now we have all the pieces in place for the simplest of our examples, we’ll move on to the OCVS version of this endeavour.

OCVS SDDC Workloads Internet Access

Let’s start off by revisiting the flow diagram, but this time, we’ll add in the SDDC components. We’ll have a workload VM on an NSX-T Logical Segment which is connected to a Tier-1 Gateway. The Tier-1 Gateway is automatically uplinked to a Tier-0 Gateway, which is connected to OCI’s networking layer through a VLAN. From there, things start to resemble the native flow diagram from earlier in the post. In picture form, it looks like this.

This time, in order to reach the Internet, we will need:

  1. Our workload VM connected to a Tier-1 Segment.
  2. Unless we’re heading to another segment, we will take the T1’s default route to its parent Tier-0.
  3. For the Internet, we will take the T0’s default route to the VLAN’s next-hop gateway.

Now, back on familiar territory:

  1. An Outbound Rule in the VLAN’s associated Network Security Group which permits our flow.
  2. A Route Rule in the VLAN’s associated Route Table matching our Internet destination and pointing to:
  3. An OCI “NAT Gateway” which provides our Internet Access.

Earlier in the post it looked like the traffic flowed through the Route Table as if it were an actual “router”. In fact, the Route Table is associated with the source network (the Subnet in the OCI example, the VLAN in this one), and that helps steer the packets across the VCN routing underlay. The dotted line in the diagram above is meant to show that association. (4) - (6) are pretty much the same as we saw in the OCI example, so we’ll skip their screenshots and concentrate on the SDDC steps.

This would be great if it worked - #FAIL!

All good so far, except what we’ve shown in the diagram won’t work. The issue is the OCI NAT Gateway! As we saw earlier, all we need to do is send traffic to it and the NAT gateway does most of the hard work. But, it does rely on seeing that traffic come from a recognizable OCI device, which it identifies by the OCI device interface’s Oracle Cloud IDentifier (OCID).

Unfortunately, SDDC VMs routing through the Tier-0 don’t have an OCID, but as we’ll see shortly, in order to send traffic into any device connected to a VLAN, OCI needs to loan that device an OCID. The T0 is one such device, and, Source NATing SDDC VMs to the Interface address of the T0 (with its loaned OCID), lets us work around the NAT Gateway limitation. So, all we need to do is ensure all traffic from the SDDC heading for the Internet gets SNAT’d behind the T0 as it leaves the SDDC. Let’s go and do that now.

Finding the NSX Source NAT Address

Connecting a VM to a network segment and watching its traffic reach the T0 isn’t particularly new, interesting or the point of this post, so let’s dive straight into the NATing stuff at step (3). We’ll need to figure out what IP address to NAT to, and where to put the NAT. The Tier-0 VIP is built by the OCVS Provisioning Service, so we shouldn’t need to check that it’s been built correctly, but we’ll take a look at it on the NSX-T and OCI sides so we can see how the bits join together. First, here’s the NSX-T VIP configuration:

We got here (using the NSX-T Policy interface, but it’s a very similar path in the ‘Manager’ view too) by selecting Networking > Tier-0 Gateways > Gateway “Tier-0” > HA VIP Configuration, and clicking on the “1” link. As you can see in the screenshot above, in the left-hand red oval (which will be easier to read if you click on the image for the full-size view), the NSX-T HA VIP for this interface (it’s actually the only external interface on the standard OCVS build) is 10.76.9.130.

As you can see in the right-hand red oval, this Tier-0 has two external interfaces configured as a single HA pair. The two interfaces are called NSX-Edge-Uplink-1 and NSX-Edge-Uplink-2. Confusingly, both of these interfaces connect to an OCI VLAN called “Uplink-1”, and, OCI also has a second (currently unused) VLAN called “Uplink-2”. Try not to confuse these interfaces with the similarly named VLANs!

Okay, dire confusion warnings aside, let’s take a look at the OCI view of the Uplink-1 VLAN:

OCI VLANs are Layer 2 constructs, so, from a Layer 3 IP addressing perspective, OCI doesn’t really need to care what goes on in there (a bit like the rest of the world and Las Vegas), except when it needs to send something into the VLAN (again, like Vegas…). To allow OCI to keep track of a Layer 3 IP address inside the VLAN, we assign an External Access (V)IP. This anchors the IP address of some interface of a device inside the VLAN to an OCID. The rest of OCI can then use the OCID to find that device interface even in an opaque Layer 2 VLAN. Here’s the “VLAN Details” view of the Uplink-1 VLAN:

As you can see (again, more easily if you click on the image), “nsx-edge-up1-vip” has the IP address of 10.76.9.130, just as we saw in the NSX view! Now, doubly sure of the IP address, we can set about building the NSX-T NAT rules.

Configuring the NSX NAT Rules

We’re most of the way there, just a couple of loose ends to tie up. In the OCI example, our traffic came from a native OCI VM whose interfaces already have OCIDs, so the NAT Gateway can easily track who sent what to where. This allows it to return responses to the right originator. Our native vSphere VMs on the other hand, don’t. Not only are they from address space two router/gateways deep inside the SDDC, but they pop out into OCI in a VLAN (like waking up in an Uber on I-15 and seeing the Hoover Dam).

The only IP address OCI knows about in that hot-mess, is the one from the VIP. So, in order to help the NAT Gateway find its way back to the source VM, we have NSX-T NAT the traffic on the Tier-0 as it leaves for OCI, and then OCI does the same as the traffic leaves for the Internet. This might seem a lot of effort compared to other Hyperscalers, but remember, the SDDC is right in the heart of OCI, so the upside is direct access to the native OCI services. Let’s revisit the flow-diagram, but this time, with some packet labels again:

In (7) we can see the packet has the vSphere VM’s private source address of 10.76.124.10 and the destination public address of 8.8.8.8. As the packet passes through the Tier-0 Gateway in (8), the Tier-0 swaps the original source address for its own 10.76.9.130. As packets then pass through the OCI NAT Gateway in (9), it swaps the Source Address (from the dot-130 address of the Tier-0) to that of its own Public IP address, 132.226.122.232. The Tier-0 and OCI NAT Gateways both keep track of which private source addresses are talking to which public addresses, so that they can reverse this process on returning packets.
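If it helps to see the two translations side by side, here’s a tiny Python sketch that follows one packet’s headers through both hops using the addresses from the walkthrough. The snat helper and the stage labels are made up for illustration, and only addresses are modelled (no ports or state).

```python
def snat(packet, translated):
    """Return a copy of the packet with its source address replaced."""
    return {**packet, "src": translated}


# The packet as it leaves the workload VM.
from_vm = {"src": "10.76.124.10", "dst": "8.8.8.8"}
# First translation: the Tier-0 hides the VM behind its HA VIP.
after_tier0 = snat(from_vm, "10.76.9.130")
# Second translation: the OCI NAT Gateway hides the VIP behind its public IP.
after_nat_gw = snat(after_tier0, "132.226.122.232")

stages = [
    ("leaving the VM", from_vm),
    ("after Tier-0 SNAT", after_tier0),
    ("after NAT Gateway SNAT", after_nat_gw),
]
for label, pkt in stages:
    print(f"{label:24} {pkt['src']:16} -> {pkt['dst']}")
```

The destination never changes on the way out; only the source is rewritten, once per NAT hop, and the replies have each rewrite undone in the opposite order.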

We know from the OCI example that the NAT Gateway takes care of that NAT for us, but we have to do some configuration work inside NSX-T to enable the Source NAT there. Let’s take a look at that config in the NSX Manager GUI. Here it is, in the Policy UI:

Let’s take a look at the NAT rule(s) we have configured on the NSX-T Tier-0. Choosing Networking > NAT, and then selecting our Tier-0 in the Gateway selector, we see the rules. We have four in this test environment, but in a production environment there could be more. The rule which we’re most interested in right now is the “SNAT of Last Resort” at the bottom of the list. What we want to do is make sure that traffic from our SDDC workloads, when heading for the Internet, is Source NAT’d behind our Tier-0 VIP address. The important parts of that rule are:

Name                | Action | Source         | Destination | Translated
SNAT of Last Resort | SNAT   | 10.76.124.0/24 | 0.0.0.0/0   | 10.76.9.130

  • The Name and Action fields are hopefully self-explanatory.
  • The Source in our lab example is the 10.76.124.0/24 subnet we have allocated for the workloads we’ll build inside the SDDC. In our example this is only small, as we don’t need much space in this particular lab. In practice, that will most likely require more sources. In an ideal world, we would allocate a single, large subnet to the SDDC which we could pre-arrange the SNAT for, and then consume parts of as we deploy SDDC subnets from within it. In a less ideal world, we would add non-contiguous subnets as we build each workload segment, and need to repeat this rule for each of the discrete source subnets we deploy.
  • The Destination is the catch-all “quad-zero” address which is the CIDR equivalent of “Any”.
  • The Translated field is the address we want to replace the Source with, in this case the T0 VIP Address.
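Since every workload segment has to fall inside a Source that some SNAT rule covers, a quick check like the Python sketch below can flag segments that would be missed. The uncovered_segments function and the example segments are hypothetical; only 10.76.124.0/24 comes from the lab in this post.

```python
import ipaddress


def uncovered_segments(snat_sources, segments):
    """Return the segments that are not inside any SNAT rule Source."""
    sources = [ipaddress.ip_network(s) for s in snat_sources]
    missing = []
    for seg in segments:
        net = ipaddress.ip_network(seg)
        # A segment is covered if it sits wholly inside one Source CIDR.
        if not any(net.subnet_of(src) for src in sources):
            missing.append(seg)
    return missing


# One SNAT Source from the lab, two made-up workload segments.
print(uncovered_segments(["10.76.124.0/24"],
                         ["10.76.124.0/26", "10.76.125.0/24"]))
```

In the “ideal world” case with one large Source, this list stays empty as segments are carved out; in the less ideal case, each entry it returns is a segment that needs its own copy of the rule.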

You can’t see the priorities of the rules without clicking the small “>” arrow next to each rule, but this rule should come last. That’s because, with just this rule in place, it isn’t only Internet-bound traffic that matches the quad-zero destination: all traffic leaving the SDDC will match it, and be Source NAT’d!

This might be what we want, but usually isn’t. Normally, we want to NAT Internet traffic, and Route on-net (the VCN, the rest of OCI and to all our on-prem locations) traffic. In order to do this, we need to tell NSX not to SNAT traffic to those destinations, and we need to do that before it gets ahead of itself and slaps down the Internet SNAT. This is where the additional rules in the NAT table screenshot come in, and in particular, their priorities.

In this lab, we have three “No_SNAT” rules, one for each address range in RFC 1918. This tells NSX not to SNAT traffic from our SDDC when it is heading towards private addresses in any of the RFC 1918 space, typically what we find in customer networks. It is of course possible to find a customer using non-1918 addresses, and if that were the case, we’d just add more No_SNAT rules with those destination addresses in, in addition to, or instead of, these three. To save squinting at the picture, here are all four rules in table form.

Name                      | Action  | Source         | Destination    | Translated
Don’t NAT 1918-172.16/12  | No SNAT | 10.76.124.0/24 | 172.16.0.0/12  | Any
Don’t NAT 1918-192.168/16 | No SNAT | 10.76.124.0/24 | 192.168.0.0/16 | Any
Don’t NAT 1918-10/8       | No SNAT | 10.76.124.0/24 | 10.0.0.0/8     | Any
SNAT of Last Resort       | SNAT    | 10.76.124.0/24 | 0.0.0.0/0      | 10.76.9.130

The only differences between these and the SNAT rule are the Action, of course, and the Any in the Translated field. The latter just means we’re not actually translating, so the original source address makes it through unchanged.
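Here’s a rough Python model of how those four rules behave when evaluated in priority order, first match wins. The priority numbers are invented for the example (the post doesn’t list them), and translate_source is a hypothetical helper, not anything NSX exposes.

```python
import ipaddress

# (priority, action, source CIDR, destination CIDR, translated address)
# Lower priority numbers are evaluated first; the numbers are hypothetical.
RULES = [
    (10, "NO_SNAT", "10.76.124.0/24", "172.16.0.0/12", None),
    (20, "NO_SNAT", "10.76.124.0/24", "192.168.0.0/16", None),
    (30, "NO_SNAT", "10.76.124.0/24", "10.0.0.0/8", None),
    (100, "SNAT", "10.76.124.0/24", "0.0.0.0/0", "10.76.9.130"),
]


def translate_source(src, dst, rules=RULES):
    """Return the source address the packet leaves the Tier-0 with."""
    s = ipaddress.ip_address(src)
    d = ipaddress.ip_address(dst)
    for _prio, action, source, dest, translated in sorted(rules, key=lambda r: r[0]):
        if s in ipaddress.ip_network(source) and d in ipaddress.ip_network(dest):
            # First matching rule decides: translate, or leave the source alone.
            return translated if action == "SNAT" else src
    return src  # no rule matched, so nothing is translated


print(translate_source("10.76.124.10", "8.8.8.8"))    # Internet-bound
print(translate_source("10.76.124.10", "10.0.0.5"))   # on-net, RFC 1918
```

Try giving the SNAT rule the lowest number instead and the on-net example gets translated too, which is the failure mode the rule ordering is there to prevent.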

If we use any OCI native services, we can take advantage of the Wizard in the SDDC Details page (below) to set up the Service Gateway.

If we gave OCI a workload subnet to build inside NSX when we deployed the SDDC, the NSX NAT table in the Manager UI will already have lots of No_SNAT rules in. These tell NSX not to SNAT when our SDDC addresses try to connect to the OCI public addresses for those native Services. This is important because, although those services have what look like public Internet addresses, traffic does not use the Internet to reach them, and OCI wants to see the original source addresses of the SDDC VMs so that it can return traffic to them. Oracle use public address space for the services to (try and) avoid clashes with customer addresses.

NSX NAT Rule Priorities in the Policy and Manager UIs

NAT rules in NSX-T have a numeric priority, with lower numbers having a higher priority (so being actioned first). As we’ve seen, having the SNAT of Last Resort… well… last, is key. If all our rules are created in the Policy UI, we just make sure that the priority of the Last Resort one is the highest number (and hence, lowest priority) of all the rules.

However, if we have rules created in the Manager UI (or via its API equivalent like the Service Gateway rules) they have separate priority numbers. NSX combines the two views by adding 1024 to the Priority value of the Policy UI rules. This allows any rule created in the Manager UI with a priority of less than 1024 to take precedence over the rules in the Policy UI. We can create our rules in either UI, but we should ensure that the SNAT of Last Resort comes last, or we will see unintended and unwanted consequences.
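The offset described above is easy to get wrong in your head, so here’s a short Python sketch that merges the two lists the way this post describes (Policy UI priority plus 1024, Manager UI priority as-is). The rule names and numbers are invented, and effective_order is a hypothetical helper rather than an NSX function.

```python
POLICY_OFFSET = 1024  # offset applied to Policy UI priorities, per the post


def effective_order(manager_rules, policy_rules):
    """Return rule names in the order they would be evaluated.

    Each input is a list of (name, priority) pairs from one UI.
    """
    combined = [(prio, name) for name, prio in manager_rules]
    combined += [(prio + POLICY_OFFSET, name) for name, prio in policy_rules]
    return [name for _prio, name in sorted(combined, key=lambda r: r[0])]


manager = [("Service Gateway No_SNAT", 100), ("Manager rule at 2000", 2000)]
policy = [("No SNAT 10/8", 10), ("SNAT of Last Resort", 100)]
for name in effective_order(manager, policy):
    print(name)
```

In this made-up example the Manager UI rule at priority 2000 lands after the SNAT of Last Resort (effective priority 1124), so it would never be reached by traffic the catch-all already matched. That’s the kind of surprise worth checking for in both views.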

More detail than that is probably beyond the scope of this post, but checking the NAT view in both Policy and Manager UI on the SDDC NSX Manager should help explain things.

The TL;DR Summary

To summarize or, if you were impatient and jumped here for the short version, here it is.

  • Make sure the Route Table associated with the Uplink-1 VLAN has a 0.0.0.0/0 Route Rule pointing to a NAT Gateway in your VCN.
  • Work out the IP address assigned to the NSX Tier-0 HA VIP.
  • At the bottom of the NSX NAT Rules, after everything else, add No_SNAT rules matching your OCVS SDDC addresses, with a destination of either your own private addresses, or, as a catch-all, the three reserved ranges from RFC1918.
  • At the very bottom of the NSX NAT Rules, even lower than the three No_SNAT rules above, add a SNAT rule matching your OCVS SDDC source addresses, with a destination of 0.0.0.0/0 and a Translated Address of the T0 VIP above.
  • Check the Network Security Group associated with your Uplink-1 VLAN to ensure it will allow your required Internet destinations (which could of course be “Any”).
  • Check the firewall rules within NSX allow your required Internet access too.

And you should be good to go. If you have any questions about that, it might be worth jumping back to the top and reading the rest of this post.
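For anyone who would rather script the NAT rules than click through the UI, here’s a hedged Python sketch that only builds and prints the rule definitions as JSON; it doesn’t talk to NSX at all. The field names (action, source_network, destination_network, translated_network, sequence_number) reflect our understanding of the NSX-T Policy API’s NAT rule object and should be treated as an assumption to check against the API reference for your NSX version. The sequence numbers and display names are invented.

```python
import json

SOURCE = "10.76.124.0/24"   # SDDC workload range from this post's lab
T0_VIP = "10.76.9.130"      # Tier-0 HA VIP from this post's lab
RFC1918 = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]


def build_rules():
    """Build No_SNAT rules for RFC 1918, then the catch-all SNAT rule last."""
    rules = []
    for i, dest in enumerate(RFC1918):
        rules.append({
            "display_name": f"No SNAT to {dest}",  # hypothetical name
            "action": "NO_SNAT",                   # assumed enum value
            "source_network": SOURCE,
            "destination_network": dest,
            "sequence_number": 10 + i * 10,        # hypothetical priority
            "enabled": True,
        })
    rules.append({
        "display_name": "SNAT of Last Resort",
        "action": "SNAT",
        "source_network": SOURCE,
        "destination_network": "0.0.0.0/0",
        "translated_network": T0_VIP,
        "sequence_number": 1000,                   # must be the highest number
        "enabled": True,
    })
    return rules


rules = build_rules()
for rule in rules:
    print(json.dumps(rule))
```

The one design point worth keeping, whatever tool you use, is that the catch-all SNAT rule is generated last and given the highest number so it can’t jump ahead of the No_SNAT rules.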

Conclusions

Over the last quarter of an hour or so, we saw how native OCI VMs can access the Internet through a NAT Gateway. We saw that the SDDC as a whole follows the same steps to reach the Internet. We also saw that to enable the SDDC’s workload VMs to use the NAT Gateway, we had to hide their real addresses behind the Tier-0 Gateway’s HA VIP address. As always, Routing and Security played their part, but once we got all the bits lined up, using the NAT Gateway from the SDDC was pretty straightforward, even if we had to be careful with the order of the NSX NAT rules.

That’s great, but what if, as well as having our VMs go out to the Internet, we need to offer services to the Internet, which means unsolicited inbound connections? How does that work? We’ll expand on what we learned here and cover that in the second post in the series. But until then, thanks for reading this far, and go get yourself a well-deserved drink! I’d maybe go for hot chocolate and a lie down in a dark room…

Closing Comments

I’d like to thank Jason McKenzie for his help with this post. The improved readability of some key sections is thanks to Jason’s feedback, whereas the mistakes and typos, as always, are mine…

If you want to grab a copy of Jason’s awesome Getting Started with OCVS Ebook, head over to this post on the VMware Cloud Blog!

If you have any questions about the content of this post, or about Oracle Cloud VMware Solution, please drop them in the comments below.

Update: Dec 7th 2021
Cross-posted, with permission, to the VMware Communities blog here - (although I think it’s more colo(u)rful here on NT.B).
