Multi-Tenant WAN Access to Shared Resources: Part 1 - The Problem

A number of Cloud Services require the transfer of large volumes of data. In the Cloud Provider world, that could be uploading VMs in the form of OVA files, ISO disk images, or sending backup data to a DRaaS service. Customers can connect their on-prem SDDCs to the Cloud Provider with only an Internet connection, and make use of these great services. How easy is that! In locations where the Internet is readily available, fast and reliable, this is great. But what about locations where that’s not the case? Well, that’s where the trusty WAN steps in. Using a Communication Provider’s services, customers can get direct network links to their Cloud Provider’s datacenters which, while more costly, may offer the speed and reliability which local Internet services lack.

Excellent, problem solved! We’ll just connect our services to our customers’ WANs and go back to watching Netflix, right? Hang on, surely it can’t be that simple? Of course not. For one thing, if the Internet is poor, how will we watch Netflix????

In this post, we’ll explore the nature of the problem in connecting a service designed to face the Internet to the Wide Area Networks of multiple customers. In the second post in the series we’ll explore a possible solution. Before we start, let’s clear up a couple of points. First, I used “WAN” in a fairly general way. For the purposes of this post it doesn’t matter if the WAN is an L3 MPLS/IP-VPN, an L2 Ethernet Private Line or even a legacy Frame Relay or X.25 network. Actually, a Frame Relay or X.25 network may present its own challenges, but I digress. Second, by “direct” I meant that a customer’s on-prem network is extended into a Cloud Provider’s facility in such a way that the Provider’s DC looks like a site on the customer’s WAN. The fact that, short of dark fiber/fibre between locations, the network links are unlikely to actually be direct isn’t the point in this context.

Okay, pedantic explanations aside, on with the post…

If you don’t need all the background to the issue, you can skip this long-form introduction and jump straight to the solution in Part 2.

Introduction - The good, the bad and the ugly… complex

From a security point of view, services such as portals, API gateways etc. do have a tough time on the Internet. But what’s good about the Internet is that everybody on it has to use unique IP addresses. Okay, you can spoof a source address as part of a DoS attack, but to use the Internet you have to play by the rules and use an IP address (or addresses) that nobody else is using. If you’re reading this post, I’m guessing you know that, other than a few typically large and typically long-established companies, most organizations don’t use public Internet addresses inside their networks. Instead, they all use the same set of addresses reserved for use inside private networks and detailed in RFC 1918.
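To make "the same set of addresses" concrete, here is a minimal sketch that checks whether an address sits inside one of the three RFC 1918 ranges. It uses Python's standard `ipaddress` module; the helper name `is_rfc1918` and the sample addresses are just illustrations, not anything from a real deployment.

```python
import ipaddress

# The three private ranges reserved by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


# Illustrative addresses only.
print(is_rfc1918("10.100.100.1"))  # inside 10.0.0.0/8
print(is_rfc1918("172.32.0.1"))    # just outside 172.16.0.0/12
```

The second address is the kind that is easy to mistake for a private one: 172.16.0.0/12 stops at 172.31.255.255, so 172.32.0.1 is a public address.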

As a slight aside, not all organizations do. Some, often unknowingly, use public Internet addresses which they think are private addresses. That’s another story, and one which we can’t fix in a blog post… sadly…

And there, in a nutshell, is the problem. If you place your service on the Internet and get a connection from a user or another computer making an API call, it will come from a unique public address which you can then reply to without difficulty. But let’s say we also want to connect that service, maybe some kind of portal, to a customer over their private network too. This presents a couple of problems, and we’ll look at them from the perspective of the Customer and then the Service Provider.

The Customer is always right…

As the customer, I want to connect over my WAN. That means I need to connect to an IP address that works on my network, even though the service lives in my Provider’s network. If I use its public Internet address, my network will send the connection, well… over the Internet, and that kind of defeats the object. Here’s what I want. Don’t worry, I’m sure this diagram will fill up as we go…

I might be able to use the same private address that the provider uses inside their network, but there is a good chance they might be using addresses that are already in use on my network, so that might not always work… But let’s say the address doesn’t clash. Now I need to work with the Provider to teach my WAN that those particular addresses need to be routed to their network. We call this injecting or learning routes, and that means I’m going to need access to some Networking skills (and a good Project Manager) to get things aligned, just so. In the diagram you can see that I use 10.1.x.x, 10.2.x.x and 10.3.x.x in my network and the Provider uses 10.4.x.x. So, all I have to do is teach my network where 10.4.x.x lives and we’re in business!
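Before anyone injects a route, the clash check described above can be sketched in a few lines. This is only an illustration: it assumes each "10.n.x.x" range is a /16 prefix (the post doesn't say what masks are in use), and `find_clashes` is a made-up helper name.

```python
import ipaddress

# Assumption: "10.1.x.x" is read as the /16 prefix 10.1.0.0/16, and so on.
customer_networks = [
    ipaddress.ip_network("10.1.0.0/16"),
    ipaddress.ip_network("10.2.0.0/16"),
    ipaddress.ip_network("10.3.0.0/16"),
]


def find_clashes(local_networks, remote_network):
    """Return the local prefixes that overlap the prefix we want to learn."""
    return [net for net in local_networks if net.overlaps(remote_network)]


# The Provider's 10.4.x.x range is free on my network, so the list is empty.
print(find_clashes(customer_networks, ipaddress.ip_network("10.4.0.0/16")))

# Had the Provider used 10.1.x.x instead, the clash would show up here.
print(find_clashes(customer_networks, ipaddress.ip_network("10.1.0.0/16")))
```

An empty list means the Provider's range can be routed as-is; anything else means the two networks already disagree about where those addresses live.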

But, even if I can use the provider’s real address, the Provider, as we’ll see next, might still have some challenges with that.

The Service Provider is bigger, so should win in a brawl…

As the Provider, we will have deployed the service before trying to connect it to any customers, and as Public addresses are in short supply, it’s likely that the service will have been configured with those same RFC1918 addresses that we mentioned earlier, let’s say ‘10.1.x.x’. So when we try and offer the service over a customer’s network, they just have to connect to the addresses we used.

To do this, those networking skills (and that good Project Manager) come into play to add routes to the customer’s WAN to tell it to send traffic destined for ‘10.1.x.x’ to a WAN router in our Provider datacenter, where it breaks out into the network and heads for the service portal, or API endpoint or whatever.

The first problem comes (as we noted in the last section) if the customer is already using those same ‘10.1.x.x’ addresses somewhere in their network, as you can see in the image below. We’ll look at solutions in the next post, but I guess for now it just serves us right for not using something more obscure like ‘10.113.x.x’! In simple terms, when this happens each network will route traffic to the nearest ‘10.1.x.x’ addresses, as they have no way to know which ‘10.1.x.x’ addresses we meant them to route to. When that happens, the Provider’s service just seems like it’s not working, when in actual fact the issue is that the customer is sending their requests to their ‘10.1.x.x’ addresses and not ours.

Let’s say though, that for now, we got lucky and the customer isn’t using our addresses anywhere (we’ll use the ‘10.4.x.x’ addresses from the first example), and can get to work using their networking skills (and that good PM) to route their traffic to our service. When they do connect to our service, it would only be polite for us to reply to them, right? So we just send our reply back to the address they connected from, let’s say Alice in Accounts at “Customer A”. Alice’s computer has an IP address of ‘10.100.100.1’ so we try and send our reply back to that address. But, unfortunately, the computer used by Peter in Personnel (HR doesn’t start with a “P”) in the Provider also has an address of ‘10.100.100.1’, so our network (the purple “router” in the diagram below) sends the reply we wanted to go back to Alice to Peter instead.

So problem two is that if the customer connects to the provider from any address on their network which is in use anywhere in our Provider network, the service can’t reply, so the whole thing fails. We’d better add that to the list of things to fix!
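Here is a rough sketch of why the reply ends up with Peter. It models the Provider's router as a simple longest-prefix-match lookup, which is how IP routers normally pick a route. The prefixes, masks and next-hop names below are invented for illustration; the post only tells us the individual addresses involved.

```python
import ipaddress

# Hypothetical Provider routing table: (prefix, next hop).
# The /24 and /16 masks and the next-hop names are assumptions.
routes = [
    (ipaddress.ip_network("10.4.0.0/16"), "service-platform"),
    (ipaddress.ip_network("10.100.100.0/24"), "provider-office-lan"),  # Peter lives here
    (ipaddress.ip_network("10.100.0.0/16"), "customer-a-router"),      # Alice lives here
]


def lookup(destination, table):
    """Return the next hop of the most specific route matching the destination."""
    ip = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table if ip in net]
    if not matches:
        return None
    # Longest prefix wins: the route with the largest prefix length.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# The reply meant for Alice follows the more specific local route to Peter's LAN.
print(lookup("10.100.100.1", routes))  # provider-office-lan
```

The router has no notion of "Alice's 10.100.100.1" versus "Peter's 10.100.100.1". It sees one destination address and picks one route, so whichever prefix wins the lookup gets every packet for that address.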

Again, for now, let’s pretend we don’t use Alice’s ‘10.100.100.1’ address in our Provider network. We then need to use our networking skills (and another good PM) to make sure we can route traffic from our service platform to Alice’s address, back to the “Customer A” router so it can find its way back to Alice. Here’s what that looks like in a picture, where our Provider network knows that 10.1.x.x, 10.2.x.x, 10.3.x.x and 10.100.x.x all live behind the blue Customer-A router in our datacenter.

So, as long as our customer doesn’t use an address we are using, it’s okay. Woohoo!

Now, if the service is successful, we’ll hopefully have more than one customer (which in the Provider world is known as “a good thing”), so let’s introduce Bill from Billing (where else!) in “Customer B”. He uses his shiny new computer to connect to the Provider’s portal to check something and (miraculously) his company doesn’t use, so can also route to, those ‘10.4.x.x’ addresses that the Provider uses. Great, obviously we want to reply to Bill (as we’re nothing if not polite) so we carefully craft our reply and send it off to his ‘10.100.100.1’ address. Hang on, that address seems familiar, right? It’s the same one as Alice was using earlier. Oh no! Things were going well until this point. Now, how will our awesome service know which ‘10.100.100.1’ address it should reply to?

Examining this problem more closely, we can extend our earlier problem two statement to “if the customer connects to the provider from any address on their network which is in use anywhere in our Provider network, or anywhere in any of our (connected) customers’ networks, the service can’t reply, so the whole thing fails.”
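The extended problem statement boils down to a pairwise overlap check across everyone connected to the service. A minimal sketch follows, assuming /16 prefixes throughout and assuming Customer B uses 10.100.x.x (we only know Bill's single address); the owner names and `find_overlaps` are illustrative.

```python
import ipaddress
from itertools import combinations

# Assumed prefixes: the post gives "10.n.x.x" ranges, read here as /16s.
# Customer B's prefix is a guess based on Bill's 10.100.100.1 address.
networks_by_owner = {
    "provider": ["10.4.0.0/16"],
    "customer-a": ["10.1.0.0/16", "10.2.0.0/16", "10.3.0.0/16", "10.100.0.0/16"],
    "customer-b": ["10.100.0.0/16"],
}


def find_overlaps(networks_by_owner):
    """Return (owner, prefix, owner, prefix) for every cross-owner overlap."""
    parsed = [
        (owner, ipaddress.ip_network(prefix))
        for owner, prefixes in networks_by_owner.items()
        for prefix in prefixes
    ]
    overlaps = []
    for (owner_a, net_a), (owner_b, net_b) in combinations(parsed, 2):
        if owner_a != owner_b and net_a.overlaps(net_b):
            overlaps.append((owner_a, str(net_a), owner_b, str(net_b)))
    return overlaps


for overlap in find_overlaps(networks_by_owner):
    print(overlap)
```

With these sample prefixes the only hit is the 10.100.x.x range shared by Customer A and Customer B, which is exactly the Alice and Bill situation. Every new customer adds more pairs to check, so the odds of a clean result shrink as the service grows.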

Summary

Connecting a Provider service, portal, API gateway etc. to the Internet is dangerous, but fairly straightforward. Connecting that same service to one or more Customer Wide Area Networks is not as dangerous because we know who our customers are and have a business relationship with them. That means they’re less likely (unless their network has been attacked or compromised) to attack our network. But the simple fact that most of our customers’ networks and, potentially, our own network all use the same bunch of IP addresses makes the picture below much more difficult to actually build than this (heavily simplified) drawing suggests.

Okay, with the problems faced by both the Customer/Tenant and the Provider highlighted, in Part 2 of this series we’ll look at the building blocks of a possible solution, and how they address each of the issues we identified in this post. See you there!
