Azure Site to Site VPN Fails to Transmit Data–Tales of NAT Traversal

Yuri Diogenes and I have recently been working on a project that includes connecting an on-premises network to a Windows Azure Virtual Network. In order to do this, you need to establish a site to site VPN connection between your corporate network and the Azure Virtual Network. To do this in a supported fashion, you need to use an approved VPN device and have a public IP address. But you know how it is when you’re trying to test things – you don’t always follow support statements and you try a few workarounds so that you can “kick the tires” of the new product and technology.

And so that’s what we tried to do. Yuri used a TMG firewall located behind a NAT device on the on-premises network to connect to our Azure Virtual Network. While TMG is not a supported network device, Richard Hicks demonstrated that you can create a site to site connection to an Azure Virtual Network using TMG. You can find that article HERE. Again, this configuration is not supported in the customer preview and it won’t be supported in the future. But, given the fact that we don’t own a supported device, we work with what we have.

In this article Yuri will show that the site to site VPN is indeed established, and he includes log file entries to highlight important issues. However, it takes more than just a tunnel that is up to pass data. Take a read and see what you think! Thanks! –Tom.


Recently I was working on a document where I had to build a lab in order to validate a series of assumptions. This lab required cross-premises connectivity with Windows Azure, in other words: allowing resources that were located on-premises to access virtual machines located on Windows Azure and vice-versa.

For testing purposes (since it is not supported by Windows Azure) I used Forefront TMG as my VPN gateway. This was easily accomplished by following this great article written by my friend Richard Hicks. All good: the site to site VPN was established and my Windows Azure portal was showing this result:

image

The gateway connectivity was established as shown above; however, I noticed some odd behavior: a few KB of data in and nothing out. At first glance I didn’t realize that this could be a problem, but once I started to test the resources (a simple ping from a VM located on Azure to ProdDC1, located on-premises) I received a timeout. Odd... weird... what’s going on? Luckily I was using Windows Server 2008 SP2 on TMG and I was able to enable IKE logging using a procedure that I documented a long time ago in this post. The result is shown below (consider XXX.XXX.XXX.XXX the valid IP of my router, which was doing NAT in front of my TMG):

[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                0|XXX.XXX.XXX.XXX|
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                0|XXX.XXX.XXX.XXX|Received packet
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                0|XXX.XXX.XXX.XXX|Local Address: 192.168.1.160.4500 Protocol 0
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                0|XXX.XXX.XXX.XXX|Peer Address: XXX.XXX.XXX.XXX.4500 Protocol 0
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|iCookie 5f4f98ebb5fc8fb5 rCookie 4fd35b13948ab70b
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Exchange type: IKE Quick Mode Length 268 NextPayload HASH Flags 1 Messid 0x00000031
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|mmSa: 0x00000000029BB8B0
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Create QMSA: qmSA 0000000004050150 messId 31
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Processing QM.  MM 00000000029BB8B0 QM 0000000004050150
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Process Payload HASH, SA 00000000029BB8B0 QM 0000000004050150
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Process Payload ID, SA 00000000029BB8B0 QM 0000000004050150
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Process Payload ID, SA 00000000029BB8B0 QM 0000000004050150
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Process Payload SA, SA 00000000029BB8B0 QM 0000000004050150
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|QM propNum 1, transformNum 0, peerSpi 2308443503
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|QM transNum 1
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|PROTO: ESP Algo 12
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|IPSEC_ENCAPSULATION_MODE: 3
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|IPSEC_KEY_LENGTH: 128
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|IPSEC_HMAC_ALG: 2
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|IPSEC_LIFE_TYPE: 1
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|IPSEC_LIFE_DUR: 3600
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|IPSEC_LIFE_TYPE: 2
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|IPSEC_LIFE_DUR: 102400000
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|QM propNum 2, transformNum 0, peerSpi 2308443503
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|QM transNum 1
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|PROTO: ESP Algo 3
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|IPSEC_ENCAPSULATION_MODE: 3
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|IPSEC_HMAC_ALG: 2
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|IPSEC_LIFE_TYPE: 1
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|IPSEC_LIFE_DUR: 3600
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|IPSEC_LIFE_TYPE: 2
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|IPSEC_LIFE_DUR: 102400000
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|IsRecvPolicyTunnelPolicy: TRUE
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Looking up QM policy for IKE
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|QM localAddr : 10.0.0.0.0 Mask 255.255.255.0 Protocol 0
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|QM peerAddr : 172.16.0.0.0 Mask 255.255.0.0 Protocol 0
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Policy
GUID: {b476013b-cc93-4a45-86de-3649e39c5ec0}
LUID: 0x8000000000000029
Name: ISA VPN S2S tunnel to network Fabrikam Cloud
Description: (null)
Flags: 0x00000000
Provider: <unspecified>
Provider data:
Type: IKE Quick Mode Tunnel
Proposals: 1
-- 0 --
  Lifetime:
    Seconds: 3600
    Kilobytes: 102400000
    Packets: 2147483647
  PFS group: None
  SA transforms: 1
  -- 0 --
    Type: ESP-Auth & Cipher
      Auth transform:
        Type: SHA1
        Config: HMAC-SHA1-96
        Crypto module: <unspecified>
      Cipher transform:
        Type: AES-128
        Config: CBC-AES-128
        Crypto module: <unspecified>
Flags: 0x00000000
Local tunnelEndpoint: 192.168.1.160
Remote tunnelEndpoint: XXX.XXX.XXX.XXX
Normal idle timeout (seconds): 300
Idle timeout in case of failover (seconds): 60

[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Accepted proposal.  Prop: 1 trans: 1
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Created new QM SA context              217
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|GetSpi
SA context              217
Local address: 192.168.1.160
Remote address: XXX.XXX.XXX.XXX
Mode: Tunnel Mode
Filter ID: 0x8000000000000029
Remote Port: 0x0000
UDP Encapsulation:
  Local port: 4500
  Remote port: 4500

[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Got SPI from BFE 1296515672
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Local address : 10.0.0.0.0 Mask 255.255.255.0 Protocol 0
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Peer address : 172.16.0.0.0 Mask 255.255.0.0 Protocol 0
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Process Payload NONCE, SA 00000000029BB8B0 QM 0000000004050150
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Construct IKEHeader
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Construct HASH
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Construct SA
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Construct NONCE
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Construct ID
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Construct ID
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Sending Packet
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|iCookie 5f4f98ebb5fc8fb5 rCookie 4fd35b13948ab70b
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Exchange type: IKE Quick Mode Length 220 NextPayload HASH Flags 3 Messid 0x00000031
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Local Address: 192.168.1.160.4500 Protocol 0
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Peer Address: XXX.XXX.XXX.XXX.4500 Protocol 0
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|IF-Index: 10
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Created new TimerContext 0000000004054840, type 6
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                0|XXX.XXX.XXX.XXX|
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                0|XXX.XXX.XXX.XXX|Received packet
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                0|XXX.XXX.XXX.XXX|Local Address: 192.168.1.160.4500 Protocol 0
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                0|XXX.XXX.XXX.XXX|Peer Address: XXX.XXX.XXX.XXX.4500 Protocol 0
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|iCookie 5f4f98ebb5fc8fb5 rCookie 4fd35b13948ab70b
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Exchange type: IKE Quick Mode Length 60 NextPayload HASH Flags 3 Messid 0x00000031
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|mmSa: 0x00000000029BB8B0
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Processing QM.  MM 00000000029BB8B0 QM 0000000004050150
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Process Payload HASH, SA 00000000029BB8B0 QM 0000000004050150
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Construct IKEHeader
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Construct HASH
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Construct CONNECTED NOTIFY
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Construct NOTIFY
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Adding inbound SA. mmSa 00000000029BB8B0 qmSa 0000000004050150
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Local Address : 10.0.0.0.0 Mask 255.255.255.0 Protocol 0
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|Peer Address : 172.16.0.0.0 Mask 255.255.0.0 Protocol 0
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|AddImpersonateHash 00000000040522F0 entryCount 2 isImpersonate 0
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|SA context              217
[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                a|XXX.XXX.XXX.XXX|SA bundle
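
A trace this long is easier to read once the endpoint lines are pulled out. The following is a minimal Python sketch and was not part of the original troubleshooting. The regular expressions and the function name `endpoints` are assumptions based only on the line layout visible in the excerpt above. It collects the "Local Address" and "Peer Address" entries so that the private local address and UDP port 4500 (the NAT-T encapsulation port) stand out:

```python
import re

# Assumed layout of one trace line, inferred from the excerpt above:
#   [cpu]pid.tid::timestamp [ikeext] <context>|<peer>|<message>
TRACE_LINE = re.compile(
    r"^\[\d+\]\S+\s+\[ikeext\]\s+(?P<ctx>\w+)\|(?P<peer>[^|]*)\|(?P<msg>.*)$"
)

# "Local Address: 192.168.1.160.4500 Protocol 0" -> side, IP, port.
# The X characters allow for the redacted peer address. Traffic selector
# lines ("Local Address : 10.0.0.0.0 Mask ...") have a space before the
# colon and are deliberately not matched.
ENDPOINT = re.compile(
    r"^(?P<side>Local|Peer) Address: (?P<ip>(?:[\dX]+\.){3}[\dX]+)\.(?P<port>\d+)\s"
)


def endpoints(lines):
    """Return the set of (side, ip, port) tuples found in the trace lines."""
    found = set()
    for line in lines:
        trace = TRACE_LINE.match(line.strip())
        if trace is None:
            continue  # policy dump and SA context lines carry no trace prefix
        endpoint = ENDPOINT.match(trace.group("msg"))
        if endpoint:
            found.add((endpoint["side"], endpoint["ip"], int(endpoint["port"])))
    return found


# Two lines copied from the trace above.
sample = [
    "[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                0|XXX.XXX.XXX.XXX|Local Address: 192.168.1.160.4500 Protocol 0",
    "[0]0100.0654::00/00/0000-00:00:00.000 [ikeext]                0|XXX.XXX.XXX.XXX|Peer Address: XXX.XXX.XXX.XXX.4500 Protocol 0",
]

for side, ip, port in sorted(endpoints(sample)):
    print(side, ip, port)
```

Both sides report port 4500 rather than 500, which is consistent with the "UDP Encapsulation" section of the SA context in the trace.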

As you can see, all required parameters were correct: no errors and no problems during the negotiation. After searching the web for a similar problem I found this thread, which lists the following requirements (as per Steve Espinosa, from Microsoft Texas) for a network device to work with Windows Azure Virtual Network:

  • VPN device must have a public facing IPv4 address
  • VPN device must support IKEv1
    • Diffie-Hellman in "Group 2" mode
    • Perfect Forward Secrecy = Disabled
  • VPN device must be able to establish IPsec Security Associations in Tunnel
    mode
  • VPN device must be configurable for an MSS of 1350 for the tunnel
  • VPN device must support NAT-T
  • VPN device must support these encryption protocols:
    • AES 128-bit encryption function
    • SHA-1 hashing function
  • VPN device must fragment packets before encapsulating with the VPN
    headers
  • VPN device must support a 50 character pre-shared key. While a shorter or
    longer key can be programmatically created, this functionality is not currently
    exposed in the Windows Azure Portal.
  • For IKE phase 1 negotiation, set validity to 28800 seconds.
  • For IKE phase 2 negotiation, set SA lifetime to 3600 seconds or 102400000 kb (~100GB), whichever comes first

The reason why I highlighted the first item (a public facing IPv4 address) is that it is the one requirement I didn’t meet. Everything else was correct. Lesson learned: your VPN device MUST have a public facing IPv4 address, otherwise the site to site VPN connection won’t work (although you might think it is working if you just look at the Azure Portal).
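
The comparison between the trace and the requirements list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `GatewayConfig` and `failed_requirements` are hypothetical names, the values are copied from the Quick Mode policy in the trace, and requirements that the excerpt does not show (Diffie-Hellman group, phase 1 lifetime, MSS, fragmentation, pre-shared key length) are left out rather than guessed.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class GatewayConfig:
    """Hypothetical container for the on-premises gateway settings."""
    tunnel_endpoint: str        # "Local tunnelEndpoint" in the trace
    pfs_enabled: bool           # "PFS group" in the trace
    cipher: str                 # "Cipher transform" type
    auth: str                   # "Auth transform" type
    nat_t: bool                 # UDP encapsulation on port 4500
    qm_lifetime_seconds: int
    qm_lifetime_kilobytes: int


def failed_requirements(cfg):
    """Return the names of the listed requirements that cfg does not meet."""
    address = ipaddress.ip_address(cfg.tunnel_endpoint)
    checks = {
        # is_global is False for RFC 1918 ranges such as 192.168.0.0/16.
        "public facing IPv4 address": address.version == 4 and address.is_global,
        "Perfect Forward Secrecy disabled": not cfg.pfs_enabled,
        "AES 128-bit encryption": cfg.cipher == "AES-128",
        "SHA-1 hashing": cfg.auth == "SHA1",
        "NAT-T support": cfg.nat_t,
        "phase 2 lifetime of 3600 seconds": cfg.qm_lifetime_seconds == 3600,
        "phase 2 lifetime of 102400000 KB": cfg.qm_lifetime_kilobytes == 102400000,
    }
    return [name for name, ok in checks.items() if not ok]


# Values copied from the trace; nothing here is measured or new.
tmg_lab = GatewayConfig(
    tunnel_endpoint="192.168.1.160",
    pfs_enabled=False,
    cipher="AES-128",
    auth="SHA1",
    nat_t=True,
    qm_lifetime_seconds=3600,
    qm_lifetime_kilobytes=102400000,
)

print(failed_requirements(tmg_lab))
```

With these inputs the only entry reported is the public facing IPv4 address, which matches what the negotiation log could not reveal on its own.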

Yuri Diogenes
Senior Technical Writer
Server and Cloud Division Solutions Team


There you have it! The problem was related to the fact that we were trying to put the on premises VPN gateway behind a NAT device. Apparently you can’t do that – which is interesting because one of the requirements is that the on premises VPN gateway needs to support NAT traversal. That is required because the Azure Virtual Network is behind a NAT device. But apparently, the NAT-T support is not bidirectional.
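
As background on what NAT-T support involves: in IKEv1, RFC 3947 describes NAT detection in which each peer sends a hash of the two cookies plus the IP address and port it believes it is using. The receiver recomputes that hash from the address and port it actually sees on the packet, and a mismatch means a NAT sits in the path. The Python sketch below illustrates only that comparison. It assumes SHA-1 as the negotiated hash and uses 203.0.113.10 as a stand-in for the redacted router address; neither value comes from the trace, and the function names are hypothetical. The cookies are the ones shown in the log.

```python
import hashlib
import ipaddress
import struct


def nat_d_hash(i_cookie, r_cookie, ip, port):
    """NAT-D payload per RFC 3947: HASH(CKY-I | CKY-R | IP | Port).

    SHA-1 is an assumption here; the real exchange uses whichever hash
    was negotiated for the main mode SA.
    """
    packed_ip = ipaddress.IPv4Address(ip).packed   # 4 bytes, network order
    packed_port = struct.pack("!H", port)          # 2 bytes, network order
    return hashlib.sha1(i_cookie + r_cookie + packed_ip + packed_port).digest()


def nat_detected(i_cookie, r_cookie, claimed, observed):
    """Compare the sender's own view of its address with the receiver's view."""
    return nat_d_hash(i_cookie, r_cookie, *claimed) != nat_d_hash(
        i_cookie, r_cookie, *observed
    )


# Cookies taken from the trace above.
I_COOKIE = bytes.fromhex("5f4f98ebb5fc8fb5")
R_COOKIE = bytes.fromhex("4fd35b13948ab70b")

# What the TMG server believes its address is, versus what the remote
# gateway would see after the router rewrites the source address.
# 203.0.113.10 is a documentation address standing in for the redacted one.
claimed_by_tmg = ("192.168.1.160", 500)
seen_by_peer = ("203.0.113.10", 500)

print(nat_detected(I_COOKIE, R_COOKIE, claimed_by_tmg, seen_by_peer))
```

Once a NAT is detected, both peers move the exchange to UDP port 4500, which is what the "UDP Encapsulation" section of the trace shows. The detection itself is symmetric, so this sketch does not explain why traffic failed to flow when the on-premises side was the one behind the NAT.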

Question: Will this be an issue for you? Will it affect your ability to test the site to site connection between your on premises testbed and the Azure Virtual Network? Or are you awash with public IP addresses that you can assign for testing? Let us know – we’d like to get a clue for what the demand might be for bidirectional NAT-T.

HTH,

Tom

Tom Shinder
tomsh@microsoft.com
Principal Knowledge Engineer, SCD iX Solutions Group
Follow me on Twitter: http://twitter.com/tshinder
Facebook: http://www.facebook.com/tshinder



Comments
  • What is the packet overhead like with TMG site-to-site VPNs?  Excessive overhead can cause some serious problems that are intermittent and difficult to troubleshoot.  Many applications have varying packet sizes, and when some of those packets don't get through because the maximum MTU is too small, it's hard to figure out what went wrong.

    This is why I've always stuck with generic IPSEC tunnels which offer the lowest packet overhead.

  • Not having support for a VPN "device" behind NAT does make it a bit more challenging for testing, but it's not insurmountable.  Supporting NAT would make it much easier to build several labs by not requiring multiple public IPs.  That said, I think most environments already have a public facing device that is capable of IPsec site-to-site VPN and the problem is more around the support statement of what devices can be used.

    As you (or Richard) pointed out, other products like TMG are capable of establishing the connection, but without official support, it's a hard sell to replace a corporate firewall in order to stand up a proof of concept connection to Azure.

  • Hi Shannon,

    thanks for the feedback! So from your experience, there's no problem with requiring a public IP address for the VPN gateway. Do you think it will make it difficult for those who are interested in a POC for early testing, or is it trivial to get this test bed set up with the networking groups?

    thanks!

    Tom

  • In most cases it is not trivial for someone to get access to a public IP in order to stand up a POC, so it does make it more difficult for would-be users.  If it were possible to support bi-directional NAT-T I think it would help drive more testing and adoption.

    Speaking of NAT-T, I was able to use Windows Server 2012 RRAS to establish the Site-to-Azure VPN by flipping a Registry key that enables NAT-T.  I still needed a public IP of course, but I arranged for that long ago in order to do DirectAccess testing.  Lucky me!

    blog.concurrency.com/.../site-to-azure-vpn-using-windows-server-2012-rras

  • Hi Shannon,

    Thanks! Let me take a look at the blog post!

    Tom

  • Hi, is there a chance to change the MTU size of Azure to accept more than 1350 (let's say 1500, including the ESP header)?