“VMs Are Unable to Connect in the NSX Environment” is an article that walks through a simple workflow for debugging a VM connectivity issue.
The scenario considers two VMs (VM A and VM B) attached to the same NSX segment but running on different ESXi hosts (different host transport nodes). The NSX segment is named “NSX Web Segment”.
The following picture shows a workflow that can be used to troubleshoot a communication failure between the two VMs:
Step 1: Are the virtual machines configured correctly?
A good starting point is the guest OS network configuration. Log in to the guest OS of the VM and check the IP address, netmask, default gateway, and configured DNS servers. For example:
Additionally, check the assigned port group to confirm that the VM is attached to the correct NSX segment:
Tip: Check those details on both VMs (the source and the destination VM).
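Because both VMs sit on the same segment, their IP addresses are normally expected to fall in the same subnet. The following is a minimal sketch of that check in POSIX shell, assuming the segment carries a single IPv4 subnet. The function names (ip_to_int, same_subnet) and the addresses are hypothetical and only serve as an illustration; replace them with the values read from each guest OS.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address (or netmask) to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit status 0) when both addresses are in the same network
# for the given netmask.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

# Hypothetical addresses for VM A and VM B.
if same_subnet 172.16.10.11 172.16.10.12 255.255.255.0; then
    echo "same subnet"
else
    echo "different subnets: check IP and netmask on both VMs"
fi
```

If the check fails, a wrong IP address or netmask on one of the VMs is a likely cause, and it is worth correcting before moving on to the NSX side.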
Step 2: Is the segment configured correctly?
Each NSX segment has a set of segment profiles. Are any custom segment profiles applied to your segment?
This needs to be checked, because some profiles, such as “SpoofGuard” or “Segment Security”, can block VM traffic (SpoofGuard, for example, drops traffic whose source address does not match the approved bindings). If you use custom profiles, it is a good idea to review each one.
Additionally, open the NSX Manager UI and check that the NSX segment is “up” (an NSX segment can be administratively disabled):
Tip: If you are using a custom segment profile, temporarily switch to the default segment profile, just for troubleshooting purposes.
Step 3: Can the source and destination TEPs communicate?
Each host transport node has dedicated vmnics for the NSX environment. It is worth checking that each of these interfaces is “up”. Check this on both ESXi hosts (the source and the destination ESXi host).
The “net-dvs” command shows which uplink interfaces are used by NSX, and the esxcli command shows the status of each vmnic interface, as in the following example:
net-dvs | grep -i nsxUplink
esxcli network nic list
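On a host with many vmnics, the list can be filtered down to the interfaces whose link is not up. The following is a minimal sketch, assuming the output has two header lines (column titles and a row of dashes) and that the Link Status value is the fifth whitespace-separated field of each data row; check this against the header on your own host before relying on it. The function name down_nics is hypothetical.

```shell
#!/bin/sh
# Print the name of every NIC whose Link Status column is not "Up".
# Assumption: two header lines, then one row per NIC with
# Name, PCI Device, Driver, Admin Status, Link Status, ... as fields 1-5.
down_nics() {
    awk 'NR > 2 && $5 != "Up" { print $1 }'
}

# Usage on an ESXi host:
#   esxcli network nic list | down_nics
```

An empty result means every listed vmnic reports its link as up. Any name that is printed should be compared with the uplinks reported by net-dvs.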
The “esxtop” command can be used to check which vmnic is the active interface for NSX traffic. After starting esxtop, press “n” to switch to the network view. Look in the “USED-BY” column for the interfaces “vmk10” and “vdr-vdrPort” (both are used by NSX). In this case, for instance, both interfaces are using the vmnic4 physical interface:
If the vmnics used by NSX are “up”, check that the TEP vmkernel interface (vmk10) has an IP address in your TEP network. Then ping the TEP of the other ESXi host from this vmkernel interface, using the configured MTU size. The following command lists the vmkernel interfaces with their IP addresses and MTU:
esxcfg-vmknic -l
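To ping another TEP interface from the source ESXi host, the following command can be used. The -s 1572 option sets the ICMP payload size: with 20 bytes of IP header and 8 bytes of ICMP header, it produces a 1600-byte packet, which is the minimum MTU required for NSX. The -d option sets the “do not fragment” bit, so the ping fails if any hop on the path has a smaller MTU: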
vmkping -S vxlan -s 1572 -d <DESTINATION_TEP_IP>
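The value passed to -s follows from the MTU: it is the MTU minus 28 bytes of headers (20 bytes for an IPv4 header without options, plus 8 bytes for the ICMP header). The following minimal sketch does that arithmetic for any MTU and prints the matching vmkping command. The function name icmp_payload_for_mtu and the TEP_MTU variable are hypothetical, and 1600 is only the minimum value; use the MTU actually configured on your TEP network.

```shell
#!/bin/sh
# ICMP payload size that fills exactly one IPv4 packet of the given MTU:
# MTU - 20 bytes (IPv4 header, no options) - 8 bytes (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

TEP_MTU=1600   # assumption: adjust to the MTU configured for your TEP network

# Print the command to run on the ESXi host (it is not executed here).
echo "vmkping -S vxlan -s $(icmp_payload_for_mtu "$TEP_MTU") -d <DESTINATION_TEP_IP>"
```

With a TEP MTU of 9000, for example, the same calculation gives a payload size of 8972.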
Tip: Perform those tests on both ESXi hosts (the source and the destination ESXi hosts).
Step 4: Perform packet captures
The NSX Manager UI also provides two powerful tools for testing the data path.
The first one is Live Traffic Analysis. This tool monitors live traffic at a source, or between a source and a destination, and can capture packets along the way.
The second is Traceflow. This tool injects packets into the network and monitors their flow across the network, allowing you to identify issues or disruptions:
Another powerful approach is a command-line packet capture, similar to tcpdump. Traffic samples can be captured at different points in the infrastructure between the source and the destination VM.
To capture the traffic leaving the vNIC of the source VM (sa-web-01), for instance, the following commands can be used on the source ESXi host (the ESXi host that runs the source VM):
1- Get the client interface name of the VM (in this case, for instance, the VM name is sa-web-01):
nsxcli -c get ports | egrep -i "Client|sa-web-01"
2- Execute the following commands to capture the ICMP packets leaving the VM vNIC interface:
nsxcli
start capture interface sa-web-01.eth0 direction input expression ipproto 0x01
Where:
— sa-web-01.eth0 = Client interface name. Change this name to your client interface name.
Do the same thing for the destination VM on the destination ESXi host.