NSX DLR – Canceled changes
Affected Versions: 6.3.6, 6.4.1
Symptoms
The GUI shows one or more of the following errors or symptoms:
The status of the DLRs is missing.
Something went wrong with the underlying resource. Please contact your Administrator.
DLR Deployment Status is missing.
Research
Syslog output
CLUSTERNAME vmkernel: cpu13:1756676)WARNING: vdrb: VdrCloseConnection:497: SYS:Could not aquire portset for port 33554440 status: Not found
CLUSTERNAME vmkernel: cpu2:35416)WARNING: vdrb: VdrProcessRouteUpdateMessageDeleteRoutes:382: CP:[I:0xbad0003] Route Delete: Unable to Delete prefix = 0x801fea9, prefixLength = 30 status: Unknown status
CLUSTERNAME vmkernel: cpu47:45037)WARNING: vdrb: VdrCPProcessVdrInstanceMsg:2570: CP:Instance Create: Failed for [I:0x186a0] status: Address already in use
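When a host logs many of these messages, it can help to pull out just the vdrb warnings and their status strings. The following is a minimal sketch, assuming one log entry per line in the same format as the three sample entries; the regex and the helper name parse_vdrb_warnings are hypothetical and were derived only from those samples, not from any VMware log specification.

```python
import re
from typing import List, NamedTuple

# Hypothetical pattern derived only from the sample vmkernel entries:
#   <host> vmkernel: cpuN:WORLD)WARNING: vdrb: <Function>:<line>: <message> status: <status>
VDRB_WARNING = re.compile(
    r"(?P<host>\S+) vmkernel: cpu\d+:\d+\)WARNING: vdrb: "
    r"(?P<func>\w+):(?P<lineno>\d+): (?P<msg>.*?) status: (?P<status>.+)$"
)


class VdrbWarning(NamedTuple):
    host: str
    function: str
    message: str
    status: str


def parse_vdrb_warnings(log_text: str) -> List[VdrbWarning]:
    """Collect vdrb WARNING entries, assuming one log entry per line."""
    found = []
    for entry in log_text.splitlines():
        m = VDRB_WARNING.search(entry)
        if m:
            # The message ends at the first " status: "; the rest is the status text.
            found.append(VdrbWarning(m["host"], m["func"], m["msg"], m["status"].strip()))
    return found
```

Grouping the returned entries by function and status makes it easy to see whether, for example, the "Address already in use" failure on instance create keeps appearing after a change.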
NSX-Manager – PowerNSX Cluster Health Status
=> The host belongs to the CLUSTERNAME cluster, and the cluster looks healthy.
O-------------------------------------------------------------------------------O
Name         HAEnabled  HAFailoverLevel  DrsEnabled  DrsAutomationLevel
----         ---------  ---------------  ----------  ------------------
CLUSTERNAME  True       1                True        FullyAutomated

featureId                                  status
---------                                  ------
com.vmware.vshield.vsm.nwfabric.hostPrep   GREEN
com.vmware.vshield.firewall                GREEN
com.vmware.vshield.vsm.vxlan               GREEN
com.vmware.vshield.vsm.messagingInfra      GREEN
O-------------------------------------------------------------------------------O
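"Looks healthy" here means every featureId reports GREEN. A minimal sketch of that check, assuming the two-column featureId/status layout with one feature per line; non_green_features is a hypothetical helper, not part of PowerNSX.

```python
from typing import Dict


def non_green_features(report: str) -> Dict[str, str]:
    """Return {featureId: status} for every feature that is not GREEN.

    Assumes the two-column 'featureId status' layout, one feature per line.
    """
    problems = {}
    for line in report.splitlines():
        parts = line.split()
        # Data rows have exactly two columns and start with a com.vmware.* feature ID.
        if len(parts) == 2 and parts[0].startswith("com.vmware."):
            feature, status = parts
            if status.upper() != "GREEN":
                problems[feature] = status
    return problems
```

An empty result corresponds to the healthy cluster state described above.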
NSX-Manager CLI – Edge Health Status
HOSTNAME> show edge all
NOTE: CLI commands for Edge ServiceGateway(ESG) start with 'show edge'
      CLI commands for Distributed Logical Router(DLR) Control VM start with 'show edge'
      CLI commands for Distributed Logical Router(DLR) start with 'show logical-router'
      Edges with version >= 6.2 support Central CLI and are listed here
Legend:
Edge Size: Compact - C, Large - L, X-Large - X, Quad-Large - Q
Edge ID  Name   Size  Version  Status
edge-1   ESG-1  L     6.3.6    GREEN
edge-2   ESG-2  L     6.3.6    GREEN
edge-3   ESG-3  Q     6.3.6    GREEN
edge-4   DLR-1  C     6.3.6    YELLOW
edge-5   DLR-2  C     6.3.6    YELLOW
edge-6   ESG-4  Q     6.3.6    GREEN
edge-7   ESG-5  Q     6.3.6    GREEN
edge-9   ESG-6  C     6.3.6    GREEN
=> Both DLRs (edge-4, edge-5) report YELLOW, while all ESGs report GREEN.
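With many Edges, filtering the `show edge all` table for non-GREEN entries is quicker than reading it. This is a minimal sketch, assuming one table row per line as the CLI prints it and Edge names without spaces; the regex and the helper unhealthy_edges are hypothetical.

```python
import re
from typing import List, NamedTuple

# One data row of 'show edge all': Edge ID, Name, Size (C/L/X/Q), Version, Status.
EDGE_ROW = re.compile(
    r"^(?P<id>edge-\d+)\s+(?P<name>\S+)\s+(?P<size>[CLXQ])\s+"
    r"(?P<version>\d+(?:\.\d+)+)\s+(?P<status>\w+)\s*$"
)


class EdgeRow(NamedTuple):
    edge_id: str
    name: str
    size: str
    version: str
    status: str


def unhealthy_edges(show_edge_all: str) -> List[EdgeRow]:
    """Return every Edge whose Status column is not GREEN."""
    rows = []
    for line in show_edge_all.splitlines():
        m = EDGE_ROW.match(line.strip())
        # Header, NOTE and legend lines do not match the row pattern and are skipped.
        if m and m["status"] != "GREEN":
            rows.append(EdgeRow(m["id"], m["name"], m["size"], m["version"], m["status"]))
    return rows
```

On the output above this would return the two DLRs, edge-4 and edge-5.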
ESXi Host – netCP Agent monitor restart
Restarting the netCP agent (netcpad) produces the following CLI output:
[root@Host:~] etc/init.d/netcpad status
netCP agent service is running
[root@Host:~] etc/init.d/netcpad restart
watchdog-netcpaMonitor: Terminating watchdog process with PID 35422
netCP agent service monitor is stopped
watchdog-netcpa: Terminating watchdog process with PID 35391
Failed to release memory reservation for netcpa
Failed to release memory reservation for netcpa
Failed to release memory reservation for netcpa
Failed to release memory reservation for netcpa
Failed to release memory reservation for netcpa
netCP agent service is stopped
Memory reservation set for netcpa
Reload security domains
netCP agent service starts
netCP agent service monitor is started
[root@Host:~] etc/init.d/netcpad status
netCP agent service is running
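If the restart has to be repeated on several hosts, the restart-then-status sequence can be wrapped in a small script. This is a minimal sketch, assuming it runs as root on the ESXi host with a Python 3 interpreter available; the helper names are hypothetical, and the "is running" check relies only on the status text shown in the transcript.

```python
import subprocess

# Absolute path of the init script invoked in the transcript.
NETCPAD = "/etc/init.d/netcpad"


def netcpa_is_running(status_output: str) -> bool:
    """Interpret the text printed by 'netcpad status'."""
    return "netCP agent service is running" in status_output


def restart_netcpa() -> bool:
    """Restart the netCP agent, then report whether 'status' says it is running.

    Must run as root on the ESXi host. Avoids capture_output/text so it
    also works on Python releases older than 3.7.
    """
    subprocess.run([NETCPAD, "restart"], check=False)
    result = subprocess.run(
        [NETCPAD, "status"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=False,
    )
    return netcpa_is_running(result.stdout)
```

The status check is kept separate from the subprocess calls so it can be tried against saved output before running anything on a host.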
Check NSX Edge Storage
To check the NSX Edge storage, you need the VMware Support login.
DCA-ESG01>show system storage
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/root   444M  366M  55M    88%   /
tmpfs       497M  496M  1M     99%   /run
/dev/sda2   43M   2.2M  38M    6%    /var/db
/dev/sda3   27M   413K  25M    2%    /var/dumpfiles
/dev/sda4   32M   1.1M  29M    4%    /var/log
=> The tmpfs partition mounted on /run is 99% full.
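To spot a nearly full partition such as the tmpfs on /run across several Edges, the df-style output can be filtered by its Use% column. A minimal sketch, assuming the six-column layout shown above and one filesystem per line; the helper full_filesystems and the 90% default threshold are arbitrary choices for illustration.

```python
from typing import List, Tuple


def full_filesystems(storage_output: str, threshold: int = 90) -> List[Tuple[str, str, int]]:
    """Return (filesystem, mount point, use%) for rows at or above threshold.

    Expects the columns printed by 'show system storage':
    Filesystem Size Used Avail Use% Mounted on
    """
    full = []
    for line in storage_output.splitlines():
        parts = line.split()
        # Data rows have six columns and a numeric Use% such as '99%';
        # the header ('Mounted on' splits in two) and the prompt line are skipped.
        if len(parts) == 6 and parts[4].endswith("%") and parts[4][:-1].isdigit():
            use = int(parts[4][:-1])
            if use >= threshold:
                full.append((parts[0], parts[5], use))
    return full
```

With the default threshold, the output above yields only the tmpfs entry on /run; lowering it to 85 would also report /dev/root at 88%.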
Solution
The following VMware KB article matches these circumstances: KB57003
"tmpfs partition on DLR’s and Edges in NSX 6.3.6 and NSX 6.4.1, configured with HA can get full, preventing any configuration changes (57003)"
=> NSX Edges in HA mode running version 6.3.6 or 6.4.1 are affected.
=> Workaround: Reboot the active Edge VM to get it back into a working state.
=> In my experience, resyncing the Edge increases the chance of getting it back into a working state.
=> This issue is resolved in 6.4.2
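The affected-version condition can be expressed as a quick check, for example when going through the Version column of `show edge all`. This is a minimal sketch that encodes only what is stated here (versions 6.3.6 and 6.4.1, HA enabled); the helper names are hypothetical, and whether an Edge has HA enabled must be determined separately.

```python
from typing import Tuple

# Versions named in this article and in the KB57003 title.
AFFECTED_VERSIONS = {(6, 3, 6), (6, 4, 1)}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '6.3.6' (or '6.3.6-<build>') into (6, 3, 6)."""
    return tuple(int(part) for part in version.split("-")[0].split("."))


def affected_by_kb57003(version: str, ha_enabled: bool) -> bool:
    """True if an Edge or DLR runs an affected version with HA enabled."""
    return ha_enabled and parse_version(version) in AFFECTED_VERSIONS
```

Edges without HA, or on 6.4.2, return False under this check.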