NetScaler High Availability – Sync FAILED

Sometimes sh!t happens! Today I’d like to show you how to troubleshoot NetScaler High Availability (HA) sync failures. The resolution is right at the bottom of the article 🙂

Issue: On the NetScaler GUI | System | High Availability | “Synchronization State FAILED”

There could be more than one reason as to why this happens…

  1. Layer 2 Problems (MAC Moves, or Loops) – for example, a server guy hooking up the appliance to the switches with 2 cables in a redundant way without turning ON Link Aggregation (ether channels).
  2. RPC Node Passwords mismatch – NetScaler uses RPCNode passwords for system-to-system communication. This doesn’t need to be the same as your NSROOT password. You could also use a system generated one.
  3. Tagging NSVLAN and forgetting to do the same on the Switch Side – this will lead to Heartbeat packets being dropped!!
  4. NetScaler HA Synchronization Service isn’t running (NSNETSVC)
  5. HA Ports blocked – UDP Port 3003, Non-Secure TCP Port 3010 or Secure TCP Port 3008 – follow the Citrix article for more info http://support.citrix.com/article/CTX109687. Communication for HA packets is initiated from the NSIP, not the SNIP.
  6. Appliance Firmware and Model Mismatch!

Troubleshooting (Some common culprits)

NetScaler HA Synchronization Service isn’t running (NSNETSVC)?

Here I’m looking at the running processes (or daemons) for nsnetsvc and, as you can see, the daemon is listed, so the service is running.

root@ns# ps auwx | grep -i nsnetsvc
root       1083  0.0  0.3 63068 26316  ??  Ss    9Feb15 830:29.25 /netscaler/nsnetsvc -S -C
root      13897  0.0  0.0  9096  1136   0  S+    1:18PM   0:00.00 grep -i nsnetsvc
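If you'd rather script this check than eyeball it, here is a minimal sketch. The function name nsnetsvc_running is my own invention (it is not a NetScaler tool), and the sample lines are modelled on the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of NetScaler): reads `ps auwx` output on stdin
# and succeeds only if a real nsnetsvc daemon is listed.
nsnetsvc_running() {
    # Drop the grep process itself, then look for the daemon's path.
    grep -v 'grep' | grep -q '/netscaler/nsnetsvc'
}

# Demo against two sample lines modelled on the ps output in this post.
sample='root  1083  0.0  0.3 63068 26316  ??  Ss  9Feb15 830:29.25 /netscaler/nsnetsvc -S -C
root 13897  0.0  0.0  9096  1136   0  S+  1:18PM   0:00.00 grep -i nsnetsvc'

if printf '%s\n' "$sample" | nsnetsvc_running; then
    echo "nsnetsvc is running"
else
    echo "nsnetsvc is NOT running"
fi
```

On the appliance shell you would feed it the live process list instead, e.g. ps auwx | nsnetsvc_running && echo "running".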

Are the TCP/UDP Ports blocked? Use NSTCPDUMP

I’m running NSTCPDUMP against the secondary HA node to confirm that the ports can be reached. You could use telnet as well for the TCP ports (the NSIP is used for HA; remember the SNIP is used for back-end services)

root@ns# nstcpdump.sh -c 8 -nn host 10.99.99.17
reading from file -, link-type EN10MB (Ethernet)

13:42:56.053129 IP 10.99.99.16.3003 > 10.99.99.17.3003: UDP, length 272
13:42:56.053130 IP 10.99.99.16.3003 > 10.99.99.17.3003: UDP, length 272
13:42:56.053130 IP 10.99.99.16.3003 > 10.99.99.17.3003: UDP, length 272
13:42:56.112268 IP 10.99.99.17.3003 > 10.99.99.16.3003: UDP, length 272
13:42:56.112299 IP 10.99.99.17.3003 > 10.99.99.16.3003: UDP, length 272
13:42:56.112328 IP 10.99.99.17.3003 > 10.99.99.16.3003: UDP, length 272
13:42:56.253119 IP 10.99.99.16.3003 > 10.99.99.17.3003: UDP, length 272
13:42:56.253119 IP 10.99.99.16.3003 > 10.99.99.17.3003: UDP, length 272

Note: My tcpdump options are -c (how many packets to capture), -nn (display all IP addresses and ports in numerical form) and the host filter (to match traffic to or from a node, i.e. the Secondary HA Node here). You can also combine filters, e.g. “host XX.XX.XX.XX and port 3003”
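Heartbeats need to flow in both directions, so it can help to count them per sender rather than just spotting port 3003 in the dump. A rough sketch, assuming the text output format shown above; count_heartbeats_from is a made-up helper name and the three sample lines are taken from my capture.

```shell
#!/bin/sh
# Hypothetical helper: counts UDP 3003 packets sent by one source IP,
# reading nstcpdump.sh / tcpdump text output on stdin.
count_heartbeats_from() {
    # Field 3 of each line is "<source-ip>.<source-port>".
    awk -v src="$1.3003" '$2 == "IP" && $3 == src { n++ } END { print n + 0 }'
}

# Three sample lines copied from the capture in this post.
dump='13:42:56.053129 IP 10.99.99.16.3003 > 10.99.99.17.3003: UDP, length 272
13:42:56.112268 IP 10.99.99.17.3003 > 10.99.99.16.3003: UDP, length 272
13:42:56.253119 IP 10.99.99.16.3003 > 10.99.99.17.3003: UDP, length 272'

echo "from primary:   $(printf '%s\n' "$dump" | count_heartbeats_from 10.99.99.16)"
echo "from secondary: $(printf '%s\n' "$dump" | count_heartbeats_from 10.99.99.17)"
```

A count of zero for one side would suggest the heartbeats are only making it across in one direction, which points at a firewall or VLAN tagging problem rather than at the RPC password.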

As you can see from the dump, my nodes are able to communicate successfully over UDP 3003, and telnet worked too. So what else could be the problem? My appliances are on the same firmware, there are no L2 loops (LA channels exist), the NSVLAN is at its default, etc… Could it be the RPCNode password??

 

Cause: RPCNode Password was invalid

Now, I drop into the shell and view the auth.log (cat /var/log/auth.log)

Apr 20 12:37:26  ns sshd[11510]: Accepted password for #nsinternal# from 10.99.99.17 port 18412 ssh2
Apr 20 12:37:26  ns sshd[11511]: Failed password for #nsinternal# from 10.99.99.17 port 37456 ssh2
Apr 20 12:37:26  ns sshd[11511]: Accepted password for #nsinternal# from 10.99.99.17 port 37456 ssh2
Apr 20 12:37:26  ns sshd[11510]: Received disconnect from 10.99.99.17: 11: disconnected by user
Apr 20 12:37:26  ns sshd[11511]: Received disconnect from 10.99.99.17: 11: disconnected by user
Apr 20 12:37:32  ns sshd[11519]: error: Invalid username or password
Apr 20 12:37:32  ns sshd[11520]: error: Invalid username or password
Apr 20 12:37:40  ns sshd[11531]: error: Invalid username or password
Apr 20 12:37:40  ns sshd[11532]: error: Invalid username or password
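On a busy appliance the auth.log gets long, so a quick filter for failed logins by the internal #nsinternal# account saves some scrolling. This is only a sketch: failed_nsinternal_logins is a hypothetical helper of mine, and it assumes the sshd line format shown above.

```shell
#!/bin/sh
# Hypothetical helper: per source IP, count failed logins for the internal
# #nsinternal# account in auth.log-style text read from stdin.
failed_nsinternal_logins() {
    awk 'index($0, "Failed password for #nsinternal#") > 0 {
             # The source IP is the word right after "from".
             for (i = 1; i < NF; i++) if ($i == "from") count[$(i + 1)]++
         }
         END { for (ip in count) print ip, count[ip] }'
}

# Three sample lines copied from the log in this post.
log='Apr 20 12:37:26  ns sshd[11510]: Accepted password for #nsinternal# from 10.99.99.17 port 18412 ssh2
Apr 20 12:37:26  ns sshd[11511]: Failed password for #nsinternal# from 10.99.99.17 port 37456 ssh2
Apr 20 12:37:32  ns sshd[11519]: error: Invalid username or password'

printf '%s\n' "$log" | failed_nsinternal_logins
```

On the appliance itself you would run it against the real file: failed_nsinternal_logins < /var/log/auth.log. Any hits from the peer node's NSIP are a strong hint that the RPC node passwords don't match.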

 

Resolution: Reset the RPCNode Password.

From NetScaler GUI | System | Network | RPC – right-click on your NSIP (primary node) and type in a Password (or let the system auto-generate one)

Type in the same password for Remote HA node IP (on the primary appliance itself). Save configuration. Verify the HA Status (Synchronisation State should be SUCCESS)

Update 20/04/2015 – I had to log on to the Secondary Appliance as well and reset the RPC password as above (both NSIP and Remote Node IP ONLY)
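If you prefer the CLI to the GUI, the same reset can be done with set ns rpcNode. The little script below only prints the commands to paste into the NetScaler CLI on each appliance; it does not run anything on the appliance. The function name rpc_reset_commands and the password placeholder are my own, and the IPs are the ones from my lab.

```shell
#!/bin/sh
# Prints (does not run) the NetScaler CLI commands for resetting the RPC node
# passwords on one appliance. Arguments: local NSIP, peer NSIP.
# REPLACE_WITH_RPC_PASSWORD is a placeholder; use the same value everywhere.
rpc_reset_commands() {
    cat <<EOF
set ns rpcNode $1 -password REPLACE_WITH_RPC_PASSWORD
set ns rpcNode $2 -password REPLACE_WITH_RPC_PASSWORD
save ns config
show ha node
EOF
}

echo "# Paste on the primary (10.99.99.16):"
rpc_reset_commands 10.99.99.16 10.99.99.17
echo "# Paste on the secondary (10.99.99.17):"
rpc_reset_commands 10.99.99.17 10.99.99.16
```

The final show ha node is there so you can check the sync state straight after saving.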

 

Hope this helps in your journey of getting to grips with NetScaler troubleshooting…

 

NetScaler Appliances negotiate to “Half-Duplex” for 100 Mbps Speed only…

Today I’d like to share another interesting gotcha while setting up NetScaler Link-Aggregation (LA) channels with network switches. My customer had all their perimeter network infrastructure “managed” by a 3rd Party Supplier and these “managed” switches in the DMZ, were only capable of doing 100-Full-Duplex ether-channels!!

On the surface, this seemed like a relatively easy task and all I had to do was to set the NetScaler’s Link-Aggregation channel to 100 Mbps (Speed) and FULL (Duplex). But guess what – NetScaler didn’t play ball! It was negotiating at 100-HALF speed/duplex.

I must admit that this trivial task took me to the verge of pulling all my hair out! 🙂

Was it the "dodgy cable", or the "Service Provider's invalid Ether-Channel Config" or "NetScaler"??

Apparently, it was the NetScaler configuration. By default, the NetScaler does “Auto-Negotiation” and even if you explicitly set an interface to FULL DUPLEX and 100 Mbps SPEED Setting – it doesn’t apply unless you pass the parameter (-AUTONEG DISABLED).

 
> set interface LA/1 -autoneg DISABLED -speed 100 -duplex FULL

Hope this helps in your exciting journey of deploying Citrix NetScalers.

P.S. The AUTONEG parameter can't be passed when using the > set channel LA/1 command.

 

Using OpenSSL to split .pfx files to .pem format (Citrix NetScaler)

Whilst in the field integrating NetScalers, one of the most common tasks is installing the relevant SSL certificates on the appliance. This could be for SSL Offloading, Secure Management (HTTPS for GUI) or to deploy a wildcard for your NetScaler Gateway FQDN (aka Access Gateway).

Note: The Certificate Signing Request (CSR) can be generated either on the NetScaler itself or on IIS or any other web server.
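For reference, if you were generating the CSR on a box with OpenSSL rather than on IIS, it could look roughly like this. It is a sketch only: the file names and the subject are illustrative assumptions, and a real request would normally carry your organisation details too.

```shell
#!/bin/sh
# Sketch: create a 2048-bit RSA key and a CSR for a wildcard name with OpenSSL.
# File names and the subject are illustrative assumptions.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# -nodes leaves the new private key unencrypted on disk, so guard the file.
openssl req -new -newkey rsa:2048 -nodes \
    -keyout wildcard.key -out wildcard.csr \
    -subj "/CN=*.wildcard.com" 2>/dev/null

# Confirm the subject before sending the CSR to the Certificate Authority.
openssl req -in wildcard.csr -noout -subject
```

In that case the private key never leaves the machine as a .pfx, so the splitting exercise later on wouldn't be needed.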

Let’s assume this is an IIS server and the logical process flow will be –

  1. Task – Generate CSR with relevant FQDN (*.wildcard.com)
  2. Submit it to the Certificate Authority (for external – Go Daddy etc..)
  3. Import the SSL Cert file into IIS and bob’s your uncle.

Hang on, now how do I get this onto the NetScaler!? Just export it… but it’s got a private key and it’s in “.pfx” format!

OpenSSL to the rescue!

The Citrix NetScaler ships with the OpenSSL utility built in. You’ll need to use either the NetScaler GUI or WinSCP to copy the .pfx to the appliance. The default cert location on the NetScaler is /nsconfig/ssl

[Screengrab: WinSCP showing the default directory of SSL Certs on a NetScaler]

Once the .pfx file has been uploaded to the above directory, follow these steps –

  • Drop into the NetScaler shell and change directory to /nsconfig/ssl

 > shell
root@VPX-SiteA# cd /nsconfig/ssl
root@VPX-SiteA# pwd
/nsconfig/ssl

  • Extract the WildCard cert first

root@VPX-SiteA# openssl pkcs12 -in COMPANY-WILDCARD.pfx -clcerts -nokeys -out COMPANY-WILDCARD.pem

Enter Import Password:

MAC verified OK

 

  • Next, extract the Private Key (if there is one!). When prompted for the Import Password, enter the password that was originally set on the .pfx, then choose a PEM pass phrase to protect the extracted key.
root@VPX-SiteA# openssl pkcs12 -in COMPANY-WILDCARD.pfx -nocerts -out COMPANY-Privkey.key
Enter Import Password:
MAC verified OK
Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:
  • Now you’re ready to import these two files (.PEM and .KEY) into your NetScaler
Installing the WildCard Cert and its Private Key on the NetScaler
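Before importing, it's worth confirming that the extracted cert and key actually belong together. The sketch below is a self-contained dry run: it builds a throwaway self-signed .pfx (standing in for the one exported from IIS), splits it with the same pkcs12 commands used above, and compares the public keys. The password and the orig.* file names are made up for the demo.

```shell
#!/bin/sh
# Sketch: check that an extracted certificate and private key belong together
# by comparing their public keys. A throwaway self-signed cert stands in for
# the real wildcard; the password and orig.* names are illustrative.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
pw='Example-Pass1'

# 1. Build a stand-in .pfx (in real life this is the file exported from IIS).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -keyout orig.key -out orig.crt -subj "/CN=*.wildcard.com" 2>/dev/null
openssl pkcs12 -export -in orig.crt -inkey orig.key \
    -out COMPANY-WILDCARD.pfx -passout "pass:$pw"

# 2. Split it as in the steps above, passing the passwords non-interactively.
openssl pkcs12 -in COMPANY-WILDCARD.pfx -clcerts -nokeys \
    -out COMPANY-WILDCARD.pem -passin "pass:$pw" 2>/dev/null
openssl pkcs12 -in COMPANY-WILDCARD.pfx -nocerts \
    -out COMPANY-Privkey.key -passin "pass:$pw" -passout "pass:$pw" 2>/dev/null

# 3. Compare the public key in the cert with the one derived from the key.
cert_pub=$(openssl x509 -in COMPANY-WILDCARD.pem -noout -pubkey)
key_pub=$(openssl pkey -in COMPANY-Privkey.key -passin "pass:$pw" -pubout)

if [ "$cert_pub" = "$key_pub" ]; then
    echo "certificate and private key match"
else
    echo "MISMATCH: certificate and key do not belong together"
fi
```

With your real files you would skip step 1 and run only the two commands in step 3 (openssl will prompt for the PEM pass phrase if you leave out -passin).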
