Category: Networking

F5 BigIP health checks mark host resource down although it’s up

A couple of times I have run across a strange issue on some F5 Big-IP LTM clusters where one of the nodes marks some resources as down although they are actually up. This can cause quite a lot of confusion and trouble.

At least in the cases that I have seen, TMM seems to start interpreting the output of health checks backwards for some hosts. In the logs you can see that the health check reported the host as up, and yet the host was marked as down. I have had it happen a couple of times with the 11.x series LTM software, and it has also happened with the 12.x versions, even with the latest patch levels. I have not seen it happen with the 13.x version (yet).

So in order to get around the issue I have usually just restarted the TMM process on the affected device and all has gone back to normal after it.

Basically to restart the TMM just log in to the device using SSH and issue the following command:

tmsh restart /sys service tmm

Beware that restarting the TMM will cause the device to stop processing traffic. So if the affected device is the one processing traffic and you are running a Big-IP cluster, do a fail-over first if you haven’t done so already.
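
The fail-over-then-restart order described above could be scripted roughly like this. It is only a sketch, assuming it runs on the BIG-IP itself, that "tmsh show /sys failover" prints a line containing "Failover active" on the active unit, and that a fixed wait (the hypothetical FAILOVER_WAIT variable) is enough for the peer to take over. Note that it uses the "tmsh restart /sys service tmm" form of the restart command, which is the one found in F5's documentation.

```shell
# Sketch: fail over first if this unit is active, then restart TMM.
# The function names and FAILOVER_WAIT are illustrative, not F5 tooling.

is_active() {
    # Assumption: the active unit reports a line like "Failover active for ..."
    tmsh show /sys failover | grep -qi 'failover active'
}

restart_tmm_safely() {
    if is_active; then
        echo "Unit is active, forcing it to standby first"
        tmsh run /sys failover standby || return 1
        sleep "${FAILOVER_WAIT:-10}"   # give the peer time to take over
    fi
    echo "Restarting TMM"
    tmsh restart /sys service tmm
}

# Usage on the affected device:
# restart_tmm_safely
```

On a standby unit the function skips the fail-over and goes straight to the restart.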

Like with many other issues the phrase “have you tried turning it off and on again” comes to mind and saves the day.

Check Point 1400 series SMB device VPN debug log fast rotation work-around

If you have ever had to debug VPNs on a Check Point SMB device, you might have noticed that they rotate their logs every 1 MB, which means that sometimes you might miss the information you were looking for. At least for me it was a problem when trying to get debug level information on some VPN issues that occurred randomly.

So in order to get the required output I added a 32 GB SD-card to the firewall to extend its small storage, made some symlinks and wrote a little script to get all the output I required for debugging.

So on to the details. After you have mounted your SD-card you have access to it on the path:

/mnt/sd

Before you enable debugging you should replace the ikev2.xmll and ike.elg files with symbolic links that point to the SD-card, so that you don’t run out of space on the built-in flash. You can do that by using the following commands:

touch /mnt/sd/ikev2.xmll && touch /mnt/sd/ike.elg
# ln -s TARGET LINK_NAME: the log path on the flash becomes a link to the file on the SD-card
ln -sf /mnt/sd/ike.elg /opt/fw1/log/ike.elg
ln -sf /mnt/sd/ikev2.xmll /opt/fw1/log/ikev2.xmll
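
Since the whole point of moving the files is not to run out of space, it can be worth checking the free space on the SD-card before turning debugging on. A minimal sketch, assuming a df that supports the POSIX -P and -k options; the has_free_kb name and the 1 GB threshold are just illustrations:

```shell
# Succeeds when the filesystem holding DIR has at least MIN_KB kilobytes free.
has_free_kb() {
    dir=$1
    min_kb=$2
    # With -P each filesystem is printed on one line; column 4 is "Available"
    free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$free_kb" ] && [ "$free_kb" -ge "$min_kb" ]
}

# Usage:
# has_free_kb /mnt/sd 1048576 || echo "less than 1 GB free on the SD-card"
```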

Now enable debugging like you usually would (see the relevant SK on the Check Point support site):

vpn debug trunc
vpn debug on TDERROR_ALL_ALL=5

And here is the script I used to copy the logs to the SD-card as they were rotated:

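#!/bin/bash
# Copy sfwd.elg.0 to the SD-card whenever it was modified within the last second.
while true
do
    fmtime=$(stat -c %Y /opt/fw1/log/sfwd.elg.0)   # last modification time (epoch seconds)
    curtime=$(date +%s)                            # current time (epoch seconds)
    diff=$((curtime - fmtime))
    if test "$diff" -le 1
    then
        cp /opt/fw1/log/sfwd.elg.0 "/mnt/sd/sfwd.elg-$fmtime"
    fi
    sleep 1
done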

So basically, every second it checks whether the sfwd.elg.0 file has just changed and, if so, copies it to the SD-card. I also experimented with using logger to send the log to a central server via syslog, but that just didn’t work: it sent the first one fine, but the changes afterwards were dropped, so I opted for the copying.
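
A possible variation on the same idea is to remember the modification time of the last copy and copy again only when it differs, instead of comparing against the current time. This is a sketch and not the script I actually ran; the copy_if_changed function and the last_mtime variable are made up for illustration, and like the original it relies on "stat -c %Y" being available.

```shell
# Copy SRC into DEST_DIR whenever its modification time differs from the one
# seen at the previous copy. Prints the path of the new copy, or nothing if
# the file is unchanged or does not exist yet.
last_mtime=""

copy_if_changed() {
    src=$1
    dest_dir=$2
    mtime=$(stat -c %Y "$src" 2>/dev/null) || return 0   # source not there yet
    [ "$mtime" = "$last_mtime" ] && return 0             # this version was already copied
    target="$dest_dir/$(basename "$src")-$mtime"
    cp "$src" "$target" || return 1
    last_mtime=$mtime
    echo "$target"
}

# Usage on the appliance:
# while true; do copy_if_changed /opt/fw1/log/sfwd.elg.0 /mnt/sd; sleep 1; done
```

The modification time has one second resolution here, so two rotations within the same second would still end up as a single copy.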

Fixing Smart Dashboard crashing after receiving “Disconnected_Objects already created by another user” error

Today I happened upon an error in Smart Dashboard after it randomly crashed and refused to start again. After the crash it always showed me the error “Disconnected_Objects already created by another user” and crashed again. A quick lookup on Check Point’s support site gave me the idea that the SmartMap cache might be corrupted. So here is a quick copy-paste of the commands needed to reset the SmartMap cache in R77.30 on Gaia.

mkdir -p /var/tmp/SmartMap_Backup/
cpstop
cd $FWDIR/conf/SMC_Files/vpe/
mv mdl_version.C /var/tmp/SmartMap_Backup/mdl_version.C
mv objects_graph.mdl /var/tmp/SmartMap_Backup/objects_graph.mdl
cd $FWDIR/conf/
mv applications.C /var/tmp/SmartMap_Backup/applications.C
mv CPMILinksMgr.db /var/tmp/SmartMap_Backup/CPMILinksMgr.db
cpstart

After doing that I was able to start Smart Dashboard again and continue working! 🙂
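
If you prefer something a bit more forgiving than plain mv, the moving part can be wrapped in a small helper that skips files that are not there. This is just a sketch; backup_files is a made-up name and not a Check Point tool, and cpstop/cpstart still need to be run around it as in the commands above.

```shell
# Move the named files from SRC_DIR to BACKUP_DIR, skipping any that are missing.
backup_files() {
    src_dir=$1
    backup_dir=$2
    shift 2
    mkdir -p "$backup_dir" || return 1
    for name in "$@"; do
        if [ -e "$src_dir/$name" ]; then
            mv "$src_dir/$name" "$backup_dir/$name" && echo "moved $name"
        else
            echo "skipped $name (not found)"
        fi
    done
}

# Usage on the management server, between cpstop and cpstart:
# backup_files "$FWDIR/conf/SMC_Files/vpe" /var/tmp/SmartMap_Backup mdl_version.C objects_graph.mdl
# backup_files "$FWDIR/conf" /var/tmp/SmartMap_Backup applications.C CPMILinksMgr.db
```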

If you are running your management server on Windows or are using a Multi-Domain Server, you can find the commands needed to do the same on those systems in “sk92142”, which is about “SmartDashboard crashes when loading SmartMap data, after upgrading the Security Management Server”.

Check Point unable to delete IKE/IPSEC SA on a SMB device cluster

On a Check Point SMB 1400 series appliance cluster with R77.20.75 installed I ran into an issue where, after changing the peer gateway’s IP address, the VPN did not want to come up again and VPN TU showed me SAs relating to the old peer IP address. The VPN TU delete command did not remove them. Disabling the VPN community and removing the gateways from it did nothing either: the stubborn SAs remained, and even waiting for the timeouts to occur did not help.

What in the end actually removed the stuck SAs was doing “cp stop” and “cp start” on both of the devices, with a manual fail-over in between. After that VPN TU didn’t show the stuck SAs any more and the VPN started working again with the peer’s new IP address.

Check Point R77.30 management interface crypto hardening (WebUI and SSH Cipher change)

By default the management interfaces (WebUI/SSH) of a Check Point firewall use crypto settings that are not that great (MD5, SSLv3, etc. are enabled), but fortunately it is possible to change them.

The SSH daemon is configured like in a normal Linux distribution, by editing /etc/ssh/sshd_config. Check Point’s support site also recommends modifying the SSH client configuration located in /etc/ssh/ssh_config. To change the encryption algorithms available when connecting to the firewall using SSH, add the following lines to the aforementioned configuration files using the vi command in Expert mode:

Ciphers aes256-ctr,aes256-cbc,aes128-ctr,aes192-ctr,aes128-cbc,aes192-cbc
MACs hmac-sha1

After modifying the config file restart the SSH server using the following command:

 service sshd restart

If everything is fine, your connection survives. If for some strange reason your SSH connectivity breaks and you can’t log back in, you can undo the previous changes by using the terminal access available in the WebUI.
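
If you need to make the same SSH change on several firewalls, editing with vi gets tedious. Below is a rough sketch of a helper that sets an option in a config file, replacing an existing line or appending a new one. The set_option name is made up, it assumes a sed that supports -i, and it assumes the value contains no '|' or '&' characters (true for the cipher and MAC lists above).

```shell
# Set "KEY VALUE" in FILE: replace an existing uncommented KEY line,
# or append a new line if there is none.
set_option() {
    file=$1
    key=$2
    value=$3
    if grep -q "^[[:space:]]*$key[[:space:]]" "$file"; then
        sed -i "s|^[[:space:]]*$key[[:space:]].*|$key $value|" "$file"
    else
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}

# Usage in Expert mode (take a backup copy of the file first):
# set_option /etc/ssh/sshd_config Ciphers aes256-ctr,aes256-cbc,aes128-ctr,aes192-ctr,aes128-cbc,aes192-cbc
# set_option /etc/ssh/sshd_config MACs hmac-sha1
# sshd -t && service sshd restart   # "sshd -t" only checks the config; it may need its full path
```

Running the helper twice with the same arguments leaves the file unchanged, so it is safe to repeat.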

Now that the SSHD settings have been changed, let’s change the cipher suites available for HTTPS used by the WebUI. Just connect to the command line using SSH and do the following in Expert mode.

  1. Backup the current file /web/templates/httpd-ssl.conf.templ:
    [Expert@HostName:0]# cp /web/templates/httpd-ssl.conf.templ /web/templates/httpd-ssl.conf.templ_ORIGINAL
  2. Edit the current /web/templates/httpd-ssl.conf.templ file:
    [Expert@HostName:0]# vi /web/templates/httpd-ssl.conf.templ
  3. Find the line containing the SSLCipherSuite parameter and change the value behind it, for example to ECDHE-RSA-AES256-SHA384:AES256-SHA256:!ADH:!EXP:RSA:+HIGH:+MEDIUM:!MD5:!LOW:!NULL:!SSLv2:!SSLv3:!eNULL:!aNULL:!RC4
  4. Save and close the editor by using :wq! (the ‘!’ at the end overrides the fact that the file has read-only permissions).
  5. Update the current configuration of HTTPD daemon based on the modified configuration template:
    [Expert@HostName:0]# /bin/template_xlate : /web/templates/httpd-ssl.conf.templ /web/conf/extra/httpd-ssl.conf < /config/active
  6. To activate the configuration changes restart the HTTPD daemon by using the “tellpm” command:
    [Expert@HostName:0]# tellpm process:httpd2
    
    [Expert@HostName:0]# tellpm process:httpd2 t

To find out what you actually want to use as the SSLCipherSuite value, you can use cpopenssl to see which algorithms will be available with which value. Example:

[Expert@HostName:0]# cpopenssl ciphers -v 'ECDHE-RSA-AES256-SHA384:AES256-SHA256:!ADH:!EXP:RSA:+HIGH:+MEDIUM:!MD5:!LOW:!NULL:!SSLv2:!eNULL:!aNULL:!RC4' | sort -k1

Expected output:

AES128-SHA SSLv3 Kx=RSA Au=RSA Enc=AES(128) Mac=SHA1
AES256-SHA SSLv3 Kx=RSA Au=RSA Enc=AES(256) Mac=SHA1
DES-CBC3-SHA SSLv3 Kx=RSA Au=RSA Enc=3DES(168) Mac=SHA1
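
To spot leftovers in such a listing more quickly, the output can be piped through a small filter. A sketch only: weak_ciphers is a made-up helper, and the pattern is my own illustrative idea of "weak" (RC4, MD5, DES/3DES, no encryption, no authentication), not an official list.

```shell
# Reads "ciphers -v" style output on stdin and prints the names of the suites
# that match an illustrative pattern of weak algorithms.
weak_ciphers() {
    awk '/RC4|MD5|3DES|Enc=DES|Enc=None|Au=None/ { print $1 }'
}

# Usage:
# cpopenssl ciphers -v 'YOUR:CIPHER:STRING' | weak_ciphers
```

Fed with the three lines of the expected output above, it prints only DES-CBC3-SHA, which suggests the example string could still be tightened if you want to get rid of 3DES.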

Renewing F5 BigIP LTM expired device certificates

Every once in a while it is necessary to renew the device certificates on your BigIP devices, which are used for the connection to the Web UI (XUI). It’s easy enough to do using the web interface. When the certificate hasn’t expired yet, just log in to the Web UI using any web browser you like. When the certificate has already expired, Edge/Chrome/Firefox won’t let you in (no, there is no “proceed” button, since the management interface is using strict settings), but Internet Explorer will still work. If you don’t have Internet Explorer available, the renewal can also be done via the command line interface.

To renew the device certificate using the web interface just log in to the management interface and go to the page: System ›› Device Certificates : Device Certificate ›› Device Certificate and click on the Renew button. There you can choose whether you want to create a new self signed certificate or generate a certificate request to your company internal CA, or some external CA if you prefer.

In a clustered environment after you renew the certificate on one device, you need to sync the configurations between the devices before proceeding to update the others. If you don’t do config sync in between you may end up having to renew the previously already renewed certificates again, as config sync will push the old certificates back to active state on the other devices, since it doesn’t have info on the peer’s new certificates.

When renewing device certificates using the command line you will need to use openssl to generate the new RSA private key and certificate request, and then use tmsh to activate the newly created key/certificate pair.

OpenSSL command example for generating a new RSA key and creating a certificate request:

openssl req -out CSR.csr -new -newkey rsa:2048 -nodes -keyout privateKey.key

OpenSSL command example for generating a new self signed certificate:

openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout privateKey.key -out certificate.crt

The newly created private key should be placed in the /config/httpd/conf/ssl.key/ directory and the newly created certificate should be placed in the /config/httpd/conf/ssl.crt/ directory. After you have placed them there, the command to activate the new key/certificate pair using tmsh is:

tmsh modify /sys httpd ssl-certkeyfile /config/httpd/conf/ssl.key/new-private.key ssl-certfile /config/httpd/conf/ssl.crt/new-certificate.crt
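
To avoid finding out about an expired certificate from the browser, the expiry can be checked from the command line with openssl’s -checkend option. A small sketch; cert_expires_within is a made-up helper name, and the server.crt path in the usage line is what I assume to be the default device certificate location.

```shell
# Succeeds (exit 0) when the certificate in FILE expires within DAYS days.
# A missing or unreadable file also counts as "expiring".
cert_expires_within() {
    cert=$1
    days=$2
    # "openssl x509 -checkend N" exits 0 if the cert is still valid N seconds from now
    ! openssl x509 -noout -checkend $(( days * 86400 )) -in "$cert" > /dev/null 2>&1
}

# Usage:
# cert_expires_within /config/httpd/conf/ssl.crt/server.crt 30 && echo "renew the device certificate soon"
```

The same check can be run against the new certificate after activating it, to confirm the renewal actually took effect.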

 

Policy Based Routing resulting in no ARP replies from gateway

One might think that applying Policy Based Routing will not affect ARP (Address Resolution Protocol), because they are considered to work on different layers. PBR clearly should affect only Layer 3 routing decisions, and ARP is running somewhere below Layer 3. There are many nice discussions on the internet about whether ARP is a Layer 2 or Layer 3 protocol, and some people tend to say it’s Layer 2.5.

As it turns out, PBR can affect ARP. Say, for example, you wish to re-route every packet originating from the 192.168.1.0/24 network and make a policy route stating that everything from the source net 192.168.1.0/24 is routed to, let’s say, the GW 172.16.1.1, without specifying any port or protocol. What will happen is that ARP requests that use broadcast still work, but unicast ARP requests won’t get replies any more, at least from Check Point firewalls. So you need to either make two rules that match only TCP and UDP, based on your needs, or follow Check Point support’s guidelines: https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solutionid=sk84480

Why using VMware vMotion on an active F5 BigIP LTM VE cluster member can be a bad idea

Although F5 states that starting from version 11.5 it supports vMotion for moving a BigIP LTM VE instance between physical hosts (K15003222), it can sometimes still cause issues, even in the newer 12.x series software. For those who didn’t want to click on the link and read what F5 has to say about it, here are their recommendations for using vMotion:

  • You should perform a live migration of BIG-IP VE virtual machines on idle BIG-IP VE virtual machines. Performing a live migration of the BIG-IP VE system while the virtual machine is processing application traffic may produce unexpected results, such as dropped connections.
  • Using the vMotion feature to migrate one member of a high availability (HA) pair should not cause a failover in most cases. However, F5 recommends that you thoroughly test vMotion migration for HA systems, as you may experience different results, depending on your environment.

Well, having tested it, I have to say that yes, moving an active member is a bad idea, since it can have “nice” side effects in certain cases. I like their “unexpected results” statement. Namely, I have seen one BigIP LTM instance drop half its inbound connections after vMotion, and even after a reboot/upgrade to a newer patch level it still dropped connections from certain IP addresses in a way that they didn’t even show up in tcpdump. And no, the missing half of the connections didn’t go to the standby node, they just vanished, and as soon as you forced that device to standby they re-appeared on the other node. So be very careful about what you migrate during the night, as unexpected things might happen…

But at least in my case, using vMotion on the BIG-IP VE virtual machine again, this time in standby mode, and then making it active again got traffic flowing normally.

Insane amount of IKE SA’s on a SMB device caused by DPD and errors in logs

It seems that Check Point 1400 series SMB devices don’t handle Dead Peer Detection (DPD) that well when an external partner suddenly decides to enable it on a 3rd party firewall. Namely, what happens is that you end up with tens of thousands of IKE SAs on your little Check Point box and “Traffic Selector Unacceptable” errors in your logs.

In my case it didn’t cause any problems besides me being unable to see the output of the “VPN TU” command: the IKE SAs created by DPD flooded my console, and the Embedded Gaia VPN TU utility decided not to show me its entire output and even crashed a few times. I ended up calling the other side and telling them to disable DPD. Hope they fix DPD support in some newer software release…

CheckPoint to Amazon AWS VPN connection issue

When trying to create a VPN tunnel between a CheckPoint firewall and the Amazon managed VPN service I happened upon an unpleasant surprise.

Namely, when using stronger crypto methods than those defined by default in the guides by CheckPoint or Amazon, you will run into an issue where the CheckPoint device starts dropping traffic for a ~5 minute period after Phase 2 key exchanges. To be more exact, the traffic from Amazon to the hosts/networks behind the CheckPoint GW will start failing, while connections started from behind the CheckPoint device will continue working as before. The reason is that the Amazon VPN service refreshes its keys 5 minutes before the lifetime set in the VPN properties, and CheckPoint close to 30 seconds before it. That wouldn’t be a problem if Amazon used the same parameters as were used to initially establish the tunnel, but it doesn’t. It uses DH group 2 to initiate the key exchange, after which the CheckPoint device will start dropping the traffic coming in from the Amazon service with the following error:

encryption failure: Packet was decrypted with methods which are different from the methods according to the security policy - Gateway and Peer use different DH groups

After talking to both CheckPoint and Amazon support, I can say that the only thing you can do to remedy this is to set the DH group for PFS to 2.

Amazon states in its documentation (here) that it supports a bunch of different DH groups, and yet it defaults to DH group 2 when initiating the connection itself. To be honest, it seems a bit strange to me that the AWS VPN mirrors the encryption/integrity settings of the previous negotiation, but doesn’t remember the PFS settings and defaults to DH group 2. The only thing that AWS support suggested was to force the CheckPoint device to exchange keys before the AWS service does. Unfortunately, according to Check Point support, you cannot do that, as there is no such setting available: that timer fires around 30 seconds, plus or minus some random number of seconds, prior to the end of the lifetime set in the VPN properties.
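
To see why AWS ends up initiating the rekey, here is the timing as simple arithmetic. It is a sketch: the 300 and 30 second margins are the approximate values mentioned above, the 3600 second lifetime is just an example, and the function names are made up.

```shell
# Seconds after the SA was established at which a peer starts rekeying,
# given the SA lifetime and how long before expiry that peer rekeys.
rekey_at() {
    echo $(( $1 - $2 ))
}

# Prints which side starts the rekey first for a given lifetime.
first_to_rekey() {
    lifetime=$1
    aws_margin=${2:-300}   # AWS: about 5 minutes before expiry
    cp_margin=${3:-30}     # CheckPoint: roughly 30 seconds before expiry
    aws_at=$(rekey_at "$lifetime" "$aws_margin")
    cp_at=$(rekey_at "$lifetime" "$cp_margin")
    if [ "$aws_at" -lt "$cp_at" ]; then
        echo "AWS"
    else
        echo "CheckPoint"
    fi
}

first_to_rekey 3600   # prints AWS (3300 s versus 3570 s)
```

Because both margins are subtracted from the same lifetime, changing the lifetime does not change the order, so AWS starts the exchange every time.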