We recently had an outage on our vCenter server, and I’d like to share the experience and the fix with you. A known bug in ESXi 6.7 Update 3 makes hosts flood the SEAT (Stats, Events, Alarms, and Tasks) database of the vCenter server. Once the database disk fills up, the VPXD service stops, which means the vCenter server is down. In this article I’ll go through the symptoms and the steps to fix the issue.

The following VMware KB describes the issue with vSphere 6.7 U3: KB 74607.

I’d strongly recommend that you act proactively and apply the patch in your environment to avoid running into the same issue.

The symptoms

  • If you examine one of your ESXi 6.7 Update 3 hosts, you should find errors similar to these under the Events section:

[Screenshot: errors listed under the host's Events section]

  • The vCenter server becomes unavailable, and you get an error similar to this when browsing to it:

503 Service Unavailable

  • If you open the VAMI (vCenter Server Appliance Management Interface) and navigate to the storage section, you will probably find that disk 8 is full, or almost full. That is why the VPXD service is unable to start. You can also see this by logging into a shell on the vCenter server via SSH and running the following command:

df -h

[Screenshot: df -h output]
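If you want to keep an eye on this partition rather than read the whole df output, a small check like the one below may help. It is only a sketch: the mount point /storage/seat and the 90% threshold are my assumptions, so confirm the mount point on your own appliance first.

```shell
#!/bin/sh
# Sketch: warn when a partition is nearly full.
# MOUNT and THRESHOLD are illustrative assumptions; adjust them to your appliance.
MOUNT="${MOUNT:-/storage/seat}"
THRESHOLD="${THRESHOLD:-90}"

# Print the Use% value (digits only) for the mount point given as $1.
# Reads `df -P` style output on stdin so it can be tried with canned text.
usage_pct() {
    awk -v m="$1" 'NR > 1 && $6 == m { sub(/%/, "", $5); print $5 }'
}

# Print a warning when usage of mount point $1 is at or above $2 percent.
check_mount() {
    pct=$(df -P "$1" 2>/dev/null | usage_pct "$1")
    if [ -z "$pct" ]; then
        echo "mount point $1 not found"
    elif [ "$pct" -ge "$2" ]; then
        echo "WARNING: $1 is ${pct}% full"
    else
        echo "OK: $1 is ${pct}% full"
    fi
}

check_mount "$MOUNT" "$THRESHOLD"
```

Run it from the same SSH shell session; to check a different partition, set MOUNT before calling the script.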

 

You can investigate further by checking which ten event types account for the most rows in the database. Follow these steps:

  • SSH into the vCenter server.
  • Run the following command to log in to the database:

/opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres

  • Then run the following query:

SELECT COUNT(EVENT_ID) AS NUMEVENTS, EVENT_TYPE, USERNAME FROM VPXV_EVENT_ALL GROUP BY EVENT_TYPE, USERNAME ORDER BY NUMEVENTS DESC LIMIT 10;

This query may take a few minutes to return, depending on the size of your database.

[Screenshot: query results]
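The query above counts events by type. If you also want to see which tables take up the most space on disk, something along these lines should work. This is a sketch built on standard PostgreSQL catalog functions and is not taken from the VMware KB; the psql path, database name, and user are the same ones used in the login command above.

```shell
#!/bin/sh
# Sketch: list the ten largest tables in the vCenter database.
# The psql path, database name, and user match the login command shown earlier.
PSQL="/opt/vmware/vpostgres/current/bin/psql"

# Standard PostgreSQL catalog query: total size (table + indexes + TOAST) per table.
QUERY="SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;"

# Run the query only if psql exists at the expected path.
top_tables() {
    if [ -x "$PSQL" ]; then
        "$PSQL" -d VCDB -U postgres -c "$QUERY"
    else
        echo "psql not found at $PSQL"
    fi
}

top_tables
```

The query is read-only, so it is safe to run while you investigate.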

 

Fixing the issue

The quickest way to get the server working again is to extend the storage capacity of the SEAT disk, which is disk number 8 in vCenter Server 6.7. You can double-check the disk number under the storage section of the VAMI web console. To extend the disk, follow these steps in here.

Next, you need to stop the ESXi hosts from generating the events that flooded the database. The permanent fix is to apply a patch (ESXi 6.7 Patch Release ESXi670-201911001) to all of your ESXi 6.7 U3 hosts (find this patch here). Alternatively, you can apply a temporary fix by disabling the WBEM service across your hosts, but this is only temporary: the service is enabled again when a host reboots. I’d recommend that you go with the permanent fix.
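If you go with the temporary workaround on more than a couple of hosts, a loop like this one can save some typing. It is a sketch under a few assumptions: the host names are placeholders, SSH is enabled on each host, and `esxcli system wbem set --enable false` is the command you want to run (check it against KB 74607 for your build). It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch: temporarily disable WBEM on a list of ESXi hosts over SSH.
# HOSTS holds placeholder names; SSH must be enabled on each host.
HOSTS="${HOSTS:-esxi01.example.com esxi02.example.com}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# esxcli command that turns the WBEM service off on the host it runs on.
WBEM_OFF="esxcli system wbem set --enable false"

# Loop over the space-separated host list in $1.
disable_wbem() {
    for host in $1; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "ssh root@$host $WBEM_OFF"
        else
            ssh "root@$host" "$WBEM_OFF"
        fi
    done
}

disable_wbem "$HOSTS"
```

Review the printed commands first, then run it again with DRY_RUN=0 and your own host list in HOSTS.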

Finally, you need to clear these logs from the SEAT database by truncating it, which brings it back to its normal size. Be very cautious when doing this: it helps to have some experience with PostgreSQL databases, and you should take a backup, or at least a snapshot, of your vCenter server beforehand. I’d strongly recommend that you ask VMware technical support to help you with the specific scripts they have. Asking for their help is better than corrupting the database.

After purging these logs, the SEAT size should look like this:

[Screenshot: SEAT disk usage after the purge]

 

I hope this has been informative. Thank you for reading.