During my Cloud Director 10.4.1 to 10.5 upgrade, I repeatedly hit an issue during the database upgrade step:
Unable to upgrade the database: org.postgresql.util.PSQLException: ERROR: could not open shared memory segment "/PostgreSQL.839369758": No such file or directory
Or sometimes this:
Unable to upgrade the database: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend
I dug through many, MANY logs. At first I thought it was the resources on my VCD cell, so I increased CPU and RAM allocations and the upgrade was successful. But it made no sense. 32GB RAM required for such a small lab?
I reverted the upgrade (thank snapshots, Batman!) and found that a third attempt failed with the same messages. No rhyme or reason. I tried the upgrade at least a dozen times with slight changes in resourcing, checking the logs each time, and still couldn’t find the explicit cause. Some attempts succeeded without any changes to my steps (I wrote down each attempt towards the end, as I was going crazy).
I spent days trying to figure it out. I wasn’t very trusting of the results Google was giving me, as this is a packaged appliance and there are always configurations made that may or may not align with some random person’s suggestion on Stack Overflow.
Looking at some internal docs, an engineer found (on Stack Overflow of all places) that the first error above (and most likely the second is related) was due to a systemd-logind setting, RemoveIPC. By default it removes a user’s IPC objects, including shared memory segments, once that user has fully logged out. Setting it to no and restarting the VCD cell worked, REPEATEDLY!
- On your VCD cell, open `/etc/systemd/logind.conf`. Every line will be commented out (it was on my cell).
- Look for `#RemoveIPC=yes`.
- Uncomment it and change it to `RemoveIPC=no`.
- Restart the cell: `shutdown -r now`.
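If you would rather script the edit than open the file by hand, the steps above can be sketched roughly as follows. This is a minimal sketch, not something from the appliance documentation: the function name `set_remove_ipc_no` is made up for illustration, it assumes GNU `sed` (for `-i`), and it assumes the file already contains a `RemoveIPC=` line, commented or not. It keeps a `.bak` copy next to the file so the change is easy to undo.

```shell
# Hypothetical helper: set RemoveIPC=no in a logind.conf-style file.
# The path is a parameter so the edit can be tried on a copy first.
set_remove_ipc_no() {
    conf="${1:-/etc/systemd/logind.conf}"

    # Keep a backup next to the original before touching it.
    cp -p "$conf" "$conf.bak" || return 1

    # Replace the RemoveIPC line, whether or not it is commented out.
    sed -i 's/^[#[:space:]]*RemoveIPC=.*/RemoveIPC=no/' "$conf" || return 1

    # Print the resulting line; fails if no RemoveIPC line was present.
    grep -x 'RemoveIPC=no' "$conf"
}

# On the cell, as root (a reboot follows, as in the manual steps):
#   set_remove_ipc_no /etc/systemd/logind.conf
#   shutdown -r now
```

Running the function against a copy of the file first is a cheap way to confirm the result before changing the real one on the cell.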
References
- Internal source (the legendary Staff Engineer Guruprasad Karanth), which led to: https://superuser.com/questions/1117764/why-are-the-contents-of-dev-shm-is-being-removed-automatically/1179962#1179962