Lock files for VNC-based apps not being cleaned up

For VNC-based apps, a lock file is placed at /tmp/.X*-lock on the compute node on which the session runs (where * is the display number). When sessions end normally, these lock files are not removed. Once all the display numbers (1 to 99) are exhausted, it is no longer possible to create new VNC-based sessions until the old lock files are removed manually. Attempting to create a session then fails with the error “vncserver: no free display number on node-name”.
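For reference, a quick way to see the symptom on a node (paths are the X11 defaults) is to compare the leftover lock files against the Xvnc processes actually running:

# lock files left behind on the node
ls -l /tmp/.X*-lock

# Xvnc processes actually running, for comparison
pgrep -a Xvnc

Any lock file whose display number has no matching Xvnc process is stale.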

We are using the turbovnc v2.2.4 package (from the TurboVNC repo) on RHEL 7.7.

How can we find/correct this problem?

I looked into this for a while and it seems it’s an issue of how the user exits and whether vncserver can clean up after itself. If I log out of the VNC session, it cleans up. If I delete the session (have the scheduler delete the job), it removes the file.

Now I’m wondering why this works well for us. We do have a few files hanging about in /tmp, but it’s unclear how to replicate this; for the most part the default behaviour of the job script and vncserver works well for us. If the session dies, it generally removes the temp file for us. We’re running TurboVNC Server (Xvnc) 64-bit v2.1.90 (build 20180822), so we run a lower version than you.

Looks like we have hooks like vnc_clean to override the one-liner given (vncserver -list | awk '/^:/{system("kill -0 "$2" 2>/dev/null || vncserver -kill "$1)}'). This runs both before the job runs and after it exits, so maybe before vncserver starts is the right place to do what you’re looking for.
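Broken out with comments, that one-liner does roughly this (same logic, just reformatted):

# `vncserver -list` prints one line per session, like ":1  12345",
# so $1 is the display and $2 is the Xvnc PID
vncserver -list | awk '/^:/ {
  # kill -0 only tests whether the PID is still alive; if it is not,
  # let vncserver itself clean up that display
  system("kill -0 " $2 " 2>/dev/null || vncserver -kill " $1)
}'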

Here’s where in your clusters.d file you could set it up so that it gets enabled on all your VNC-related jobs.

v2:
  batch_connect:
    basic:
      script_wrapper: "module restore\n%s"
    vnc:
      script_wrapper: "module restore\nmodule load ondemand-vnc\n%s"
      vnc_clean: |
        # here you can do something more elaborate than what's given.

As a first pass, here’s something that seems to work in plain sh (without having to use bash arrays).

Now, a user can only remove their own files, but something like this will ensure that if a session doesn’t exist but its lock file does, the user will remove the file. An admin can run it with sudo to initialize the node to a clean state, and then users will start to clean up after themselves over time.
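One caveat on the sudo pass: vncserver -list reads the invoking user’s own session records, so run as root it may not see other users’ live sessions. A minimal sketch of an alternative for the admin-side sweep, assuming the standard X11 convention that the lock file contains the display server’s PID:

# admin-side sweep (run as root): remove lock files whose recorded
# Xvnc PID is no longer alive; /tmp/.X<display>-lock holds the PID
# of the display server, per X11 convention
for f in /tmp/.X*-lock; do
  [ -e "$f" ] || continue            # glob matched nothing
  pid=$(tr -d '[:space:]' < "$f")    # strip the padding around the PID
  kill -0 "$pid" 2>/dev/null || rm -f "$f"
done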

I wrote the snippet below to keep things fairly inline, so if you want to make it larger or add comments you could write it out to a file and set the command to execute that file, like vnc_clean: 'sh /opt/ood/vnc_cleaner'.

# lock files on the node, and the displays of sessions vncserver still knows about
x11_files=$(ls /tmp/.X*-lock 2>/dev/null)
x11_sessions=$(vncserver -list | awk '/^:/ { print $1 }' | tr -d :)

for f in $x11_files
do
  # pull the display number out of the file name, e.g. /tmp/.X5-lock -> 5
  file_number=$(basename "$f" | sed -E 's/\.X|-lock//g')
  rm_file=true
  for session in $x11_sessions; do
    # a live session still owns this display, so keep its lock file
    [ "$file_number" = "$session" ] && rm_file=false
  done
  [ "$rm_file" = true ] && rm -f "$f" 2>/dev/null
done
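If you do write it out to a file (the /opt/ood/vnc_cleaner path above is just an example), wiring it into the clusters.d config would look something like:

v2:
  batch_connect:
    vnc:
      vnc_clean: 'sh /opt/ood/vnc_cleaner'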

We’ll look into this for our system, thanks!

In the sample clusters.d file you show, it appears that VNC is installed as a module. We have the package installed outside of the module system, and the clusters.d file sets the path to it:

  batch_connect:
    basic:
      script_wrapper: |
        module purge
        %s
    vnc:
      script_wrapper: |
        module purge
        export PATH="/opt/TurboVNC/bin/:$PATH"
        export WEBSOCKIFY_CMD="/usr/bin/websockify"
        %s

Could this have any effect on cleanup for VNC?

No, I don’t think so.

The only thing I can think of is the version difference (you run 2.2.4 and we run 2.1.90); maybe a bug was introduced somewhere in between? But I find that unlikely.