LSF job termination on VNC logout

Hi All,
Why is the corresponding LSF job not terminated when I log out of an interactive desktop session? Logging out stops X and the VNC server on the worker node, but doesn’t stop the corresponding LSF job. The job still shows up in OOD, but any attempt to reconnect to that session fails.
See screenshots below:


It works as expected with Slurm, but not with LSF.
Does anyone have an idea?
Thanks and best regards

It’s unclear to me why that would be the case in LSF. Can you tell me which processes (if any) are still running on the compute node after you log out? I mean processes owned by you, from that job. A pstree from the top process (the one with the lowest PID) would help too.

It seems to me LSF thinks that something is still running. Why it thinks that is what we’d have to determine. I’m not familiar with how LSF works, for example whether there’s a daemon on the compute node. Maybe lsof 2>/dev/null | grep $USER | grep deleted (a list of deleted files that a process still holds open) will show something? That’s kind of guessing in the dark though; I’d hope the process list would show us more.
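In case it helps, here is a minimal sketch for collecting that process information on the compute node. It assumes a Linux node with procps ps, and pstree from psmisc if available. The function names (lowest_pid, list_user_procs, tree_from_lowest) are made up for illustration and are not part of LSF or OOD.

```shell
#!/usr/bin/env bash
# Sketch: list your own processes on the compute node and print the tree
# under the lowest PID. Assumes procps `ps`; `pstree` is optional.

# Print the smallest PID from a list of PIDs on stdin (one per line).
lowest_pid() {
    tr -d ' \t' | sort -n | head -n 1
}

# Show every process owned by a user, with parent, process-group and
# session IDs, sorted by PID. Defaults to the current user.
list_user_procs() {
    local user="${1:-$(id -un)}"
    ps -u "$user" -o pid,ppid,pgid,sid,stat,etime,args --sort=pid
}

# Print the process tree rooted at the user's lowest PID.
tree_from_lowest() {
    local user="${1:-$(id -un)}"
    local top
    top="$(ps -u "$user" -o pid= | lowest_pid)"
    [ -n "$top" ] || return 1
    if command -v pstree >/dev/null 2>&1; then
        pstree -p "$top"
    else
        echo "top pid: $top (pstree not installed)"
    fi
}

# Usage, after logging out of the desktop session:
#   list_user_procs
#   tree_from_lowest
```

Keep in mind that the lowest PID is only a rough guess at the top process, since PIDs can wrap around; the PPID column from list_user_procs is the more reliable way to find the parent.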

This could be due to LSF tracking a VNC process that has not ended. You can issue the following command:

bjobs -l jobid

You should find a list of PPIDs and PIDs. On the host in question, look for those PIDs and PPIDs using the ‘ps -ef | grep pid’ command, substituting each process ID for pid.
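The check can be sketched as a small shell helper, assuming you copy the process IDs by hand from the bjobs -l output and run it on the execution host. The names pid_exists and check_pids are hypothetical, and the /proc fallback assumes Linux.

```shell
#!/usr/bin/env bash
# Sketch: report which of the given PIDs still exist on this host.
# PIDs are passed as arguments, e.g. copied from the `bjobs -l` output.

pid_exists() {
    # kill -0 sends no signal; it only tests whether we could signal the PID.
    # It fails for processes owned by other users, so fall back to /proc.
    kill -0 "$1" 2>/dev/null || [ -d "/proc/$1" ]
}

check_pids() {
    local pid
    for pid in "$@"; do
        if pid_exists "$pid"; then
            echo "$pid alive"
        else
            echo "$pid gone"
        fi
    done
}

# Usage (the PIDs here are placeholders):
#   check_pids 25678 25679
```

Any PID reported as alive after logout is a candidate for what LSF is still tracking; if all of them are gone, the job state itself is stale.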

With VNC, you have to be careful, as it likes to daemonize processes, which LSF will continue to track. The way to avoid this is to prevent VNC from daemonizing the commands by enabling:

LSB_RESOURCE_ENFORCE="cpu gpu memory"

in your lsf.conf, then restart the LSF daemons. After that, however, VNC may not be happy.

If, on the other hand, ‘bjobs -l’ shows no PIDs or PPIDs, or those PIDs and PPIDs don’t exist, you may have stumbled on an LSF bug. In that case, restart sbatchd on the host, then open a ticket with support or apply the latest LSF service pack.