We’re running Open OnDemand 1.7 with Slurm.
Occasionally, users report that their interactive jobs are no longer listed in the portal. The job is still running in Slurm, and the job ID is still visible in the job tracker.
We’ve found that when this happens, the interactive job data is missing from:
We can see files for other jobs in that directory, but the “missing” jobs have no matching entry.
How does that directory get managed? My supposition is that at some point the user’s web-server process is reaped, and when they return, maybe Slurm is slow to respond, so something partially cleans the directory… Does that sound plausible, or is something else in play here?