Rotaugenlaubfrosch,
This sounds similar to an issue I had when setting up OOD with LSF. In my case it was a race condition: if LSF didn’t show a submitted job via ‘bjobs’ immediately after submission, the startup procedure would never continue.
My solution was to write a wrapper script for the bjobs command that sleeps for a few seconds before running the actual ‘bjobs’ command, and then point to it in cluster.yml:
job:
  bin_overrides:
    bjobs: "/path/to/bjobs/wrapper"
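For illustration, here is a minimal sketch of such a wrapper, assuming bash is available on the OnDemand host. The real bjobs path (/usr/bin/bjobs), the 3-second default delay, and the REAL_BJOBS and BJOBS_DELAY environment variables are hypothetical choices made for this example, not taken from my actual setup, so adjust them for your LSF installation.

```shell
#!/usr/bin/env bash
# Hypothetical bjobs wrapper for the cluster.yml bin_overrides entry.
# Assumptions (not from the original setup): the real bjobs binary lives at
# /usr/bin/bjobs and a 3-second pause is enough for LSF to register a new job.
# Both can be overridden via the REAL_BJOBS and BJOBS_DELAY environment variables.
REAL_BJOBS="${REAL_BJOBS:-/usr/bin/bjobs}"
BJOBS_DELAY="${BJOBS_DELAY:-3}"

bjobs_wrapper() {
  # Give LSF a moment to register a job that was just submitted.
  sleep "$BJOBS_DELAY"
  # Pass every argument through unchanged so OnDemand sees the normal
  # bjobs output; the function returns bjobs' own exit status.
  "$REAL_BJOBS" "$@"
}

# The script's exit status is that of this last command, i.e. bjobs' status.
bjobs_wrapper "$@"
```

Remember to make the file executable (chmod +x) and point bin_overrides at its absolute path. Note that every bjobs call OnDemand makes, including routine status polling, goes through the wrapper, so each one pays the delay; keeping it as short as reliably works is worthwhile.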
The delay was long enough for the launched job to be found, and the startup worked afterwards: