LSF multi-cluster environment deleting panel

You’re fine. My explanation was a bit wrong, so I went ahead and tested option 1 and confirmed the behavior.

Here is the flow for finding a job and querying it, so please refer back to it if needed:

When OOD finds a job with cluster_id = rka (on any site: rid, rka, my cluster at OSC, wherever), it will attempt to create an adapter for that cluster id (rka). It looks for a file called rka.yml, because the cluster_id was rka: the cluster_id is the filename, and the filename is the cluster_id. This is true both when you create the job and when you come back to query for it. It then tries (tries!) to create the adapter from that file.
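To make that concrete, here is a minimal sketch of what the lookup resolves to. The clusters.d path is the usual OOD location, and the title and adapter values are placeholders rather than your actual config:

```yaml
# /etc/ood/config/clusters.d/rka.yml
# The filename (rka.yml) must match the cluster_id (rka).
v2:
  metadata:
    title: "RKA"       # placeholder display name
  job:
    adapter: "lsf"     # placeholder; other LSF settings (bindir, etc.) omitted
```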

(option 1)
If it can’t find the cluster configuration file (say you’ve logged into rid and there is no rka configuration), it’ll get confused and create a panel for this job in an “Undetermined State”. The panel has a delete button, but it won’t work, and it says to contact support. OOD can’t delete the job because it doesn’t know how to: on rid it has no idea how to interact with the rka cluster, whether it’s SLURM or Torque or whatever.

(option 2)
If it does find the file rka.yml, it’ll read the configuration and use it.

(option 2 - bad)
Since this is LSF, it looks for v2.job.cluster to be populated. If it’s not populated, it won’t use the -m option when running bjobs. This is problematic because the bjobs command executes successfully, but LSF says “that job doesn’t exist” (because you end up querying rid for an rka job), so OOD deletes the panel.
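For example, here is a sketch of an rka.yml (same assumed layout as above) that triggers this deletion behavior because it omits v2.job.cluster:

```yaml
# /etc/ood/config/clusters.d/rka.yml on rid, missing v2.job.cluster
v2:
  metadata:
    title: "RKA"
  job:
    adapter: "lsf"
    # no "cluster:" key here, so bjobs runs without -m,
    # the local (rid) LSF reports the job as not found,
    # and OOD deletes the panel
```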

(option 2 - good)
Since this is LSF, it looks for v2.job.cluster to be populated. If it is, it’ll use it as the -m argument when running bjobs. If the rka.yml file has v2.job.cluster: "rka", it will run bjobs with -m rka. This means you’ll be able to view RKA jobs on RID.
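So the fix is to populate that key. A sketch of the same assumed rka.yml with the multi-cluster option set:

```yaml
# /etc/ood/config/clusters.d/rka.yml, with the cluster name populated
v2:
  metadata:
    title: "RKA"
  job:
    adapter: "lsf"
    cluster: "rka"     # OOD passes this through as "bjobs -m rka"
```

With that in place, the status query is directed at the rka cluster even when you’re logged into rid, so the panel shows the real job state instead of the job being treated as gone and deleted.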