LSF multi-cluster environment deleting panel

You’re fine. My explanation was a bit wrong, so I went ahead and tested option 1 and confirmed the behavior.

Here is the flow for finding a job and querying it, so please refer back to it if needed:

When OOD finds a job with cluster_id = rka (on any site: rid, rka, my cluster at OSC, wherever), it will attempt to create an adapter for that cluster id (rka). It looks for a file called rka.yml, because the cluster_id was rka: the cluster_id is the filename, and the filename is the cluster_id. This is true both when you create the job and when you come back to query for it. It then tries (tries!) to create the adapter from that file.
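To make that concrete, here is a minimal sketch of what the lookup resolves to. The clusters.d path is the usual OOD location, and the title and adapter values are placeholders rather than your actual config:

```yaml
# /etc/ood/config/clusters.d/rka.yml
# The filename (rka.yml) must match the cluster_id (rka).
v2:
  metadata:
    title: "RKA"       # placeholder display name
  job:
    adapter: "lsf"     # placeholder; other LSF settings (bindir, etc.) omitted
```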

(option 1)
If it can’t find the cluster configuration file (say you’ve logged into rid and there is no rka configuration), it’ll get confused and create a panel for this job in an “Undetermined State”. The panel has a delete button, but it won’t work, and it says to contact support. OOD can’t delete the job because it doesn’t know how to: on rid it has no idea how to interact with the rka cluster, whether it’s SLURM or Torque or whatever.

(option 2)
If it does find the file rka.yml, it’ll read the configuration and use it.

(option 2 - bad)
Since this is LSF, it looks for v2.job.cluster to be populated. If it’s not populated, it won’t use the -m option when running bjobs. This is problematic because the bjobs command executes successfully, but LSF says “that job doesn’t exist” (because you end up querying rid for an rka job), so OOD deletes the panel.
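For example, here is a sketch of an rka.yml (same assumed layout as above) that triggers this deletion behavior because it omits v2.job.cluster:

```yaml
# /etc/ood/config/clusters.d/rka.yml on rid, missing v2.job.cluster
v2:
  metadata:
    title: "RKA"
  job:
    adapter: "lsf"
    # no "cluster:" key here, so bjobs runs without -m,
    # the local (rid) LSF reports the job as not found,
    # and OOD deletes the panel
```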

(option 2 - good)
Since this is LSF, it looks for v2.job.cluster to be populated. If it is, it’ll use it as the -m argument when running bjobs. If the rka.yml file has v2.job.cluster: "rka", it will run bjobs with -m rka. This means you’ll be able to view RKA jobs on RID.
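So the fix is to populate that key. A sketch of the same assumed rka.yml with the multi-cluster option set:

```yaml
# /etc/ood/config/clusters.d/rka.yml, with the cluster name populated
v2:
  metadata:
    title: "RKA"
  job:
    adapter: "lsf"
    cluster: "rka"     # OOD passes this through as "bjobs -m rka"
```

With that in place, the status query is directed at the rka cluster even when you’re logged into rid, so the panel shows the real job state instead of the job being treated as gone and deleted.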