Hi all,
we moved to OOD 1.8 and are testing LSF multi-cluster.
Based on a previous discussion: LSF multi-cluster environment deleting panel - #26 by fenz
We set up separate cluster configurations and wanted to pass the “-clusters” option when submitting the job that starts the app, but this does not seem to work. I’ll start by describing what I think the issue is.
This code is used to get the “cluster” option to pass to LSF: ood_core/batch.rb at master · OSC/ood_core · GitHub
and here is where the “value” is used: ood_core/batch.rb at master · OSC/ood_core · GitHub
It seems “cluster_name” gets passed to the “-m” option for every LSF command.
The problem is that LSF is not entirely consistent in how the “same” option is used across different commands.
In the “bsub” command, the “-m” option is used as:
bsub -m "host_name@cluster_name[ | +[pref_level]] | host_group@cluster_name[ | +[pref_level]] | compute_unit@cluster_name[ | +[pref_level]] ..."
BSUB -m option: IBM ref
So you can append “@cluster_name”, but you are then specifying a host within a cluster, not an entire cluster.
The correct option for bsub to specify a cluster would be:
bsub -clusters "all [~cluster_name] ... | cluster_name[+[pref_level]] ... [others[+[pref_level]]]"
BSUB -clusters option: IBM ref
That is what happens with the “bsub” command.
In the “bjobs” command, by contrast, the “-m” option is the right one for selecting a cluster:
bjobs -m host_name ... | -m host_group ... | -m cluster_name ...
BJOBS -m option: IBM ref
since it takes either the host_name or host_group or cluster_name.
And now our problem.
If we don’t specify the “-clusters” option in our “submit.yml”, we get an error like “bad host specification” (since it runs something like “bsub -m cluster_name”). If we do specify the “-clusters” option, we get an error like “can’t use -m option with -clusters” (since it runs something like “bsub -m cluster_name -clusters cluster_name”).
For multi-cluster specifically, I guess it would be better to use “-clusters cluster_name” with the “bsub” command and “-m cluster_name” with the “bjobs” command. But I’m not sure whether this would break anything else.
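To make the idea concrete, here is a minimal Ruby sketch of choosing the flag per command. It is not the actual ood_core code: `cluster_args` is a hypothetical helper name I made up for illustration, and commands other than bsub and bjobs are deliberately left alone because I have not checked how each of them treats a cluster name.

```ruby
# Hypothetical helper, NOT part of ood_core: return the arguments that
# select a whole cluster for the given LSF command.
def cluster_args(command, cluster)
  # No cluster configured: add nothing to the command line.
  return [] if cluster.nil? || cluster.to_s.empty?

  case command.to_s
  when "bsub"
    # bsub -m expects hosts or host groups, so use -clusters instead.
    ["-clusters", cluster.to_s]
  when "bjobs"
    # bjobs -m accepts a cluster name directly.
    ["-m", cluster.to_s]
  else
    # Assumption: other commands are left untouched in this sketch.
    []
  end
end

# Build the example command lines as argument arrays and print them.
puts((["bsub"] + cluster_args("bsub", "cluster_name")).join(" "))
puts((["bjobs"] + cluster_args("bjobs", "cluster_name")).join(" "))
```

This only covers the flag selection. If “submit.yml” also passes “-clusters” itself, the two sources would still need to be reconciled somewhere, which the sketch does not attempt.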
Any thoughts?