We have the cluster config's host set to a cluster login node, and both Clusters -> Shell Access and Job Composer -> Open Terminal open a terminal on this login node (as expected). From the file browser, however, it is possible to open a terminal on the OnDemand host. Which hosts are users intended to have shell access to? Is it possible to configure OOD to offer shell access only to the cluster login node, and not the OnDemand host (possibly with the file browser also using the login node)? We don’t have a current need for multi-cluster OOD, but I would like to better understand the OOD-cluster interaction: since we had to configure Slurm on the OOD host to make it similar to a login node for the cluster, how could it be like a login node for more than one cluster (managed by different Slurm instances)?
The file browser defaults to opening the shell at /pun/sys/shell/ssh/default. The default ssh host for the shell app is localhost, but you can change it by setting DEFAULT_SSHHOST in /etc/ood/config/apps/shell/env. See https://osc.github.io/ood-documentation/master/customization.html#set-default-ssh-host
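As a rough illustration, the env file could look something like the fragment below. The hostname is a made-up placeholder, not a value from this thread; substitute your own login node.

```shell
# /etc/ood/config/apps/shell/env
# Hypothetical hostname: replace with your cluster's login node.
DEFAULT_SSHHOST="login.example.edu"
```

With this set, the file browser's terminal link should land on that host rather than on localhost (the OnDemand host).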
Unfortunately, the shell app doesn’t yet read the cluster configs. The apps used to be separate web apps in separate repos, and we recently merged them into a single repo to make development easier. I opened https://github.com/OSC/ondemand/issues/358 to track that.
> how could it be like a login node for more than one cluster
If a single set of client binaries can submit to multiple clusters, then you can just use that. At OSC we currently use Torque + Moab, and a single set of Torque client binaries can be used to submit jobs to multiple clusters, even if the Torque servers vary slightly by version.
If you can’t do that, then another approach is to use ssh wrapper scripts around sbatch, squeue, etc. The wrapper scripts would execute the commands via ssh on another host, such as a login node for that cluster. See https://osc.github.io/ood-documentation/master/installation/resource-manager/slurm.html. You would have one set of wrapper scripts per cluster and use bin_overrides in each cluster config to specify which wrapper scripts that cluster uses. I opened https://github.com/OSC/ood_core/issues/170 to track making this approach easier, without the need for wrapper scripts.
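Here is a minimal sketch of what one such wrapper might look like for sbatch. It is not the script from the documentation. The login host, the remote sbatch path, and the function names (quote_args, run_remote) are all assumptions for illustration, and it presumes users can ssh to the login node without a password prompt.

```shell
#!/bin/bash
# Hypothetical sbatch wrapper for one cluster.
# All names and paths below are placeholders, not values from this thread.
LOGIN_HOST="${LOGIN_HOST:-login.cluster-a.example.edu}"  # assumed login node
REMOTE_BIN="${REMOTE_BIN:-/usr/bin/sbatch}"              # assumed path on that node
SSH_BIN="${SSH_BIN:-ssh}"                                # overridable for dry runs

# Shell-quote every argument so spaces and special characters
# survive being re-parsed by the remote shell.
quote_args() {
  local out="" arg
  for arg in "$@"; do
    out+=" $(printf '%q' "$arg")"
  done
  printf '%s' "${out# }"
}

# Run the real command on the login node. ssh forwards stdin,
# so a job script piped to the wrapper reaches the remote sbatch.
run_remote() {
  "$SSH_BIN" -o BatchMode=yes "$LOGIN_HOST" "$REMOTE_BIN $(quote_args "$@")"
}

# A real wrapper would end with:
#   run_remote "$@"
```

You would keep one copy per command (sbatch, squeue, scancel, and so on) and per cluster, changing LOGIN_HOST and REMOTE_BIN, and then point the sbatch entry under bin_overrides in that cluster's config at the wrapper's path. Setting SSH_BIN=echo prints the ssh arguments instead of connecting, which is handy for checking the quoting.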
There is a site where both the file systems and the schedulers for each cluster are completely different. At this time this model doesn’t work well with OnDemand, which prefers a shared file system between the clusters and the web node, so that site opted to just stand up separate OnDemand instances for each cluster.