We got the Host Adapter working in our OOD, and it works as it should, which is great. However, to make it fit our setup better, it would be good to consider a few enhancements and settings changes, which I describe below.
Before I do that, let me describe our setup. We have 8 “interactive” hosts, called Friscos (frisco1-8), that are independent of all clusters but use the same OS image as a cluster interactive node. They are set up so that users can do interactive work without having to run a job. These 8 hosts don’t all have the same hardware. We enforce usage limits on these nodes with Arbiter, https://gitlab.chpc.utah.edu/arbiter2/arbiter2, so we don’t anticipate needing OOD’s enforcement options.
Round-robin hostname requirement. It would be good not to make it mandatory (as it seems to be now). Our Friscos don’t round-robin. Currently, if I set submit_host: “frisco1”, the host adapter job only goes to frisco1, not to frisco2-8.
Make it possible for the user to choose the host to run on. E.g., we could have a pull-down menu that lists the ssh_hosts from the host adapter’s clusters.d YAML file.
Or, perhaps even in the current host adapter implementation, it would be possible to override the choice of host to SSH into with an input from form.yml?
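To make the host-choice idea concrete, here is a minimal Python sketch of the selection logic I have in mind. It is not OOD code: `FRISCOS`, `choose_host`, and the random fallback are names and choices I made up for illustration, and how the form value would actually reach the adapter is exactly the open question above.

```python
import random

# Hypothetical allow-list; in practice this would mirror the ssh_hosts
# entry of the host adapter cluster config.
FRISCOS = [f"frisco{i}" for i in range(1, 9)]


def choose_host(requested=None, hosts=FRISCOS, rng=random):
    """Return the host a session should be launched on.

    If the user picked a host in the form, honor it, but only when it is
    on the allow-list. Otherwise fall back to a random host, which
    spreads sessions without needing a round-robin DNS name.
    """
    if requested:
        if requested not in hosts:
            raise ValueError(f"{requested!r} is not an allowed host")
        return requested
    return rng.choice(hosts)
```

Checking the requested name against the list matters because the value would come from user input and end up in an SSH command.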
Any thoughts on limiting the number of sessions per user on each host, or would this be too complicated to implement in OOD?
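The check itself would be simple; the hard part is presumably where OOD would get a reliable list of active sessions. As a rough sketch, assuming such a list is available as (user, host) pairs (the function name and the default limit of 2 are my own placeholders):

```python
from collections import Counter


def may_start_session(active_sessions, user, host, limit=2):
    """Return True if `user` has fewer than `limit` sessions on `host`.

    `active_sessions` is an iterable of (user, host) pairs; how such a
    list would be obtained from OOD is left open here.
    """
    per_user_host = Counter(active_sessions)
    return per_user_host[(user, host)] < limit
```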
The containerized host launch works great, especially with the bind mounts of the system directories. However, because we run in a container, we don’t see other users’ processes on the (shared) system. We usually recommend that people on the Friscos check the load with tools like “top” to get an idea of how busy the system is. When “top” is run from inside the container, they won’t see the whole system load, which may make them think the system is less busy than it actually is. I can’t think of a tool that would be able to see the host processes from inside a container.
This is not a big issue for us since we enforce usage limits with Arbiter, but it will be confusing to users who try to use the Friscos via OOD the same way they were used to with direct SSH.
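One possible partial workaround: as far as I know, the load average in /proc/loadavg is kernel-wide on Linux and is not restricted by the container’s PID namespace (unless something like lxcfs overrides it), so it should still give the overall load even when individual processes are hidden. I have not verified this in our container setup. A small Python sketch, with function names of my own choosing:

```python
import os


def parse_loadavg(text):
    """Return the 1-, 5-, and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def describe_load(text, ncpus):
    """Summarize the 1-minute load relative to the number of CPUs."""
    one, _, _ = parse_loadavg(text)
    return f"1-min load {one:.2f} on {ncpus} CPUs ({one / ncpus:.2f} per CPU)"


def host_load_summary(path="/proc/loadavg"):
    """Read the load average file and describe it for this machine."""
    with open(path) as f:
        return describe_load(f.read(), os.cpu_count() or 1)
```

Calling `print(host_load_summary())` inside the session would print a one-line summary. It does not show who is using the machine, only how busy it is.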
Now a couple of questions:
What’s the difference between site_timeout and bc_num_hours?
How can we customize the descriptions in /var/www/ood/apps/sys/bc_desktop to be specific to the Linux host adapter?