Launch app on specific host skipping the scheduler

Hi All,

I have a cluster with various fileservers that are sometimes used for compute instead of the general Slurm compute nodes, and I would like to be able to launch Jupyter notebooks on them. I suppose I could put them in Slurm, but given the “linux_host” adapter, I was wondering if there is a way to do it without adding these fileservers to Slurm?

Yes, I think the linux_host adapter may suit this use case, because it’s meant for login nodes, or at least nodes that are not part of the general compute infrastructure.

It does depend on a few things, like tmux and Singularity being available on the destination server, so keep that in mind.
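For reference, a minimal linux_host cluster definition might look like the sketch below. The hostname, paths, and image are placeholders, and the `tmux_bin`/`singularity_*` keys are where the tmux and Singularity dependencies mentioned above come in:

```yaml
# /etc/ood/config/clusters.d/fs01.yml -- sketch only; hostnames and paths are placeholders
---
v2:
  metadata:
    title: "Fileserver fs01"
  login:
    host: "fs01.example.com"
  job:
    adapter: "linux_host"
    submit_host: "fs01.example.com"   # host OnDemand ssh'es into to launch jobs
    ssh_hosts:
      - "fs01.example.com"            # every name submit_host may resolve to
    site_timeout: 7200                # seconds before the session is cleaned up
    singularity_bin: "/usr/bin/singularity"
    singularity_bindpath: "/etc,/home,/tmp,/var"
    singularity_image: "/opt/ood/linuxhost.sif"
    strict_host_checking: false
    tmux_bin: "/usr/bin/tmux"
```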

@jeff.ohrstrom The “cluster” setting in an app’s form.yml is immutable from the form page, right? i.e., I cannot dynamically change its value with a form field/JavaScript?

No, you have a couple of options here in 1.8+.

Adding additional clusters to the cluster attribute will give you a dropdown, which you can interact with through JavaScript.
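In form.yml that is just a list under the cluster attribute; the names below are placeholders for whatever clusters are defined in your clusters.d directory:

```yaml
# form.yml -- sketch; cluster names are placeholders
---
cluster:
  - "my_slurm_cluster"
  - "my_fileserver"
```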

There’s a little complexity here though, given that different adapters require different fields. That is, if you were to set up a Slurm cluster A and a linux_host cluster B, they’d want very different things in the native field of the submit.yml.erb. You can check this out as a reference on how to toggle these. At one point we had a Slurm cluster and a Torque cluster at the same time, so we read the adapter type through OodAppkit.clusters[cluster].job_config[:adapter] and submitted different native args based on that.
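A sketch of that toggle in submit.yml.erb, assuming the form’s cluster value maps to a clusters.d entry; the Slurm arguments and the num_cores form attribute are illustrative, not required:

```yaml
# submit.yml.erb -- sketch; adapter check follows the pattern described above
---
script:
  <%- if OodAppkit.clusters[cluster].job_config[:adapter] == "slurm" -%>
  # Slurm wants scheduler arguments in native; omit the block entirely
  # for the linux_host adapter so nothing extra is passed.
  native:
    - "--nodes=1"
    - "--ntasks-per-node=<%= num_cores %>"
  <%- end -%>
```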


I think this is exactly what I needed. The key was that support for multiple clusters in the cluster attribute of form.yml is already built in!

Thanks for the submit.yml.erb snippet as well; that is how I will handle the native bits when submitting to Slurm, and pass nothing when submitting to the linux_host adapter!

@jeff.ohrstrom I am getting pretty far with this but am now stuck: the apps launch correctly (I can see Jupyter running as my user on the target host), and if I manually enter the URL the reverse proxy works, but the cards in the portal go straight to Completed with no Connect button.

I am guessing this is because the connector is failing to communicate with the process, but I cannot figure out what is blocking it. Any ideas?

Here’s a troubleshooting section. I’ve noticed similar behaviour and cover it there, under “it just exits immediately”. There are steps to debug, but as an off-the-top guess I’d scrutinize the submit_host and the ssh_hosts. ssh_hosts should include any hostname the submit_host can resolve to.
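Concretely, that means ssh_hosts in the cluster config should enumerate every name the submit_host might show up as; a sketch with placeholder hostnames:

```yaml
# clusters.d entry fragment -- hostnames are placeholders
job:
  adapter: "linux_host"
  submit_host: "fs01.example.com"
  ssh_hosts:
    - "fs01.example.com"   # fully qualified name
    - "fs01"               # short name, if the submit_host resolves it too
```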

Hi @jeff.ohrstrom

Do I need to run an app intended for a Linux host in a specific container, as the wiki states, by adding a singularity_container: /usr/local/modules/netbeans/netbeans_2019.sif line to the native override in the app’s submit.yml.erb?

No, I think we run a base centos:7 image and just mount in everything we need. We really use it for process management more than anything else.

So you could either use a basic image and mount in what you need (like we do for code-server), or use a specific image that holds what you need and requires fewer bind mounts. Totally up to you.
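Either choice ends up in the cluster config; a sketch with placeholder image paths (the NetBeans image path is the one from the question above):

```yaml
# clusters.d entry fragment -- image paths are placeholders
job:
  adapter: "linux_host"
  # Option 1: basic image, bind-mount in what the app needs
  singularity_image: "/opt/ood/base-centos7.sif"
  singularity_bindpath: "/etc,/home,/tmp,/usr/local/modules"
  # Option 2: purpose-built image with fewer bind mounts
  # singularity_image: "/usr/local/modules/netbeans/netbeans_2019.sif"
```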

Ok cool.

Still trying to figure out why these jobs go straight to Completed even though the Singularity container and its internal process are running fine. I can even manually enter the URI to reach the JupyterLab instance running on the node.

I can’t quite find where in the code the state is being determined. I see that ood_core/status.rb at master · OSC/ood_core · GitHub handles state info for other parts to query, but I don’t see the logic that performs the actual test.

Do you happen to know off the top of your head what is being tested/queried on the node to determine state? I’m guessing it’s checking the PID of the singularity command?