I have a cluster with various fileservers that are sometimes used for compute instead of the general compute nodes in slurm that I would like to be able to launch jupyter notebooks on. I suppose I could put them in slurm but given the “linux_host” adapter, I was wondering if there was a way to do i without adding these fs’s to slurm?
There’s a little complexity here though given how different adapters require fields. That is, if you were to say set a slurm cluster A and a linux host adapter B they’d want very different things in the native field of the submit.yml.erb. You can check this out as a reference on how to toggle these. At one point we had a Slurm cluster and a Torque cluster at the same time, so we accessed this information through OodAppkit.clusters[cluster].job_config[:adapter] and submitted different native args based on that.
@jeff.ohrstrom I am getting pretty far with this but am now stuck with the apps launching correctly (I can see jupyter running as my user on the target host) and I can manually enter the url so the reverse proxy works but the apps in the portal go right to completed with no connect button.
I am guessing this is because the connector is failing to communicate with the process but I cannot seem to figure out what is blocking it? Any ideas?
Here’s a troubleshooting section. I’ve noticed a similar behaviour and have that section here, where ‘it just exists immediately’. There are steps to debug, but as an off the top guess, I’d scrutinize the submit_host and the ssh_hosts. ssh_hosts should be any host the submit_host can DNS resolve to.
Do I need to run an app intended for a linux host in a specific container like the wiki states by adding singularity_container: /usr/local/modules/netbeans/netbeans_2019.sif line to the native override in the submit.yml.erb in an app?
Still trying to figure out why these jobs are going straight to completed even though the singularity container and its internal process are running fine. I can even manually enter the uri to redirect to the jupyterlab that is running on the node.