OOD 1.8, apps aren't loading, looks like ssh challenge error?

Fresh OOD 1.8 with Dex installation on RHEL 8.2 with PBS. Users can log in to OOD. Shells can be accessed via OOD. From the command line (both via the OOD shell and a regular terminal), qstat works for root and for users.

When I try to start a job via an interactive session app, I get this error in /var/log/ondemand-nginx/user/error.log. The IP address is correct. The error shown to the user in browser is essentially the same.

App 2078 output: [2020-09-17 12:06:23 +1000 ] ERROR "ERROR: OodCore::JobAdapterError - Warning: Permanently added 'pbs.domain.com.au,129.x.x.x' (ECDSA) to the list of known hosts.\r\nuser@pbs.domain.com.au: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password,hostbased)."

Any idea what I’ve done wrong?

I found this post which encouraged adding the keys to /etc/ssh/ssh_known_hosts. Having now done that for the host pbs.domain.com.au, I just get the balance of the error:

user@pbs.domain.com.au,: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password,hostbased).

Shouldn’t Dex be looking after this? In 1.7 we used basic auth, and all this was looked after for us.

I’m looking at my working system (OOD 1.7, CentOS 8.2, Basic Auth via PAM and SSSD) and I’m wondering if my new setup is failing because, on the old system, the SSSD authentication carried through to the cluster - it’s the same auth system we use there. Dex is configured against the same AD, but the cluster doesn’t know about Dex. I had assumed it wouldn’t need to - but maybe it does?

I don’t think this has to do with auth or PAM or SSSD or Dex.

First I want to make sure this is expected. That is, that you expect OOD to ssh into pbs.domain.com.au and submit the job from that node and not from the OOD web server node. Some folks do this so they can keep the environment and binaries on another host: they ssh into that host and run qsub and so on there.

If that is the case - that you want to ssh into another node to execute the qsub command - then you probably need to set up host-based authentication between the OOD web server and this remote host. Otherwise every single user will have to generate their own keys, and that’s a big pain. There are lots of resources on the web on how to do this; just search for ‘host based authentication’. The remote host already offers it (hostbased is listed in the error message), so now you just need to set up trust by creating and adding the keys.
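As a rough sketch of what host-based authentication usually involves with stock OpenSSH (not a complete recipe): the file paths below are the OpenSSH defaults, and ood.domain.com.au is a placeholder for the OOD web server's hostname, which isn't given in this thread. Check the details against your own distribution before relying on it.

```text
# --- On the PBS host (pbs.domain.com.au) ---

# /etc/ssh/sshd_config
# (likely already set, since "hostbased" appears in the error message)
HostbasedAuthentication yes

# /etc/ssh/shosts.equiv
# One trusted client hostname per line (placeholder name for the OOD web server)
ood.domain.com.au

# /etc/ssh/ssh_known_hosts
# Must contain the public host key of the OOD web server.

# --- On the OOD web server ---

# /etc/ssh/ssh_config
# EnableSSHKeysign belongs in the global section, before any Host block.
EnableSSHKeysign yes

Host pbs.domain.com.au
    HostbasedAuthentication yes
```

After changing sshd_config, sshd on the PBS host needs a reload. Running ssh with -v as an ordinary user from the OOD web server should then show whether the hostbased method is being attempted and accepted.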

If this is not the case - that is, you want to execute qsub on the OOD web server itself - then you need to check whether bin_overrides (in the cluster.d config file) points to a wrapper script that is sshing. Alternatively, you may have submit_host set.
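For illustration, these are the two settings to look for in the cluster config. This is a sketch, not your actual file: the file name, the exec path and the wrapper path are all made up, and only the submit_host and bin_overrides keys are the point here.

```yaml
# /etc/ood/config/clusters.d/my_cluster.yml  (hypothetical file name)
---
v2:
  job:
    adapter: "pbspro"
    host: "pbs.domain.com.au"
    exec: "/opt/pbs"                        # assumed PBS install prefix
    # If this is set, OOD will ssh to this host to run qsub/qstat.
    submit_host: "pbs.domain.com.au"
    # If this is set, OOD runs the wrapper instead of the real qsub;
    # check whether the wrapper itself calls ssh.
    bin_overrides:
      qsub: "/usr/local/bin/qsub_wrapper"   # hypothetical wrapper path
```

If neither key is present, the job commands should be running locally on the web server, and the ssh attempt is coming from somewhere else.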