User does not exist - recurring for some users

Hi Folks,

Down to one final issue preventing us from going live. Sometimes when a valid user logs on to OOD, they get the following message instead of their OOD home page. It never happens to me, but some users seem to be affected repeatedly.

Error – user does not exist: fredfoo
Run nginx_stage --help to see the full list of command options.

I seem to be able to fix this temporarily by running the command that appears in the log:

Jul 16 17:21:22 vmpr-res-cluster1 sudo: apache : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/ood/nginx_stage/sbin/nginx_stage pun -u fredfoo -a https%3a%2f%2fresearch-cluster.petermac.org.au%3a443%2fnginx%2finit%3fredir%3d%24http_x_forwarded_escaped_uri

Jul 16 17:21:23 vmpr-res-cluster1 sudo: apache : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/ood/nginx_stage/sbin/nginx_stage pun -u fredfoo -a https%3a%2f%2fresearch-cluster.petermac.org.au%3a443%2fnginx%2finit%3fredir%3d%24http_x_forwarded_escaped_uri

Jul 16 17:21:24 vmpr-res-cluster1 sudo: apache : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/ood/nginx_stage/sbin/nginx_stage pun -u fredfoo -a https%3a%2f%2fresearch-cluster.petermac.org.au%3a443%2fnginx%2finit%3fredir%3d%24http_x_forwarded_escaped_uri

Running this fixes it, at least for a short time:

/opt/ood/nginx_stage/sbin/nginx_stage pun -u fredfoo -a https%3a%2f%2fresearch-cluster.petermac.org.au%3a443%2fnginx%2finit%3fredir%3d%24http_x_forwarded_escaped_uri
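The `-a` argument is only percent-encoded, so decoding it shows exactly what nginx_stage receives. A minimal Python 3 sketch using the standard-library `urllib.parse.unquote`; the constant and function names are just for illustration:

```python
from urllib.parse import unquote

# The -a argument copied verbatim from the sudo log lines above.
ENCODED_APP_INIT_URL = (
    "https%3a%2f%2fresearch-cluster.petermac.org.au%3a443"
    "%2fnginx%2finit%3fredir%3d%24http_x_forwarded_escaped_uri"
)


def decode_app_init_url(encoded: str) -> str:
    """Percent-decode the -a argument passed to nginx_stage pun."""
    return unquote(encoded)


print(decode_app_init_url(ENCODED_APP_INIT_URL))
```

It decodes to https://research-cluster.petermac.org.au:443/nginx/init?redir=$http_x_forwarded_escaped_uri, so nothing in this argument is specific to fredfoo; the user comes only from `-u`.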

Our authentication stack is mod_authnz_pam and pwauth, then sssd to IPA, and finally AD, which is our upstream source directory service.

It happens with all browsers.

  1. Can someone please help me understand where in the login process things may be failing, and where to look?

  2. Are there any known issues?

  3. Is there any option to increase the relevant debugging output?

  4. These users can SSH into the login node that OOD runs on without issue.

  5. Stopping and starting the services does not fix the issue.

  6. CentOS 7.7, OOD version 1.6.22, Dashboard v1.35.3.

Thx

Unfortunately, I think that error bubbles up from nginx, though I can’t actually find that string anywhere. I tried to replicate it and got something slightly different. If you check the file /var/lib/ondemand-nginx/config/puns/$USER.conf you’ll see something like user jeff 'jeff'; as the first line. /var/log/ondemand-nginx/$USER/error.log may also have something in it. I believe this error is thrown when nginx tries to start processes as this user.
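To check the per-user config mentioned above, a small sketch can read its first line and confirm that the user and group named there still resolve. This assumes Python 3 on the OOD node; `parse_user_directive` and `check_pun_config` are hypothetical helper names, and the `pwd`/`grp` modules go through the same NSS lookups (and therefore sssd, if it is listed in /etc/nsswitch.conf) that other processes use:

```python
import grp
import pwd
import re
from typing import Optional, Tuple

# Matches an nginx "user" directive such as:  user jeff 'jeff';
# The group is optional and either name may be single-quoted.
_USER_DIRECTIVE = re.compile(r"^\s*user\s+'?([^'\s;]+)'?(?:\s+'?([^'\s;]+)'?)?\s*;")


def parse_user_directive(line: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (user, group) from an nginx 'user' line, or None if it isn't one."""
    match = _USER_DIRECTIVE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


def check_pun_config(conf_path: str) -> str:
    """Check that the user/group on the config's first line resolve via NSS."""
    with open(conf_path, encoding="utf-8") as handle:
        first_line = handle.readline()
    parsed = parse_user_directive(first_line)
    if parsed is None:
        return f"first line is not a user directive: {first_line.strip()!r}"
    user, group = parsed
    problems = []
    try:
        pwd.getpwnam(user)  # raises KeyError if NSS reports no such user
    except KeyError:
        problems.append(f"getpwnam({user!r}) failed")
    if group is not None:
        try:
            grp.getgrnam(group)  # raises KeyError if NSS reports no such group
        except KeyError:
            problems.append(f"getgrnam({group!r}) failed")
    return "; ".join(problems) if problems else f"{user}/{group} resolve OK"
```

For example, `check_pun_config("/var/lib/ondemand-nginx/config/puns/fredfoo.conf")`, run as root if your account cannot read that directory.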

My guess is that you’ll also see errors from sssd or PAM in journalctl or /var/log/messages.

The fact that these users can SSH gives me pause, and the fact that you can temporarily fix the issue even more so. We’ve seen ‘user not found’ type issues before, but generally because the LDAP queries were misconfigured. Our libraries just bubble up whatever errors they come across. And ‘not found’ is more typical of LDAP, whereas “doesn’t exist”, though similar, comes from some other library.

I would suggest these questions for debugging: What’s the difference between authentication via SSH and via OOD (are mod_authnz_pam and pwauth additional hops?). This always works for you, and only sometimes for these other users, so could there be caching somewhere that’s failing? What system errors are being thrown (/var/log/httpd/, sssd, IPA, mod_authnz_pam)?

In fact, I can’t even replicate your issue: if I try on a test instance I get can't find user for foo, which is slightly different. We were somehow able to find the user through getpwnam but failed somewhere else. Maybe that’s another argument for caching in some layer?
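If caching in some layer is the culprit, a lookup that usually works should occasionally fail. A hedged sketch that polls getpwnam for one user and records which attempts failed; the function name and defaults are assumptions, and `lookup` is injectable only so the logic can be exercised without a directory service:

```python
import pwd
import time
from typing import Callable, List


def probe_lookups(
    username: str,
    attempts: int = 60,
    delay_seconds: float = 1.0,
    lookup: Callable[[str], object] = pwd.getpwnam,
) -> List[int]:
    """Call the lookup repeatedly and return the attempt numbers that failed.

    Intermittent failures for a user that SSH or `id` can usually see would
    point at a flaky or expiring cache in the NSS/sssd layer.
    """
    failures = []
    for attempt in range(attempts):
        try:
            lookup(username)
        except KeyError:
            # pwd.getpwnam raises KeyError when NSS reports no such user.
            failures.append(attempt)
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return failures
```

Running it for fredfoo for a few minutes, once as root and once as apache, and comparing the results with what `id fredfoo` reports at the same time may help narrow down which layer drops the user.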

So far, nothing seems to crop up in the logs. I am wondering if we are intermittently losing an argument on the command line, as the error tells us to run --help (see the example in my previous message).
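One way to test the lost-argument theory is to check every logged nginx_stage invocation for its `-u` and `-a` values. A minimal sketch, assuming the sudo log format shown in the first message; sudo does not quote arguments in that line, so plain whitespace splitting is used, and the helper names are hypothetical:

```python
from typing import List, Optional

REQUIRED_FLAGS = ("-u", "-a")


def nginx_stage_args(log_line: str) -> Optional[List[str]]:
    """Extract the argv after 'COMMAND=' from a sudo syslog line, or None."""
    marker = "COMMAND="
    if marker not in log_line:
        return None
    return log_line.split(marker, 1)[1].split()


def missing_flags(argv: List[str]) -> List[str]:
    """Return required 'pun' flags that are absent or not followed by a value."""
    missing = []
    for flag in REQUIRED_FLAGS:
        if flag not in argv:
            missing.append(flag)
            continue
        index = argv.index(flag)
        value = argv[index + 1] if index + 1 < len(argv) else ""
        # A value that looks like another flag means this flag's value was lost.
        if not value or value.startswith("-"):
            missing.append(flag)
    return missing
```

On CentOS 7 sudo normally logs through syslog to /var/log/secure, so each COMMAND= line there can be passed through `nginx_stage_args` and then `missing_flags`; an empty result means both flags had values.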

regards,

Christopher Welsh
On his mobile.

Hi Jeff, I’m wondering if I could show you or your team a live example for an affected user. Perhaps I could share my screen at a prearranged time?

Yes, we can meet. You can email me directly at johrstrom@osc.edu to set it up.

In the interim, I’d still say look through all the logs in your auth stack, not only for errors but also to rule individual layers out and to see what each layer says.

Make a list of what each layer sees. Clearly you’re able to authenticate through Apache, and Apache believes they’re a real user.
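That per-layer list can be collected mechanically. This sketch assumes Python 3; the `LOG_FILES` mapping is hypothetical and should be adjusted to wherever httpd, sudo/PAM and sssd actually log on your system (sssd may also need a higher `debug_level` in sssd.conf before it logs anything useful):

```python
import sys
from typing import Dict, Iterable, List

# Hypothetical layer -> log file mapping; adjust the paths for your site.
LOG_FILES = {
    "apache": "/var/log/httpd/error_log",
    "sudo/pam": "/var/log/secure",
    "sssd": "/var/log/sssd/sssd_nss.log",
    "system": "/var/log/messages",
}


def read_layers(paths: Dict[str, str]) -> Dict[str, List[str]]:
    """Read each layer's log; unreadable files are reported and skipped."""
    lines_by_layer = {}
    for layer, path in paths.items():
        try:
            with open(path, errors="replace") as handle:
                lines_by_layer[layer] = handle.readlines()
        except OSError as exc:
            print(f"skipping {layer}: {exc}", file=sys.stderr)
    return lines_by_layer


def layers_mentioning(
    lines_by_layer: Dict[str, Iterable[str]], username: str
) -> Dict[str, List[str]]:
    """Map each layer to its log lines that mention username.

    An empty list means that layer logged nothing about the user, which is
    itself useful when working out where a request stops.
    """
    return {
        layer: [line.rstrip("\n") for line in lines if username in line]
        for layer, lines in lines_by_layer.items()
    }
```

For example, `layers_mentioning(read_layers(LOG_FILES), "fredfoo")` right after a failed login.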

It’s when we go to start processes as another user that we run into issues. You can run this command and it works, but presumably you run it as root. What happens when you run it as apache?

sudo -u apache sudo /opt/ood/nginx_stage/sbin/nginx_stage pun -u fredfoo -a https%3a%2f%2fresearch-cluster.petermac.org.au%3a443%2fnginx%2finit%3fredir%3d%24http_x_forwarded_escaped_uri

Maybe running strace on this will tell us where it’s failing.
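As a sketch of the strace idea, assuming strace is installed and this runs as root: strace’s own `-u` option runs the traced command as another user while still letting setuid programs like sudo work, so it can reproduce the apache, then sudo, then nginx_stage path. The helper names and trace file path are hypothetical, and the list of lookup-related paths is only a starting point:

```python
import re
from typing import Iterable, List

NGINX_STAGE = "/opt/ood/nginx_stage/sbin/nginx_stage"


def strace_argv(user: str, app_init_url: str, trace_file: str,
                run_as: str = "apache") -> List[str]:
    """Build a strace command line (to be run as root) that executes sudo as run_as.

    The first -u belongs to strace (run the command as run_as); the second -u
    belongs to nginx_stage (the user whose PUN is started). -f follows child
    processes and -o writes the trace to a file.
    """
    return [
        "strace", "-f", "-u", run_as, "-o", trace_file,
        "sudo", NGINX_STAGE, "pun", "-u", user, "-a", app_init_url,
    ]


# Paths whose access shows where user lookups go: NSS config, local
# passwd/group files, and sssd's client pipes and memory cache.
_LOOKUP_HINTS = re.compile(r"nsswitch\.conf|/etc/passwd|/etc/group|/var/lib/sss")


def lookup_related(trace_lines: Iterable[str]) -> List[str]:
    """Keep only trace lines that touch the name-service lookup path."""
    return [line.rstrip("\n") for line in trace_lines if _LOOKUP_HINTS.search(line)]
```

For example, as root, `subprocess.run(strace_argv("fredfoo", "<the -a value from the log>", "/tmp/nginx_stage.strace"))`, then run `lookup_related` over the lines of /tmp/nginx_stage.strace from a failing attempt to see whether the lookup reached sssd and what came back.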