Issues with nginx error (98: Address already in use)

Okta.

Current config:

oidc_uri: /oidc
oidc_provider_metadata_url: https://iastate.okta.com/.well-known/openid-configuration
oidc_scope: "openid profile email"
oidc_settings:
  OIDCPassIDTokenAs: "serialized"
  OIDCPassRefreshToken: "On"
  OIDCPassClaimsAs: "environment"
  OIDCStripCookies: "mod_auth_openidc_session mod_auth_openidc_session_chunks mod_auth_openidc_session_0 mod_auth_openidc_session_1"

I plan to keep digging into it later today.

Figured it out.

Needed this:

user_map_match: '^([^@]+)@.*$'
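As a quick sanity check of what that pattern does to a REMOTE_USER of the form user@domain, the shell sketch below runs the same regex through `sed -E`. The `map_user` name and the `jdoe@iastate.edu` address are made up for illustration, and it assumes OnDemand keeps the first capture group as the Linux username.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the user_map_match pattern from ood_portal.yml
# to a REMOTE_USER value. Assumption: OnDemand keeps capture group 1 as the
# local username.
map_user() {
  # sed -E uses POSIX ERE; this particular pattern means the same thing there.
  printf '%s\n' "$1" | sed -E 's/^([^@]+)@.*$/\1/'
}

map_user 'jdoe@iastate.edu'   # prints: jdoe
map_user 'jdoe'               # no @, so sed prints jdoe unchanged (OnDemand may treat a non-match differently)
```

Note that the pattern leaves case alone: `Jdoe@iastate.edu` maps to `Jdoe`, not `jdoe`.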

Jeff, I was just wondering if there are any updates on this issue. I thought we had it figured out by changing our authentication to basic LDAP, but now it seems like new users are hitting this issue again. One person was able to log in successfully but now keeps getting the nginx error; another is unable to log in at all because of it. If there are any other suggestions, we'd appreciate them. Otherwise things are working great, but that handful of users is hitting a dead end.

Hi

Just curious: is there any similarity between the users that are not working properly? You added the user mapping to fix the original issue, but there is still a handful of folks for whom it isn't working.

Thanks,
-gerald

Gerald, that's something we're stuck on: we can't think of any similarity between the users that aren't working properly. It's kind of strange, because one user was able to log in without issue, then returned to our OnDemand instance later that day, got the nginx error, and couldn't log in again after that. This has happened to a handful of users, and usually they just can't sign in at all because of the nginx error. At least in that particular user's case, this shows us that authentication seems to be working as it should, but something gets in the way after that. Jeff was kind enough to look at this with us a couple of months back; we're not sure whether there are additional debugging steps we didn't explore at the time.

We can take a look. Jeff will be back tomorrow, and I will see if he can join a call with you and me. I have been with OSC for a short time and am still learning the intricacies of the system. I will reach out to you off Discourse to set up a time to meet.

Thanks,
-gerald

There’s no update from our side, though we can meet again. I have availability today and then next week.

IIRC we concluded it looked like a permission issue. The way we recognize that nginx has booted is by checking for the socket file, which does exist; the check just can't see it. Nginx always boots, but we're then unable to recognize that it has. Let me google around to see if we can turn up some log entries. Something at the kernel level could be telling us that this is being denied.

@tdockendorf do you have any ideas? You can see my note above about the behavior we're seeing. Our Lua module that checks whether the socket file exists consistently fails for some users (or maybe just for the Nth user to log on?).

I cannot get auditctl -w /var/run/ondemand-nginx to work correctly in a container, but I’m guessing that’s the direction we need to head in. We should see some denial somewhere.

Usually you can just grep /var/log/audit/audit.log for something, like a path (grep <something> /var/log/audit/audit.log), or dump the entire file into audit2allow to see if anything pops up: cat /var/log/audit/audit.log | audit2allow. I don't think you will be able to get SELinux working inside a container; you'd need to spin up something like a Vagrant VM or mess with things on OSC's dev machine, which has SELinux in permissive mode, so it's really easy to switch to enforcing. I don't recall: is SELinux enabled on the systems where this is an issue? If not, nothing in the audit logs would be useful in those cases.
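To narrow the audit log down before handing it to audit2allow, a small filter that keeps only SELinux AVC denials mentioning the PUN socket directory may help. The function name is hypothetical and matching on `avc:  denied` plus the path strings is an assumption about what the relevant records contain; `getenforce`, `auditctl`, `ausearch` and `audit2allow` are the standard SELinux/audit tools, but check the flags against your distribution's man pages.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read audit records on stdin and keep only AVC denials
# that mention the PUN socket directory or a passenger socket.
pun_avc_denials() {
  grep -E 'avc:[[:space:]]+denied' | grep -E 'ondemand-nginx|passenger\.sock'
}

# Typical use on a host with SELinux enabled (as root):
#   getenforce                                     # Enforcing, Permissive or Disabled
#   pun_avc_denials < /var/log/audit/audit.log
#   pun_avc_denials < /var/log/audit/audit.log | audit2allow
#   ausearch -m AVC -ts recent | audit2allow       # only recent denials
#
# Or watch the directory with the audit subsystem and search by key later:
#   auditctl -w /var/run/ondemand-nginx -p rwxa -k ood_pun
#   ausearch -k ood_pun
```

If `getenforce` reports Disabled on the affected hosts, expect the AVC filter to return nothing; the `auditctl` watch does not depend on SELinux.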

Yeah, reading earlier in the thread, they'd disabled it. Do you know of any other mechanism to triage this? I tried rewriting that Lua too (thinking maybe it was a library issue), but that didn't work. Auth seems to be entangled in this too, but I can't quite tell how.

I guess a spot check to validate that the apache user can see this directory and/or the files within it:

sudo -u apache ls /var/run/ondemand-nginx/<USER>
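That spot check can be extended into a small script that tells the failure modes apart: no per-user directory, a directory the caller can't traverse, no socket file, or a socket the caller can't write to. The `pun_check` name and the `PUN_ROOT` override are hypothetical conveniences; the `passenger.sock` file name is taken from the lsof output in this thread, and the write-permission check reflects Linux's rule that connecting to a pathname UNIX socket requires write permission on it.

```shell
#!/usr/bin/env bash
# Hypothetical spot check: can the *calling* user reach a PUN's socket?
# Run it as the Apache user to mimic the proxy, for example:
#   sudo -u apache bash -c 'source ./pun_check.sh && pun_check jdoe'
# PUN_ROOT is overridable for testing; it defaults to the path from this thread.
PUN_ROOT="${PUN_ROOT:-/var/run/ondemand-nginx}"

pun_check() {
  local user="$1"
  local dir="$PUN_ROOT/$user"
  local sock="$dir/passenger.sock"

  # Also reported when a parent directory can't be traversed by the caller.
  if [ ! -d "$dir" ]; then
    echo "missing-dir: $dir"; return 2
  fi
  # Search (x) permission is needed to reach anything inside the directory.
  if [ ! -x "$dir" ]; then
    echo "no-search-permission: $dir"; return 3
  fi
  if [ ! -S "$sock" ]; then
    echo "no-socket: $sock"; return 4
  fi
  # On Linux, connecting to a pathname UNIX socket needs write permission on it.
  if [ ! -w "$sock" ]; then
    echo "no-write-permission: $sock"; return 5
  fi
  echo "ok: $sock"
}
```

Running it for a working user and a failing user side by side should show which of those conditions differs.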

Hey Trey.

I’m setting up a meeting for Monday with us and the customer to see if we can help them. Would you like to be included?

Thanks,
-gerald

I'm not sure how useful I'd be; all I can think to check is easier to convey in text. I'd try these types of checks:

# lsof /var/run/ondemand-nginx/tdockendorf/passenger.sock 
COMMAND   PID        USER   FD   TYPE             DEVICE SIZE/OFF     NODE NAME
nginx   69515        root    8u  unix 0xffff8c2560f87b40      0t0 26010073 /var/run/ondemand-nginx/tdockendorf/passenger.sock
nginx   69516 tdockendorf    8u  unix 0xffff8c2560f87b40      0t0 26010073 /var/run/ondemand-nginx/tdockendorf/passenger.sock

Also this:

ls -la /proc/*/fd | grep passenger
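One caveat with the /proc check: on Linux, the /proc/PID/fd link for a UNIX socket reads `socket:[INODE]` rather than the socket's path, so grepping for passenger there will not match the socket itself. A sketch that goes from the socket path to its inode via /proc/net/unix (the NODE column in the lsof output above is that inode) and then to the processes holding it follows. The function names are hypothetical, and it needs root to read other users' file descriptors.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; run as root so every process's fd directory is readable.

# Read /proc/net/unix-format lines on stdin and print "INODE PATH" for each
# passenger.sock entry. Columns: Num RefCount Protocol Flags Type St Inode Path
passenger_socket_inodes() {
  awk '$8 ~ /passenger\.sock$/ { print $7, $8 }'
}

# Print "PID OWNER" for every process with an fd pointing at socket:[INODE].
pids_for_inode() {
  local inode="$1" fd pid
  for fd in /proc/[0-9]*/fd/*; do
    if [ "$(readlink "$fd" 2>/dev/null)" = "socket:[$inode]" ]; then
      pid=${fd#/proc/}
      pid=${pid%%/*}
      echo "$pid $(stat -c %U "/proc/$pid" 2>/dev/null)"
    fi
  done | sort -u
}

# Example: list every PUN socket and the processes holding it open.
#   passenger_socket_inodes < /proc/net/unix | while read -r inode path; do
#     echo "$path"; pids_for_inode "$inode"
#   done
```

`lsof` does the same lookup in one step; this version only matters where lsof isn't installed or you want to script the comparison across many users.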

Maybe it's not really related, but I was getting the same error, and for me it turned out to be due to some usernames having mixed case in AD, so the solution in this thread fixed it: Forcing lowercase username after login - #3 by ssivy.
This was only happening when using DEX with LDAP/AD instead of Apache basic auth. Somehow basic auth was either ignoring case or forcing everything to lowercase, whereas DEX doesn't seem to have any option to do this.
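If mixed-case names from AD are the culprit, the mapping has to fold case as well as strip the domain. One way to sketch that is a small script used as a custom `user_map_cmd` in ood_portal.yml, assuming the command receives REMOTE_USER as its first argument and prints the local username on stdout, as the older ood_auth_map helpers did. The script path and function name are hypothetical; check the linked thread and the ood_portal.yml documentation for your OnDemand version before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical mapper, e.g. saved as /opt/site/lowercase_user_map.sh and set as
# user_map_cmd in ood_portal.yml. Assumption: REMOTE_USER arrives as $1 and the
# Linux username is expected on stdout.
lower_user_map() {
  local remote_user="$1"
  local name="${remote_user%%@*}"   # drop everything from the first @ on
  printf '%s\n' "$name" | tr '[:upper:]' '[:lower:]'
}

# Only map when executed directly with an argument (not when sourced).
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ $# -gt 0 ]; then
  lower_user_map "$1"
fi
```

For example, `lowercase_user_map.sh JDoe@IASTATE.EDU` prints `jdoe`.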

Cheers,
Chris

We got another ticket on GitHub with this issue, and the username didn't look right.

That could be it: the names are being returned oddly. What happens if you set the Lua log level to debug? You should be able to see how the users are being mapped and whether they're real Linux users.

lua_log_level: 'debug'

That may also explain why authentication is in the mix, because authentication sets REMOTE_USER, which we then map to the Linux user.
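Once `lua_log_level: 'debug'` is in place, a rough way to check both halves of that is to pull the mod_lua debug entries for one user out of the Apache error log and confirm the mapped name is a real local account. The function names are hypothetical; the `[lua:debug]` tag assumes Apache 2.4's default error log format, the `update_ood_portal` path assumes a standard package install, and the log file name varies with distribution and ServerName.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking user mapping after setting
#   lua_log_level: 'debug'
# in ood_portal.yml. On a typical RHEL-family install, apply it with:
#   sudo /opt/ood/ood-portal-generator/sbin/update_ood_portal
#   sudo systemctl restart httpd

# Keep mod_lua debug entries that mention a given string (e.g. a username).
# Assumes Apache 2.4's default ErrorLogFormat, which tags entries [lua:debug].
lua_debug_for() {
  grep -F '[lua:debug]' | grep -F -- "$1"
}

# Succeed only if NAME resolves to an account via NSS (local files, LDAP, ...).
is_linux_user() {
  id -u "$1" >/dev/null 2>&1
}

# Example (adjust the log path for your vhost):
#   sudo cat /var/log/httpd/*error*.log | lua_debug_for jdoe
#   is_linux_user jdoe && echo "jdoe exists" || echo "jdoe is not a Linux user"
```

Comparing a working user's entries with a failing user's is likely more informative than reading either on its own.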

We're trying out the username fix and seeing how things go. So far so good: the users we previously had trouble with are now able to log in, and I haven't heard any other complaints yet. We don't have a lot of users to really test this with in production, so it might be one of those things where we just have to monitor it over time and see whether this resolves it. Thank you for the suggestions!

I have some updates from another user in this GitHub issue. @rgas20 I'd encourage you to set lua_log_level: 'debug' in ood_portal.yml and see how your users are being mapped. I'd just like to be sure that we all have something that works, and not just by coincidence.

I'd specifically look at how the previously erroneous users are being mapped. The user in this GitHub issue seems to be mapped correctly, so it's a slightly different issue (maybe?), but now that I've seen this issue two or three times I may start to understand what it actually is and what we need to do to fix it.

So any context around what your actual issue was and how it got resolved would be greatly appreciated, especially if it was as simple as lowercasing the usernames! :laughing:

Sorry about the delay; I just got a chance to look through the logs. I'm afraid I don't have much to offer: I grepped for the user who was having issues around the time she reported it, and her username seemed to be mapped correctly when she had the issue. We only have a handful of users at the moment, and so far they can all use it (including the one who previously had the issue). If we get another one, I'll report back.

No problem. I'll have a patch coming in the next couple of weeks, at least for usernames with domains. Just to be clear: was your issue solved by lowercasing the username?