Nginx Error: Address already in use

We are doing a fresh install of the latest version of OOD and running into issues when trying to log users in.

For context, we are running CentOS 7, OOD 1.8, with the Dex LDAP connector.

LDAP is able to find the user just fine; however, we are consistently getting this error when a user is logged in:

Error -- nginx: [emerg] bind() to unix:/var/run/ondemand-nginx/xxxx/passenger.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/var/run/ondemand-nginx/xxxx/passenger.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/var/run/ondemand-nginx/xxxx/passenger.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/var/run/ondemand-nginx/xxxx/passenger.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/var/run/ondemand-nginx/xxxx/passenger.sock failed (98: Address already in use)
nginx: [emerg] still could not bind()

Any feedback here would be appreciated.

Thank you in advance!

I am having the same issue with the campus installation at my university. It has been hard to manage: only some users report problems, and the fix varies. Sometimes clearing cookies fixes it, sometimes removing the socket on the backend does, and sometimes there is nothing I can do to help a student.

@daroachgb

Try these commands, with $USER replaced with your Linux username:

/opt/ood/nginx_stage/sbin/nginx_stage nginx_clean --user=$USER
/opt/ood/nginx_stage/sbin/nginx_stage nginx --user=$USER

Thank you for the response here Mario. We have tried this process a few times and still come up with the same error.

I think the crontab entry that cleans up stale nginx processes helps prevent this. It’s at /etc/cron.d/ood. Is there any syslog entry for nginx_clean that indicates it’s not running correctly?

To be clear, does anything resolve this issue temporarily? I mean, are you able to run nginx_stage nginx_clean and then let users log in? Or is it the case that no user can ever log in, even after cleaning?

Jeff,

Nothing resolves this issue unfortunately. We have tried running that command multiple times and are stuck at this stage.

Best regards,

Juan

Here’s my best guess as to what’s happening: the first nginx starts and crashes, but either doesn’t release the socket file or is left behind as a zombie.

(1) I would try to find out which process is still using it. lsof will give you this. See whether that process is still running and what state it’s in.

[jeff@518bd97864ee /]$ lsof /var/run/ondemand-nginx/jeff/passenger.sock 
COMMAND PID USER   FD   TYPE             DEVICE SIZE/OFF    NODE NAME
nginx   379 jeff    6u  unix 0x00000000ff78144c      0t0 2523864 /var/run/ondemand-nginx/jeff/passenger.sock type=STREAM
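If you want to script that check, here is a minimal sketch that pulls the PIDs out of lsof's default table output so you can hand them to ps. The helper name holder_pids is my own invention, not part of OOD, and it assumes the column layout shown above (PID in the second column).

```shell
# holder_pids (hypothetical helper): read lsof's default output on stdin
# and print each unique PID that holds the file open.
holder_pids() {
    # Skip the header row; column 2 of lsof's default output is the PID.
    awk 'NR > 1 { print $2 }' | sort -un
}

# Example usage (not run here):
#   lsof /var/run/ondemand-nginx/$USER/passenger.sock | holder_pids
#   ps -o pid,ppid,stat,etime,cmd -p <PID>   # a STAT starting with Z means zombie
```

If lsof prints nothing at all but the socket file still exists, that would point to a leftover file rather than a live process holding it.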

(2) I would also check if there are coredumps in /var/lib/systemd/coredump or if there are any messages in journalctl related to crashes.
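A rough sketch for the coredump check, assuming core files land as regular files under that directory. The helper name recent_files and the 60-minute window are just illustrative choices on my part.

```shell
# recent_files DIR MINUTES (hypothetical helper): list regular files under
# DIR that were modified less than MINUTES minutes ago.
recent_files() {
    # A missing directory simply means there is nothing to report.
    [ -d "$1" ] || return 0
    find "$1" -type f -mmin "-$2"
}

# Example usage (not run here):
#   recent_files /var/lib/systemd/coredump 60
#   journalctl --since "1 hour ago" | grep -i nginx
```

Reproduce the login failure first, then run both commands so anything they show lines up with the time of the error.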

(3) I would do a spot check on ulimits. You could be hitting some limit, specifically the maximum number of open files or processes.

[johrstrom ~()] 🐰  ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63375
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
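To turn the spot check into something you can compare against, here is a small sketch. near_limit is a hypothetical helper and the 80 percent threshold is arbitrary. Keep in mind that ulimit in your own shell may not match the limits applied to the user's nginx process; /proc/<PID>/limits shows what that process actually got.

```shell
# near_limit USED LIMIT PERCENT (hypothetical helper)
# Succeeds (exit 0) when USED is at or above PERCENT of LIMIT.
# A limit of "unlimited" can never be hit, so it always fails (exit 1).
near_limit() {
    [ "$2" = "unlimited" ] && return 1
    [ $(( $1 * 100 )) -ge $(( $2 * $3 )) ]
}

# Example usage (not run here; reading another user's /proc entries needs root):
#   pid=379   # PID reported by lsof
#   grep -E 'open files|processes' /proc/$pid/limits
#   near_limit "$(ls /proc/$pid/fd | wc -l)" "$(ulimit -n)" 80 \
#       && echo "close to the open files limit"
```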

(4) After all that, we could employ strace to dig into what it’s doing (and failing to do).
Here’s a simple nginx wrapper; to use it, point the nginx_bin setting in your /etc/ood/config/nginx_stage.yml file at it.

(You could parse the user name out of the -c /var/lib/ondemand-nginx/config/puns/<USER>.conf option if you’d like to choose a different filename. I put $(date +%s) in the name to get unique files, given you’re going to get one that we’re interested in, the very first, and the others will just show the address-already-in-use error from your message.)

#!/bin/bash

# Trace the real nginx binary, passing every argument through unchanged.
/bin/strace -o /tmp/nginx_strace_$(date +%s).out /opt/ood/ondemand/root/usr/sbin/nginx "$@"
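If you do want per-user trace files, here is a sketch of the user-name parsing mentioned above. The function names are mine, and it assumes nginx_stage passes -c and the config path as two separate arguments; I haven't checked that against every nginx_stage subcommand.

```shell
# pun_user_from_args (hypothetical helper): print the user name taken from
# the "-c .../puns/<USER>.conf" argument pair; fail if there is no -c.
pun_user_from_args() {
    prev=""
    for arg in "$@"; do
        if [ "$prev" = "-c" ]; then
            # Strip the directory and the trailing .conf from the config path.
            basename "$arg" .conf
            return 0
        fi
        prev=$arg
    done
    return 1
}

# strace_out_path (hypothetical helper): build a per-user, per-start
# output file name, falling back to "unknown" when no -c is present.
strace_out_path() {
    user=$(pun_user_from_args "$@") || user="unknown"
    printf '/tmp/nginx_strace_%s_%s.out\n' "$user" "$(date +%s)"
}
```

In the wrapper you would then pass "$(strace_out_path "$@")" to strace's -o option in place of the fixed file name.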

Jeff,

Thank you for all of these debugging steps. We are going through another fresh install and will revisit these steps once we come across this issue again.

Will get back to you with our results.

Best regards,

Juan

Thank you Mario. This did not resolve my error on its own, but in combination with my end user clearing their cache and cookies, they were able to log in.

The behavior is very inconsistent for me; as sysadmin, I dread getting user emails about this issue. Sometimes clearing cache and cookies works, sometimes your suggested nginx_stage commands work, and sometimes restarting the whole web portal VM works.
