Getting "Address already in use" error

Hello,

I just upgraded OnDemand (from 1.3.7 to 1.8.20). However, when I access the portal, after authentication, I get

Error – nginx: [emerg] bind() to unix:/var/run/ondemand-nginx/XXXXX/passenger.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/var/run/ondemand-nginx/XXXXX/passenger.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/var/run/ondemand-nginx/XXXXX/passenger.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/var/run/ondemand-nginx/XXXXX/passenger.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/var/run/ondemand-nginx/XXXXX/passenger.sock failed (98: Address already in use)
nginx: [emerg] still could not bind()

So far, I have tried:

  1. a completely new browser
  2. /opt/ood/nginx_stage/sbin/nginx_stage nginx_clean --user=$USER
  3. Completely removing the user’s folder under /var/run/ondemand-nginx (roughly as in the sketch below)
  4. Downgrading to 1.8.12

No luck so far.
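
For reference, steps 2 and 3 were roughly the following (a sketch of the commands; substitute the real username for XXXXX):

# step 2: clean up the per-user nginx state for the user
/opt/ood/nginx_stage/sbin/nginx_stage nginx_clean --user=XXXXX

# step 3: remove the user's runtime folder entirely
rm -rf /var/run/ondemand-nginx/XXXXX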

Here is some output showing two processes competing for the socket file, which is probably the cause. One of them is running as root and the other looks like a child process running as the user.

[root@XXXXXX ~]# lsof /var/run/ondemand-nginx/XXXXXX/passenger.sock
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
nginx 3310 root 8u unix 0xffff9d9ee2ef6640 0t0 49148 /var/run/ondemand-nginx/XXXXXX/passenger.sock
nginx 3311 XXXXXX 8u unix 0xffff9d9ee2ef6640 0t0 49148 /var/run/ondemand-nginx/XXXXXX/passenger.sock
[root@XXXXXX ~]#
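
To confirm how those two are related, something like this should show whether 3311 is actually a child of 3310 (the PIDs are the ones from the lsof output above):

ps -o pid,ppid,user,cmd -p 3310,3311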

Thanks.

Hi and welcome! What are processes 3310 and 3311? I take it they’re nginx processes, but I’m wondering whether they’re defunct or still running.

After you complete step 2 (/opt/ood/nginx_stage/sbin/nginx_stage nginx_clean --user=$USER), I would check whether those processes are still up. Also be sure it removes the socket file.

[root@faa43c1fc202 ~]# sudo /opt/ood/nginx_stage/sbin/nginx_stage nginx_clean --user jeff
[root@faa43c1fc202 ~]# ls /var/run/ondemand-nginx/jeff/ -lrt
total 0
[root@faa43c1fc202 ~]# ps -elf | grep nginx
0 S root         636     431  0  80   0 -  2297 -      17:59 pts/0    00:00:00 grep --color=auto nginx

I confirmed that if you kill that nginx master process (as in the session below), it’s smart enough to remove the socket file on its way out. But even if it doesn’t, you can try to force kill the nginx master (and any child processes, if they exist) and force remove the socket file yourself.

[root@faa43c1fc202 ~]# ps -elf | grep nginx
5 S root         670       1  0  80   0 - 23026 -      18:00 ?        00:00:00 nginx: master process (jeff) -c /var/lib/ondemand-nginx/config/puns/jeff.conf
5 S jeff         671     670  0  80   0 - 26668 -      18:00 ?        00:00:00 nginx: worker process
0 S root         727     431  0  80   0 -  2297 -      18:00 pts/0    00:00:00 grep --color=auto nginx
[root@faa43c1fc202 ~]# kill 670
[root@faa43c1fc202 ~]# ps -elf | grep nginx
0 S root         741     431  0  80   0 -  2297 -      18:01 pts/0    00:00:00 grep --color=auto nginx
[root@faa43c1fc202 ~]# ps -elf | grep jeff 
4 S jeff           1       0  0  80   0 -  2974 -      17:51 ?        00:00:00 /bin/bash /entrypoint.sh
4 S jeff         420       0  0  80   0 -  3007 -      17:52 pts/0    00:00:00 /bin/bash
0 S root         743     431  0  80   0 -  2297 -      18:01 pts/0    00:00:00 grep --color=auto jeff
[root@faa43c1fc202 ~]# ls /var/run/ondemand-nginx/jeff/ -lrt
total 0
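
Concretely, that fallback would be something like this (a rough sketch; the username jeff and the socket path are from the session above, and <master_pid> is whatever PID the master shows up as):

kill -9 <master_pid>                               # force kill the per-user nginx master
pkill -9 -u jeff -f 'nginx: worker process'        # and any leftover worker processes
rm -f /var/run/ondemand-nginx/jeff/passenger.sock  # force remove the stale socket file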

I suspect this is caused by some resource constraint. Here are some steps to debug it and try to get an strace of nginx starting up.
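
The basic idea would be something like this (a sketch; it assumes nginx_stage’s pun subcommand, which starts the per-user nginx, takes --user the same way nginx_clean does, so adjust the user and output path as needed):

# follow forked children and write the trace to a file
strace -f -o /tmp/pun-nginx.strace \
  /opt/ood/nginx_stage/sbin/nginx_stage pun --user=jeff

# then look for the failing bind() on the passenger socket
grep -n 'passenger.sock' /tmp/pun-nginx.strace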