Maintenance mode shown when you Restart Web Server from Help tab RHEL 8.3

I have a new test instance of Open OnDemand running version 1.8.20-1.el8 on Red Hat 8.3. Every time I use Help => Restart Web Server, the instance behaves as if it is in maintenance mode. There is no /etc/ood/maintenance.enable file, but it redirects as if there were, showing the Maintenance Mode HTML page. If I turn off the maintenance mode capability in the ood_portal.yml file, it then shows a 503 error. I cannot get it back out of maintenance mode without rebooting the server. I have tried restarting httpd, but it keeps showing the Maintenance page.
If I start a new browser window or use a different browser altogether, I am allowed to log in (using DEX and LDAP with DUO), but it still shows the maintenance page.

How is this happening? How do I avoid it or at least get out of this perceived maintenance mode?

I would look in the apache logs to see which request is resulting in the 503 error.
Also check that the PUN actually restarted. I think apache will return a 503 error if it
tries to proxy a request to the PUN and it isn’t available.
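
For anyone following along, a few commands that can help with that triage (log paths assume a default RHEL install; adjust if your logs live elsewhere):

```shell
# Watch the Apache error log while reproducing the 503
tail -f /var/log/httpd/error.log

# Check whether the per-user nginx (PUN) process is actually running
ps aux | grep '[o]ndemand-nginx'

# The PUN's Unix domain socket should exist for the affected user
ls -l /var/run/ondemand-nginx/$USER/
```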

This is the sum total of the error logs from the instance.

[root@ondemand1 log]# tail -2 httpd/error.log
[Mon May 10 12:17:38.524199 2021] [proxy:error] [pid 25998:tid 140111228401408] (111)Connection refused: AH02454: HTTP: attempt to connect to Unix domain socket /var/run/ondemand-nginx/jasw8470/passenger.sock (*) failed
[Mon May 10 12:17:38.524255 2021] [proxy_http:error] [pid 25998:tid 140111228401408] [client 172.21.35.69:39340] AH01114: HTTP: failed to make connection to backend: httpd-UDS, referer: http://ondemand1.rc.int.colorado.edu/pun/sys/dashboard

[root@ondemand1 log]# tail ondemand-nginx/error.log
2021/05/10 12:17:38 [notice] 44723#0: signal process started

[root@ondemand1 log]# tail ondemand-nginx/jasw8470/error.log
[ N 2021-05-10 12:17:38.3858 44460/T8 Ser/Server.h:558 ]: [ServerThr.1] Shutdown finished
[ N 2021-05-10 12:17:38.3858 44460/Ta Ser/Server.h:902 ]: [ServerThr.2] Freed 0 spare client objects
[ N 2021-05-10 12:17:38.3858 44460/Ta Ser/Server.h:558 ]: [ServerThr.2] Shutdown finished
[ N 2021-05-10 12:17:38.3858 44460/Tc Ser/Server.h:902 ]: [ServerThr.3] Freed 0 spare client objects
[ N 2021-05-10 12:17:38.3858 44460/Tc Ser/Server.h:558 ]: [ServerThr.3] Shutdown finished
[ N 2021-05-10 12:17:38.3859 44460/Te Ser/Server.h:902 ]: [ServerThr.4] Freed 0 spare client objects
[ N 2021-05-10 12:17:38.3859 44460/Te Ser/Server.h:558 ]: [ServerThr.4] Shutdown finished
[ N 2021-05-10 12:17:38.3859 44460/Tg Ser/Server.h:902 ]: [ApiServer] Freed 0 spare client objects
[ N 2021-05-10 12:17:38.3859 44460/Tg Ser/Server.h:558 ]: [ApiServer] Shutdown finished
[ N 2021-05-10 12:17:40.0998 44460/T1 age/Cor/CoreMain.cpp:1325 ]: Passenger core shutdown finished

Hey sorry for the delay. I wonder if the maintenance page is just being shown by default for any 503. Can you turn that feature off and see what the behaviour is after that?

# /etc/ood/config/ood_portal.yml
use_maintenance: false
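
If I remember correctly, changes to ood_portal.yml only take effect after regenerating the Apache config, something along these lines:

```shell
# Regenerate the Apache vhost config from ood_portal.yml
sudo /opt/ood/ood-portal-generator/sbin/update_ood_portal

# Pick up the new config
sudo systemctl try-restart httpd
```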

Also here’s a similar topic where I describe the triaging steps.

Offhand, since this is a new VM, I'd wonder what your ulimits are like. I've shown in that topic what ours are.

OK last message in this burst I promise. This looks like it’s an selinux issue. If you have selinux enabled you can install the ondemand-selinux RPM to create a policy that’s compliant with all the stuff we need to do.
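
Something like this, to install the shipped policy and confirm it loaded (package name as referenced above):

```shell
# Install the SELinux policy module shipped with OnDemand
sudo yum install ondemand-selinux

# Verify the module is loaded
sudo semodule -l | grep ondemand
```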

I already have ondemand-selinux installed and created a policy to allow OnDemand to function.

ood.te

module ood 1.0;

require {
type var_lib_t;
type sssd_conf_t;
type unreserved_port_t;
type selinux_config_t;
type ood_apps_t;
type rhsmcertd_t;
type ood_pun_t;
type node_t;
type httpd_t;
type automount_tmp_t;
type sshd_key_t;
type var_run_t;
type gpg_exec_t;
type ssh_keysign_exec_t;
class tcp_socket { name_connect node_bind };
class file { append create execute execute_no_trans getattr map open read write };
class sock_file { create getattr setattr write };
class netlink_route_socket { bind create getattr nlmsg_read };
class dir { add_name create read setattr write };
}

#============= httpd_t ==============

#!!! This avc is allowed in the current policy
allow httpd_t sssd_conf_t:file read;
allow httpd_t sssd_conf_t:file open;
allow httpd_t var_run_t:sock_file { getattr write };

#============= ood_pun_t ==============
allow ood_pun_t automount_tmp_t:dir read;
allow ood_pun_t ood_apps_t:file append;

#!!! This avc can be allowed using the boolean 'ondemand_use_torque'
allow ood_pun_t self:netlink_route_socket { bind create getattr nlmsg_read };
allow ood_pun_t selinux_config_t:dir read;

#!!! This avc can be allowed using the boolean 'domain_can_mmap_files'
allow ood_pun_t ssh_keysign_exec_t:file map;
allow ood_pun_t ssh_keysign_exec_t:file { execute execute_no_trans open read };
allow ood_pun_t sshd_key_t:file { open read };

#!!! This avc can be allowed using the boolean 'ondemand_use_torque'
allow ood_pun_t unreserved_port_t:tcp_socket name_connect;
allow ood_pun_t var_run_t:dir { add_name create setattr write };
allow ood_pun_t var_run_t:file { create open read write };
allow ood_pun_t var_run_t:sock_file { create setattr };

#============= rhsmcertd_t ==============

#!!! This avc is allowed in the current policy
allow rhsmcertd_t gpg_exec_t:file execute;
allow rhsmcertd_t gpg_exec_t:file { open read };

#!!! This avc is allowed in the current policy
allow rhsmcertd_t node_t:tcp_socket node_bind;
allow rhsmcertd_t var_lib_t:file { getattr read write };

If I turn off use_maintenance I get a generic 503 error screen.

This is a brand new RHEL 8.3 VM

Using the troubleshooting steps from the article.

lsof before “Restarting the Web Server”

[root@ondemand1 ~]# lsof /var/run/ondemand-nginx/jasw8470/passenger.sock
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
nginx 2682 root 8u unix 0xffff9b308c707980 0t0 37053 /var/run/ondemand-nginx/jasw8470/passenger.sock type=STREAM
nginx 2683 jasw8470 8u unix 0xffff9b308c707980 0t0 37053 /var/run/ondemand-nginx/jasw8470/passenger.sock type=STREAM

After (when the error is occurring), there is no output:
[root@ondemand1 ~]# lsof /var/run/ondemand-nginx/jasw8470/passenger.sock
[root@ondemand1 ~]#
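
As a cross-check, nginx_stage can list the PUNs it is tracking (command as I recall it from the OnDemand docs); after a failed restart the affected user should be missing from the list:

```shell
# List per-user nginx (PUN) instances known to nginx_stage
sudo /opt/ood/nginx_stage/sbin/nginx_stage nginx_list
```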

My ulimits closely match the limits shown in your article:
[root@ondemand1 ~]# ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 62900
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 62900
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

The log from /var/log/ondemand-nginx/jasw8470/error.log still seems to point at SELinux:

2021/05/10 14:39:38 [alert] 2682#0: unlink() "/var/run/ondemand-nginx/jasw8470/passenger.pid" failed (13: Permission denied)
2021/05/10 14:39:38 [emerg] 2682#0: unlink() /var/run/ondemand-nginx/jasw8470/passenger.sock failed (13: Permission denied)

[ N 2021-05-10 14:39:38.5161 2670/T8 age/Cor/CoreMain.cpp:671 ]: Signal received. Gracefully shutting down… (send signal 2 more time(s) to force shutdown)
[ N 2021-05-10 14:39:38.5161 2670/T1 age/Cor/CoreMain.cpp:1246 ]: Received command to shutdown gracefully. Waiting until all clients have disconnected…
[ N 2021-05-10 14:39:38.5162 2670/T8 Ser/Server.h:902 ]: [ServerThr.1] Freed 0 spare client objects
[ N 2021-05-10 14:39:38.5162 2670/T8 Ser/Server.h:558 ]: [ServerThr.1] Shutdown finished
[ N 2021-05-10 14:39:38.5162 2670/Tc Ser/Server.h:902 ]: [ServerThr.3] Freed 0 spare client objects
[ N 2021-05-10 14:39:38.5162 2670/Ta Ser/Server.h:902 ]: [ServerThr.2] Freed 0 spare client objects
[ N 2021-05-10 14:39:38.5162 2670/Tc Ser/Server.h:558 ]: [ServerThr.3] Shutdown finished
[ N 2021-05-10 14:39:38.5162 2670/Ta Ser/Server.h:558 ]: [ServerThr.2] Shutdown finished
[ N 2021-05-10 14:39:38.5162 2670/Te Ser/Server.h:902 ]: [ServerThr.4] Freed 0 spare client objects
[ N 2021-05-10 14:39:38.5163 2670/Te Ser/Server.h:558 ]: [ServerThr.4] Shutdown finished
[ N 2021-05-10 14:39:38.5164 2670/Tg Ser/Server.h:902 ]: [ApiServer] Freed 0 spare client objects
[ N 2021-05-10 14:39:38.5164 2670/Tg Ser/Server.h:558 ]: [ApiServer] Shutdown finished
[ N 2021-05-10 14:39:39.2200 2670/T1 age/Cor/CoreMain.cpp:1325 ]: Passenger core shutdown finished

I regenerated the SELinux policy and have been able to restart the web server cleanly. If you get stuck in between, you can simply remove the pid and sock files, which will allow the server to restart.
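
The cleanup step, for reference (paths from the listings above; substitute the affected username), then log back in so OnDemand spawns a fresh PUN:

```shell
# Remove stale PUN state left behind by the failed restart
sudo rm -f /var/run/ondemand-nginx/jasw8470/passenger.pid \
           /var/run/ondemand-nginx/jasw8470/passenger.sock
```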

New te file below:

module passenger_sock2 1.0;

require {
type sshd_key_t;
type httpd_t;
type selinux_config_t;
type sssd_conf_t;
type var_run_t;
type automount_tmp_t;
type gpg_exec_t;
type ood_apps_public_t;
type ood_pun_t;
type node_t;
type var_lib_t;
type ssh_keysign_exec_t;
type ood_apps_t;
type unreserved_port_t;
type rhsmcertd_t;
class tcp_socket { name_connect node_bind };
class file { append create execute execute_no_trans getattr map open read unlink write };
class sock_file { create getattr setattr unlink write };
class netlink_route_socket { bind create getattr nlmsg_read };
class dir { add_name create read remove_name setattr write };
}

#============= httpd_t ==============

#!!! This avc is allowed in the current policy
allow httpd_t ood_apps_public_t:file map;

#!!! This avc is allowed in the current policy
allow httpd_t sssd_conf_t:file { open read };

#!!! This avc is allowed in the current policy
allow httpd_t var_run_t:sock_file { getattr write };

#============= ood_pun_t ==============

#!!! This avc is allowed in the current policy
allow ood_pun_t automount_tmp_t:dir read;

#!!! This avc is allowed in the current policy
allow ood_pun_t ood_apps_t:file append;

#!!! This avc is allowed in the current policy
allow ood_pun_t self:netlink_route_socket { bind create getattr nlmsg_read };

#!!! This avc is allowed in the current policy
allow ood_pun_t selinux_config_t:dir read;

#!!! This avc is allowed in the current policy
allow ood_pun_t ssh_keysign_exec_t:file { execute execute_no_trans map open read };

#!!! This avc is allowed in the current policy
allow ood_pun_t sshd_key_t:file { open read };

#!!! This avc is allowed in the current policy
allow ood_pun_t unreserved_port_t:tcp_socket name_connect;

#!!! This avc is allowed in the current policy
allow ood_pun_t var_run_t:dir { add_name create remove_name setattr write };
allow ood_pun_t var_run_t:file unlink;

#!!! This avc is allowed in the current policy
allow ood_pun_t var_run_t:file { create open read write };
allow ood_pun_t var_run_t:sock_file unlink;

#!!! This avc is allowed in the current policy
allow ood_pun_t var_run_t:sock_file { create getattr setattr write };

#============= rhsmcertd_t ==============

#!!! This avc is allowed in the current policy
allow rhsmcertd_t gpg_exec_t:file { execute open read };

#!!! This avc is allowed in the current policy
allow rhsmcertd_t node_t:tcp_socket node_bind;

#!!! This avc is allowed in the current policy
allow rhsmcertd_t var_lib_t:file { getattr open read write };
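
For reference, a .te file like this is compiled and loaded with the standard SELinux toolchain (from the checkpolicy and policycoreutils packages):

```shell
# Compile the type enforcement source into a module
checkmodule -M -m -o passenger_sock2.mod passenger_sock2.te

# Package it and load it into the running policy
semodule_package -o passenger_sock2.pp -m passenger_sock2.mod
sudo semodule -i passenger_sock2.pp
```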

Do the audit logs have any denials about this in them? Quickly checking the logic of the code: we're in sudo /opt/ood/nginx_stage/sbin/nginx_stage, trying to send a stop signal to the nginx that's running. Those [alert] 2682#0: unlink() log lines are from nginx itself. So we're executing nginx stop (with some other args and so on) as root itself.
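
The usual way to surface those denials, assuming auditd is running:

```shell
# Show recent AVC denials from the audit log
sudo ausearch -m avc -ts recent

# Pipe them through audit2allow to see what policy rules they would imply
sudo ausearch -m avc -ts recent | audit2allow
```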

Here is what the permissions for these files look like for me.

[root@b524ac0c1d39 ~]# ls /var/run/ondemand-nginx/jeff/ -lrt
total 4
srw-rw-rw-. 1 root root 0 May 11 13:35 passenger.sock
-rw-r--r--. 1 root root 4 May 11 13:35 passenger.pid

Audit logs were reporting denials until I created the new SELinux policy that handled those errors. See the above .te file for the policies. The entries that were added are the sock_file ones.

File permissions are the same

[root@ondemand1 ~]# ls -lrt /var/run/ondemand-nginx/jasw8470/
total 4
srw-rw-rw-. 1 root root 0 May 11 09:28 passenger.sock
-rw-r--r--. 1 root root 6 May 11 09:28 passenger.pid

@jasw8470 thanks for figuring this out a bit!

Looks like this is the addition we need, to be able to unlink?

I don’t quite know how to translate that into an selinux policy

I’m guessing it’s somewhere around here? @tdockendorf please advise.

Something like class sock_file { create getattr setattr unlink write }; just loads those things so they can be used by the actual policy allows; it's sort of like a Python import or Ruby require.

Based on the comments from the generated policy, it looks like only these items may not be covered by existing policies, but it's hard to say:

allow ood_pun_t var_run_t:file unlink;
allow ood_pun_t var_run_t:sock_file unlink;