Handling Cluster Maintenance


#1

Just wanted to poll the community on how folks handle cluster-wide maintenance on their OOD hosts. Do you manually edit the Apache config to serve a static site? Is there a built-in way to toggle OOD logins to show a maintenance notice? Before we start (re)building a wheel, I wanted to check.


#2

I use the OOD message-of-the-day (MOTD) file. In /etc/ood/config/apps/dashboard/env, I have

MOTD_PATH="/etc/ood/motd"

That file is normally missing, but in the week leading up to maintenance it contains a notice of the maintenance period.

That doesn’t help if the web login mechanism goes down along with your cluster, but for us it works, as the MOTD displays right after login.
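For anyone scripting this, a minimal shell sketch of publishing and withdrawing the notice might look like the following. The function names, the MOTD_FILE override, and the notice wording are all hypothetical; only the /etc/ood/motd path comes from the setting above, and the sketch assumes the dashboard renders that file as plain text.

```shell
#!/bin/sh
# Sketch only: function names and notice wording are made up for illustration.
# MOTD_FILE defaults to the MOTD_PATH value shown above; override it to test.
MOTD_FILE="${MOTD_FILE:-/etc/ood/motd}"

# Write a plain-text notice. $1 = start of window, $2 = end of window.
publish_notice() {
    printf 'Scheduled maintenance: %s to %s.\n' "$1" "$2" > "$MOTD_FILE"
}

# Remove the notice so the file goes back to being absent.
withdraw_notice() {
    rm -f "$MOTD_FILE"
}
```

After sourcing this as a user who can write to /etc/ood, publish_notice "Mon 08:00" "Tue 17:00" would create the notice and withdraw_notice would remove it once maintenance is over.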

Otherwise, I think you are indeed stuck with Apache fiddling to display a maintenance page.

Cheers,

Ric


#3

We allow our users to log in during regular maintenance days because they can’t run jobs then, only queue them. If we need to update OOD itself, we block access with firewall rules. That's not a great user experience, though: if users haven't been paying attention to the MOTD notices, email reminders, and other posted schedules, they don’t know why the service is down.


#4

We use the OnDemand Dashboard announcements and the MOTD to communicate an upcoming downtime, but do not have a maintenance mode implemented.

That said, you could try using mod_rewrite and add a rewrite rule to the virtual host that sends all requests to a maintenance page you add to /public, for example.

You would add a static HTML page like /var/www/ood/public/503.html and then add this to the virtual host of the Apache config:

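# Serve this page as the body of every 503 response
ErrorDocument 503 /public/503.html
RewriteEngine On
# Maintenance mode is active while this flag file exists
RewriteCond /etc/ood/config/maintenance_mode.txt -f
# Let the error page itself through so it can be served
RewriteCond %{REQUEST_URI} !^.*503\.html$
RewriteRule ^(.*)$ /$1 [R=503,L]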

The idea is that on each request Apache will first check for the existence of /etc/ood/config/maintenance_mode.txt. If it exists, the response Apache serves is the HTML page /public/503.html. So then you can just do sudo touch /etc/ood/config/maintenance_mode.txt to enable maintenance mode, and then do sudo rm /etc/ood/config/maintenance_mode.txt to disable maintenance mode. As long as Apache is running, requests will be handled properly.
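If you would rather not type the touch/rm commands by hand, the toggle can be wrapped in a few shell functions. This is only a sketch: the function names and the MAINT_FLAG override are hypothetical, and it assumes the flag path matches the one in the RewriteCond above.

```shell
#!/bin/sh
# Sketch only: function names are made up for illustration.
# MAINT_FLAG defaults to the flag file the RewriteCond checks; override to test.
MAINT_FLAG="${MAINT_FLAG:-/etc/ood/config/maintenance_mode.txt}"

# Create the flag file so Apache starts answering with the 503 page.
maintenance_on() {
    touch "$MAINT_FLAG"
}

# Remove the flag file so requests are handled normally again.
maintenance_off() {
    rm -f "$MAINT_FLAG"
}

# Print "on" if the flag file exists, "off" otherwise.
maintenance_status() {
    if [ -f "$MAINT_FLAG" ]; then
        echo "on"
    else
        echo "off"
    fi
}
```

Sourced in a root shell, maintenance_on and maintenance_off do the same thing as the sudo touch and sudo rm commands, and maintenance_status gives a quick check of which state you left things in. No Apache reload is involved, since the flag file is checked on each request.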

The danger, of course, is that depending on where you place these directives, you will immediately cut off Apache's proxying to interactive app sessions, etc., including sessions that are already running.

We used a similar approach many years ago in an earlier version of OSC OnDemand for a per-app maintenance mode (enforced by apps having their own .htaccess files).