Maintenance page without maintenance enabled

Hello,

I installed and configured OnDemand on RHEL7 via the “official” Puppet module. The deployment also has Keycloak integration for authN.

A few weeks ago some users started reporting that “from time to time”, when they try to access OOD, they get the maintenance page. They don’t need to do anything to fix it: after waiting a few seconds and refreshing the page, the maintenance page goes away.

At that time I was running 1.7.11. This week I upgraded to 1.7.14, but apparently people are still having the same issue.
It seems to happen quite often (maybe a few times per day).

Do you have any idea of why this might be happening? Do you know how I can investigate this issue? What logs should I look at?

Thank you very much,
Daniel.

Users are saying that when this happens they often need to log in again. Might this issue be related to having mismatched values for the session timeout and idle session timeout in Keycloak and OnDemand? (I had 8h for OnDemand and 4h in Keycloak.)

I don’t think this has anything to do with session settings. The code in Apache configs that handles the maintenance page is told to not show the maintenance page unless /etc/ood/maintenance.enable exists. The fact you see it without that file is very strange and not something we’ve been able to reproduce. We also use Keycloak with OnDemand and have maintenance enabled.

For now you may want to just disable the maintenance page logic by setting use_maintenance: false in /etc/ood/config/ood_portal.yml and then re-running /opt/ood/ood-portal-generator/sbin/update_ood_portal
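
Before and after making that change, it can help to confirm what the host actually has on disk. This is a minimal sketch, assuming the two paths mentioned above; the function name maintenance_state is made up for illustration and is not part of OnDemand:

```shell
# maintenance_state [TRIGGER_FILE] [PORTAL_YML]
# Reports whether the maintenance trigger file exists and what
# use_maintenance is set to. Defaults are the paths named in this thread.
maintenance_state() {
  ms_trigger="${1:-/etc/ood/maintenance.enable}"
  ms_yml="${2:-/etc/ood/config/ood_portal.yml}"

  # The maintenance page is only supposed to appear when this file exists
  if [ -f "$ms_trigger" ]; then
    echo "trigger: present"
  else
    echo "trigger: absent"
  fi

  # Print the use_maintenance line if one is set, otherwise say so
  if grep -Eq '^[[:space:]]*use_maintenance:' "$ms_yml" 2>/dev/null; then
    grep -E '^[[:space:]]*use_maintenance:' "$ms_yml" | sed 's/^[[:space:]]*//'
  else
    echo "use_maintenance: (not set)"
  fi
}
```

Calling maintenance_state with no arguments checks the real paths. After setting use_maintenance: false and re-running update_ood_portal, Apache has to be restarted to pick up the regenerated config; on an SCL httpd24 install the service is usually named httpd24-httpd, but that name is an assumption about this deployment.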

Hi Trey,

Thanks a lot for your message and your incredible Puppet modules. I am a big fan of your work!

Now for this, I was indeed able to reproduce what my users were saying. These are the steps I followed:

Edit /opt/rh/httpd24/root/etc/httpd/conf.d/auth_openidc.conf and set some ridiculously low values for the OIDC session expiration:

OIDCSessionInactivityTimeout 15
OIDCSessionMaxDuration 60

Start an interactive app (in my case a JupyterLab server) and wait until the session expires on its own. You will start getting some errors in the Jupyter notebook. If you open OnDemand again you will get the maintenance page.

I think I found the problem. In /opt/rh/httpd24/root/etc/httpd/conf.d/ood-portal.conf, comment out or delete the line that contains ErrorDocument 503 /public/maintenance/index.html and see if the issue goes away. I’ve taken one of our dev instances of OnDemand that is connected to Keycloak and made the same config changes, and I have not yet been able to reproduce the problem.

If commenting out ErrorDocument doesn’t help, I’m curious about your session timeout settings on the Keycloak side. The changes we’ve made from the defaults are the following realm properties:

"accessTokenLifespan" : 1800,
"ssoSessionIdleTimeout" : 3600,
"ssoSessionMaxLifespan" : 604800,

The above come from kcadm.sh, using something like kcadm.sh get realms.

Some other non-standard configs we have set for our OIDC instances:

oidc_settings:
  OIDCPassIDTokenAs: serialized
  OIDCPassRefreshToken: 'On'
  OIDCPassClaimsAs: environment
  OIDCStripCookies: mod_auth_openidc_session mod_auth_openidc_session_chunks mod_auth_openidc_session_0
    mod_auth_openidc_session_1

I commented out that line and now I am getting the default Apache 503 error page:

Service Unavailable

The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.

My realm values are (I think Keycloak’s defaults or very similar to them):

"accessTokenLifespan" : 3600, (1h)
"ssoSessionIdleTimeout" : 14400, (4h)
"ssoSessionMaxLifespan" : 28800, (8h)

So you are still getting a 503 from the maintenance RewriteRule, but it makes no sense why you’d be getting that. By default there are 3 RewriteCond directives that are treated as AND conditions. At least 2 of them are going to be true in most cases, but RewriteCond /etc/ood/maintenance.enable -f is going to be false, so the RewriteRule should never get hit. This feels like a bug in mod_auth_openidc, but I have very little to prove that.
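
The AND logic described above can be modelled outside Apache to make the expectation concrete. This is only a sketch, not the generated config: the thread names just the /etc/ood/maintenance.enable condition, so the other two conditions here (the maintenance page file exists, and the request is not for the maintenance page itself) are assumptions, and would_serve_503 is a hypothetical helper name:

```shell
# would_serve_503 MAINT_PAGE TRIGGER_FILE REQUEST_URI
# Prints "yes" only when all three conditions hold, mirroring how
# consecutive RewriteCond lines are AND-ed before a RewriteRule fires.
would_serve_503() {
  # Conditions 1 and 2: both files must exist (like RewriteCond ... -f)
  if [ -f "$1" ] && [ -f "$2" ]; then
    # Condition 3 (assumed): do not rewrite requests for the page itself
    case "$3" in
      /public/maintenance/*) echo "no" ;;
      *) echo "yes" ;;
    esac
  else
    echo "no"
  fi
}
```

With the trigger file absent, the function prints "no" for every URI, which is why a 503 from this rule is unexpected on a host where /etc/ood/maintenance.enable does not exist.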

Can you remove the maintenance logic with use_maintenance: false in ood_portal.yml and see if the issue goes away? I am curious if you might hit some other issue that the maintenance rewrite is masking.

I’m also curious whether you have any files with 503 behavior defined; I think you can check with grep -HnR "503" /opt/rh/httpd24/root/etc/httpd/conf*. If you are using the Puppetlabs Apache module, then your Apache config is likely very similar to ours and there won’t be any other files with 503 behavior defined.
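
Alongside the config grep above, it may help to see when the 503 responses actually occur by summarizing the Apache access log per hour. This is a sketch, assuming the log uses Apache's common or combined format (status code in the ninth whitespace-separated field, timestamp in the fourth); the function name is hypothetical and the log location has to be filled in for your install:

```shell
# count_503_by_hour ACCESS_LOG
# Prints "DD/Mon/YYYY:HH count" for every hour that saw at least one 503.
count_503_by_hour() {
  # $4 looks like "[10/Oct/2020:13:55:36"; skip the "[" and keep
  # the 14 characters up to and including the hour.
  awk '$9 == 503 { hits[substr($4, 2, 14)]++ }
       END { for (h in hits) print h, hits[h] }' "$1" | sort
}

# Usage (path is an assumption; pass whichever access log your vhost writes):
#   count_503_by_hour /path/to/access.log
```

The output is sorted as plain text, so it is chronological within a single day but not necessarily across months. Lining up the busy hours with the OIDC session timeouts could show whether the 503s cluster around session expiry.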