OOD Status Monitoring - Remote HTTP Status

Is there a particular URI or endpoint I can point my monitoring solution (Telegraf/Grafana) at? Unfortunately, since I am using OIDC with an external Keycloak provider, I receive a 401 error when I configure Telegraf to do an HTTP GET request against the root of my OOD server, e.g. https://ondemand.example.com/.

It would be great if there were a status monitoring endpoint, such as https://ondemand.example.com/status, to see whether Open OnDemand is running and operational.

Unfortunately, because of OnDemand's architecture, a /status endpoint would depend on the logged-in user, since OnDemand is served by Per-User NGINX (PUN) instances. OnDemand does not operate like a standard web application: the processes running the web app all run as the users who are logged in. Monitoring OnDemand this way is something OSC has wanted to do, potentially by writing a Prometheus exporter that actually logs into OnDemand via Keycloak and then collects things like page load times and whether OnDemand is loading. At this time we have no implementation of this.

Do you guys have anything pragmatic that you do to get notified before the end users are aware of any application issues? I like the idea of a service account logging in and timing load times or checking for certain HTML elements.

At this time all we have are indirect checks that things are healthy, such as using the Prometheus blackbox exporter to check that an HTTP request to OnDemand returns an expected status code. The status code is currently treated as valid if it is a redirect to Keycloak. We do not have anything that checks for issues at the application layer, as we have not yet deployed any monitoring that actually logs into OnDemand and starts a PUN. Our main safeguard against production issues is testing before deploying any app or OnDemand itself, as we have a test environment for OnDemand that mirrors production.
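
For anyone who wants to try the same kind of indirect check without the blackbox exporter, here is a rough Python sketch of the idea. It is not OSC's actual configuration. It sends one unauthenticated GET, does not follow redirects, and treats the result as healthy only if the response is a redirect whose Location points at the identity provider. The function names and the keycloak.example.com host are made up for illustration, and your deployment may answer with something other than a redirect (a 401, for example), so adjust the accepted codes to what you actually see.

```python
import http.client
from urllib.parse import urlsplit

# Status codes treated as "redirect to the identity provider" (an assumption;
# adjust to whatever your auth module actually returns).
REDIRECT_CODES = {301, 302, 303, 307, 308}


def looks_healthy(status, location, idp_host):
    """Return True if the response is a redirect whose target host is idp_host."""
    if status not in REDIRECT_CODES or not location:
        return False
    return urlsplit(location).hostname == idp_host


def probe(url, idp_host, timeout=10.0):
    """Send one GET to url without following redirects and evaluate the result."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        # http.client never follows redirects, so we see the first response.
        conn.request("GET", parts.path or "/")
        resp = conn.getresponse()
        return looks_healthy(resp.status, resp.getheader("Location"), idp_host)
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical hosts; replace with your OnDemand and Keycloak hostnames.
    ok = probe("https://ondemand.example.com/", "keycloak.example.com")
    print("healthy" if ok else "unhealthy")
```

Like the blackbox check, this only tells you that Apache and the auth redirect are responding. It says nothing about whether a PUN can start for a given user.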
