OOD 3.1 installs Tcl Environment Modules on Alma 8.9?

Hi All,

We have just updated OOD to 3.1.1, and because of its requirements and the obsolescence of CentOS 7, we have also updated the OS on the OOD host to Alma 8.9.

It mostly works, except that we used to run a “login node” image on the OOD host, together with Lmod environment modules. For some reason, the OOD installation (from RPMs) pulls in the original TCL “Environment Modules” package, which overrides our Lmod.

Did anyone have a similar issue with OOD 3.1? Could the installation be prevented from meddling with Modules? Thanks for any advice!

It looks like Environment Modules is required by SCL (scl-utils), which is (is it?) needed by the OOD RPMs:

```
rpm -e environment-modules-4.5.2-4.el8.x86_64
error: Failed dependencies:
environment(modules) is needed by (installed) python3-sphinx-1:1.7.6-3.el8.noarch
/usr/bin/modulecmd is needed by (installed) scl-utils-1:2.0.2-16.el8.x86_64
```

Grigory Shamov
University of Manitoba

OnDemand requires scl-utils, and that package, as you’ve seen, requires the TCL Environment Modules RPM. OnDemand has no direct dependency on TCL Environment Modules, but we do require scl-utils, which has that requirement. That’s unavoidable, as it’s a dependency set by Red Hat.

To avoid loading the TCL modules, remove /etc/profile.d/modules.sh and /etc/profile.d/modules.csh. The RPM can stay installed and simply never be loaded once you get rid of the environment initialization scripts.
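Instead of deleting the scripts outright, one option is to rename them so they are no longer sourced at login while remaining easy to restore. A minimal sketch, assuming the stock EL8 shell startup files, which source only /etc/profile.d files ending in .sh (sh/bash) or .csh (csh/tcsh); the helper name disable_profile_scripts is hypothetical:

```shell
#!/usr/bin/env bash
# disable_profile_scripts DIR NAME...
# Hypothetical helper: renames each existing DIR/NAME to DIR/NAME.disabled.
# Assumes the startup files only source *.sh / *.csh, so the renamed files
# are skipped. Names that do not exist are silently ignored.
disable_profile_scripts() {
  local dir="$1"
  shift
  local name
  for name in "$@"; do
    if [ -e "$dir/$name" ]; then
      mv -- "$dir/$name" "$dir/$name.disabled"
      echo "disabled $dir/$name"
    fi
  done
}

# Example (as root):
#   disable_profile_scripts /etc/profile.d modules.sh modules.csh
```

To undo, move the .disabled files back. Since the scripts belong to the environment-modules RPM, a later package update may recreate them, so it is worth re-checking /etc/profile.d after updates.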

Thanks for the idea. I had already tried exiting early from the modules.sh script.
It seems to break another profile.d script, scl-init.sh, which for some reason runs sed on a Modules-related file in a system location (RedHat is great). The error looks like this:

```
sed: can't read /usr/share/Modules/init/.modulespath: No such file or directory
```

So, would SCL work without the scl-init profile script? Is this one needed?

In general, one would think Red Hat-specific dependencies should be avoided in OOD entirely. For example, can one live without the Devtoolset GCC 13? Wouldn’t the system GCC 8 do just as well for Ruby, Nginx and HTTPD 2? The Red Hat stuff conflicts with most of the HPC software stacks out there.


Grigory Shamov
University of Manitoba

You can try removing scl-init.sh and see whether OnDemand still works. OnDemand is generally designed to run on a dedicated web node where things like Lmod aren’t needed. You may have to modify scl-init.sh to touch that .modulespath file, or use some other trick, so that the TCL modules are not loaded but SCL still is.

OnDemand relies on OS dependencies because vendoring everything has major maintenance and security implications. If we start vendoring Ruby and NodeJS, then we are responsible for prompt security releases, etc. Right now we rely on the OS as much as possible to reduce the maintenance overhead of OnDemand’s dependencies. We may start vendoring Ruby and NodeJS in future OnDemand releases so we can support distributions where things like Ruby 3.1 or 3.2 are not available, but that work is a long way off for Red Hat-based distros, if we decide to take it on at all.

I removed both scl-init.* and modules.* from profile.d. This brings back Lmod. I’m testing the effects on OOD; it looks like it may still be working.
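One quick way to confirm which module implementation a fresh login shell picks up after removing those scripts is to look at the banner printed by `module --version`. A small sketch; the helper name which_module_system is hypothetical, and the matched strings are assumed from the banners of Lmod 8.x ("Modules based on Lua: Version ...") and Environment Modules 4.x ("Modules Release ..."), which may differ in other versions:

```shell
#!/usr/bin/env bash
# which_module_system (hypothetical helper)
# Reads the output of `module --version` on stdin and prints which
# implementation produced it, based on assumed banner strings.
which_module_system() {
  local out
  out=$(cat)
  case "$out" in
    *"Modules based on Lua"*) echo lmod ;;
    *"Modules Release"*) echo tcl-environment-modules ;;
    *) echo unknown ;;
  esac
}

# Example: check what a new login shell gets (the banner goes to stderr):
#   bash -lc 'module --version' 2>&1 | which_module_system
```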

In general, OOD does not seem to be designed as a dedicated web node (if the word “dedicated” implies minimal dependencies on the rest of the HPC system); rather, it seems designed to run on a “full-scale HPC login node”:

  • Needs access to the parallel filesystem.
  • Needs access to the scheduler.
  • Access to modules is also not unthinkable. Auto-populating Slurm partitions is already happening, so why not use modules to auto-populate software versions? I am sure there are sites out there doing just that.

So, I’d think it would be great if OOD components were orthogonal to all these login node components, in the sense of not conflicting with them.

Grigory:

The whole purpose of Open OnDemand is to be the primary ‘login interface’ to a cluster, hence the requirement to ‘run on a full-scale HPC login node’. In that spirit, the platform architecture is designed to rely on the underlying system software and configuration wherever possible (e.g. the resource manager, the software module system, user authentication), to ensure maximum flexibility and compatibility and, as Trey mentioned, to reduce maintenance requirements.

If you are interested in learning more about the architecture, there is a good overview on the Architecture page of the Open OnDemand 3.1.0 documentation.

We are, of course, always happy to consider changes (and always welcome pull requests to our code base!)