Recommendations for multiple clusters

ndusek · May 6, 2021, 3:34pm

We are deploying a new cluster, and the filesystem will be completely separate from the old cluster. We currently have OOD deployed on the old cluster and plan to deploy it for the new cluster as well.

What do you recommend for deployment in this situation, where the filesystems are separate? One OOD instance where the host is connected to both filesystems? Or separate instances of OOD?

I understand that both of these are viable options, but am more curious to hear others’ experiences doing one or the other in production.

jeff.ohrstrom · May 6, 2021, 6:28pm

Are you talking about HOME directories or NFS shared folders like project and scratch?

ndusek · May 6, 2021, 6:29pm

Home, project, and scratch directories all on the same filesystem per cluster. In other words, between the two clusters, there will be no shared file paths or overlap in terms of storage location.

Does that answer your question? Or did I miss it?

jeff.ohrstrom · May 6, 2021, 6:59pm

Yea that’s exactly it. If they have no shared folders, I’m not sure if you can run 1 single instance (or I may lack the imagination for a solution).

You can set the OOD_DATAROOT to be something other than ~/ondemand/data, but it seems like in this case you’d need a different setting per scheduler. Meaning, if one OOD instance could access both schedulers (A and B) and both file systems, jobs that ran on scheduler A could read and write files under the NFS data root, but jobs that ran on scheduler B could not, because the shared filesystem isn’t really shared.
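
To make the data root point concrete, here is a minimal Python sketch of a single data root being resolved from an `OOD_DATAROOT` override, with a fallback to `~/ondemand/data`. This is not OnDemand's actual code; the function name `resolve_dataroot` and the dict-based environment argument are hypothetical and only for illustration.

```python
from pathlib import PurePosixPath


def resolve_dataroot(env):
    """Return the single data root one portal instance would use.

    `env` is a plain dict standing in for the process environment
    (hypothetical helper, not part of OnDemand).
    """
    override = env.get("OOD_DATAROOT")
    if override:
        # A site-wide override replaces the default location entirely.
        return PurePosixPath(override)
    # Default described in the thread: ~/ondemand/data
    return PurePosixPath(env["HOME"]) / "ondemand" / "data"


default_root = resolve_dataroot({"HOME": "/home/alice"})
custom_root = resolve_dataroot({"HOME": "/home/alice", "OOD_DATAROOT": "/nfs/ood/data"})
```

Either way the result is one path per instance, so jobs on both schedulers would be pointed at the same location even if only one cluster can actually see it.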

Though, I believe SDSC (San Diego Supercomputer Center) may have a setup like you’re describing - so they may have some insight into this. In fact, this may be more common than I’m thinking, so someone else may be able to shed some light on their setup.

ikirker · May 11, 2021, 10:07pm

@jeff.ohrstrom Apologies for re-raising this thread, but this is an area we’re interested in at our site, too. Am I not correct in remembering that OSC has more than one cluster itself? (Three, I think?)

Do they all have a common filesystem for this purpose? How much other infrastructure do they share? I remember seeing in a demo, for example, that the jobs view can show jobs from all the clusters at once.

jeff.ohrstrom · May 13, 2021, 1:52pm

No issues @ikirker - it’s not solved yet!

But yes, OSC runs 2 Slurm clusters (we’ve been in flux for the last year or so migrating from Torque, so we had times where we had 1 Torque and 3 Slurms), but they share all the file systems. I think that’s the issue here - different clusters with completely separate file systems, which we do not have at OSC. We have HOME, scratch and project directories that are accessible from any cluster (even when we were migrating the scheduler).

ndusek · May 19, 2021, 3:33pm

For me, this settles it: We have two separate filesystems, so we’ll have to use two separate OOD instances.

Thanks,
Nick

ndusek · September 16, 2021, 3:02pm

Revisiting this thread based on some new use cases…

Is there any plan to support multiple clusters with different filesystems in the future? I don’t know how heavy of a lift this would be architecturally.

I’m thinking it would be nice to link different apps to different clusters, based on the hardware. For example, maybe you have an old cluster that you want to just run interactive desktops for classroom training, and you don’t want to run these on your main production cluster. Of course, you can always run a separate OOD instance for that, but administratively, it would be convenient to have it all behind one pane of glass so that users don’t have to go to different URLs, and so admins don’t have to maintain multiple portals.

Have you had anyone wanting to do something similar? Should I submit a feature request on GitHub?

jeff.ohrstrom · September 16, 2021, 3:44pm

You should always submit a feature request to github. I tend to remember github tickets much more easily than discourse topics. Which is to say - github tickets are easier to manage and keep track of for us. Discourse topics seem to get lost in a lot of noise.

That said - there is a ticket for this already. Give it a +1 and it may get bumped in priority. Though XSEDE (or ACCESS as it will be) is trying to have an OnDemand instance that can talk to several service providers - so it’ll come as a part of that effort sometime, though I can’t say when. As you indicate, it is sort of a heavy lift because the assumption of 1 HOME directory is sort of baked into a lot of places.

github.com/OSC/ondemand

per cluster dataroot

opened 03:15PM - 11 Aug 21 UTC

johrstrom

enhancement component/dashboard

There have been several discourse topics (links needed) where a site has filesystems that are distinct to a given cluster. Specifically, their HOME directories are separate for each cluster. Historically we've told folks they need a different ondemand instance per cluster because stuff like `dataroot` evaluates to `$HOME/ondemand/data`. The `staged_root` for a given job (the job's working directory; the webserver templates files and puts them here _before_ submitting the job) is a subdirectory under this `dataroot`. So clearly there's a need for some sites to have these directories on a _per cluster_ basis. If the site could come up with a sshfs scheme to mount the other filesystems, then the webserver could access 2 or more file systems.

```text
# here's an example of 'dataroot' locations on a per cluster basis for the oakley and owens clusters.
$HOME/ondemand/data/oakley
$HOME/ondemand/data/owens
```

What the $HOME filesystem is on the webserver is anyone's guess. Maybe it could be a local filesystem that mounts the others?
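
The per-cluster layout described in the ticket can be sketched in a few lines of Python. This is only an illustration of the path scheme under the assumption that the data root is simply `$HOME/ondemand/data/<cluster>`. The helper name `per_cluster_dataroot` is hypothetical and is not part of OnDemand or of any patch mentioned in this thread.

```python
from pathlib import PurePosixPath


def per_cluster_dataroot(home, cluster):
    """Build a data root that is distinct for each cluster.

    Hypothetical helper: mirrors the $HOME/ondemand/data/<cluster>
    example from the ticket, nothing more.
    """
    if not cluster:
        raise ValueError("cluster name must not be empty")
    return PurePosixPath(home) / "ondemand" / "data" / cluster


# The two clusters named in the ticket's example.
roots = {name: per_cluster_dataroot("/home/alice", name) for name in ("oakley", "owens")}
```

A job's staging directory would then live under the data root for the cluster it is submitted to, so each cluster only needs to see its own subtree.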

ndusek · September 16, 2021, 3:59pm

Cool, I gave it a +1. Multiple service providers sounds like a great idea. Looking forward to seeing this feature someday.

jeff.ohrstrom · September 24, 2021, 3:00pm

@ndusek I may have a patch you can apply to get 1 instance working with multiple file systems. Are you interested in such a thing?

jeff.ohrstrom · September 24, 2021, 3:01pm

And/or @ikirker - same message - I may have a patch for multiple filesystems.

ndusek · September 27, 2021, 1:26pm

@jeff.ohrstrom Yes, I would be interested in having a look. I am actually going to be doing a new deployment of OOD over the next couple weeks, so now might be a good time to test out something like this.

jeff.ohrstrom · September 27, 2021, 3:20pm

I’ve updated the same GH ticket. Please follow it to see any updates.
