Interactive job disappearing when using two OODs

Hello,

We have two OOD environments, and both use the same shared home directory. I have noticed that existing interactive jobs disappear when I open both OODs. I suspect the problem is that both are using the same /home/user/ondemand directory.

Is there a way to change the folder name?

Thank you

You can set the folder name in /etc/ood/config/nginx_stage.yml via the ondemand_portal option, which the comments in that file describe:

Unique name of this OnDemand portal used to namespace multiple hosted portals

NB: If this is not set then most apps will use default namespace “ondemand”
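For reference, a minimal sketch of that setting in /etc/ood/config/nginx_stage.yml (the portal name ood2 matches the one used later in this thread; substitute your own):

    # Unique name of this OnDemand portal used to namespace multiple hosted portals
    # NB: If this is not set then most apps will use default namespace "ondemand"
    ondemand_portal: ood2

With this set, per-user app state should land under ~/ood2 rather than the default ~/ondemand.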


Just to clarify / document: I think what you are asking for is two OOD environments that act independently (e.g., if you launch an interactive session in one, it doesn’t show up in the other)?

As an aside, at OSC we currently run something like 4 different production OOD instances (along with corresponding dev/test versions), all of which share the same shared home directory.

Can you be more specific about what these two OOD environments are (staging and development, or different sites altogether)?

As @azric pointed out, you can change them so they’re logically different ‘sites’ if that’s the case. We do this, but the installations are logically different even though they share the same NFS mounts.

We run dev/staging/prod with the same configurations and it’s fine. If this is your use case, then you only need to have the same clusters.d configs populated everywhere.
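To illustrate what “the same clusters.d configs populated everywhere” means, here is a hedged sketch of a minimal Slurm cluster file (the cluster name, title, and host are placeholders, not values from this thread):

    # /etc/ood/config/clusters.d/my_cluster.yml (hypothetical example)
    ---
    v2:
      metadata:
        title: "My Cluster"
      login:
        host: "login.my-cluster.example.edu"
      job:
        adapter: "slurm"
        cluster: "my_cluster"
        bin: "/usr/bin"

Each OOD host would carry an identical copy of this file, so every instance submits to and polls the same scheduler.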

Here’s a topic on a similar issue from an LSF user who is running it in multi-cluster mode. If you’re running LSF in the same manner, this may be applicable to you.

@alanc @jeff.ohrstrom we have two separate clusters, one for production and one for test, and an OOD instance for each cluster. They share a home directory.

@azric I added ondemand_portal: "ood2" but it is still creating the ondemand folder. Do I have to run something to apply the change?

@alanc Currently, if I open my interactive sessions page in one, the existing interactive session in the other disappears.

@gp4r can you describe the end-state functionality you actually want? Do you want the interactive sessions to show up in both OOD instances, or do you want them to act completely independently and not show sessions started from the other?

As an aside, at OSC we currently have 3 physical clusters (1 running SLURM, the other 2 Torque/MOAB) and multiple OOD instances that can connect to all clusters, share home directories, and have interactive sessions show up across all of them.

@alanc Could you explain both setups?

For the production and dev clusters, I want sessions to show up independently.

I am also thinking about adding more OOD instances to production; in that case I want the multiple OODs to show all sessions.

Do you use some kind of HA proxy to distribute traffic to the multiple OODs?

@gp4r if you’ve enabled ondemand_portal: ood2 then you just need to restart your PUN (the Restart Web Server option at the top right) for it to take effect.
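If the Restart Web Server button is unavailable, one command-line sketch that should have the same effect is to stop the user's per-user NGINX (PUN), which relaunches with the new configuration on their next request (assuming a root shell on the web host; gp4r is the username from this thread):

    # Stop one user's PUN so it picks up the updated nginx_stage.yml when it relaunches
    sudo /opt/ood/nginx_stage/sbin/nginx_stage nginx --user gp4r --signal stop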

If you want sessions to show up in both OOD instances, then both instances need to be carbon copies of each other. That means they have essentially the same /etc/ood directory (nginx_stage.yml, the clusters.d directory, etc.).

For them to be completely independent, then specifying a different ondemand_portal is the route you should take.
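For the carbon-copy case, one hedged way to keep the /etc/ood trees identical is a one-way sync from a primary instance (the host ood2.example.edu is hypothetical):

    # Mirror the OOD configuration tree to the second instance
    rsync -av --delete /etc/ood/ root@ood2.example.edu:/etc/ood/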

We do not use HAProxy because each OOD instance is effectively a different site with no redundancy. For dev/staging/production, they all have the same configurations (notably, they all use the same clusters.d folder). For logically separate sites altogether (for instance ondemand.osc.edu vs. apps.totalsim.us for our commercial clients), we use differing ondemand_portal configurations to separate them.
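Applied to the production/test case in this thread, that separation would come down to a single differing line in each host's nginx_stage.yml (the portal names here are placeholders):

    # Production OOD: /etc/ood/config/nginx_stage.yml
    ondemand_portal: prod

    # Test OOD: /etc/ood/config/nginx_stage.yml
    ondemand_portal: test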

We removed the restart button on OOD, so I ran “/opt/ood/nginx_stage/sbin/nginx_stage nginx_clean -f”.

But it is still creating the ondemand folder.

[root@arcs config]# grep ondemand_portal /etc/ood/config/nginx_stage.yml
#ondemand_portal: null
ondemand_portal: ood2
[root@arcs config]# /opt/ood/nginx_stage/sbin/nginx_stage nginx_clean -f
gp4r

Is " /opt/ood/nginx_stage/sbin/nginx_stage nginx_clean -f" a correct command to restart PUN?

Thank you for your help.

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.