VirtualGL in OOD apps

This is a question for the OOD team.

I have noticed that some OOD apps that OSC publishes, which can use OpenGL graphics (Ansys, Matlab, VMD, Paraview, …), use VirtualGL. VGL needs an X server running on the compute node, which in turn sits on the GPU and can potentially take resources away from CUDA computational jobs running on that GPU.

So, my question is: how does OSC support VirtualGL technically? Do you run an X server on all your compute nodes? Do you start one at job start with a flag (which I don't see in the OOD apps, so probably not)? Do you have cheap GPUs in each node, used for GL and independent of the computational GPUs? Or something else?

I’d appreciate some details that would allow us to consider such deployment over here.

Naturally, if other HPC centers have their own solution for compute node GL rendering it’d be great to hear them.

What we do over here is have a set of standalone (interactive) nodes that run X and VGL on mid-range Nvidia GTX cards, but we don't have any X or VGL on the cluster compute nodes. Most of our compute nodes only have onboard video cards, and our GPU nodes are heavily utilized with computation, but we'd like to see if there is room to use the GPU nodes for GL with OOD apps like Ansys or Paraview.
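For concreteness, my rough understanding is that a headless X setup for VGL on a GPU node would look something like the sketch below. This is not something we run today, and the exact nvidia-xconfig flags and device IDs would be node-specific:

```bash
# Sketch only: configuring a headless X server for VirtualGL on a GPU node.
# Generate an xorg.conf that drives the Nvidia GPU with no attached display.
nvidia-xconfig -a --use-display-device=none --virtual=1920x1200

# Let VirtualGL set up access to the 3D X server (interactive prompts).
/opt/VirtualGL/bin/vglserver_config

# Start the headless X server that vglrun will render against.
X :0 &
```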

Thanks,
MC

Hi, everybody

We use this configuration: a graphics node supporting the TurboVNC sessions, and all the compute nodes running X11 with the Nvidia driver. All the OpenGL programs called inside script.sh.erb are prefixed with vglrun.
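For example (from memory, so the exact module names and paths will differ per site), the launch line in a script.sh.erb looks roughly like this:

```bash
# Excerpt from an OOD app's template/script.sh.erb -- module name is site-specific
module load paraview

# vglrun redirects the application's OpenGL rendering to the node's
# GPU-backed X display (:0 here) while the GUI stays in the TurboVNC session
vglrun -d :0 paraview
```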

When I'm back in my office I can add our config to this topic, if you're interested.

Jean Marie

Thanks @jms27000! @mcuma sorry I didn't see this earlier! Yes, I believe we have X11 libraries installed on all our compute nodes so that someone may run an interactive session on them, whether they have GPUs or not (and most do not; I can't say the exact percentage, but as an approximate guess only 1/4 do).

So all compute nodes have VirtualGL libraries, and can run X11 sessions, but not all compute nodes have GPUs.

How we actually do segregation or limits, I'm not sure. If two users get scheduled on the same compute node, one requesting the GPU specifically and the other landing there by accident, what's to stop the second user from using the GPU? cgroup configurations? It seems like something the scheduler should work out to limit the second user, but I can't say for sure.
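(For what it's worth, if the scheduler were Slurm, that segregation would usually come from cgroup device constraints plus GRES, along the lines of the sketch below. I'm not certain what we actually run for this, so treat it as an assumption.)

```bash
# /etc/slurm/cgroup.conf -- assuming a Slurm-based setup; actual config may differ
ConstrainCores=yes
ConstrainRAMSpace=yes
# Jobs only see the GPU devices they requested via --gres=gpu:N
ConstrainDevices=yes

# /etc/slurm/gres.conf -- device files are node-specific
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
```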

@tdockendorf may have more for you.

OK, thanks. It looks like you all run an X server whenever the nodes are up, then. Since our admins are not fans of running an X server on the compute nodes, I think we'll stick with our current setup of dedicated interactive nodes with an X server and VirtualGL, and revisit this if users start demanding it on the compute nodes (so far they don't).