Unable to Specify Cores or Memory for Interactive Desktop

Hello all,

I have a production instance of OOD that is close to vanilla/default, and I’m trying to customize it a little. In particular, the Interactive Desktop app by default provides one core to users, and I can’t even determine how it allocates memory. Basically, I’d like to be able to specify the number of cores and memory a user can request for their Interactive Desktop session.

The edits I’ve made to form.yml and submit.yml.erb do not seem to be working, despite my success at similar changes for our Jupyter Notebook/Lab application. For instance, here are some relevant lines from my form.yml:

attributes:
  desktop: "mate"
  bc_vnc_idle: 0
  bc_vnc_resolution:
    required: true
  node_type: null
  memory:
    widget: "number_field"
    max: 128000
    min: 2000
    step: 2000
    value: 4000
    label: "Memory (MB)"
    help: "Enter a value in MB between 2000 and 128000"
  cores:
    widget: "number_field"
    max: 16
    min: 1
    step: 1
    value: 1
    label: "Number of cores"
    help: "Enter a value between 1 and 16"

The fields show up on the form within OOD, but if I choose, e.g., 2 cores, running nproc on the resulting Desktop session shows 1 core only. Here are some relevant lines from submit.yml.erb:

  native: # ... array of command line arguments ...
    - "-c"
    - "<%= cores.blank? ? 1 : cores.to_i %>"
    - "--mem"
    - "<%= memory %>M"

This approach worked well with our Jupyter app, so I’m not sure what is failing here. I suspect I might be missing something trivial, or perhaps this behavior is different for Interactive Desktop sessions. Any advice would be welcome!

Warmest regards,
Jason

You only have attributes listed here. What about the form entries? bc_desktop is special in that you’re actually reconfiguring an existing app (the one that’s distributed with the code base), so you probably also have to set bc_num_slots: nil to let your own cores field take over.

You can always check your /var/log/ondemand-nginx/$USER/error.log for the actual command being executed. You can grep for execve. Here’s an example of a job I just ran on our Torque cluster.

App 45099 output: [2020-07-27 17:33:37 -0400 ]  INFO "execve = [{\"PBS_DEFAULT\"=>\"quick-batch.ten.osc.edu\", \"LD_LIBRARY_PATH\"=>\"/opt/torque/lib64:/opt/rh/rh-nodejs10/root/usr/lib64:/opt/rh/rh-ruby25/root/usr/local/lib64:/opt/rh/rh-ruby25/root/usr/lib64:/opt/rh/httpd24/root/usr/lib64:/opt/ood/ondemand/root/usr/lib64\"}, \"/opt/torque/bin/qsub\", \"-d\", \"/users/PZS0714/johrstrom/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/vdi-owens/output/e50db490-2580-4e3a-8311-beef4604eb8f\", \"-N\", \"ondemand/sys/dashboard/sys/bc_desktop/vdi-owens\", \"-S\", \"/bin/bash\", \"-o\", \"/users/PZS0714/johrstrom/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/vdi-owens/output/e50db490-2580-4e3a-8311-beef4604eb8f/output.log\", \"-j\", \"oe\", \"-l\", \"walltime=08:00:00\", \"-l\", \"nodes=1:ppn=1:owens\", \"/tmp/qsub.20200727-45099-6bv0nc\"]"

Here are my form definitions:

form:
  - bc_vnc_idle
  - desktop
  - bc_num_hours
  - cores
  - memory
  - bc_num_slots
  - node_type
  - bc_queue
  - bc_vnc_resolution
  - bc_email_on_started

I did go ahead and set bc_num_slots to null as well. And here is what Slurm is trying to run… I have no idea where this string is forming:

App 6672 output: [2020-07-27 19:23:29 -0400 ]  INFO "execve = [{\"SLURM_CONF\"=>\"/opt/slurm/slurm.conf\"}, \"/usr/bin/squeue\", \"--all\", \"--states=all\", \"--noconvert\", \"-o\", \"\\u001E%a\\u001F%A\\u001F%B\\u001F%c\\u001F%C\\u001F%d\\u001F%D\\u001F%e\\u001F%E\\u001F%f\\u001F%F\\u001F%g\\u001F%G\\u001F%h\\u001F%H\\u001F%i\\u001F%I\\u001F%j\\u001F%J\\u001F%k\\u001F%K\\u001F%l\\u001F%L\\u001F%m\\u001F%M\\u001F%n\\u001F%N\\u001F%o\\u001F%O\\u001F%q\\u001F%P\\u001F%Q\\u001F%r\\u001F%S\\u001F%t\\u001F%T\\u001F%u\\u001F%U\\u001F%v\\u001F%V\\u001F%w\\u001F%W\\u001F%x\\u001F%X\\u001F%y\\u001F%Y\\u001F%z\\u001F%Z\\u001F%b\", \"-j\", \"2724\", \"-M\", \"hpc\"]

Jason

You’re looking for the sbatch command that submits the job. What you’ve posted is an squeue call, which only queries SLURM for job info.

Also, just to be clear (and make sure you don’t have a typo), it’s nil not null.

So, here is the sbatch command:

App 11306 output: [2020-07-27 19:55:58 -0400 ]  INFO "execve = [{\"SLURM_CONF\"=>\"/opt/slurm/slurm.conf\"}, \"/usr/bin/sbatch\", \"-D\", \"/home/simmsj/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/hpc/output/08b4a3cf-cd5a-49d7-b719-4595d73e13b1\", \"-J\", \"sys/dashboard/sys/bc_desktop/hpc\", \"-o\", \"/home/simmsj/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/hpc/output/08b4a3cf-cd5a-49d7-b719-4595d73e13b1/output.log\", \"-t\", \"01:00:00\", \"-N\", \"1\", \"--parsable\", \"-M\", \"hpc\"]"

I don’t see, e.g., a -c 4 or anything similar, which I would expect based on how I have submit.yml.erb configured. It’s like it’s ignoring that file. I don’t know where or why, for instance, it is including the -N 1, and also excluding --mem. Very confusing.

Jason

It could actually be ignoring that file. Try adding a submit: "relative/path/to/submit.yml.erb" in the form.yml. The -N 1 is coming from bc_num_slots.

Also, here’s how we configure our bc_desktop apps at OSC, since it helps to have an example to work from. Note how we also have to specify a brand new submit yml.

Thanks, I’ll work with it some more and see where I get. Also, though, I do see this line in your owens.yml: bc_queue: null

I thought you were specific that it was nil and not null. Yet again I am confused. :)

Jason

My sincere apologies. It’s me who’s confused. You’re right, you should use null because that’s what yaml expects (nil is a ruby thing).
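
For anyone skimming, here is a quick sketch of the difference in a form.yml-style fragment. YAML treats null (or ~) as its null literal, which Ruby's YAML loader turns into nil; the bare word nil has no special meaning in YAML and is read as the string "nil". The key names other than bc_num_slots are made up for illustration.

```yaml
attributes:
  bc_num_slots: null      # YAML null, loaded by Ruby as nil
  example_tilde: ~        # hypothetical key: also YAML null
  example_word: nil       # hypothetical key: loaded as the string "nil", not a null
```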

At the head of my submit.yml.erb file, within the bc_desktop app directory, it says:

batch_connect:
  template: vnc

Do you think this is somehow instructing it to look elsewhere and ignore subsequent instructions in that file? I’m also stymied because if you look at the sbatch command that is submitted, one of the arguments is --parsable. So from within the bc_desktop directory, as well as within various parent directories, I ran grep -r parsable *, just to try to find where that is constructed, and nothing came back. Once again, I can’t figure out how that string is being built.

Jason

So it’s important to keep in mind that with the bc_desktop, you’re overwriting our configuration. The files you’ve pointed out are ours. So you need to specify your own submit.yml.erb location.

“Ignoring your configs” is not quite the right phrase; it’s more that it needs to be explicitly told that there’s another submit.yml. So in your form.yml you need to set the submit: attribute to point to some other file, as we do here for our owens desktop. Sorry it took until now to realize that this is your issue!

We’ll then take both of these configs (what we provide in our source tree and what you override in /etc/ood/config/apps/bc_desktop) and merge them (giving precedence to your configs over ours).

--parsable is something we pass to SLURM all the time just to get parsable output back, so that’s not really an app thing but just the way the SLURM integration was built.
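
The submit: override described above could look roughly like the sketch below. It is not our exact file: the file name hpc.yml is a guess based on the cluster name in your sbatch log, the title is made up, and submit/my_submit.yml.erb is a hypothetical path, assumed to be relative to the directory this override file lives in. The cores and memory attributes are the ones from your form.yml.

```yaml
# hpc.yml in the bc_desktop override directory
# (file name assumed from the "hpc" cluster in the sbatch log)
---
title: "HPC Desktop"                  # hypothetical title
cluster: "hpc"
submit: "submit/my_submit.yml.erb"    # hypothetical path to your own submit file
attributes:
  desktop: "mate"
  bc_num_slots: null                  # hide the stock field, as suggested earlier in the thread
  cores:
    widget: "number_field"
    min: 1
    max: 16
    step: 1
    value: 1
    label: "Number of cores"
  memory:
    widget: "number_field"
    min: 2000
    max: 128000
    step: 2000
    value: 4000
    label: "Memory (MB)"
form:
  - desktop
  - bc_num_hours
  - cores
  - memory
```

Keeping the submit file in its own subdirectory is a design choice here, so that it is not mistaken for another form config sitting next to hpc.yml.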

Thanks for this more detailed description. Where, though, is your config file that is merged with my config file? So, if I specify an override to merge with yours, I know where mine is, but where is your “master” config file for the bc_desktop application? You mention it’s in “our source tree,” but where is that on the system?

Thanks!

Jason

Your files should be kept in /etc/ood/config/apps/bc_desktop. The source code (from the installation) gets distributed to /var/www/ood/apps/sys/bc_desktop.
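
To make the layout concrete, a custom submit file under that /etc directory might look like the sketch below. The submit/my_submit.yml.erb name is hypothetical (it is whatever the submit: attribute in your form config points to), the flags are the Slurm ones from your original submit.yml.erb, and the 4000 fallback for memory is just the form default from earlier in the thread. This assumes the stock batch_connect: template: vnc settings are still picked up from the shipped app through the merge, so they are not repeated here.

```yaml
# /etc/ood/config/apps/bc_desktop/submit/my_submit.yml.erb  (hypothetical file name)
---
script:
  native:
    - "-c"
    - "<%= cores.blank? ? 1 : cores.to_i %>"          # fall back to 1 core if the field is empty
    - "--mem"
    - "<%= memory.blank? ? 4000 : memory.to_i %>M"    # fall back to the form default of 4000 MB
```

After launching a session, the execve line for sbatch in the nginx error log should then show -c and --mem among the arguments if the file is being picked up.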

Aha! That solved the problem!

I was unaware that there were local config files within /etc. I was editing the versions within /var, which were subsequently getting overridden by the local options. I’ve got it working now.

Thank you for your patience and guidance. We got there in the end, and I learned a lot!

Jason
