A user alerted us that the Active Jobs memory reporting (shown when one expands the job info) is not correct. This may be specific to SLURM.
The problem appears to be that the SLURM OOD adapter uses the `squeue -o %m` option for the memory report, which corresponds to the MinMemoryCPU value in the SLURM config.
In our case, we have MinMemoryCPU=0 for the non-shared partitions (only one job per node), which in SLURM means the whole node's memory. The expanded Active Jobs info shows the memory as 0, i.e. the MinMemoryCPU, but the actual job memory is the whole node (AFAIK not shown by any squeue output option).
For our shared partitions, where multiple jobs can run on a node, we have for example MinMemoryCPU=4000M, i.e. 4 GB per CPU core, so a job asking for 4 cores has the correct job memory of 16 GB. Active Jobs shows 4000M, which again is the MinMemoryCPU, the per-core value.
Now, I am not sure what to do about this, as other sites may have different policies. Though, regardless of site SLURM config, MinMemoryCPU=0 means the whole node, so perhaps instead of 0 we could display "Whole node"? For the shared partitions, we could use simple math, MinMemoryCPU * NumCPUs (`squeue -o "%m %C"`), though MinMemoryCPU is a string so it would need to be parsed.
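To illustrate the simple math above, here is a minimal sketch of parsing the MinMemoryCPU string and multiplying by NumCPUs. The helper names and the exact formatting are my assumptions for illustration, not the actual OOD adapter code:

```python
import re

# SLURM memory strings are a number with an optional K/M/G/T suffix;
# a bare number is interpreted as megabytes.
UNITS_TO_MB = {"": 1, "K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 ** 2}

def parse_slurm_mem_mb(mem_str):
    """Parse a SLURM memory string like '4000M' or '16G' into MB.
    Returns None for 0, which SLURM treats as 'whole node'."""
    m = re.fullmatch(r"(\d+)([KMGT]?)", mem_str.strip())
    if not m:
        raise ValueError(f"unrecognized memory string: {mem_str!r}")
    value, unit = int(m.group(1)), m.group(2)
    if value == 0:
        return None  # MinMemoryCPU=0 -> whole node
    return value * UNITS_TO_MB[unit]

def job_memory_label(min_mem_cpu, num_cpus):
    """Compute a display string from the 'squeue -o "%m %C"' fields."""
    per_cpu_mb = parse_slurm_mem_mb(min_mem_cpu)
    if per_cpu_mb is None:
        return "Whole node"
    total_mb = per_cpu_mb * num_cpus
    if total_mb >= 1000:
        return f"{total_mb / 1000:g} GB"  # treat 4000M as ~4 GB, as above
    return f"{total_mb:g} MB"

print(job_memory_label("4000M", 4))  # the 4-core shared-partition example
print(job_memory_label("0", 64))    # non-shared partition
```

This would turn the 4000M / 4-core example into "16 GB" and render MinMemoryCPU=0 as "Whole node" instead of 0.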
I am not sure if this is worth the fuss, but it does confuse our users, as the memory they think they requested is not what is shown in Active Jobs, so it would be nice to give it some thought.
On a similar note, it would be useful to show the maximum memory allocated to the job in My Interactive Sessions, along with the app name, job number, and the number of nodes and cores, e.g.:
RStudio server on Notchpeak (1409006) 16 GB memory | 1 node | 4 cores | Running
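A rough sketch of how the proposed session title line could be assembled; the function and field names are hypothetical, not actual OOD session attributes:

```python
def session_title(app, job_id, mem_gb, nodes, cores, status):
    """Build the proposed 'My Interactive Sessions' title line."""
    parts = [
        f"{mem_gb} GB memory",
        f"{nodes} node" + ("s" if nodes != 1 else ""),
        f"{cores} core" + ("s" if cores != 1 else ""),
        status,
    ]
    return f"{app} ({job_id}) " + " | ".join(parts)

print(session_title("RStudio server on Notchpeak", 1409006, 16, 1, 4, "Running"))
# → RStudio server on Notchpeak (1409006) 16 GB memory | 1 node | 4 cores | Running
```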
Please let me know what you think about this.