Testing ondemand


#1

We have successfully installed OnDemand and are now in the testing phase. The following is our cluster config file, monsoon.yml:

# /etc/ood/config/clusters.d/monsoon.yml
---
v2:
  metadata:
    title: "Monsoon Cluster"
  login:
    host: "wind.hpc.nau.edu"
job:
    adapter: "slurm"
    cluster: "monsoon"
    bin: "/usr/bin"
    conf: "/etc/slurm/slurm.conf"

When we run the following, we get an error that we are not sure how to correct (ignoring the Rails error, of course).

[wew@ondemand /var/www/ood/apps/sys/dashboard ]$ sudo su $USER -c 'scl enable rh-ruby22 nodejs010 -- bin/rake test:jobs:monsoon RAILS_ENV=production'
Rails Error: Unable to access log file. Please ensure that /var/www/ood/apps/sys/dashboard/log/production.log exists and is writable (ie, make it writable for user and group: chmod 0664 /var/www/ood/apps/sys/dashboard/log/production.log). The log level has been raised to WARN and the output directed to STDERR until the problem is fixed.
Skipping 'monsoon' as it doesn't allow job submission.

We’re not sure how to set up the remote job submission portion under Slurm on our CentOS 6 cluster.


#2

As some added information, our OnDemand server has Slurm installed and has access to all the files, etc. I changed the yml file to point to the local OnDemand machine, and I still get the following error:

Skipping 'monsoon' as it doesn't allow job submission.


#3

I made a minor edit to your post, wrapping the config in a ``` fenced code block. With that in place, you can see that the job: key is not indented to the same level as metadata and login. I think that once you indent it so that it lines up with login and metadata, the rake task will work (though you will still see that annoying Rails error about the production log file; you can ignore that). After the fix:

---
v2:
  metadata:
    title: "Monsoon Cluster"
  login:
    host: "wind.hpc.nau.edu"
  job:
    adapter: "slurm"
    cluster: "monsoon"
    bin: "/usr/bin"
    conf: "/etc/slurm/slurm.conf"

This is actually a problem other people have faced. Which example did you start working from for the cluster config? Perhaps that example has bad formatting.

We actually started to experiment with a possible validator for these config files (https://github.com/OSC/ood-dashboard/pull/402) but haven’t completed that exploration yet. It looks like it would be better to push out something that helps in this situation, even if it is suboptimal.
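
To illustrate the kind of check I mean, here is a rough sketch in Python. It is not the code from that pull request and it is not a real YAML parser; it only looks for keys that start in column 0 and flags anything other than v2. The function names and the BROKEN/FIXED sample strings are made up for this example.

```python
def top_level_keys(text):
    """Return the keys that start in column 0, skipping blanks, comments and '---'."""
    keys = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or stripped == "---":
            continue
        # A key at the top level has no leading whitespace.
        if line[0] not in " \t" and ":" in stripped:
            keys.append(stripped.split(":", 1)[0])
    return keys


def check_cluster_config(text):
    """Return a list of warnings; an empty list means nothing suspicious was found."""
    problems = []
    for key in top_level_keys(text):
        if key != "v2":
            problems.append(
                f"'{key}:' is at the top level but should be nested under 'v2:'"
            )
    return problems


# Hypothetical samples: the first mirrors the mis-indented job: key,
# the second mirrors the corrected layout.
BROKEN = """---
v2:
  metadata:
    title: "Monsoon Cluster"
  login:
    host: "wind.hpc.nau.edu"
job:
    adapter: "slurm"
"""

FIXED = """---
v2:
  metadata:
    title: "Monsoon Cluster"
  login:
    host: "wind.hpc.nau.edu"
  job:
    adapter: "slurm"
"""

if __name__ == "__main__":
    for name, sample in (("BROKEN", BROKEN), ("FIXED", FIXED)):
        print(name, check_cluster_config(sample))
```

With a job: key sitting at the top level, the cluster has no job section under v2, which would explain why the rake task reports that it doesn’t allow job submission.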


#4

My cut and paste of the yml file removed the spacing. I used the example at the following URL: https://osc.github.io/ood-documentation/master/installation/resource-manager/pbspro.html

I just double-checked my monsoon.yml file, and it has the same indentation as your example.


#5

We can run jobs from the command line with Slurm. The test still fails, though, with the message “Skipping ‘monsoon’ as it doesn’t allow job submission.”

Here is output from the command line:
[wew@ondemand /scratch/wew ]$ sbatch lazyjob.sh
Submitted batch job 14503803


#6

Been one of those few days. Looks like the test is working now.


#7

So sorry you ran into problems! I hope the rest of the installation process goes more smoothly.


#8

So far so good with local password authentication only. We were able to get a job to launch. Now we’re trying to get LDAP authentication to work so we can open things up to our users.