SGE Configuration Test

I just installed Open OnDemand and was able to get the test configuration at

https://osc.github.io/ood-documentation/master/installation/resource-manager/test.html

working with Slurm, but I'm having issues with UGE. (We're running both schedulers on the cluster, on different nodes, for testing.)

The uge_genius.yml file is:

v2:
  metadata:
    title: "Genius Cluster (UGE)"
  login:
    host: "genius.hpcc.ttu.edu"
  job:
    adapter: "sge"
    cluster: "genius"
    bin: "/export/uge/bin/lx-amd64"
    # conf: ""
    sge_root: "/export/uge"
    libdrmaa_path: "/export/uge/lib/lx-amd64/libdrmaa.so"
    # bin_overrides:
      # sbatch: "/usr/local/bin/sbatch"
      # squeue: ""
      # scontrol: ""
      # scancel: ""

From my account I execute

sudo su $USER -c 'scl enable ondemand -- bin/rake test:jobs:uge_genius RAILS_ENV=production --trace'

and get the following output, ending in a TypeError:

** Invoke test:jobs:uge_genius (first_time)
** Invoke environment (first_time)
** Execute environment
Rails Error: Unable to access log file. Please ensure that /var/www/ood/apps/sys/dashboard/log/production.log exists and is writable (ie, make it writable for user and group: chmod 0664 /var/www/ood/apps/sys/dashboard/log/production.log). The log level has been raised to WARN and the output directed to STDERR until the problem is fixed.
** Invoke /home/thomasbr/test_jobs (first_time, not_needed)
** Execute test:jobs:uge_genius
Testing cluster ‘uge_genius’…
Submitting job…
rake aborted!
TypeError: no implicit conversion of nil into String
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/ood_core-0.9.3/lib/ood_core/job/adapters/sge/batch.rb:36:in `initialize'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/ood_core-0.9.3/lib/ood_core/job/adapters/sge/batch.rb:36:in `new'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/ood_core-0.9.3/lib/ood_core/job/adapters/sge/batch.rb:36:in `initialize'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/ood_core-0.9.3/lib/ood_core/job/adapters/sge.rb:19:in `new'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/ood_core-0.9.3/lib/ood_core/job/adapters/sge.rb:19:in `build_sge'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/ood_core-0.9.3/lib/ood_core/job/factory.rb:36:in `build'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/ood_core-0.9.3/lib/ood_core/cluster.rb:78:in `job_adapter'
/var/www/ood/apps/sys/dashboard/lib/tasks/test.rake:29:in `block (4 levels) in <top (required)>'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/task.rb:273:in `block in execute'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/task.rb:273:in `each'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/task.rb:273:in `execute'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/task.rb:214:in `block in invoke_with_call_chain'
/opt/rh/rh-ruby24/root/usr/share/ruby/monitor.rb:214:in `mon_synchronize'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/task.rb:194:in `invoke_with_call_chain'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/task.rb:183:in `invoke'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:160:in `invoke_task'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:116:in `block (2 levels) in top_level'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:116:in `each'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:116:in `block in top_level'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:125:in `run_with_threads'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:110:in `top_level'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:83:in `block in run'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:186:in `standard_exception_handling'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:80:in `run'
bin/rake:4:in `<main>'
Tasks: TOP => test:jobs:uge_genius

The UI shows the cluster but displays the same TypeError. Thanks.

Line 36 of sge/batch.rb is where the adapter reads your config file (the conf key). Looks like you can't leave that commented out. You can try setting it to "" (an empty string), or create an empty file (like /tmp/test_sge) and point it there.
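For reference, here is a minimal Ruby sketch of why a commented-out conf line can produce this exact TypeError. It assumes, without having checked the ood_core 0.9.3 source, that line 36 of sge/batch.rb passes the conf value straight to Pathname.new; the `initialize`/`new` frames at batch.rb:36 in the trace are consistent with that. The two YAML snippets are trimmed, hypothetical versions of the job: section above.

```ruby
require "yaml"
require "pathname"

# Trimmed job: section with conf commented out (as in the posted file).
commented = YAML.safe_load(<<~YML)
  job:
    adapter: "sge"
    # conf: ""
YML

# Same section with conf set to an empty string.
empty = YAML.safe_load(<<~YML)
  job:
    adapter: "sge"
    conf: ""
YML

# A commented-out key is simply absent, so looking it up yields nil.
p commented["job"]["conf"]  # => nil
p empty["job"]["conf"]      # => ""

# Assumption: the adapter does roughly Pathname.new(conf).
begin
  Pathname.new(commented["job"]["conf"])
rescue TypeError => e
  puts "TypeError: #{e.message}"  # no implicit conversion of nil into String
end

# An empty string is accepted, which is why conf: "" sidesteps the error.
p Pathname.new(empty["job"]["conf"])  # => #<Pathname:>
```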

The documentation says the conf value is optional (maybe the documentation is wrong?). I created an empty conf file, /tmp/test_sge.conf, and pointed the yml file to it. It now fails with the following error:

working directory /home/thomasbr/test_jobs
** Invoke test:jobs:uge_genius (first_time)
** Invoke environment (first_time)
** Execute environment
Rails Error: Unable to access log file. Please ensure that /var/www/ood/apps/sys/dashboard/log/production.log exists and is writable (ie, make it writable for user and group: chmod 0664 /var/www/ood/apps/sys/dashboard/log/production.log). The log level has been raised to WARN and the output directed to STDERR until the problem is fixed.
** Invoke /home/thomasbr/test_jobs (first_time, not_needed)
** Execute test:jobs:uge_genius
Testing cluster ‘uge_genius’…
Submitting job…
[2019-12-03 07:51:50 -0600 ] INFO "execve = [{}, "/export/uge/bin/lx-amd64/qsub", "-wd", "/home/thomasbr/test_jobs", "-N", "test_jobs_uge_genius", "-o", "/home/thomasbr/test_jobs/output_uge_genius_2019-12-03T07:51:50-06:00.log", "-l", "h_rt=00:01:00"]"
rake aborted!
OodCore::JobAdapterError: Unable to run job: can't resolve hostname "/home/thomasbr/test_jobs/output_uge_genius_2019-12-03T07".
Exiting.
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/ood_core-0.9.3/lib/ood_core/job/adapters/sge.rb:88:in `rescue in submit'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/ood_core-0.9.3/lib/ood_core/job/adapters/sge.rb:82:in `submit'
/var/www/ood/apps/sys/dashboard/lib/tasks/test.rake:30:in `block (4 levels) in <top (required)>'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/task.rb:273:in `block in execute'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/task.rb:273:in `each'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/task.rb:273:in `execute'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/task.rb:214:in `block in invoke_with_call_chain'
/opt/rh/rh-ruby24/root/usr/share/ruby/monitor.rb:214:in `mon_synchronize'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/task.rb:194:in `invoke_with_call_chain'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/task.rb:183:in `invoke'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:160:in `invoke_task'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:116:in `block (2 levels) in top_level'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:116:in `each'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:116:in `block in top_level'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:125:in `run_with_threads'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:110:in `top_level'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:83:in `block in run'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:186:in `standard_exception_handling'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:80:in `run'
bin/rake:4:in `<main>'

Caused by:
OodCore::Job::Adapters::Sge::Batch::Error: Unable to run job: can't resolve hostname "/home/thomasbr/test_jobs/output_uge_genius_2019-12-03T07".
Exiting.
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/ood_core-0.9.3/lib/ood_core/job/adapters/sge/batch.rb:175:in `call'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/ood_core-0.9.3/lib/ood_core/job/adapters/sge/batch.rb:164:in `submit'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/ood_core-0.9.3/lib/ood_core/job/adapters/sge.rb:86:in `submit'
/var/www/ood/apps/sys/dashboard/lib/tasks/test.rake:30:in `block (4 levels) in <top (required)>'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/task.rb:273:in `block in execute'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/task.rb:273:in `each'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/task.rb:273:in `execute'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/task.rb:214:in `block in invoke_with_call_chain'
/opt/rh/rh-ruby24/root/usr/share/ruby/monitor.rb:214:in `mon_synchronize'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/task.rb:194:in `invoke_with_call_chain'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/task.rb:183:in `invoke'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:160:in `invoke_task'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:116:in `block (2 levels) in top_level'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:116:in `each'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:116:in `block in top_level'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:125:in `run_with_threads'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:110:in `top_level'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:83:in `block in run'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:186:in `standard_exception_handling'
/var/www/ood/apps/sys/dashboard/vendor/bundle/ruby/2.4.0/gems/rake-12.3.3/lib/rake/application.rb:80:in `run'
bin/rake:4:in `<main>'
Tasks: TOP => test:jobs:uge_genius

Yeah, the docs must be wrong given this error, sorry about that! From the docs, -o (output, likely) takes -o [[hostname]:]path,..., though I can't tell whether the hostname there is optional.
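The "hostname" in the error is exactly the log path cut off at the first colon of the timestamp (T07:51:50). The Ruby sketch below illustrates that reading of -o [[hostname]:]path. split_output_spec is a hypothetical parser written to match the error message, not UGE's actual code, and the sketch assumes the test task builds its timestamp with Time#iso8601 (the 2019-12-03T07:51:50-06:00 format in the execve log suggests this).

```ruby
require "time"

# Hypothetical: treat everything before the first ":" as a hostname,
# which is one reading of qsub's "-o [[hostname]:]path" syntax.
def split_output_spec(spec)
  host, path = spec.split(":", 2)
  path.nil? ? [nil, host] : [host, path]
end

t   = Time.new(2019, 12, 3, 7, 51, 50, "-06:00")
log = "/home/thomasbr/test_jobs/output_uge_genius_#{t.iso8601}.log"

host, _path = split_output_spec(log)
puts host  # => /home/thomasbr/test_jobs/output_uge_genius_2019-12-03T07

# A colon-free timestamp leaves nothing for the parser to mistake for a host.
safe = "/home/thomasbr/test_jobs/output_uge_genius_#{t.strftime('%Y-%m-%dT%H%M%S')}.log"
p split_output_spec(safe)  # => [nil, "/home/thomasbr/test_jobs/output_uge_genius_2019-12-03T075150.log"]
```

If this reading is right, any output path containing a colon would trip qsub the same way, regardless of the conf setting.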

Also, looking at the docs again, this file may already exist just from having SGE installed and configured (or a good version may exist on the login hosts where folks currently submit jobs to SGE). You should probably find that real file and point to it, because it likely contains configuration you need, such as which queues to use or where to submit jobs.

I'm not super familiar with SGE's configs, but the docs suggest you can set fs_stdout_host and fs_stderr_host (probably to localhost?).

@thomasbrTTU were you able to solve your issue?

Jeff,

No. We are still getting the issue with the UGE scheduler.

Tom

There is a bug in the adapter: it tries to use nil instead of "" as the default value. This may be a workaround:

You have # conf: "" (commented out). Could you change it to conf: ""?

I’ve opened this as an issue https://github.com/OSC/ood_core/issues/175
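A minimal sketch of the kind of default that issue describes: coerce a missing conf to "" before building the path. conf_path is a hypothetical helper for illustration only, not ood_core's actual code or its eventual fix.

```ruby
require "pathname"

# Hypothetical helper: nil.to_s is "", so an omitted conf key no longer
# reaches Pathname.new as nil.
def conf_path(conf)
  Pathname.new(conf.to_s)
end

p conf_path(nil)                   # => #<Pathname:>
p conf_path("/tmp/test_sge.conf")  # => #<Pathname:/tmp/test_sge.conf>
```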

Oh, sorry, I misread: this was already suggested further up in the thread, and there are other problems. Ignore my previous comments.

Hi, I think we've figured out your issue, and it seems to be limited to the tests. Open OnDemand itself should actually work; it's only the test that doesn't.

It looks like you should apply this patch to /var/www/ood/apps/sys/dashboard/lib/tasks/test.rake. It's small enough to just apply by hand, I'd think.

Also, long term, you should probably use "" instead of the temp file as it’ll likely be deleted at some point.
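As a quick pre-flight check before rerunning the test task, a Ruby sketch like this could flag a conf value that is missing, empty, or points at a file that no longer exists (for example, a cleaned-up /tmp file). check_conf and its return symbols are hypothetical, and it assumes the v2 layout of the cluster file above (v2, then job, then conf).

```ruby
require "yaml"

# Hypothetical classifier for the v2.job.conf value of a cluster file.
def check_conf(cluster_yaml)
  conf = YAML.safe_load(cluster_yaml).dig("v2", "job", "conf")
  return :missing if conf.nil?               # key absent or commented out
  return :empty if conf.empty?               # conf: "" (the suggested setting)
  return :not_found unless File.file?(conf)  # path set but file is gone
  conf.start_with?("/tmp/") ? :temporary : :ok
end

# Builds a tiny cluster file whose job: section ends with the given line.
cfg = ->(line) { "v2:\n  job:\n    adapter: \"sge\"\n#{line}\n" }

p check_conf(cfg.call('    # conf: ""'))                     # => :missing
p check_conf(cfg.call('    conf: ""'))                       # => :empty
p check_conf(cfg.call('    conf: "/nonexistent/sge.conf"'))  # => :not_found
```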

Jeff,

That fixed the issue.

Thanks

Tom