Interactive desktop: DISPLAY error

Alan: Any chance this is related to
https://github.com/OSC/ood_core/issues/109
Something to do with newer Slurm versions changing how environment variables are forwarded? This was in a long OOD-users email thread in Dec 2018, subject: ‘no environment set for HPC desktop – job fails’, if you have those messages saved someplace.
Ric

Perhaps. All OOD-users mailing list posts are archived here: https://listsprd.osu.edu/pipermail/ood-users/
Here’s the first one in the thread: https://listsprd.osu.edu/pipermail/ood-users/2018-December/000316.html

In our dev environment (slurm with ohpc) we have started to see this 
error when trying to launch interactive desktops:

/tmp/slurmd/job00079/slurm_script: line 3: module: command not found
Setting VNC password...
Error: no HOME environment variable
Starting VNC server...
vncserver: The HOME environment variable is not set.
vncserver: The HOME environment variable is not set.
vncserver: The HOME environment variable is not set. vncserver: The HOME 
environment variable is


As we understand it, the PUN nginx worker launches the batch job that 
starts the desktop batch job.

The problem seems to be that the environment for the job is empty, hence 
no module function or HOME env or anything else.   We checked the env of 
the user's nginx worker under /proc and it is completely empty.   Because 
our job env is inherited from the caller (the nginx worker in this case) 
the attempts to run the module and vncserver commands naturally fail.

When we launch an interactive terminal, it runs just fine, but I'm 
guessing that's because the interactive session actually reads the 
normal shell startup and builds its environment, even if it happened to 
be missing in the proxy.

Do you have any pointers on what could cause this situation?   We 
noticed it after we started adding additional interactive apps but don't 
have a clear time point.  It was working fine originally and still 
functions fine in our prod env (without any of the additional 
interactive apps).

Thanks,

John-Paul

There are a lot of replies, so if one of them is applicable please post it here to document that. Thanks.

I’m not so sure that the output shared in the original post supports the idea that all environment variables are not set. I certainly agree that DISPLAY is not set.

Supporting OOD is a chance for me to learn about the world of web development and application hosting – however, my background is not in this area. How should ood be communicating to the compute node to set these variables? Or is that the wrong way to think about it?

How can I verify the mate install on the compute node? Again, right now, I’ve set up a reservation to work with a single compute node. Mate was installed via ‘yum groupinstall “Mate Desktop”’ on RHEL 7.

Thanks for any help in thinking through the situation.

Emily:

The easiest way to verify the desktop, if you have physical access, is to hook up a monitor and keyboard to the node, log in there, and type

startx

and see if the desktop you get is what you expected, and is functional. Other possibilities exist, but require perturbing the environment a bit.

Cheers,

Ric

Hi, Ric –

Thanks for reminding me that sometimes the direct approach is the best :)
I’ll provide an update on the outcome.

Cheers,
~ Em

Hi, Ric –

We have now verified that mate can be successfully launched on our test node.
Thanks for the nudge to go to the data center.

OnDemand seems to be leaving a record of the session environment generated with the desktop launch in the following temp file: /tmp/test.slurm
There are a few points that I take from reviewing that file:
– There is no DISPLAY
– There are no ‘vnc’ related variables
– The PATH does not include the path to turbovnc; nor to websockify
– The job respects my test slurm reservation, and is therefore directed to the test node.
– TERM=dumb

Cheers
~ Em
P.S. The test.slurm file, just in case that is of interest:
[mrd20@compt166 ~]$ cat /tmp/test.slurm
BASH=/bin/bash
BASHOPTS=cmdhist:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
BASH_ALIASES=()
BASH_ARGC=()
BASH_ARGV=()
BASH_CMDS=()
BASH_ENV=/usr/local/lmod/lmod/init/bash
BASH_LINENO=([0]="0")
BASH_SOURCE=([0]="/usr/local/slurm/TaskProlog")
BASH_VERSINFO=([0]="4" [1]="2" [2]="46" [3]="2" [4]="release" [5]="x86_64-redhat-linux-gnu")
BASH_VERSION='4.2.46(2)-release'
CC=mpicc
CPLUS_INCLUDE_PATH=/usr/local/intel-17/openmpi/2.0.1/include
CUDA_VISIBLE_DEVICES=NoDevFiles
CXX=mpicxx
C_INCLUDE_PATH=/usr/local/intel-17/openmpi/2.0.1/include
DIRSTACK=()
ENVIRONMENT=BATCH
EUID=491395
F77=ifort
FC=mpifort
GPU_DEVICE_ORDINAL=NoDevFiles
GROUPS=()
HISTCONTROL=ignoredups
HISTSIZE=1000
HOME=/home/mrd20
HOSTNAME=compt166
HOSTTYPE=x86_64
IFS=$' \t\n'
INCLUDE=/usr/local/intel/17/compilers_and_libraries_2017/linux/include
INSTALLS=/usr/local/intel-17/openmpi-2_0_1
INTEL_LICENSE_FILE=28518@license5.osc.edu
LANG=en_US.UTF-8
LD_LIBRARY_PATH=/usr/local/intel-17/openmpi/2.0.1/lib:/usr/local/intel/17/tbb/lib/intel64/gcc4.7:/usr/local/intel/17/compilers_and_libraries_2017/linux/mkl/lib/intel64:/usr/local/intel/17/compilers_and_libraries_2017/linux/lib/intel64::/usr/local/lib
LESSOPEN='||/usr/bin/lesspipe.sh %s'
LIBRARY_PATH=/usr/local/intel-17/openmpi/2.0.1/lib:/usr/local/intel/17/tbb/lib/intel64/gcc4.7:/usr/local/intel/17/compilers_and_libraries_2017/linux/lib/intel64
LMOD_ANCIENT_TIME=86400
LMOD_CMD=/usr/local/lmod/lmod/libexec/lmod
LMOD_COLORIZE=yes
LMOD_DIR=/usr/local/lmod/lmod/libexec
LMOD_FULL_SETTARG_SUPPORT=no
LMOD_PKG=/usr/local/lmod/lmod
LMOD_PREPEND_BLOCK=normal
LMOD_SETTARG_CMD=:
LMOD_SYSTEM_DEFAULT_MODULES=StdEnv
LMOD_VERSION=7.3
LMOD_arch=x86_64
LMOD_sys=Linux
LM_LICENSE_FILE=28518@license5.osc.edu
LOADEDMODULES=users::intel/17:openmpi/2.0.1:StdEnv
LOGNAME=mrd20
MACHTYPE=x86_64-redhat-linux-gnu
MAIL=/var/spool/mail/mrd20
MANPATH=/usr/local/intel/17/compilers_and_libraries_2017/linux/man:/usr/local/lmod/lmod/share/man::/usr/share/man:/usr/local/man:/usr/local/share/man
MKL_HOME=/usr/local/intel/17/compilers_and_libraries_2017/linux/mkl
MODULEPATH=/usr/local/share/modulefiles/MPI/intel/17/openmpi/2.0.1:/usr/local/share/modulefiles/Compiler/intel/17:/home/mrd20/.usr/local/share/modulefiles:/usr/local/share/modulefiles/Linux:/usr/local/share/modulefiles/Core:/usr/local/lmod/lmod/modulefiles/Core
MODULEPATH_ROOT=/usr/local/share/modulefiles
MODULESHOME=/usr/local/lmod/lmod
MPICC=mpicc
MPICXX=mpicxx
MPIFORT=mpifort
OLDIAGS=/opt/dell/onlinediags/oldiags/bin
OPTERR=1
OPTIND=1
OSTYPE=linux-gnu
PATH=/usr/local/intel-17/openmpi/2.0.1/bin:/usr/local/intel/17/compilers_and_libraries_2017/linux/bin/intel64:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/dell/srvadmin/bin:/home/mrd20/hpcadmin/scripts/bin
PIPESTATUS=([0]="0")
PKG_CONFIG_PATH=/usr/local/intel-17/openmpi/2.0.1/lib/pkgconfig
PPID=40201
PS4='+ '
PWD=/home/mrd20/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/smaster/output/92a4905f-d76d-4532-b819-deabe04d11a0
PYTHONUSERBASE=/home/mrd20/.usr/local/python/2.7.13
SBATCH_EXPORT=ALL
SHELL=/bin/bash
SHELLOPTS=braceexpand:hashall:interactive-comments
SHLVL=2
SLURMD_NODENAME=compt166
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint
SLURM_CLUSTER_NAME=smaster
SLURM_CONF=/etc/slurm/slurm.conf
SLURM_CPUS_ON_NODE=1
SLURM_EXPORT_ENV=SBATCH_EXPORT
SLURM_GET_USER_ENV=1
SLURM_GTIDS=0
SLURM_JOBID=12468235
SLURM_JOB_ACCOUNT=arc_staff
SLURM_JOB_CPUS_PER_NODE=1
SLURM_JOB_GID=10085
SLURM_JOB_ID=12468235
SLURM_JOB_NAME=sys/dashboard/sys/bc_desktop/smaster
SLURM_JOB_NODELIST=compt166
SLURM_JOB_NUM_NODES=1
SLURM_JOB_PARTITION=batch
SLURM_JOB_QOS=normal
SLURM_JOB_RESERVATION=ood-test
SLURM_JOB_UID=491395
SLURM_JOB_USER=mrd20
SLURM_LOCALID=0
SLURM_MEM_PER_CPU=1024
SLURM_NNODES=1
SLURM_NODEID=0
SLURM_NODELIST=compt166
SLURM_NODE_ALIASES='(null)'
SLURM_PRIO_PROCESS=0
SLURM_PROCID=0
SLURM_RLIMIT_AS=18446744073709551615
SLURM_RLIMIT_CORE=0
SLURM_RLIMIT_CPU=18446744073709551615
SLURM_RLIMIT_DATA=18446744073709551615
SLURM_RLIMIT_FSIZE=18446744073709551615
SLURM_RLIMIT_MEMLOCK=18446744073709551615
SLURM_RLIMIT_NOFILE=1024
SLURM_RLIMIT_NPROC=384966
SLURM_RLIMIT_RSS=18446744073709551615
SLURM_RLIMIT_STACK=8388608
SLURM_SUBMIT_DIR=/var/www/ood/apps/sys/dashboard
SLURM_SUBMIT_HOST=wmaster
SLURM_TASKS_PER_NODE=1
SLURM_TASK_PID=40201
SLURM_TOPOLOGY_ADDR=compt166
SLURM_TOPOLOGY_ADDR_PATTERN=node
TERM=dumb
TMPDIR=/tmp
UID=491395
USER=mrd20
XAUTHORITY=/home/mrd20/.Xauthority
XDG_DATA_DIRS=/home/mrd20/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share
_=']'
_LMFILES_=/home/mrd20/.usr/local/share/modulefiles::/usr/local/share/modulefiles/Core/intel/17.lua:/usr/local/share/modulefiles/Compiler/intel/17/openmpi/2.0.1.lua:/usr/local/lmod/lmod/modulefiles/Core/StdEnv.lua
_ModuleTable001_=X01vZHVsZVRhYmxlXz17WyJNVHZlcnNpb24iXT0zLFsiY19yZWJ1aWxkVGltZSJdPWZhbHNlLFsiY19zaG9ydFRpbWUiXT1mYWxzZSxkZXB0aFQ9e30sZmFtaWx5PXt9LG1UPXtTdGRFbnY9e1siZm4iXT0iL3Vzci9sb2NhbC9sbW9kL2xtb2QvbW9kdWxlZmlsZXMvQ29yZS9TdGRFbnYubHVhIixbImZ1bGxOYW1lIl09IlN0ZEVudiIsWyJsb2FkT3JkZXIiXT0zLHByb3BUPXt9LFsic3RhdHVzIl09ImFjdGl2ZSIsWyJ1c2VyTmFtZSJdPSJTdGRFbnYiLH0saW50ZWw9e1siZm4iXT0iL3Vzci9sb2NhbC9zaGFyZS9tb2R1bGVmaWxlcy9Db3JlL2ludGVsLzE3Lmx1YSIsWyJmdWxsTmFtZSJdPSJpbnRlbC8xNyIsWyJsb2FkT3JkZXIiXT0xLHByb3BUPXt9LFsic3RhdHVzIl09ImFj
_ModuleTable002_=dGl2ZSIsWyJ1c2VyTmFtZSJdPSJpbnRlbCIsfSxvcGVubXBpPXtbImZuIl09Ii91c3IvbG9jYWwvc2hhcmUvbW9kdWxlZmlsZXMvQ29tcGlsZXIvaW50ZWwvMTcvb3Blbm1waS8yLjAuMS5sdWEiLFsiZnVsbE5hbWUiXT0ib3Blbm1waS8yLjAuMSIsWyJsb2FkT3JkZXIiXT0yLHByb3BUPXt9LFsic3RhdHVzIl09ImFjdGl2ZSIsWyJ1c2VyTmFtZSJdPSJvcGVubXBpIix9LH0sbXBhdGhBPXsiL3Vzci9sb2NhbC9zaGFyZS9tb2R1bGVmaWxlcy9NUEkvaW50ZWwvMTcvb3Blbm1waS8yLjAuMSIsIi91c3IvbG9jYWwvc2hhcmUvbW9kdWxlZmlsZXMvQ29tcGlsZXIvaW50ZWwvMTciLCIvaG9tZS9tcmQyMC8udXNyL2xvY2FsL3NoYXJlL21vZHVsZWZpbGVzIiwiL3Vzci9sb2NhbC9z
_ModuleTable003_=aGFyZS9tb2R1bGVmaWxlcy9MaW51eCIsIi91c3IvbG9jYWwvc2hhcmUvbW9kdWxlZmlsZXMvQ29yZSIsIi91c3IvbG9jYWwvbG1vZC9sbW9kL21vZHVsZWZpbGVzL0NvcmUiLH0sWyJzeXN0ZW1CYXNlTVBBVEgiXT0iL2hvbWUvbXJkMjAvLnVzci9sb2NhbC9zaGFyZS9tb2R1bGVmaWxlczovdXNyL2xvY2FsL3NoYXJlL21vZHVsZWZpbGVzL0xpbnV4Oi91c3IvbG9jYWwvc2hhcmUvbW9kdWxlZmlsZXMvQ29yZTovdXNyL2xvY2FsL2xtb2QvbG1vZC9tb2R1bGVmaWxlcy9Db3JlIix9
_ModuleTable_Sz_=3
__Init_Default_Modules=1
__LMOD_STACK_CC=bXBpY2M=:aWNj
__LMOD_STACK_CXX=bXBpY3h4:aWNwYw==
__LMOD_STACK_F77=aWZvcnQ=
__LMOD_STACK_FC=bXBpZm9ydA==:aWZvcnQ=
__LMOD_STACK_INSTALLS=L3Vzci9sb2NhbC9pbnRlbC0xNy9vcGVubXBpLTJfMF8x:L3Vzci9sb2NhbC9pbnRlbC0xNw==
__LMOD_STACK_MPICC=bXBpY2M=
__LMOD_STACK_MPICXX=bXBpY3h4
__LMOD_STACK_MPIFORT=bXBpZm9ydA==
clearMT ()
{
    eval $($LMOD_DIR/clearMT_cmd bash)
}
ml ()
{
    eval $($LMOD_DIR/ml_cmd "$@")
}
module ()
{
    eval $($LMOD_CMD bash "$@");
    [ $? = 0 ] && eval $(${LMOD_SETTARG_CMD:-:} -s sh)
}

@emily.dragowsky I am glad you are making progress and sorry that this has been so difficult.

I’m not sure about the problem you are having with DISPLAY not being set, but I can tell you how to modify the environment of the desktop apps without having to copy and modify the job template directly. There are two options.

The first is modifying the cluster config files to add a batch_connect: section. There is some basic documentation on how these configuration options are managed: https://osc.github.io/ood-documentation/master/app-development/interactive/setup/modify-cluster-configuration.html?highlight=batch_connect and https://osc.github.io/ood-documentation/master/installation/cluster-config-schema.html#batch-connect. We use this to set up the environment properly for TurboVNC and WebSockify. Ours looks like this:

  batch_connect:
      basic:
        script_wrapper: "module restore\n%s"
      vnc:
        script_wrapper: "module restore\nmodule load ondemand-vnc\n%s"

Here, module load ondemand-vnc loads the turbovnc module and then sets WEBSOCKIFY_CMD=/usr/local/novnc/utils/websockify/run. The code that submits the interactive app job uses the script_wrapper: value with string interpolation to replace %s with the rest of the generated job script, so it’s a way to prepend or append code to it.
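
The %s substitution can be sketched in a few lines of Ruby. This is only an illustration using Ruby's built-in format operator, not the actual ood_core code, and `generated_script` is a made-up stand-in for the job script that OnDemand generates:

```ruby
# Illustration only: plain Ruby string formatting, not the real ood_core implementation.
wrapper = "module restore\nmodule load ondemand-vnc\n%s"

# Hypothetical stand-in for the job script that OnDemand generates.
generated_script = "echo \"Starting VNC server...\"\n"

# Everything before %s is prepended; anything placed after %s would be appended.
wrapped = wrapper % generated_script
puts wrapped
```

The module commands end up at the top of the script, so they run before any of the VNC setup does.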

A second way is to add another submit script with an arbitrary name, like slurm.yml.erb, to /etc/ood/config/apps/bc_desktop/submit/ and reference it in the desktop configs under /etc/ood/config/apps/bc_desktop/. Then in this custom slurm.yml.erb file you can add to the script: hash a key-value pair for each attribute listed here: https://www.rubydoc.info/gems/ood_core/OodCore/Job/Script. The examples in the documentation show using native:, but you can also use job_environment:. Here is an example from an earlier version of our Jupyter interactive app where we set the environment variables PYTHON_MODULE and possibly CUDA_MODULE: https://github.com/OSC/bc_osc_jupyter/blob/a89c50706dc45ef01b035b8157ca8a46244e19d7/submit.yml.erb#L3-L7

script:
  job_environment:
    PYTHON_MODULE: "python/3.5"
    <%- if node_type.include? "gpus" -%>
    CUDA_MODULE: "cuda/8.0.44"
    <%- end -%>
  native:
    resources:
      nodes: "<%= bc_num_slots %><%= node_type %>"

Hi, Ric –

Thanks for the note. I should have included information about our implementation of the vnc support. Your prescription to modify the cluster yml file with a module to modify the path makes a lot of sense. I wasn’t able to generalize, and so could not see that as an option on my own. Thanks!!

I had just relied on the qualitative fact that the vnc configuration successfully supports the jupyter interactive app – therefore, I thought that vnc wasn’t an issue – although local conversations and testing ultimately made it clear that vnc was just not being allowed to do its tasks. For completeness, here’s the way that I had implemented vnc that worked for the jupyter app, but not for desktop. You’ll see the use of ‘module purge’, which I’m now wondering about. I’ll look into the docs to better appreciate the various strategies for manipulating the environment.

In the meantime, here’s our current cluster yml file:

/etc/ood/config/clusters.d/smaster.yml


v2:
  metadata:
    title: "rider"
  login:
    host: "rider.case.edu"
  job:
    adapter: "slurm"
    cluster: "smaster"
    bin: "/usr/bin"
    conf: "/etc/slurm/slurm.conf"
  batch_connect:
    basic:
      script_wrapper: |
        module purge
        %s
      set_host: "host=$(hostname)"
    vnc:
      script_wrapper: |
        module purge
        export PATH="/usr/local/gcc-6_3_0/turbovnc/bin:$PATH"
        export WEBSOCKIFY_CMD="/usr/local/websockify/run"
        %s

It looks as though I would also benefit from adding explicit line breaks.
I’ll set up a module and implement your suggestion.

Cheers,
~ Em

@efranz - I’m working with @emily.dragowsky and appreciate your notes. Turns out the submission script did not utilize the “vnc” template for bc_desktop which led us to not initialize Xvnc.

That said, we are now working through what I think is a reverse proxy / NoVNC issue with our deployment. I can verify that the proxy works using the “nc -l 5432” verification process. I can also SSH port forward the VNC connection. When I launch a job, the NoVNC application errors with “Failed to connect to server”. If I curl both https://ondemand/rnode/compt166/42949 and compt166:42949 I get the same 405 response. How should I go about debugging this further?

Have you checked the error and output logs for the batch job itself to verify all the processes are starting as expected? The job data will be in a subdirectory with the name being a UUID i.e. ~/ondemand/data/sys/dashboard/batch_connect/sys/bc_desktop/owens/output/27fce160-3b13-4a91-82e9-a82aaeca6459/

Also, you might want to try nc -l 42950 and make sure that port numbers that high are not being blocked for some reason.
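
To complement the `nc -l` listener test, a client-side check can confirm whether a TCP port accepts connections. Below is a minimal Ruby sketch using only the standard library; `port_open?` is a hypothetical helper name, and the local listener merely stands in for `nc -l` on the compute node so the example runs anywhere:

```ruby
require 'socket'

# Hypothetical helper: true if a TCP connection to host:port succeeds within the timeout.
def port_open?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Local listener standing in for `nc -l <port>`; port 0 lets the OS pick a free port.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]
puts "port #{port} reachable: #{port_open?('127.0.0.1', port)}"
server.close
```

On a real deployment you would start the listener on the compute node and call something like `port_open?('compt166', 42950)` from the web host. A `false` result there would point at a firewall or routing problem rather than at noVNC.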

We have modified the /etc/ood/config/clusters/.yml to adopt the newer/alternate syntax for the script_wrapper:
script_wrapper: "module purge\nmodule load gcc/6.3.0\nmodule load ondemand-vnc\n%s"

Using ‘purge’ rather than restore so far seems preferable for our local lmod module hierarchy configuration.

The desktop does ultimately work, with one manual work-around, in that we need to enter “host:path” into a browser tab, and then select “launch noVnc in new tab” button.

So measured success. A few questions:
– Is it problematic to only have Mate installed? The output.log throws a number of complaints about missing files associated with gnome and xfce. I’m guessing most of that is to be expected, given they are not installed. Do we need any individual packages from gnome, or xfce?
– Even using the above script_wrapper, the path to the vncserver is not found, whereas the module (ondemand-vnc) when loaded in a shell appropriately updates the shell environment. This seems to be the condition that initially prevents the launch of the desktop, which we work around as described above.

Here’s the full output.log:
Setting VNC password…
Starting VNC server…

Desktop ‘TurboVNC: compt166:1 (mrd20)’ started on display compt166:1

Log file is vnc.log
Successfully started VNC server on compt166:5901…
Script starting…
Starting websocket server…
/usr/local/websockify/websockify/websocket.py:30: UserWarning: no ‘numpy’ module, HyBi protocol will be slower
warnings.warn(“no ‘numpy’ module, HyBi protocol will be slower”)
WebSocket server settings:
  - Listen on :49804
  - No SSL/TLS support (no cert file)
  - Backgrounding (daemon)
Scanning VNC log file for user authentications…
Generating connection YAML file…
Resetting modules to system default
Launching desktop ‘mate’…
cat: /etc/xdg/autostart/gnome-keyring-gpg.desktop: No such file or directory
cat: /etc/xdg/autostart/xfce4-power-manager.desktop: No such file or directory
which: no vncserver in (/usr/local/intel-17/openmpi/2.0.1/bin:/usr/local/intel/17/compilers_and_libraries_2017/linux/bin/intel64:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/dell/srvadmin/bin:/home/mrd20/hpcadmin/scripts/bin)
generating cookie with syscall
generating cookie with syscall
generating cookie with syscall
generating cookie with syscall
mate-session[42364]: WARNING: Could not parse desktop file /home/mrd20/.config/autostart/gnome-keyring-gpg.desktop: Key file does not start with a group
mate-session[42364]: GLib-GObject-CRITICAL: object GsmAutostartApp 0x137d250 finalized while still in-construction
mate-session[42364]: GLib-GObject-CRITICAL: Custom constructor for class GsmAutostartApp returned NULL (which is invalid). Please use GInitable instead.
mate-session[42364]: WARNING: could not read /home/mrd20/.config/autostart/gnome-keyring-gpg.desktop
mate-session[42364]: WARNING: Could not parse desktop file /home/mrd20/.config/autostart/xfce4-power-manager.desktop: Key file does not start with a group
mate-session[42364]: GLib-GObject-CRITICAL: object GsmAutostartApp 0x7fa8a4003770 finalized while still in-construction
mate-session[42364]: GLib-GObject-CRITICAL: Custom constructor for class GsmAutostartApp returned NULL (which is invalid). Please use GInitable instead.
mate-session[42364]: WARNING: could not read /home/mrd20/.config/autostart/xfce4-power-manager.desktop
vmware-user: could not open /proc/fs/vmblock/dev
/usr/bin/vmtoolsd: symbol lookup error: /usr/lib64/libvmtools.so.0: undefined symbol: intf_close
SELinux Troubleshooter: Applet requires SELinux be enabled to run.
*** ERROR ***
TI:14:45:58 TH:0x1402a60 FI:gpm-manager.c FN:gpm_manager_systemd_inhibit,1784
  - Error in dbus - GDBus.Error:org.freedesktop.DBus.Error.AccessDenied: Permission denied
Traceback:
mate-power-manager() [0x418b9f]
mate-power-manager() [0x411220]
/usr/lib64/libgobject-2.0.so.0(g_type_create_instance+0x1fb) [0x7f59eb33e4eb]
/usr/lib64/libgobject-2.0.so.0(+0x151fd) [0x7f59eb3221fd]
/usr/lib64/libgobject-2.0.so.0(g_object_new_with_properties+0x27d) [0x7f59eb323aad]
/usr/lib64/libgobject-2.0.so.0(g_object_new+0xc1) [0x7f59eb324491]
mate-power-manager() [0x411a22]
mate-power-manager() [0x4080b8]
/usr/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f59ea72e3d5]
mate-power-manager() [0x4083db]
Initializing caja-open-terminal extension
Initializing caja-image-converter extension

(nm-applet:42439): nm-applet-WARNING **: 14:45:58.635: NetworkManager is not running
/usr/share/system-config-printer/applet.py:44: PyGIWarning: Notify was imported without specifying a version first. Use gi.require_version(‘Notify’, ‘0.7’) before import to ensure that the right version gets loaded.
from gi.repository import Notify
system-config-printer-applet: failed to start NewPrinterNotification service
system-config-printer-applet: failed to start PrinterDriversInstaller service: org.freedesktop.DBus.Error.AccessDenied: Connection “:1.1365” is not allowed to own the service “com.redhat.PrinterDriversInstaller” due to security policies in the configuration file
Setting VNC password…
Generating connection YAML file…
mate-session[42364]: CRITICAL: gsm_systemd_set_session_idle: assertion ‘session_path != NULL’ failed
mate-session[42364]: CRITICAL: gsm_systemd_set_session_idle: assertion ‘session_path != NULL’ failed
mate-session[42364]: CRITICAL: gsm_systemd_set_session_idle: assertion ‘session_path != NULL’ failed
mate-session[42364]: CRITICAL: gsm_systemd_set_session_idle: assertion ‘session_path != NULL’ failed
Gtk-Message: 15:12:22.208: GtkDialog mapped without a transient parent. This is discouraged.
[1549655183,000,xklavier.c:xkl_engine_start_listen/] The backend does not require manual layout management - but it is provided by the application
Window manager warning: CurrentTime used to choose focus window; focus window may not be correct.
Window manager warning: Got a request to focus the no_focus_window with a timestamp of 0. This shouldn’t happen!

(caja:42423): Gdk-WARNING **: 15:12:29.355: gdk_window_set_icon_list: icons too large

(mate-panel:42414): GLib-CRITICAL **: 15:12:29.390: g_hash_table_remove_internal: assertion ‘hash_table != NULL’ failed

(mate-panel:42414): GLib-GObject-WARNING **: 15:12:29.390: invalid unclassed pointer in cast to ‘MatePanelAppletFrameDBus’

(mate-panel:42414): GLib-CRITICAL **: 15:12:29.390: g_hash_table_remove_internal: assertion ‘hash_table != NULL’ failed

(mate-panel:42414): GLib-GObject-WARNING **: 15:12:29.390: invalid unclassed pointer in cast to ‘MatePanelAppletFrameDBus’

(mate-panel:42414): GLib-CRITICAL **: 15:12:29.390: g_hash_table_remove_internal: assertion ‘hash_table != NULL’ failed

(mate-panel:42414): GLib-GObject-WARNING **: 15:12:29.390: invalid unclassed pointer in cast to ‘MatePanelAppletFrameDBus’

(mate-panel:42414): GLib-CRITICAL **: 15:12:29.390: g_hash_table_remove_internal: assertion ‘hash_table != NULL’ failed

(mate-panel:42414): GLib-GObject-WARNING **: 15:12:29.390: invalid unclassed pointer in cast to ‘MatePanelAppletFrameDBus’

(mate-panel:42414): GLib-CRITICAL **: 15:12:29.390: g_hash_table_remove_internal: assertion ‘hash_table != NULL’ failed

(mate-panel:42414): GLib-GObject-WARNING **: 15:12:29.390: invalid unclassed pointer in cast to ‘MatePanelAppletFrameDBus’
(mate-panel:42414): GLib-GObject-WARNING **: 15:12:29.390: invalid unclassed pointer in cast to ‘MatePanelAppletFrameDBus’
mate-session[42364]: WARNING: Unable to stop system: Interactive authentication required.
Desktop ‘mate’ ended…
Cleaning up…
Killing Xvnc process ID 42264
Gdk-Message: 15:12:29.530: nm-applet: Fatal IO error 11 (Resource temporarily unavailable) on X server :1.

@emily.dragowsky and I worked on this more today. After doing some troubleshooting we discovered that our CAS implementation was not configured correctly in OOD. Turns out we missed the declaration “CASScope /” in ood_portal.yml file or our Apache configuration. This sets the scope for which a mod_auth_cas cookie is valid. Prior to doing this we would receive a 302 redirect (request for a new CAS cookie) which NoVNC could not understand. Thanks to @pingluo for his post on Yale’s CAS implementation.

Does this fix to CASScope / fix all of the issues you were experiencing, or is your OnDemand setup still sub-optimal as @emily.dragowsky describes:

The desktop does ultimately work, with one manual work-around, in that we need to enter “host:path” into a browser tab, and then select “launch noVnc in new tab” button.

  1. It should be fine to only have Mate installed. I will need to investigate the missing files warnings for GNOME and Xfce. I opened an issue https://github.com/OSC/ood_core/issues/128
  2. If the module load commands in the script wrapper do not work the same way as they do when you log in with the shell, my guess would be it is an environment issue. https://github.com/OSC/ood_core/issues/109 could be part of it, like @azric suggested earlier. If the wiping of the environment during job submission is a problem, one thing you could try is to add source /etc/profile to the script wrapper, above the module purge command.

Hi, Eric – We have made some progress, and the desktop app is largely working.
I’ve reviewed the documentation, but somehow am lacking the knowledge/experience to correctly parse what to do.

My installation is 1.3, if that is at all relevant.
My understanding is that the default form content is taken from:
/var/www/ood/apps/sys/bc_desktop/form.yml
And I understand that this file should not be edited.
I have followed the setup to have only a very sparse cluster file for the bc_desktop app:

# /etc/ood/config/apps/bc_desktop/smaster.yml
---
title: "Rider Desktop"
cluster: "smaster"
submit: "submit/slurm.yml.erb"

And the ‘submit.yml.erb’:

# /etc/ood/config/apps/bc_desktop/submit/slurm.yml.erb
#
batch_connect:
  template: "vnc"
  set_host: "host=$(hostname)"
script:
  job_environment:
    SBATCH_EXPORT: "ALL"
  native:
    - "-N"
    - "<%= bc_num_slots.blank? ? 1 : bc_num_slots.to_i %>"
    - "--reservation=ood-test"
    - "--nodelist=compt166"

The rendered form presents 4 fields: account, #hours, #nodes, partition.
Now, if I comment out the lines under slurm native associated with nodes ("-N", "…slots…"),
I still have an opportunity to enter #nodes in the rendered form.

What I’d like to do is adjust the #cores, and can’t figure out how to do that through the form. I can do this through the “native” call below the “script” within the /etc/ood/config/apps/bc_desktop/submit/slurm.yml.erb:
native: ["-n", “2”, “–reservation=ood-test”, “–nodelist=compt166”]

How do I implement this through the form?
More generally, how do I adjust what fields appear in the form? It seems that all the fields in the ‘reference’ file should appear, and when listing various fields in the ‘attributes’, that’s how to preassign values. Sorry that I can’t glean this info from the existing documentation.

Cheers

(FYI: I added the backticks around the code blocks in your comment so that code formatting is applied by Discourse)

First, no apology necessary. This “batch connect plugin” architecture is over-engineered and hides too much, so it is confusing. One of our goals is to fix or replace this with something that is much simpler and more straightforward. (Don’t worry, we will still support backwards compatibility.)

Currently, there is a separate form: section that controls which attributes appear in the web form and what order they appear. IMHO you should just be able to have a single attributes hash or array and that would appear in the web form. There is an issue opened to fix this.

To control what fields appear, copy this to your override /etc/ood/config/apps/bc_desktop/smaster.yml file from the bc_desktop form.yml:

form:
  - bc_vnc_idle
  - desktop
  - bc_account
  - bc_num_hours
  - bc_num_slots
  - node_type
  - bc_queue
  - bc_vnc_resolution
  - bc_email_on_started

Remove bc_num_slots and add num_cores to the form:, then add the attributes section like this:

attributes:
  num_cores:
    label: "Number of Cores"
    widget: "number_field"
    value: "1"
    min: "1"
    max: "48"

Then the diff will look like:

 ---
 title: "Rider Desktop"
 cluster: "smaster"
 submit: "submit/slurm.yml.erb"
+attributes:
+  num_cores:
+    label: "Number of Cores"
+    widget: "number_field"
+    value: "1"
+    min: "1"
+    max: "48"
 form:
   - bc_vnc_idle
   - desktop
   - bc_account
   - bc_num_hours
-  - bc_num_slots
+  - num_cores
   - node_type
   - bc_queue
   - bc_vnc_resolution
   - bc_email_on_started

And then edit the submit yml:

# /etc/ood/config/apps/bc_desktop/submit/slurm.yml.erb
#
batch_connect:
  template: "vnc"
  set_host: "host=$(hostname)"
script:
  job_environment:
    SBATCH_EXPORT: "ALL"
  native:
+   - "-n"
+   - "<%= num_cores.blank? ? 1 : num_cores.to_i %>"
    - "--reservation=ood-test"
    - "--nodelist=compt166"

Thanks, Eric!
Much appreciated. I’ve followed this guidance to implement the user-selected number of cores.

Following your guidance, the form updates appropriately, and I see that other elements may be implemented in the form by selecting and configuring appropriate widgets. I also feel less ‘bewildered’ generally in thinking about the question of setting up the app.

My next learning exercise will be to implement a widget to select between Mate or Xfce.

One additional general question: Is there a way to expose the submit file ‘slurm native’ control to the user, say through a text field widget, to allow users to allocate arbitrary resources for an interactive desktop session?

Thanks for making my life easier, and helping to improve my understanding.

Cheers,
~ Em

You are welcome, I’m glad it is working out for you now!

Here is a cheat sheet for adding selection for Mate & Xfce since that is what we do at OSC.

As for exposing the native, it could be dangerous because of the problem with validating the input and then parsing it correctly. But here is an idea I haven’t tried yet but I think would work:

Add a new attribute of name “sbatch_args” to attributes and form and label “Custom sbatch arguments”. If you don’t specify a widget the default is a text field. Actually you might be able to just add it to form: and use the default functionality. The nice thing about specifying details in the attributes section is you can add default values and help text.

Take the resulting string during form submission and parse it like this in the submit/slurm.yml.erb:

# /etc/ood/config/apps/bc_desktop/submit/slurm.yml.erb
#
<%- 
require 'shellwords'
-%>
batch_connect:
  template: "vnc"
  set_host: "host=$(hostname)"
script:
  job_environment:
    SBATCH_EXPORT: "ALL"
  native:
    - "-n"
    - "<%= num_cores.blank? ? 1 : num_cores.to_i %>"
    - "--reservation=ood-test"
    - "--nodelist=compt166"
    <% sbatch_args.shellsplit.each do |arg| %>
    - "<%= arg %>"
    <% end %>

I think the <%- version of the tags omits extra whitespace, I don’t remember. You might not need to require shellwords if it is already required. Shellwords adds shellsplit method to strings. I think that shellsplit would remove or escape all " but I’m not sure. You can see the documentation here.
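
For what it is worth, the quoting behaviour of shellsplit is easy to check in isolation. Below is a small sketch with Ruby's standard Shellwords library; the `sbatch_args` value is an invented example of what a user might type into the form:

```ruby
require 'shellwords'

# Invented example of user input from the "Custom sbatch arguments" field.
sbatch_args = '--mem=4G --gres=gpu:1 --comment="my desktop job"'

# shellsplit tokenizes like a POSIX shell: quotes group words and are then removed.
args = sbatch_args.shellsplit
p args

# An empty field yields an empty list, so the ERB loop would add nothing.
p ''.shellsplit

# Unbalanced quotes raise ArgumentError, so bad input needs handling somewhere.
begin
  '--comment="oops'.shellsplit
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

So plain double quotes are consumed rather than escaped. A backslash-escaped quote in the input would survive as a literal `"` in the token, though, and that could still break the surrounding `- "<%= arg %>"` YAML line, so some validation of the field is probably worthwhile.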

I haven’t actually tested this yet though. If it works for you let me know cause that is a really nifty trick.

Thanks again, Eric. I’ll try this out, and then talk it over with some folks to see if they’ll be on-board with implementation. Nonetheless, I’ll report back on how testing works out (later this week).

I did go ahead and implement the selection between Mate and Xfce as follows:
desktop:
  label: "Desktop selection"
  widget: select
  options:
    - [ "Mate", "mate" ]
    - [ "Xfce", "xfce" ]

This resulted in the expected two-option dropdown menu, with xfce as the alternative selection.

Having the form content in the .yml file just makes so much sense – now that it’s all there. Ordering the fields, commenting out lines as I choose, and reloading the browser tab to find the changes implemented is very satisfying. I look forward to crawling a bit deeper into OOD interactive apps.

Cheers,
~ Em
