Mate Desktop in a Singularity container?

@jeff.ohrstrom, thank you so much for taking the time to provide a working solution. I am impressed by how amazing and prompt the support here has been - I started playing with the Interactive Desktop and Apps last week, and I have both functioning now! It is also really impressive to see how robust and modular OOD is, and how straightforward it is to use Singularity with it. So, thank you for the great implementation and support!

I was able to fix the issue I was encountering earlier with my image by applying the solutions from related posts here, like adding “CASScope /” and fixing the host regex. I can verify that the Singularity suggestions provided by @jeff.ohrstrom above worked for me.

I just have an additional quick question. The URL of the Interactive Desktop is pretty long, e.g. https://rhino-ood.unl.edu/pun/sys/dashboard/noVNC-1.0.0/vnc.html?utf8=✓&autoconnect=true&path=rnode%2Fc0831.rhino.hcc.unl.edu%2F13404%2Fwebsockify&resize=remote&password=B6qml788&compressionsetting=6&qualitysetting=2&commit=Launch+Desktop+Container. Is it possible to have something like https://rhino-ood.unl.edu/node/c0831.rhino.hcc.unl.edu/5901 instead? With Jupyter, I can use view.html.erb and get the shorter URL. Since I cannot use a view.html.erb file with VNC, is there a way to make the URL for the Interactive Desktop shorter? If I just open the short URL with the node and port used by the Interactive Desktop, I get a Bad Gateway error. If I should open another issue with this question, or search this forum better, I can do that :slight_smile:
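For context, the Jupyter view.html.erb I mean looks roughly like this (a sketch modeled on the common Jupyter batch-connect apps; the host, port, and password variables come from the app's connection information):

<%# hypothetical view.html.erb for a Jupyter batch-connect app; not usable with the VNC template %>
<form action="/node/<%= host %>/<%= port %>/login" method="post" target="_blank">
  <input type="hidden" name="password" value="<%= password %>">
  <button class="btn btn-primary" type="submit">Connect to Jupyter</button>
</form>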

Thanks for the feedback! This feature has been a long time coming, so I’m glad we found a solution that works well (I actually got KDE to work fairly simply with this).

As for a new topic, yea, I think it’s about time for one, but I can answer the question about the URL quickly: no, there’s no way to shorten it.


That’s great news, and thanks so much for all you’ve done to figure it out!

Unfortunately, I haven’t been successful yet at creating the image. It seems to be failing when it tries to install the gnome-keyring package. Any idea why?

...
Failed:
  gnome-keyring.x86_64 0:3.28.2-1.el7

Complete!
FATAL:   failed to execute %post proc: exit status 1
FATAL:   While performing build: while running engine: while running /hpc/applications/singularity/3.5.3/libexec/singularity/bin/starter: exit status 255
srun: error: b001: task 0: Exited with exit code 255
[rengland@login02 mate-desktop]$

It’s unclear what the issue is, but it will be somewhere above what you’ve posted. Are you using the same image definition I provided? Either way, there are likely errors above the line that says ‘Failed’.

If there aren’t, I think this Singularity issue may help us debug. They say to run with the -d flag to get debug output and to add the -xe option in %post, like %post -xe.
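For example (the image and definition file names here are only placeholders):

# the global -d/--debug flag prints verbose output for the whole build
singularity -d build mate-desktop.sif mate-desktop.def

Equivalently, putting set -xe at the top of the %post section echoes each command and stops at the first failure, which makes it easier to see exactly which install step dies.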

Also, there could be an issue with what gnome-keyring is trying to do, like updating keys in /usr. I don’t know if Singularity mounts host directories while you are building, but it may, and if it’s mounting /usr you may be trying to update the host’s keyrings during the installation (though this doesn’t seem likely, as I just tried and the fakeroot is in the host’s /tmp).

The only other errors in the output are these:

  Installing : trousers-0.3.14-2.el7.x86_64                                                                                        35/85
  Installing : nettle-2.7.1-8.el7.x86_64                                                                                           36/85
install-info: No such file or directory for /usr/share/info/nettle.info
  Installing : gnutls-3.3.29-9.el7_6.x86_64                                                                                        37/85
  Installing : glib-networking-2.56.1-1.el7.x86_64                                                                                 38/85

and:

  Installing : openssh-clients-7.4p1-21.el7.x86_64                                                                                 84/85
  Installing : gnome-keyring-3.28.2-1.el7.x86_64                                                                                   85/85
Error unpacking rpm package gnome-keyring-3.28.2-1.el7.x86_64
error: unpacking of archive failed on file /usr/bin/gnome-keyring-daemon;5ea1c341: cpio: cap_set_file
error: gnome-keyring-3.28.2-1.el7.x86_64: install failed
  Verifying  : libXext-1.3.3-3.el7.x86_64                                                                                           1/85
  Verifying  : at-spi2-atk-2.26.2-1.el7.x86_64                                                                                      2/85

I even trimmed down my definition such that I’m only installing epel-release, dbus-python, and gnome-keyring, which produces the same errors. I’ll look into those other possibilities and see if I turn something up.

Thanks.

Yea, so you may need to build the image in a different environment. Looks like it’s a known issue with gnome-keyring that it needs that capability?

In any case, I see you run it through srun; maybe try building in just a shell session? It also looks like cap_set_file is problematic on NFS, so maybe try building on a server whose /tmp filesystem isn’t NFS (or maybe the home directory? though it worked for me with autofs and what I believe is NetApp storage).

It looks like there’s another Singularity topic with a similar error.

If you can’t make the image on a different VM (one that allows you to do this), you might try a different base OS like Debian, since the source of the problem seems to be CentOS’ use of cpio; I don’t know if Debian does the same.
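If you go that route, a minimal Debian-based definition might start off something like this (only a sketch; the package names are assumptions, and the CentOS group installs elsewhere in this thread would all need Debian equivalents):

Bootstrap: docker
From: debian:10

%post
    apt-get update
    # Debian's MATE metapackage and the D-Bus X11 helpers; the rest of the
    # stack (TurboVNC, websockify, etc.) would still need to be added
    apt-get install -y mate-desktop-environment dbus-x11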

Also, I’ll submit my image to shub, so you can just pull it from there without having to build it.

@jeff.ohrstrom I’m trying to test this out after having upgraded to OOD 1.7.

One thing that we need at our site is to use Lmod to load the singularity module. I see you can’t add it up there via export PATH … because that winds up in the script that singularity launches. I tried putting it above the singularity command, and it said it didn’t know the module command.

Any hints?

Do you run GridEngine by any chance? module is a function defined in an /etc/profile.d file. Some folks have had to add . ~/.bashrc (load a user’s .bashrc).

Along with script_wrapper we also provide header, which becomes the header of the generated job script.

  header: | 
    #!/bin/bash
    . ~/.bashrc

You can see this topic for a similar issue.

No, SLURM 18.08.8.

I’ll have a look and try that out. There is mention someplace of needing a login shell (I forget whether it was in some script’s comments, the docs, or some Discourse post), and I did see that there was a fix after the 0.2.0 bc_desktop version that my install is [erroneously?] reporting.

But anyway, now that you’ve clarified that all of that does go in /etc (and where), and that I don’t need to make a copy of the app, I suspect that problem goes away and it will either work or I’ll have a new question.

Thanks!

@novosirj, we use Slurm and Lmod as well. In “submit.yml.erb” I load the module before running Singularity:

script_wrapper: |
  module purge
  module load singularity
  cat << "CTRSCRIPT" > container.sh

If this doesn’t work for you, maybe you can try adding lines to set it up manually, something like: . /util/opt/lmod/lmod/init/profile; export -f module ?
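For example, in the script_wrapper (a sketch; the Lmod init path is from our site and will differ on yours):

script_wrapper: |
  # if 'module' is not defined in the job's shell, source Lmod's init first
  . /util/opt/lmod/lmod/init/profile
  module purge
  module load singularity
  cat << "CTRSCRIPT" > container.sh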

I hope this helps.

Ah, one more thing - template/script.sh.erb should not have Singularity or any module loads in it, since it is run within the image. I know you used Singularity there at some point, but the Desktop should be invoked with "<%= session.staged_root.join("desktops", "#{context.desktop}.sh") %>" only.

Thanks, yes. I thought I started with a vanilla bc_desktop app, but just to make sure, I deleted it and reinstalled the RPM that contained it. It’s changed a bit – the version number of the bc_desktop app is no longer visible at all, let alone wrong, so I think that’s good.

I used our own Singularity image and also one loosely based on the one @jeff.ohrstrom put up. Here’s my definition:

Bootstrap: yum
OSVersion: 7
#DistType: centos
MirrorURL: http://mirror.centos.org/centos-%{OSVERSION}/%{OSVERSION}/os/$basearch/
Include: yum 
  
%post
        yum -y install epel-release
        yum -y groupinstall 'Base'
        yum -y groupinstall 'Infiniband Support'
        yum -y groupinstall 'MATE Desktop'
        yum -y groupinstall 'Compatibility Libraries'
        yum -y groupinstall 'Network File System Client'
        yum --setopt=group_package_types=mandatory,default,optional -y groupinstall 'Legacy X Window System Compatibility'
        yum --setopt=group_package_types=mandatory,default,optional -y groupinstall 'Internet Browser'
        yum -y install https://github.com/openhpc/ohpc/releases/download/v1.3.GA/ohpc-release-1.3-1.el7.x86_64.rpm
        yum -y install evince eog
        yum -y install glx-utils
        yum -y install systemd-libs
        yum -y install sssd
        yum -y install ohpc-slurm-client
        yum -y install python2-pip
        pip install ts
        yum install -y https://yum.osc.edu/ondemand/latest/compute/el7Server/x86_64/python-websockify-0.8.0-1.el7.noarch.rpm
        yum install -y https://yum.osc.edu/ondemand/latest/compute/el7Server/x86_64/turbovnc-2.2.3-1.el7.x86_64.rpm
        yum remove -y tigervnc-server python2-pip mate-power-manager
        yum clean all
        rm -rf /var/cache/yum/*
        # Create bind mount directories
        ...

Then our form definition:

title: "Amarel-Newark Desktop"
cluster: "amareln"
submit: submit/container.yml.erb
attributes:
  desktop: "mate"
  bc_vnc_idle: 0
  bc_vnc_resolution:
    required: true
  node_type: null

  memory_gigs:
    widget: "number_field"
    label: "Gigabytes of memory"
    value: 4

form:
  - bc_vnc_idle
  - desktop
#  - bc_account
  - bc_num_hours
#  - bc_num_slots
  - num_cores
  - memory_gigs
  - node_type
  - bc_queue
  - bc_vnc_resolution
#  - bc_email_on_started

And then our submit/container.yml.erb:

<%
  # your image location will differ
  image="/projects/community/containers/mate_desktop_v2.img"
# image="/projects/community/containers/mate_desktop-centos-7.7-login.sif"
%>
---
script:
  native:
    - "-c"
    - "<%= num_cores.blank? ? 1 : num_cores.to_i %>"
    - "--mem=<%= memory_gigs %>G"
  template: "vnc"
batch_connect:
  websockify_cmd: '/usr/bin/websockify'
  header: |
    #!/bin/bash
    . ~/.bashrc
  script_wrapper: |
    module purge
    module load singularity/3.5.3
    cat << "CTRSCRIPT" > container.sh
    export PATH="$PATH:/opt/TurboVNC/bin"
    %s  
    CTRSCRIPT

    # your bindpath will differ
    export SINGULARITY_BINDPATH="...various, long..."
    
    singularity run <%= image %> /bin/bash container.sh

If I don’t have the "header:" section, I get errors that the module and singularity commands aren’t found. With it set, at one point I was getting /bin/bash: /bin/bash: cannot execute binary file, but that might have been from not starting with a fresh bc_desktop app. Now, though, I get no output at all, and the job terminates soon after it starts.

What can I do to get some output and see what is going wrong?

Doh: it’s supposed to be singularity exec; the singularity run command runs the default action of a container (which that recipe does not define).
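In other words, the last line of the script_wrapper above becomes:

# 'run' invokes the container's default runscript (which that recipe does not define);
# 'exec' runs the given command inside the container
singularity exec <%= image %> /bin/bash container.sh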

I feel like I’ve followed the instructions but I’m still getting an error:

cat: /etc/xdg/autostart/gnome-keyring-gpg.desktop: No such file or directory
cat: /etc/xdg/autostart/pulseaudio.desktop: No such file or directory
cat: /etc/xdg/autostart/rhsm-icon.desktop: No such file or directory
cat: /etc/xdg/autostart/spice-vdagent.desktop: No such file or directory
cat: /etc/xdg/autostart/xfce4-power-manager.desktop: No such file or directory
error: Cannot autolaunch D-Bus without X11 $DISPLAY

Usage:
  dconf write KEY VALUE 

Write a new value to a key

Arguments:
  KEY         A key path (starting, but not ending with '/')
  VALUE       The value to write (in GVariant format)

Does anyone have any suggestions? I’m using UGE and CentOS 7.

I would guess your issue is with how the container is set up and initialized. I know off the top of my head that /var is important to mount into it for D-Bus.

I would ask what your SINGULARITY_BINDPATH is and what’s installed in the container (ensuring all of the MATE dependencies are there, along with the VNC stuff).
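As a rough illustration only (your paths will certainly differ; this is just the shape of the thing, based loosely on the kinds of host paths used elsewhere in this thread - Munge and Slurm config for the scheduler, /var for D-Bus, plus your data filesystems):

# illustrative bind path; mate_desktop.sif is a placeholder image name
export SINGULARITY_BINDPATH="/var,/run/munge,/etc/slurm,/home,/scratch"
singularity exec mate_desktop.sif /bin/bash container.sh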

It’s strange, because running the mate.sh script from a singularity shell inside a qlogin session on a compute node “works” (i.e., it displays a desktop session on my PC), suggesting that the container itself is OK. Does anyone have their working app in a git repo anywhere that I can clone? I’ll try to throw mine in a repo tomorrow morning.

The submit.yml.erb I provided earlier in this thread is what I use (you can see I don’t mount /etc). I would say it’s definitely your SINGULARITY_BINDPATH. Do you load a module that sets it? What does your submit.yml.erb look like?

Thanks for the reply. I wasn’t able to resolve the issue (it didn’t seem related to the bind path). I’ve since switched to an XFCE desktop and put TurboVNC and websockify inside the container too; I’ll try pulling them back out of the container another time.

I’m finally coming back to this, and I’m quite sure I’m close, but I’m still not able to make it work. I don’t think I’m totally clear on which piece runs where, between the OnDemand server, the compute node, and the compute node inside of Singularity.

We have our image on a shared filesystem that is mounted on both the OnDemand server and all compute nodes, and both websockify and vncpasswd/vncserver are also on this shared filesystem. They’re not in the Singularity image at all.

Here’s my form, in /etc/ood/config/apps/bc_desktop:

title: "Amarel Desktop"
cluster: "amarel"
submit: submit/container.yml.erb
attributes:
  desktop: "mate"
  bc_vnc_idle: 0
  bc_vnc_resolution:
    required: true
  node_type: null
  memory_gigs:
    widget: "number_field"
    label: "Gigabytes of memory"
    value: 4
    help: |
      Number of gigabytes of memory (larger values may mean longer wait)
    min: 1
    max: 100
    step: 1

form:
  - bc_vnc_idle
  - desktop
#  - bc_account
  - bc_num_hours
#  - bc_num_slots
  - num_cores
  - memory_gigs
  - node_type
  - bc_queue
  - bc_vnc_resolution
  - bc_email_on_started
  - reservation

And then here’s my submit/container.yml.erb:

<%
  image="/projects/community/containers/mate_desktop_v2.img"
%>
---
script:
  native:
    - "-c"
    - "<%= num_cores.blank? ? 1 : num_cores.to_i %>"
    - "--mem=<%= memory_gigs %>G"
  template: "vnc"
batch_connect:
  websockify_cmd: '/projects/community/containers/bin/websockify'
  script_wrapper: |
    cat << "CTRSCRIPT" > container.sh
    export PATH="$PATH:/projects/community/containers/bin"
    module purge
    module load singularity
    %s  
    CTRSCRIPT

    module purge
    module load singularity

    export SINGULARITY_BINDPATH="/run/munge:/run/munge,/tmp/$ID:/run/user,/projects,/scratch,/cache/sw:/opt/sw,/cache,/cache/home:/home,/projectsp,/projectsn,/projectsc,/etc/slurm"

    singularity exec <%= image %> /bin/bash container.sh

What I get with this setup is something I don’t understand in output.log:

Setting VNC password...
ERROR  : Failed to set effective UID to 0
Starting VNC server...
ERROR  : Failed to set effective UID to 0
ERROR  : Failed to set effective UID to 0
... (the same error repeated many more times) ...
Cleaning up...
ERROR  : Failed to set effective UID to 0

These errors appear to be coming from Singularity, but I don’t totally understand them.

If I remove module purge; module load singularity from the line above %s, I get the below in output.log:

Setting VNC password...
/projects/community/containers/bin/vncpasswd: line 3: singularity: command not found
Starting VNC server...
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found
/projects/community/containers/bin/vncserver: line 3: singularity: command not found

If I put it back but instead remove it from below CTRSCRIPT, I get the following in output.log:

/var/lib/slurm/slurmd/job14618206/slurm_script: line 216: singularity: command not found

I’m not totally sure what is happening. I’d appreciate any pointers you might be able to provide.

I got somewhat farther by adding a couple of set -x statements to see what was really happening.
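In case it helps anyone else, the tracing was just a matter of something like this in the script_wrapper (a sketch):

script_wrapper: |
  set -x                               # trace the wrapper itself in output.log
  cat << "CTRSCRIPT" > container.sh
  set -x                               # trace what runs inside the container
  %s
  CTRSCRIPT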

The errors were coming from vncpasswd/vncserver, not Singularity. I switched to using the TurboVNC install inside the container instead of the one outside, and that solved that problem. Then I couldn’t actually connect to the session, and it was the same situation with websockify. Once both of those were pointing at the in-container installs, this “just worked.”
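For anyone following along, the working setup boils down to pointing both websockify and TurboVNC at the copies inside the image (a sketch; the in-container paths assume they were installed the way the image definition earlier in this thread installs them):

batch_connect:
  websockify_cmd: '/usr/bin/websockify'        # websockify installed in the image
  script_wrapper: |
    cat << "CTRSCRIPT" > container.sh
    export PATH="$PATH:/opt/TurboVNC/bin"      # TurboVNC installed in the image
    %s
    CTRSCRIPT

    module purge
    module load singularity

    export SINGULARITY_BINDPATH="..."          # site-specific
    singularity exec <%= image %> /bin/bash container.sh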