Window Manager (xfwm4) for xfce4 failing

I have installed Paraview in my sandbox. When Paraview is launched through OOD, it submits the job to the UGE queue. The session gets as far as the “Paraview Queued” and “Launch Paraview” screens, but the job is then removed from the queue and eventually the screen goes away.
Here is the output.log file. It appears that the VNC server loads properly.
Setting VNC password…
Starting VNC server…

Warning: compute-19-11:1 is taken because of /tmp/.X1-lock
Remove this file if there is no X server compute-19-11:1

Desktop ‘TurboVNC: compute-19-11:2 (thomasbr)’ started on display compute-19-11:2

Log file is vnc.log
Successfully started VNC server on compute-19-11:5902…
Script starting…
Starting websocket server…
Scanning VNC log file for user authentications…
Generating connection YAML file…
Cleaning up…

Looking at the standard error file, it appears that the window manager for xfce4 (xfwm4) is having issues: it reports a GLib-CRITICAL error and cannot open a display via VirtualGL.

xfwm4 --compositor=off --daemon --sm-client-disable
(xfwm4:113052): GLib-CRITICAL **: 09:53:02.905: g_str_has_prefix: assertion ‘prefix != NULL’ failed
(xfwm4:113052): xfwm4-WARNING **: 09:53:02.926: The property ‘/general/double_click_distance’ of type int is not supported
Bus::open: Can not get ibus-daemon’s address.
IBusInputContext::createInputContext: no connection to ibus-daemon
[VGL] ERROR: Could not open display :0.

The “xsetroot -solid ‘#D3D3D3’” command seems to work, but when the script then tries to execute “xfsettingsd --sm-client-disable” it gets the following errors:

(xfsettingsd:113069): GLib-CRITICAL **: 09:53:03.341: g_str_has_prefix: assertion ‘prefix != NULL’ failed
(xfsettingsd:113069): GLib-GObject-CRITICAL **: 09:53:03.342: g_value_get_string: assertion ‘G_VALUE_HOLDS_STRING (value)’ failed
(xfsettingsd:113069): GLib-GObject-CRITICAL **: 09:53:03.343: g_value_get_string: assertion ‘G_VALUE_HOLDS_STRING (value)’ failed
Killing Xvnc process ID 112990
xfwm4: Fatal IO error 2 (No such file or directory) on X server :2.
xfce4-panel: Fatal IO error 11 (Resource temporarily unavailable) on X server :2.
xfsettingsd: Fatal IO error 2 (No such file or directory) on X server :2

I tried to run the commands directly on the compute node after connecting to it with “ssh -Y”. They seem to have trouble finding a screen to manage.

compute-19-11:$ xfwm4 --compositor=off --daemon --sm-client-disable
(xfwm4:114427): xfwm4-WARNING **: 10:00:24.362: Could not find a screen to manage, exiting

compute-19-11:/opt/TurboVNC/bin$ xfsettingsd --sm-client-disable
compute-19-11:/opt/TurboVNC/bin$ xfsettingsd: No window manager registered on screen 0.

compute-19-11:/opt/TurboVNC/bin$ xfce4-panel --sm-client-disable
xfce4-panel: No window manager registered on screen 0. To start the panel without this check, run with --disable-wm-check.

This brings up part of the XFCE desktop: the inner panel is missing, but the Applications pull-down menu (top part) and the bottom icons are there. If I click Applications -> Other -> Paraview, it displays the Paraview screen.

The X server seems to be working, since I can display xclock and other X applications. vglxinfo prints the following at the top, and the remainder of its output looks good.

name of display: localhost:10.0
display: localhost:10 screen: 0
direct rendering: Yes

glxgears runs fine, but running vglrun glxgears gives:

compute-19-11:$ vglrun glxgears
[VGL] NOTICE: Automatically setting VGL_CLIENT environment variable to
[VGL] 10.200.21.2, the IP address of your SSH client.
[VGL] ERROR: Could not open display :0.

12.200.21.2 is the local IP for the OOD server machine. I changed the display to the “name of display” given by vglxinfo, and a window shows up but quickly closes.
compute-19-11:$ vglrun -d localhost:10 glxgears
[VGL] NOTICE: Automatically setting VGL_CLIENT environment variable to
[VGL] 10.200.21.2, the IP address of your SSH client.
[VGL] ERROR: Could not connect to VGL client. Make sure that vglclient is
[VGL] running and that either the DISPLAY or VGL_CLIENT environment
[VGL] variable points to the machine on which vglclient is running.
[VGL] ERROR: in connect–
[VGL] 261: Connection refused

I do not have a vgl client running on 12.200.21.2.

These are your problematic error lines. I think the others can be ignored (while debugging a different issue I was able to replicate all those (xfwm4:113052) type lines and they’re just thrown because it’s the initial session setup).

Bus::open: Can not get ibus-daemon’s address.
IBusInputContext::createInputContext: no connection to ibus-daemon
[VGL] ERROR: Could not open display :0.

You see it looks at display :0, but you opened your display on :2 (from the line that says started on display compute-19-11:2).
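To make that comparison concrete, the VNC display number can be pulled straight out of the job's output.log. Note that the :0 in the VirtualGL message is the 3D X server that vglrun tries to open by default (VGL_DISPLAY, or the -d option), which is separate from the VNC display. A minimal sketch, assuming the log contains the TurboVNC “started on display host:N” line shown above; the function name vnc_display is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read a job log on stdin and print the X display
# number that TurboVNC reported, taken from the first line of the form
# "... started on display <host>:<N>".
vnc_display() {
    sed -n 's/.*started on display [^:]*:\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example (not run here): vnc_display < output.log
```

If this prints a number other than 0 while VirtualGL complains about :0, the two messages are talking about different X servers.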

Do you have the XFCE libraries installed? Does the XFCE desktop work for you? For reference, we have this version of ibus installed: ibus-1.5.17-5.el7.x86_64.

I know you’re trying to get paraview to work, and presumably you don’t need XFCE specifically; you just need some window manager. If you have all the MATE dependencies and don’t want XFCE, you could try to get this running in a MATE session instead.

Here are the xfce4 and ibus packages that are installed.

compute-19-10:$ rpm -qa | grep ibus
libuser-0.60-9.el7.x86_64
ibus-1.5.17-10.el7.x86_64
ibus-chewing-1.4.4-14.el7.x86_64
ibus-setup-1.5.17-10.el7.noarch
libusbx-1.0.21-1.el7.x86_64
ibus-sayura-1.3.2-3.el7.x86_64
ibus-gtk3-1.5.17-10.el7.x86_64
ibus-m17n-1.3.4-13.el7.x86_64
ibus-libpinyin-1.6.91-4.el7.x86_64
ibus-gtk2-1.5.17-10.el7.x86_64
ibus-table-1.5.0-5.el7.noarch
ibus-libs-1.5.17-10.el7.x86_64
ibus-qt-1.3.2-4.el7.x86_64
ibus-rawcode-1.3.2-3.el7.x86_64
ibus-kkc-1.5.18-7.el7.x86_64
ibus-table-chinese-1.4.6-3.el7.noarch
libusbmuxd-1.0.10-5.el7.x86_64
ibus-hangul-1.4.2-11.el7.x86_64
libusal-1.1.11-25.el7.x86_64

compute-19-10:$ rpm -qa | grep xfce
libxfce4util-4.12.1-2.el7.x86_64
xfce4-power-manager-1.6.0-2.el7.x86_64
xfce4-settings-4.12.1-1.el7.x86_64
xfce-polkit-0.2-8.el7.x86_64
xfce4-panel-4.12.1-4.el7.x86_64
xfce4-appfinder-4.12.0-4.el7.x86_64
libxfce4ui-4.12.1-3.el7.x86_64
xfce4-session-4.12.1-8.el7.x86_64
xfce4-terminal-0.8.7.4-2.el7.x86_64
xfce4-session-engines-4.12.1-8.el7.x86_64
xfce4-pulseaudio-plugin-0.2.5-2.el7.x86_64

Does a plain XFCE desktop work for you? I think that’s what we need to get working first, then we can hop into a vglrun session within that desktop to triage further.

If you want to attempt a MATE background desktop, ensure you can boot a plain MATE desktop. Then in the job script you background that desktop (similar to how we do it in this script with a large &() block). This file is how we start a MATE desktop, so that or similar is what you’d need to start up and background.
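As a rough picture of what backgrounding the desktop means, the start-up commands are grouped in a subshell and the whole group is sent to the background so the rest of the job script keeps running. This is only a generic sketch, not the contents of either script mentioned above; start_desktop and the log path are hypothetical stand-ins:

```shell
#!/bin/sh
# Hypothetical stand-in for the commands that start the desktop session.
start_desktop() {
    echo "desktop started"
}

# Assumed log location, used only for this illustration.
log=${TMPDIR:-/tmp}/desktop_$$.log

# Group the start-up commands in a subshell and background the whole group.
(
    start_desktop
) > "$log" 2>&1 &
desktop_pid=$!

# ... the job script would launch the application here ...

# Wait for the background group. This returns quickly only because the
# stand-in exits at once; a real desktop session keeps running.
wait "$desktop_pid"
```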

Sure a plain XFCE desktop would work.

The matlab application is working with xfce but the paraview is still failing.

The desktop using MATE is working. What parts of the mate.sh file do I need to add to my script file?

OK, then I would try to get Paraview to work within an XFCE desktop manually to triage. Indeed, any vglrun command within the desktop may shed light on this. MATE may not be the right path yet, as it could be a heavy lift to migrate this.

If a plain XFCE desktop works, then trying to get vglrun paraview to boot manually within that desktop session would be our best bet to troubleshoot. That way we replicate the environment of the job, have a good $DISPLAY already set and so on.

The XFCE desktop is working. I can start the Paraview application via the Applications -> Other pull-down menu option. Running “vglrun paraview” from the XFCE terminal gives “[VGL] ERROR: Could not open display :0.”

Can also run Paraview using the application finder in the XFCE desktop.

Running “echo $DISPLAY” through the XFCE terminal gives :2.0
The Display name in XFCE is VNC-0. Specifying the display as :2.0 gives the following test error when running glxgears.

[image: glxgears error output]

So there’s some difference in the initialization between the desktop application and running in a terminal.

Can you find that .desktop file from your applications menu? It may be in /usr/share/applications and you’re looking for the Exec entry for the command it initiates. I wonder what that is and how it differs. Clearly there’s some issue in finding your libGL.so.1 (or other files?) but it’s unclear to me why.
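If it is not obvious which directory the menu entry lives in, a small helper can search several application directories at once. A minimal sketch; the function name desktop_exec is hypothetical, and the directories in the example are just the common locations, not something I know about your system:

```shell
#!/bin/sh
# Hypothetical helper: print the Exec= line of every .desktop file whose
# name contains the given word, searching each directory that is passed in.
# Usage: desktop_exec <word> <dir> [<dir> ...]
desktop_exec() {
    name=$1
    shift
    for dir in "$@"; do
        # Skip directories that do not exist on this machine.
        [ -d "$dir" ] || continue
        # /dev/null is passed as an extra file so grep always prefixes
        # matches with the file name, even when only one file matches.
        grep '^Exec=' /dev/null "$dir"/*"$name"*.desktop 2>/dev/null
    done
}

# Example (not run here):
# desktop_exec paraview /usr/share/applications /usr/local/share/applications "$HOME/.local/share/applications"
```

The pattern is anchored at the start of the line, so TryExec= entries are not reported.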

Here are the lines in the .desktop files in the /usr/share/applications directory that contain “Exec=”. Is this what you want?

[root@ivy applications]# grep 'Exec=' *.desktop
authconfig.desktop:Exec=/usr/bin/system-config-authentication
bluetooth-sendto.desktop:Exec=bluetooth-sendto
emacs.desktop:Exec=emacs %f
emacsclient.desktop:Exec=emacsclient -c --alternate-editor="" %f
gcr-prompter.desktop:Exec=/usr/libexec/gcr-prompter
gcr-viewer.desktop:Exec=/usr/bin/gcr-viewer
gkbd-keyboard-display.desktop:Exec=gkbd-keyboard-display
gnome-background-panel.desktop:Exec=gnome-control-center background
gnome-bluetooth-panel.desktop:Exec=gnome-control-center bluetooth
gnome-color-panel.desktop:Exec=gnome-control-center color
gnome-control-center.desktop:Exec=gnome-control-center --overview
gnome-datetime-panel.desktop:Exec=gnome-control-center datetime
gnome-default-apps-panel.desktop:Exec=gnome-control-center default-apps
gnome-display-panel.desktop:Exec=gnome-control-center display
gnome-info-overview-panel.desktop:Exec=gnome-control-center info-overview
gnome-keyboard-panel.desktop:Exec=gnome-control-center keyboard
gnome-mouse-panel.desktop:Exec=gnome-control-center mouse
gnome-network-panel.desktop:Exec=gnome-control-center network
gnome-notifications-panel.desktop:Exec=gnome-control-center notifications
gnome-online-accounts-panel.desktop:Exec=gnome-control-center online-accounts
gnome-power-panel.desktop:Exec=gnome-control-center power
gnome-printers-panel.desktop:Exec=gnome-control-center printers
gnome-privacy-panel.desktop:Exec=gnome-control-center privacy
gnome-region-panel.desktop:Exec=gnome-control-center region
gnome-removable-media-panel.desktop:Exec=gnome-control-center removable-media
gnome-search-panel.desktop:Exec=gnome-control-center search
gnome-sharing-panel.desktop:Exec=gnome-control-center sharing
gnome-sound-panel.desktop:Exec=gnome-control-center sound
gnome-thunderbolt-panel.desktop:Exec=gnome-control-center thunderbolt
gnome-universal-access-panel.desktop:Exec=gnome-control-center universal-access
gnome-user-accounts-panel.desktop:Exec=gnome-control-center user-accounts
gnome-wacom-panel.desktop:Exec=gnome-control-center wacom
gnome-wifi-panel.desktop:Exec=gnome-control-center wifi
gvim.desktop:Exec=gvim -f %F
ibus-setup.desktop:Exec=ibus-setup
java-1.6.0-openjdk-jconsole.desktop:Exec=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/bin/jconsole
java-1.6.0-openjdk-policytool.desktop:Exec=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/bin/policytool
java-1.7.0-openjdk-1.7.0.171-2.6.13.0.el7_4.x86_64-jconsole.desktop:Exec=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.171-2.6.13.0.el7_4.x86_64/bin//jconsole
java-1.7.0-openjdk-1.7.0.171-2.6.13.0.el7_4.x86_64-policytool.desktop:Exec=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.171-2.6.13.0.el7_4.x86_64/bin//policytool
java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64-jconsole.desktop:Exec=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/bin/jconsole
java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64-policytool.desktop:Exec=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre/bin/policytool
liveinst.desktop:Exec=/usr/bin/liveinst
mutter.desktop:Exec=mutter
nm-connection-editor.desktop:Exec=nm-connection-editor
pavucontrol.desktop:Exec=pavucontrol
qt4-designer.desktop:Exec=designer-qt4
qt4-linguist.desktop:Exec=linguist-qt4
redhat-userinfo.desktop:Exec=userinfo
redhat-usermount.desktop:Exec=usermount
redhat-userpasswd.desktop:Exec=userpasswd
session-properties.desktop:Exec=gnome-session-properties
vino-server.desktop:Exec=/usr/libexec/vino-server
xdg-desktop-portal-gtk.desktop:Exec=/usr/libexec/xdg-desktop-portal-gtk

This is what I was looking for. Maybe yours is in /usr/local/share instead? In any case, we see it just runs paraview. Can you confirm that just running paraview works? I guess what I’m getting at is narrowing the issue to vglrun and not paraview.

[root@cce03674bf73 /]# cat /usr/share/applications/paraview.desktop 
[Desktop Entry]
Version=1.0
Type=Application
Name=ParaView
Comment=Parallel visualization application
Exec=paraview
TryExec=paraview
Icon=paraview

In any case, I found this comment in the VirtualGL GitHub that may apply. It seems to indicate that the GPU driver is misconfigured or, less likely, that X11 is running headless or that you need a display manager.

I’ve googled [VGL] ERROR: Could not open display :0. I believe that is our relevant error message, and I believe we have a VGL issue and not necessarily paraview.
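Since :0 is the 3D X server that vglrun opens by default (it can be changed with VGL_DISPLAY or the -d option), one quick check is whether anything owns :0 on the compute node at all. A minimal sketch, assuming the usual lock file and socket locations under /tmp (the lock file naming matches the /tmp/.X1-lock warning in your log); the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed if an X server appears to own display :N,
# judging only by its lock file or UNIX socket under the given root
# directory (default /tmp). It does not try to connect to the server.
# Usage: x_display_present <N> [<root>]
x_display_present() {
    n=$1
    root=${2:-/tmp}
    [ -e "$root/.X$n-lock" ] || [ -e "$root/.X11-unix/X$n" ]
}

# Example (not run here):
# x_display_present 0 || echo "nothing appears to be running on :0"
```

If nothing is found for :0, that would be consistent with the error, because there would be no 3D X server for VirtualGL to open.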

cat /usr/share/applications/paraview.desktop gives the same output as yours.

Paraview runs but gives the following

compute-19-10:$ paraview
Bus::open: Can not get ibus-daemon’s address.
IBusInputContext::createInputContext: no connection to ibus-daemon

The response of the application is slow.

It seems like the issue is with virtualGL. On the compute node we have the following display.

[root@compute-19-10 ~]# lshw -class display
*-display
description: VGA compatible controller
product: ASPEED Graphics Family
vendor: ASPEED Technology, Inc.
physical id: 0
bus info: pci@0000:06:00.0
version: 21
width: 32 bits
clock: 33MHz
capabilities: pm vga_controller cap_list rom
configuration: driver=ast latency=0
resources: irq:16 memory:90000000-90ffffff memory:91000000-9101ffff ioport:1000(size=128)

It appears that the graphics card is not GPU based (not NVIDIA). Does it make sense to try re-configuring the GPU?
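To compare nodes quickly, the vendor lines can be pulled out of the lshw output. A minimal sketch, assuming the same “vendor:” field layout as in the output above; the function name display_vendors is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read `lshw -class display` output on stdin and
# print the value of each "vendor:" field, one per line.
display_vendors() {
    sed -n 's/^[[:space:]]*vendor:[[:space:]]*//p'
}

# Example (not run here): lshw -class display | display_vendors
```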

The lightdm display manager is not running on the node.
[root@compute-19-10 ~]# systemctl status lightdm
● lightdm.service - Light Display Manager
Loaded: loaded (/usr/lib/systemd/system/lightdm.service; enabled; vendor preset: enabled)
Active: failed (Result: start-limit) since Fri 2020-05-15 12:48:51 CDT; 3 weeks 5 days ago
Docs: man:lightdm(1)
Main PID: 4250 (code=exited, status=1/FAILURE)

May 15 12:48:51 compute-19-10 systemd[1]: lightdm.service failed.
May 15 12:48:51 compute-19-10 systemd[1]: lightdm.service holdoff time over, scheduling restart.
May 15 12:48:51 compute-19-10 systemd[1]: Stopped Light Display Manager.
May 15 12:48:51 compute-19-10 systemd[1]: start request repeated too quickly for lightdm.service
May 15 12:48:51 compute-19-10 systemd[1]: Failed to start Light Display Manager.
May 15 12:48:51 compute-19-10 systemd[1]: Unit lightdm.service entered failed state.
May 15 12:48:51 compute-19-10 systemd[1]: Triggering OnFailure= dependencies of lightdm.service.
May 15 12:48:51 compute-19-10 systemd[1]: lightdm.service failed.

Same fate for gdm.

[root@compute-19-10 ~]# systemctl status gdm
● gdm.service - GNOME Display Manager
Loaded: loaded (/usr/lib/systemd/system/gdm.service; disabled; vendor preset: enabled)
Active: inactive (dead)

Maybe I need to restart one of the display managers.

Was able to run “vglrun xclock” but “vglrun glxspheres64” and “vglrun glxgears” fail with the “[VGL] ERROR: Could not open display :0.” error.

compute-19-10: vglrun glxspheres64
[VGL] NOTICE: Automatically setting VGL_CLIENT environment variable to
[VGL] 10.200.21.2, the IP address of your SSH client.
Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
[VGL] ERROR: Could not open display :0.

compute-19-10: vglrun glxgears
[VGL] NOTICE: Automatically setting VGL_CLIENT environment variable to
[VGL] 10.200.21.2, the IP address of your SSH client.
[VGL] ERROR: Could not open display :0.

compute-19-10: vglrun xclock
[VGL] NOTICE: Automatically setting VGL_CLIENT environment variable to
[VGL] 10.200.21.2, the IP address of your SSH client.