Misadventures in non-vnc Matlab app construction

Firstly, my apologies if I should be directly contacting VT-ARC or the Mathworks people about this – I thought surely someone else would eventually ask about this topic (given the intense interest during the tips-n-tricks call) but I’ve not seen anything here, nor in ARC’s github-issues. If there’s a better venue (e.g. discord/slack/etc), please let me know, thanks!

While trying to follow the Dev Guide from VT, I’ve made the following observations and additions:

The guide mentions modifying manifest.yml, but I wonder if that is supposed to suggest modifying form.yml? (To specify your cluster and customize fields.)

Similarly, although the guide does mention updating template/script.sh.erb for the path to the .sif image, I also needed to update several of the bind parameters. Beyond the obvious filesystem differences, though, it was definitely not clear to me at first that $MATLAB_DIR and $TMPFS seem to be VT-site-specific environment variables. I also had to bind our newer version of readline’s ‘libhistory’.

[jason@wind] $ cd ~/ondemand/dev/bc_vt_matlab_html/template
[jason@wind] $ diff -u script.sh.erb-origBindings script.sh.erb
--- script.sh.erb-origBindings  2021-11-24 12:07:04.000000000 -0700
+++ script.sh.erb               2021-12-10 11:13:59.033674779 -0700
@@ -20,12 +20,13 @@
+echo "SUSPECTED VT-SPECIFIC VARS: MATLAB_DIR=$MATLAB_DIR ... TMPFS=$TMPFS"
 export SINGULARITYENV_LD_LIBRARY_PATH=$LD_LIBRARY_PATH
 export SINGULARITYENV_PATH=$PATH
 singularity run --nv --writable-tmpfs \
-    --bind=$MATLAB_DIR:/opt/matlab,$TMPFS:/tmp,/work/${USER},/projects \
-    --bind=`pwd`/matlab.rc:/mathworks.rc,/cm,/etc/slurm/slurm.conf \
-    --bind=/lib64/libhistory.so.6:/lib/x86_64-linux-gnu/libhistory.so.6 \
+    --bind=/packages/matlab/R2021b:/opt/matlab,/tmp,/projects \
+    --bind=`pwd`/matlab.rc:/mathworks.rc,/etc/slurm/slurm.conf \
+    --bind=/lib64/libhistory.so.7:/lib/x86_64-linux-gnu/libhistory.so.6 \
     --bind=/usr/lib64/libmunge.so.2:/lib/x86_64-linux-gnu/libmunge.so.2,/var/run/munge \
     --bind=`pwd`/entrypoint.sh:/entrypoint.sh \
     /home/jason/ondemand/dev/bc_vt_matlab_html/matlab.sif bash /entrypoint.sh
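
Since an unset site-specific variable such as $MATLAB_DIR silently expands to an empty bind source, it may help to check every bind source on the host before launching. This is only a minimal sketch, assuming bash; `check_bind_sources` is a hypothetical helper name, not something from the VT app, and the example paths are the ones from the diff above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which bind sources are missing on this host.
# Each argument is one bind spec, either SRC or SRC:DEST (as given to --bind).
check_bind_sources() {
    local spec src missing=0
    for spec in "$@"; do
        src=${spec%%:*}            # keep only the host-side path
        if [ -z "$src" ] || [ ! -e "$src" ]; then
            echo "MISSING bind source: '${src}' (from '${spec}')" >&2
            missing=$((missing + 1))
        fi
    done
    # Exit status is the number of missing sources (0 means all present).
    return "$missing"
}

# Example usage with paths from the diff above (adjust for your site):
# check_bind_sources /packages/matlab/R2021b:/opt/matlab /tmp /projects \
#     /lib64/libhistory.so.7:/lib/x86_64-linux-gnu/libhistory.so.6
```

An empty $MATLAB_DIR would show up here as a spec of `:/opt/matlab`, which the helper flags because the host-side part is empty.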

When I launch my dev Matlab app, it gets stuck “Starting”, and the app/script output indicates that matlab-jupyter-app cannot be found…

[jason@wind] $ cd ~/ondemand/data/sys/dashboard/batch_connect/dev/bc_vt_matlab_html/output/f36b9577-ac0f-4894-9adb-b3d318b6405a
[jason@wind] $ cat output.log
starting before
No modules loaded
Script starting...
Waiting for Matlab to open port 41492...
/home/jason/ondemand/data/sys/dashboard/batch_connect/dev/bc_vt_matlab_html/output/f36b9577-ac0f-4894-9adb-b3d318b6405a
module works
starting singularity
starting Matlab on cn31 using 41492
SUSPECTED VT-SPECIFIC VARS: MATLAB_DIR= ... TMPFS=
retrieved ENV variables from matlab.rc
MWI_APP_PORT=41492
MWI_BASE_URL=/matlab
TMPDIR=/tmp
MWI_EXT_URL=ood.arc.vt.edu
MLM_LICENSE_FILE=/opt/matlab/licenses/network.lic
To use the web-desktop: http://ood.arc.vt.edu/matlab/index.html
starting web matlab
/entrypoint.sh: line 30: matlab-jupyter-app: command not found

Though my app session is stuck starting up, I can stay in the ondemand-generated working directory to leverage the resources it already prepared. When I interactively shell in, matlab-jupyter-app is clearly on my $PATH:

[jason@wind] $ singularity shell --nv --writable-tmpfs                                   \
>     --bind=/packages/matlab/R2021b:/opt/matlab,/tmp,/projects                          \
>     --bind=`pwd`/matlab.rc:/mathworks.rc,/etc/slurm/slurm.conf                         \
>     --bind=/lib64/libhistory.so.7:/lib/x86_64-linux-gnu/libhistory.so.6                \
>     --bind=/usr/lib64/libmunge.so.2:/lib/x86_64-linux-gnu/libmunge.so.2,/var/run/munge \
>     --bind=`pwd`/entrypoint.sh:/entrypoint.sh                                          \
>     /home/jason/ondemand/dev/bc_vt_matlab_html/matlab.sif
INFO:    Could not find any nv files on this host!

Singularity> which matlab-jupyter-app
/usr/local/bin/matlab-jupyter-app

Now, obviously there are some variables in template/before.sh.erb (that define the ephemeral matlab.rc) that everyone should be localizing; but more to the point, I can modify template/entrypoint.sh to provide the full path to matlab-jupyter-app to move things along.

$ diff -u entrypoint.sh-orig entrypoint.sh
--- entrypoint.sh-orig  2021-12-10 12:50:53.352527908 -0700
+++ entrypoint.sh       2021-12-10 12:51:36.872916964 -0700
@@ -27,4 +27,4 @@
 echo ""
 echo starting web matlab
-matlab-jupyter-app
+/usr/local/bin/matlab-jupyter-app
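
A possible explanation for the "command not found", not confirmed here: script.sh.erb exports SINGULARITYENV_PATH=$PATH, which would replace the container's own PATH with the compute node's PATH, while the interactive `singularity shell` test did not necessarily have that variable set. The sketch below, assuming bash, only checks whether a directory is present in a PATH-style string; `path_contains` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Return 0 if directory $1 is an entry of the colon-separated list $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Run this where the job runs: it compares the PATH the job would hand to
# the container with the directory holding the binary inside the image
# (/usr/local/bin, per the `which` output above).
if path_contains /usr/local/bin "$PATH"; then
    echo "host PATH includes /usr/local/bin"
else
    echo "host PATH lacks /usr/local/bin; SINGULARITYENV_PATH=\$PATH would hide matlab-jupyter-app"
fi

# Possible alternative to hard-coding the full path (assumption: Singularity 3.x,
# whose docs describe SINGULARITYENV_APPEND_PATH):
# export SINGULARITYENV_APPEND_PATH=/usr/local/bin
```

If the second message prints on the compute node, appending to the container PATH rather than replacing it may be worth trying before hard-coding the path.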

Unfortunately, although that definitely got me closer and I can launch the app, I’m still hitting an error and am not sure how to go about diagnosing or debugging it:

Now, when I view my app sessions, this Matlab one still appears active/available, and when I click the blue “Connect” button, I get a similar-looking page that makes empty promises about error logs. Clicking the “Start MATLAB Session” button there starts the cycle over, at the “Starting” screen.


The output.log file doesn’t show anything enlightening, and there doesn’t appear to be anything remotely special in my /var/log/ondemand-nginx logs.

$ tail -f output.log
INFO:MATLABProxyApp:MATLAB_LOG_DIR:/tmp/MWI/31511
INFO:MATLABProxyApp:MATLAB_READY_FILE:/tmp/MWI/31511/connector.securePort
INFO:MATLABProxyApp:Starting MATLAB on port 31511
INFO:MATLABProxyApp:Installing handler for signal: 1
INFO:MATLABProxyApp:Installing handler for signal: 2
INFO:MATLABProxyApp:Installing handler for signal: 3
INFO:MATLABProxyApp:Installing handler for signal: 15
MATLAB is selecting SOFTWARE OPENGL rendering.
Discovered Matlab listening on port 18154!
Generating connection YAML file...

                            < M A T L A B (R) >
                  Copyright 1984-2021 The MathWorks, Inc.
                  R2021b (9.11.0.1769968) 64-bit (glnxa64)
                             September 17, 2021

INFO:MATLABProxyApp:Waiting for MATLAB to exit...
INFO:MATLABProxyApp:MATLAB has exited with errorcode: -9
ERROR:MATLABProxyApp:MATLAB returned an unexpected error. For more details, see the log below.

INFO:MATLABProxyApp:Cleaning up matlab_ready_file.../tmp/MWI/31511/connector.securePort
INFO:MATLABProxyApp:MATLAB_LOG_DIR:/tmp/MWI/31511
INFO:MATLABProxyApp:MATLAB_READY_FILE:/tmp/MWI/31511/connector.securePort
INFO:MATLABProxyApp:Starting MATLAB on port 31511
MATLAB is selecting SOFTWARE OPENGL rendering.

                            < M A T L A B (R) >
                  Copyright 1984-2021 The MathWorks, Inc.
                  R2021b (9.11.0.1769968) 64-bit (glnxa64)
                             September 17, 2021

INFO:MATLABProxyApp:Waiting for MATLAB to exit...
INFO:MATLABProxyApp:MATLAB has exited with errorcode: -9
ERROR:MATLABProxyApp:MATLAB returned an unexpected error. For more details, see the log below.

Note that this is the entirety of the file, and “the log below” is evidently empty.
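
One reading of “errorcode: -9”, offered as an assumption rather than something the log confirms: a negative code -N commonly means the child process was killed by signal N, and signal 9 is SIGKILL, which a process cannot catch or log. Typical senders are the kernel's out-of-memory killer or a scheduler enforcing a memory limit. A minimal sketch, assuming bash, with `signal_name` as a hypothetical helper:

```shell
#!/usr/bin/env bash
# Translate a return code such as -9 into a signal name.
# Assumption: a negative code -N means "killed by signal N".
signal_name() {
    local code=${1#-}      # strip the leading minus sign, if any
    kill -l "$code"        # prints KILL for 9
}

signal_name -9
```

If the session ran under Slurm, something like `sacct -j <jobid> --format=JobID,State,ExitCode,MaxRSS,ReqMem` (with your own job ID filled in) is one place to check whether the job ran into its memory request.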

Suggestions?
Thank you!
Jason Buechler
NAU Monsoon HPC

A bunch of work later, I fixed some ruby typos, tweaked the resource allotment, and probably changed some other things. But I would still appreciate some help understanding a few things:

  • at the end of the entrypoint script, I don’t understand why I have to provide the full path to matlab-jupyter-app (as mentioned above, I can shell-into the container and it IS in root’s path)
  • I’m unsure of the source or significance of an “/opt/matlab/parallel_remote” warning that appears after the MATLAB masthead, as displayed in a session’s output.log (see below)
  • is it fine to leave MWI_BASE_URL=/matlab in before.sh? (I ask because the comments in OnDemandApps/Dockerfile at commit 609eb334cec0d95a2163500556644ac01454a256 of AdvancedResearchComputing/OnDemandApps on GitHub make it seem like we should be matching our OOD installation’s reverse proxy.)
  • Starting the app up takes a LONG. LONG. TIME. Much more than a few minutes, oftentimes, though not all times. strace’ing the activity on the compute-node shows it looking at an endless list of files, as though it were validating the installation base every. single. run.
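
For the slow-startup item, one way to turn the endless strace output into something comparable between runs is to count which directory trees are touched most. This is a rough sketch, assuming bash and an strace log already captured to a file (for example with `strace -f -e trace=file -o LOGFILE -p PID`); `strace_top_dirs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Summarize which directory trees an strace log touches most often.
# Usage: strace_top_dirs LOGFILE [DEPTH]   (DEPTH defaults to 3 path components)
strace_top_dirs() {
    local log=$1 depth=${2:-3}
    # Extract every quoted absolute path, keep its first DEPTH components,
    # then count occurrences and list the ten most frequent prefixes.
    grep -o '"/[^"]*"' "$log" \
        | tr -d '"' \
        | cut -d/ -f1-"$((depth + 1))" \
        | sort | uniq -c | sort -rn | head -n 10
}

# Example: strace_top_dirs /tmp/matlab.strace 3
```

If most of the hits fall under the bound MATLAB tree on a network filesystem, that would at least point to where the startup time is going.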

Here is that “/opt/matlab/parallel_remote” warning that appears after the MATLAB masthead, as displayed in a session’s output.log:

INFO:MATLABProxyApp:Installing handler for signal: 15
MATLAB is selecting SOFTWARE OPENGL rendering.
Discovered Matlab listening on port 58714!
Generating connection YAML file...
ESC[?1hESC=
                            < M A T L A B (R) >
                  Copyright 1984-2021 The MathWorks, Inc.
                  R2021b (9.11.0.1769968) 64-bit (glnxa64)
                             September 17, 2021

Warning: Name is nonexistent or not a directory: /opt/matlab/parallel_remote

To get started, type doc.
For product information, visit www.mathworks.com.

Starting CPP Connector on Worker
Warming up worker

Thanks for any input!
–jason