I successfully launched a Jupyter notebook on Slurm from OOD.
However, when I click the “Connect to Jupyter” button in
/pun/sys/dashboard/batch_connect/sessions
this leads me to
http:///node/worker001/44559/login
instead of
http://:44559/node/worker001/44559/login
where the notebook is actually running.
Which options do I need to change for this URL to lead to the correct host and port?
Note that the URL is absolute: the assumption is that neither OnDemand nor the /node and /rnode proxies have a sub-URI. In OSC OnDemand’s case, the base is https://ondemand.osc.edu and this path is appended to that base domain.
We run OnDemand on port 80 and have not tested what problems might arise from running it on a different port.
Yes, I forgot the host in the URLs I shared. I meant that at:
http:///pun/sys/dashboard/batch_connect/sessions
I clicked on “Connect to Jupyter”
and was led to
http:///node/worker001/27910/login
instead of
http:///node/worker001/27910/login
jupyter notebook --config=/home/users/neranjan/ondemand/data/sys/dashboard/batch_connect/dev/jupyter/output/e0dc01e9-d850-4fc0-8f78-d856e1e0758e/config.py
[W 11:20:34.326 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[I 11:20:34.360 NotebookApp] Serving notebooks from local directory: /home/users/neranjan
[I 11:20:34.360 NotebookApp] The Jupyter Notebook is running at:
[I 11:20:34.360 NotebookApp] http://node1.rs.gsu.edu:31263/node/node1.rs.gsu.edu/31263
[I 11:20:34.360 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Timed out waiting for Jupyter Notebook server to open port 31263!
TIMING - Wait ended at: Fri Sep 27 11:25:27 EDT 2019
[C 11:25:27.350 NotebookApp] received signal 15, stopping
Cleaning up…
[I 11:25:27.362 NotebookApp] Shutting down 0 kernels
/var/spool/slurmd/blue/job01029/slurm_script: line 20: 458659 Terminated “/home/users/neranjan/ondemand/data/sys/dashboard/batch_connect/dev/jupyter/output/e0dc01e9-d850-4fc0-8f78-d856e1e0758e/script.sh”
How can I make sure the server proxies correctly?
I have interactive app working correctly, btw. I can launch VNC sessions without any problems.
Are you sure that your view.html.erb is the same as our example?
As Eric says, it contains this line, which is what you are actually being redirected to.
<form action="/node/<%= host %>/<%= port %>/login" method="post" target="_blank">
That renders as the form shown in the image below. You can see the action does not have a host or port in it; it’s simply the path (the host:port being the one I’m currently connected to).
If you confirm your view.html.erb is the same as our example, can you show us what your form looks like (the html div that I’ve given in the image above).
Please let me know if I’m still not clear. My main question is how to make sure the main OOD app correctly navigates a user to the Jupyter instance once it is created. Right now a user has to manually get the URL from the output file and change it to the correct one. Even with this method, I do not see a way to get the password. Please help.
OK I see what’s going on, I apologize for not taking a better look at the logs before.
We’re disconnecting you in this file because we think that that port is not yet open. Here’s the file that’s giving you trouble. That function wait_until_port_used is here in this library. It’s basically just netcat on the host and port.
As you have shown, this should work, as you’re able to connect to it. My guess is, however, that there’s some problem connecting to the local port from the machine itself. Some firewall or ip table rule could be blocking you. Note how it times out, not immediately fails with something like ‘connection refused’.
If you’re able to ssh into your host (like cder15.rs.gsu.edu from that logfile) during this time, see if you can’t run the same nc command nc -w 2 $HOST $PORT < /dev/null &> /dev/null (replacing host and port here appropriately).
If this fails, try with just localhost instead of the hostname. Maybe there’s some problem with routing, but you can connect if you call it localhost or 127.0.0.1. If you can connect through localhost, just replace the ${host} bit in the after.sh shown above with localhost. That way the shell script will use localhost instead of the network name.
If that doesn’t work, if you still can’t connect to it through localhost, you’ll have to reach out to your administrators to see what networking rules are blocking you. Obviously you can connect from the outside but you also need to connect to that port locally.
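The checks described above could be scripted roughly as follows. This is only a sketch, assuming nc is installed on the compute node; probe_port is a hypothetical helper name, and the host and port in the usage comments are placeholders taken from the logs earlier in the thread.

```shell
#!/usr/bin/env bash
# Probe a TCP port the same way the nc command above does.
# Prints "open" when nc connects, "closed or filtered" otherwise
# (the latter also covers nc not being installed).
probe_port() {
  local host="$1" port="$2"
  if nc -w 2 "$host" "$port" < /dev/null > /dev/null 2>&1; then
    echo "open"
  else
    echo "closed or filtered"
  fi
}

# Example, run on the compute node while the job is still alive
# (placeholder values; use the host and port from your own output log):
#   probe_port node1.rs.gsu.edu 31263
#   probe_port localhost 31263
#   probe_port 127.0.0.1 31263
```

If the hostname probe reports "closed or filtered" while the localhost probe reports "open", that would point at routing or firewall rules on the node rather than at Jupyter itself.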
@neranjan it is possible that nc is not available or fails to exit with a 0 status when the port is in use. We don’t use nc to check for the port in use when launching TurboVNC, so that is likely why you see things working for VNC and not for Jupyter. The fact that we do not handle this case is a bug and is captured in https://github.com/OSC/ood_core/issues/153. This mailing list thread from Oct 2018 covers the problem in greater detail: https://listsprd.osu.edu/pipermail/ood-users/2018-October/000269.html.
Note that the example provided for overriding port_used uses lsof instead. If your bash is a new enough version, you may also be able to use the pseudo-device /dev/tcp to determine if the port is in use (mentioned in the body of the GitHub issue).
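For illustration, a /dev/tcp based check might look like the sketch below. It is not the code from the GitHub issue; it assumes a bash built with network redirections enabled, and it assumes the function is called with a "host:port" string (a bare port is treated as localhost here, which is my own choice).

```shell
#!/usr/bin/env bash
# Sketch of a port_used replacement built on bash's /dev/tcp pseudo-device
# instead of nc. Accepts "host:port" or a bare "port" (host then defaults
# to localhost). Returns 0 if a TCP connection succeeds, non-zero otherwise.
port_used() {
  local spec="$1"
  local port="${spec##*:}"
  local host="localhost"
  case "$spec" in
    *:*) host="${spec%:*}" ;;
  esac
  # The subshell ensures file descriptor 3 is closed again on return.
  # Caveat: a filtered (silently dropped) port can block here until the
  # kernel's connect timeout expires.
  (exec 3<>"/dev/tcp/${host}/${port}") > /dev/null 2>&1
}

# Example (placeholder port):
#   port_used "localhost:31263" && echo "port is in use"
```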
It boots on that node, so logs indicating that it boots on that node are appropriate. When you hit the Apache process on head.rs.gsu.edu it essentially redirects to node1.rs.gsu.edu. All of that is well and good, because it doesn’t know about the redirection.
Does it work when you run nc -w 2 node1.rs.gsu.edu 31263, or when you try with localhost? If it works with localhost you can modify this line of your after.sh to use "localhost" instead of "${host}".
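To make the localhost suggestion concrete, here is a rough stand-in for the wait step. It is not the actual after.sh or the ood_core implementation of wait_until_port_used; port_open and wait_for_port are hypothetical names, the check assumes nc is installed, and the port is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical single check: does nc connect to host ($1) on port ($2)?
port_open() {
  nc -w 2 "$1" "$2" < /dev/null > /dev/null 2>&1
}

# Poll host:port about once per second until it opens or `timeout`
# attempts have been made. Returns 0 if the port opened, 1 otherwise.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-30}"
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if port_open "$host" "$port"; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# The idea for after.sh is to probe localhost rather than "${host}":
#   port=31263   # placeholder; the real script receives the port from the job
#   if wait_for_port localhost "${port}" 600; then
#     echo "Port ${port} is open"
#   else
#     echo "Timed out waiting for Jupyter Notebook server to open port ${port}!"
#   fi
```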
OR if you want to use lsof instead,
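The original snippet is not reproduced here, so the following is only a guess at what an lsof based override could look like. It assumes lsof is installed on the compute node and that the function receives "host:port" or a bare port; the host part is ignored because lsof only inspects the local machine.

```shell
#!/usr/bin/env bash
# Sketch of a port_used variant that asks lsof whether any process visible
# to the current user is listening on the given local TCP port.
# Returns 0 if a listener is found, non-zero otherwise (including when
# lsof is not installed).
port_used() {
  local port="${1##*:}"
  lsof -iTCP:"${port}" -sTCP:LISTEN > /dev/null 2>&1
}

# Example (placeholder port):
#   port_used "localhost:31263" && echo "port is in use"
```

Without root, lsof generally reports only the calling user's own processes, which should be enough here because the Jupyter server runs as the same user as the job script.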
Where you want to add the override is in your before.sh.erb about in the same place as I’ve marked. No need to modify the ood_core library (the safer way would be to add an initializer in the dashboard in /etc/ood/config/apps/dashboard/initializers/template_override.rb and override it there).
I wouldn’t edit the dashboard gem file directly. If you add or edit the Jupyter plugin file template/before.sh to override the port_used that would be preferred…though it would seem you would need that fix for every interactive app that is not VNC.
Though you said nc is installed on the server… I think the issue is that this function is returning a non-zero value when given the port of the Jupyter notebook server:
This function accepts an argument of the form “host:port”, e.g. “localhost:1234”. So either the command is not correct for your version of nc, or the host and port values are not set correctly when passed into this function.
This is necessary so when Jupyter Notebook server creates hyperlinks in its response HTML, those links are not broken links, but when followed, the requests make it back to the Jupyter Notebook server.
I think I finally solved this problem. It seems you need to disable SELinux on the compute nodes as well. I had SELinux disabled on the OnDemand node, but the cluster had SELinux enabled. Once I disabled SELinux on the node, it worked. Thanks for all your help.
@neranjan other sites have SELinux enabled on the compute nodes and have no problems. You shouldn’t need to disable SELinux on the compute nodes. Perhaps the fact that doing this fixes the problem is a clue to what the problem was. @tdockendorf any ideas?
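One way to follow up on that clue, sketched under the assumption that the standard SELinux and audit command-line tools (getenforce, ausearch, setenforce) are present on the compute node; selinux_mode is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the current SELinux mode (Enforcing, Permissive or Disabled),
# or "unknown" if the getenforce tool is not installed on this machine.
selinux_mode() {
  if command -v getenforce > /dev/null 2>&1; then
    getenforce
  else
    echo "unknown"
  fi
}

selinux_mode

# To look for denials logged around the time of a failed launch
# (needs root and the audit tools):
#   ausearch -m avc -ts recent
# To test with SELinux permissive rather than disabled
# (needs root, lasts until the next reboot):
#   setenforce 0
```

If denials show up for the port check or the Jupyter process, they may indicate which rule was blocking the local connection.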