Custom home directory broken after upgrade to 1.6.x

We use the custom home directory setup (which you added for us!), and it worked until we upgraded to 1.6.7-1. I just upgraded to 1.6.17-2 in our development environment to see if it was fixed, but it isn’t. Users without a home directory get this error:
chdir(2) failed: no such file or directory
Your connection to the remote server has been terminated.

I thought it might be an issue with DEFAULT_SSHHOST, but I tried pointing the app directly at one of our login servers and it’s still failing. Home directory creation does work for our users if they log in to our server via SSH, so I’m pretty sure it’s an OOD issue.

Let me know what other info you need from us. Thanks!
Dori
UB CCR

Is this problem isolated to the Shell app? I recall the fix we added was meant to address an error that users would see the first time they accessed OnDemand.

We haven’t tested any other apps. Without a home directory, the shell is as far as users can get, so they launch a shell and it creates the home directory. Then they restart the OOD webserver process and the OOD dashboard comes up.

Ah, so previously they would launch the shell and the home directory would get created. Now they launch the shell and get an error instead. Here are all the changes to the shell app between 1.5.5 and 1.6.17: https://github.com/OSC/ood-shell/compare/v1.4.2...v1.4.6

I don’t see any changes there specific to the type of problem you could be seeing.

We do have this line:

If the URL used to open the shell app specifies a path after the host, i.e. instead of https://ondemand.osc.edu/pun/sys/shell/ssh/owens.osc.edu it is something like https://ondemand.osc.edu/pun/sys/shell/ssh/ondemand.osc.edu/users/PZS0562/efranz/ondemand/data/, then the command cd /users/PZS0562/efranz/ondemand/data/ will be attempted when the ssh connection is initiated.

Perhaps that is what is happening here? The nginx access logs of the users experiencing this problem should show what URL they tried to access.
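
For illustration, the URL handling described above might look roughly like the following. This is a minimal sketch, not the shell app's actual code: the helper names parseShellUrl and buildSshArgs, the baseUri argument, and the trailing exec ${SHELL} -l (assumed here so the user ends up in a login shell after the cd) are all assumptions.

```javascript
// Hypothetical helpers, written only to illustrate the behavior described above.

// Split a request path such as "<baseUri>/ssh/<host>/<optional/path>"
// into the ssh host and the optional directory to change into.
function parseShellUrl(url, baseUri, defaultHost) {
  const prefix = baseUri + "/ssh/";
  if (!url.startsWith(prefix)) {
    return { host: defaultHost, dir: null };
  }
  const rest = url.slice(prefix.length);
  const slash = rest.indexOf("/");
  const host = slash === -1 ? rest : rest.slice(0, slash);
  const dir = slash === -1 ? null : decodeURIComponent(rest.slice(slash));
  return { host: host || defaultHost, dir: dir };
}

// Build the ssh argument list. When a directory was given in the URL,
// a "cd <dir>" is run on the remote side before the login shell starts.
function buildSshArgs(host, dir) {
  if (!dir) {
    return [host];
  }
  // Single-quote the path so spaces and shell metacharacters are not interpreted.
  const quoted = "'" + dir.replace(/'/g, "'\\''") + "'";
  return [host, "-t", "cd " + quoted + " ; exec ${SHELL} -l"];
}

const target = parseShellUrl(
  "/pun/sys/shell/ssh/ondemand.osc.edu/users/PZS0562/efranz/ondemand/data/",
  "/pun/sys/shell",
  "localhost"
);
console.log(buildSshArgs(target.host, target.dir));
```

With a URL that stops at the host, dir is null and no cd is attempted, so a URL of that form would not by itself explain a chdir error.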

I’m not seeing anything in the logs except:

App 18475 output: Connection established
App 18475 output: Opened terminal: 18529
App 18475 output: Closed terminal: 18529
[ N 2019-10-03 09:36:10.3311 18453/T4 age/Cor/CoreMain.cpp:1117 ]: Checking whether to disconnect long-running connections for process 18475, application /var/www/ood/apps/sys/shell (production)

[ N 2019-10-03 10:00:03.1713 18453/T7 age/Cor/CoreMain.cpp:641 ]: Signal received. Gracefully shutting down… (send signal 2 more time(s) to force shutdown)
[ N 2019-10-03 10:00:03.1713 18453/T1 age/Cor/CoreMain.cpp:1216 ]: Received command to shutdown gracefully. Waiting until all clients have disconnected…
[ N 2019-10-03 10:00:03.1714 18453/Ta Ser/Server.h:902 ]: [ServerThr.2] Freed 0 spare client objects
[ N 2019-10-03 10:00:03.1714 18453/Ta Ser/Server.h:558 ]: [ServerThr.2] Shutdown finished
[ N 2019-10-03 10:00:03.1714 18453/T7 Ser/Server.h:902 ]: [ServerThr.1] Freed 0 spare client objects
[ N 2019-10-03 10:00:03.1715 18453/T7 Ser/Server.h:558 ]: [ServerThr.1] Shutdown finished
[ N 2019-10-03 10:00:03.1718 18453/Tc Ser/Server.h:902 ]: [ApiServer] Freed 0 spare client objects
[ N 2019-10-03 10:00:03.1718 18453/Tc Ser/Server.h:558 ]: [ApiServer] Shutdown finished
[ N 2019-10-03 10:00:03.2040 18453/T1 age/Cor/CoreMain.cpp:1295 ]: Passenger core shutdown finished

This is what we have in our missing_home_directory.html file:
Click here to open a shell to create your home directory

/etc/ood/config/apps/shell/env contains:
DEFAULT_SSHHOST="vortex"

NOTE: I tried the FQDN here as well.

We’re not purposely using ‘cd’ to change into any directory, but this is what happens when a new user logs in: the home directory is created and then the user is dropped into it. Nothing has changed on that end, though.

Thanks for your help!
Dori

I think I know what the problem is, though I will need time to test it out. I think it broke when we updated from https://github.com/chjj/pty.js to https://github.com/microsoft/node-pty (the Microsoft version is a fork with updated code). This commit may have introduced the problem. The error message there is “chdir(2) failed.” rather than “chdir(2) failed: no such file or directory”, though, so that is odd.

One solution might be to modify the shell app so that it omits the cwd argument to pty.spawn if the home directory is found to not exist.

I haven’t tested this yet however.
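
A minimal sketch of that idea, assuming the app currently passes the user's home directory as cwd. buildSpawnOptions and its dirExists parameter are hypothetical names. The name, cols, rows and cwd keys are node-pty spawn options, but the values shown are only illustrative.

```javascript
const fs = require("fs");

// Hypothetical helper: build the options object for pty.spawn, setting cwd
// only when the home directory actually exists. dirExists is injectable so
// the logic can be exercised without touching the real filesystem.
function buildSpawnOptions(homeDir, dirExists = fs.existsSync) {
  const options = { name: "xterm-256color", cols: 80, rows: 30 };
  if (homeDir && dirExists(homeDir)) {
    options.cwd = homeDir;
  }
  // When the directory is missing, cwd is left out entirely so that
  // no chdir into a nonexistent path is attempted on spawn.
  return options;
}

// Intended use (requires the node-pty package; host is assumed to be set):
//   const pty = require("node-pty");
//   const term = pty.spawn("ssh", [host], buildSpawnOptions(process.env.HOME));
```

Keeping the check in a small function like this would make it easy to verify both the "home exists" and "home missing" cases separately from the terminal itself.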

Edited to include in the “solution”: release 1.6.19 has a fix for this.

@dsajdak there is a new patch release 1.6.19 in the /latest/ repo i.e. https://yum.osc.edu/ondemand/latest/web/el7/x86_64/ that has the fix I think you need. Would you be willing to test this on your test server?

To test this, you would execute these commands:

yum install https://yum.osc.edu/ondemand/latest/ondemand-release-web-latest-1-6.noarch.rpm
yum clean all
yum update ondemand 

Then, when you want to go back to the 1.6 release repo (instead of latest), run:

yum remove ondemand-release-web-latest
yum install https://yum.osc.edu/ondemand/1.6/ondemand-release-web-1.6-1.noarch.rpm
yum clean all
yum update ondemand

If this solves the problem we will make this patch release available for everybody.

Hi @efranz - yes we have a development environment so I can test this this afternoon. Thanks for working on it so quickly!
Dori

This update fixed the problem. Thanks very much!!