Directory Copy and Paste issue with Lustre

Hello,
We are facing an issue with directory copy and paste via File Explorer. It works fine with a normal NFS share or local folders, but when I copy and paste a folder within the Lustre share, or from Lustre to another location, the folder’s contents are missing.

I have tested copying from other locations to the Lustre share and it worked fine. Any advice would be appreciated.

Thank you

Unfortunately I’m unable to reproduce the issue, but the code that does this is an old version of https://github.com/coderaiser/node-copymitter: https://github.com/coderaiser/node-copymitter/tree/v1.8.10

Up until 1.4 we couldn’t update this dependency. I’m investigating which version we can update to while verifying that the app still works as before. If it does, I’ll give you a test you can try to see whether it fixes the problem.

So you could try this. If using OnDemand 1.4, you can cd /var/www/ood/apps/sys/files/ then edit lib/cloudcmd/package.json and change "copymitter": "1.8.10", to "copymitter": "3.2.3",. Then in the root directory of the app run bin/setup and it will rebuild the app with the updated dependency.
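The edit described above could be scripted roughly as follows. This is only a sketch, assuming the OnDemand 1.4 paths and version strings mentioned above; the helper name `bump_copymitter` is hypothetical, and it keeps a `.bak` copy of the original file.

```shell
# Hypothetical helper: change the copymitter pin from 1.8.10 to 3.2.3
# in the given package.json, leaving a backup at <file>.bak.
bump_copymitter() {
  pkg="$1"   # path to package.json
  sed -i.bak 's/"copymitter": "1\.8\.10"/"copymitter": "3.2.3"/' "$pkg"
}

# On an OnDemand 1.4 host (paths as given above):
#   cd /var/www/ood/apps/sys/files
#   bump_copymitter lib/cloudcmd/package.json
#   bin/setup
```

If the rebuild misbehaves, copying `package.json.bak` back over `package.json` and re-running bin/setup should restore the original pin.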

I do not know whether this will address the problem you are facing. If it does, let me know and I’ll be sure to get this dependency update into the 1.5 release.

One thing I noticed while testing this: at OSC, on our web nodes, directories such as the project and scratch space ones auto-mount only after you first cd to them. This means that getting a listing of the project and scratch space directories before cd-ing to them doesn’t work. I doubt something like that is in play for you here, but I thought I’d mention it.

Thanks for taking care of this issue. We are using 1.4 and I ran bin/setup but I got the following error.

[root@ood-dev files]# cat /var/www/ood/apps/sys/files/lib/cloudcmd/package.json | grep copymitter
    "copymitter": "3.2.3",
[root@ood-dev files]# bin/setup
cd /var/www/ood/apps/sys/files

== Building Files App ==

== Installing dependencies ==
rm -rf node_modules
npm install --production --prefix tmp yarn
/usr/share/gems/gems/rake-0.9.6/lib/rake/file_utils.rb:53:in `block in create_shell_runner': Command failed with status (127): [npm install --production --prefix tmp yarn...] (RuntimeError)
	from /usr/share/gems/gems/rake-0.9.6/lib/rake/file_utils.rb:45:in `call'
	from /usr/share/gems/gems/rake-0.9.6/lib/rake/file_utils.rb:45:in `sh'
	from /usr/share/gems/gems/rake-0.9.6/lib/rake/file_utils_ext.rb:40:in `sh'
	from bin/setup:30:in `block in <main>'
	from /usr/share/ruby/fileutils.rb:125:in `chdir'
	from /usr/share/ruby/fileutils.rb:125:in `cd'
	from /usr/share/gems/gems/rake-0.9.6/lib/rake/file_utils_ext.rb:40:in `chdir'
	from bin/setup:10:in `<main>'
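
A note on the error above: a shell exit status of 127 generally means the command was not found, so it looks like npm was not on the PATH in this shell. A minimal sketch for checking that (the helper name `have_cmd` is hypothetical, and the list of tools is just a guess at what bin/setup needs):

```shell
# Return success if the named command can be found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report each tool the build is assumed to need.
for tool in npm node git ruby; do
  if have_cmd "$tool"; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: not on PATH"
  fi
done
```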

Hello efranz,

I was able to download the files from https://github.com/OSC/ood-fileexplorer, and then ran scl enable rh-git29 rh-ruby24 rh-nodejs6 -- bin/setup and sudo rsync -rlptv --delete . /var/www/ood/apps/sys/files to test.

However, it didn’t resolve the problem.

Hmm. I’ll look into setting up another test you can run, so that maybe we can patch the app to use a different copy mechanism than the current one. I won’t be able to work on this until the middle of this week, but I will keep you posted.

Could you let me know the options you are using for your Lustre mount? I was wondering whether it’s something to do with our Lustre options, since you were not able to reproduce the issue. Below are our Lustre mount options.

192.68.0.3@o2ib1:192.168.0.4@o2ib1:/snx11093 /sfs/lustre lustre rw,localflock,noauto 0 0
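
For comparison, the options actually in effect on a node can be read from /proc/mounts. A minimal sketch, assuming Linux; the function name `lustre_mount_opts` is hypothetical:

```shell
# Print "<mount point>: <options>" for every mounted Lustre filesystem.
# Reads /proc/mounts by default; another file in the same format can be passed.
lustre_mount_opts() {
  mounts="${1:-/proc/mounts}"
  awk '$3 == "lustre" { print $2 ": " $4 }' "$mounts"
}

lustre_mount_opts
```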

Thank you

We don’t use Lustre anymore. If you remove localflock, is there still a problem? Otherwise I’ll look into an alternative, now that you can build a version of the files app from https://github.com/OSC/ood-fileexplorer.

I tested it without localflock but it still didn’t work. I would appreciate it if you could provide an alternative.

Thank you

Here is another experiment you could do. Add a directory to scratch space that has a few files (a job template, let’s say). Then in the Job Composer, as long as your dataroot is set to the default (the home directory), you could choose “New From Specified Path” and paste the full path of the scratch space directory to copy into the “Source path” field.

Verify the copy succeeds.

If that experiment fails, then it is fair to say that simply replacing copymitter with another npm library to do the copy may not work.

Also, have you confirmed that you can ssh to the OnDemand web node and manually perform the copy you are trying, without previously listing the contents of the scratch space directory you are copying?
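
One way to check whether a manual copy is complete, as a minimal sketch: the function name `missing_in_copy` is hypothetical, the paths in the usage comment are placeholders, and file names containing newlines are not handled.

```shell
# Print every path (relative to the source) that exists under the source
# directory but is absent from the destination directory.
missing_in_copy() {
  src="$1"
  dst="$2"
  (cd "$src" && find .) | sort | while IFS= read -r rel; do
    [ "$rel" = "." ] && continue
    # -L also accepts dangling symlinks that were copied as symlinks.
    [ -e "$dst/$rel" ] || [ -L "$dst/$rel" ] || printf '%s\n' "${rel#./}"
  done
}

# Example on the web node, with placeholder paths:
#   cp -r /path/to/lustre/dir "$HOME/copytest"
#   missing_in_copy /path/to/lustre/dir "$HOME/copytest"
```

No output means every file and sub-directory in the source has a counterpart in the copy.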

I have tested both using ssh and the Job Composer. Both worked fine, and all sub-directories and contents were copied correctly. Only copying with File Explorer has the issue.

Ok, I created a new branch on the files app. Please git fetch then git checkout use_rsync_instead_of_copymitter and re-run scl enable rh-git29 rh-ruby24 rh-nodejs6 -- bin/setup and sudo rsync -rlptv --delete . /var/www/ood/apps/sys/files for testing. Let me know if this solves your problem.

Unfortunately, it didn’t work. Contents are still missing from the copied folder.

Seems like it is an issue related to mounting the files. On our web nodes we auto-mount project space directories, for example, so if you cd to the project space parent directory, just executing ls doesn’t list all of the sub directories. You would have to cd directly to a project space directory before it is mounted and the files appear. So in the Files app you can access a project space directory directly, but not by navigating through the Files app.
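
To see whether auto-mounting is involved, one rough check is to compare a listing taken after entering the directory with what the Files app shows. A minimal sketch; the function name `list_after_cd` and the example path are hypothetical:

```shell
# Enter the directory first (which may trigger an auto-mount), then list it,
# including dotfiles. Runs in a subshell so the caller's cwd is unchanged.
list_after_cd() {
  ( cd "$1" && ls -A )
}

# Example with a placeholder path:
#   list_after_cd /path/to/project/space/dir
```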

At this point I don’t think I will be able to help without the ability to do some more direct experiments. Please reach out to me directly at efranz@osc.edu. If you are willing to give me access to your OnDemand node in the form of an account on your systems, then I may be able to debug the problem.

I really appreciate your help. I will check whether we can give you access to our OOD, and I will contact you by email. Thanks!

@efranz Sorry about the delay, we are currently updating our Lustre scratch. I will test with the new scratch and contact you via email if the problem still exists on the new Lustre scratch.

I discussed this with @gp4r offline and we at CHPC also have this issue - it’s with copying contents of a Lustre directory to a different location. Copying plain files is OK.
Eric, you should still have an account at CHPC so feel free to try it out by going to ondemand.chpc.utah.edu and let us know if you need any help with it.
Thanks,
MC

Folks,

Note that the patch release of OnDemand put out today (Open OnDemand 1.6.17 patch release now available) contains a fix for the issues reported on Lustre.