Upload fails for files of 3G or larger

I’m trying to upload a file of 4G or larger and it fails.

I’ve updated nginx_file_upload_max: '12884901888' (12G).
I remapped pun_tmp_root to the same filesystem where the files are being uploaded:
pun_tmp_root: '/home/ondemand/%{user}'
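
As a quick sanity check on the byte value above, here is a minimal sketch that converts a size in GiB to a plain byte count. The helper name gib_to_bytes is hypothetical, and it assumes "12g" means 12 GiB (12 * 1024^3 bytes).

```python
def gib_to_bytes(gib: int) -> int:
    """Convert a size in GiB (1024**3 bytes each) to bytes."""
    return gib * 1024 ** 3


# 12 GiB, the value used for nginx_file_upload_max in the post
print(gib_to_bytes(12))  # 12884901888
# 10 GiB, the upload size the users need
print(gib_to_bytes(10))  # 10737418240
```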

Watching the upload, I see the tmp file grow to the size of the upload, and then a copy operation starts copying it to its intended target. Once the copied file reaches around 1.7G, the copy stops, the original temp file zeros out, and then it starts growing again.

Any thoughts or suggestions on how to correct this? We want to allow our users to upload 10G files.

Example of it looping:
total 3.8G
drwx------. 2 jwaters root 1 Dec 2 13:30 .
drwxr-xr-x. 7 root root 5 Dec 2 12:15 ..
-rw-------. 1 jwaters jwaters 3.8G Dec 2 13:32 .nfsb54965d4c24057f900000014
total 3914121
drwx------. 2 jwaters root 1 Dec 2 13:30 .
drwxr-xr-x. 7 root root 5 Dec 2 12:15 ..
-rw-------. 1 jwaters jwaters 4008058880 Dec 2 13:32 .nfsb54965d4c24057f900000014
total 3.8G
drwx------. 2 jwaters root 1 Dec 2 13:30 .
drwxr-xr-x. 7 root root 5 Dec 2 12:15 ..
-rw-------. 1 jwaters jwaters 3.8G Dec 2 13:32 .nfsb54965d4c24057f900000014
total 1
drwx------. 2 jwaters root 1 Dec 2 13:33 .
drwxr-xr-x. 7 root root 5 Dec 2 12:15 ..
-rw-------. 1 jwaters jwaters 0 Dec 2 13:33 .nfsfc2b247f427d472c00000015
total 1.0K
drwx------. 2 jwaters root 1 Dec 2 13:33 .
drwxr-xr-x. 7 root root 5 Dec 2 12:15 ..
-rw-------. 1 jwaters jwaters 0 Dec 2 13:33 .nfsfc2b247f427d472c00000015
total 1
drwx------. 2 jwaters root 1 Dec 2 13:33 .
drwxr-xr-x. 7 root root 5 Dec 2 12:15 ..
-rw-------. 1 jwaters jwaters 0 Dec 2 13:33 .nfsfc2b247f427d472c00000015
total 1.0K
drwx------. 2 jwaters root 1 Dec 2 13:33 .
drwxr-xr-x. 7 root root 5 Dec 2 12:15 ..
-rw-------. 1 jwaters jwaters 0 Dec 2 13:33 .nfsfc2b247f427d472c00000015
total 146497
drwx------. 2 jwaters root 1 Dec 2 13:33 .
drwxr-xr-x. 7 root root 5 Dec 2 12:15 ..
-rw-------. 1 jwaters jwaters 150011904 Dec 2 13:33 .nfsfc2b247f427d472c00000015
total 144M
drwx------. 2 jwaters root 1 Dec 2 13:33 .
drwxr-xr-x. 7 root root 5 Dec 2 12:15 ..
-rw-------. 1 jwaters jwaters 144M Dec 2 13:33 .nfsfc2b247f427d472c00000015
total 393217
drwx------. 2 jwaters root 1 Dec 2 13:33 .
drwxr-xr-x. 7 root root 5 Dec 2 12:15 ..
-rw-------. 1 jwaters jwaters 381878272 Dec 2 13:33 .nfsfc2b247f427d472c00000015
total 385M
drwx------. 2 jwaters root 1 Dec 2 13:33 .
drwxr-xr-x. 7 root root 5 Dec 2 12:15 ..
-rw-------. 1 jwaters jwaters 365M Dec 2 13:33 .nfsfc2b247f427d472c00000015
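
The loop in the listing above shows up as a drop in the temp file size between two samples. A minimal sketch of spotting that automatically, using the byte sizes from the listing; the helper name find_resets is hypothetical and the sampling itself (for example, polling the directory) is left out.

```python
from typing import List


def find_resets(sizes: List[int]) -> List[int]:
    """Return the indexes at which a sampled size is smaller than the previous sample."""
    return [i for i in range(1, len(sizes)) if sizes[i] < sizes[i - 1]]


# Byte sizes taken from the listing above, in the order they were observed
observed = [4008058880, 0, 150011904, 381878272]
print(find_resets(observed))  # [1]
```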

Thanks,
Jesse

Hi Jesse.

Thanks for posting. We will look into this tomorrow and let you know what we find.

Thanks,
-gerald

Hi Jesse.

Can you please check the following settings to see if they are set in accordance with your needs? The settings below are the defaults, so you will need to adjust them to fit your needs. If you can check these nginx settings, I’ll take a look at this issue in the morning. It sounds like a connection timeout or a client_max_body_size setting that is too small.

proxy_connect_timeout 600;
proxy_send_timeout 600;
proxy_read_timeout 600;
send_timeout 600;
client_header_timeout 3m;
client_body_timeout 3m;
send_timeout 3m;
client_max_body_size 5M;

These settings were not in the default template:

/opt/ood/nginx_stage/templates/pun.conf.erb

client_max_body_size <%= nginx_file_upload_max %>;
proxy_connect_timeout 600;
proxy_send_timeout 600;
proxy_read_timeout 600;
#send_timeout 600; # nginx reported this as a duplicate; not sure where else it is set
client_header_timeout 3m;
client_body_timeout 3m;
#client_max_body_size 5M; #defined above

I put them in to force the settings, with no change. I’ll start testing the client_body/client_header settings and see.

Thanks for response,

Jesse

Hi Jesse.

Just to make sure I understand what you did:
it looks like you used the values that I sent you.

You will want to change those values to reflect what you need. The timeout values are in seconds, so you may want to try 3600 (1 hour) for the timeouts and work backwards from there.
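
To compare values such as 600, 3m and 3600 on the same scale, a small sketch like the following may help. It assumes a value is a plain number (seconds) or a number with a single s/m/h/d suffix; the function name nginx_time_to_seconds is hypothetical and it does not cover every form nginx accepts.

```python
import re

# Seconds per supported unit suffix; a bare number is treated as seconds
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def nginx_time_to_seconds(value: str) -> int:
    """Convert a simple time value such as '600' or '3m' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd]?)", value.strip())
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_SECONDS[unit or "s"]


for setting in ("600", "3m", "3600"):
    print(setting, "->", nginx_time_to_seconds(setting), "seconds")
```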

Hope this helps.
-gerald

I set them to their defaults and then bumped them all the way up to 7200 (2 hours) to see if that would make a difference, with no joy.

We are running v2.0.18
on CentOS 7, with SELinux set to permissive.

A 3G file uploads and I can see the file handle growing:
nginx 29708 jwaters 10u REG 0,39 2753312798 14609299515652899220 /home/ondemand/jwaters/client_body/0000000001 (deleted)

Once the file upload hits 100%, the UI says the upload failed.
The move still runs anyway:
cd /home/corvid/jwaters/tmp/upload_test
ls -la *tgz; ls -lah *tgz
-rw-------. 1 jwaters jwaters 982843392 Dec 3 10:04 parallel_studio_xe_2020_cluster_edition.tgz
-rw-------. 1 jwaters jwaters 993M Dec 3 10:04 parallel_studio_xe_2020_cluster_edition.tgz

The move completes; I ran sha1sum on both files and they match.
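
For anyone who wants to script the same check, here is a minimal Python sketch of a checksum comparison. The function names sha1_of and files_match are hypothetical; the file is read in chunks so that a multi-gigabyte upload does not have to fit in memory.

```python
import hashlib


def sha1_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-1 digest of a file, reading it chunk by chunk."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(source: str, uploaded: str) -> bool:
    """True when both files have the same SHA-1 digest."""
    return sha1_of(source) == sha1_of(uploaded)
```

Calling files_match with the path of the original file and the path of the uploaded copy gives the same answer as comparing the two sha1sum outputs by eye.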

This seems more like an issue with the web UI and how it handles a trapped error.

Sounds similar to File Upload Issue

This is the exact same issue. I’m applying the patch now and will let you know.

The UI is still failing on 3G+ files uploaded to a distributed filesystem (BeeGFS kernel mount or NFS mount).
Are there any known issues when using a distributed or NFS mount for storage?

Fri Dec 3 10:57:00 EST 2021
nginx 7759 jwaters 10u REG 253,1 2968947421 33683906 /tmp/ondemand/jwaters/client_body/0000000004 (deleted)
ruby 7764 jwaters 12u REG 253,1 2474639360 4735057 /tmp/PassengerTeeInput-12mlcae (deleted)

As it copies:
-rw-------. 1 jwaters jwaters 459714560 Dec 3 11:38 parallel_studio_xe_2020_cluster_edition.tgz

The copy completes and the sha1sum is correct, while the web UI reports that the upload failed.

If I point it all at local storage (a local pun_tmp_root, and uploading to /tmp), it works.
Changed pun_tmp_root: '/tmp/ondemand'

Long story short, I am now able to upload a 10G file.

Rocky Linux 8
v2.0.20

Modifications:
/etc/httpd/conf.d/ood-portal.conf
within <VirtualHost …>
Timeout 900
ProxyTimeout 900

/var/www/ood/apps/sys/dashboard/app/views/files/index.html.erb
added another zero
timeout: 128 * 10000,

/etc/ood/config/nginx_stage.yml
#added
passenger_pool_idle_time: '18000'
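
To put the values above on a common scale, here is a rough sketch of the arithmetic. Apache's Timeout and ProxyTimeout and Passenger's pool idle time are given in seconds; treating the timeout in index.html.erb as milliseconds is an assumption, not something confirmed in this thread.

```python
apache_timeout_s = 900            # Timeout / ProxyTimeout in ood-portal.conf
js_timeout_ms = 128 * 10000       # value from index.html.erb, ASSUMED to be milliseconds
passenger_pool_idle_s = 18000     # passenger_pool_idle_time in nginx_stage.yml

print("Apache timeouts:", apache_timeout_s / 60, "minutes")               # 15.0
print("Upload timeout:", round(js_timeout_ms / 1000 / 60, 1), "minutes")  # 21.3
print("Passenger idle time:", passenger_pool_idle_s / 3600, "hours")      # 5.0
```

If the millisecond assumption holds, the browser-side timeout went from about 2 minutes to about 21 minutes with the extra zero.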
