I’m using Docker to launch NVIDIA containers that run Jupyter. Unfortunately, the containers are not stopped when the job completes, so the resources they hold are never released. I’m thinking I could handle this by generating a random string in before.sh.erb, saving it as an environment variable, using it as part of the docker run command in script.sh.erb, and then cleaning up in after.sh.erb. Does this sound about right?
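A minimal bash sketch of that plan might look like the following. It assumes the three hooks share one shell environment, so a variable exported in before.sh.erb is still visible in script.sh.erb and after.sh.erb. The names OOD_CONTAINER_NAME, start_container and stop_container are hypothetical. Whether after.sh.erb actually runs after the container has exited is also an assumption worth checking against the Open OnDemand batch connect documentation.

```shell
#!/usr/bin/env bash

# before.sh.erb (sketch): build a random 16-hex-character suffix from
# /dev/urandom and export it so later hooks can find the container.
# OOD_CONTAINER_NAME is a hypothetical variable name.
export OOD_CONTAINER_NAME="ood-jupyter-$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')"

# script.sh.erb (sketch): start the container under the known name.
# All remaining arguments (run options, image, command) are passed through.
# --rm tells Docker to delete the container once it stops.
start_container() {
  docker run --rm --name "$OOD_CONTAINER_NAME" "$@"
}

# after.sh.erb (sketch): stop the container by name, then force-remove it
# in case it still exists. Errors such as "no such container" are
# suppressed so cleanup never fails the job.
stop_container() {
  docker stop "$OOD_CONTAINER_NAME" >/dev/null 2>&1 || true
  docker rm -f "$OOD_CONTAINER_NAME" >/dev/null 2>&1 || true
}
```

In script.sh.erb this would be called as `start_container <docker run options> <image> <command>`, and in the cleanup hook simply as `stop_container`. Because of `--rm`, Docker normally deletes the container as soon as it stops, so the `docker rm -f` is only a fallback.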
A related question: am I reinventing the wheel here? I’d expect this to be a common need, especially since DeepOps uses OOD, which makes me wonder whether I’m overlooking an existing solution.
Thanks, Clark