Terminate function waiting for port to be open

Hi!

Since I’m using a containerised Jupyter Notebook app, I increased the timeout of the
wait_until_port_used function.
The problem is that if the job fails for some other reason, it keeps running until the wait_until_port_used timeout expires. Is there a clean way to exit that function once “script.sh” has finished?
I could run “kill $PPID”, but that seems too aggressive (and it would not run the clean.sh function).
I’m using the LSF scheduler; I’m not sure whether this also happens with other schedulers.
Has anyone faced this issue?

Thanks!

Instead of overriding wait_until_port_used, why not just rewrite your after.sh.erb to check for both conditions: the port being open AND the process being alive?

I mean, just change the exit conditions to also check that Jupyter is still alive.

Hi Jeff. Thanks for the hint.
I still don’t see how to do it; I may have misunderstood how this works. after.sh.erb runs:
if wait_until_port_used…
and wait_until_port_used will not return to after.sh until either the port is in use or the timeout is reached, so while it is waiting I don’t think I can check any other condition.
How would you change the after.sh.erb file, and what should the condition look like?

So the best solution I’ve found so far is to override the function: in after.sh.erb I re-declare it, so that the “local” version is used.

# Wait up to $2 seconds (default: 30) until port $1 is in use.
# If a script PID is passed as $3, stop waiting as soon as that process exits.
wait_until_port_used () {
    local port="${1}"
    local time="${2:-30}"
    local script_pid="${3:-}"
    local port_status i
    for ((i=1; i<=time*2; i++)); do
        port_used "${port}"
        port_status=$?
        if [ "$port_status" == "0" ]; then
            return 0
        elif [ "$port_status" == "127" ]; then
            echo "commands to find port were either not found or inaccessible."
            echo "command options are lsof, nc, bash's /dev/tcp, or python (or python3) with socket lib."
            return 127
        fi
        # Give up early if the job script is no longer running
        if [[ -n "${script_pid}" ]] && ! ps -p "${script_pid}" >/dev/null 2>&1; then
            echo "Script process terminated!"
            return 1
        fi
        sleep 0.5
    done
    return 1
}

I pass the script PID as a third argument, and the function uses it for the check.
I’m open to any better way to do it.
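To see the early exit outside Open OnDemand, here is a minimal, self-contained sketch. It assumes a stub `port_used` that always reports the port as closed (the real helper comes from the OnDemand wrapper script), uses a one-second background `sleep` as a stand-in for a failing script.sh, and checks liveness with the shell builtin `kill -0` instead of `ps` (signal 0 sends nothing; it only tests that the PID still exists), so it runs even where procps is not installed.

```bash
#!/usr/bin/env bash
# Stub for the OnDemand helper (assumption for this demo): the port never opens.
port_used () {
    return 1
}

# Three-argument variant: stop waiting once the PID in $3 has exited.
wait_until_port_used () {
    local port="${1}"
    local time="${2:-30}"
    local script_pid="${3:-}"
    local port_status i
    for ((i=1; i<=time*2; i++)); do
        port_used "${port}"
        port_status=$?
        if [ "${port_status}" -eq 0 ]; then
            return 0
        elif [ "${port_status}" -eq 127 ]; then
            return 127
        fi
        # kill -0 sends no signal; it only checks that the PID still exists
        if [ -n "${script_pid}" ] && ! kill -0 "${script_pid}" 2>/dev/null; then
            echo "Script process terminated!"
            return 1
        fi
        sleep 0.5
    done
    return 1
}

# Stand-in for script.sh: a background job that "fails" after 1 second.
sleep 1 &
script_pid=$!

start=${SECONDS}
wait_until_port_used "localhost:8888" 600 "${script_pid}"
rc=$?
elapsed=$((SECONDS - start))
echo "returned ${rc} after ~${elapsed}s instead of waiting 600s"
```

The function returns 1 within a second or two of the background job exiting, rather than after the full 600-second timeout.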

Gotcha. So what does your after.sh.erb look like now? If I were to update our Jupyter after.sh.erb, here’s what it would look like.

This wraps wait_until_port_used in a retry while loop, so that wait_until_port_used is only called if the PID is actually running. That gives the script sleep time × retries seconds (3 × 10 = 30 seconds) to start up before entering wait_until_port_used.

retries=0
while [ "$retries" -lt 10 ]
do

  if ps -p "${script_pid}" >/dev/null 2>&1; then
    echo "Waiting for Jupyter server to open port ${port}..."
    if wait_until_port_used "${host}:${port}" 600; then
      echo "Discovered Jupyter server listening on port ${port}!"
      break
    else
      echo "Timed out waiting for Jupyter server to open port ${port}!"
      clean_up 1
    fi
  else
    echo "script pid ${script_pid} doesn't exist. retrying."
    retries=$((retries+1))
    sleep 3
  fi

done

if [ "$retries" -ge 10 ]; then
  echo "${script_pid} never came up after ${retries} retries"
  clean_up 1
fi

Another variation might be something like:

function ps_up_and_port_used(){
  local script_pid=$1
  local host=$2
  local port=$3

  # Succeed only if the script is running AND its port opens within 30 seconds
  if ps -p "${script_pid}" >/dev/null 2>&1; then
    if wait_until_port_used "${host}:${port}" 30; then
      return 0
    else
      return 1
    fi
  else
    return 1
  fi
}


retries=0
while [ "$retries" -lt 10 ]
do
  if ps_up_and_port_used "$script_pid" "$host" "$port"; then
    echo "Discovered Jupyter server listening on port ${port}!"
    break
  else
    echo "either pid $script_pid doesn't exist or $port not yet open. retrying."
    retries=$((retries+1))
    sleep 3
  fi
done

if [ "$retries" -ge 10 ]; then
  echo "${script_pid} never came up after ${retries} retries"
  clean_up 1
fi
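Both loops above follow the same bounded-retry pattern. As a sketch only (the `retry` helper and the `flaky` demo command are hypothetical, not part of Open OnDemand), the pattern can be factored into a small function that stops at the first success and reports failure after the last attempt:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run "$@" up to $1 times, sleeping $2 seconds between
# failed attempts. Returns 0 on the first success, 1 if every attempt fails.
retry () {
    local max_attempts="$1"
    local delay="$2"
    shift 2
    local attempt
    for ((attempt=1; attempt<=max_attempts; attempt++)); do
        if "$@"; then
            return 0
        fi
        echo "attempt ${attempt}/${max_attempts} failed: $*" >&2
        # No need to sleep after the final attempt
        if ((attempt < max_attempts)); then
            sleep "${delay}"
        fi
    done
    return 1
}

# Demo: a command that only succeeds on its third call.
calls=0
flaky () {
    calls=$((calls + 1))
    [ "${calls}" -ge 3 ]
}

if retry 10 0.1 flaky; then
    echo "succeeded after ${calls} calls"
fi
```

With such a helper, the second loop would reduce to something like `retry 10 3 ps_up_and_port_used "$script_pid" "$host" "$port" || clean_up 1`.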

But your addition is not unwarranted. I think the source code may actually need to check for both the process running and the port being open, though keeping your own branch/modifications up to date with releases may be a struggle.

Pull requests welcome! I think we should ship with this functionality, maybe not in the wait_until_port_used function, but in some new function that does both.

In my after.sh.erb I just added the function I posted before at the top (so that it gets used instead of the “default” one), and I call it with a third argument:
if wait_until_port_used "${host}:${port}" 600 ${SCRIPT_PID}; then
SCRIPT_PID is declared by the script itself (I mean the “default” wrapper script that calls the before, script, and after scripts).
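For reference, in bash `$!` holds the PID of the most recent background job. The sketch below assumes the wrapper launches the job script in the background and saves `$!` as SCRIPT_PID (an assumption about the wrapper, not a quote of OnDemand's template; `job_script` is a hypothetical stand-in), and shows how `kill -0` distinguishes a running script from one that has exited.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the job script: runs for 1 second, then exits.
job_script () {
    sleep 1
}

# Launch it in the background and remember its PID, as a wrapper might.
job_script &
SCRIPT_PID=$!

# kill -0 sends no signal; it only checks that the PID exists.
if kill -0 "${SCRIPT_PID}" 2>/dev/null; then
    echo "script ${SCRIPT_PID} is running"
    running_before=1
fi

# Let the job finish; the shell reaps it here.
wait "${SCRIPT_PID}"

if ! kill -0 "${SCRIPT_PID}" 2>/dev/null; then
    echo "script ${SCRIPT_PID} has exited"
    running_after=0
fi
```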
I tried something similar to what you suggested, but the problem is that as soon as you reach this part:
if wait_until_port_used "${host}:${port}" 600; then
you just wait for the function to return, so the after.sh script hangs on that call anyway.
That’s why I thought of changing the function to check not just “that the port is used” but also “that it still makes sense to keep checking”.
I’ve posted all my code in this thread, but I can make a PR; I just don’t know which file I should change. I could add the function I modified in after.sh under another name (wait_until_port_used_and_script_running :smiley: ) and make a PR. Either way, I feel this could be useful for all apps that wait for a port to open.
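Following the idea of a separate function, one option is to leave the upstream wait_until_port_used untouched and call it in one-second slices, checking the job script between slices. This is a sketch only: the function name is the one proposed above, the stub `wait_until_port_used` and the `PORT_OPEN` flag stand in for OnDemand's real helper (assumption), and `kill -0` is used as the liveness check.

```bash
#!/usr/bin/env bash
# --- Stub for illustration only (assumption): in a real after.sh the
# --- upstream wait_until_port_used is provided by the wrapper script.
PORT_OPEN=0
wait_until_port_used () {
    # Pretend to poll for $2 seconds; succeed only if PORT_OPEN is 1.
    if [[ "${PORT_OPEN}" -eq 1 ]]; then
        return 0
    fi
    sleep "${2:-30}"
    return 1
}

# Wait up to $2 seconds (default 30) for port $1, but give up as soon as
# the process with PID $3 has exited. Return codes follow wait_until_port_used:
# 0 = port in use, 1 = timed out or script gone, 127 = no port-checking tool.
wait_until_port_used_and_script_running () {
    local port="$1"
    local time="${2:-30}"
    local script_pid="$3"
    local elapsed status
    for ((elapsed=0; elapsed<time; elapsed++)); do
        if ! kill -0 "${script_pid}" 2>/dev/null; then
            echo "Script process ${script_pid} terminated!"
            return 1
        fi
        # One-second slice of the unmodified upstream helper
        wait_until_port_used "${port}" 1
        status=$?
        if [[ "${status}" -ne 1 ]]; then
            return "${status}"
        fi
    done
    return 1
}

# Demo 1: the script exits after 2 seconds and the port never opens.
sleep 2 &
start=${SECONDS}
wait_until_port_used_and_script_running "localhost:8888" 600 "$!"
rc_dead=$?
elapsed_dead=$((SECONDS - start))

# Demo 2: the script is alive and the port is already open.
PORT_OPEN=1
sleep 2 &
wait_until_port_used_and_script_running "localhost:8888" 600 "$!"
rc_open=$?

echo "dead script: rc=${rc_dead} after ~${elapsed_dead}s; open port: rc=${rc_open}"
```

The trade-off is granularity: the script check runs once per second instead of every half second, in exchange for not having to maintain a modified copy of the upstream function.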