Q-commands
Optional commands that run on the cluster user front-end to get information from batch nodes and perform some actions on them without an explicit login to the nodes.
The standard facilities of batch submission systems (such as the qstat utility) are too limited to debug and monitor jobs without logging in to the batch nodes. This document describes, step by step, how to provide the missing functionality without an explicit login to the batch nodes, including the required system tuning and extra scripts:
- login to batch nodes from the master node is to be provided for the cluster administrator or a special user (user sgeadmin below) by SSH with RSA key authentication:
bash> ssh-keygen -t rsa # generate RSA keys for SSH protocol version 2
<Enter> # enter empty passphrase (no passphrase)
bash> PUB_KEY=`cat $HOME/.ssh/id_rsa.pub`
bash> ssh sgeadmin@bn001 echo $PUB_KEY '>> $HOME/.ssh/authorized_keys2'
sgeadmin@bn001's password: # enter remote host password
bash> ssh sgeadmin@bn001
bn001:bash> # voila! login without password
The remote SSH server is to be configured to allow authentication with an RSA key. The same public key is to be copied to all batch nodes.
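Copying the key to every batch node can be scripted. The following is a minimal sketch, not part of the original setup: the function name print_key_install_commands is hypothetical, the host names are only the examples used in this document, and the function merely prints the commands (a dry run) so they can be reviewed before anything is executed.

```shell
#!/bin/bash
# Hypothetical helper: print one key-installation command per batch node.
# Nothing is executed remotely; review the output before running it.
print_key_install_commands() {
    local key_file="$1"
    shift
    local host
    for host in "$@"; do
        # Each printed command appends the public key to authorized_keys2 on the node.
        printf "ssh sgeadmin@%s 'cat >> \$HOME/.ssh/authorized_keys2' < %s\n" "$host" "$key_file"
    done
}
```

For example, print_key_install_commands $HOME/.ssh/id_rsa.pub bn001 bn002 prints two ssh commands, one per node; each will still ask for the remote password once.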
- a utility is to be run on the remote host through a "tunneling" script:
bash> cat > /home/sgeadmin/bin/cluster_utility_starter.sh
#!/bin/bash
#
# Author: Alexey Filin
function usage() {
echo "Usage: PROGRAM [OPTION]... [HOST]
Run a program from list:"
cmds=`for i in ${util_nm[*]}; do echo "$i,"; done | sort`
echo " "$cmds
echo "on HOST from list:
$batch_host_list
Exit program by SIGINT/^C if a program is looped.
-h print help and exit
-l List available hosts and exit"
}
# cluster work nodes
readonly batch_host_list="@okaf001 @okaf002 @okaf003"
# util list for remote execution
util_nm[0]="Qcat"
util_cm[0]="/bin/cat"
util_nm[1]="Qcp"
util_cm[1]="/bin/cp"
util_nm[2]="Qfree"
util_cm[2]="/usr/bin/free"
util_nm[3]="Qkill"
util_cm[3]="/bin/kill"
util_nm[4]="Qls"
util_cm[4]="/bin/ls"
util_nm[5]="Qmv"
util_cm[5]="/bin/mv"
util_nm[6]="Qps"
util_cm[6]="/bin/ps"
util_nm[7]="Qpstree"
util_cm[7]="/usr/bin/pstree"
util_nm[8]="Qrm"
util_cm[8]="/bin/rm"
util_nm[9]="Qstat"
util_cm[9]="/usr/bin/stat"
util_nm[10]="Qtop"
util_cm[10]="/usr/bin/top b"
util_nm[11]="Qvmstat"
util_cm[11]="/usr/bin/vmstat"
util_nm[12]="Qchmod"
util_cm[12]="/bin/chmod"
util_nm[13]="Qchattr"
util_cm[13]="/usr/bin/chattr"
util_nm[14]="Qchgrp"
util_cm[14]="/bin/chgrp"
util_nm[15]="Qgrep"
util_cm[15]="/bin/grep"
if test $# -le 1
then usage $0
exit 1
fi
i=0
cmd_name="$1"
shift
util_name=""
while :; do
if test $i -ge ${#util_nm[*]}; then
echo >&2 "$cmd_name: no such command"
usage
exit 1
fi
if test "${util_nm[$i]}" == "$cmd_name"; then
util_name="${util_cm[$i]}"
break
fi
i=$((i + 1))
done
host_name=""
options=""
while test $# -gt 0
do case "$1" in
-h) echo -e "Copyright (C) 2005 Alexey Filin\n"
usage
exit ;;
-l) echo -n "Available batch hosts: "
echo $batch_host_list
exit ;;
@*) if test -n "$host_name"; then
usage
exit
fi
for i in $batch_host_list; do
if test "$i" == "$1"; then
host_name="$1"
break
fi
done
if test -z "$host_name"; then
echo >&2 "$1: no such host name"
exit 1
fi
shift ;;
*) options="$options $1"
shift ;;
esac
done
# quote the variable: an unquoted empty value makes "test -n" always succeed
if test -n "$host_name"; then
#echo ssh -x "sgeadmin$host_name" "sudo -u $SUDO_USER $util_name $options"
ssh -x "sgeadmin$host_name" "sudo -u $SUDO_USER $util_name $options"
else
usage
exit
fi
exit
^D
bash> chmod +x /home/sgeadmin/bin/cluster_utility_starter.sh
- the script above is to be run by users through a wrapper script:
bash> cat > /home/sgeadmin/bin/sudo_starter.sh
#!/bin/bash
# the name this script was called by (Qls, Qps, ...) selects the utility
sudo -u sgeadmin /home/sgeadmin/bin/cluster_utility_starter.sh "`basename "$0"`" "$@"
^D
bash> chmod +x /home/sgeadmin/bin/sudo_starter.sh
- the wrapper script runs the starter script as user sgeadmin with sudo, so sudo is to be configured on the user front-end host:
bash# visudo
...
# SGE users are allowed to run some utilities on batch nodes
%sgeusers oka04 = (sgeadmin) NOPASSWD: /home/sgeadmin/bin/cluster_utility_starter.sh
where sgeusers is a special group whose members are allowed to run the utilities listed below remotely, and oka04 is the user front-end host.
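The wrapper relies on one trick: every Q-command is a symbolic link to the same script, and the name the script was called by selects the utility. A minimal sketch of that name-to-utility mapping follows, written with a case statement instead of the parallel util_nm/util_cm arrays of the starter script. The function name lookup_util is hypothetical, and only a few of the commands are listed.

```shell
#!/bin/bash
# Hypothetical alternative to the util_nm/util_cm arrays:
# map a Q-command name to the utility it runs on the batch node.
lookup_util() {
    case "$1" in
        Qcat) echo "/bin/cat" ;;
        Qls)  echo "/bin/ls" ;;
        Qps)  echo "/bin/ps" ;;
        Qtop) echo "/usr/bin/top b" ;;
        *)    echo "$1: no such command" >&2
              return 1 ;;
    esac
}
```

In the wrapper the argument would be the output of basename $0, which is Qls when the script is started through the Qls link.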
- the utility on the batch node is to be run with sudo to change the effective uid to that of the user who initiated the run on the user front-end:
bash# visudo
...
# Cmnd alias specification
Cmnd_Alias CLUSTERCOMMANDS = /usr/bin/top b, /bin/ps, /usr/bin/free,\
/usr/bin/vmstat, /usr/bin/stat, /bin/ls,\
/bin/cat, /usr/bin/pstree, /bin/kill,\
/bin/cp, /bin/mv, /bin/rm, /bin/chmod,\
/usr/bin/chattr, /bin/chgrp, /bin/grep
...
# sgeadmin can run any cluster command as user
sgeadmin ALL = (%sgeusers) NOPASSWD: CLUSTERCOMMANDS
The modified /etc/sudoers is to be copied to the other batch nodes.
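The starter script assembles the remote "sudo -u USER UTILITY OPTIONS" command by plain string concatenation, so an argument that contains spaces is split into several words by the remote shell. The sketch below shows one possible stricter way to build that string. It is an assumption, not part of the original scripts: the function names quote_arg and build_remote_command are hypothetical, and the utility itself is left unquoted on purpose because entries such as "/usr/bin/top b" carry a fixed argument.

```shell
#!/bin/bash
# Hypothetical helper: wrap one argument in single quotes,
# escaping any single quotes it contains.
quote_arg() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Hypothetical helper: build the command line that is sent to the batch node.
# Usage: build_remote_command USER UTILITY [ARG]...
build_remote_command() {
    local user="$1" util="$2"
    shift 2
    local cmd="sudo -u $(quote_arg "$user") $util"
    local arg
    for arg in "$@"; do
        # every user-supplied argument is quoted separately
        cmd="$cmd $(quote_arg "$arg")"
    done
    printf '%s\n' "$cmd"
}
```

With these helpers, a file name such as "my file" reaches the remote utility as a single argument, provided the wrapper also forwards its arguments unchanged.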
- symbolic links are to be created to run the wrapper script under convenient names:
bash> cat > /home/sgeadmin/bin/installQcommands.sh
#!/bin/sh
#
# Author: Alexey Filin
# path to sudo starter
starter="/home/sgeadmin/bin/sudo_starter.sh"
if [ "$SGE_ROOT" = "" -o ! -d "$SGE_ROOT" ]; then
echo >&2 "$SGE_ROOT: wrong SGE_ROOT"
exit 1
fi
cd "$SGE_ROOT/bin/"`$SGE_ROOT/util/arch` || exit $?
if [ ! -x "$starter" ]; then
echo "$starter: wrong sudo starter"
exit 1
fi
ln -s "$starter" Qcat && \
ln -s "$starter" Qcp && \
ln -s "$starter" Qfree && \
ln -s "$starter" Qkill && \
ln -s "$starter" Qls && \
ln -s "$starter" Qmv && \
ln -s "$starter" Qps && \
ln -s "$starter" Qpstree && \
ln -s "$starter" Qrm && \
ln -s "$starter" Qstat && \
ln -s "$starter" Qtop && \
ln -s "$starter" Qvmstat && \
ln -s "$starter" Qchmod && \
ln -s "$starter" Qchattr && \
ln -s "$starter" Qchgrp && \
ln -s "$starter" Qgrep || exit $?
^D
bash> chmod +x /home/sgeadmin/bin/installQcommands.sh
bash> /home/sgeadmin/bin/installQcommands.sh
where the directory in which the links are created is to be included in the PATH environment variable of cluster users, so that Q-commands run like ordinary utilities; this requirement is fulfilled by the SGE environment setup script
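After the install script has run, the links can be verified. The following is a minimal sketch under stated assumptions: the function name check_q_links is hypothetical, and it only compares the target of each link with the expected starter path using readlink.

```shell
#!/bin/bash
# Hypothetical check: report Q-command links in a directory that are missing
# or do not point at the expected starter script.
# Usage: check_q_links DIR STARTER NAME...
check_q_links() {
    local dir="$1" starter="$2"
    shift 2
    local name status=0
    for name in "$@"; do
        # readlink prints the link target; it prints nothing for a missing link
        if [ "$(readlink "$dir/$name" 2>/dev/null)" != "$starter" ]; then
            echo "$name: missing or wrong link"
            status=1
        fi
    done
    return $status
}
```

A call could look like check_q_links "$SGE_ROOT/bin/`$SGE_ROOT/util/arch`" /home/sgeadmin/bin/sudo_starter.sh Qcat Qls Qps; it prints nothing and returns 0 when every listed link is in place.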
- if all the steps above have been done correctly, any Q-command can be run by any cluster user; log in as a test cluster user and check it
Have fun!