
Q-commands

by portal administrator last modified July 5, 2006 at 16:51

Optional commands, run on the cluster user front-end, to get information from batch nodes and perform some actions on them without an explicit login to the nodes.


The standard facilities of batch submission systems (such as the qstat utility) are too limited to debug and monitor jobs without logging in to the batch nodes. This document is a step-by-step guide to providing the missing functionality without an explicit login to batch nodes, including the required system tuning and extra scripts:

  1. Login to batch nodes from the master node is to be provided for the cluster administrator or a special user (user sgeadmin below) by SSH with RSA key authentication:
    bash> ssh-keygen -t rsa # generate RSA keys for SSH protocol version 2
    <Enter># enter empty passphrase (no passphrase)
    bash> PUB_KEY=`cat $HOME/.ssh/id_rsa.pub`
    bash> ssh sgeadmin@bn001 echo $PUB_KEY '>> $HOME/.ssh/authorized_keys2'
    sgeadmin@bn001's password: # enter remote host password
    bash> ssh sgeadmin@bn001
    bn001:bash> # voila! login without password

    The remote SSH server must be configured to allow authentication with an RSA key. The same public key must be copied to all batch nodes.
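
Copying the key to every batch node can be scripted. The following is a minimal sketch, not part of the original setup: it assumes the node names okaf001 to okaf003 from the script in step 2 and the authorized_keys2 file used above. The function name push_key and the variables NODES and DRY_RUN are hypothetical. With DRY_RUN=1 (the default here) the commands are only printed.

```bash
# Hypothetical helper: append one public key to authorized_keys2 on each node.
# NODES and DRY_RUN are assumptions made for this sketch.
NODES="okaf001 okaf002 okaf003"
DRY_RUN="${DRY_RUN:-1}"

push_key() {
    # $1 = path to the public key file
    key_file="$1"
    for node in $NODES; do
        if [ "$DRY_RUN" = "1" ]; then
            # Only show what would be executed.
            echo "ssh sgeadmin@$node 'cat >> \$HOME/.ssh/authorized_keys2' < $key_file"
        else
            # The key is sent on stdin, so the remote shell never has to
            # parse it; the password is asked once per node.
            ssh "sgeadmin@$node" 'cat >> $HOME/.ssh/authorized_keys2' < "$key_file"
        fi
    done
}

push_key "$HOME/.ssh/id_rsa.pub"
```

Run it once with the default to review the commands, then again with DRY_RUN=0 to perform the copy.
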

  2. A utility is run on the remote host through a "tunneling" script:
    bash> cat > /home/sgeadmin/bin/cluster_utility_starter.sh
    #!/bin/bash
    #
    # Author: Alexey Filin

    function usage() {
    echo "Usage: PROGRAM [OPTION]... [HOST]
    Run a program from list:"
    cmds=`for i in ${util_nm[*]}; do echo "$i,"; done | sort`
    echo " "$cmds
    echo "on HOST from list:
    $batch_host_list
    Exit program by SIGINT/^C if a program is looped.

    -h print help and exit
    -l List available hosts and exit"
    }

    # cluster work nodes
    readonly batch_host_list="@okaf001 @okaf002 @okaf003"

    # util list for remote execution
    util_nm[0]="Qcat"
    util_cm[0]="/bin/cat"

    util_nm[1]="Qcp"
    util_cm[1]="/bin/cp"

    util_nm[2]="Qfree"
    util_cm[2]="/usr/bin/free"

    util_nm[3]="Qkill"
    util_cm[3]="/bin/kill"

    util_nm[4]="Qls"
    util_cm[4]="/bin/ls"

    util_nm[5]="Qmv"
    util_cm[5]="/bin/mv"

    util_nm[6]="Qps"
    util_cm[6]="/bin/ps"

    util_nm[7]="Qpstree"
    util_cm[7]="/usr/bin/pstree"

    util_nm[8]="Qrm"
    util_cm[8]="/bin/rm"

    util_nm[9]="Qstat"
    util_cm[9]="/usr/bin/stat"

    util_nm[10]="Qtop"
    util_cm[10]="/usr/bin/top b"

    util_nm[11]="Qvmstat"
    util_cm[11]="/usr/bin/vmstat"

    util_nm[12]="Qchmod"
    util_cm[12]="/bin/chmod"

    util_nm[13]="Qchattr"
    util_cm[13]="/usr/bin/chattr"

    util_nm[14]="Qchgrp"
    util_cm[14]="/bin/chgrp"

    util_nm[15]="Qgrep"
    util_cm[15]="/bin/grep"

    if test $# -le 1
    then usage $0
    exit 1
    fi

    i=0
    cmd_name="$1"
    shift
    util_name=""

    while :; do
    if test $i -ge ${#util_nm[*]}; then
    echo >&2 "$cmd_name: no such command"
    usage
    exit 1
    fi
    if test "${util_nm[$i]}" == "$cmd_name"; then
    util_name="${util_cm[$i]}"
    break
    fi
    i=$((i + 1))
    done

    host_name=""
    options=""

    while test $# -gt 0
    do case "$1" in
    -h) echo -e "Copyright (C) 2005 Alexey Filin\n"
    usage
    exit ;;
    -l) echo -n "Available batch hosts: "
    echo $batch_host_list
    exit ;;
    @*) if test -n "$host_name"; then
    # more than one host was given
    usage
    exit 1
    fi
    for i in $batch_host_list; do
    if test "$i" == "$1"; then
    host_name="$1"
    break
    fi
    done
    if test -z "$host_name"; then
    echo >&2 "$1: no such host name"
    exit 1
    fi
    shift ;;
    *) options="$options $1"
    shift ;;
    esac
    done
    # quote the variable: an unquoted empty value makes "test -n" always true
    if test -n "$host_name"; then
    #echo ssh -x "sgeadmin$host_name" "sudo -u $SUDO_USER $util_name $options"
    ssh -x "sgeadmin$host_name" "sudo -u $SUDO_USER $util_name $options"
    else
    usage
    exit 1
    fi
    exit
    ^D
    bash> chmod +x /home/sgeadmin/bin/cluster_utility_starter.sh
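
The lookup over the two parallel arrays can also be written as a case statement, which keeps each Q-name next to its command so the two cannot drift apart when an entry is added. This is a sketch of an alternative, not the author's script; resolve_util is a hypothetical function name, and the table is assumed to be the one listed above.

```bash
# Hypothetical replacement for the util_nm/util_cm lookup loop.
# Prints the command for a Q-name, or returns 1 if the name is unknown.
resolve_util() {
    case "$1" in
        Qcat)    echo "/bin/cat" ;;
        Qcp)     echo "/bin/cp" ;;
        Qfree)   echo "/usr/bin/free" ;;
        Qkill)   echo "/bin/kill" ;;
        Qls)     echo "/bin/ls" ;;
        Qmv)     echo "/bin/mv" ;;
        Qps)     echo "/bin/ps" ;;
        Qpstree) echo "/usr/bin/pstree" ;;
        Qrm)     echo "/bin/rm" ;;
        Qstat)   echo "/usr/bin/stat" ;;
        Qtop)    echo "/usr/bin/top b" ;;
        Qvmstat) echo "/usr/bin/vmstat" ;;
        Qchmod)  echo "/bin/chmod" ;;
        Qchattr) echo "/usr/bin/chattr" ;;
        Qchgrp)  echo "/bin/chgrp" ;;
        Qgrep)   echo "/bin/grep" ;;
        *)       return 1 ;;
    esac
}

# Usage: resolve the name, complain on stderr if it is not in the table.
for name in Qtop Qfoo; do
    if cmd=$(resolve_util "$name"); then
        echo "$name -> $cmd"
    else
        echo "$name: no such command" >&2
    fi
done
```
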
  3. The script above is run by users through a wrapper script:
    bash> cat > /home/sgeadmin/bin/sudo_starter.sh
    #!/bin/bash
    # pass the name this script was called by (Qls, Qps, ...) and all arguments
    sudo -u sgeadmin /home/sgeadmin/bin/cluster_utility_starter.sh "$(basename "$0")" "$@"
    ^D
    bash> chmod +x /home/sgeadmin/bin/sudo_starter.sh
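
The wrapper forwards its arguments to the tunneling script. The short demonstration below shows why a quoted "$@" is the safer way to forward arguments than an unquoted $*, which splits every argument again on whitespace. count_args and the two forward_* functions are hypothetical helpers used only for this illustration. Note that the tunneling script in step 2 still joins all options into one string for ssh, so arguments containing spaces are not preserved all the way to the batch node.

```bash
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

forward_unquoted() { count_args $*; }    # re-splits arguments on whitespace
forward_quoted()   { count_args "$@"; }  # passes each argument through intact

forward_unquoted "my file.txt" -l    # prints 3
forward_quoted   "my file.txt" -l    # prints 2
```
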
  4. The wrapper script starts the tunneling script as user sgeadmin via sudo, which has to be allowed on the user front-end host:

    bash# visudo
    ...
    # SGE users are allowed to run some utilities on batch nodes
    %sgeusers oka04 = (sgeadmin) NOPASSWD: /home/sgeadmin/bin/cluster_utility_starter.sh

    where sgeusers is a special group whose members are allowed to run the utilities listed below remotely, and oka04 is the user front-end host.

  5. On a batch node the utility is run with sudo to change the effective uid to that of the user who started the command on the user front-end:
    bash# visudo
    ...
    # Cmnd alias specification
    Cmnd_Alias CLUSTERCOMMANDS = /usr/bin/top b, /bin/ps, /usr/bin/free,\
    /usr/bin/vmstat, /usr/bin/stat, /bin/ls,\
    /bin/cat, /usr/bin/pstree, /bin/kill,\
    /bin/cp, /bin/mv, /bin/rm, /bin/chmod,\
    /usr/bin/chattr, /bin/chgrp, /bin/grep
    ...
    # sgeadmin can run any cluster command as user
    sgeadmin ALL = (%sgeusers) NOPASSWD: CLUSTERCOMMANDS

    The modified /etc/sudoers is to be copied to the other batch nodes.

  6. Symbolic links are created so that the wrapper script can be run under convenient names:
    bash> cat > /home/sgeadmin/bin/installQcommands.sh
    #!/bin/sh
    #
    # Author: Alexey Filin

    # path to sudo starter
    starter="/home/sgeadmin/bin/sudo_starter.sh"

    if [ "$SGE_ROOT" = "" -o ! -d "$SGE_ROOT" ]; then
    echo "$SGE_ROOT: wrong SGE_ROOT" >&2
    # stop here, otherwise the links could be created in a wrong directory
    exit 1
    fi

    cd "$SGE_ROOT/bin/"`$SGE_ROOT/util/arch` || exit $?


    if [ ! -x "$starter" ]; then
    echo "$starter: wrong sudo starter"
    exit 1
    fi

    ln -s "$starter" Qcat && \
    ln -s "$starter" Qcp && \
    ln -s "$starter" Qfree && \
    ln -s "$starter" Qkill && \
    ln -s "$starter" Qls && \
    ln -s "$starter" Qmv && \
    ln -s "$starter" Qps && \
    ln -s "$starter" Qpstree && \
    ln -s "$starter" Qrm && \
    ln -s "$starter" Qstat && \
    ln -s "$starter" Qtop && \
    ln -s "$starter" Qvmstat && \
    ln -s "$starter" Qchmod && \
    ln -s "$starter" Qchattr && \
    ln -s "$starter" Qchgrp && \
    ln -s "$starter" Qgrep || exit $?
    ^D
    bash> chmod +x /home/sgeadmin/bin/installQcommands.sh
    bash> /home/sgeadmin/bin/installQcommands.sh

    The directory in which the links are created must be included in the PATH environment variable of cluster users so that Q-commands can be run like ordinary utilities. This requirement is fulfilled by the SGE environment setup script.
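
The chain of ln -s commands in the installer can be collapsed into a loop. This is a minimal sketch assuming the same sixteen names; install_links, its two parameters and the QCOMMANDS variable are hypothetical. It uses ln -sf so that a re-run replaces existing links instead of failing, which differs from the original script.

```bash
# Hypothetical helper: create one symlink per Q-command in a target directory.
# $1 = path to the starter script, $2 = directory to create the links in.
QCOMMANDS="Qcat Qcp Qfree Qkill Qls Qmv Qps Qpstree Qrm Qstat Qtop Qvmstat Qchmod Qchattr Qchgrp Qgrep"

install_links() {
    starter="$1"
    target_dir="$2"
    for name in $QCOMMANDS; do
        # -f replaces an existing link, so the installer can be re-run.
        ln -sf "$starter" "$target_dir/$name" || return 1
    done
}

# Usage example against a scratch directory rather than $SGE_ROOT/bin/<arch>.
demo_dir=$(mktemp -d)
install_links /home/sgeadmin/bin/sudo_starter.sh "$demo_dir"
ls -l "$demo_dir"
rm -r "$demo_dir"
```
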

  7. If all the steps above have been done correctly, any Q-command can be run by any cluster user. Log in as a test cluster user and check it.
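
Part of the final check can be automated. The sketch below, meant to be run as a test cluster user, reports which Q-commands cannot be found through PATH. missing_qcommands and QCOMMANDS are hypothetical names, and the check only confirms that the links resolve; it does not verify the sudo rules on the front-end or on the batch nodes.

```bash
# Hypothetical check: list the Q-commands that are not reachable via PATH.
QCOMMANDS="Qcat Qcp Qfree Qkill Qls Qmv Qps Qpstree Qrm Qstat Qtop Qvmstat Qchmod Qchattr Qchgrp Qgrep"

missing_qcommands() {
    for name in $QCOMMANDS; do
        # command -v succeeds only if the name resolves to something runnable.
        command -v "$name" >/dev/null 2>&1 || echo "$name"
    done
}

missing=$(missing_qcommands)
if [ -z "$missing" ]; then
    echo "all Q-commands found in PATH"
else
    echo "not found in PATH:" $missing
fi
```

If nothing is reported missing, a real run such as Qls with a host from the -l list exercises the remaining sudo and SSH configuration.
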

Have fun!
