Tutorial for OKA cluster users
environment setup, usage policy
Sun Grid Engine - a facility for executing UNIX jobs on remote machines
According to the SGE site:
"The Grid Engine project is an open source community effort to facilitate the adoption of distributed computing solutions. Sponsored by Sun Microsystems and hosted by CollabNet, the Grid Engine project provides enabling distributed resource management software for wide ranging requirements from compute farms to grid computing."
As for the local installation, SGE is a batch monitor developed by Sun and released as open source under the SISSL license. It provides the standard batch monitor facilities and user utilities.
Environment Setup
For a bash/zsh login shell add to $HOME/.profile:
> export SGE_ROOT=/okas/sgeadmin/sge/root
> . $SGE_ROOT/OKA/common/settings.sh
For a csh/tcsh login shell add to $HOME/.login:
> setenv SGE_ROOT /okas/sgeadmin/sge/root
> source $SGE_ROOT/OKA/common/settings.csh
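For sh-family shells, one way to check that the setup took effect is a small function like the sketch below. The name check_sge_env is made up for this illustration and is not part of SGE; it only tests that SGE_ROOT is set and that the settings file named above is readable.

```shell
# Hypothetical helper (not part of SGE): verify the login-shell setup.
check_sge_env() {
    # SGE_ROOT must be set and non-empty
    if [ -z "${SGE_ROOT:-}" ]; then
        echo "SGE_ROOT is not set"
        return 1
    fi
    # the settings file sourced from $HOME/.profile must be readable
    if [ ! -r "$SGE_ROOT/OKA/common/settings.sh" ]; then
        echo "cannot read $SGE_ROOT/OKA/common/settings.sh"
        return 1
    fi
    echo "SGE environment looks fine: $SGE_ROOT"
}
```

Running check_sge_env right after login should print the last message; any other output suggests the lines above were not added to the right file.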
Example
If you aren't familiar with batch submission systems, it is highly recommended to read the sge_intro man page first:
> man sge_intro
Basic commands provided by SGE are:
qsub | submit a batch job to Grid Engine |
qstat | show the status of Grid Engine jobs and queues |
qdel | delete Grid Engine jobs from queues |
qmon | GUI front-end to user's and administrator's utilities |
"qsub submits batch jobs to the Grid Engine queuing system. Grid Engine
supports single- and multiple-node jobs. Command can be a path to a
binary or a script (see -b option) which contains the commands to be run
by the job using a shell (for example, sh(1) or csh(1)). Arguments to
the command are given as command_args to qsub. If command is handled
as a script then it is possible to embed flags in the script. If the
first two characters of a script line either match `#$' or are equal to
the prefix string defined with the -C option described below, the line
is parsed for embedded command flags" (man qsub for more info):
> cat > test_script.csh
#!/bin/csh
# Account to be charged for CPU time
#$ -A santa_claus
# date-time to run, format [[CC]yy]MMDDhhmm[.SS]
#$ -a 12241200
# set memory and job CPU time limits to 128MB and 5 hours respectively,
# man queue_conf ("RESOURCE LIMITS" section) to list all available
# parameters
#$ -l h_vmem=128M,h_cpu=5:0:0
# If I run on dec_x put stderr in /tmp/foo, if I
# run on sun_y, put stderr in /usr/me/foo (-o for stdout, by default
# stderr and stdout are put into home dir of user on execution host)
#$ -e dec_x:/tmp/foo,sun_y:/usr/me/foo
# Send mail to these users
#$ -M santa@heaven,claus@heaven
# Mail at beginning/end/on suspension
#$ -m bes
# Export these environment variables
#$ -v PVM_ROOT,FOOBAR=BAR
# to export all environment variables use the `-V' option
a.out
^D
> qsub test_script.csh
Place and run executables only from your file server directory /okas/<your directory>; user home directories are not visible on batch nodes in the current cluster configuration:
> cd /okas/filin/test
> qsub test_script.csh
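To see which flags qsub would pick up from a script, the embedded lines can be listed before submitting. The sketch below is only an illustration: list_embedded_flags is a made-up name, and it assumes the default `#$' prefix (a prefix changed with -C is not handled).

```shell
# Hypothetical helper: print the qsub flags embedded in a job script,
# i.e. the lines whose first two characters are "#$".
list_embedded_flags() {
    # keep only lines starting with "#$" and strip that prefix
    # together with the blanks that follow it
    sed -n 's/^#\$[[:space:]]*//p' "$1"
}
```

For example, list_embedded_flags test_script.csh prints one line per embedded flag, such as "-A santa_claus", while ordinary comments (a "#" followed by a space) are skipped.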
Q-commands
At users' request, some extra commands were created to monitor the system and processes on batch nodes. The names of these commands start with the character Q. Here is the list:
Qcat, Qchattr, Qchgrp, Qchmod, Qcp, Qfree, Qgrep, Qkill, Qls,
Qmv, Qps, Qpstree, Qrm, Qstat, Qtop (in batch mode), Qvmstat
The commands are equivalents of commonly used UNIX utilities and reproduce their behaviour and options. Use the -h option to get help and the -l option to list the available batch nodes. They are safe to use because every command runs under the uid and gid of the user who started it, so a user can damage only his/her own jobs and their environment. Examples of usage:
> Qps afx @okaf003
> Qtop -p 26971 @okaf001
If you have any comments or proposals to extend the list of commands, contact the cluster administrator.
Usage Policy
There are two common queues on the cluster:
short | high priority, CPU time limit 3 hours (a job in this queue suspends jobs in the `long' and personal queues on the same node), nice 2, 1 job slot per host. The queue is to be used for debugging and short tests. Users who use the queue for other purposes will be removed from the list of cluster users |
long | low priority, no CPU time limit, nice 15, 1 job slot per host |
Personal queues:
ioucht | nice 12 | 1 job slot per host |
roma | nice 15 | 1 job slot per host |
kolosov | nice 15 | 1 job slot per host |
kdatsko | nice 15 | 1 job slot per host |
tchikil | nice 15 | 1 job slot per host |
polarush | nice 15 | 1 job slot per host |
slava | nice 15 | 1 job slot per host |
- each node can run up to three jobs simultaneously,
- long queue has one job slot on every node;
- each user has a personal queue,
- each personal queue has one slot on each node,
- the number of slots on each node is equal to the number of users;
- every user can use all nodes simultaneously, i.e. can load the whole cluster, so no node stands idle if only one user wants to use the cluster;
- at any time every user can run at least as many jobs as there are nodes.
( 512MB RAM + 1.5GB swap ) / 3 ≈ 667MB
667MB is the maximum memory available for one job.
So all queues have hard limits:
- 600MB virtual memory limit,
- 150MB RSS (Resident Set Size - the maximum amount of process memory kept in RAM at one time, the rest is swapped out);
- 2.2 load threshold prohibits running more than three jobs on each node simultaneously to guarantee the hard memory limits.
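The relation between the per-node memory and the 600MB limit can be checked with a few lines of shell arithmetic. This is only a sketch: the variable names are made up, and 1.5GB of swap is read as 1500MB, which is an assumption.

```shell
# Figures taken from the text above; 1.5GB swap is read as 1500MB (an assumption).
ram_mb=512
swap_mb=1500
jobs_per_node=3
vmem_limit_mb=600

# memory a node can offer in total, and the worst case when all three
# jobs reach their virtual memory limit at once
total_mb=$((ram_mb + swap_mb))
needed_mb=$((jobs_per_node * vmem_limit_mb))
headroom_mb=$((total_mb - needed_mb))

echo "total ${total_mb}MB, worst case ${needed_mb}MB, headroom ${headroom_mb}MB"
```

Under these assumptions three jobs at the limit need 1800MB out of 2012MB, which leaves about 200MB for the system itself.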
To submit a job to the long queue:
> cd /okas/filin/test
> qsub -q long loop.sh
or with the batch node name specified explicitly:
> cd /okas/filin/test
> qsub -q long@okaf002 loop.sh
The schedule interval is set to 5 seconds, so jobs are dispatched one by one every 5 seconds. Decay time is set to 3 minutes.
Each job is provided with a local temporary directory with a unique name, which is passed in the TMP environment variable (TMPDIR holds the same value). The total size of the files a job places in this directory cannot exceed 9.5GB.
By default the job's working directory is the directory the job was started from, so STDOUT and STDERR are saved to files in that directory.
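A job can use the per-job directory for intermediate files and copy the results back to its working directory before it ends, since such directories are normally cleaned up when the job finishes. The sketch below shows the pattern only: run_in_scratch and result.txt are hypothetical names, and the echo line stands in for the real computation.

```shell
# Sketch of a job body that works in the per-job scratch directory.
# run_in_scratch and result.txt are placeholders for illustration.
run_in_scratch() {
    workdir=$(pwd)                # directory the job was started from
    scratch="${TMPDIR:-/tmp}"     # per-job directory passed by SGE, /tmp as a fallback
    cd "$scratch" || return 1
    # stand-in for the real computation: write the output locally
    echo "done on $(uname -n)" > result.txt
    # copy the result back to the working directory before the job ends
    cp result.txt "$workdir/" || return 1
    cd "$workdir" || return 1
}
```

In a real job script the function would be called once near the end, e.g. as the last line of loop.sh.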
Use qstat to list information about jobs and queues:
> qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
61 0.56000 loop.sh filin r 12/21/2004 23:10:50 long@okaf002.ihep.su 1
> qstat -f
queuename qtype used/tot. load_avg arch states
----------------------------------------------------------------------------
long@okaf001.ihep.su BIP 0/3 0.00 lx24-x86
----------------------------------------------------------------------------
long@okaf002.ihep.su BIP 0/3 0.00 lx24-x86
----------------------------------------------------------------------------
long@okaf003.ihep.su BIP 0/3 0.00 lx24-x86
----------------------------------------------------------------------------
short@okaf001.ihep.su BIP 0/1 0.00 lx24-x86
----------------------------------------------------------------------------
short@okaf002.ihep.su BIP 0/1 0.00 lx24-x86
----------------------------------------------------------------------------
short@okaf003.ihep.su BIP 0/1 0.00 lx24-x86
...
> qstat -g c
CLUSTER QUEUE CQLOAD USED AVAIL TOTAL aoACDS cdsuE
-------------------------------------------------------------------------------
long 0.00 0 9 9 0 0
short 0.00 0 3 3 0 0
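The plain qstat listing is easy to summarize with standard tools. The sketch below counts jobs per state; count_job_states is a made-up name, and the code assumes the layout shown above (two header lines, state in the fifth column, no blanks in job names).

```shell
# Hypothetical helper: count jobs per state in plain `qstat` output read
# from stdin. Assumes a two-line header and the state in the 5th field.
count_job_states() {
    awk 'NR > 2 && NF >= 5 { n[$5]++ }
         END { for (s in n) print s, n[s] }'
}
```

It would be used as a filter, for example: qstat | count_job_states. The order of the output lines is not defined when several states are present.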
Documentation
In addition to the SGE manual:
> man sge_intro
there is a User's Guide in PDF:
> gv $SGE_ROOT/../../docs/UsersGuide.pdf
The documentation states that the job status is one of:
- d(eletion),
- t(ransfering),
- r(unning),
- R(estarted),
- s(uspended),
- S(uspended),
- T(hreshold),
- w(aiting),
- h(old);
"The state d(eletion) indicates that a qdel(1) has been used to initiate
job deletion. The states t(ransfering) and r(unning) indicate
that a job is about to be executed or is already executing, whereas
the states s(uspended), S(uspended) and T(hreshold) show that an
already running jobs has been suspended. The s(uspended) state is
caused by suspending the job via the qmod(1) command, the
S(uspended) state indicates that the queue containing the job is
suspended and therefore the job is also suspended and the T(hreshold)
state shows that at least one suspend threshold of the corresponding
queue was exceeded (see queue_conf(5)) and that the job has
been suspended as a consequence. The state R(estarted) indicates
that the job was restarted. This can be caused by a job migration or
because of one of the reasons described in the -r section of the
qsub(1) command.
The states w(aiting) and h(old) only appear for pending jobs. The
h(old) state indicates that a job currently is not eligible for execution
due to a hold state assigned to it via qhold(1), qalter(1) or
the qsub(1) -h option or that the job is waiting for completion of
the jobs to which job dependencies have been assigned to the job via
the -hold_jid option of qsub(1) or qalter(1)."
Queue state is one of the following, or a combination thereof:
- u(nknown) if the corresponding sge_execd(8) cannot be contacted,
- a(larm),
- A(larm),
- C(alendar suspended),
- s(uspended),
- S(ubordinate),
- d(isabled),
- D(isabled),
- E(rror);
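Because queue states can be combined into one string, it may help to expand such a string letter by letter. The function below is a sketch for illustration only: decode_queue_state is a made-up name, and the labels are copied from the list above without further interpretation.

```shell
# Hypothetical helper: expand a combined queue state string such as "au"
# into the labels from the list above.
decode_queue_state() {
    state=$1
    out=""
    while [ -n "$state" ]; do
        rest=${state#?}          # everything after the first character
        c=${state%"$rest"}       # the first character itself
        case $c in
            u) word="u(nknown)" ;;
            a) word="a(larm)" ;;
            A) word="A(larm)" ;;
            C) word="C(alendar suspended)" ;;
            s) word="s(uspended)" ;;
            S) word="S(ubordinate)" ;;
            d) word="d(isabled)" ;;
            D) word="D(isabled)" ;;
            E) word="E(rror)" ;;
            *) word="?($c)" ;;   # letter not in the list above
        esac
        out="${out:+$out, }$word"
        state=$rest
    done
    echo "$out"
}
```

For example, decode_queue_state au prints "a(larm), u(nknown)".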
- r(unning),
- R(estarted),
- s(uspended),
- S(uspended),
- T(hreshold),
- w(aiting),
- h(old),
- x(exited);