Nautilus was decommissioned on May 1, 2015. For more information, see Nautilus Decommission FAQs.
Batch scripts can be used to run a set of commands on a system's compute partition. The batch script is a shell script containing PBS flags and commands to be interpreted by a shell. Batch scripts are submitted to the batch manager, PBS, where they are parsed. Based on the parsed data, PBS places the script in the queue as a job. Once the job makes its way through the queue, the script will be executed on the head node of the allocated resources.
Batch scripts are submitted for execution using the qsub command. For example, the following will submit the batch script named test.pbs:
> qsub test.pbs
If successfully submitted, a PBS job ID will be returned. This ID can be used to track the job.
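Because qsub prints the job ID on success, a script can capture it for later tracking. The following is a minimal sketch, assuming qsub writes only the job ID to standard output; the helper name submit_job is hypothetical and not part of PBS.

```shell
# Hypothetical helper: submit a batch script and print the job ID returned by qsub.
submit_job() {
    script="$1"
    # qsub exits with a non-zero status when the submission is rejected.
    jobid=$(qsub "$script") || {
        echo "submission of $script failed" >&2
        return 1
    }
    printf '%s\n' "$jobid"
}

# Usage (on a system where qsub is available):
#   jobid=$(submit_job test.pbs) && echo "submitted as $jobid"
```

Keeping the ID in a variable makes it easy to pass to the monitoring and modification commands described later on this page.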
- Common PBS Options
- PBS Environment Variables
- Example Batch Scripts
- Requesting GPUs
- Interactive Batch Jobs
- Monitoring Job Status
- Making Changes After Job Submission
Common PBS Options
In a batch script, these options should be in the form:
#PBS <option>
as shown in the example scripts below. Please note that all PBS option lines must be at the very beginning of the script. As soon as the first non-comment line is encountered, all other lines starting with #PBS are treated as comments.
For interactive jobs, qsub accepts these options as flags, as seen in the examples below.
Note: For job submission, the only required option is -l ncpus=<n>. If you only specify ncpus, then your default account will be used, walltime defaults to 1 hour, and the memory allocation will default to 4000MB per CPU requested. See the Job Accounting page for more information on specifying ncpus and memory on Nautilus.
-A <account>
- Charge the job to <account>. To view your available accounts, type showusage.
-l walltime=<timespec>
- Requests a time slot of up to <timespec> for the job. <timespec> can be specified as seconds (e.g., walltime=3600), minutes:seconds (e.g., walltime=5:00), or hours:minutes:seconds (e.g., walltime=2:30:00).
-l ncpus=<n>
- Requests <n> processor cores for the job. Actual allocation and charge will be in multiples of 8; see here for more information.
-l mem=<size>
- Requests a memory limit of <size> for the job. <size> units can be KB, MB, or GB (e.g., -l mem=16GB). The memory request may affect the number of CPUs allocated; see here for more information.
-l software=<pkg>
- Requests a license for software package <pkg>. This is only required for Matlab (-l software=matlab) and IDL (-l software=idl).
Note: Multiple -l options can be concatenated into one line. Options must be separated by a comma with no spaces (e.g., -l walltime=1:00:00,ncpus=16,mem=16GB).
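The resource rules above can be sketched as a few small shell helpers. This is an illustration only: the function names are hypothetical, and charged_cores assumes that the "multiples of 8" rule rounds a request up to the next multiple.

```shell
# Hypothetical helpers illustrating the resource rules described above.

# Convert a walltime timespec (seconds, mm:ss, or hh:mm:ss) to seconds.
# Returns a non-zero status for any other number of fields.
walltime_to_seconds() {
    printf '%s\n' "$1" | awk -F: '
        NF == 1 { print $1 + 0; exit }
        NF == 2 { print $1 * 60 + $2; exit }
        NF == 3 { print $1 * 3600 + $2 * 60 + $3; exit }
        { exit 1 }'
}

# Round a core request up to the next multiple of 8.
# Assumption: allocation "in multiples of 8" means rounding up.
charged_cores() {
    echo $(( ($1 + 7) / 8 * 8 ))
}

# Default memory in MB when only ncpus is given (4000MB per requested CPU).
default_mem_mb() {
    echo $(( $1 * 4000 ))
}

# Join resource requests into one comma-separated -l argument with no spaces.
resource_list() {
    ( IFS=,; printf '%s\n' "$*" )
}
```

For example, walltime_to_seconds 2:30:00 prints 9000, and resource_list walltime=1:00:00 ncpus=16 mem=16GB prints the concatenated form shown in the note above.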
-o <name>
- Writes standard output to <name> instead of <job script>.o$PBS_JOBID. $PBS_JOBID is an environment variable created by PBS that contains the PBS job identifier.
-e <name>
- Writes standard error to <name> instead of <job script>.e$PBS_JOBID.
-j {oe|eo}
- Combines standard output and standard error into the standard error file (eo) or the standard output file (oe).
-m {a,b,e}
- Sends email to the submitter or the email address given with -M when the job {aborts,begins,ends}. You may mix options. For example, if you wanted to be alerted when the job begins and aborts, use #PBS -m ab.
-M <address>
- Specifies the email address to use for -m options.
-N <name>
- Sets the job name to <name> instead of the default, which is the name of your batch script.
-S <shell>
- Sets the shell to interpret the job script.
-q <queue>
- Directs the job to the specified queue. All jobs will be submitted to the computation queue by default. Use this option to specify use of the analysis queue.
-v <var>
- Exports the environment variable <var> from the submitting shell into the batch shell.
-W depend=afterok:<jobID>
- Holds the job until <jobID> has finished successfully. This should be the full job ID as output by qsub (e.g., 1234.nautilus.nics.utk.edu).
Note: Please do not use the PBS -V option. This can propagate large numbers of environment variable settings from the submitting shell into a job, which may cause problems for the batch environment. Instead of using PBS -V, please pass only necessary environment variables using -v <comma_separated_list_of_needed_envars>. You can also include module load statements in the job script.
Example:
#PBS -v PATH,LD_LIBRARY_PATH,PV_NCPUS,PV_LOGIN,PV_LOGIN_PORT
Further details and other PBS options may be found through the qsub man page.
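The -W depend=afterok option described above can be used to chain jobs so that each one waits for the previous one. The following is a minimal sketch, assuming qsub prints the full job ID on standard output; submit_chain and the script names in the usage comment are hypothetical.

```shell
# Hypothetical helper: submit scripts in order, making each job depend on
# the job submitted just before it. Prints one job ID per line.
submit_chain() {
    prev=""
    for script in "$@"; do
        if [ -z "$prev" ]; then
            # The first job has no dependency.
            prev=$(qsub "$script") || return 1
        else
            # Later jobs are held until the previous job has finished.
            prev=$(qsub -W "depend=afterok:$prev" "$script") || return 1
        fi
        printf '%s\n' "$prev"
    done
}

# Usage (on a system where qsub is available):
#   submit_chain preprocess.pbs solve.pbs postprocess.pbs
```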
PBS Environment Variables
There are a few useful environment variables set within PBS jobs:
$PBS_O_WORKDIR
- PBS sets the environment variable $PBS_O_WORKDIR to the directory where the batch job was submitted.
- By default, a job starts in your home directory.
- Include the following command in your script if you want it to start in the submission directory:
cd $PBS_O_WORKDIR
$PBS_JOBID
- PBS sets the environment variable $PBS_JOBID to the job's ID.
- For example, if you wanted a unique folder for each job's output files, you may want to do something like the following:
mkdir $PBS_JOBID
cd $PBS_JOBID
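The two variables above can be combined so that a job runs in its own subdirectory of the submission directory. The following is a minimal sketch; the helper name enter_job_dir is hypothetical, and the check for unset variables is an added safeguard rather than something PBS requires.

```shell
# Hypothetical helper for use inside a batch script: work from a per-job
# subdirectory of the directory the job was submitted from.
enter_job_dir() {
    if [ -z "${PBS_O_WORKDIR:-}" ] || [ -z "${PBS_JOBID:-}" ]; then
        echo "PBS_O_WORKDIR and PBS_JOBID must be set (run inside a PBS job)" >&2
        return 1
    fi
    cd "$PBS_O_WORKDIR" || return 1
    # -p avoids an error if the directory already exists.
    mkdir -p "$PBS_JOBID" || return 1
    cd "$PBS_JOBID" || return 1
}
```

Calling enter_job_dir near the top of a batch script, after the #PBS lines, keeps each job's output files separate.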
Example Batch Scripts
Example MPI job
#PBS -N hpl
#PBS -S /bin/bash
#PBS -j oe
#PBS -l ncpus=96
#PBS -l mem=384GB
#PBS -l walltime=24:00:00
cd $HOME/xd-viz/bench/hpl/hpl-2.0/bin/sgi_uv
mpiexec ./xhpl
Example multi-threaded job
#PBS -N stream
#PBS -S /bin/bash
#PBS -j oe
#PBS -l ncpus=96,mem=384GB,walltime=24:00:00
cd $HOME/xd-viz/bench/stream
export OMP_NUM_THREADS=96
./stream-f.pgi
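When many similar jobs are needed, a batch script like the examples above can be generated rather than written by hand. The following is a minimal sketch, not a NICS-provided tool; make_job_script and its argument order are hypothetical.

```shell
# Hypothetical helper: print a batch script to standard output.
# Arguments: job name, ncpus, walltime, then the command to run.
make_job_script() {
    name="$1"; ncpus="$2"; walltime="$3"; shift 3
    # All #PBS lines come first; once the first command line is reached,
    # later #PBS lines are treated as ordinary comments.
    cat <<EOF
#PBS -N $name
#PBS -S /bin/bash
#PBS -j oe
#PBS -l ncpus=$ncpus,walltime=$walltime
cd \$PBS_O_WORKDIR
$*
EOF
}

# Usage:
#   make_job_script stream 16 1:00:00 ./stream-f.pgi > stream.pbs
```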
Interactive Batch Jobs
Users are not allowed to run jobs directly on the login nodes. To run an interactive job, users must use a batch-interactive PBS job. This is achieved by using the -I option with the qsub command along with other PBS options.
For interactive batch jobs, PBS options are passed through qsub on the command line.
% qsub -I -A XXXYYY -q analysis -l ncpus=16,mem=64GB,walltime=1:00:00
The options here are:
-I
- Start an interactive session.
-A XXXYYY
- Charge to the “XXXYYY” project.
-q analysis
- Run in the analysis queue.
-l ncpus=16,mem=64GB,walltime=1:00:00
- Request 16 compute cores with a memory limit of 64GB for one hour.
Note: For job submission, the only required option is -l ncpus=<n>. If you only specify ncpus, then your default account will be used, walltime defaults to 1 hour, and the memory allocation will default to 4000MB per CPU requested. See the Job Accounting page for more information on specifying ncpus and memory on Nautilus.
Once your specified number of cores is available, your interactive session will begin on a compute node. From there, you may execute commands directly instead of through a batch script. To end the interactive job, simply exit from the compute node.
Monitoring Job Status
PBS and Moab provide multiple tools to view queue, system, and job statuses. Below are the most common and useful of these tools.
showq
The Moab utility showq gives a detailed description of the queue. The utility will display the queue in the following states:
- Active
- These jobs are currently running, listed in order of expected completion from soonest to latest.
- Eligible
- These jobs are currently queued awaiting resources, listed in order of priority from highest priority to lowest. A user is allowed five jobs in the eligible state.
- Blocked
- These jobs are currently queued but are not eligible to run. Common reasons for jobs in this state are jobs on hold, or the owning user currently having five jobs in the eligible state.
You can also run showq with the options -r, -i, or -b to show only active, eligible (idle), or blocked jobs, respectively.
checkjob
The Moab utility checkjob can be used to view details of a job in the queue. For example, if job 736 is currently in the queue in a blocked state, the following can be used to view why the job is blocked:
% checkjob 736
The return may contain a line similar to the following:
BlockMsg: job 736 violates idle HARD MAXJOB limit of 1 running job
in the computation queue for userid
This line indicates the job is in the blocked state because the owning user has one job already running in the computation queue. The text after the BlockMsg header will vary depending on the reason the job is blocked.
showstart
The Moab utility showstart gives an estimate of when the job will start.
% showstart 100315
job 100315 requires 16384 procs for 00:40:00
Estimated Rsv based start in 15:26:41 on Fri Sep 26 23:41:12
Estimated Rsv based completion in 16:06:41 on Sat Sep 27 00:21:12
Because the start time may change dramatically as new jobs with higher priority are submitted, you may need to rerun the command periodically.
qstat
The PBS command qstat -a checks the status of submitted jobs from PBS's perspective. Unlike showq, qstat does not know about scheduling, only the current status.
% qstat -a
nautilus.nics.utk.edu:
                                                              Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
------------    -------- -------- ---------- ------ ----- --- ------ ----- - -----
123.nautilus.ni user1    analysis a.out      --     1     --  --     00:01 Q --
The job's current status is given by S (second column from the right):
Status value | Meaning |
---|---|
E | Exiting after having run |
H | Held |
Q | Queued, eligible to run |
R | Running |
S | Suspended |
T | Being moved to new location |
W | Waiting for its execution time |
C | Recently completed (within the last 5 minutes) |
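The status column can also be read from a script. The following is a minimal sketch that relies on the layout shown above, where the status is the second field from the right on the job's line; job_state and describe_state are hypothetical helper names.

```shell
# Hypothetical helpers for reading qstat -a output.

# Print the S column for a job, reading qstat -a output on standard input.
# The job is matched by its numeric ID or by the full (possibly truncated)
# ID shown in the first column. Prints nothing if the job is not listed.
job_state() {
    awk -v id="$1" '$1 == id || index($1, id ".") == 1 { print $(NF - 1); exit }'
}

# Translate a status letter into the meaning listed in the table above.
describe_state() {
    case "$1" in
        E) echo "Exiting after having run" ;;
        H) echo "Held" ;;
        Q) echo "Queued, eligible to run" ;;
        R) echo "Running" ;;
        S) echo "Suspended" ;;
        T) echo "Being moved to new location" ;;
        W) echo "Waiting for its execution time" ;;
        C) echo "Recently completed (within the last 5 minutes)" ;;
        *) echo "Unknown status: $1"; return 1 ;;
    esac
}

# Usage (on a system where qstat is available):
#   describe_state "$(qstat -a | job_state 123)"
```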
Making Changes After Job Submission
Removing a Job from the Queue
Jobs in the queue in any state can be stopped and removed from the queue using the command qdel.
For example, to remove a job with a PBS ID of 1234, use the following command:
% qdel 1234
To remove all of your jobs from the system (in any state) use:
% qdel all
More details on the qdel utility can be found through the qdel man page.
Holding and Releasing Queued Jobs
Jobs in the queue in a non-running state may be placed on hold using the qhold command. Jobs placed on hold will not be removed from the queue, but they will not be eligible for execution.
For example, to move a currently queued job with a PBS ID of 1234 to a hold state, use the following command:
% qhold 1234
Once on hold, the job will not be eligible to run until it is released to return to a queued state. The qrls command can be used to remove a job from the held state.
For example, to release job 1234 from a held state, use the following command:
% qrls 1234
More details on the qhold and qrls utilities can be found through their respective man pages.
Modifying Options for Jobs
Only non-running jobs (queued or on hold) can be modified with the qalter PBS command. For example, this command can be used to:
- modify the job's name,
% qalter -N <newname> <jobid>
- modify the number of requested cores,
% qalter -l ncpus=<n> <jobid>
- or modify the job's wall time.
% qalter -l walltime=<hh:mm:ss> <jobid>
Notes:
- Please use the qstat -a <jobid> command to verify the changes afterward.
- Users cannot specify a new walltime for their job that exceeds the maximum walltime of the queue where the job is.
- If you need to modify a running job, please contact us. Certain alterations can only be performed by NICS operators.
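The alter-then-verify steps above can be wrapped in one function. The following is a minimal sketch; alter_walltime is a hypothetical helper, and the hh:mm:ss check is an added safeguard rather than a restriction imposed by qalter.

```shell
# Hypothetical helper: change a queued job's walltime, then show the result.
alter_walltime() {
    jobid="$1"
    new="$2"
    # Accept only hh:mm:ss, the form used in the qalter example above.
    if ! printf '%s\n' "$new" | grep -Eq '^[0-9]+:[0-5][0-9]:[0-5][0-9]$'; then
        echo "walltime must be given as hh:mm:ss" >&2
        return 2
    fi
    qalter -l "walltime=$new" "$jobid" || return 1
    # Verify the change, as recommended in the notes above.
    qstat -a "$jobid"
}

# Usage (on a system where qalter and qstat are available):
#   alter_walltime 1234 2:00:00
```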