The National Institute for Computational Sciences

Batch Scripts on Nautilus

Nautilus was decommissioned on May 1, 2015. For more information, see Nautilus Decommission FAQs.

Batch scripts can be used to run a set of commands on a system's compute partition. A batch script is a shell script containing PBS flags and commands to be interpreted by a shell. Batch scripts are submitted to the batch manager, PBS, which parses them and, based on the parsed data, places the script in the queue as a job. Once the job makes its way through the queue, the script is executed on the head node of the allocated resources.

Batch scripts are submitted for execution using the qsub command. For example, the following will submit the batch script named test.pbs:

% qsub test.pbs

If successfully submitted, a PBS job ID will be returned. This ID can be used to track the job.
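
Since the returned ID is what the monitoring and job-control commands take as an argument, it can be convenient to capture it in a script. The following is a minimal sketch and was not part of the original Nautilus documentation; job_number is a hypothetical helper that strips the server suffix from a full ID of the form 1234.nautilus.nics.utk.edu.

```shell
# Hypothetical helper: print the numeric part of a full PBS job ID
# (everything before the first dot).
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Typical use on a system with PBS (commented out so the sketch runs anywhere):
#   jobid=$(qsub test.pbs)
#   job_number "$jobid"

job_number "1234.nautilus.nics.utk.edu"   # prints 1234
```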

Common PBS Options

In a batch script, these options should be in the form:

#PBS <option> 

as shown in the example scripts below. Please note that all PBS option lines must be at the very beginning of the script. As soon as the first non-comment line is encountered, all other lines starting with #PBS are treated as comments.

For interactive jobs, qsub accepts these options as flags, as seen in the examples below.

Note: For job submission, the only required option is -l ncpus=<n>. If you only specify ncpus then your default account will be used, walltime defaults to 1 hour, and the memory allocation will default to 4000MB per CPU requested. See the Job Accounting page for more information on specifying ncpus and memory on Nautilus.
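
To make the defaults in the note above concrete, the sketch below prints the request that would apply when only ncpus is given. The function name default_request is hypothetical, and the calculation uses only the figures stated in the note (1 hour of walltime, 4000MB per requested CPU); it does not model any rounding of the actual allocation.

```shell
# Hypothetical helper: show the effective request when only ncpus is specified.
# Defaults taken from the note above: walltime 1 hour, 4000MB per requested CPU.
default_request() (
    ncpus=$1
    mem_mb=$((ncpus * 4000))
    printf 'ncpus=%s,mem=%sMB,walltime=1:00:00\n' "$ncpus" "$mem_mb"
)

default_request 16   # prints ncpus=16,mem=64000MB,walltime=1:00:00
```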

-A <account>
Charge the job to <account>. To view your available accounts, type showusage.
-l walltime=<timespec>
Requests a time slot of up to <timespec> for the job. <timespec> can be specified as seconds (e.g., walltime=3600), minutes:seconds (e.g., walltime=5:00), or hours:minutes:seconds (e.g., walltime=2:30:00).
-l ncpus=<n>
Requests <n> processor cores for the job. Actual allocation and charge will be in multiples of 8--see the Job Accounting page for more information.
-l mem=<size>
Requests a memory limit of <size> for the job. <size> units can be KB, MB or GB (e.g., -l mem=16GB). The memory request may affect the number of CPUs allocated--see the Job Accounting page for more information.
-l software=<pkg>
Requests a license for software package <pkg>. This is only required for Matlab (-l software=matlab) and IDL (-l software=idl).
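
As an illustration of the walltime formats and the multiples-of-8 allocation described above, here is a small sketch. Both function names are hypothetical. Rounding up to the next multiple of 8 is an assumption; the text above states only that allocation and charge are in multiples of 8.

```shell
# Hypothetical helper: convert a walltime <timespec> given as SS, MM:SS,
# or HH:MM:SS into seconds. awk treats fields such as "08" as decimal.
walltime_to_seconds() {
    printf '%s\n' "$1" | awk -F: '
        NF == 1 { print $1 + 0; next }
        NF == 2 { print $1 * 60 + $2; next }
        NF == 3 { print $1 * 3600 + $2 * 60 + $3; next }
        { exit 1 }'
}

# Hypothetical helper: round a core count up to the next multiple of 8
# (assumed rounding direction).
round_up_ncpus() {
    echo $(( ($1 + 7) / 8 * 8 ))
}

walltime_to_seconds 2:30:00   # prints 9000
round_up_ncpus 12             # prints 16
```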

Note: Multiple -l options can be concatenated into one line. Options must be separated by a comma with no spaces (e.g., -l walltime=1:00:00,ncpus=16,mem=16GB).
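
When a script assembles the resource request from separate values, the comma-with-no-spaces rule is easy to get wrong. The following sketch shows one way to join the pieces; resource_list is a hypothetical name, and the function only builds the string without submitting anything.

```shell
# Hypothetical helper: join resource requests into a single -l argument,
# separated by commas with no spaces. The subshell body keeps the IFS
# change local to the function.
resource_list() (
    IFS=,
    printf '%s\n' "$*"
)

resource_list walltime=1:00:00 ncpus=16 mem=16GB
# prints walltime=1:00:00,ncpus=16,mem=16GB
```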

-o <name>
Writes standard output to <name> instead of <job script>.o$PBS_JOBID. $PBS_JOBID is an environment variable created by PBS that contains the PBS job identifier.
-e <name>
Writes standard error to <name> instead of <job script>.e$PBS_JOBID.
-j {oe|eo}
Combines standard output and standard error into the standard error file (eo) or the standard output file (oe).
-m {a,b,e}
Sends email to the submitter or email address given with -M when the job {aborts,begins,ends}. You may mix options. For example, if you wanted to be alerted when the job begins and aborts, use #PBS -m ab.
-M <address>
Specifies email address to use for -m options.
-N <name>
Sets the job name to <name> instead of the default, which is the name of your batch script.
-S <shell>
Sets the shell to interpret the job script.
-q <queue>
Directs the job to the specified queue. All jobs will be submitted to the computation queue by default. Use this option to specify use of the analysis queue.
-v <var>
Exports the environment variable <var> from the submitting shell into the batch shell.
-W depend=afterok:<jobID>
Holds the job until <jobID> has finished successfully. This should be the full job ID as output by qsub (e.g., 1234.nautilus.nics.utk.edu).
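
The dependency option is often used to chain two submissions. As a rough sketch, the hypothetical helper depend_on below builds the option from a full job ID; the script names in the commented usage are placeholders and do not come from this page.

```shell
# Hypothetical helper: build the dependency option for a given full job ID.
depend_on() {
    printf '%s%s\n' '-W depend=afterok:' "$1"
}

# Possible use on a system with PBS (commented out so the sketch runs anywhere;
# first.pbs and second.pbs are placeholder names):
#   first=$(qsub first.pbs)
#   qsub $(depend_on "$first") second.pbs

depend_on "1234.nautilus.nics.utk.edu"
# prints -W depend=afterok:1234.nautilus.nics.utk.edu
```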

Note: Please do not use the PBS -V option. It can propagate a large number of environment variable settings from the submitting shell into a job, which may cause problems for the batch environment. Instead of using PBS -V, please pass only the necessary environment variables using -v <comma_separated_list_of_needed_envars>. You can also include module load statements in the job script.

Example:
#PBS -v PATH,LD_LIBRARY_PATH,PV_NCPUS,PV_LOGIN,PV_LOGIN_PORT

Further details and other PBS options may be found through the qsub man page.

PBS Environment Variables

There are a few useful environment variables set within PBS jobs:

    $PBS_O_WORKDIR
    • PBS sets the environment variable $PBS_O_WORKDIR to the directory where the batch job was submitted.
    • By default, a job starts in your home directory.
    • Include the following command in your script if you want it to start in the submission directory:
      cd $PBS_O_WORKDIR
    $PBS_JOBID
    • PBS sets the environment variable $PBS_JOBID to the job's ID.
    • For example, if you wanted a unique folder for each job's output files, you may want to do something like the following:
      mkdir $PBS_JOBID
      cd $PBS_JOBID
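
The two variables can be combined so that each job works in its own folder under the submission directory. The sketch below assumes a hypothetical function name, enter_job_dir, and adds fallbacks (the current directory and a local.<pid> name) so that the same script can also be tried outside of PBS; those fallbacks are an assumption for testing and are not set by PBS.

```shell
# Hypothetical helper: change to the submission directory, then create and
# enter a per-job subdirectory named after the job ID.
# The ${VAR:-default} fallbacks are assumptions for running outside PBS.
enter_job_dir() {
    cd "${PBS_O_WORKDIR:-$PWD}" || return 1
    mkdir -p "${PBS_JOBID:-local.$$}" || return 1
    cd "${PBS_JOBID:-local.$$}" || return 1
}

# In a batch script, call it before running any commands:
#   enter_job_dir || exit 1
```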

Example Batch Scripts

Example MPI job


#PBS -N hpl
#PBS -S /bin/bash
#PBS -j oe
#PBS -l ncpus=96
#PBS -l mem=384GB
#PBS -l walltime=24:00:00

cd $HOME/xd-viz/bench/hpl/hpl-2.0/bin/sgi_uv
mpiexec ./xhpl

Example multi-threaded job

#PBS -N stream
#PBS -S /bin/bash
#PBS -j oe
#PBS -l ncpus=96,mem=384GB,walltime=24:00:00

cd $HOME/xd-viz/bench/stream
export OMP_NUM_THREADS=96
./stream-f.pgi

Interactive Batch Jobs

Users are not allowed to run jobs directly on the login nodes. To run an interactive job, users must use a batch-interactive PBS job. This is achieved by using the -I option with the qsub command along with other PBS options.

For interactive batch jobs, PBS options are passed through qsub on the command line.

% qsub -I -A XXXYYY -q analysis -l ncpus=16,mem=64GB,walltime=1:00:00

The options here are:

-I
Start an interactive session.
-A XXXYYY
Charge to the “XXXYYY” project.
-q analysis
Run in the analysis queue.
-l ncpus=16,mem=64GB,walltime=1:00:00
Request 16 compute cores with a memory limit of 64GB for one hour.

Note: For job submission, the only required option is -l ncpus=<n>. If you only specify ncpus then your default account will be used, walltime defaults to 1 hour, and the memory allocation will default to 4000MB per CPU requested. See the Job Accounting page for more information on specifying ncpus and memory on Nautilus.

Once your specified number of cores is available, your interactive session will begin on a compute node. From there, you may execute commands directly instead of through a batch script. To end the interactive job, simply exit from the compute node.

Monitoring Job Status

PBS and Moab provide multiple tools to view queue, system, and job statuses. Below are the most common and useful of these tools.

showq

The Moab utility showq gives a detailed description of the queue. It lists jobs grouped into the following states:

Active
These jobs are currently running, listed in order of expected completion from soonest to latest.
Eligible
These jobs are currently queued awaiting resources, listed in order of priority from highest priority to lowest. A user is allowed five jobs in the eligible state.
Blocked
These jobs are currently queued but are not eligible to run. Common reasons for jobs in this state are jobs on hold, or the owning user currently having five jobs in the eligible state.

You can also run showq with the options -r, -i or -b to show only active, eligible (idle) or blocked jobs respectively.

checkjob

The Moab utility checkjob can be used to view details of a job in the queue. For example, if job 736 is queued in a blocked state, the following can be used to see why it is blocked:

% checkjob 736

The return may contain a line similar to the following:

 BlockMsg: job 736 violates idle HARD MAXJOB limit of 1 running job  
in the computation queue for userid

This line indicates the job is in the blocked state because the owning user has one job already running in the computation queue. The text after the BlockMsg header will vary depending on the reason the job is blocked.
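
To pull just the reason out of a longer checkjob report, a filter along the following lines may help. This is a sketch that assumes the message appears on a line beginning with the BlockMsg header, as in the sample above; block_reason is a hypothetical name, and a message that wraps onto a second line would be captured only up to the line break.

```shell
# Hypothetical helper: read checkjob output on stdin and print only the
# text that follows the BlockMsg header.
block_reason() {
    sed -n 's/^[[:space:]]*BlockMsg:[[:space:]]*//p'
}

# Possible use on a system with Moab (commented out so the sketch runs anywhere):
#   checkjob 736 | block_reason

printf '%s\n' ' BlockMsg: job 736 violates idle HARD MAXJOB limit of 1 running job in the computation queue for userid' | block_reason
```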

showstart

The Moab utility showstart gives an estimate of when the job will start.

% showstart 100315
job 100315 requires 16384 procs for 00:40:00

Estimated Rsv based start in 15:26:41 on Fri Sep 26 23:41:12
Estimated Rsv based completion in 16:06:41 on Sat Sep 27 00:21:12

Since the start time may change dramatically as new jobs with higher priority are submitted, you may need to rerun the command periodically.
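
If you check the estimate repeatedly, it can be handy to reduce the output to a single line. The sketch below assumes the output has exactly the layout of the sample above; start_estimate is a hypothetical name, and other Moab versions may format the line differently.

```shell
# Hypothetical helper: read showstart output on stdin and print the estimated
# start as "<date> (in <time remaining>)". Assumes the sample layout above.
start_estimate() {
    sed -n 's/^[[:space:]]*Estimated Rsv based start in \([^ ]*\) on \(.*\)$/\2 (in \1)/p'
}

# Possible use on a system with Moab (commented out so the sketch runs anywhere):
#   showstart 100315 | start_estimate

printf '%s\n' 'Estimated Rsv based start in 15:26:41 on Fri Sep 26 23:41:12' | start_estimate
# prints Fri Sep 26 23:41:12 (in 15:26:41)
```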

qstat

The PBS command qstat -a checks the status of submitted jobs from PBS's perspective. Unlike showq, qstat does not know about scheduling, only the current status.

% qstat -a 

nautilus.nics.utk.edu: 
                                                              Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
123.nautilus.ni user1    analysis a.out          --     1  --     -- 00:01 Q    --


The job's current status is given by S (second column from the right):

Status value   Meaning
E              Exiting after having run
H              Held
Q              Queued, eligible to run
R              Running
S              Suspended
T              Being moved to new location
W              Waiting for its execution time
C              Recently completed (within the last 5 minutes)
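
In a script, the status letter can be read from the second-to-last column of a job's row and translated with the table above. The following is a sketch with hypothetical function names (job_state, state_meaning); it assumes rows shaped like the qstat -a sample shown earlier.

```shell
# Hypothetical helper: read qstat -a output on stdin and print the status
# letter (second column from the right) for the row whose Job ID is the
# given ID or starts with that ID followed by a dot.
job_state() {
    awk -v id="$1" '$1 == id || index($1, id ".") == 1 { print $(NF - 1) }'
}

# Hypothetical helper: translate a status letter using the table above.
state_meaning() {
    case $1 in
        E) echo "Exiting after having run" ;;
        H) echo "Held" ;;
        Q) echo "Queued, eligible to run" ;;
        R) echo "Running" ;;
        S) echo "Suspended" ;;
        T) echo "Being moved to new location" ;;
        W) echo "Waiting for its execution time" ;;
        C) echo "Recently completed (within the last 5 minutes)" ;;
        *) echo "Unknown status: $1" >&2; return 1 ;;
    esac
}

# Possible use on a system with PBS (commented out so the sketch runs anywhere):
#   state_meaning "$(qstat -a | job_state 123)"

state_meaning Q   # prints Queued, eligible to run
```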

Making Changes After Job Submission

Removing a Job from the Queue

Jobs in the queue in any state can be stopped and removed from the queue using the command qdel.

For example, to remove a job with a PBS ID of 1234, use the following command:

% qdel 1234

To remove all of your jobs from the system (in any state) use:

% qdel all

More details on the qdel utility can be found through the qdel man page.

Holding and Releasing Queued Jobs

Jobs in the queue in a non-running state may be placed on hold using the qhold command. Jobs placed on hold will not be removed from the queue, but they will not be eligible for execution.

For example, to move a currently queued job with a PBS ID of 1234 to a hold state, use the following command:

% qhold 1234

Once on hold, the job will not be eligible to run until it is released back to a queued state. The qrls command can be used to remove a job from the held state.

For example, to release job 1234 from a held state, use the following command:

% qrls 1234

More details on the qhold and qrls utilities can be found through their respective man pages.

Modifying Options for Jobs

Only non-running (queued or held) jobs can be modified with the PBS qalter command. For example, this command can be used to:

  • modify the job's name,
    % qalter -N <newname> <jobid>
  • modify the number of requested cores,
    % qalter -l ncpus=<n> <jobid>
  • or modify the job's wall time.
    % qalter -l walltime=<hh:mm:ss> <jobid>

Notes:

  • Please use the qstat -a <jobid> command to verify the changes afterward.
  • You cannot specify a new walltime that exceeds the maximum walltime of the queue your job is in.
  • If you need to modify a running job, please contact us. Certain alterations can only be performed by NICS operators.