The National Institute for Computational Sciences

Running Jobs


Introduction


When you log in, you will be directed to one of the login nodes. The login nodes should only be used for basic tasks such as file editing, code compilation, and job submission. Please do not run production jobs on the login nodes. If you submit a production job on a login node, it will be administratively terminated. Instead, use the ACF’s compute resources for production jobs. In this document, you will learn how to execute, monitor, and modify jobs on ACF resources.

Job Submission


Batch Scripts

Batch scripts are used to submit non-interactive jobs to the ACF. A batch script bundles a group of commands that are run through the queue, with the results written out for later review.

All non-interactive jobs must be submitted to the ACF as job scripts via the qsub command. Batch scripts are shell scripts that contain PBS flags and commands to be interpreted by the shell. The batch script is submitted to the system’s resource manager to be parsed, queued, and executed. At the time of this writing, up to 2500 non-interactive jobs may be submitted by a single user at one time.
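
For example, once a batch script is written, it is submitted to the queue with qsub. The script name below is a placeholder:

qsub myscript.pbs

Upon submission, qsub prints the identifier assigned to the job; this identifier is used by the monitoring and modification commands described later in this document.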

To write a batch script, use your preferred text editor, such as vim, nano, or emacs. Once the editor is open, insert the following information in the order presented. Deviating from this order may produce undesirable results.

  1. First, specify the script’s interpreter. If you do not specify the interpreter, the system will use its default shell. At the time of this writing, the default shell on the ACF is bash. The syntax for this line appears in Table 2.1. Any of the interpreters listed in the table may be used, though only one interpreter may be specified per batch script.

     Table 2.1 - Options for the script interpreter
     #!/bin/sh        #!/bin/bash
     #!/usr/bin/csh   #!/usr/bin/ksh
  2. Second, specify the necessary PBS submission options. Each option must be preceded by #PBS.

  3. Third, specify the shell commands. These commands are the executable content of the batch script and must follow the final #PBS option. It is best to execute cd $PBS_O_WORKDIR as the first command in the script so that your job runs within Lustre space. Additionally, you may use the mpirun command to specify how many MPI ranks your job will use. In Figure 2.1, mpirun launches 48 MPI ranks that are spread across the nodes in groups of 12 via the -ppn option. The ./a.out target is the file that the job will execute. For more information on MPI and the ranking process, review some of the ACF’s presentations on its usage.

Figure 2.1 depicts a basic batch script. The #PBS options will be discussed in a later section.

#!/bin/bash
#PBS -A ACF-UTK0011
#PBS -l nodes=4,walltime=2:00:00

cd $PBS_O_WORKDIR
mpirun -n 48 -ppn 12 ./a.out
Figure 2.1 - Basic Batch Script

Altering Batch Jobs

After you submit a job, you may need to modify it. Several commands facilitate these modifications. If you need to modify a running job, please contact the ACF staff. Certain modifications can only be performed by administrators. For further information on the commands presented here, use man <commandname> on the ACF for detailed documentation on the command and the options you may use with it.

Remove a Job from the Queue

A job in any state can be stopped and removed from the queue using the qdel command. For example, qdel 1234 would remove the job with that identifier. Note that job identifiers can be viewed with the qstat -a command.

Hold a Queued Job

Jobs in the queue that are not running may be placed on hold using the qhold command. For example, to move the job with the identifier of 1234 into a hold state, use qhold 1234. Jobs placed on hold remain in the queue, but they will not be executed.

Release a Held Job

When you place a job on hold, it will not execute until it is released. To release a job in a held state, use the qrls command. For example, to release the job with the identifier of 1234 from a held state, use qrls 1234.

Modify Job Details

Non-running jobs and jobs in a held state can be modified with the qalter command. The various uses of this command are presented in Table 2.2. For walltime modifications, note that you cannot specify a new walltime that exceeds the maximum walltime of the queue in which your job resides.

Table 2.2 - qalter options
Option               Argument(s)               Purpose
qalter -N            <newname> <jobid>         Modifies a job's name
qalter -l nodes      <numnodes> <jobid>        Modifies the number of requested nodes
qalter -l walltime   <hh:mm:ss> <jobid>        Modifies the job's walltime
qalter -W depend     type:argument <jobid>     Sets dependencies for a job
qalter -W depend     type <jobid>              Removes dependencies from a job

To verify that the changes to your job completed successfully, use qstat -a <jobid>.
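
For example, the following commands would rename a hypothetical job 1234, extend its walltime, and verify the changes (the job ID, name, and walltime value are illustrative):

qalter -N newjobname 1234
qalter -l walltime=02:30:00 1234
qstat -a 1234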

Interactive Batch Jobs

Interactive batch jobs allow users to directly manipulate compute resources. One common use for interactive batch jobs is debugging. This section demonstrates how to run interactive jobs and provides common usage tips.

Users are not allowed to run interactive jobs from login nodes. If you submit an interactive job on a login node, it will be administratively terminated. Instead, run your interactive jobs with the qsub -I command. Figure 2.2 shows the syntax used to run an interactive job. Table 2.3 explains the interactive submission options.

qsub -I -A UT-NTNL0121 -l nodes=1,walltime=1:00:00
Figure 2.2 - Syntax for an Interactive Job

Table 2.3 - Options for Interactive Jobs
Option   Argument(s)   Description
-I N/A Start an interactive session
-A <account> Charge the job to the specified account
-l nodes <numnodes> Request the specified number of nodes

After running this command, you must wait until enough compute nodes are available. Once the job starts, the standard input and standard output of your terminal are linked directly to the head node of the allocated resource. Place the executable on the same line, after the mpirun command. Figure 2.3 provides an example of what to type once the interactive job starts.

cd /lustre/haven/$USER
mpirun -n 16 ./a.out
Figure 2.3 - Commands to Run When the Interactive Job Begins

Issuing the exit command will end the interactive job.

PBS Usage


This section gives an overview of common PBS options. Table 3.1 lists the PBS options necessary to run a job. Table 3.2 lists PBS options that might be useful to your job.

PBS Options

Table 3.1 - Necessary PBS Options
Option   Usage   Description
A #PBS -A <account> Causes the job time to be charged to <account>. The account string is typically composed of three letters followed by three digits and optionally followed by a subproject identifier. The utility showusage can be used to list your valid assigned project ID(s). This is the only option required by all jobs.
l #PBS -l nodes=<numnodes> Number of requested nodes.
  #PBS -l walltime=<time> Maximum wall-clock time. <time> is in HH:MM:SS format. The default walltime is one hour.
Table 3.2 - Useful PBS Options
Option   Usage   Description
o #PBS -o <name> Writes standard output to <name> instead of <job script>.o$PBS_JOBID. $PBS_JOBID is an environment variable created by PBS that contains the PBS job identifier.
e #PBS -e <name> Writes standard error to <name> instead of <job script>.e$PBS_JOBID.
j #PBS -j {oe,eo} Combines standard output and standard error into the standard error file (eo) or the standard output file (oe).
m #PBS -m a Sends email to the submitter when the job aborts.
  #PBS -m b Sends email to the submitter when the job begins.
  #PBS -m e Sends email to the submitter when the job ends.
M #PBS -M <address> Specifies email address to use for -m options.
N #PBS -N <name> Sets the job name to <name> instead of the name of the job script.
S #PBS -S <shell> Sets the shell to interpret the job script.
q #PBS -q <queue> Directs the job to run in the specified queue. This option is not required to run in the default queue.
l #PBS -l feature=<feature> Select the desired node feature set.
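
As an illustration, several of these options can be combined in a single script header. The following is a minimal sketch; the job name, email address, node count, and walltime are placeholders:

#!/bin/bash
#PBS -A ACF-UTK0011
#PBS -l nodes=2,walltime=4:00:00
#PBS -N mysim
#PBS -j oe
#PBS -o mysim.$PBS_JOBID
#PBS -m abe
#PBS -M user@example.edu

cd $PBS_O_WORKDIR
mpirun -n 32 ./a.out

Here, -m abe combines the a, b, and e mail options so that email is sent when the job aborts, begins, and ends, and -j oe merges standard error into the output file named by -o.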

Please do not use the PBS -V option. It can propagate large numbers of environment variable settings from the submitting shell into a job, which may cause problems for the batch environment. Instead of using PBS -V, pass only the necessary environment variables with -v <comma_separated_list_of_needed_envars>. You can also include module load statements in the job script. Figure 3.1 shows an example of this process.

#PBS -v PATH,LD_LIBRARY_PATH,PV_NCPUS,PV_LOGIN,PV_LOGIN_PORT
Figure 3.1 - Using PBS -v for Environment Variables

Environment Variables

This section gives an overview of useful environment variables within PBS jobs. These variables contain useful information that can simplify your batch scripts. Table 3.3 lists and describes these variables.

Table 3.3 - PBS Environment Variables
Variable   Description
$PBS_O_WORKDIR   The directory from which the batch job was submitted.
$PBS_JOBID   The job's identifier.
$PBS_NNODES   The number of logical cores requested by the job.

In situations where you wish to append a job’s ID to the standard output and standard error files, the $PBS_JOBID variable is useful. For example, placing the PBS option shown in Figure 3.2 in a batch script would write the job’s output to the specified file.

#PBS -o scriptname.$PBS_JOBID
Figure 3.2 - Using $PBS_JOBID to redirect standard output to a separate file

The $PBS_NNODES variable is most useful with mpirun. For example, rather than manually specifying the number of cores yourself, you could use the command shown in Figure 3.3.

mpirun -n $PBS_NNODES ./a.out
Figure 3.3 - Using $PBS_NNODES with mpirun

Job Monitoring


Before and during job execution, you may wish to monitor the job’s status. The ACF features several commands that enable such monitoring. Several of these commands are listed and described below.

To view the status of your submitted jobs, use the qstat -a command. The output of the command shown in Figure 4.1 is explained in Table 4.1. Additionally, the possible statuses for a job are listed in Table 4.2.

> qstat -a

ocoee.nics.utk.edu:
Job ID    Username   Queue   Jobname   SessID   NDS   TSK   Memory   Time       S   Time
------    --------   -----   -------   ------   ---   ---   ------   --------   -   --------
102903    lucio      batch   STDIN     9317     --    16    --       01:00:00   C   00:06:17
102904    lucio      batch   STDIN     9590     --    16    --       01:00:00   R   --
>
Figure 4.1 - Output of qstat -a

Table 4.1 - Explanation of qstat -a Output
Column   Description
Job ID The first column gives the PBS-assigned job identifier.
Username The second column gives the submitting user's login name.
Queue The third column gives the queue into which the job has been submitted.
Jobname The fourth column gives the PBS job name, as specified by the PBS -N option in the batch script. If the -N option is not used, PBS uses the name of the batch script.
SessID The fifth column gives the associated session ID.
NDS The sixth column gives the PBS node count.
Tasks The seventh column gives the number of logical cores requested by the job's -size option.
Requested Memory The eighth column gives the job's requested memory.
Requested Time The ninth column gives the job's requested wall time.
S (Status) The tenth column gives the job's current status. See Table 4.2 for status types.
Elapsed Time The eleventh column gives the job's time spent in a running state. If the job is not currently running and has never entered the run state, this field is blank.
Table 4.2 - Status Values for Jobs
Status   Description
E The job has finished running and is exiting.
H The job is being held.
Q The job is queued.
R The job is running.
S The job is suspended.
T The job is being transferred to a new location.
W The job is waiting for execution.
C The job was completed within the last five minutes.

To determine the current state of your submitted jobs, use the showq utility. Table 4.3 shows the possible states of your jobs.

Table 4.3 - Possible Job States
State   Description
Running   The jobs are currently running.
Idle   These jobs are queued and awaiting resource assignment by the scheduler. A user is allowed five jobs in the Idle state for consideration by the scheduler.
Blocked   Blocked jobs are ineligible for consideration by the scheduler, commonly because the requested resources are unavailable or because the user or system has placed a hold on the job.
BatchHold   These jobs are in the queue but are held from consideration by the scheduler, usually because the requested resources are not available in the system or because the resource manager has repeatedly failed in attempts to start the job.

To see the status of a specific job in the queue, use the checkjob utility. For example, using checkjob 1234 would return the status of the job with the identifier of 1234. This can be helpful to determine whether a job is blocked and the reason why.

To determine when a submitted job will start, use the showstart utility. For example, using showstart 1234 would return the estimated start time of the job with the identifier of 1234. Note that the start time is subject to dramatic change, so periodically rerun the command to get a clearer picture of when the job will start.

If you have a short job you wish to run on the ACF, you can view the current backfill with the showbf utility. Backfilling allows smaller, shorter jobs to use otherwise idle resources.
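
Running the utility with no arguments reports the resources and durations currently available for backfill; see man showbf for options that narrow the query:

showbf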

Scheduling Policy


Several factors influence the priority of a given job. The major factors are listed and described below.

  1. Jobs that request more nodes get a higher priority.

  2. Priority increases along with a job’s queue wait time. Blocked jobs are not counted because the scheduler does not see them as queued.

  3. The number of jobs submitted by a user influences the priority of those jobs. At the time of this writing, a user may have only ten jobs executing at a time. Single-core jobs submitted by the same user are generally scheduled on the same node. Users on the same project can share nodes with written permission from the PI.

In certain cases, the priority of a job may be manually increased upon request. To request a priority change for one of your jobs, please contact user assistance. They will need the job ID and reason to submit the request.

Condos


The ACF uses a condo-based model for scheduling. In a condo-based model, nodes are grouped into logical units consisting of several compute nodes that are scheduled as an independent cluster. Condos are provided for institutional or individual investments. All faculty, staff, and students will have access to an institutional condo provided by funding from their respective institution. Individual investors will be provided exclusive access to a given number of nodes commensurate with their investment level. Investor projects will have exclusive use of the nodes in their condo.

Queues, partitions, features, and quality-of-service attributes control access to condos. In most cases, the project ID will place a job in the correct condo, and no other attributes are needed.

Queues

Queues are used by the scheduler to aid in the organization of jobs. There are currently two queues: batch and debug. By default, all jobs are submitted to the batch queue, and users do not have to indicate that they wish to run in it.

The ACF has set aside four Rho nodes for debug jobs. Debug jobs are limited to one hour of walltime. To access the debug queue, add #PBS -q debug to your batch script for non-interactive jobs and -q debug for interactive jobs.
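
For example, an interactive debug session on one node could be requested as follows (the project account shown is the institutional account from Figure 2.1 and is illustrative):

qsub -I -A ACF-UTK0011 -q debug -l nodes=1,walltime=1:00:00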

Partitions

Partitions are used to group similar nodes together. Nodes in the ACF are grouped into the partitions listed below. Note that the UTK Institutional Condo uses the general, beacon, and rho partitions by default while individual condos use the general partition.

  • general (consists of skylake, sigma, and sigma_bigcore nodes)
  • beacon
  • beacon_gpu
  • rho
  • monster
  • knl
  • skylake_volta

To request a partition other than the default, use the #PBS -l partition=<partition> option in your batch script for non-interactive jobs and -l partition=<partition> for interactive jobs.
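
For instance, a batch job could request nodes in the beacon partition with the following lines (the node count and walltime are illustrative):

#PBS -l nodes=2,walltime=2:00:00
#PBS -l partition=beacon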

Features

Features are attributes that apply to individual nodes and can be used to explicitly request nodes of a particular kind. For example, to request monster nodes, use the #PBS -l feature=monster option in your batch script. For interactive jobs, use -l feature=<feature>.

QoS (Quality of Service)

Jobs are assigned a quality of service, or QoS, attribute. Jobs are given a specific QoS based on investment type. These different types are listed below. Table 6.1 provides the limitations placed on the various investment types.

  • Condo: instructs the scheduler to place a job in an individual condo; the default QoS for individual condos
  • Overflow: instructs the scheduler to place a job first in an individual condo and then overflow into the user's institutional condo
  • Campus: instructs the scheduler to place a job in the institutional condo; the default QoS for the institutional condo

To change the default QoS, use the #PBS -l qos=<qos> option in a batch script for non-interactive jobs. For interactive jobs, use -l qos=<qos>.

Table 6.1 - Investment Types and Limitations
QoS   Min. Size   Max. Size   Wall Clock Limit
Condo 1 Node Condo Max. 28 Days
Campus 1 Node 24 Nodes 24 Hours
Overflow 1 Node 24 Nodes 24 Hours

Job Chaining


There may be occasions where you wish to chain your job submissions; for instance, you may need to complete a full simulation without constantly resubmitting jobs. The ACF supports these operations through batch scripts, which allow for effectively longer walltimes and semi-automation of non-interactive jobs. Do note that your simulation should have checkpoints and be prepared for production runs before you attempt to run it in a job chain.

Before presenting the details of job chaining, it is essential to understand the potential dangers of this process. First, if you fail to monitor your jobs, important results could be lost. Second, submitting too many jobs to the system could cause instability on the ACF; if that happened, your jobs would be terminated, and a submission stop would likely be placed on your account. To mitigate these issues, first run your jobs without chaining, then run a short job chain to ensure everything operates as you intend. It is also advisable to read this entire document and understand the job submission process before attempting to run a job chain on the ACF.

Job dependencies facilitate job chaining. The #PBS -W depend=<dependency> option in a batch script allows you to specify a dependency type, followed by the identifier(s) of the job(s) to which the dependency applies. Table 7.1 lists the dependency types available in PBS. Figure 7.1 shows a sample usage of the before dependency type.

Table 7.1 - PBS Dependency Types
Dependency   Description
after Execute current job after listed jobs have begun.
afterok Execute current job after listed jobs have terminated without error.
afternotok Execute current job after listed jobs have terminated with an error.
afterany Execute current job after listed jobs have terminated for any reason.
before Listed jobs can be run after current job begins execution.
beforeok Listed jobs can be run after current job terminates without error.
beforenotok Listed jobs can be run after current job terminates with an error.
beforeany Listed jobs can be run after current job terminates for any reason.

#PBS -W depend=before:1187723:1187724
Figure 7.1 - The "before" PBS dependency type, followed by job identifiers

Once the initial submission script has been configured to your specifications, create another batch script using your desired text editor. After you create the script, give it execution privileges with the chmod u+x <scriptname>.sh command. Two examples of these additional batch scripts appear in Figures 7.2 and 7.3.

#!/bin/bash
one=$(qsub calc1.pbs)
echo $one
two=$(qsub -W depend=afterok:$one calc2.pbs)
echo $two
Figure 7.2 - Flat chain script

In a flat chain, a sequential series of calculations is executed. Additional jobs can be added by continuing the number sequence, as shown in the sketch below. To execute this script, type the ./<scriptname>.sh command.
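
For example, a third calculation could be appended to the script in Figure 7.2 by continuing the pattern (calc3.pbs is a hypothetical script name):

three=$(qsub -W depend=afterok:$two calc3.pbs)
echo $three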

#!/bin/bash
# Submit the first job and capture its identifier.
one=$(qsub submit.pbs)
echo $one
# Submit runs 2 through 4, each dependent on the previous run's success.
for id in $(seq 2 4); do
    two=$(qsub -W depend=afterok:$one submit.pbs)
    one=$two
done
Figure 7.3 - Looped chain script

In a looped chain, a single job is resubmitted multiple times. The script in Figure 7.3 submits the submit.pbs job four times in total, with each resubmission running only if the previous run completed successfully. Be aware of how many jobs you submit using this method. Execution is the same for the looped chain as it is for the flat chain.