The National Institute for Computational Sciences

Running Jobs

General Information


When you log into Beacon, you will be directed to one of the login nodes. The login nodes should only be used for basic tasks such as file editing, code compilation, and job submission.

The login nodes should not be used to run production jobs. Production work should be performed on the system's compute resources. Serial jobs (pre- and post-processing, etc.) may be run on the compute nodes; for one or more single-processor jobs, please refer to the Job Execution section for more information. Access to compute resources is managed by Torque (a PBS-like system). Job scheduling is handled by Moab, which interacts with Torque and system software.

This page provides information for getting started with the batch facilities of Torque with Moab, as well as basic job execution. Sometimes you may want to chain your submissions to complete a full simulation without needing to resubmit manually; you can read about this here.

Batch Scripts


Batch scripts can be used to run a set of commands on a system's compute partition. Batch scripts allow users to run non-interactive batch jobs, which are useful for submitting a group of commands, allowing them to run through the queue, and then viewing the results. However, it is sometimes useful to run a job interactively (primarily for debugging purposes). Please refer to the Interactive Batch Jobs section for more information on how to run batch jobs interactively.

All non-interactive jobs must be submitted on Beacon using job scripts via the qsub command. The batch script is a shell script containing PBS flags and commands to be interpreted by a shell. The batch script is submitted to the resource manager, Torque, where it is parsed. Based on the parsed data, Torque places the script in the queue as a job. Once the job makes its way through the queue, the script will be executed on the head node of the allocated resources.

All job scripts start with an interpreter line, followed by a series of #PBS declarations that describe requirements of the job to the scheduler. The rest is a shell script, which sets up and runs the executable.

Batch scripts are divided into the following three sections:

  1. Shell interpreter (one line)
    • The first line of a script can be used to specify the script's interpreter.
    • This line is optional.
    • If not used, the submitter's default shell will be used.
    • The line uses the syntax #!/path/to/shell, where the path to the shell may be
      • /usr/bin/csh
      • /usr/bin/ksh
      • /bin/bash
      • /bin/sh
  2. PBS submission options
    • The PBS submission options are preceded by #PBS, making them appear as comments to a shell.
    • PBS will look for #PBS options in a batch script from the script's first line through the first non-comment line. A comment line begins with #.
    • #PBS options entered after the first non-comment line will not be read by PBS.
  3. Shell commands
    • The shell commands follow the last #PBS option and represent the executable content of the batch job.
    • If any #PBS lines follow executable statements, they will be treated as comments only. The exception to this rule is shell specification on the first line of the script.
    • The execution section of a script will be interpreted by a shell and can contain multiple lines of executables, shell commands, and comments.
    • During normal execution, the batch script will end and exit the queue after the last line of the script.

The following example shows a typical job script that includes the minimal requirements to submit a parallel job that executes ./a.out on 96 cores, charged to the fictitious account UT-NTNL0121 with a wall clock limit of one hour and 35 minutes:

#PBS -S /bin/bash
#PBS -A UT-NTNL0121
#PBS -l nodes=6,walltime=01:35:00

cd $PBS_O_WORKDIR
mpirun -n 96 ./a.out

Jobs should be submitted from within a directory in the Lustre file system. It is best to always execute cd $PBS_O_WORKDIR as the first command. Please refer to the PBS Environment Variables section for further details.

For more complex job scripts, consult the documentation that describes the full set of PBS options.

Unless otherwise specified, your default shell interpreter will be used to execute shell commands in job scripts. If the job script should use a different interpreter, then specify the correct interpreter using:

 #PBS -S /bin/XXXX

Altering Batch Jobs


This section shows how to remove or alter batch jobs.

Remove Batch Job from the Queue

Jobs in the queue in any state can be stopped and removed from the queue using the command qdel.

For example, to remove a job with a PBS ID of 1234, use the following command:

> qdel 1234

More details on the qdel utility can be found on the qdel man page.

Hold Queued Job

Jobs in the queue in a non-running state may be placed on hold using the qhold command. Jobs placed on hold will not be removed from the queue, but they will not be eligible for execution.

For example, to move a currently queued job with a PBS ID of 1234 to a hold state, use the following command:

> qhold 1234

More details on the qhold utility can be found on the qhold man page.

Release Held Job

Once on hold the job will not be eligible to run until it is released to return to a queued state. The qrls command can be used to remove a job from the held state.

For example, to release job 1234 from a held state, use the following command:

> qrls 1234

More details on the qrls utility can be found on the qrls man page.

Modify Job Details

Non-running (or on hold) jobs can only be modified with the qalter PBS command. For example, this command can be used to:

Modify the job's name:

$ qalter -N <newname> <jobid>

Modify the number of requested nodes:

$ qalter -l nodes=<NumNodes> <jobid>

Modify the job's wall time:

$ qalter -l walltime=<hh:mm:ss> <jobid>

Set the job's dependencies:

$ qalter -W depend=type:argument <jobid>

Remove a job's dependency (omit :argument):

$ qalter -W depend=type <jobid>

Notes:

  • Use qstat -f <jobid> to gather all the information about a job, including job dependencies.
  • Use qstat -a <jobid> to verify the changes afterward.
  • Users cannot specify a new walltime that exceeds the maximum walltime of the queue where the job resides.
  • If you need to modify a running job, please contact us. Certain alterations can only be performed by NICS operators.
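Dependencies like these are also what let you chain submissions so that one job starts only after another finishes. The sketch below is purely illustrative (the job ID and script name are hypothetical); it only builds and prints the command you would run:

```shell
# Hypothetical job ID as printed by qsub when the first job was submitted.
JOB1="1234.ocoee.nics.utk.edu"
# afterok: release the second job only if the first exits successfully.
DEP="depend=afterok:$JOB1"
echo "qsub -W $DEP step2.pbs"
# prints: qsub -W depend=afterok:1234.ocoee.nics.utk.edu step2.pbs
```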

Interactive Batch Jobs


Interactive batch jobs give users interactive access to compute resources. A common use for interactive batch jobs is debugging. This section demonstrates how to run interactive jobs through the batch system and provides common usage tips.

Users are not allowed to run interactive jobs from login nodes. A batch-interactive PBS job is started by using the -I option with qsub. After the interactive job starts, the user should run computationally intensive applications from the Lustre scratch space and place the executable after the mpirun command.

Interactive Batch Example

For interactive batch jobs, PBS options are passed through qsub on the command line. Refer to the following example:

qsub -I -A UT-NTNL0121 -l nodes=1,walltime=1:00:00

Option                         Description
-I                             Start an interactive session
-A                             Charge to the "UT-NTNL0121" project
-l nodes=1,walltime=1:00:00    Request 1 physical compute node (16 cores) for one hour

After running this command, you will have to wait until enough compute nodes are available, just as in any other batch job. However, once the job starts, the standard input and standard output of this terminal will be linked directly to the head node of our allocated resource. The executable should be placed on the same line after the mpirun command, just like it is in the batch script.


> cd /lustre/medusa/$USER
> mpirun -n 16 ./a.out

Issuing the exit command will end the interactive job.

Common PBS Options


This section gives a quick overview of common PBS options.

Necessary PBS options

Option   Use                       Description
A        #PBS -A <account>         Causes the job time to be charged to <account>. The account string is typically composed of three letters followed by three digits and optionally followed by a subproject identifier. The utility showusage can be used to list your valid assigned project ID(s). This is the only option required by all jobs.
l        #PBS -l nodes=<nodes>     Number of requested nodes.
         #PBS -l walltime=<time>   Maximum wall-clock time. <time> is in the format HH:MM:SS. Default is 1 hour.

Other PBS Options

Option   Use                  Description
o        #PBS -o <name>       Writes standard output to <name> instead of <job script>.o$PBS_JOBID. $PBS_JOBID is an environment variable created by PBS that contains the PBS job identifier.
e        #PBS -e <name>       Writes standard error to <name> instead of <job script>.e$PBS_JOBID.
j        #PBS -j {oe,eo}      Combines standard output and standard error into the standard error file (eo) or the standard output file (oe).
m        #PBS -m a            Sends email to the submitter when the job aborts.
         #PBS -m b            Sends email to the submitter when the job begins.
         #PBS -m e            Sends email to the submitter when the job ends.
M        #PBS -M <address>    Specifies the email address to use for -m options.
N        #PBS -N <name>       Sets the job name to <name> instead of the name of the job script.
S        #PBS -S <shell>      Sets the shell to interpret the job script.
q        #PBS -q <queue>      Directs the job to the specified queue. This option is not required to run in the general production queue.

Note:  Please do not use the PBS -V option. This can propagate large numbers of environment variable settings from the submitting shell into a job which may cause problems for the batch environment. Instead of using PBS -V, please pass only necessary environment variables using -v <comma_separated_list_of_ needed_envars>. You can also include module load statements in the job script.

Example:

#PBS -v PATH,LD_LIBRARY_PATH,PV_NCPUS,PV_LOGIN,PV_LOGIN_PORT

Further details and other PBS options may be found using the man qsub command.

PBS Environment Variables


This section gives a quick overview of useful environment variables set within PBS jobs.

  • PBS_O_WORKDIR
    • PBS sets the environment variable PBS_O_WORKDIR to the directory from which the batch job was submitted.
    • By default, a job starts in your home directory. Often, you will want to run cd $PBS_O_WORKDIR to move back to the directory from which you submitted. The current working directory when you start mpirun should be on Lustre space.

Include the following command in your script if you want it to start in the submission directory:

cd $PBS_O_WORKDIR

  • PBS_JOBID
    • PBS sets the environment variable PBS_JOBID to the job's ID.
    • A common use for PBS_JOBID is to append the job's ID to the standard output and error file(s).

Include the following command in your script to append the job's ID to the standard output and error file(s)

#PBS -o scriptname.o$PBS_JOBID

  • PBS_NNODES
    • PBS sets the environment variable PBS_NNODES to the number of logical cores requested (not nodes). Given that Beacon has 16 physical cores per node, the number of nodes would be given by $PBS_NNODES/16.
    • For example, a standard MPI program is generally started with mpirun -n $PBS_NNODES ./a.out. See the Job Execution section for more details.
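The node-count arithmetic can be sketched as follows (the PBS_NNODES value is hard-coded here for illustration; inside a real job PBS sets it for you):

```shell
# PBS would set this inside the job; 96 logical cores = 6 nodes x 16 cores.
PBS_NNODES=96
# Beacon has 16 physical cores per node.
NODES=$(( PBS_NNODES / 16 ))
echo "$NODES"   # prints 6
```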


Monitoring Job Status


This page lists some ways to monitor jobs in the batch queue. PBS and Moab provide multiple tools to view the queues, system, and job status. Below are the most common and useful of these tools.

qstat

Use qstat -a to check the status of submitted jobs.


> qstat -a

ocoee.nics.utk.edu:
Job ID   Username  Queue  Jobname  SessID  NDS  TSK  Memory  Time      S  Time
-------  --------  -----  -------  ------  ---  ---  ------  --------  -  --------
102903   lucio     batch  STDIN    9317    --   16   --      01:00:00  C  00:06:17
102904   lucio     batch  STDIN    9590    --   16   --      01:00:00  R  --
>

The qstat output shows the following:

Job ID The first column gives the PBS-assigned job ID.
Username The second column gives the submitting user's login name.
Queue The third column gives the queue into which the job has been submitted.
Jobname The fourth column gives the PBS job name, as specified by the PBS -N option in the batch script. If the -N option is not used, PBS uses the name of the batch script.
SessID The fifth column gives the associated session ID.
NDS The sixth column gives the PBS node count. This value is not accurate; it will always show one.
Tasks The seventh column gives the number of logical cores requested by the job.
Req’d Memory The eighth column gives the job's requested memory.
Req’d Time The ninth column gives the job's requested wall time.
S The tenth column gives the job's current status. See the status listings below.
Elap Time The eleventh column gives the job's time spent in a running state. If the job is not currently running and has never run, the field will be blank.

The job's current status is reported by the qstat command. The possible values are listed in the table below.

Status value

Meaning

E Exiting after having run
H Held
Q Queued
R Running
S Suspended
T Being moved to new location
W Waiting for its execution time
C Recently completed (within the last 5 minutes)

showq

The Moab showq utility gives a different view of jobs in the queue. The utility will show jobs in the following states:

Active These jobs are currently running.
Eligible These jobs are currently queued awaiting resources. A user is allowed five jobs in the eligible state.
Blocked These jobs are currently queued but are not eligible to run. Common reasons for jobs in this state are jobs on hold and the owner currently having five jobs in the eligible state.

checkjob

The Moab checkjob utility can be used to view details of a job in the queue. For example, if job 736 is currently in a blocked state, the following can be used to view the reason:

> checkjob 736

The return may contain a line similar to the following:

BLOCK MSG: job 736 violates idle HARD MAXIJOB limit of 5 for user <your_username>  partition ALL (Req: 1  InUse: 5) 

This line indicates the job is in the blocked state because the owning user has reached the limit of five jobs currently in the eligible state.

showstart

The Moab showstart utility gives an estimate of when the job will start.

> showstart 100315
job 100315 requires 16384 procs for 00:40:00

Estimated Rsv based start in 15:26:41 on Fri Sep 26 23:41:12
Estimated Rsv based completion in 16:06:41 on Sat Sep 27 00:21:12

The start time may change dramatically as new jobs with higher priority are submitted, so you need to periodically rerun the command.

showbf

The Moab showbf utility gives the current backfill. This can help you create a job which can be backfilled immediately. As such, it is primarily useful for short jobs.

Scheduling Policy


Beacon uses TORQUE and Moab to schedule jobs. NICS is constantly reviewing the scheduling policies in order to adapt and better serve users.

The scheduler gives preference to large core count jobs. Moab is configured to do “first fit” backfill. Backfilling allows smaller, shorter jobs to use otherwise idle resources.

Users can alter certain attributes of queued jobs until they start running. The order in which jobs are run depends on the following factors:

  • number of cores requested - jobs that request more cores get a higher priority.
  • queue wait time - a job's priority increases with the time it has waited to run.
  • account balance - jobs that use an account with a negative balance will have significantly lowered priority.
  • number of jobs - a maximum of five jobs per user, at a time, will be eligible to run. The rest will be blocked.

In certain special cases, the priority of a job may be manually increased upon request. To request a priority change, contact NICS User Support with the job ID and the reason for the request.

More detailed information can be found in the Queues section.

Queues


Queues are used by the batch scheduler to aid in the organization of jobs. This section lists the available queues on Beacon. An individual user may have up to 5 jobs eligible to start at any one time (regardless of how many jobs may already be running), while a project may have a total of 10 jobs eligible to run across all the users charging against that project. Jobs in excess of these limits will not be considered for execution. Additionally, users are limited to 25 simultaneous running jobs and projects are limited to 40 simultaneous running jobs.

For example, if you submit 12 jobs, 5 would be eligible, and 7 would be blocked (with an "Idle" state). If three of the jobs run, some blocked jobs will be released so that there are still 5 eligible jobs, and 4 blocked jobs. This continues until all jobs are run. This is done to make it easier to schedule the jobs (there are fewer jobs to consider), and to prevent a single user from dominating the system with many small jobs.
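The bookkeeping in that example can be sketched with shell arithmetic (purely illustrative; the scheduler does this internally):

```shell
TOTAL=12    # jobs submitted by the user
LIMIT=5     # per-user eligible limit
ELIGIBLE=$LIMIT
BLOCKED=$(( TOTAL - ELIGIBLE ))
echo "$ELIGIBLE eligible, $BLOCKED blocked"   # 5 eligible, 7 blocked

TOTAL=$(( TOTAL - 3 ))                        # three jobs start running
ELIGIBLE=$(( TOTAL < LIMIT ? TOTAL : LIMIT )) # refill eligible slots up to the limit
BLOCKED=$(( TOTAL - ELIGIBLE ))
echo "$ELIGIBLE eligible, $BLOCKED blocked"   # 5 eligible, 4 blocked
```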

Job priority on Beacon is based on the number of cores and wall clock time requested. Jobs with large core counts intentionally get the highest priority. Jobs with smaller core counts do run effectively on Beacon as backfill. While the scheduler is collecting nodes for larger jobs, those with short wall clock limits and small core counts may use those nodes temporarily without delaying the start time of the larger job.


Capability jobs on Beacon

Users are encouraged to submit capability jobs on Beacon at any time. Capability jobs are those jobs requesting 17 or more nodes. However, capability jobs are only executed at specific times at the discretion of NICS. Capability jobs are currently run weekly from Friday 8AM until Monday at 8AM.


Beacon Queues

Jobs on Beacon are sorted into queues based on size and walltime.

Beacon Queue   Min Size   Max Size   Max Wall Clock Limit
batch          1          16         24:00:00
capability     17         44         24:00:00

Job Execution


Once access to compute resources has been allocated through the batch system, users can execute jobs on the allocated resources. This section gives examples of job execution and provides common tips.

The PBS script is executed on the service node (or login node for interactive jobs). Any calls made directly to programs (e.g., ./a.out) are executed on the service node. This may be useful for record keeping, staging data, etc. Any memory- or computationally-intensive programs should be run using mpirun; otherwise they bog down the service node and may cause system problems. You may run non-MPI programs on a compute node using mpirun; see the Single-Processor (Serial) Jobs and Running Multiple Single-Processor Programs sections below.

To launch parallel jobs on one or more compute nodes, use the mpirun command. System specifications for Beacon should be kept in mind when running a job using mpirun. A Beacon node consists of two sockets, each with 8 physical cores, so there are 16 physical cores per node.

The following options are commonly used with mpirun:

Commonly used options for mpirun

-n Total number of MPI processes (default: 1)
-N Number of MPI processes per node (1-16)
-S Number of MPI processes per socket (1-8)
-d Specifies number of cores per MPI process (for use with OpenMP, 1-16)
-j 2 Turns on Hyper-Threading on the processor (off by default) and allocates two processes per physical core

MPI/OpenMP

Beacon supports threaded programming within a node. The mpirun -d flag specifies the number of cores per MPI process, so with OpenMP, mpirun -d $OMP_NUM_THREADS gives each thread its own core. Using every core requires at least n*d cores to be requested. The following examples assume that three nodes have been requested (#PBS -l nodes=3).

export OMP_NUM_THREADS=2
mpirun -n24 -N8 -S4 -d2 ./a.out

Here, each MPI process has two OpenMP threads, filling three whole nodes. For some codes, two OpenMP threads per MPI process may be optimal. If the reason for using OpenMP is instead to increase the available memory per process, you may want to use 8 or even 16 threads per MPI process, though there is some performance penalty for running OpenMP across sockets in the XC30's current configuration (via QPI, the QuickPath Interconnect).

export OMP_NUM_THREADS=7
mpirun -n6 -N2 -S1 -d7 ./a.out

The -d flag specifies the depth, i.e., the number of cores assigned to each MPI process (when the MPI process spawns an OpenMP thread, there is a dedicated core to put it on). The -S option places the second process on the second socket, rather than filling out the first socket first.
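A quick sanity check on such a layout (illustrative arithmetic, not an mpirun feature): the number of ranks times the depth must fit within the cores of the requested nodes.

```shell
n=6; d=7; nodes=3         # MPI ranks, cores per rank, nodes requested
needed=$(( n * d ))       # cores the layout consumes
avail=$(( nodes * 16 ))   # 16 physical cores per Beacon node
echo "$needed of $avail cores used"   # prints: 42 of 48 cores used
```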

Single-Processor (Serial) Jobs

Serial programs that are memory- or computationally-intensive should never be run on the service nodes (that is, anywhere outside of mpirun). Service nodes have limited resources shared by all users, and when those resources are exhausted, system problems may occur. To run serial programs on the compute nodes, the program must be compiled with the compiler wrappers (cc, CC, or ftn). You would then request one node (16 cores) with PBS (#PBS -l nodes=1). Use the following line to run a serial executable on a compute node:

mpirun -n 1 ./a.out 

Running Multiple Single-Processor Programs

If you need to run many instances of a serial code (as in a typical parameter sweep study for instance), we highly recommend using Eden. Eden is a simple script-based master-worker framework for running multiple serial jobs within a single PBS job. Detailed instructions for using Eden are found here.

Job Accounting


Projects are charged based on usage of compute resources. This section gives details on how each job’s usage is calculated. PBS allocates cores to batch jobs in units of the number of cores available per node. A node cannot be allocated to multiple jobs, so a job is charged for the entire node whether or not it uses all its cores.

Getting Accounting Information

This section illustrates the usage of two commonly used utilities for obtaining accounting information.

showusage

The showusage utility can be used to view your project allocation and overall usage through the last job accounting posting (usually the previous night).

glsjob

More detailed accounting information can be obtained using the glsjob command:

glsjob -u <username>   Prints current accounting information for a particular user.
glsjob -J <jobid>.xt5  Finds information for a particular job.
glsjob -p <project>    Prints current accounting information for all jobs charged to a particular project account.
glsjob --man           Displays documentation for glsjob.

Note: You can grep the output for the particular information you need.

On Beacon the service unit charge for each job is:

16 x walltime x number of nodes

where walltime is the number of wall clock hours used by the job.
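For example, a hypothetical job that runs on 4 nodes for 2.5 wall-clock hours would be charged 16 x 2.5 x 4 = 160 service units:

```shell
# awk handles the fractional walltime; the 4-node, 2.5-hour job is illustrative.
awk 'BEGIN { print 16 * 2.5 * 4 }'   # prints 160
```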

Negbal Jobs

If a project consumes its allocation, users can still submit jobs. However, those jobs will be in negbal status (also known as opportunistic mode), meaning they have negative priority and will only run as backfill. Jobs with priority will be scheduled and executed before negbal jobs.

  • Any secure communication with a MIC requires unique ssh keys that are automatically generated once the scheduler assigns compute nodes
  • Custom scripts have been created to use these ssh keys, which prevent prompts asking users for passwords

Traditional Command   Custom Beacon Script
ssh                   micssh
scp                   micscp
mpirun/mpiexec        micmpiexec

Job schedulers

Beacon uses the MOAB workload manager and Torque resource manager for scheduling jobs. Official scheduler commands for MOAB can be found here and information for submitting and managing jobs with Torque can be found here.

Jobs can be submitted to the queue via the qsub command. Both batch and interactive sessions are available. Batch mode is the typical method for submitting production simulations. If you are not certain how to construct a proper job script, it is beneficial to start with the interactive queue.

Also, Ganglia CPU metrics have been enabled on the Xeon Phis. We have performed extensive benchmarking that indicates that any associated performance penalty in your applications should be negligible. However, if you notice this background process becoming a problem, you can disable Ganglia monitoring altogether by specifying "-l gres=noganglia" in your batch script or in your qsub command line. Please let help@nics.utk.edu know if you have any further questions.

Interactive submission

For interactive jobs, PBS options are passed through qsub on the command line.

qsub -I -A XXXYYY -l nodes=3,walltime=1:00:00

Options:

  • -I : Start an interactive session
  • -A : Charge to the "XXXYYY" project

Putting it together, "-l nodes=3,walltime=1:00:00" will request 3 compute nodes for one hour.

After running this command, you will have to wait until enough compute nodes are available, just as in any other batch job. However, once the job starts, the standard input and standard output of this terminal will be linked directly to the head node of the allocated resource. From there, commands may be executed directly instead of through a batch script. Issuing the exit command will end the interactive job.

If you want to run a native OpenMP application on the Xeon Phi in interactive mode, you may:

  1. run with micmpiexec -n 1 -env OMP_NUM_THREADS=N
  2. micssh into the Xeon Phi and run from the prompt
  3. run micssh mic0 env LD_LIBRARY_PATH=$MIC_LD_LIBRARY_PATH executable, changing mic0 to the MIC you will be working with (mic0, mic1, mic2, or mic3)

Accessing the Beacon node with 4x Xeon Phi 7120P

To access this node, please use the following syntax in your submission script: -l nodes=1:mic7120

Accessing the Beacon node with 4x NVIDIA Tesla K20Xm GPUs

To access this node, please use the following syntax in your submission script: -l nodes=1:nvidia:gpus=4

Using Intel's VTune Performance Analysis in Interactive Mode

Intel's VTune is a useful tool for performance optimization and profiling, and it is available to all users on Beacon as part of the Intel Cluster Studio XE suite. To reduce X11 display lag, we suggest using the following commands to run from the command line without the GUI:

module load vtune

amplxe-cl -collect <analysis-type> -app-working-dir <working-dir> -- <executable>

Finally, you can open the GUI with "module load vtune" and then the command "amplxe-gui", load in the sampling files from the command line run, and begin exploring!

NOTE: You cannot run sampling with Intel VTune on the login node. Please submit an interactive job with qsub -I -X to use Intel VTune.

Please click here for an example of how to use VTune.

Batch submission

All non-interactive jobs must be submitted on Beacon using job scripts via the qsub command. All job scripts start with a series of #PBS directives that describe requirements of the job to the scheduler. The rest is a shell script, which sets up and runs the executable; the micmpiexec command is used to launch one or more parallel executables on the compute nodes and/or coprocessors.

The following example shows a typical job script that submits a parallel job that executes ./a.out on 2 compute nodes, charged to the fictitious account UT-AACE-TEST with a wall clock limit of one hour and 15 minutes:

#!/bin/bash
#PBS -A UT-AACE-TEST
#PBS -l nodes=2,walltime=01:15:00
cd $PBS_O_WORKDIR
micmpiexec -n 2 ./a.out

If you want to run a native OpenMP program on the Xeon Phi in batch mode, you may:

  1. use micmpiexec -n 1 -env OMP_NUM_THREADS=N
  2. use the following script-within-a-script construct:

     #!/bin/bash
     .
     .
     .
     micssh $(hostname)-mic0 $TMPDIR/test.sh

    Where test.sh is:
    #!/bin/sh
    source /etc/profile
    export LD_LIBRARY_PATH=$MIC_LD_LIBRARY_PATH
    executable

Option 2 is important because a plain micssh will not automatically pass the OpenMP environment (or any other environment variables) to the card; the script construct does so without passing excessive information.

Please do not use the PBS -V option. This can propagate large numbers of environment variable settings from the submitting shell into a job which may cause problems for the batch environment. Instead of using PBS -V, please pass only necessary environment variables using -v <comma_separated_list_of_needed_envars>. You can also include "module load" statements in the job script.

Running Jobs and Copying Files to the Xeon Phi Cards

After compiling source code, request a compute node to run the executable.

Once connected to a compute node, offload mode executables can be run directly. Native mode executables require manually copying libraries, binaries, and input data either to the SSD scratch space:

  • cp native_program.MIC $TMPDIR/mic0
  • cp necessary_library.so $TMPDIR/mic0/lib

or to a folder in your Lustre scratch space:

  • mkdir /lustre/medusa/$USER/folder_name
  • cp native_program.MIC /lustre/medusa/$USER/folder_name
  • mkdir /lustre/medusa/$USER/folder_name/lib
  • cp necessary_library.so /lustre/medusa/$USER/folder_name/lib

Once files are copied over, direct access to a Xeon Phi is available through the micssh command:

  • micssh beacon#-mic0

To see the files that were copied to the local SSD scratch space, you will have to change directory to TMPDIR.

  • cd $TMPDIR
  • ls

If native mode libraries were copied to the Lustre scratch space, then LD_LIBRARY_PATH needs to be modified accordingly.

  • export LD_LIBRARY_PATH=/lustre/medusa/$USER/folder_name/lib:$LD_LIBRARY_PATH
After the native mode application is run, type exit to return to the compute node host. Output files located on the local SSD scratch space can then be copied from $TMPDIR/mic#, where # is the Xeon Phi that was used, to the user's home directory or to the Lustre scratch space. Files not copied from the local SSD scratch space will be lost once the interactive session is over.

If you are planning to run MPI on MICs on multiple nodes with the local SSD scratch space, you also need to copy files to the MICs you plan to use on the other assigned compute nodes.

This can be done manually by first determining which nodes you have been assigned using cat $PBS_NODEFILE. Then, for each assigned node, copy the necessary files using micssh or micscp:

  • micssh beacon# cp absolute_path/file_to_copy $TMPDIR/mic#
  • micscp absolute_path/file_to_copy beacon#:$TMPDIR/mic#

Instead of doing this manually, the custom allmicput script can be used.

Allmicput

The allmicput script can easily copy files to $TMPDIR on all assigned MICs.

Usage: allmicput [[-t] FILE...] [-l LIBRARY...] [-x BINARY...] [-d DIR FILE...]

Copies the listed files to the corresponding directory on every MIC card in the current PBS job.

  • [-t] FILE... copies the specified file(s) to $TMPDIR on each MIC
  • -T LISTFILE copies the files in LISTFILE to $TMPDIR on each MIC
  • -l LIBRARY... copies the specified file(s) to $TMPDIR/lib on each MIC
  • -L LISTFILE copies the files in LISTFILE to $TMPDIR/lib on each MIC
  • -x BINARY... copies the specified file(s) to $TMPDIR/bin on each MIC
  • -X LISTFILE copies the files in LISTFILE to $TMPDIR/bin on each MIC
  • -d DIR FILE... copies the specified file(s) to $TMPDIR/DIR on each MIC
  • -D DIR LISTFILE copies the files in LISTFILE to $TMPDIR/DIR on each MIC

Current Intel recommended approach for file I/O from within an offload section

  • Files can be read from the NFS mount of /global, but only in read-only mode.
  • Files can be directly read from and written to the MIC's internal memory.
  • To transfer a file directly to a MIC's internal memory, use micscp file beaconXXX-micX:/User or /tmp.
  • Be sure to reset the permissions on the file so that 'other' has read/write permissions (all I/O in an offload region is executed as 'micuser').
  • Use the absolute path as the argument to fopen().
  • Remember to copy any output files off of the MICs before exiting a job.
  • Files can also be read from and written to /lustre/medusa/$USER; be sure to reset file permissions (o+rw) as above and directory permissions (o+x).
  • $HOME is not mounted on the MICs.
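The permission advice above can be sketched locally; run_dir and input.dat below are hypothetical stand-ins for your Lustre directory and data file:

```shell
# Create a stand-in directory and file, then open them to 'other'
# as the offload I/O guidance above requires.
mkdir -p run_dir && touch run_dir/input.dat
chmod o+rw run_dir/input.dat   # 'other' gets read/write on the file
chmod o+x  run_dir             # 'other' may traverse the directory
# Show the file's octal mode; the last digit should include rw (6).
stat -c '%a' run_dir/input.dat
```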


Accessing the MICSMC software

The micsmc software, used for monitoring CPU/threading usage on the Phi/MIC cards, is installed on Beacon and can only be used on the compute nodes with X forwarding. To enable this, first connect to Beacon using ssh with X forwarding enabled:

ssh -X username@beacon.nics.utk.edu

Next, enable X forwarding again through the queueing system by issuing the following command:

qsub -X -I -A PROJECT_ACCOUNT

Once connected to a compute node, you may issue the following command to bring up the micsmc GUI:

[username@beacon### ~]$ micsmc & 

The status panel will then be launched in the background, allowing you to observe the real time utilization of the Xeon Phis.
