
Running jobs

Running your applications

Custom Beacon scripts

  • Secure communication with a MIC requires unique ssh keys, which are automatically generated once the scheduler assigns compute nodes
  • Custom scripts have been created to use these ssh keys, which prevents users from being prompted for passwords

Traditional Command     Custom Beacon Script
ssh                     micssh
scp                     micscp
mpirun/mpiexec          micmpiexec
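
For example, the custom scripts are drop-in replacements for their traditional counterparts (beacon042 below is a hypothetical assigned node name):

micssh beacon042-mic0
micscp input.dat beacon042:$TMPDIR/mic0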

Job schedulers

Beacon uses the MOAB workload manager and the Torque resource manager for scheduling jobs. Refer to the official MOAB documentation for scheduler commands and to the Torque documentation for submitting and managing jobs.

Jobs can be submitted to the queue via the qsub command. Both batch and interactive sessions are available. Batch mode is the typical method for submitting production simulations. If you are not certain how to construct a proper job script, it is beneficial to use the interactive queue first.

Also, Ganglia CPU metrics have been enabled on the Xeon Phis. Extensive benchmarking indicates that the associated performance penalty in your applications should be negligible. However, if you notice this background process becoming a problem, you can disable Ganglia monitoring altogether by specifying "-l gres=noganglia" in your batch script or on your qsub command line, as shown below. Please let help@nics.utk.edu know if you have any further questions.
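
For example, either of the following disables the monitoring (the account name XXXYYY and the resource list are placeholders):

#PBS -l gres=noganglia

qsub -I -A XXXYYY -l nodes=3,walltime=1:00:00,gres=noganglia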

Interactive submission

For interactive jobs, PBS options are passed through qsub on the command line.

qsub -I -A XXXYYY -l nodes=3,walltime=1:00:00

Options:

  • -I : Start an interactive session
  • -A : Charge to the "XXXYYY" project
  • -l : Request resources; here, "nodes=3,walltime=1:00:00" requests 3 compute nodes for one hour

After running this command, you will have to wait until enough compute nodes are available, just as with any other batch job. Once the job starts, however, the standard input and output of this terminal will be linked directly to the head node of the allocated resource, and commands may be executed directly instead of through a batch script. Issuing the exit command will end the interactive job.

If you want to run a native OpenMP application on the Xeon Phi in interactive mode, you may (see the example after this list):

  1. run with micmpiexec -n 1 -env OMP_NUM_THREADS=N
  2. micssh into the Xeon Phi and run from its prompt
  3. run micssh mic0 env LD_LIBRARY_PATH=$MIC_LD_LIBRARY_PATH executable, changing mic0 to the card you will be working with (mic0, mic1, mic2, or mic3)
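
For example, a minimal sketch of option 1, assuming a native-mode binary named hello_omp.MIC and 60 OpenMP threads:

micmpiexec -n 1 -env OMP_NUM_THREADS=60 ./hello_omp.MIC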

Accessing the Beacon node with 4x Xeon Phi 7120P

To access this node, please use the following syntax in your submission script: -l nodes=1:mic7120

Accessing the Beacon node with 4x NVIDIA Tesla K20Xm GPUs

To access this node, please use the following syntax in your submission script: -l nodes=1:nvidia:gpus=4
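
For example, an interactive request for either of these nodes might look like the following (the account name XXXYYY and walltimes are placeholders):

qsub -I -A XXXYYY -l nodes=1:mic7120,walltime=1:00:00
qsub -I -A XXXYYY -l nodes=1:nvidia:gpus=4,walltime=1:00:00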

Using Intel's VTune Performance Analysis in Interactive Mode

Intel's VTune, available to all users as part of the Intel Cluster Studio XE suite on Beacon, is a useful tool for performance optimization and profiling. To reduce X11 display lag, we suggest using the following form to run it from the command line without the GUI:

module load vtune

amplxe-cl -collect <analysis_type> -app-working-dir <dir> -- <executable>

Finally, you can open the GUI with "module load vtune" and then the command "amplxe-gui", load in the sampling files from the command line run, and begin exploring!
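
As a concrete sketch, a hotspots analysis (one of VTune's standard collection types) might look like the following, where ./a.out stands for your executable and r000hs is the result directory amplxe-cl creates by default:

amplxe-cl -collect hotspots -app-working-dir $PBS_O_WORKDIR -- ./a.out
amplxe-gui r000hs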

NOTE: You cannot run sampling with Intel VTune on the login node. Please submit an interactive job with qsub -I -X to use Intel VTune.


Batch submission

All non-interactive jobs on Beacon must be submitted via the qsub command with a job script. All job scripts start with a series of #PBS directives that describe the job's requirements to the scheduler. The rest is a shell script that sets up and runs the executable: the micmpiexec command is used to launch one or more parallel executables on the compute nodes and/or coprocessors.

The following example shows a typical job script that submits a parallel job that executes ./a.out on 2 compute nodes, charged to the fictitious account UT-AACE-TEST with a wall clock limit of one hour and 15 minutes:

#!/bin/bash
#PBS -A UT-AACE-TEST
#PBS -l nodes=2,walltime=01:15:00
cd $PBS_O_WORKDIR
micmpiexec -n 2 ./a.out

If you want to run a native OpenMP program on the Xeon Phi in batch mode, you may either:

  1. run with micmpiexec -n 1 -env OMP_NUM_THREADS=N, or
  2. use the following script-within-a-script construct (a fuller sketch follows after this list):

     #!/bin/bash
     .
     .
     .
     micssh $(hostname)-mic0 $TMPDIR/test.sh

     where test.sh is:

     #!/bin/sh
     source /etc/profile
     export LD_LIBRARY_PATH=$MIC_LD_LIBRARY_PATH
     executable

Option 2 is important because a simple micssh will not automatically pass the OpenMP environment, or any other environment variables, to the card without passing excessive information.
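
A fuller sketch of option 2 follows; the cp and chmod staging lines are assumptions about how test.sh reaches the card's scratch area, following the copy conventions described in the next section:

#!/bin/bash
#PBS -A UT-AACE-TEST
#PBS -l nodes=1,walltime=00:30:00
cd $PBS_O_WORKDIR
# Stage the inner script onto the first coprocessor's scratch area (assumption)
cp test.sh $TMPDIR/mic0
chmod +x $TMPDIR/mic0/test.sh
# Run the inner script natively on the first coprocessor of this node
micssh $(hostname)-mic0 $TMPDIR/test.sh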

Please do not use the PBS -V option. It can propagate large numbers of environment variable settings from the submitting shell into the job, which may cause problems for the batch environment. Instead, pass only the necessary environment variables using -v <comma_separated_list_of_needed_envars>, as shown below. You can also include "module load" statements in the job script.
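
For example, to pass only the variables a job actually needs (the variable names below are placeholders):

qsub -v OMP_NUM_THREADS=16,INPUT_FILE=run1.in job.pbs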

Running Jobs and Copying Files to the Xeon Phi Cards

After compiling source code, request a compute node to run the executable.

Once connected to a compute node, offload mode executables can be run directly. Native mode executables require manually copying libraries, binaries, and input data either to the SSD scratch space:

  • cp native_program.MIC $TMPDIR/mic0
  • cp necessary_library.so $TMPDIR/mic0/lib

or to a folder in your Lustre scratch space:

  • mkdir /lustre/medusa/$USER/folder_name
  • cp native_program.MIC /lustre/medusa/$USER/folder_name
  • mkdir /lustre/medusa/$USER/folder_name/lib
  • cp necessary_library.so /lustre/medusa/$USER/folder_name/lib

Once files are copied over, direct access to a Xeon Phi is available through the micssh command:

  • micssh beacon#-mic0

To see the files that were copied to the local SSD scratch space, change directory to $TMPDIR.

  • cd $TMPDIR
  • ls

If native mode libraries were copied to the Lustre scratch space, then LD_LIBRARY_PATH needs to be modified accordingly.

  • export LD_LIBRARY_PATH=/lustre/medusa/$USER/folder_name/lib:$LD_LIBRARY_PATH

After the native mode application is run, type exit to return to the compute node host. Output files located on the local SSD scratch space can then be copied from $TMPDIR/mic#, where # is the Xeon Phi that was used, to the user's home directory or to the Lustre scratch space. Files not copied from the local SSD scratch space will be lost once the interactive session is over.
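
For example, to save an output file to the Lustre scratch space (the output file name is a placeholder):

  • cp $TMPDIR/mic0/output.dat /lustre/medusa/$USER/folder_name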

If you are planning to run MPI on MICs across multiple nodes with the local SSD scratch space, you also need to copy files to the MICs you plan to use on the other assigned compute nodes.

This can be done manually by first determining which nodes you have been assigned using cat $PBS_NODEFILE.

Then, for each assigned node, copy the necessary files using micssh or micscp (a loop sketch follows below):

  • micssh beacon# cp absolute_path/file_to_copy $TMPDIR/mic#
  • micscp absolute_path/file_to_copy beacon#:$TMPDIR/mic#
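
A minimal loop that automates this might look like the following sketch (it assumes the same file goes to each node's mic0 scratch area; $PBS_NODEFILE lists each node once per core, hence sort -u):

for node in $(sort -u $PBS_NODEFILE); do
    micscp absolute_path/file_to_copy $node:$TMPDIR/mic0
done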
Instead of doing this manually, the custom allmicput script can be used.

Allmicput

The allmicput script can easily copy files to $TMPDIR on all assigned MICs.

Usage: allmicput [[-t] FILE...] [-l LIBRARY...] [-x BINARY...] [-d DIR FILE...]

Copy listed files to the corresponding directory on every MIC card in the current PBS job.

  • [-t] FILE... : the specified file(s) are copied to $TMPDIR on each MIC
  • -T LISTFILE : the files in LISTFILE are copied to $TMPDIR on each MIC
  • -l LIBRARY... : the specified file(s) are copied to $TMPDIR/lib on each MIC
  • -L LISTFILE : the files in LISTFILE are copied to $TMPDIR/lib on each MIC
  • -x BINARY... : the specified file(s) are copied to $TMPDIR/bin on each MIC
  • -X LISTFILE : the files in LISTFILE are copied to $TMPDIR/bin on each MIC
  • -d DIR FILE... : the specified file(s) are copied to $TMPDIR/DIR on each MIC
  • -D DIR LISTFILE : the files in LISTFILE are copied to $TMPDIR/DIR on each MIC
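
For example, to stage the native binary and library from the earlier example onto every assigned card:

allmicput native_program.MIC -l necessary_library.so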

Current Intel recommended approach for file I/O from within an offload section

  • Files can be read from the NFS mount of /global, but only in read-only mode
  • Files can be directly read from and written to the MIC's internal memory
  • To transfer a file directly to a MIC's internal memory, use micscp file beaconXXX-micX:/User or micscp file beaconXXX-micX:/tmp
  • Be sure to reset the permissions on the file so that 'other' has read/write permissions, since all I/O in an offload region is executed as 'micuser' (see the example after this list)
  • Use the absolute path as the argument to fopen()
  • Remember to copy any output files off of the MICs before exiting a job
  • Files can also be read from and written to /lustre/medusa/$USER; be sure to reset file permissions (o+rw) as above and directory permissions (o+x)
  • $HOME is not mounted on the MICs
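
For example, to stage an input file for an offload section and open its permissions for micuser (beacon042 is a hypothetical node name):

micscp input.dat beacon042-mic0:/tmp
micssh beacon042-mic0 chmod o+rw /tmp/input.dat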


Accessing the MICSMC software

The micsmc software, used for monitoring CPU/threading usage on the Phi/MIC cards, is installed on Beacon and can only be used on the compute nodes with X forwarding. To set this up, first connect to Beacon using ssh with X forwarding enabled:

ssh -X username@beacon.nics.utk.edu

Next, enable X forwarding again through the queueing system by issuing the following command:

qsub -X -I -A PROJECT_ACCOUNT

Once connected to a compute node, you may issue the following command to bring up the micsmc GUI:

[username@beacon### ~]$ micsmc & 

The status panel will then be launched in the background, allowing you to observe the real-time utilization of the Xeon Phis.