The National Institute for Computational Sciences

Lustre Troubleshooting

Lustre's lfs utility provides several subcommands for monitoring and configuring the Lustre environment. Some useful ones for troubleshooting Lustre behavior are given here. As always, see the lfs man page or use lfs help for more information.

Listing OSTs

The lfs osts command gives a listing of all Object Storage Targets (OSTs) on the system. The list will contain lines such as:

10: scratch-OST000a_UUID ACTIVE
11: scratch-OST000b_UUID ACTIVE

Note that there are two ways to identify an OST: by the decimal index given first (e.g. 10) and by the Lustre UUID, which encodes the same index in hexadecimal (e.g. scratch-OST000a_UUID, where 000a is hexadecimal for 10).
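
Converting between the two forms is a base conversion. As a minimal sketch, the helper ost_uuid below (a hypothetical name, not part of lfs) builds a UUID from a file system name and a decimal index. It assumes the <fsname>-OST<four hex digits>_UUID pattern seen in the listing above:

```shell
# ost_uuid is a hypothetical helper, not an lfs subcommand.
# Usage: ost_uuid <fsname> <decimal index>
ost_uuid() {
    # %04x prints the index as four zero-padded lowercase hex digits.
    printf '%s-OST%04x_UUID\n' "$1" "$2"
}

ost_uuid scratch 11    # prints scratch-OST000b_UUID
```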

You'll find that there are 90 OSTs listed for /lustre/medusa and 16 for /lustre/snx on Darter.

OST Usage

The lfs df command shows usage information for each OST as well as summary usage for the entire system. Its output will look something like:

UUID                  ... Use% Mounted on
medusa-OST000c_UUID  ... 72% /lustre/medusa[OST:12]
medusa-OST000d_UUID  ... 70% /lustre/medusa[OST:13]

The listing also includes the numbers of used and available blocks for each OST (omitted here for space). As in the lfs osts output, each OST is identified twice: the UUID with its hexadecimal index comes first, and the decimal index appears at the end of the line (e.g. [OST:12]). Use this command to pinpoint OSTs that have become unresponsive or are in danger of filling up.
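
To pick out the fullest OSTs from a long listing, the output can be filtered. The sketch below defines a hypothetical filter, osts_over, that reads lfs df output on standard input and prints every OST whose Use% meets a threshold. It assumes the UUID is the first column and that the usage column is the only field of the form NN%:

```shell
# osts_over is a hypothetical filter, not an lfs subcommand.
# Usage: <lfs df output> | osts_over <percent>
osts_over() {
    awk -v t="$1" '
        # Only consider lines whose first field looks like an OST UUID;
        # this skips the header, MDT lines, and the summary line.
        $1 ~ /-OST[0-9a-f]+_UUID$/ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^[0-9]+%$/) {
                    pct = $i
                    sub(/%/, "", pct)
                    if (pct + 0 >= t) print $1, $i
                    break
                }
            }
        }'
}

# Demonstration on the two sample lines shown above.
printf '%s\n' \
    'medusa-OST000c_UUID  ... 72% /lustre/medusa[OST:12]' \
    'medusa-OST000d_UUID  ... 70% /lustre/medusa[OST:13]' | osts_over 71
# prints: medusa-OST000c_UUID 72%
```

On a live system the filter would be fed directly from the command, for example:

> lfs df /lustre/medusa | osts_over 70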

The lfs quota command reports your total usage, along with any quota limits, on a Lustre file system. You must specify your username and the Lustre path, for example:

> lfs quota -u <username> /lustre/medusa

OST Information for Files

Use lfs getstripe to see which OSTs are used to store a particular file.

> lfs getstripe <filename>
lmm_stripe_count:   4
lmm_stripe_size:    1048576
lmm_stripe_offset:  186
        obdidx           objid          objid            group
           186        52153455      0x31bcc6f                0
           258        53124880      0x32a9f10                0
            25        52477227      0x320bd2b                0
            97        52444876      0x3203ecc                0

In this example, the file is striped across 4 OSTs with a stripe size of 1 MB (1048576 bytes). The lmm_stripe_offset value (186) is the index of the OST that holds the first stripe. The obdidx numbers are the decimal indices of the OSTs used in the striping of this file.
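
Because other lfs subcommands identify OSTs by UUID, it can help to translate the obdidx column. The sketch below defines a hypothetical filter, stripe_ost_uuids, that reads getstripe output in the layout shown above and prints one UUID per stripe. The file system name is passed as an argument; scratch is used here only as a placeholder, since the example output does not say which file system the file lives on:

```shell
# stripe_ost_uuids is a hypothetical filter, not an lfs subcommand.
# Usage: <lfs getstripe output> | stripe_ost_uuids <fsname>
stripe_ost_uuids() {
    awk -v fs="$1" '
        # Stripe rows have four fields: decimal obdidx, decimal objid,
        # hexadecimal objid, and group.
        NF == 4 && $1 ~ /^[0-9]+$/ && $3 ~ /^0x/ {
            printf "%s-OST%04x_UUID\n", fs, $1
        }'
}

# Demonstration on the first two stripe rows of the example above.
printf '%s\n' \
    '        obdidx           objid          objid            group' \
    '           186        52153455      0x31bcc6f                0' \
    '           258        53124880      0x32a9f10                0' |
    stripe_ost_uuids scratch
# prints:
# scratch-OST00ba_UUID
# scratch-OST0102_UUID
```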

Locating Files on Lustre

The lfs find command, similar to GNU find, is used to efficiently search for and list files on Lustre. Without any flags, lfs find simply lists files recursively for a given directory. You may use the --maxdepth option to limit the recursion.

A common use for lfs find is to locate all files that are in danger of being purged. For example, on Darter, the following command:

> lfs find /lustre/medusa/$USER -mtime +30 -type f

would list all regular files (-type f) that were last modified more than 30 days ago (-mtime +30).
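
The resulting list can be long, so a short summary is often more useful than the raw paths. As a minimal sketch, the hypothetical helper summarize_files below reads one path per line on standard input and reports how many regular files it saw and their combined size. It assumes paths do not contain newlines:

```shell
# summarize_files is a hypothetical helper, not an lfs subcommand.
# Usage: <list of paths, one per line> | summarize_files
summarize_files() {
    count=0
    total=0
    while IFS= read -r path; do
        # Skip anything that is not an existing regular file.
        [ -f "$path" ] || continue
        # wc -c reports the file size in bytes; skip unreadable files.
        size=$(wc -c < "$path") || continue
        count=$((count + 1))
        total=$((total + size))
    done
    echo "$count files, $total bytes"
}
```

It would be used at the end of the pipeline, for example:

> lfs find /lustre/medusa/$USER -mtime +30 -type f | summarize_files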

When there are known issues with particular OSTs, you may need to know which files are affected. For this scenario, use the -O option to find all files under a directory that use a particular OST, specifying the OST by its Lustre UUID. For example:

> lfs find <directory> -O scratch-OST000b_UUID