Anaconda is a popular data science platform used in a wide variety of fields. Conda is Anaconda’s package, dependency, and environment management tool. Although it was originally designed for Python, Conda can manage packages for many other programming languages. It handles the tedious work of installing packages and resolving their dependencies so that you can focus on your work. In this guide, you will learn how to use Conda for your projects on the cluster.
Anaconda is a loadable module on the cluster. At the time of this writing, five versions of Anaconda are installed on the cluster. Table 2.1 lists these versions. To verify which versions are installed on the cluster, execute module avail anaconda. If you are unsure which distribution and version to use, review the documentation for the software package(s) you require. They should indicate which distribution and version you should use.
To load the appropriate Anaconda module, execute module load <anaconda-version>, replacing <anaconda-version> with the necessary distribution and version number. For example, to load Anaconda 2 version 4.3.1, execute module load anaconda2/4.3.1. If you do not require a specific version of Anaconda, execute the same command without the version number (for example, module load anaconda3), and the default version for that distribution will be loaded. To determine which version is the default, review the output of module avail anaconda; the term “default” appears in parentheses next to the default module.
To configure your shell to use Conda, execute Conda’s setup script. An environment variable defines the path to this script: ANACONDA2_SH for the Anaconda 2 distribution and ANACONDA3_SH for the Anaconda 3 distribution. Figure 2.1 shows how to execute this script for the Anaconda 2 distribution. Replace the "2" with "3" if you use the Anaconda 3 distribution.
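Loading the module and running the matching setup script can be combined in one shell function. The sketch below is hypothetical (the name setup_conda is not part of the cluster's software) and assumes, as described above, that after the module is loaded the path to the setup script is available in ANACONDA2_SH or ANACONDA3_SH. It chooses the variable from the distribution in the module name, so the "replace 2 with 3" step happens automatically.

```shell
# Hypothetical helper: load an Anaconda module, then run the Conda setup
# script whose path is stored in ANACONDA2_SH or ANACONDA3_SH.
# Usage: setup_conda anaconda2/4.3.1   (or: setup_conda anaconda3)
setup_conda() {
    local module_name="$1" var
    # Pick the setup-script variable from the distribution in the module name.
    case "$module_name" in
        anaconda2*) var=ANACONDA2_SH ;;
        anaconda3*) var=ANACONDA3_SH ;;
        *) echo "not an Anaconda module: $module_name" >&2; return 1 ;;
    esac
    module load "$module_name" || return 1
    # ${!var} expands the variable whose name is stored in var.
    if [ -z "${!var:-}" ]; then
        echo "$var is not set after loading $module_name" >&2
        return 1
    fi
    source "${!var}"
}
```

After setup_conda anaconda2/4.3.1 returns successfully, the conda command should be available in your current shell.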
To automate the configuration process, add module load <anaconda-version> to your .bashrc file, then insert a line that executes the setup script. You may use the echo command with output redirection to append these lines to your .bashrc file. For instance, to automate the configuration of Anaconda 3 version 5.1.0, review the example in Figure 2.2. Be sure to use two greater-than symbols (>>), which append to the file; a single > overwrites the entire file. Always open your .bashrc file after you modify it to ensure that it was changed correctly.
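One way to append the two lines is sketched below as a hypothetical function, add_conda_to_bashrc, using the Anaconda 3 version 5.1.0 example from the text. This sketch departs slightly from inserting the expanded script path: it writes the variable reference in single quotes, so $ANACONDA3_SH is resolved at each login after the module load line has run. That assumes the module defines ANACONDA3_SH whenever it is loaded.

```shell
# Hypothetical helper: append the Anaconda 3 (5.1.0) setup to a startup file.
# Usage: add_conda_to_bashrc            (defaults to ~/.bashrc)
add_conda_to_bashrc() {
    local rc_file="${1:-$HOME/.bashrc}"
    # '>>' appends to the file; a single '>' would replace its whole contents.
    echo 'module load anaconda3/5.1.0' >> "$rc_file"
    # Single quotes keep $ANACONDA3_SH unexpanded, so it is resolved at login,
    # after the module load line above has defined it.
    echo 'source "$ANACONDA3_SH"' >> "$rc_file"
    # Print the end of the file so you can confirm the change.
    tail -n 5 "$rc_file"
}
```

Run it once; running it again would append duplicate lines.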
Test the changes to your environment by logging out of the cluster and logging back in. You should be able to activate the base environment with the conda activate command.
In Anaconda, an environment is an isolated space where packages and dependencies can be installed without affecting other environments. To create a basic environment with no packages, execute the command shown in Figure 3.1.
Replace <env-dir-name> with a pathname. For instance, to create an environment in a directory named basic_env in your home directory, you would execute conda create -p ~/basic_env. Always verify that the directory you specify is the correct one before you create the environment. Only create Conda environments in directories to which you have write access, such as your home directory or Lustre scratch space.
Conda permits you to install packages when you create an environment. To do this, specify the packages after the environment’s path. Consider a new environment in the py2_env directory that should have Python 2.7, NumPy, and SciPy installed. In this case, you would execute the command displayed in Figure 3.2. All three packages would be installed as part of the environment’s creation.
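Both forms of conda create -p described above, with and without packages, can be wrapped in one hypothetical function that also performs the "verify the directory first" check. The name create_env is illustrative only; the conda create -p command inside it is the part that matters.

```shell
# Hypothetical helper: create a Conda environment at a path of your choosing,
# refusing to touch a path that already exists.
# Usage: create_env ~/basic_env
#        create_env ~/py2_env python=2.7 numpy scipy
create_env() {
    if [ $# -lt 1 ]; then
        echo "usage: create_env <env-dir> [package-spec ...]" >&2
        return 2
    fi
    local env_dir="$1"
    shift
    # Verify the target first: an existing path is probably a mistake.
    if [ -e "$env_dir" ]; then
        echo "$env_dir already exists; choose another path" >&2
        return 1
    fi
    # Any remaining arguments are package specs installed at creation time.
    conda create -p "$env_dir" "$@"
}
```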
There is no limit on the number of packages you may install when you create an environment; however, remember that your home directory is limited to 10GB of storage space. If your environment requires more than 10GB of storage, create it in Lustre space with the command presented in Figure 3.3.
Be mindful of the 30-day purge policy on Lustre. For further considerations relevant to filesystems, please review the File Systems document.
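Creating an environment in Lustre space is the same conda create -p command with a path under your scratch directory. The hypothetical function below assumes $SCRATCHDIR points at your Lustre scratch space, as in the activation command this guide uses, and refuses to run if it is unset.

```shell
# Hypothetical helper: create an environment under $SCRATCHDIR (Lustre)
# instead of your 10GB home directory.
# Usage: create_scratch_env py2_env python=2.7 numpy scipy
create_scratch_env() {
    if [ $# -lt 1 ]; then
        echo "usage: create_scratch_env <env-dir-name> [package-spec ...]" >&2
        return 2
    fi
    if [ -z "${SCRATCHDIR:-}" ]; then
        echo "SCRATCHDIR is not set" >&2
        return 1
    fi
    local env_dir="$SCRATCHDIR/$1"
    shift
    conda create -p "$env_dir" "$@"
}
```

To judge whether an existing environment is approaching the home-directory limit, du -sh ~/py2_env prints the space it occupies.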
Once an environment has been created, you may activate it with the conda activate command. Execute conda activate <path-to-env> to make that environment your active one, replacing <path-to-env> with the path to the environment. Environments in your home directory can be activated by using the tilde (~) character followed by the name of the directory in which the environment is installed, for example conda activate ~/basic_env. If you created an environment in Lustre space, execute conda activate $SCRATCHDIR/<env-dir-name>.
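Because environments live either in your home directory or under $SCRATCHDIR, a small hypothetical function can try both locations in turn. It relies on the fact that every Conda environment directory contains a conda-meta subdirectory, and it works only after Conda's setup script has defined the conda shell command.

```shell
# Hypothetical helper: activate an environment by directory name, looking in
# your home directory first and then in $SCRATCHDIR.
# Usage: activate_env py2_env
activate_env() {
    local name="${1:-}" dir
    if [ -z "$name" ]; then
        echo "usage: activate_env <env-dir-name>" >&2
        return 2
    fi
    for dir in "$HOME/$name" "${SCRATCHDIR:+$SCRATCHDIR/$name}"; do
        # Every Conda environment directory contains a conda-meta subdirectory.
        if [ -n "$dir" ] && [ -d "$dir/conda-meta" ]; then
            conda activate "$dir"
            return
        fi
    done
    echo "no Conda environment named '$name' in ~ or \$SCRATCHDIR" >&2
    return 1
}
```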
After you create an environment, you will likely modify it as new packages are required and existing packages are no longer necessary. You may also need to remove an existing environment to make room for a new one. Conda provides commands for each of these tasks.
To install a new package, first search for it. The conda search command allows you to search for packages in the Anaconda repository. Figure 4.1 shows the syntax for this command.
Consider the ipython package, an interactive Python shell that provides several helpful features. You would first execute conda search ipython to confirm that the package exists in the Anaconda repositories.
Depending on the package, the output may be extensive because many versions are available. In ipython’s case, the Anaconda repository lists 167 different versions of the package. To narrow the results to version 7 only, execute conda search ipython=7. This output lists 35 ipython versions, which is easier to review.
Once you identify the correct package and version, use the command presented in Figure 4.2. It will download and configure the specified package.
Replace <path-to-env> with the absolute path to the environment. Conda will also determine the package’s dependencies and install them along with it. Note that Conda will prompt you to confirm the installation. If you are certain that you are installing the correct package and its dependencies, execute conda install -y -p <path-to-env> <package-name>; the -y option skips the confirmation prompt. You may also specify the version of the package you wish to install. Using ipython as an example, you could install ipython version 7.0.1 with the conda install -p <path-to-env> ipython=7.0.1 command.
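The search-then-install workflow above can be sketched as one hypothetical function that installs only when the search succeeds. It assumes conda search exits with a non-zero status when no package matches the spec (Conda reports a "packages not found" error in that case); extra arguments such as -y are passed through to conda install.

```shell
# Hypothetical helper: search the repository for a package spec and install it
# into an environment only if the search finds a match.
# Usage: install_if_found ~/py2_env ipython=7.0.1
#        install_if_found ~/py2_env ipython=7.0.1 -y   (skip the prompt)
install_if_found() {
    if [ $# -lt 2 ]; then
        echo "usage: install_if_found <path-to-env> <package-spec> [conda options]" >&2
        return 2
    fi
    local env_dir="$1" spec="$2"
    shift 2
    # Assumes conda search exits non-zero when nothing matches the spec.
    if ! conda search "$spec"; then
        echo "no package matches '$spec'" >&2
        return 1
    fi
    # Extra arguments (such as -y) are passed through to conda install.
    conda install "$@" -p "$env_dir" "$spec"
}
```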
To remove old packages, you should first review the list of existing packages in your environment. Figure 4.3 shows the command to use for this task. It outputs the names, versions, builds, and channels of every package in the environment you specify.
If you are in the environment from which you plan to remove packages, execute conda list without any additional options or arguments.
After you determine which packages to remove, execute the command displayed in Figure 4.4. Conda will prompt you to confirm the removal unless you provide the -y flag to the conda remove command. Only use the -y flag if you are absolutely sure you wish to remove the specified package from your environment.
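Reviewing and then removing a package can be sketched as a hypothetical function. It uses conda list with a package name as its optional filter argument (a regular expression) so that only matching entries are shown before conda remove runs. The function name is illustrative.

```shell
# Hypothetical helper: show the installed entries matching a package name,
# then remove that package from the environment.
# Usage: remove_pkg ~/py2_env ipython
#        remove_pkg ~/py2_env ipython -y   (skip the prompt; use with care)
remove_pkg() {
    if [ $# -lt 2 ]; then
        echo "usage: remove_pkg <path-to-env> <package-name> [conda options]" >&2
        return 2
    fi
    local env_dir="$1" pkg="$2"
    shift 2
    # conda list accepts an optional name filter (a regular expression).
    conda list -p "$env_dir" "$pkg" || return 1
    conda remove "$@" -p "$env_dir" "$pkg"
}
```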
Updates may need to be applied to the packages within an environment. Generally, this will not be necessary and, in certain circumstances, an update could break an environment that currently works. Carefully consider whether an update is essential to your work before attempting one. If you wish to update the whole environment, use the command shown in Figure 4.5. Every package in the environment will be updated to the latest version that is compatible with the rest of the environment. To update a specific package, use conda update -p <path-to-env> <package-name>. In either case, you will be prompted to permit the update unless you provide the -y option.
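Both update forms can be sketched as one hypothetical function: with no package names it passes --all, which is conda update's option for updating every package in the environment; otherwise it updates only the packages named.

```shell
# Hypothetical helper: update named packages, or every package when none is given.
# Usage: update_env ~/py2_env            (updates everything via --all)
#        update_env ~/py2_env numpy
update_env() {
    if [ $# -lt 1 ]; then
        echo "usage: update_env <path-to-env> [package-name ...]" >&2
        return 2
    fi
    local env_dir="$1"
    shift
    if [ $# -eq 0 ]; then
        conda update -p "$env_dir" --all
    else
        conda update -p "$env_dir" "$@"
    fi
}
```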
If you wish to switch from one environment to another, first deactivate your current environment, then activate the one you wish to use. The conda deactivate command will gracefully shut down your active environment and return you to a standard terminal prompt. Alternatively, you may use conda activate with no arguments to return to the base environment, then switch to the environment you wish to use.
To completely remove an environment from your storage space, you should first verify that you are removing the correct one. Execute the conda info --envs command to determine which environments belong to you. Because all Conda environments on the cluster should be created with the -p option, you will only see the pathnames of the environments. Figure 4.6 shows an example of two environments that only display pathnames.
Identify the environment you wish to remove, then execute the command displayed in Figure 4.7. Every package in the environment will be removed, then the environment itself will be deleted. The directory in which the environment resided will also be deleted.
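The "verify before you delete" step can be built into a hypothetical helper. It refuses to act unless the path contains a conda-meta subdirectory, which every Conda environment has, and then runs conda remove with --all, which removes every package and the environment itself.

```shell
# Hypothetical helper: delete a whole environment after checking that the
# path really is a Conda environment (it must contain conda-meta).
# Usage: remove_env ~/basic_env
remove_env() {
    local env_dir="${1:-}"
    if [ -z "$env_dir" ] || [ ! -d "$env_dir/conda-meta" ]; then
        echo "'$env_dir' is not a Conda environment; nothing removed" >&2
        return 1
    fi
    # --all removes every package and then the environment directory itself.
    conda remove -p "$env_dir" --all
}
```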
Issues with Conda generally involve environments and packages, and they can usually be remedied without assistance. Always double-check the spelling, capitalization, and punctuation of any names and paths. It is also important to review the documentation for the package(s) you use to understand their requirements. Other issues may require additional investigation; in these situations, run the relevant conda command with the --help option. For example, to review the options available to the conda install command, execute conda install --help. If you are unable to remedy the issue using the steps outlined in this guide, please submit a ticket to firstname.lastname@example.org.
Loading the base Environment
In some cases, Conda may load the global base environment when you first log in to the cluster. While the base environment is active, you will be unable to install, remove, or modify packages until you switch to another environment with the conda activate command. Note that if you choose not to initialize Conda automatically when you log in, you are unlikely to encounter this situation.
If you encounter this issue, open your ~/.bashrc file in your preferred text editor and search for the line that reads “conda activate”. Insert a hash character (#) at the beginning of this line to comment it out, then save the file. The next time you log in to the cluster, Conda should be available to you without the base environment being loaded automatically.
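As a non-interactive alternative to the text-editor step, a hypothetical helper can use sed to put # in front of every line that begins (optionally after indentation) with conda activate, keeping a backup copy with a .bak suffix. Review the result: if your .bashrc deliberately activates an environment elsewhere, that line would be commented out too.

```shell
# Hypothetical helper: comment out any uncommented "conda activate" line in a
# startup file, keeping a backup copy with a .bak suffix.
# Usage: comment_out_conda_activate ~/.bashrc
comment_out_conda_activate() {
    local rc_file="${1:-$HOME/.bashrc}"
    [ -f "$rc_file" ] || { echo "$rc_file not found" >&2; return 1; }
    # -i.bak edits in place and saves the original (GNU and BSD sed accept it).
    sed -i.bak 's/^\([[:space:]]*\)\(conda activate\)/\1# \2/' "$rc_file"
    # Show the affected lines so you can confirm the change.
    grep -n 'conda activate' "$rc_file" || echo "no conda activate lines found"
}
```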
If you need to move an environment from one location to another, use the --clone option of the conda create command. Do not use the mv or cp commands on a Conda environment; if you do, Conda will no longer recognize it. To use the --clone option, specify the environment you wish to copy. For instance, to clone an existing environment named matplot_env in your home directory into Lustre scratch space, use the command displayed in Figure 5.1. This copies the environment from your home directory to your Lustre scratch space, after which the copy appears in the output of conda info --envs.
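A hypothetical helper for cloning checks that the source really is a Conda environment (it contains conda-meta) and that the destination does not exist yet, then runs conda create -p with --clone. The matplot_env paths in the usage line follow the example in the text and assume $SCRATCHDIR points at your Lustre scratch space.

```shell
# Hypothetical helper: copy an environment to a new location with --clone.
# Usage: clone_env ~/matplot_env "$SCRATCHDIR/matplot_env"
clone_env() {
    if [ $# -ne 2 ]; then
        echo "usage: clone_env <source-env> <new-env-path>" >&2
        return 2
    fi
    local src="$1" dest="$2"
    if [ ! -d "$src/conda-meta" ]; then
        echo "$src is not a Conda environment" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "$dest already exists" >&2
        return 1
    fi
    conda create -p "$dest" --clone "$src"
}
```

The original environment stays in place; once the copy works, you can delete the original with the environment-removal command described earlier.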
Using pip and Conda
Conda and pip both serve the same purpose: package and dependency management. Because each tool tracks the packages it installs independently, using the two together in one environment can create conflicts. Where possible, use these tools separately rather than combining them; please consult our guide on pip and virtualenv for more information. If you do use Conda and pip together, install as many packages as possible with Conda before using pip. Additionally, if you later need to change such an environment, it is best practice to recreate it entirely. For more information on using pip and Conda together, consult Anaconda’s official documentation on the relationship between these tools. Please be aware that some of the features mentioned in Anaconda’s official documentation may not be available on the cluster.
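The "Conda first, pip last" order can be sketched as a hypothetical helper. It assumes you have already installed everything Conda can provide, and that the environment's interpreter is at <env>/bin/python (the usual layout on Linux). Calling that interpreter with -m pip makes sure the packages land in this environment rather than wherever another pip on your PATH points.

```shell
# Hypothetical helper: after Conda has installed everything it can, install
# the remaining pip-only packages with the environment's own pip.
# Usage: pip_install_into ~/py2_env some-pip-only-package
pip_install_into() {
    if [ $# -lt 2 ]; then
        echo "usage: pip_install_into <path-to-env> <package> [package ...]" >&2
        return 2
    fi
    local env_dir="$1"
    shift
    # Install pip through Conda so the environment has its own copy.
    conda install -p "$env_dir" pip || return 1
    # Calling the environment's python avoids picking up a pip from elsewhere.
    "$env_dir/bin/python" -m pip install "$@"
}
```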
Last Updated: 03/16/2020