Conda for bioinformatics

This document accompanies the DIYtranscriptomics course and provides basic guidance on installing various bioinformatics software using Conda. If you have problems, don't worry: we're here to help.

What is Conda and why should you install it?
Install Miniconda
Mac OS
Windows OS
Configuring your Conda installation
Create your first Conda environment
Rinse and repeat
Install other software we'll use for the course
Useful Conda tips
Generally useful Conda commands
Don't get carried away with your 'base' conda environment
Backup plan if Conda doesn't work for you
General troubleshooting tips for using Conda

What is Conda and why should you install it?

Taken directly from the Conda manual:

Conda is an open-source package management system and environment management system that runs on Windows, macOS, and Linux. Conda quickly installs, runs, and updates packages and their dependencies. Conda easily creates, saves, loads, and switches between environments on your local computer. It was created for Python programs but it can package and distribute software for any language.

Note: when you read 'package' in the text above, just think 'software'. An environment, on the other hand, is the software plus everything else that this software needs to run properly. This point is key to understanding why Conda has become a preferred way for installing a wide range of bioinformatics software – because it does a pretty good job (not perfect) of avoiding Dependency Hell.

Install Miniconda

Conda comes in two flavors: Anaconda and Miniconda. We want to install Miniconda, because it's much more lightweight while still meeting all of our needs. Importantly, when we install Miniconda, we'll be getting the Python programming language as part of that installation.

Mac OS

Download the Miniconda install script from here

Move this shell script (.sh) file to your home folder on your Mac, and enter the following line into your terminal application

bash ~/Miniconda3-latest-MacOSX-x86_64.sh -b -p $HOME/miniconda

Now 'source' conda so that it is available to you from the command line regardless of which directory you're in

source $HOME/miniconda/bin/activate

This next step may only be necessary if you're running a newer version of macOS that uses the zsh shell.

conda init zsh
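
If you're not sure whether the zsh step applies to you, here is a minimal sketch (not part of the official Conda instructions) that prints the 'conda init' command matching your login shell. The helper name init_command_for_shell is hypothetical, and the sketch assumes your login shell is recorded in the SHELL environment variable, which is the usual case on macOS and Linux.

```shell
# Hypothetical helper: map a shell path (e.g. /bin/zsh) to the matching
# 'conda init' command. It only prints the command; it does not run it.
init_command_for_shell() {
  shell_name="$(basename "$1")"
  case "$shell_name" in
    zsh)  echo "conda init zsh" ;;
    bash) echo "conda init bash" ;;
    *)    echo "unsupported by this sketch: $shell_name" ;;
  esac
}

# SHELL normally holds your login shell; fall back to bash if it is unset.
init_command_for_shell "${SHELL:-/bin/bash}"
```

Copy and run whichever command it prints, then open a new terminal window so the change takes effect.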

Windows OS

Windows OS sometimes presents a little more of a challenge for getting started in this course, so there’s a bit more work that has to be done before you can start installing software with Conda. Fortunately, these steps only need to be done once, and the end result will be a computer that is much happier running any kind of bioinformatics software. The instructions below should work on Windows 10 or 11.

First things first, we have to make sure that your computer is ready to play nice with some of the software we want to install. To do this, search for “Turn Windows Features on or off” and open this application. Make sure that “Windows Subsystem for Linux” is selected; if it is not, select it. Once selected, click OK and restart your computer for the changes to take effect.

After your computer has booted back up, search for ‘PowerShell’, right click this application, and choose to run as administrator.
In the PowerShell, run the following command to install Windows Subsystem for Linux (WSL).

wsl --install

WSL will be installed. Follow the instructions to finish the install and set up a username and password. Restart your machine when prompted to do so.

🚨

I cannot emphasize this enough. If you are a Windows user, to participate in the course you MUST install WSL before trying to install other software for the course. Do not proceed with the steps below if you have not completed the steps above.

With WSL installed, you now have an Ubuntu Linux operating system running happily alongside your Windows OS. But let’s just check to be sure. Search for and open the ‘Ubuntu’ application on your computer. Then run the following command:

cat /etc/os-release

If the Ubuntu installation was successful, you should see something like this after running the command:

🚨

From this point forward, if I tell you to run something in your ‘terminal’ or ‘bash’, you should be using the Ubuntu application. DO NOT use PowerShell for this course.
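
As a quick way to confirm you are in the right terminal, the sketch below reads the same /etc/os-release file used in the check above and prints only the operating system name. The helper name os_pretty_name is hypothetical; PRETTY_NAME is a standard field in os-release files on Linux.

```shell
# Hypothetical helper: print the PRETTY_NAME field from an os-release file.
# Defaults to /etc/os-release, the file inspected with 'cat' above.
os_pretty_name() {
  release_file="${1:-/etc/os-release}"
  # The file is a list of KEY=value lines; keep PRETTY_NAME and drop the quotes.
  sed -n 's/^PRETTY_NAME=//p' "$release_file" | tr -d '"'
}

if [ -r /etc/os-release ]; then
  echo "This terminal is running: $(os_pretty_name)"
else
  echo "No /etc/os-release found; this is probably not the Ubuntu application."
fi
```

If the name printed does not mention Ubuntu, you are likely in PowerShell or another terminal and should switch to the Ubuntu application before continuing.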

Now we can finally get down to the business of downloading and installing Miniconda on our machine! To do this, run the following in the Ubuntu application to download the software.

curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

Now we install Miniconda. To do this, run the command below. During the installation, choose all the default options, and in the final step select ‘yes’.

bash ~/Miniconda3-latest-Linux-x86_64.sh

Activate conda

source ~/.bashrc

You should now see that your bash prompt is prepended with “(base)”. We talk about what this means in the course.
You can now move on to the next section below.

Configuring your Conda installation

Now make sure Conda works and explore a bit using the lines below

conda info #to view all the details about your conda set-up
conda info --envs #to view all the environments available to you (note, since you just installed miniconda, you'll only have a 'base' environment available)

One of the things that makes Conda so great for software installation is that it has access to various channels where many pre-packaged bioinformatics programs can be downloaded with all their dependencies. Let's configure our Conda installation now so that it knows which channels to search.

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set offline false
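
Each 'conda config --add channels' command places the new channel at the top of the list, so the last channel added (conda-forge) ends up with the highest priority. The sketch below is one way to confirm this by reading the channel list straight from your ~/.condarc file. The helper name list_channels is hypothetical, and the sketch assumes the file uses the usual layout of a 'channels:' line followed by '- name' entries.

```shell
# Hypothetical helper: print the channels listed in a .condarc file,
# highest priority first (the order in which they appear in the file).
list_channels() {
  awk '
    /^channels:/       { in_list = 1; next }              # start of the channels list
    in_list && /^ *- / { sub(/^ *- */, ""); print; next } # one "- name" entry
    in_list            { in_list = 0 }                    # any other line ends the list
  ' "$1"
}

if [ -f "$HOME/.condarc" ]; then
  list_channels "$HOME/.condarc"
else
  echo "No ~/.condarc yet; run the conda config commands above first."
fi
```

After the four commands above, this should list conda-forge first, then bioconda, then defaults. The command 'conda config --show channels' reports the same information.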

Create your first Conda environment

Some of the most basic pieces of command-line software we discuss and use at the beginning of the course aren't available in R/Bioconductor. Instead, we'll install these into a single 'environment' using Conda, which makes managing dependencies much less frustrating. We'll be using Conda to install Kallisto, FastQC, and MultiQC.

Begin by creating an empty environment called 'rnaseq'...or name it whatever you'd like

conda create --name rnaseq

Now activate your newly created environment

conda activate rnaseq

Notice that your terminal prompt should now show that you have entered the 'rnaseq' environment (example image below).
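
If your prompt is customized and does not show the environment name, you can also ask the shell directly. This minimal sketch relies on CONDA_DEFAULT_ENV, an environment variable that 'conda activate' sets to the name of the active environment. The helper name current_conda_env is hypothetical.

```shell
# CONDA_DEFAULT_ENV is set by 'conda activate' to the name of the active
# environment; it is unset when no environment is active.
current_conda_env() {
  echo "${CONDA_DEFAULT_ENV:-none}"
}

echo "Active conda environment: $(current_conda_env)"
```

After 'conda activate rnaseq', this should report rnaseq.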

Now let's install some commonly used RNA-seq software inside this environment. Begin with Kallisto, which is our go-to tool for read mapping.

Note: if you get a y/n question during installation, respond yes by typing 'y' and pressing Enter.

Note: the most important piece of software here is Kallisto. If you encounter issues installing FastQC and/or MultiQC, just move on...it will not impact your ability to participate in the course.

conda install -c bioconda kallisto

Test that it works!

kallisto

If your Kallisto installation worked, then you should see something in your terminal that resembles the output below (basically, Kallisto is saying "I'm here, now what would you like me to do?!"). If so, take a second to pat yourself on the back – you just installed your first piece of software using Conda! 🎉 🎊

Note: if you are using Windows and the kallisto installation using conda was unsuccessful, follow the instructions in the Plan B for Windows OS section.

‼️

Mac users: If you have one of the newer Mac laptops with an M1/M2/M3 ARM chip, everything above may work fine, but when you actually try to use Kallisto for the first time to map reads, you may get an ‘illegal instruction’ error. If this happens, it’s because Conda installed the newest version of Kallisto, which isn’t compatible with your computer hardware. To remedy this, trash your rnaseq conda environment with conda remove --name rnaseq --all. Then create a new conda environment, activate it, and explicitly ask for an older (compatible) version of Kallisto by running: conda install -c bioconda kallisto=0.48.0. This is a good example of when it can be really useful to specify exactly which version of software is installed in your environment.
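
The remove-and-recreate steps described above can be sketched as a short script. This is an illustration, not the course's official procedure: the helper names run and rebuild_with_pinned_kallisto are hypothetical, and the sketch installs with --name instead of activating the environment first. By default it only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: remove an environment and recreate it with a pinned Kallisto.
# Run 'conda deactivate' first if the environment is currently active.
rebuild_with_pinned_kallisto() {
  env_name="$1"
  kallisto_version="$2"
  run conda remove -y --name "$env_name" --all
  run conda create -y --name "$env_name"
  run conda install -y --name "$env_name" -c bioconda "kallisto=$kallisto_version"
}

rebuild_with_pinned_kallisto rnaseq 0.48.0
```

Printing the commands first lets you check them before anything is deleted.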

Rinse and repeat

Now that you have Kallisto installed, you're going to install additional software into the same 'rnaseq' environment.

Run conda install -c bioconda fastqc and conda install -c bioconda multiqc

Check that both installed correctly.
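
One simple way to run that check is to ask the shell whether each program is on your PATH. The sketch below does this for all three tools installed so far; the helper name report_tool is hypothetical.

```shell
# Hypothetical helper: report whether a program is on your PATH.
report_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found"
  fi
}

# Run this with the 'rnaseq' environment activated.
for tool in kallisto fastqc multiqc; do
  report_tool "$tool"
done
```

For a slightly stronger test, fastqc --version and multiqc --version should each print a version number if the program runs correctly.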

Note: if your laptop runs Windows, you may encounter some issues with fastqc. It should install without issue, but fastqc may not be recognized as an internal or external command, operable program or batch file. If so, no worries, it won't affect your ability to participate in the course. However, you may want to try installing a similar program for quality control analysis of raw reads, called fastp. You can install fastp using conda install -c bioconda fastp. Another alternative is to install FastQC manually and use it in its interactive mode. Instructions for this can be found in the Plan B for Windows OS section.

Install other software we'll use for the course

Now that we are done installing software in our rnaseq environment, we can exit this environment by typing conda deactivate. Let's create some additional environments for other software you might want to use on your laptop.

Note: for the purposes of this course, the most important piece of software listed below is Kallisto-bustools in python (kb-python) for single cell RNA-seq analysis at the end of the course. If you encounter issues installing sourmash or centrifuge, just move on...it will not impact your ability to participate in the course.

Note: run each line below separately, rather than copying/pasting the entire block of code.

conda create -y --name kb python=3.8 #create an environment, specifying python v3.8
conda activate kb #activate that newly created environment
pip install kb-python #install kb-python in the environment.  Note: if this fails because of an issue with pysam, then do 'conda install pysam' then retry this line.
kb #test that it works!

We'll use sourmash for creating and analyzing 'sketches' of HTS data later in the course. Let's create a new and separate environment for this

conda create --name sourmash #create an empty environment
conda activate sourmash #activate that newly created environment
conda install -c bioconda sourmash #install sourmash in the environment
sourmash #test that it works!

We'll use Centrifuge for rapid and memory-efficient classification of DNA sequences from microbial samples

conda create --name centrifuge #create an empty environment
conda activate centrifuge #activate that newly created environment
conda install -c bioconda centrifuge #install centrifuge in the environment
centrifuge #test that it works!
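
At this point you may have up to four course environments (rnaseq, kb, sourmash and centrifuge). The following sketch checks which of them exist by reading the output of 'conda env list'. The helper name env_names is hypothetical, and the sketch assumes the usual output layout in which comment lines start with '#' and each remaining line begins with the environment name.

```shell
# Hypothetical helper: read 'conda env list'-style text on stdin and
# print the first column (the environment name) of each non-comment line.
env_names() {
  awk '!/^#/ && NF { print $1 }'
}

if command -v conda >/dev/null 2>&1; then
  existing="$(conda env list | env_names)"
  for env in rnaseq kb sourmash centrifuge; do
    if printf '%s\n' "$existing" | grep -qx "$env"; then
      echo "$env: present"
    else
      echo "$env: not created yet"
    fi
  done
else
  echo "conda is not on your PATH in this shell"
fi
```

A missing sourmash or centrifuge environment is not a problem for the course, as noted above.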

Useful Conda tips

Check out this article for a nice breakdown of the differences between Conda and the package manager pip.

Generally useful Conda commands

conda info # Displaying useful info related to conda on your machine
conda list # shows you everything installed in your current environment
conda list -n [ENV NAME] # shows you everything installed in the specified environment
conda config --show # shows your config file, which you may need to see/change at some point in your work
conda config --show channels # shows you channels from the config file
conda remove --name myenv --all # remove any environment (substitute your env name for 'myenv')
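conda search mypkg # search your channels for a specific package (substitute the package name for 'mypkg')
nano $HOME/.condarc # view (or edit) your conda config file, which includes your list of channels
conda update --all # update all packages in the current environment (use 'conda update conda' to update conda itself)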

Don't get carried away with your 'base' conda environment

When you install conda, you automatically get a 'base' environment. In fact, you may find that when you open your terminal or shell application, you are placed in the base environment by default. Avoid installing lots of software in base or, eventually, you will run into conflicts.

Backup plan if Conda doesn't work for you

You should only be reading this if the steps above failed. So, what do you do if Conda doesn't install properly or you aren't able to install the software above? No worries, we can probably help in the lab session devoted to troubleshooting software installation. In the event that we can't resolve your IT issues, we have a backup plan to help you get the most essential software for the course installed.

Conda is not the only game in town when it comes to package managers. If you were unable to get Kallisto installed, give it a try with Homebrew. Although this isn't essential for the class, it will make your life a lot easier when you try to install software in the future. To get Homebrew, enter the following line into your terminal (Mac) or Ubuntu application (Windows running WSL).

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Some, but not all, of the software we installed using Conda above is also available for MacOS using Homebrew. Go ahead and install as follows:

brew install kallisto
brew install fastqc

Still having issues? It’s probably a relatively easy fix and something that many others have encountered before you. If you haven’t done so already, you should join the course Discord page. Once on the Discord page, check out the #lecture-01 channel, where you’ll see many posts from other students who have worked through various issues with installing software for the course. If you don’t see a post that helps solve your problem, then post your question in the channel with as many details as possible (screenshots are essential), and either I or the TAs (or other students!) will spring into action to help.

General troubleshooting tips for using Conda

If you use conda long enough, it’s only a matter of time before you run into an environment that fails to install (usually with an error like “failed to resolve conflicts”, or it might just hang forever on ‘solving environment’). If this happens, you may be able to fix the issue simply by changing the way your Conda installation handles channels.

First, check the channel priority setting in your conda configuration file with

conda config --show channel_priority

Then change it to the opposite (e.g. if set to strict, change to flexible):

conda config --set channel_priority flexible
#alternatively, conda config --set channel_priority strict

Now retry installing your conda environment.
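
The check-then-flip procedure above can be sketched as a small script. It is only an illustration: the helper name opposite_priority is hypothetical, it handles only the strict and flexible values discussed here, and it prints the command to run instead of changing your configuration for you.

```shell
# Hypothetical helper: given the current channel_priority value,
# print the opposite one. Only 'strict' and 'flexible' are handled.
opposite_priority() {
  case "$1" in
    strict)   echo flexible ;;
    flexible) echo strict ;;
    *)        echo "unrecognized value: $1" >&2; return 1 ;;
  esac
}

if command -v conda >/dev/null 2>&1; then
  # 'conda config --show channel_priority' prints a line like 'channel_priority: flexible'
  current="$(conda config --show channel_priority | awk '{ print $2 }')"
  if new_value="$(opposite_priority "$current")"; then
    echo "Current setting: $current"
    echo "To switch, run: conda config --set channel_priority $new_value"
  fi
else
  echo "conda is not on your PATH in this shell"
fi
```

If switching does not help, you can switch back the same way.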