This document accompanies the DIYtranscriptomics course and is intended to provide basic guidance on installing various bioinformatics software using Conda. If you have problems...don't worry, we're here to help.
- What is Conda and why should you install it?
- Install Miniconda
- Mac OS
- Windows OS
- Configuring your Conda installation
- Create your first Conda environment
- Rinse and repeat
- Install other software we'll use for the course
- Useful Conda tips
- Generally useful Conda commands
- Don't get carried away with your 'base' conda environment
- Backup plan if Conda doesn't work for you
- General troubleshooting tips for using Conda
What is Conda and why should you install it?
Taken directly from the Conda manual:
Conda is an open-source package management system and environment management system that runs on Windows, macOS, and Linux. Conda quickly installs, runs, and updates packages and their dependencies. Conda easily creates, saves, loads, and switches between environments on your local computer. It was created for Python programs but it can package and distribute software for any language.
Note: when you read 'package' in the text above, just think 'software'. An environment, on the other hand, is the software plus everything else that this software needs to run properly. This point is key to understanding why Conda has become a preferred way of installing a wide range of bioinformatics software: it does a pretty good job (not perfect) of avoiding Dependency Hell.
Install Miniconda
Conda comes in two flavors: Anaconda and Miniconda. We want to install Miniconda, because it's much more lightweight while still meeting all of our needs. Importantly, when we install Miniconda, we'll be getting the Python programming language as part of that installation.
Mac OS
Download the Miniconda install script from here
Move this shell script (.sh) file to your home folder on your Mac, and enter the following line into your terminal application
bash ~/Miniconda3-latest-MacOSX-x86_64.sh -b -p $HOME/miniconda
Now 'source' conda so that it is available to you from the command line regardless of which directory you're in
source $HOME/miniconda/bin/activate
This next step may only be necessary if you're running a newer MacOS that uses the zsh shell.
conda init zsh
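The installer filename used above targets Intel Macs. As a hedged sketch, the helper below builds an installer filename from the operating system and CPU architecture that `uname` reports. The function name `miniconda_installer_name` is hypothetical, and the filenames other than the x86_64 one shown above are assumptions based on the naming pattern in the Miniconda download repository, so check the download page before relying on them.

```shell
# Hypothetical helper: print the Miniconda installer filename for a given
# operating system and CPU architecture (as reported by `uname -s` / `uname -m`).
miniconda_installer_name() {
    os="$1"
    arch="$2"
    case "$os" in
        Darwin) platform="MacOSX" ;;
        Linux)  platform="Linux" ;;
        *)      echo "unsupported OS: $os" >&2; return 1 ;;
    esac
    # Apple Silicon Macs report arm64; Linux ARM machines report aarch64
    echo "Miniconda3-latest-${platform}-${arch}.sh"
}

# Show the filename that matches this machine
miniconda_installer_name "$(uname -s)" "$(uname -m)"
```

If the printed name differs from the file you downloaded, use the downloaded file's name in the bash command above.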
Windows OS
Windows OS sometimes presents a little more of a challenge for getting started in this course, so there's a bit more work that has to be done before you can start installing software with Conda. Fortunately, these steps only need to be done once, and the end result will be a computer that is much happier running any kind of bioinformatics software. The instructions below should work on Windows 10 or 11.
- First things first, we have to make sure that your computer is ready to play nice with some of the software we want to install. To do this, search for "Turn Windows Features on or off" and open this application. Make sure that "Windows Subsystem for Linux" is selected; if it is not, select it. Once selected, click OK and restart your computer for the changes to take effect.
- After your computer has booted back up, search for "PowerShell", right click this application, and choose to run as administrator.
- In the PowerShell, run the following command to install Windows Subsystem for Linux (WSL).
wsl --install
- WSL will be installed. Follow the instructions to finish the install and set up a username and password. Restart your machine when prompted to do so.
- With WSL installed, you now have an Ubuntu Linux operating system running happily alongside your Windows OS. But let's just check to be sure. Search for and open the "Ubuntu" application on your computer. Then run the following command:
cat /etc/os-release
If the Ubuntu installation was successful, the output will list details such as NAME="Ubuntu" and the version number.
- Now we can finally get down to the business of downloading and installing Miniconda on our machine! To do this, run the following in the Ubuntu application to download the software.
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
- Now we install Miniconda. To do this, run the command below. During the installation, choose all the default options, and in the final step select "yes".
bash ~/Miniconda3-latest-Linux-x86_64.sh
- Activate conda
source ~/.bashrc
- You should now see that your bash prompt is prepended with "(base)". We talk about what this means in the course.
- You can now move to the next section below
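Note that `curl -O` saves the installer into whichever folder you are currently in, while the install command looks for it in your home folder. A minimal sketch of a check, assuming the Linux installer filename used above; the helper name `installer_location` is made up for illustration.

```shell
# Hypothetical helper: report whether the installer file is in a given folder.
installer_location() {
    dir="$1"
    file="$2"
    if [ -f "$dir/$file" ]; then
        echo "found: $dir/$file"
    else
        echo "missing: $dir/$file"
        return 1
    fi
}

# Check the home folder, where the 'bash ~/Miniconda3-...' command expects the file
installer_location "$HOME" "Miniconda3-latest-Linux-x86_64.sh" \
    || echo "Run 'cd ~', repeat the curl download, then try the install again."
```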
Configuring your Conda installation
Now make sure Conda works and explore a bit using the lines below
conda info #to view all the details about your conda set-up
conda info --envs #to view all the environments available to you (note, since you just installed miniconda, you'll only have a 'base' environment available)
One of the things that makes Conda so great for software installation is that it has access to various channels where many pre-packaged bioinformatics programs can be downloaded with all their dependencies. Let's configure our Conda installation now so that it knows which channels to look for.
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set offline false
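Each `--add` places the new channel at the top of the list, so after the three commands above conda-forge is searched first, then bioconda, then defaults. As a small sketch, the lines below print your channel list so you can confirm the order. The `conda_status` variable is only there for illustration, and the guard simply avoids an error if conda is not yet on your PATH.

```shell
if command -v conda >/dev/null 2>&1; then
    conda_status="found"
    conda config --show channels   # channels are listed highest priority first
else
    conda_status="missing"
    echo "conda is not on your PATH; open a new terminal or repeat the 'source' step"
fi
```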
Create your first Conda environment
Some of the most basic pieces of command-line software we discuss and use at the beginning of the course aren't available in R/bioconductor. Instead, we'll install these into a single 'environment' using Conda, which makes managing dependencies much less frustrating. We'll be using Conda to install Kallisto, FastQC, and MultiQC.
Begin by creating an empty environment called 'rnaseq'...or name it whatever you'd like
conda create --name rnaseq
Now activate your newly created environment
conda activate rnaseq
Notice that your terminal prompt now begins with '(rnaseq)', showing that you have entered the 'rnaseq' environment.
Now let's install some commonly used RNA-seq software inside this environment. Begin with Kallisto, which is our go-to tool for read mapping.
Note: if you get a y/n question during installation, respond yes by typing 'y' and pressing Enter.
Note: the most important piece of software here is Kallisto. If you encounter issues installing FastQC and/or MultiQC, just move on...it will not impact your ability to participate in the course.
conda install -c bioconda kallisto
Test that it works!
kallisto
If your Kallisto installation worked, then you should see something in your terminal that resembles the output below (basically, Kallisto is saying "I'm here, now what would you like me to do?!"). If so, take a second to pat yourself on the back: you just installed your first piece of software using Conda!
Note: if you are using Windows and the kallisto installation using conda was unsuccessful, follow the instructions in the Plan B for Windows OS section.
Rinse and repeat
Now that you have Kallisto installed, you're going to install additional software into the same 'rnaseq' environment.
Run the following two commands, one at a time:
conda install -c bioconda fastqc
conda install -c bioconda multiqc
Check that both installed correctly.
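One way to do this check, sketched here with a hypothetical helper named `check_tool`, is to ask the shell whether each program can be found on your PATH while the 'rnaseq' environment is active.

```shell
# Hypothetical helper: report whether a program can be found on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: installed"
    else
        echo "$1: not found"
    fi
}

# Run this with the 'rnaseq' environment active
for tool in kallisto fastqc multiqc; do
    check_tool "$tool"
done
```

You can also run fastqc --version and multiqc --version, each of which prints a version number when the installation worked.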
Note: if your laptop runs Windows, you may encounter some issues with FastQC. It should install without issue, but fastqc may not be recognized as an internal or external command, operable program or batch file. If so, no worries, it won't affect your ability to participate in the course. However, you may want to try installing a similar program for quality control analysis of raw reads, called fastp. You can install fastp using conda install -c bioconda fastp. Another alternative is to install FastQC manually and use it in its interactive mode. Instructions for this can be found in the Plan B for Windows OS section.
Install other software we'll use for the course
Now that we are done installing software in our rnaseq environment, we can exit this environment by typing conda deactivate. Let's create some additional environments for other software you might want to use on your laptop.
Note: for the purposes of this course, the most important piece of software listed below is Kallisto-bustools in python (kb-python) for single cell RNA-seq analysis at the end of the course. If you encounter issues installing sourmash or centrifuge, just move on...it will not impact your ability to participate in the course.
Note: run each line below separately, rather than copying/pasting the entire block of code.
conda create -y --name kb python=3.8 #create an environment, specifying python v3.8
conda activate kb #activate that newly created environment
pip install kb-python #install kb-python in the environment. Note: if this fails because of an issue with pysam, then do 'conda install pysam' then retry this line.
kb #test that it works!
We'll use sourmash for creating and analyzing 'sketches' of HTS data later in the course. Let's create a new and separate environment for this
conda create --name sourmash #create an empty environment
conda activate sourmash #activate that newly created environment
conda install -c bioconda sourmash #install sourmash in the environment
sourmash #test that it works!
We'll use Centrifuge for rapid and memory-efficient classification of DNA sequences from microbial samples
conda create --name centrifuge #create an empty environment
conda activate centrifuge #activate that newly created environment
conda install -c bioconda centrifuge #install centrifuge in the environment
centrifuge #test that it works!
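The create, activate, install pattern used above can also be shortened: conda create accepts package names, so an environment can be created and populated in a single command. The sketch below only prints that command (a dry run) so you can inspect it before running anything; `one_step_env` is a hypothetical helper name, not part of Conda.

```shell
# Hypothetical helper: print a single command that creates an environment
# and installs one bioconda package into it.
one_step_env() {
    env_name="$1"
    package="$2"
    echo "conda create -y --name $env_name -c bioconda $package"
}

one_step_env sourmash sourmash
one_step_env centrifuge centrifuge
```

Copy a printed line into your terminal to run it, then activate the new environment with conda activate as before.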
Useful Conda tips
Check out this article for a nice breakdown of the differences between Conda and the package manager, Pip.
Generally useful Conda commands
conda info # displays useful info related to conda on your machine
conda list # shows you everything installed in your current environment
conda list -n [ENV NAME] # shows you everything installed in the specified environment
conda config --show # shows your config file, which you may need to see/change at some point in your work
conda config --show channels # shows you channels from the config file
conda remove --name myenv --all # remove any environment (substitute your env name for 'myenv')
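conda search mypackage # search your channels for a specific package (substitute the package name for 'mypackage')
nano $HOME/.condarc #view your list of channels
conda update --all # update all packages in the current environment (use 'conda update conda' to update conda itself)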
Don't get carried away with your 'base' conda environment
When you install conda, you automatically get a 'base' environment. In fact, you may find that when you open your terminal or shell application, you are placed in the base env by default. Avoid installing lots of software in base or, eventually, you will run into conflicts.
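Conda records the name of the active environment in the CONDA_DEFAULT_ENV variable, so a quick check before installing anything is possible. A minimal sketch, with `env_reminder` as a made-up helper name:

```shell
# Hypothetical helper: say which conda environment (if any) is active.
env_reminder() {
    case "${CONDA_DEFAULT_ENV:-}" in
        base) echo "You are in 'base'; activate another environment before installing." ;;
        "")   echo "No conda environment is active." ;;
        *)    echo "Active environment: $CONDA_DEFAULT_ENV" ;;
    esac
}

env_reminder
```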
Backup plan if Conda doesn't work for you
You should only be reading this if the steps above failed. So, what do you do if Conda doesn't install properly or you aren't able to install the software above? No worries, we can probably help in the lab session devoted to troubleshooting software installation. In the event that we can't resolve your IT issues, we have a backup plan to help you get the most essential software for the course installed.
Conda is not the only game in town when it comes to package managers. If you were unable to get Kallisto installed, give it a try with Homebrew. Although this isn't essential for the class, it will make your life a lot easier when you try to install software in the future. To get Homebrew, enter the following line into your terminal (Mac) or Ubuntu application (Windows running WSL).
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Some, but not all, of the software we installed using Conda above is also available for MacOS using Homebrew. Go ahead and install as follows:
brew install kallisto
brew install fastqc
Still having issues? It's probably a relatively easy fix and something that many others have encountered before you. If you haven't done so already, you should join the course Discord page. Once on the Discord page, check out the #lecture-01 channel, where you'll see many posts from other students who have worked through various issues with installing software for the course. If you don't see a post that helps solve your problem, then post your question in the channel with as many details as possible (screenshots are essential), and either the TAs or I (or other students!) will spring into action to help.
General troubleshooting tips for using Conda
If you use conda long enough, it's only a matter of time before you run into issues where an environment fails to install (usually with an error like "failed to resolve conflicts", or it might just hang forever on "solving environment"). If this happens, you may be able to fix the issue simply by changing the way your Conda installation handles channels.
- First, check the channel priority setting in your Conda configuration file with
conda config --show channel_priority
- Then change it to the opposite (e.g. if set to strict, change to flexible):
conda config --set channel_priority flexible
#alternatively, conda config --set channel_priority strict
- Now retry installing your conda environment.
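The switch described above can be sketched as a small helper. `opposite_priority` is a hypothetical name, and the sketch assumes that conda config --show channel_priority prints a single line of the form "channel_priority: flexible". It only prints a suggestion; the command that would apply it is left as a comment.

```shell
# Hypothetical helper: given the current setting, print the opposite one.
opposite_priority() {
    case "$1" in
        strict)   echo "flexible" ;;
        flexible) echo "strict" ;;
        *)        echo "flexible" ;;  # anything else (e.g. disabled): fall back to flexible
    esac
}

if command -v conda >/dev/null 2>&1; then
    # Keep only the value after "channel_priority: "
    current="$(conda config --show channel_priority | sed 's/^channel_priority: *//')"
    echo "current: $current, suggested: $(opposite_priority "$current")"
    # To apply it: conda config --set channel_priority "$(opposite_priority "$current")"
fi
```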