Git basics

This document accompanies Lab #3 from the DIYtranscriptomics course.

Set up Visual Studio Code to work with Git and GitHub

There are many ways to work with Git and GitHub, including the command line as well as third-party point-and-click applications. I think that VS Code is probably one of the easiest ways to interact with both Git and GitHub, and it eliminates the need to memorize Git commands.

  1. To begin, open VS Code and click on the Git icon on the left-hand toolbar.

  2. You may see that VS Code is telling you that you need to download and install Git in order to use this feature. If so, go ahead and do that now.

    Windows users: even if you already have Git installed via WSL, you will still need to install it again. VS Code lives on the portion of your hard drive that has Windows, and therefore doesn’t ‘see’ WSL or anything else you installed with WSL. VS Code will install Git for Windows, not Linux.

    Mac users: You may be asked to install developer tools. If so, do it.

  3. Click on the Extensions tab of your toolbar and search for “GitHub Repositories” in the search bar. Click on this extension and choose ‘Install’.

  4. Start a new folder on your computer. Open this directory in VS Code (File → Open Folder) and click on the Git tab of VS Code again. You should now see an option to ‘Initialize Repository’. Choose that option.

  5. After we’ve initialized version control in our folder, let’s take a look at the contents of that folder from the command line. Navigate to your folder using Terminal (Mac) or Git Bash (Windows), and type ls -a to view all files, including hidden files (in the Windows Command Prompt, use dir /a instead). Notice that you can now see a hidden folder, .git. This is your local Git database, which was created when you clicked ‘Initialize Repository’ above.
  6. Now let’s drag and drop our chocolate muffin recipe file into our working directory. What do you see in VS Code? This is a great opportunity to talk about key Git terms, like ‘stage’, ‘commit’, and ‘diff’. Note that none of these operations require a connection to GitHub. It’s all happening locally and can continue that way as long as you want.
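If you are curious what VS Code is doing behind the buttons, the steps above map onto a handful of ordinary git commands. The sketch below is only an illustration: it uses Python to run those commands in a throwaway folder, and the file name muffins.txt, the recipe text, and the demo identity are all made up for the example. It assumes git is installed and on your PATH, and returns None if it is not.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical identity for the demo commit. It is passed per command so the
# sketch neither depends on nor changes your global Git configuration.
GIT_OPTS = [
    "-c", "user.name=Demo User",
    "-c", "user.email=demo@example.com",
    "-c", "commit.gpgsign=false",
]


def git(repo, *args):
    """Run one git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", *GIT_OPTS, *args],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return result.stdout


def local_git_demo():
    """Walk through init, stage, commit, and diff in a temporary folder.

    Returns a dict of observations, or None if git is not installed.
    """
    if shutil.which("git") is None:
        return None
    seen = {}
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init")  # same idea as 'Initialize Repository'
        seen["has_git_folder"] = (repo / ".git").is_dir()

        recipe = repo / "muffins.txt"  # hypothetical file name
        recipe.write_text("2 cups flour\n")
        seen["untracked"] = git(repo, "status", "--porcelain")

        git(repo, "add", "muffins.txt")  # stage the new file
        seen["staged"] = git(repo, "status", "--porcelain")

        git(repo, "commit", "-m", "Add original muffin recipe")  # commit
        seen["clean"] = git(repo, "status", "--porcelain")

        # Edit the file, then ask git what changed since the commit.
        recipe.write_text("2 cups flour\n1 cup chocolate chips\n")
        seen["diff"] = git(repo, "diff")
    return seen


if __name__ == "__main__":
    observations = local_git_demo()
    if observations is None:
        print("git was not found on this machine")
    else:
        for step, output in observations.items():
            print(f"--- {step} ---\n{output}")
```

Nothing here talks to GitHub: every command reads from or writes to the hidden .git folder inside the temporary directory.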

Connecting our local Git repo with GitHub

  1. Right now, we have a version-controlled working directory that is not connected to GitHub. Git operates independently of GitHub, but it’s incredibly useful to connect our local Git version database to GitHub so we can store our version history on the web. Let’s revisit our Git tab in VS Code and choose ‘Publish to GitHub’. If you don’t see the large blue button at the top of VS Code, look for the small cloud icon with an up arrow at the bottom of VS Code (see screenshot below). Both accomplish the same thing. If you’ve never used VS Code to access GitHub, you’re going to be asked to authenticate. You’ll only need to do this once. Just say yes/OK to any questions during this authentication. Once you’ve authenticated, you will be asked if you’d like to publish to GitHub as a public or private repo (your choice!).

    [screenshot]

  2. Now that our local repo is connected to a remote repo on GitHub, we can continue to work on our files locally, commit changes locally, and then at any point decide when we want to push these changes up to GitHub. Awesome!
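Connecting a local repo to a remote comes down to two git operations: registering the remote under a name (by convention, origin) and pushing to it. The sketch below illustrates this offline, so it makes a big assumption: an empty ‘bare’ repository in a temporary folder stands in for GitHub. With a real GitHub repo you would use its URL in place of that folder path, and VS Code additionally creates the repo on GitHub for you, which this sketch does not do. File names and the demo identity are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical demo identity, passed per command (no global config changes).
GIT_OPTS = [
    "-c", "user.name=Demo User",
    "-c", "user.email=demo@example.com",
    "-c", "commit.gpgsign=false",
]


def git(cwd, *args):
    """Run one git command inside `cwd` and return its standard output."""
    result = subprocess.run(
        ["git", *GIT_OPTS, *args],
        cwd=cwd, capture_output=True, text=True, check=True,
    )
    return result.stdout


def publish_demo():
    """Push a local commit to a stand-in remote.

    Returns (local commit id, branch listing of the remote), or None if
    git is not installed.
    """
    if shutil.which("git") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        remote = Path(tmp) / "remote.git"  # stand-in for a GitHub repo URL
        local = Path(tmp) / "muffins"
        local.mkdir()
        git(tmp, "init", "--bare", str(remote))

        # A local repo with one commit, as built earlier in the lab.
        git(local, "init")
        (local / "muffins.txt").write_text("2 cups flour\n")
        git(local, "add", "muffins.txt")
        git(local, "commit", "-m", "Add original muffin recipe")

        # Register the remote as 'origin', then push the current branch
        # and remember origin as its upstream for later pushes and pulls.
        git(local, "remote", "add", "origin", str(remote))
        git(local, "push", "--set-upstream", "origin", "HEAD")

        local_head = git(local, "rev-parse", "HEAD").strip()
        remote_heads = git(local, "ls-remote", "--heads", "origin")
    return local_head, remote_heads


if __name__ == "__main__":
    outcome = publish_demo()
    if outcome is None:
        print("git was not found on this machine")
    else:
        print("local commit:", outcome[0])
        print("remote branches:\n" + outcome[1])
```

After the push, the remote lists a branch whose commit id matches the local one, which is all that ‘published’ really means.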

Time to make some muffins!

  1. For the remainder of the lab, let’s use our chocolate muffin recipe to get more comfortable using VS Code to work with Git and GitHub. I suggest you use branches in Git to make three separate changes to our muffin recipe (each change that we’ll be making is documented at the bottom of the recipe, but feel free to make your own changes). At the end of the demo, you will have your original recipe on the ‘main’ development branch, and each of the three recipe changes living on its own named branch.
  2. To create a new branch in VS Code, simply click on the ‘main’ branch icon at the bottom of your screen, which will reveal a dropdown menu at the top of VS Code. Choose “+ Create new branch” and give your branch an informative name. From now on, you can toggle between main and any other branch by revisiting this button at the bottom of your screen.
  3. After you’ve made a few branches and committed some changes locally, try pushing your changes up to GitHub. This is as easy as clicking on the “Sync Changes” button at the top, selecting “Sync” from the source control menu, or clicking the circular arrow at the bottom. Lots of options (see screenshot below)! Congratulations, you’re now familiar with 95% of the Git/GitHub functionality that you’ll ever need for your research!

    [screenshot]
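The branching exercise can also be sketched as plain git commands. The example below is an illustration under assumptions: the three branch names and recipe edits are invented (use the ones listed at the bottom of your recipe), the file name muffins.txt is hypothetical, and git must be installed. It creates each branch from main, commits one change there, and then confirms that main still holds the original recipe.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical demo identity, passed per command (no global config changes).
GIT_OPTS = [
    "-c", "user.name=Demo User",
    "-c", "user.email=demo@example.com",
    "-c", "commit.gpgsign=false",
]

ORIGINAL = "2 cups flour\n"

# Invented branch names and edits, purely for illustration.
CHANGES = {
    "more-chocolate": "1 cup chocolate chips\n",
    "add-walnuts": "1/2 cup walnuts\n",
    "less-sugar": "1/4 cup sugar\n",
}


def git(repo, *args):
    """Run one git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", *GIT_OPTS, *args],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return result.stdout


def branch_demo():
    """Put each recipe change on its own branch, leaving main untouched.

    Returns (sorted branch names, recipe text on main), or None if git
    is not installed.
    """
    if shutil.which("git") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        recipe = repo / "muffins.txt"  # hypothetical file name
        git(repo, "init")
        recipe.write_text(ORIGINAL)
        git(repo, "add", "muffins.txt")
        git(repo, "commit", "-m", "Add original muffin recipe")
        git(repo, "branch", "-M", "main")  # make sure the branch is 'main'

        for name, extra_line in CHANGES.items():
            git(repo, "checkout", "-b", name, "main")  # new branch off main
            recipe.write_text(ORIGINAL + extra_line)
            git(repo, "commit", "-am", f"Try recipe variant: {name}")

        git(repo, "checkout", "main")  # toggle back to main
        listing = git(repo, "for-each-ref",
                      "--format=%(refname:short)", "refs/heads")
        main_text = recipe.read_text()
    return sorted(listing.split()), main_text


if __name__ == "__main__":
    outcome = branch_demo()
    if outcome is None:
        print("git was not found on this machine")
    else:
        print("branches:", outcome[0])
        print("recipe on main:", outcome[1])
```

Switching branches rewrites the file in your working directory, which is why the recipe on main comes back unchanged after all three edits.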

Understanding .gitignore files

  1. Everything we’ve discussed above dealt with simple text files (e.g., our recipe). One critical thing to understand is that Git and GitHub are not intended for large files! For example, large raw data files (e.g., fastq files) should not be tracked with Git because they are raw data and should never change. If a file doesn’t or shouldn’t change over time, then we don’t need to keep a version history of it! Even if we did accidentally start tracking a large file, we would not be able to push this version history up to GitHub, because GitHub blocks files larger than 100 MB. We could simply leave large files untracked, but it’s easy to commit one by accident. To guard against this, we will use a special hidden file that Git can read, called .gitignore.
  2. A .gitignore file tells your local Git program which files it should always leave untracked. If a file isn’t tracked, then it has no version history stored in that hidden .git database on your computer, and therefore cannot be pushed up to GitHub.
  3. Use the gitignore.txt file I provided in the section above. Move this file to your working directory and rename it from gitignore.txt to .gitignore.
  4. With the .gitignore file in place, we can now test it out. Drag and drop a file type that is included in your .gitignore (e.g., one of the small fastq files from the last lab) into your working directory. This file is now in your working directory (because you just put it there), but VS Code’s Source Control tab does not list it as a change, because you told Git to ignore this file type. Depending on your settings, the file may still appear greyed out in VS Code’s file browser.
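To make the idea concrete, here is a small sketch of a check you could run before committing: it looks for files above a size limit that your ignore patterns would not catch. Treat it as an approximation. The patterns shown are hypothetical (they are not the contents of the provided gitignore.txt), and the matching is deliberately simplified to plain file-name globs, whereas real .gitignore rules also cover directories, negation, and more.

```python
import fnmatch
import tempfile
from pathlib import Path

# GitHub's per-file limit (100 MB in the text; strictly 100 MiB).
GITHUB_LIMIT_BYTES = 100 * 1024 * 1024


def is_ignored(name, patterns):
    """Simplified test: does the file name match any glob pattern?

    Only plain patterns such as '*.fastq' are handled here; this is not
    a full implementation of .gitignore matching.
    """
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)


def files_needing_ignore(folder, patterns, limit=GITHUB_LIMIT_BYTES):
    """Return names of files over `limit` bytes that no pattern ignores."""
    folder = Path(folder)
    risky = []
    for path in sorted(folder.rglob("*")):
        relative = path.relative_to(folder)
        if ".git" in relative.parts or not path.is_file():
            continue  # skip Git's own database and anything not a file
        if path.stat().st_size > limit and not is_ignored(path.name, patterns):
            risky.append(path.name)
    return risky


if __name__ == "__main__":
    # Hypothetical patterns and tiny stand-in files, with a 1000-byte
    # limit so the demo does not need to create huge files.
    demo_patterns = ["*.fastq", "*.fastq.gz"]
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "reads.fastq").write_bytes(b"A" * 2000)
        (Path(tmp) / "counts.csv").write_bytes(b"1" * 2000)
        (Path(tmp) / "muffins.txt").write_bytes(b"2 cups flour\n")
        print(files_needing_ignore(tmp, demo_patterns, limit=1000))
```

In the demo, reads.fastq is large but already covered by a pattern and muffins.txt is small, so only counts.csv is reported as a file that still needs a .gitignore entry.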

Tips and tricks for getting the most from Git

  1. Make meaningful commit messages (i.e. not just “I updated some stuff”). Good commit messages say why you made a change, not what change was made (anyone can figure that out from just diff’ing a file).
  2. Smaller commits are better than bigger ones. My developer friends like to say that they have never seen a commit that was too small, but frequently see ones that are too big. Big commits are much more likely to contain multiple problems that need to be separated out.
  3. You don’t necessarily need to use branches. Different people have different styles of working with Git. If you’re the only one working on a repo, for example, then you could consider never creating a branch and just committing all your changes to main. You’ll never lose anything, because it’s all version controlled and you can always return to any previous state of a document. On the other hand, if you’re working on a repo with other people, you will definitely want to work on a branch to avoid problems.
  4. Merged branches should be deleted. If you ever get to the point where you merge a branch with main, then you’ll want to delete the branch afterward so no one (including your future self) mistakenly thinks that the branch is still under active development.
  5. Always begin your work with a ‘Pull’. If you’re collaborating on a shared GitHub repo, you need to start work with the most current version of the code, so always begin by pulling the repo from GitHub to ensure that you’re up to date.
  6. Take the time to write a thorough README. Nearly every GitHub repo includes a README text file. When you published your local repo to GitHub for the first time, GitHub may have created a blank README for you; if not, you can add one yourself. It’s fine to leave this blank when your repo is just getting started, but eventually you’ll want to use this file to explain to others what the repo contains and how one might use this code moving forward. You can edit this file locally (after pulling from GitHub) or directly on GitHub. The more details you can provide, the better. Here’s an example README from one of my lab’s recent papers.
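As an illustration of the ‘merge, then delete’ tip, the sketch below merges a finished branch into main and removes it with git branch -d, which refuses to delete a branch whose work has not been merged. As before, this is a hedged example rather than a prescribed workflow: the branch name, file name, and demo identity are invented, and git must be installed for it to do anything.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical demo identity, passed per command (no global config changes).
GIT_OPTS = [
    "-c", "user.name=Demo User",
    "-c", "user.email=demo@example.com",
    "-c", "commit.gpgsign=false",
]


def git(repo, *args):
    """Run one git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", *GIT_OPTS, *args],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return result.stdout


def merge_and_delete_demo():
    """Merge a branch into main, then delete the merged branch.

    Returns (remaining branch names, recipe text on main), or None if
    git is not installed.
    """
    if shutil.which("git") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        recipe = repo / "muffins.txt"  # hypothetical file name
        git(repo, "init")
        recipe.write_text("2 cups flour\n")
        git(repo, "add", "muffins.txt")
        git(repo, "commit", "-m", "Add original muffin recipe")
        git(repo, "branch", "-M", "main")

        # Do the work on a branch (invented name), with a message that
        # says why the change was made.
        git(repo, "checkout", "-b", "add-walnuts")
        recipe.write_text("2 cups flour\n1/2 cup walnuts\n")
        git(repo, "commit", "-am", "Add walnuts for crunch after taste test")

        # Bring the change into main, then tidy up the finished branch.
        git(repo, "checkout", "main")
        git(repo, "merge", "--no-edit", "add-walnuts")
        git(repo, "branch", "-d", "add-walnuts")  # only works once merged

        listing = git(repo, "for-each-ref",
                      "--format=%(refname:short)", "refs/heads")
        main_text = recipe.read_text()
    return listing.split(), main_text


if __name__ == "__main__":
    outcome = merge_and_delete_demo()
    if outcome is None:
        print("git was not found on this machine")
    else:
        print("branches left:", outcome[0])
        print("recipe on main:", outcome[1])
```

Deleting the branch removes only its label. The commits it pointed to are now part of main’s history, so nothing is lost.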