Using the CHMI Linux server
👩🏽‍💻


About our server

hardware

The CHMI server, nicknamed ‘Acadia’, is a powerful Linux workstation running Ubuntu 23.10. It has 2 TB of RAM and two 26-core Intel Xeon 5320 'Ice Lake-SP' 2.2 GHz CPUs (52 CPU cores, or 104 threads, in total). The machine is also equipped with two NVIDIA RTX 6000 GPUs with the ‘Ada Lovelace’ architecture, each with 48 GB of memory. You can read more about these GPUs here.

software

Software available on this machine changes from time to time, but you can always view a relatively up-to-date listing of informatics tools we have installed here. In addition to this software, we also host RStudio Server Pro on this machine, which allows anyone with an account on the server to run RStudio in their browser with full access to this machine's compute capabilities. We also have Geneious Prime installed and accessible remotely through a desktop sharing protocol.

drives and filesystem

Our Linux server is equipped with roughly 100 TB of storage. There are two main directories you will need to be familiar with:

  • /home - this is where everyone’s home directories live. Each user has a folder with their name in /home. This directory resides on a single 8 TB NVMe solid-state drive (SSD) with read/write speeds of > 3000 MB/s. Roughly 5 TB of this drive is devoted to /home. This drive is not backed up, so you should never use it to store important data. You may still want to store data on /home temporarily when fast read/write speeds are needed, for example during basecalling of Oxford Nanopore data.
  • /data - this is where you should store raw data. Each user has a folder with their name in /data. This directory lives on an array of six 20 TB drives configured as a single RAID5 volume, giving us ~90 TB of total storage. A RAID5 array can tolerate a single drive failure without any data loss. Despite its size and RAID format, this array is not designed for long-term data storage. For that, contact PennVet IT support to get access to the school’s archival storage server.
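
To see how much room is left before writing large files, something like the following can help. It is a minimal sketch using the standard df and du tools; the helper name report_usage is made up for illustration and is not installed on the server.

```shell
# report_usage is a hypothetical helper, not a tool that exists on the server.
# For each directory given, print the free/used space on the drive that holds it
# and the total size of the directory itself.
report_usage() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            df -h "$dir"    # free/used space on the drive holding $dir
            du -sh "$dir"   # total size of everything inside $dir
        else
            echo "skipping $dir (not found)"
        fi
    done
}
```

Assuming the folder layout described above, you would call it as report_usage /home/[USERNAME] /data/[USERNAME]. On a large folder the du step can take a while.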

General rules

Use of the server is free-of-charge to approved PennVet labs. However, to protect the machine from abuse and prevent issues with storage, we have a few rules all users must follow. Failure to adhere to these rules will result in a warning. Additional violations will result in suspension of your account. Here are the rules:

  • We will set up your account and install any software you may need. Please do not install software yourself unless you’ve cleared it with me (Dan Beiting) first. You can find a full list of all the software installed system-wide here.
  • In addition to your home account, which is located at /home/[USERNAME] on the server, you will also be given a data folder located at /data/[USERNAME]. All data must be stored in this directory, not in your /home/[USERNAME] folder.
  • Data storage space is limited, so we periodically monitor storage space and may request that data not currently in use be moved off the server.
  • Please be courteous and clean up after yourself. Delete files that aren't needed (e.g. .gz/.tar/.zip files that have been uncompressed).
  • We require a strong password for each account. We will set this password for you, and ask that you do not change it. Please do not share this password with anyone outside of your lab. If you need to send this password to a lab mate, please do so using a secure method. We use password pusher for this.
  • We are not responsible for backing up your raw data or analysis scripts and files. You should use the PennVet storage server for secure backup (contact PennVet IT for more info). See below for more information on how to transfer data from our Linux server to your account on the PennVet Storage server.
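
One way to act on the clean-up rule above is to list leftover archives before deleting anything. This sketch uses find with standard options; the function name list_archives and the choice of extensions are illustrative assumptions, so adjust them to your own data.

```shell
# list_archives is a hypothetical helper. It only LISTS candidate files;
# it never deletes anything.
list_archives() {
    # $1 = folder to search (searched recursively)
    find "$1" -type f \( -name '*.gz' -o -name '*.tar' -o -name '*.zip' \) | sort
}
```

For example, list_archives /data/[USERNAME] prints every .gz, .tar and .zip file under your data folder. Review the list by hand and remove only the archives you have already uncompressed and no longer need.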

Connect to the CHMI Linux server via SSH

📌
Accounts on our Linux server will only be granted to labs at PennVet. Contact Dan Beiting if you're at PennVet and would like to request an account.
📌
The steps outlined below require you to work in your terminal, bash, or command prompt program (exact name depends on your operating system). We have found that the terminal available through RStudio provides the most stable connection with few, if any, broken connections.

In Sept 2020, UPenn began prohibiting all SSH connections through its firewall. As a result, you will need to first use UPenn's GlobalProtect VPN client. Once connected to the VPN (requires your PennKey and password), then you will be able to use SSH to connect to our server (requires your server username and password). Here's how you can connect to the VPN:

  • Open your browser and navigate to UPenn's VPN client. You'll be prompted to download GlobalProtect for your computer (you'll only need to do this once).
  • Once installed and launched, you should see a small globe icon in your toolbar. Click the globe icon, paste vpn.upenn.edu and click 'connect'.
  • You'll be directed back to your browser to authenticate with your PennKey and password. Once connected to the VPN, you can SSH to our server as indicated below.
# connect by entering the line below in your terminal or bash program.  Replace 'username' with your username
ssh username@130.91.254.180

Transfer files to/from our Linux server via SFTP

As with SSH above, this will only work if you are connected to UPenn's VPN. There are many ways to transfer files between your local computer and our Linux server. We recommend using an SFTP (SSH File Transfer Protocol) client, and find that the FileZilla Client is a good choice because it is free and easy to use.

image
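
If you prefer the command line to a graphical client, scp (which ships with OpenSSH) can move files over the same SSH connection. The sketch below only assembles and prints the commands so you can check them before running them yourself. The helper names upload_cmd and download_cmd and the example file names are made up for illustration, and the destination assumes your data folder is /data/[USERNAME].

```shell
SERVER="130.91.254.180"   # server address used for SSH on this page

# Print (do not run) the scp command that copies a local file or folder
# into your data folder on the server.
upload_cmd() {
    # $1 = your server username, $2 = local path
    echo "scp -r $2 $1@${SERVER}:/data/$1/"
}

# Print (do not run) the scp command that copies a file or folder from
# the server into your current local folder.
download_cmd() {
    # $1 = your server username, $2 = path on the server
    echo "scp -r $1@${SERVER}:$2 ."
}

# Hypothetical username and file names, for illustration only.
upload_cmd username reads.fastq.gz
download_cmd username /data/username/results.csv
```

Copy a printed command into your local terminal (while connected to the VPN) to run the transfer. You will be asked for your server password.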

Connecting to the PennVet Storage Server from our Linux server

Our Linux server is for computing, not for storage. It is the responsibility of each user (that's you!) to make sure you have your data securely backed-up.

We strongly recommend that you use the PennVet Storage Server, which provides archival tape-based backup of data for PennVet labs. If your lab does not have an account on the storage server, please contact your PI and have them work with PennVet IT support to get this set up. Once it is set up, you can easily connect to the PennVet Storage Server from our Linux server.

To move data from our Linux server to the storage server, you will either need to be sitting at the machine itself or you will need to contact Dan Beiting to gain access to it via our remote desktop connection service. It is also possible to connect via the command line.

Once connected to (or sitting in front of) our Linux server, follow the 4 steps shown in the screenshot below to connect to smb://fileserver.vet.upenn.edu/groups$/

image

Once you've connected, you'll be asked to enter your credentials and click connect again. See screenshot below:

image

If you entered the correct credentials, you will see the PennVet Storage Server connection listed on the left-hand side of your file browser. Click on this and you'll see your lab folder(s) displayed (screenshot below). You should now be able to drag files from our Linux server to your lab's storage! Transfer time can be lengthy and will depend on how much data you're trying to move.

image

Understanding Linux permissions

If you're trying to do something to a file or folder on the Linux server (e.g., delete, move, edit, create) and are unable to, chances are you need to modify the permissions (chmod), the owner (chown) or the group (chgrp). See below and read more about permissions here. You can also use this handy web tool for translating your permissions settings to the correct ‘octals’.

image
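
As a concrete illustration of how the symbolic and octal forms relate, the sketch below works on a throwaway file made with mktemp, so nothing on the server is touched. It assumes GNU stat (the version that ships with Ubuntu) for the -c format option; the variable name demo_file is arbitrary.

```shell
# Create a scratch file so the example is safe to run anywhere.
demo_file=$(mktemp)

# Octal form: 6 = read+write for the owner, 4 = read-only for the group,
# 0 = no access for everyone else.
chmod 640 "$demo_file"
stat -c '%a %A' "$demo_file"   # prints: 640 -rw-r-----

# Symbolic form: add execute permission for the owner (u) only.
chmod u+x "$demo_file"
stat -c '%a %A' "$demo_file"   # prints: 740 -rwxr-----

# Clean up the scratch file.
rm "$demo_file"
```

Each octal digit is the sum of read (4), write (2) and execute (1), given in the order owner, group, others.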

Helpful command line tips

If you’re new to the command line in Linux, there are lots of online resources for learning, but here are a few of the commands that will help you move around and carry out basic tasks. Note that many of these may only work when run with sudo.

Also, check out this GitHub page for a really awesome list of modern Unix tools (each tool requires download and installation if you're not working on our Linux server). In particular, dust, lsd and broot are all awesome.

Common bash commands

typing this at the prompt
does this
Ctrl + R
allows you to search your bash history for previous commands…just start typing
tar -xvzf [fileName.tar.gz]
decompresses and extracts a .tar.gz archive
standard would be ls and ls -l, but lsd and lsd -l is way better 🙂
lists all files and folders in your working directory with info on permissions
standard would be du -a -h --max-depth=1 | sort -hr, but dust is way better 🙂
lists all files and folders in your working directory sorted by size
standard would be du -sh *, but dust is way better 🙂
simpler version of the command above. lists all files in a folder and shows their file size
standard would be df -h, but duf is way better 🙂
view free/used disk space by drive
lsd -l | wc -l
counts ALL files in a directory
lsd -l | grep -c '\.sam$'
counts ONLY files in a directory that are of a certain type (in this example, files whose names end in .sam)
lsd -l -t | head -n1
shows only the most recently modified file or directory in your current working directory
ls -X
group files in directory by file type (extension)
standard would be tree (or tree -d to show folders only), but broot is way better 🙂
lists all files and folders in your working directory as a tree structure
lsblk
lists drives and partitions with their sizes and mount points (use df -h or duf to see used/free space)
pressing up arrow
recalls previous command
cd /
takes you to the root directory
cd ~
takes you to your home directory
cd ..
takes you up one level in your file directory
cd ../..
takes you up two levels in your file directory
chmod u+x [fileName]
edits permissions on a file (in this example, makes the file executable by its owner)
chown [yourUserName] [fileName]
makes you the owner of a file
chgrp [yourUserName] [fileName]
changes the group assigned to the file (in this example, to the group that shares your username)
rm -rf [directoryName]
removes a folder and all of its contents, permanently and without asking for confirmation
wget [URLtoFile]
downloads a file from a website
defaults write com.apple.finder AppleShowAllFiles YES
show all hidden files in the finder (Mac only)
pip freeze
lists all the Python packages (and their versions) installed in the current Python environment
sudo nano /etc/profile
opens up the system profile where new program paths can be added to the system PATH
export PATH="/path/to/your/software/:$PATH"
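adds the folder containing a new piece of software to your PATH so it is executable from anywhere. Typed at the prompt, this lasts only for the current session; add the line to /etc/profile or your ~/.bash_profile to make it permanent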
alias something="something else"
add lines like this to your ~/.bash_profile to create a keyboard shortcut, in this case typing 'something' actually does 'something else'
progress or watch progress
displays the progress of file manipulation jobs (e.g. from cp, mv, etc)
whereis [commandName]
locates the binary, source, and manual page files for a command
micro [FILENAME]
Micro is a text editor that runs right in the terminal. Call ‘micro’ followed by the name of any text file to open and edit that file. Micro is particularly nice because it has some of the ease of use you are probably already familiar with from stand-alone text editors. User-specific customization of micro’s settings can be done by modifying the config file that lives at /home/[USERNAME]/.config/micro/settings.json
Ctrl + R
Lets you search your bash history to quickly recall commands
htop
monitor CPU and RAM usage for all jobs being run on the server
nvidia-smi
monitor GPU usage
column -s, -t < somefile.csv | less -#2 -N -S
the column program lets you view csv and tsv files with proper display directly in the terminal
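
Counting files by piping a long listing through grep can over-count when the pattern also matches another part of the line, such as a file called sample.txt. A minimal alternative using find is sketched below; count_ext is a hypothetical helper name, not a command installed on the server.

```shell
# count_ext is a hypothetical helper.
# Counts regular files directly inside a folder (not its subfolders)
# whose names end in the given extension.
count_ext() {
    # $1 = folder to look in, $2 = extension without the leading dot
    find "$1" -maxdepth 1 -type f -name "*.$2" | wc -l
}
```

For example, count_ext . sam counts the .sam files in your current working directory. Dropping -maxdepth 1 would make the count include subfolders as well.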