Getting Started

Onboarding for new lab members

You will be spending a lot of time in the terminal, which is why we recommend ghostty. ghostty is a newer, more customizable terminal emulator available for Linux and macOS. Here is how to get started!

The TERM environment variable is crucial for telling command-line interface (CLI) programs the capabilities of your terminal.

To configure this variable, add the following to your ~/.ssh/config file (discussed in detail later):

~/.ssh/config
SetEnv TERM=xterm-256color

ghostty is extremely configurable. However, none of this configuration is required, so feel free to skip to the next topic.

To start configuring ghostty, create the config file:

Terminal window
mkdir -p ~/.config/ghostty
vim ~/.config/ghostty/config

If you prefer nano:

Terminal window
nano ~/.config/ghostty/config

Here is an example config to get started:

```python title="~/.config/ghostty/config"
shell-integration-features = ssh-terminfo,ssh-env
theme = Monokai Remastered
font-size = 18
background-opacity = 0.5
macos-titlebar-style = hidden
window-padding-y = 8,8
window-padding-x = 8,8
window-padding-balance = true
```

A note on shell-integration-features: Most of the settings above are mainly for looks, but shell-integration-features is important. On the first SSH connection to a host, this setting makes ghostty try to copy its terminfo entry onto that host, so that the native TERM value (in this case xterm-ghostty) works correctly there. If the copy fails, ghostty falls back to xterm-256color. All of this happens behind the scenes, but it is useful to know in case TERM becomes misconfigured at a later point.
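
If TERM ever seems misconfigured on a server, one quick check is to ask the terminfo database whether it knows the current value. This is only a sketch: it assumes the `infocmp` utility (part of ncurses) is installed on the host, and `term_is_known` is a hypothetical helper name, not something ghostty provides.

```shell
# Succeeds only if the terminfo database has an entry for the given name.
# Assumes infocmp (ncurses) is installed; if it is missing, the check fails.
term_is_known() {
    infocmp "$1" >/dev/null 2>&1
}

if term_is_known "${TERM:-}"; then
    echo "terminfo entry found for TERM=${TERM:-}"
else
    echo "no terminfo entry for TERM='${TERM:-}'; try: export TERM=xterm-256color"
fi
```

Run this on the remote host after connecting. If no entry is found, exporting the fallback value for the current session is usually enough to get CLI programs working again.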

Refer to the ghostty documentation to see all available configuration options.

Refer to the UMass Chan IT Department’s instructions on VPN setup here (you will be prompted to log in to your umassmed.edu Microsoft account).

Linux-specific VPN setup instructions: You can find all the available package distributions for the VPN client here.

If you are on Windows, you will have to install WSL to launch a Linux VM from which you can SSH into the ZLab servers.

Basic WSL Commands

From PowerShell, start your Linux virtual machine (VM) by running:

Terminal window
wsl.exe -d ubuntu

The secure shell protocol (SSH) is how you will connect to the servers. However it requires some configuration first!

Start by creating the ~/.ssh directory (including the sockets subdirectory used by the config below) and the ~/.ssh/config file by running:

Terminal window
mkdir -p ~/.ssh/sockets
touch ~/.ssh/config
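
OpenSSH can refuse to use a config file that other users are able to write to, so it may help to restrict permissions when creating these files. The following is a minimal sketch; `init_ssh_dir` is a hypothetical helper name, and the sockets subdirectory is included because the ControlPath setting later in this guide points to it.

```shell
# Hypothetical helper: create the SSH directory layout under a home directory
# and restrict access to the owner.
init_ssh_dir() {
    home_dir="$1"
    mkdir -p "${home_dir}/.ssh/sockets"   # sockets/ is where ControlPath stores its files
    touch "${home_dir}/.ssh/config"       # config must be a regular file, not a directory
    chmod 700 "${home_dir}/.ssh" "${home_dir}/.ssh/sockets"
    chmod 600 "${home_dir}/.ssh/config"
}

# Usage: init_ssh_dir "$HOME"
```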

Edit the config file in a text editor (replace EDITOR below with the editor of your choice):

On command-line text editors Learning to proficiently use a basic, no-frills command-line text editor can be immensely useful for making quick, small edits to files (such as now).

The text editors found on most Unix-like operating systems are:

  • nano
  • vim
  • emacs

However, there are some modern choices as well:

  • neovim
  • helix
  • micro
Terminal window
EDITOR ~/.ssh/config

Add the following to the config file:

~/.ssh/config
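# ssh uses the first value it finds for each option, so settings in
# "Host *" take precedence over duplicates in the blocks below it.
Host *
    TCPKeepAlive yes
    ServerAliveCountMax 3
    ServerAliveInterval 30
    ForwardX11 yes
    ForwardX11Trusted yes
    # Reuse one connection per host; ~/.ssh/sockets must already exist
    ControlMaster auto
    ControlPath ~/.ssh/sockets/%r@%h:%p
    ControlPersist 60

# Internal servers are reached through the bastion host
Host z010 z011 z012 z013 z014
    ProxyCommand ssh -W %h:%p -l %r bastion.wenglab.org
    ForwardX11 yes
    ForwardAgent yes
    ForwardX11Timeout 7d
    ServerAliveCountMax 3
    ServerAliveInterval 15
    GSSAPIAuthentication yes
    # Replace USERNAME with your server username
    User USERNAME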
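
To confirm which values OpenSSH will actually use for a host, `ssh -G` prints the resolved configuration without opening a connection. The wrapper below is a small sketch that filters that output down to a few options; `show_ssh_options` is a hypothetical helper name, and the list of options is just an illustrative selection.

```shell
# Hypothetical helper: print a few resolved options for a host alias.
# "ssh -G" evaluates the Host blocks in ~/.ssh/config and exits without connecting.
show_ssh_options() {
    ssh -G "$1" | grep -i -E '^(hostname|user|proxycommand|serveraliveinterval|controlpath) '
}

# Usage: show_ssh_options z010
```

This is a convenient way to catch a leftover USERNAME placeholder or an option that is being set by an earlier Host block than the one you expected.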

To log in to the servers, open an SSH connection by running:

Terminal window
ssh USERNAME@bastion.wenglab.org

Storage partitioning layout
%%{init: {'theme': 'dark', 'themeVariables': { 'fontFamily': 'arial', 'textColor': '#e0e0e0', 'lineColor': '#e0e0e0' }}}%%
graph TD
    root["/"] --> tank["/tank"]
    root --> zata["/zata"]
    root --> data["/data"]
    
    tank --> tank_home["/tank/home"]
    tank_home --> regular["Regular Users<br/>(directories in tank/home)"]
    tank_home --> datasets["ZFS Datasets<br/>(mounted to /home/username)"]
    
    zata --> zata_data["data (4.1P Ceph)"]
    zata_data --> zlab["zlab/"]
    data -.->|bind mount| zlab
    
    zata --> zippy["zippy (259T Ceph)"]
    zata --> public["public_html (450T Ceph)"]

    classDef default stroke:#e0e0e0
    classDef ceph stroke:#ff88ff,color:#ff88ff
    
    class zata,zata_data,zlab,zippy,public ceph

If it’s your first time logging in, you will be prompted to scan a QR code to set up a one-time password (OTP). Register the OTP with a 2FA application (e.g. Microsoft Authenticator, Google Authenticator, Authy, or Ente Auth).

Afterwards, you should be prompted to change your password.

After logging in, you will be on the bastion server. This is a security feature: the internal servers are reachable only through the bastion, not directly from outside.

Once you’ve authenticated with the bastion server, you can SSH into any of the ZLab servers by running:

Terminal window
ssh HOSTNAME

You can find all the software tutorials here. For now, we’ll go through some need-to-know software to get you up and running.

Virtual Environment

Typically, a collection of dependencies (possibly language-specific) that lets the application or program of interest run in isolation from global, system dependencies. conda is widely used in bioinformatics as the de facto virtual environment manager and dependency resolver.

Containers

A different technology altogether, ‘containerization’ isolates the application or program of interest in a virtual process. This method usually offers a higher level of abstraction/isolation, in which each virtual process can have its own space, file system, network space, etc. Two widely-used programs for containerization are docker and singularity.

To install Conda, run the following:

Terminal window
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh -p /zata/zippy/$(whoami)/Miniforge3
source ~/.bashrc
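
The install command relies on command substitution to pick the right installer for the machine. As a quick illustration of what those substitutions resolve to (the variable name `installer` is only for this example):

```shell
# Show what the command substitutions in the install command resolve to.
installer="Miniforge3-$(uname)-$(uname -m).sh"
echo "Installer file: ${installer}"   # e.g. Miniforge3-Linux-x86_64.sh on a 64-bit Intel/AMD Linux host
```

After the installer finishes and you have reloaded your shell, running `conda --version` should print a version number if the installation was picked up.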

Conda too slow? While Conda has come a long way in recent years, it can still be quite slow. This is especially the case for large, complex environment definitions, where resolving package dependencies can take minutes.

In these situations, you may want to use mamba, which is a C++ reimplementation of the core parts of the conda package manager. Put simply, it is faster than conda while maintaining compatibility with it.

There is also a standalone version of mamba you can install called micromamba. Refer to the installation instructions here.

Now create your first conda environment:

Terminal window
mamba create -n myenv jupyterlab numpy pandas matplotlib bedtools -c bioconda
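
For environments you want to recreate later or share with others, the same package list can be kept in an environment file instead of a one-off command. The sketch below writes a file that is roughly equivalent to the command above; `write_env_file` is a hypothetical helper, and listing conda-forge before bioconda is an assumption based on Miniforge's default channel.

```shell
# Hypothetical helper: write an environment definition to the given path.
write_env_file() {
    cat > "$1" <<'EOF'
name: myenv
channels:
  - conda-forge
  - bioconda
dependencies:
  - jupyterlab
  - numpy
  - pandas
  - matplotlib
  - bedtools
EOF
}

# Usage:
#   write_env_file environment.yml
#   mamba env create -f environment.yml
```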

The remote docker image clarity001/bioinformatics:latest contains a full suite of bioinformatics software that you will most likely need.

Build the singularity image file (.sif):

Terminal window
singularity build /zata/zippy/$(whoami)/bin/bioinformatics.sif docker://clarity001/bioinformatics:latest

To start an interactive shell in the container (optional):

Terminal window
singularity shell -B /data,/zata /zata/zippy/$(whoami)/bin/bioinformatics.sif
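
For one-off commands, `singularity exec` runs a single program inside the image without starting an interactive shell. A minimal wrapper might look like the sketch below; `in_container` and the `BIOINF_SIF` override variable are hypothetical names, and the example assumes bedtools is included in the image.

```shell
# Hypothetical wrapper: run one command inside the container image.
# BIOINF_SIF is an assumed override; the default matches the build path used above.
in_container() {
    sif="${BIOINF_SIF:-/zata/zippy/$(whoami)/bin/bioinformatics.sif}"
    singularity exec -B /data,/zata "$sif" "$@"
}

# Usage: in_container bedtools --version
```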

To set up rootless Docker, run:

Terminal window
mkdir -p ~/.config/docker/
echo '{"data-root":"/rootless/docker/'$(whoami)'/docker"}' > ~/.config/docker/daemon.json
dockerd-rootless-setuptool.sh install
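
The quoting in the echo command above is easy to get wrong when typing it by hand. As an illustration, the same JSON can be produced with printf, which keeps the username separate from the template; `docker_daemon_json` is a hypothetical helper name and `alice` is a placeholder username.

```shell
# Hypothetical helper: print the daemon.json content for a given username.
docker_daemon_json() {
    printf '{"data-root":"/rootless/docker/%s/docker"}\n' "$1"
}

docker_daemon_json alice   # prints {"data-root":"/rootless/docker/alice/docker"}

# Usage: docker_daemon_json "$(whoami)" > ~/.config/docker/daemon.json
```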

An integrated development environment (IDE) is where most of your code editing and other computational bioinformatics work will take place. There are a couple of choices here, so let’s go over the process to set them up.

JupyterLab is a web-based IDE specifically designed for development in notebooks. The definition of a “notebook”, as per Project Jupyter, is:

A [notebook is a] shareable document that combines computer code, plain language descriptions, data, rich visualizations like 3D models, charts, graphs and figures, and interactive controls. A notebook, along with an editor (like JupyterLab), provides a fast interactive environment for prototyping and explaining code, exploring and visualizing data, and sharing ideas with others.

  1. Start the JupyterLab server from the Singularity container:

    Terminal window
    singularity exec -B /data,/zata /zata/zippy/$(whoami)/bin/bioinformatics.sif jupyter lab --port=8888 --ip=HOSTNAME --no-browser --notebook-dir=/data/GROUP/$(whoami)
  2. Alternatively, start it from your conda environment:

    Terminal window
    conda activate myenv
    jupyter-lab --port=8888 --ip=HOSTNAME --no-browser
  3. To access the JupyterLab server on your client device, set up an SSH port forward by running:

    Terminal window
    ssh -N -L 8888:HOSTNAME:8888 USERNAME@HOSTNAME
    Description of SSH command

    You can read up on SSH port forwarding here.

    | Command | Description |
    | --- | --- |
    | ssh | The secure shell program that creates encrypted connections |
    | -N | Flag that means “don’t execute a remote command/shell” - just forward ports |
    | -L | Flag for “local port forwarding” |
    | 8888: | The local port on your computer |
    | HOSTNAME: | The destination server’s address (check your ssh config for available hosts) |
    | 8888 | The remote port on the destination server |
    | USERNAME@HOSTNAME | Username and server address to log in to |
  4. Go to http://127.0.0.1:8888/lab in your browser.
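
The pieces of the port-forward command in step 3 can be made explicit with a small function that assembles it from a host, a user, and a port. This is only a sketch: `jupyter_tunnel_cmd` is a hypothetical helper, and it assumes the same port number is used on both ends, as in the example above.

```shell
# Hypothetical helper: print the port-forward command for a host, user, and port.
# The same port number is used locally and on the server.
jupyter_tunnel_cmd() {
    host="$1"
    user="$2"
    port="${3:-8888}"   # defaults to 8888 when no port is given
    echo "ssh -N -L ${port}:${host}:${port} ${user}@${host}"
}

jupyter_tunnel_cmd z010 USERNAME 8889   # prints: ssh -N -L 8889:z010:8889 USERNAME@z010
```

If port 8888 is already in use on the server, pick another port, pass it to `--port` when starting JupyterLab, and use the same number in the forward and in the browser URL.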

Compared to JupyterLab, which caters mainly to notebook, Python, Bash, and R users, VSCode is an industry-standard IDE for all code-editing needs. Here is how you can set up a remote VSCode server on the servers.

  1. To download the Dockerfile, run:

    Terminal window
    mkdir -p ~/.config/code-server/config
    git clone https://github.com/christian728504/code-server.git
    cd code-server
  2. The Dockerfile in code-server is meant to be a blank canvas. If you require additional dependencies, add them to the Dockerfile and rebuild/restart the container. If you would like to upstream these changes, make a pull request to christian728504/bioinformatics, which contains the Dockerfile for the base image.

  3. Now run docker compose in the directory with docker-compose.yml.

    Terminal window
    docker compose up -d
  4. If the Docker container started successfully, run docker compose logs code-server. You are looking for output like this:

    code-server | *
    code-server | * Visual Studio Code Server
    code-server | *
    code-server | * By using the software, you agree to
    code-server | * the Visual Studio Code Server License Terms (https://aka.ms/vscode-server-license) and
    code-server | * the Microsoft Privacy Statement (https://privacy.microsoft.com/en-US/privacystatement).
    code-server | *
    code-server | [2024-12-31 22:33:18] info Using GitHub for authentication, run `code tunnel user login --provider <provider>` option to change this.
    code-server | To grant access to the server, please log into https://github.com/login/device and use code XXXX-XXXX

    Take note of:

    1. The GitHub link https://github.com/login/device
    2. and the code XXXX-XXXX (placeholder)

    Visit the device login page, where you should be prompted to authenticate with the 8-character code. Then go to the VSCode IDE on your client machine, open the command palette with CMD + SHIFT + P (macOS), and type Remote-Tunnels: Connect to Tunnel. Select the GitHub authentication option. Wait a bit, and you should see one remote resource “online.” Once you’ve added the remote connection and opened a remote directory, you should be all set!


  5. After you close VSCode, the tunnel will automatically close. However, the server will still be running on the remote machine. To reconnect the tunnel, you will need to SSH back into the server from which you started the remote container.

  6. If you want to make use of Jupyter notebooks, you’ll need to install the Jupyter extension (on both the server and the client device). After this, you should be able to select from the available kernels (Python, Bash, R).
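
For step 4, the device code can be pulled out of the logs rather than read by eye. The filter below is a rough sketch: `extract_device_code` is a hypothetical helper, and the pattern assumes the code always has the XXXX-XXXX shape of uppercase letters and digits shown in the sample output.

```shell
# Hypothetical helper: read log lines on stdin and print any device code found.
# Assumes the code looks like XXXX-XXXX (uppercase letters and digits).
extract_device_code() {
    sed -n 's/.*use code \([A-Z0-9]\{4\}-[A-Z0-9]\{4\}\).*/\1/p'
}

# Usage: docker compose logs code-server | extract_device_code
```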