Getting Started

Onboarding for new lab members

Please be aware of the following placeholder terms:

Placeholder	Meaning
`USERNAME`	your ZLab username
`GROUP`	`zusers` if you are in ZLab; `musers` if you are in Moore Lab; `rusers` if you are a rotation student in either group
`HOSTNAME`	hostname of server
`EDITOR`	text editor

This page is meant to be read in sequential order.

In addition to this page, you may want to also check out the onboarding document.

Remote Access

Terminal Emulator

You will be spending a lot of time in the terminal, which is why we recommend ghostty, a newer, more customizable terminal emulator available for Linux and macOS. Here is how to get started.

`TERM`

The TERM environment variable is crucial for telling command-line interface (CLI) programs the capabilities of your terminal.

To configure this variable, add the following to your ~/.ssh/config file (discussed in detail later):

SetEnv TERM=xterm-256color
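
If CLI programs render oddly after you connect, it can help to check whether the server has a terminfo entry for the value of TERM. The following is a minimal sketch, assuming the `infocmp` utility (part of ncurses) is installed; `term_status` is a hypothetical helper name, not something the lab provides.

```bash
# Report whether this machine has a terminfo entry for a terminal type.
# Falls back to the current $TERM when no argument is given.
term_status() {
    term_name="${1:-${TERM:-}}"
    if [ -n "$term_name" ] && infocmp "$term_name" >/dev/null 2>&1; then
        echo "known"
    else
        echo "unknown"
    fi
}

echo "TERM=${TERM:-unset} is $(term_status)"
```

If the result is "unknown", programs on the server cannot look up your terminal's capabilities, and a widely available value such as xterm-256color is the safer choice.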

Configuration

ghostty is extremely configurable. However, none of this is required, so feel free to skip to the next topic.

To configure ghostty, start by creating the config file:

mkdir -p ~/.config/ghostty
vim ~/.config/ghostty/config

If you prefer nano:

nano ~/.config/ghostty/config

Here is an example config to get started:

```python title="~/.config/ghostty/config"
shell-integration-features = ssh-terminfo,ssh-env
theme = Monokai Remastered
font-size = 18
background-opacity = 0.5
macos-titlebar-style = hidden
window-padding-y = 8,8
window-padding-x = 8,8
window-padding-balance = true
```

A note on shell-integration-features

All the other options here are mainly for looks. However, shell-integration-features is important! On the first SSH connection to a host, this setting makes ghostty try to copy its terminfo entry onto the host server, so that the native TERM value (in this case xterm-ghostty) works correctly there. If this fails, it falls back to xterm-256color. All of this happens behind the scenes, but it is useful to know in case TERM becomes improperly configured at a later point.

Refer to the ghostty documentation to see all available configuration options.

VPN Setup

Refer to the UMass Chan IT Department’s instructions on VPN setup here (you will be prompted to log in to your umassmed.edu Microsoft account).

Linux-specific VPN setup instructions

You can find all the available package distributions for the VPN client here.

WSL (Windows-only)

If you are on Windows, you will have to install WSL to launch a Linux VM from which you can SSH into the ZLab servers.

Basic WSL Commands

From PowerShell, start your Linux virtual machine (VM) by running:

wsl.exe -d ubuntu

SSH

The secure shell protocol (SSH) is how you will connect to the servers. However, it requires some configuration first!

Start by creating the ~/.ssh directory and the ~/.ssh/config file by running:

mkdir -p ~/.ssh
touch ~/.ssh/config

Edit the config file in a text editor:

On command-line text editors

Learning to proficiently use a basic, no-frills command-line text editor can be immensely useful for making quick, small edits to files (such as now).

The text editors found on most Unix-like operating systems are:

nano
vim
emacs

However, there are some modern choices as well:

neovim
helix
micro

EDITOR ~/.ssh/config

Add the following to the config file:

Host *
    TCPKeepAlive yes
    ServerAliveCountMax 3
    ServerAliveInterval 30
    ForwardX11 yes
    ForwardX11Trusted yes
    ControlMaster auto
    ControlPath ~/.ssh/sockets/%r@%h:%p
    ControlPersist 60

Host z010 z011 z012 z013 z014
    ProxyCommand ssh -W %h:%p -l %r bastion.wenglab.org
    ForwardX11 yes
    ForwardAgent yes
    ForwardX11Timeout 7d
    ServerAliveCountMax 3
    ServerAliveInterval 15
    GSSAPIAuthentication yes
    User USERNAME
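
To confirm that a host alias picks up the intended settings without opening a connection, OpenSSH can print the configuration it would apply via `ssh -G`. The following is a minimal sketch, assuming OpenSSH 6.8 or newer; `effective_ssh_settings` is a hypothetical helper, and the keys it filters for are only an illustrative selection.

```bash
# Print selected settings that ssh would apply to a host alias.
# $1: host alias (e.g. z010); $2: config file (defaults to ~/.ssh/config).
effective_ssh_settings() {
    ssh_host="$1"
    ssh_config="${2:-$HOME/.ssh/config}"
    ssh -G -F "$ssh_config" "$ssh_host" |
        grep -Ei '^(hostname|user|proxycommand|controlpath|serveraliveinterval) '
}

# Usage (nothing runs on the server; -G only prints the resolved config):
#   effective_ssh_settings z010
```

If the `user` line still shows your local account name instead of your ZLab username, the `Host` pattern most likely did not match the alias you typed.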

Make sure to create the ~/.ssh/sockets directory:

mkdir -p ~/.ssh/sockets

This is required because our config sets ControlMaster to auto, which enables SSH multiplexing in OpenSSH. This feature allows multiple SSH connections to share the same underlying network connection, which can speed up subsequent connections by avoiding repeated authentication.
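
OpenSSH is strict about permissions on the user config file, so it is worth creating these paths with owner-only access up front. The following sketch sets up the directory, the config file, and the sockets directory in one step; `init_ssh_dir` is a hypothetical helper name, and its optional argument exists only so the function can be tried against a scratch directory first.

```bash
# Create ~/.ssh, ~/.ssh/config and ~/.ssh/sockets with owner-only access.
# $1: base directory (defaults to $HOME).
init_ssh_dir() {
    ssh_base="${1:-$HOME}"
    mkdir -p "$ssh_base/.ssh/sockets"
    touch "$ssh_base/.ssh/config"
    chmod 700 "$ssh_base/.ssh" "$ssh_base/.ssh/sockets"
    chmod 600 "$ssh_base/.ssh/config"
}

# Usage:
#   init_ssh_dir              # operates on your real home directory
#   ssh -O check HOSTNAME     # later: is a shared master connection alive?
#   ssh -O exit HOSTNAME      # later: close the shared master connection
```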

To log in to the servers, open an SSH connection by running:

ssh USERNAME@bastion.wenglab.org

Storage partitioning layout

%%{init: {'theme': 'dark', 'themeVariables': { 'fontFamily': 'arial', 'textColor': '#e0e0e0', 'lineColor': '#e0e0e0' }}}%%
graph TD
    root["/"] --> tank["/tank"]
    root --> zata["/zata"]
    root --> data["/data"]
    
    tank --> tank_home["/tank/home"]
    tank_home --> regular["Regular Users<br/>(directories in tank/home)"]
    tank_home --> datasets["ZFS Datasets<br/>(mounted to /home/username)"]
    
    zata --> zata_data["data (4.1P Ceph)"]
    zata_data --> zlab["zlab/"]
    data -.->|bind mount| zlab
    
    zata --> zippy["zippy (259T Ceph)"]
    zata --> public["public_html (450T Ceph)"]

    classDef default stroke:#e0e0e0
    classDef ceph stroke:#ff88ff,color:#ff88ff
    
    class zata,zata_data,zlab,zippy,public ceph

If it’s your first time logging in, you will be prompted to scan a QR code to set up a one-time password (OTP). Register the OTP with a 2FA application (e.g. Microsoft Authenticator, Google Authenticator, Authy, or Ente Auth).

Afterwards, you should be prompted to change your password.

After logging in, you will be on the bastion server. This is a security measure: the internal servers are reached through the bastion rather than directly.

Once you’ve authenticated with the bastion server, you can SSH into any of the ZLab servers by running:

ssh HOSTNAME

Server Usage Guidelines

Software

You can find all the software tutorials here. For now we’ll go through some need-to-know software to get you up and running.

Environment management

Types

Virtual Environment

Typically, a collection of dependencies (possibly language-specific) that lets the application or program of interest run in isolation from global, system-wide dependencies. conda is widely used in bioinformatics as the de facto virtual environment manager and dependency resolver.

Containers

A different technology altogether, ‘containerization’ isolates the application or program of interest in a virtual process. This method usually offers a higher level of abstraction/isolation, in which each virtual process can have its own space, file system, network space, etc. Two widely-used programs for containerization are docker and singularity.

Conda

To install Conda, run the following:

curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh -p /zata/zippy/$(whoami)/Miniforge3
source ~/.bashrc
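
The `$(uname)` and `$(uname -m)` substitutions in the commands above select the installer that matches your operating system and CPU architecture. The following small sketch previews which file will be fetched before anything is downloaded; `miniforge_installer_name` is a hypothetical helper.

```bash
# Build the Miniforge installer file name for the current machine,
# following the Miniforge3-<OS>-<ARCH>.sh pattern used in the URL above.
miniforge_installer_name() {
    echo "Miniforge3-$(uname)-$(uname -m).sh"
}

echo "Installer to download: $(miniforge_installer_name)"
```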

Conda too slow?

While Conda has come a long way in recent years, it can still be quite slow. This is especially the case for large, complex environment definitions, where resolving package dependencies can take minutes.

In these situations, you may want to use mamba, which is a C++ reimplementation of the core parts of the conda package manager. Put simply, it is faster than conda while maintaining compatibility with it.

There is also a standalone version of mamba you can install called micromamba. Refer to the installation instructions here.

Now create your first conda environment:

mamba create -n myenv jupyterlab numpy pandas matplotlib bedtools -c bioconda
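
For anything beyond a quick experiment, recording the environment in a file makes it easier to recreate. The following sketch writes a definition with the same packages as the command above; listing `conda-forge` as a channel is an assumption (it is the Miniforge default), `environment.yml` is the conventional file name, and `write_env_file` is a hypothetical helper.

```bash
# Write an environment definition with the same packages as above.
# $1: output path (defaults to ./environment.yml).
write_env_file() {
    cat > "${1:-environment.yml}" <<'EOF'
name: myenv
channels:
  - conda-forge
  - bioconda
dependencies:
  - jupyterlab
  - numpy
  - pandas
  - matplotlib
  - bedtools
EOF
}

# Usage:
#   write_env_file
#   mamba env create -f environment.yml
```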

Singularity

Build the singularity image

The remote docker image clarity001/bioinformatics:latest contains a full suite of bioinformatics software that you will most likely need.

Build the singularity image file (.sif):

singularity build /zata/zippy/$(whoami)/bin/bioinformatics.sif docker://clarity001/bioinformatics:latest

To start an interactive shell in the container (optional):

singularity shell -B /data,/zata /zata/zippy/$(whoami)/bin/bioinformatics.sif
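
For one-off commands you do not need an interactive shell: `singularity exec` runs a single program inside the image. The following is a sketch of a small wrapper; `in_container` and the `SIF` variable are hypothetical names, and the bind mounts mirror the command above.

```bash
# Run a command inside the bioinformatics image with /data and /zata bound.
# Set SIF beforehand to point at a different image file.
in_container() {
    singularity exec -B /data,/zata \
        "${SIF:-/zata/zippy/$(whoami)/bin/bioinformatics.sif}" "$@"
}

# Usage:
#   in_container jupyter --version
```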

Docker

mkdir -p ~/.config/docker/
echo '{"data-root":"/rootless/docker/'$(whoami)'/docker"}' > ~/.config/docker/daemon.json
dockerd-rootless-setuptool.sh install
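
The quoting in the `echo` line above is easy to get wrong, because the command substitution has to sit outside the single quotes. The following sketch produces the same file content with `printf`, which keeps the template and the username separate; `docker_daemon_json` is a hypothetical helper.

```bash
# Print the daemon.json content for a given user.
# $1: username (defaults to the current user).
docker_daemon_json() {
    printf '{"data-root":"/rootless/docker/%s/docker"}\n' "${1:-$(whoami)}"
}

# Usage:
#   docker_daemon_json > ~/.config/docker/daemon.json
```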

IDE

An integrated development environment (IDE) is where most of your code editing and other computational bioinformatics work will take place. There are a couple of choices here, so let's go over how to set them up.

Jupyterlab

Jupyterlab is a web-based IDE specifically designed for development in notebooks. The definition of a “notebook”, as per Project Jupyter, is:

A [notebook is a] shareable document that combines computer code, plain language descriptions, data, rich visualizations like 3D models, charts, graphs and figures, and interactive controls. A notebook, along with an editor (like JupyterLab), provides a fast interactive environment for prototyping and explaining code, exploring and visualizing data, and sharing ideas with others.

Start JupyterLab with Singularity:

singularity exec -B /data,/zata /zata/zippy/$(whoami)/bin/bioinformatics.sif jupyter lab --port=8888 --ip=HOSTNAME --no-browser --notebook-dir=/data/GROUP/$(whoami)

Start Jupyterlab with Conda:
```
conda activate myenv
jupyter-lab --port=8888 --ip=HOSTNAME --no-browser
```

Connect to Jupyterlab

To access the Jupyterlab server on your client device, set up an SSH port forward by running:

ssh -N -L 8888:HOSTNAME:8888 USERNAME@HOSTNAME

Description of SSH command

You can read up on SSH port forwarding here.

Command	Description
`ssh`	The secure shell program that creates encrypted connections
`-N`	Flag that means “don’t execute a remote command/shell” - just forward ports
`-L`	Flag for “local port forwarding”
`8888:`	The local port on your computer
`HOSTNAME:`	The destination server’s address (check your ssh config for available hosts)
`8888`	The remote port on the destination server
`USERNAME@HOSTNAME`	Username and server address to login to
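
If port 8888 is already taken on your client device, only the local side of the forward needs to change. The following is a sketch of a wrapper that makes the three parts of `-L` explicit; `jupyter_tunnel` is a hypothetical helper, and it relies on the `User` entry in your SSH config instead of spelling out `USERNAME@`.

```bash
# Forward a local port to a JupyterLab server running on a ZLab host.
# $1: host alias; $2: local port (default 8888); $3: remote port (default 8888).
jupyter_tunnel() {
    tunnel_host="$1"
    local_port="${2:-8888}"
    remote_port="${3:-8888}"
    ssh -N -L "${local_port}:${tunnel_host}:${remote_port}" "$tunnel_host"
}

# Usage: forward local port 9999, then browse to http://127.0.0.1:9999/lab
#   jupyter_tunnel HOSTNAME 9999
```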

Open Jupyterlab
Go to http://127.0.0.1:8888/lab in your browser.

VSCode

Compared to Jupyterlab, which caters mainly to notebook, Python, Bash, and R users, VSCode is an industry-standard IDE for all code-editing needs. Here is how you can set up a remote VSCode server on the servers.

Clone the code-server repository
To download the Dockerfile, run:
```
mkdir -p ~/.config/code-server/config
git clone https://github.com/christian728504/code-server.git
cd code-server
```
Inspect the Dockerfile
The Dockerfile in code-server is meant to be a blank canvas. If you require additional dependencies, add them to the Dockerfile and rebuild/restart the container. Or if you would like to upstream these changes, make a pull request to christian728504/bioinformatics which contains the Dockerfile for the base image.
A Dockerfile is a set of instructions for building a Docker image, from which containers are started. For more info on Docker, please read our tutorial.
If you do install any packages during runtime, remember they are not persistent!
Run docker-compose
Now run docker compose in the directory with docker-compose.yml.
```
docker compose up -d
```
Authenticate with GitHub
Before proceeding with this step, make sure you have a GitHub account.
If the Docker container started successfully, run docker compose logs code-server. You are looking for output like this:
```
code-server  | *
code-server  | * Visual Studio Code Server
code-server  | *
code-server  | * By using the software, you agree to
code-server  | * the Visual Studio Code Server License Terms (https://aka.ms/vscode-server-license) and
code-server  | * the Microsoft Privacy Statement (https://privacy.microsoft.com/en-US/privacystatement).
code-server  | *
code-server  | [2024-12-31 22:33:18] info Using GitHub for authentication, run `code tunnel user login --provider <provider>` option to change this.
code-server  | To grant access to the server, please log into https://github.com/login/device and use code XXXX-XXXX
```
Take note of:
1. The github link https://github.com/login/device
2. and the code XXXX-XXXX (placeholder)
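
When the log is long, the device code can be pulled out with a filter. The following is a sketch, assuming the code keeps the `XXXX-XXXX` shape shown above (two groups of four uppercase letters or digits); `extract_device_code` is a hypothetical helper.

```bash
# Read log text on stdin and print any device code of the form XXXX-XXXX
# that follows the words "use code".
extract_device_code() {
    sed -n 's/.*use code \([A-Z0-9]\{4\}-[A-Z0-9]\{4\}\).*/\1/p'
}

# Usage:
#   docker compose logs code-server | extract_device_code
```
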
Visit the device login page, where you should be prompted to authenticate with the 8-character code. Then go to the VSCode IDE on your client machine, open the command palette with CMD + SHIFT + P (macOS), and type Remote-Tunnels: Connect to Tunnel. Select the GitHub authentication option. Wait a bit, and you should see one remote resource “online.” Once you’ve added the remote connection and opened a remote directory, you should be all set!
Reconnect to the tunnel
After you close VSCode, the tunnel will automatically close. However, the server will still be running on the remote machine. To reconnect the tunnel, you will need to SSH back into the server from which you started the remote container.
Usage of Jupyterlab
If you want to make use of jupyter notebooks, you’ll need to install the Jupyter extension (on both server and client device). After this you should be able to select from the available kernels (Python, Bash, R).
