Getting Started
Onboarding for new lab members
Remote Access
Terminal Emulator
You will be spending a lot of time in the terminal, which is why we recommend ghostty: a newer, highly customizable terminal emulator available for Linux and macOS. Here is how to get started.
The TERM environment variable is crucial for telling command-line interface (CLI) programs the capabilities of your terminal.
To configure this variable, add the following to your ~/.ssh/config file (discussed in detail later):
```text
SetEnv TERM=xterm-256color
```
Configuration
ghostty is extremely configurable. However, none of this is required, so feel free to skip to the next topic.
To start configuring ghostty, create the config file:
```shell
mkdir -p ~/.config/ghostty
vim ~/.config/ghostty/config
```
If you prefer nano:
```shell
nano ~/.config/ghostty/config
```
Here is an example config to get started:
```text title="~/.config/ghostty/config"
shell-integration-features = ssh-terminfo,ssh-env
theme = Monokai Remastered
font-size = 18
background-opacity = 0.5
macos-titlebar-style = hidden
window-padding-y = 8,8
window-padding-x = 8,8
window-padding-balance = true
```
A note on shell-integration-features
Most of the settings above are purely cosmetic. However, shell-integration-features is important! Upon the first SSH connection to a host, this setting makes ghostty try to copy its terminfo entry onto the remote host. This ensures the native TERM value (in this case xterm-ghostty) works correctly there. If this fails, ghostty falls back to xterm-256color. All of this happens behind the scenes, but it is useful to know in case TERM becomes misconfigured at some later point.
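To make the fallback concrete, here is a minimal sketch of the same decision done by hand on a remote host. pick_term is a hypothetical helper (it is not part of ghostty), and it assumes the infocmp tool from ncurses is available, which exits with an error when no terminfo entry exists for a name.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of ghostty: print the TERM value a host can
# handle, preferring the given terminal type and otherwise falling back to
# xterm-256color.
pick_term() {
  local preferred="${1:-xterm-ghostty}"
  # infocmp exits non-zero when it finds no terminfo entry for the name;
  # if infocmp itself is missing, "command not found" (status 127) also
  # triggers the fallback.
  if infocmp "$preferred" >/dev/null 2>&1; then
    printf '%s\n' "$preferred"
  else
    printf '%s\n' "xterm-256color"
  fi
}

echo "current TERM:   ${TERM:-unset}"
echo "suggested TERM: $(pick_term xterm-ghostty)"
```

If it suggests xterm-256color, running export TERM=xterm-256color fixes the current session.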
Refer to the ghostty documentation to see all available configuration options.
VPN Setup
Refer to the UMass Chan IT Department’s instructions on VPN setup here (you will be prompted to log in to your umassmed.edu Microsoft account).
Linux-specific VPN setup instructions
You can find all the available package distributions for the VPN client here.
WSL (Windows-only)
If you are on Windows, you will have to install WSL to launch a Linux VM from which you can SSH into the ZLab servers.
From PowerShell, start your Linux virtual machine (VM) by running:
```powershell
wsl.exe -d ubuntu
```
The secure shell protocol (SSH) is how you will connect to the servers. However, it requires some configuration first!
Start by creating the ~/.ssh directory and the ~/.ssh/config file by running:
```shell
mkdir -p ~/.ssh/sockets  # sockets/ is used by ControlPath in the config below
touch ~/.ssh/config && chmod 600 ~/.ssh/config
```
Edit the config file in a text editor:
On command-line text editors
Learning to proficiently use a basic, no-frills command-line text editor can be immensely useful for making quick, small edits to files (such as now).
The text editors found on most Unix-like operating systems are:
- nano
- vim
- emacs
However, there are some modern choices as well:
- neovim
- helix
- micro
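Whichever editor you choose, many command-line tools (git, for example) open the editor named in the EDITOR environment variable. Here is a minimal sketch that sets it persistently, assuming your shell is bash and reads ~/.bashrc at startup; set_default_editor is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "export EDITOR=<editor>" to a shell startup
# file (default ~/.bashrc), unless that file already exports EDITOR.
set_default_editor() {
  local editor="$1"
  local rc="${2:-$HOME/.bashrc}"
  touch "$rc"
  if ! grep -q '^export EDITOR=' "$rc"; then
    printf 'export EDITOR=%s\n' "$editor" >> "$rc"
  fi
}

# Usage: set_default_editor nano   (then open a new shell or run: source ~/.bashrc)
```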
```shell
EDITOR ~/.ssh/config  # replace EDITOR with nano, vim, etc.
```
Add the following to the config file:
```text title="~/.ssh/config"
# ssh uses the first value it finds for each option, so host-specific
# blocks must come before the catch-all "Host *" block.
Host z010 z011 z012 z013 z014
    ProxyCommand=ssh -W %h:%p -l %r bastion.wenglab.org
    ForwardX11 yes
    ForwardAgent yes
    ForwardX11Timeout 7d
    ServerAliveCountMax 3
    ServerAliveInterval 15
    GSSAPIAuthentication yes
    User USERNAME

Host *
    TCPKeepAlive = yes
    ServerAliveCountMax = 3
    ServerAliveInterval = 30
    ForwardX11 = yes
    ForwardX11Trusted = yes
    ControlMaster auto
    ControlPath ~/.ssh/sockets/%r@%h:%p
    ControlPersist 60
```
First-time Login
To log in to the servers, open an SSH connection by running:
```shell
ssh USERNAME@bastion.wenglab.org
```
Storage partitioning layout
```mermaid
%%{init: {'theme': 'dark', 'themeVariables': { 'fontFamily': 'arial', 'textColor': '#e0e0e0', 'lineColor': '#e0e0e0' }}}%%
graph TD
root["/"] --> tank["/tank"]
root --> zata["/zata"]
root --> data["/data"]
tank --> tank_home["/tank/home"]
tank_home --> regular["Regular Users<br/>(directories in tank/home)"]
tank_home --> datasets["ZFS Datasets<br/>(mounted to /home/username)"]
zata --> zata_data["data (4.1P Ceph)"]
zata_data --> zlab["zlab/"]
data -.->|bind mount| zlab
zata --> zippy["zippy (259T Ceph)"]
zata --> public["public_html (450T Ceph)"]
classDef default stroke:#e0e0e0
classDef ceph stroke:#ff88ff,color:#ff88ff
class zata,zata_data,zlab,zippy,public ceph
```
If it’s your first time logging in, you will be prompted to scan a QR code for one-time passwords (OTP). Register the OTP with a 2FA application (e.g. Microsoft Authenticator, Google Authenticator, Authy, or Ente Auth).
Afterwards, you should be prompted to change your password.
After logging in, you will be on the bastion server. This is a security measure: the internal servers are not directly reachable from outside, so every connection goes through the bastion.
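The ~/.ssh/config entries set up earlier route z010 through z014 via the bastion with ProxyCommand. To check which settings ssh will actually apply to one of these hosts without connecting, you can use ssh -G, which prints the resolved options. Keep in mind that ssh keeps the first value it finds for each option, so a host-specific block placed after Host * will not override it. A minimal sketch; show_ssh_settings is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the options ssh would use for HOST, read from
# CONFIG (default ~/.ssh/config). "ssh -G" evaluates the config and prints
# the result with lowercase option names, without opening a connection.
show_ssh_settings() {
  local host="$1"
  local config="${2:-$HOME/.ssh/config}"
  ssh -G -F "$config" "$host" \
    | grep -E '^(user|hostname|proxycommand|serveraliveinterval|forwardx11|controlpath) '
}

# Usage: show_ssh_settings z010
```

If serveraliveinterval shows 30 instead of 15 for z010, the Host * block is being read first.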
Once you’ve authenticated with the bastion server, you can SSH into any of the ZLab servers by running:
```shell
ssh HOSTNAME
```
Server Usage Guidelines
Software
You can find all the software tutorials here. For now, we’ll go through some need-to-know software to get you up and running.
Environment management
Virtual Environment
Typically, a collection of dependencies (possibly language-specific) that ensures the application or program of interest runs in isolation from global, system-wide dependencies. conda is widely used in bioinformatics as the de facto virtual environment manager and dependency resolver.
Containers
A different technology altogether, ‘containerization’ runs the application or program of interest as an isolated process. This method usually offers a higher level of abstraction and isolation: each container can have its own process space, file system, network stack, etc. Two widely used programs for containerization are docker and singularity.
To install Conda, run the following:
```shell
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh -p /zata/zippy/$(whoami)/Miniforge3
source ~/.bashrc
```
Conda too slow?
While Conda has come a long way in recent years, it can still be quite slow. This is especially the case for large, complex environment definitions, where resolving package dependencies can take minutes.
In these situations, you may want to use mamba, a C++ reimplementation of the core parts of the conda package manager. Put simply, it is faster than conda while remaining compatible with it. Miniforge, installed above, already includes mamba.
There is also a standalone version of mamba called micromamba. Refer to the installation instructions here.
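For larger setups, it can help to keep the environment definition in a file that you can version and share. Here is a minimal sketch that writes an environment.yml roughly equivalent to the mamba create command below. The file name and the write_env_file helper are assumptions, and the channel order follows Bioconda’s recommendation of listing conda-forge before bioconda.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a conda/mamba environment file to PATH
# (default ./environment.yml) listing the same packages as "myenv".
write_env_file() {
  local path="${1:-environment.yml}"
  cat > "$path" <<'EOF'
name: myenv
channels:
  - conda-forge
  - bioconda
dependencies:
  - jupyterlab
  - numpy
  - pandas
  - matplotlib
  - bedtools
EOF
}

# Usage:
#   write_env_file
#   mamba env create -f environment.yml
```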
Now create your first conda environment:
```shell
mamba create -n myenv jupyterlab numpy pandas matplotlib bedtools -c bioconda
```
Singularity
Build the singularity image
The remote docker image clarity001/bioinformatics:latest contains a full suite of bioinformatics software that you will most likely need.
Build the singularity image file (.sif):
```shell
mkdir -p /zata/zippy/$(whoami)/bin
singularity build /zata/zippy/$(whoami)/bin/bioinformatics.sif docker://clarity001/bioinformatics:latest
```
To start an interactive shell in the container (optional):
```shell
singularity shell -B /data,/zata /zata/zippy/$(whoami)/bin/bioinformatics.sif
```
Docker
To set up rootless Docker, with its data stored under /rootless/docker/USERNAME, run:
```shell
mkdir -p ~/.config/docker/
echo '{"data-root":"/rootless/docker/'$(whoami)'/docker"}' > ~/.config/docker/daemon.json
dockerd-rootless-setuptool.sh install
```
An integrated development environment (IDE) is where most of your code editing and other computational bioinformatics work will take place. There are a couple of choices here, so let’s go over how to set them up.
Jupyterlab
JupyterLab is a web-based IDE designed specifically for development in notebooks. Project Jupyter defines a “notebook” as:
A [notebook is a] shareable document that combines computer code, plain language descriptions, data, rich visualizations like 3D models, charts, graphs and figures, and interactive controls. A notebook, along with an editor (like JupyterLab), provides a fast interactive environment for prototyping and explaining code, exploring and visualizing data, and sharing ideas with others.
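Both commands below use port 8888. On a shared server, another lab member may already be using that port; JupyterLab will then usually move on to the next free port and print the one it chose. If you would rather pick a port yourself, here is a minimal sketch. find_free_port is a hypothetical helper that relies on bash’s /dev/tcp pseudo-device, and it only detects listeners reachable at the address you pass, so use the same HOSTNAME you give to --ip.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first port in [START, START+100] on which
# nothing accepts TCP connections at HOST. A successful connect through
# bash's /dev/tcp means the port is taken. This is a point-in-time check,
# so another process could still grab the port before you use it.
find_free_port() {
  local host="${1:-127.0.0.1}"
  local start="${2:-8888}"
  local last=$(( start + 100 ))
  local port="$start"
  while (( port <= last )); do
    if ! (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      echo "$port"
      return 0
    fi
    port=$(( port + 1 ))
  done
  return 1
}

# Usage: port=$(find_free_port HOSTNAME 8888), then pass --port="$port"
```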
Start JupyterLab with Singularity:
```shell
singularity exec -B /data,/zata /zata/zippy/$(whoami)/bin/bioinformatics.sif jupyter lab --port=8888 --ip=HOSTNAME --no-browser --notebook-dir=/data/GROUP/$(whoami)
```
Start JupyterLab with Conda:
```shell
conda activate myenv
jupyter-lab --port=8888 --ip=HOSTNAME --no-browser
```
Connect to JupyterLab
To access the JupyterLab server on your client device, set up an SSH port forward by running:
```shell
ssh -N -L 8888:HOSTNAME:8888 USERNAME@HOSTNAME
```
Description of SSH command
You can read up on SSH port forwarding here.
| Command | Description |
| --- | --- |
| ssh | The secure shell program that creates encrypted connections |
| -N | Flag meaning “don’t execute a remote command/shell”; just forward ports |
| -L | Flag for “local port forwarding” |
| 8888: | The local port on your computer |
| HOSTNAME: | The destination server’s address (check your SSH config for available hosts) |
| 8888 | The remote port on the destination server |
| USERNAME@HOSTNAME | Username and server address to log in to |

Open JupyterLab
Go to http://127.0.0.1:8888/lab in your browser. If JupyterLab asks for a token, copy it from the URL printed in the server’s output.
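If port 8888 is already in use on your own machine, or the server ended up on a different port, you can forward any free local port instead and adjust the browser URL to match. Here is a minimal sketch; jupyter_tunnel_cmd is a hypothetical helper that only prints the command, using the same placeholders as above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ssh port-forward command for a JupyterLab
# server. Arguments: HOST USER [REMOTE_PORT] [LOCAL_PORT]; REMOTE_PORT
# defaults to 8888 and LOCAL_PORT defaults to REMOTE_PORT.
jupyter_tunnel_cmd() {
  local host="$1" user="$2"
  local remote_port="${3:-8888}"
  local local_port="${4:-$remote_port}"
  printf 'ssh -N -L %s:%s:%s %s@%s\n' \
    "$local_port" "$host" "$remote_port" "$user" "$host"
}

# Example: forward local port 9999 to JupyterLab on port 8888 of z010,
# then browse to http://127.0.0.1:9999/lab
jupyter_tunnel_cmd z010 USERNAME 8888 9999
```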
VSCode
Compared to JupyterLab, which caters mainly to notebook users working in Python, Bash, and R, VSCode is an industry-standard IDE for all code-editing needs. Here is how you can set up a remote VSCode server on the lab servers.
Clone the code-server repository
To download the Dockerfile, run:
```shell
mkdir -p ~/.config/code-server/config
git clone https://github.com/christian728504/code-server.git
cd code-server
```
Inspect the Dockerfile
The Dockerfile in code-server is meant to be a blank canvas. If you require additional dependencies, add them to the Dockerfile and rebuild/restart the container. Alternatively, if you would like to upstream these changes, open a pull request to christian728504/bioinformatics, which contains the Dockerfile for the base image.
Run docker-compose
Now run docker compose in the directory containing docker-compose.yml:
```shell
docker compose up -d
```
Authenticate with GitHub
If the Docker container started successfully, run docker compose logs code-server. You are looking for output like this:
```text
code-server | *
code-server | * Visual Studio Code Server
code-server | *
code-server | * By using the software, you agree to
code-server | * the Visual Studio Code Server License Terms (https://aka.ms/vscode-server-license) and
code-server | * the Microsoft Privacy Statement (https://privacy.microsoft.com/en-US/privacystatement).
code-server | *
code-server | [2024-12-31 22:33:18] info Using GitHub for authentication, run `code tunnel user login --provider <provider>` option to change this.
code-server | To grant access to the server, please log into https://github.com/login/device and use code XXXX-XXXX
```
Take note of:
- The GitHub link https://github.com/login/device
- The code XXXX-XXXX (placeholder)
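If the log output is long, you can pull the code out with a filter instead of scanning it by eye. Here is a minimal sketch; extract_device_code is a hypothetical helper, and it assumes the log line ends with “use code” followed by a code in the XXXX-XXXX format shown in the sample output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read code-server log text on stdin and print the most
# recent GitHub device code (format XXXX-XXXX, uppercase letters/digits).
extract_device_code() {
  grep -oE 'use code [A-Z0-9]{4}-[A-Z0-9]{4}' | tail -n 1 | awk '{print $3}'
}

# Usage (in the code-server directory):
#   docker compose logs code-server | extract_device_code
```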
Visit the device login page, where you should be prompted to enter the 8-character code. Then go to VSCode on your client machine, open the command palette with CMD + SHIFT + P (macOS) or Ctrl + Shift + P (Windows/Linux), and type Remote-Tunnels: Connect to Tunnel. Select the GitHub authentication option. After a short wait, you should see one remote resource marked “online.” Once you’ve added the remote connection and opened a remote directory, you should be all set!
Reconnect to the tunnel
After you close VSCode, the tunnel will automatically close. However, the server will still be running on the remote machine. To reconnect the tunnel, you will need to SSH back into the server on which you started the remote container.
Usage of Jupyterlab
If you want to use Jupyter notebooks, you’ll need to install the Jupyter extension (on both the server and the client device). After this, you should be able to select from the available kernels (Python, Bash, R).