Introduction

This book is written for less experienced Computer Science students who are interested in learning more about tools that are commonly used in the industry. And, among those commonly used tools, two stand out because of the sheer scale of their adoption: Linux and git.

Because of their adoption, essentially every developer (unless they are working on a legacy code base) will use these technologies on a daily basis. For this reason, and because knowledge of Linux is required for upper level classes at the Washkewicz College of Engineering at CSU (like C Programming and Operating Systems), we, the Computer Peer Teachers (CPTs), have decided to offer a short introductory workshop on both of these topics. This is the accompanying book for that workshop.

During this workshop, we seek to answer the following questions for Linux:

And for git:

Workshop Recording

What is Linux?

Linux, or (more technically) GNU/Linux, is an open source operating system (or family of operating systems) created by Linus Torvalds in 1991. It has two main parts: the Linux kernel, which interacts with the hardware, manages memory and processes, and does many other low-level tasks, and the user-land programs, which run on top of the kernel. User-land programs are supported by a set of system libraries. The most popular set of system libraries is the GNU C Library (hence the name GNU/Linux).

More info: https://en.wikipedia.org/wiki/Linux

However, you need more than just the kernel and a set of system libraries to create a useful operating system setup. This is where Linux distributions come in.

What is a Linux distribution?

A Linux distribution (or "distro") is a collection of software that is bundled with the Linux kernel (and often the GNU C Library) in order to provide a more complete operating system experience. This software bundle almost always includes a package manager and a shell, and sometimes includes a graphical user interface (GUI). Distros are also characterized by how often they are updated. Some distros focus on stability, releasing twice a year, while others focus on the latest features, releasing every one or two months (with package managers that update packages as often as possible).

There are many different distributions for many different purposes, but some of the most popular ones are:

  • Ubuntu
    • One of the most popular Linux distributions
    • Focused on stability, with updates every 6 months
    • Based on Debian
  • Debian
    • Another very popular Linux distribution
    • Uses the Debian package manager (dpkg) and APT
  • Fedora
  • Mint
  • Arch
    • One of the most popular rolling-release Linux distributions
    • Uses the pacman package manager
    • For more advanced users, since it doesn't come with a default desktop environment
  • Manjaro
    • Based on Arch
  • Kali Linux
    • Based on Debian
    • Includes many cyber security tools
  • Tails
    • An ephemeral Linux distribution, used only on live boot USBs

More info: https://en.wikipedia.org/wiki/Linux_distribution

What is a Desktop Environment?

A desktop environment is a graphical user interface (GUI) that provides users with a way to interact with the operating system. It handles all graphical program windows, menus, and icons, along with the taskbar and other desktop widgets. A few popular desktop environments include:

  • GNOME
  • KDE
  • Xfce
  • MATE
  • Cinnamon

More info: https://en.wikipedia.org/wiki/Desktop_environment

Most Linux distributions meant for desktop use come with a pre-installed desktop environment. For example, Ubuntu comes with GNOME while Kubuntu, a spinoff of Ubuntu, comes with KDE. It is important to note that almost any Linux distribution can be used with any desktop environment. In fact, some distro websites (like https://fedoraproject.org/spins/) provide a download for each popular desktop environment. Arch is somewhat unique in that it doesn't come with a desktop environment at all. Instead, users need to install one after installing the distro itself.

Why should I care about Linux?

Because of Linux's quality and free licensing, it has become a popular choice for servers. In fact, "96.3% of the top one million web servers are running Linux" 1. That means that a strong majority of developers need to develop products that run on Linux. Because of this and because Linux is free and flexible, many developers use Linux as their desktop operating system as well 1.

Since there is no escaping Linux, we strongly encourage young developers to start learning it.

How to install Linux

Linux can be installed in a variety of ways, but it almost always starts with downloading the distribution's install ISO. Here are the links to a few common distros:

Installing on a Physical Machine

Once you have downloaded the ISO, use a USB flashing program (any of the below) to write the ISO to a USB drive.

Once the USB drive is written, boot the machine from the USB drive (using a BIOS or UEFI boot menu) and follow the instructions provided by the distro's installer.

Installing on a Virtual Machine

Setting up a virtual machine is a great way to play with and learn Linux without having to install it on a physical machine. We recommend using Oracle VirtualBox for this purpose since it's free and open source. You can download it here: https://www.virtualbox.org/wiki/Downloads.

To set up a Linux virtual machine (VM), follow this guide: https://ubuntu.com/tutorials/how-to-run-ubuntu-desktop-on-a-virtual-machine-using-virtualbox#1-overview. It's written for Ubuntu, but the steps for booting from the ISO image should be almost identical for other distros.

Using the Linux Terminal

The Linux command line is a text interface to your computer. Often referred to as the shell, terminal, console, prompt or various other names, it can give the appearance of being complex and confusing to use. However, the basics are actually quite simple and easy to learn.

Side note: If you are interested in learning more about the history of the terminal, read Section 1.2.10 for more information.

Accessing the Terminal Over SSH

Most of you are probably running Windows or macOS on your personal computer, so you will need to access a Linux terminal remotely using a Secure Shell (SSH) connection. On Windows, you can do this using PuTTY. Read Section 1.2.1 for more information.

On macOS (or Linux), you can create an SSH connection using the ssh command. For example, to connect to a remote server named example.com using the username johndoe and the password mypassword, you would run the following command:

ssh johndoe@example.com

You will be prompted to enter your password.

Accessing CSU's Linux Servers

If you are taking a CIS course this semester that requires Linux, you should have access to CSU's Linux servers. These servers have the following layout:

CSU Linux Servers

Your username for all of CSU's servers is <first two letters of your first name><up to the first six letters of your last name>. Your password (unless you have changed it) is <CSU ID><capital initial of your last name>.

If your name is John Doe and your CSU ID is 1234567, then you would have the following:

Username: jodoe

Password: 1234567D

To access spirit with the account above, use the following command:

ssh jodoe@spirit.eecs.csuohio.edu

Using Google Cloud Shell

You also have the option of using Google Cloud Shell, which is free. Simply go to https://shell.cloud.google.com/. After the shell is provisioned, you should see the environment below:

Google Cloud Shell

Your Linux shell should be available at the bottom of the page.

Running your First Command

To run your first command, click inside the terminal window to ensure it's active, then type the following in lowercase and press Enter:

pwd

This will display your current directory path (likely something like /home/YOUR_USERNAME), followed by the prompt text again.

The prompt indicates the terminal is ready for your next command. When you see references to "command prompt" or "command line," they simply mean the place where you type commands in the terminal.

When you run a command, any output will typically appear in the terminal. Some commands display a lot of text, while others may not show anything if they complete successfully. If a new prompt appears right away, the command likely succeeded.

The pwd command (print working directory) shows your current location in the file system. The working directory is where file operations take place by default unless specified otherwise. To check where you are, use pwd.

To change the working directory, use cd (change directory):

  • Move to the root directory:

    cd /
    pwd
    
  • Move to the "home" directory from root:

    cd home
    pwd
    
  • Go up one level to the parent directory:

    cd ..
    pwd
    
  • To return to your home directory (also represented by the ~ path):

    cd
    pwd
    

You can also move up multiple levels:

cd ../..
pwd

To go directly to the "etc" directory from your home directory:

cd ../../etc
pwd

Paths can be relative (depending on your current directory) or absolute (starting with /).

Most examples so far have used relative paths, meaning the location you navigate to depends on your current directory. For instance, moving to the "etc" directory works from the root:

cd /
cd etc

But if you're in your home directory and try cd etc, you'll get an error because the command is relative to your current location.

Absolute paths, however, work regardless of your current directory. These paths start with a /, indicating the root directory. For example:

cd /etc

This will always take you to the "etc" directory, no matter where you are. Similarly, running cd alone returns you to your home directory. Another absolute path shortcut is using ~, which refers to your home directory:

cd ~
cd ~/Desktop

To navigate directly to a specific folder, use an absolute path with /home/USERNAME/:

cd /home/USERNAME/Desktop

The prompt updates to reflect your current location in the file system, with ~ representing your home directory. Understanding absolute paths is key as you work with files and directories.
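
The difference between relative and absolute paths can be sketched with a throwaway directory. This is a minimal example, assuming a POSIX-style shell with mktemp available; the directory names projects and notes are made up for illustration.

```shell
# Create a throwaway directory tree (names are illustrative only).
sandbox=$(mktemp -d)
mkdir -p "$sandbox/projects/notes"

# Absolute path: works no matter what the current directory is.
cd "$sandbox/projects/notes"
start=$(pwd)

# Relative path: ".." is resolved against the current directory.
cd ..
parent=$(pwd)

# A relative path downwards takes us back to where we started.
cd notes
back=$(pwd)

echo "started in: $start"
echo "parent was: $parent"
echo "back in:    $back"
```

The first and last lines printed should show the same path, since both commands end up in the same directory by different routes.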

Creating and Opening Folders and Files

To safely experiment with files, let's create a directory away from your home folder:

mkdir /tmp/tutorial
cd /tmp/tutorial

This creates a new directory, "tutorial," inside /tmp using an absolute path. Now, let's create a few subdirectories:

mkdir dir1 dir2 dir3

This command creates multiple directories at once. If you'd like to create nested directories, use the -p option (short for "make all Parent directories"):

mkdir -p dir4/dir5/dir6

Here, -p ensures parent directories (dir4 and dir5) are created if they don't exist.

To create folders with spaces in the names, use quotes or a backslash to escape the space:

mkdir "folder 1"
mkdir folder\ 3

Avoid spaces in file names where possible by using underscores or hyphens for easier command-line use.
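
The same quoting rule applies when a name with a space is stored in a variable. The following is a small sketch, assuming a POSIX-style shell with mktemp; the name "my folder" is hypothetical.

```shell
# Work in a throwaway directory so nothing important is touched.
workdir=$(mktemp -d)
cd "$workdir"

name="my folder"

# Quoted: the shell passes a single argument, so one directory is created.
mkdir "$name"

# -p creates missing parents and does not complain if they already exist,
# so running the same command twice is safe.
mkdir -p "$name/a/b"
mkdir -p "$name/a/b"

# Count the entries directly inside the working directory.
count=$(ls | wc -l | tr -d ' ')
echo "entries: $count"
```

Without the quotes around "$name", mkdir would receive two arguments and create two directories, my and folder.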

Listing and Creating Files

Let's create some files and work with them. Start by listing the contents of your current directory:

ls

To capture the output of this command into a file, use redirection (>):

ls > output.txt

This creates a file called output.txt with the list of directory contents. To view the file:

cat output.txt

The echo command can also create files with content:

echo "This is a test" > test_1.txt
echo "This is a second test" > test_2.txt
echo "This is a third test" > test_3.txt

You can view their contents using cat. To combine multiple files:

cat test_1.txt test_2.txt test_3.txt > combined.txt
cat combined.txt

Wildcards simplify commands when file names follow patterns. The ? wildcard matches exactly one character and * matches any number of characters, so both of these commands print the same three files:

cat test_?.txt
cat test_*

If you want to append text to an existing file, use >>:

echo "Appending a line" >> combined.txt
cat combined.txt

To view long files one page at a time, use less:

less combined.txt

You can navigate using arrow keys and exit with q. This basic workflow helps in creating and managing files with content efficiently.
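
The difference between > and >> is easy to get wrong, so here is a minimal sketch of it, assuming a POSIX-style shell with mktemp. The file name log.txt is chosen only for illustration.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# ">" creates the file, or replaces its contents if it already exists.
echo "first" > log.txt
echo "second" > log.txt      # "first" is gone now

# ">>" adds to the end of the file instead of replacing it.
echo "third" >> log.txt

cat log.txt

# Reading the file through "<" makes wc print only the number.
lines=$(wc -l < log.txt | tr -d ' ')
echo "log.txt has $lines lines"
```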

Case Sensitivity

Unix systems are case-sensitive, meaning files like A.txt and a.txt are treated as entirely different. For example:

echo "Lower case" > a.txt
echo "Upper case" > A.TXT
echo "Mixed case" > A.txt

This creates three distinct files. It’s best to avoid file names that only differ by case to prevent confusion, especially when transferring files to case-insensitive systems like Windows. There, all three names would be treated as the same file, which could lead to data loss.

Rather than relying on upper case names (which would require frequent Caps Lock toggling), many users stick to lower case file names. This prevents case-related issues and keeps typing consistent with most shell commands, which are lower case. This habit helps avoid complications and reduces the chances of filename collisions.

Nope, don't wanna Shout.

A good practice for file naming on Unix systems is to use only lower-case letters, numbers, underscores, and hyphens. File names typically include a dot followed by a few characters as the file extension (e.g., .txt, .jpg). Sticking to this pattern avoids issues with case sensitivity and escaping, and simplifies command-line usage. Although it may seem limiting, this approach will save time and prevent errors when working in the terminal regularly.
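
As a rough illustration of this naming pattern, the sketch below converts a name to lower case and replaces spaces with underscores. The function tidy_name is a hypothetical helper written for this example, not a standard command, and it only handles upper-case letters and spaces.

```shell
# Convert a name to lower case and replace spaces with underscores.
# "tidy_name" is a hypothetical helper, not a standard command.
tidy_name() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '_'
}

tidy_name "My Holiday Photo.JPG"   # prints my_holiday_photo.jpg
```

To rename a real file with it, you could combine it with mv, for example mv "My Holiday Photo.JPG" "$(tidy_name "My Holiday Photo.JPG")".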

File Manipulation

Moving Files:

  • To move a file into a directory:
    mv combined.txt dir1
    
  • To move it back to the current directory:
    mv dir1/* .
    

Moving Multiple Files:

  • To move several files and directories at once:
    mv combined.txt test_* dir3 dir2
    

Moving Across Nested Directories:

  • To move combined.txt from one directory to another nested location:
    mv dir2/combined.txt dir4/dir5/dir6
    

Copying Files:

  • To copy a file from one location to the current directory:

    cp dir4/dir5/dir6/combined.txt .
    
  • To create a copy with a different name:

    cp combined.txt backup_combined.txt
    

Renaming Files:

  • To rename backup_combined.txt to combined_backup.txt:
    mv backup_combined.txt combined_backup.txt
    

Renaming Directories:

  • To rename directories (use the Up Arrow for quicker edits):
    mv "folder 1" folder_1
    mv "folder 3" folder_3
    

Use ls to verify the results of each operation. These commands help manage files and folders efficiently without needing to change directories or use the mouse.
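
To make the difference between copying and moving concrete, here is a small self-contained sketch. It assumes a POSIX-style shell with mktemp, and the file and directory names are invented for the example.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"
mkdir archive
echo "draft" > report.txt

# cp leaves the original in place and creates a second file.
cp report.txt archive/report_copy.txt

# mv with a new name in the same directory is simply a rename:
# afterwards report.txt no longer exists.
mv report.txt final_report.txt

# List both directories to verify the result.
ls . archive
```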

Deleting Files:

  • To delete files:
    rm dir4/dir5/dir6/combined.txt combined_backup.txt

Deleting Directories:

  • To delete directories, use rmdir for empty folders:
    rmdir folder_*
  • If a directory contains files or subdirectories, rmdir will fail. To delete non-empty directories, use rm with the recursive -r option:
    rm -r dir4

This is a quick and efficient way to clean up files and folders without unnecessary repetition.

Safety Warning

When using the rm command, be extremely cautious, as it permanently deletes files without moving them to a trash folder. Accidental deletions can easily occur, especially when using wildcards. For example, rm t* deletes all files starting with "t," while rm t * could delete everything in the directory.

To prevent unintended deletions, consider using the -i (interactive) option with rm. This option prompts you to confirm each deletion, allowing you to type Y to delete, N to keep, or Ctrl-C to cancel the operation. Always double-check your commands before executing them to avoid irreversible loss.

Recursive Deletion: Be especially cautious when using rm -r, as it will delete everything within the directory, including all files and subdirectories. It’s often safer to explicitly delete files first and remove the directory afterward.
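
One habit that helps with wildcards is previewing what a pattern matches before handing it to rm. The sketch below shows the idea in a throwaway directory. It assumes a POSIX-style shell with mktemp, and the file names are invented.

```shell
# Work in a throwaway directory with a few sample files.
workdir=$(mktemp -d)
cd "$workdir"
touch test_1.txt test_2.txt notes.txt

# Step 1: preview. echo prints what the shell expands the pattern to.
echo test_*

# Step 2: only after checking that list, run rm with the same pattern.
rm test_*

# notes.txt did not match the pattern, so it is still there.
remaining=$(ls)
echo "left over: $remaining"
```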

Piping

Modern computers and phones have advanced capabilities, yet text remains crucial for organizing files, from filenames to metadata. The Linux command line offers powerful tools for text manipulation, particularly through piping, which allows the output of one command to feed directly into the input of another.

Piping examples:

  1. Count Files in a Directory: To count the number of lines in an output without creating a temporary file (which is required for > redirection), use:
    ls ~ | wc -l
    
  2. View Large Outputs: For lengthy outputs, use less:
    ls /etc | less
    
  3. Find Unique Lines: To count unique lines in combined.txt, chain commands:
    cat combined.txt | uniq | wc -l
    
    If few duplicates are removed, it’s likely because uniq only removes adjacent duplicates.
  4. Check Command Documentation: Use the man command for details on how commands work:
    man uniq
    
  5. Sort Before Uniquing: To prepare for using uniq, sort the file first:
    sort combined.txt | uniq | wc -l
    
  6. Searching for a string in an input: To search for a string in a file, use grep:
    cat combined.txt | grep "string"
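
The effect of sorting before uniq can be checked with a small sample file. This is a minimal sketch, assuming a POSIX-style shell with mktemp; fruit.txt and its contents are made up for the example.

```shell
# Work in a throwaway directory with a small sample file.
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' apple banana apple cherry banana apple > fruit.txt

# uniq alone only collapses adjacent duplicates, so nothing is removed here.
unsorted=$(uniq fruit.txt | wc -l | tr -d ' ')

# Sorting first puts the duplicates next to each other.
sorted=$(sort fruit.txt | uniq | wc -l | tr -d ' ')

echo "without sort: $unsorted lines, with sort: $sorted lines"

# uniq -c prefixes each line with its count; sort -rn lists the largest first.
sort fruit.txt | uniq -c | sort -rn | head -n 1
```

The last pipeline chains four commands to answer "which line appears most often?" without creating any temporary files.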
    

SSH on Windows using PuTTY

PuTTY can be downloaded from https://www.putty.org/.

More information coming soon.

Until we complete this section, this tutorial should suffice: https://www.ssh.com/academy/ssh/putty/windows.

Origin of the Linux Terminal

During the early development of the computer industry, Unix emerged as a multi-user operating system for mainframe computers. Users connected remotely via basic terminals, which featured only a keyboard and screen, sending keystrokes to the server and receiving text-based outputs. Programs had to handle text input and output due to the lack of graphical interfaces.

Text is resource-light, enabling users to interact efficiently with programs even over slow network connections. The command structure was concise to minimize keystrokes, contributing to the continued popularity of text interfaces.

Unix users managed file operations such as creating, renaming, and organizing files using a textual interface. Each task required specific commands (e.g., cd for changing directories and ls for listing contents). These commands were coordinated by a master program known as the “shell,” which also allowed for command chaining and automation through shell scripts. The original Unix shell, sh, has evolved into modern shells like bash.

Linux is a descendant of Unix, designed to function similarly, allowing many old Unix programs to run effectively. While old terminals could connect to modern Linux systems, it is more common to use software terminals that provide a Unix-style text interface alongside graphical programs.

What is Git?

Git is a version control system (VCS), which means that it saves, compares, and manages changes to files (usually code files) over time.

This may seem innocuous, but it's actually vital to the development cycle. Virtually all professional code is managed by some VCS, and git is by far the most common.

Use cases

Tracking file changes over time

Have you ever been scared to work on your code after completing a feature because you didn't want to break it? Git fixes this by allowing you to take a snapshot (or "commit") of your code, which you can always go back to by running a single command (git checkout ... or git reset ...).

Making multiple commits (best practice is a commit for each feature) throughout the progression of a project allows you to revert at any time to any "milestone." You'll never lose your work again 1.

Working on different versions of your code at the same time

Git branches allow you to work on multiple versions of your code base at the same time. For example, if you want to maintain an old, released version (e.g. adding security patches) while continuing feature development for the latest version, you could create a branch called v1.0.0 and a branch called dev.

By default, git repositories have one branch named master or main. This is often used either for development or for the latest stable version of the software.

Code collaboration

Branches (or forked repositories on a platform like GitHub) can be used by multiple developers to collaborate on the same code base. Usually, each developer works on their own branch and implements a feature. Once that feature is complete, they merge the branch into the main or master branch.

Storing code remotely

Git can be used with platforms like GitHub or GitLab to manage code stored in the cloud. This is an important part of the modern development cycle.

Typically, a developer creates a branch, implements their feature (or reaches some stopping point), and then commits their changes. After they commit their changes, they push them up to the server hosting the main code base so that others can access them.

For smaller projects, developers may skip the "creating a branch" step and commit new code directly to the main branch.

1

Unless you delete or lose the git repository itself, or manually delete commits.

How to Install git

Linux

Install it using your favorite package manager.

Debian / Ubuntu (apt)

sudo apt-get install git-all

Fedora (dnf)

sudo dnf install git

Arch (pacman)

sudo pacman -S git

On other Linux distros, search your package manager for a package named git or git-all.

MacOS

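macOS usually provides git through Apple's Xcode Command Line Tools. If they are not installed yet, running git --version in the terminal prompts you to install them.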

Windows

  1. Go to the latest Git Windows installer and download the latest version.

  2. Follow the instructions as provided by the installer wizard.

  3. Open the Windows Command Prompt.

  4. Type git version to verify Git was installed.

Using git

How does it work?

You will first create what is called a repository. This is the place where your code and its history will be stored.

In our case, main will be our primary branch. It holds the main code that will be used.

Branches

Branches allow you to work on different versions of your project simultaneously without affecting the main codebase.

From this main code you can create branches. This allows you to make changes without the chance of ruining the main code. Think of a branch as a copy of your code that you can change without putting the original at risk.

Use this command to create a new branch:

git checkout -b "name_of_the_branch"

To switch to an existing branch:

git checkout "name_of_the_branch"
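
To see that work on a branch really leaves main untouched, the sketch below builds a throwaway repository and switches back and forth. It assumes git is installed. The identity values, file name, and branch name experiment are placeholders, and the git add and git commit steps it relies on are explained in the Staging and Committing sections.

```shell
# Create a throwaway repository with one commit on main.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Identity for this throwaway repository only (placeholder values).
git config user.name "Example Student"
git config user.email "student@example.com"

echo "stable code" > app.txt
git add app.txt
git commit -q -m "Add app"
git branch -M main            # make sure the branch is called main

# Create a branch and switch to it in one step.
git checkout -q -b experiment
echo "risky change" >> app.txt
git commit -q -a -m "Try a risky change"

# Switching back shows the file exactly as main left it.
git checkout -q main
cat app.txt
```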

Staging

Once we have made some changes, we will want to commit them. Before we can do that, we need to stage the files. Staging is like another layer on top of saving your files: if you have changed two files but only one is ready to commit, you can stage just that file.

To stage a file or folder, run this command with its name:

git add file_name/folder_name

TIP:

Use this command to see which files have changed and which of them are staged.

git status
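
Staging one of two changed files can be sketched as follows. This assumes git is installed and uses a throwaway repository. The file names ready.txt and draft.txt are invented for the example.

```shell
# Create a throwaway repository with two new files.
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "finished work" > ready.txt
echo "still in progress" > draft.txt

# Stage only the file that is ready to be committed.
git add ready.txt

# --short prints one line per file: "A" marks a newly staged file,
# "??" marks a file git is not tracking yet.
git status --short

# List only what is currently staged.
staged=$(git diff --cached --name-only)
echo "staged: $staged"
```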

Committing

Creating a commit creates a snapshot of all the staged files.

TIP: For commit messages, do not use past tense, such as "I made headings blue". Use language like "Make headings blue", as if you are giving orders to the codebase.

This command creates a commit:

git commit -m "description of the commit"
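
Putting staging and committing together, a short session might look like the sketch below. It assumes git is installed. The repository is a throwaway one, and the identity values and file contents are placeholders.

```shell
# Create a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Identity for this throwaway repository only (placeholder values).
git config user.name "Example Student"
git config user.email "student@example.com"

# First snapshot.
echo "h1 { color: black; }" > style.css
git add style.css
git commit -q -m "Add base heading style"

# Second snapshot, with an imperative commit message.
echo "h1 { color: blue; }" > style.css
git add style.css
git commit -q -m "Make headings blue"

# One line per commit, newest first.
git log --oneline

commits=$(git rev-list --count HEAD)
echo "commits so far: $commits"
```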

A good way of visualizing how this works

Drawing-2

Merging branches

To bring the latest changes in branch_a into main, we would first switch to the main branch...

git checkout main

and then run the git merge command.

git merge branch_a

If main and branch_a changed different files, or different parts of the same file (after branch_a was created or after the last merge), then git should be able to merge them automatically. However, if the same lines of a file were modified in both branches, you will probably encounter this:

CONFLICT (content): Merge conflict in my_file.txt
Automatic merge failed; fix conflicts and then commit the result.

Before you do anything else, you will need to fix the merge conflicts. Git represents these conflicts within each file using the following template (or something similar, if you've modified the default settings):

<<<<<<< HEAD
Change in the main branch
=======
Change in branch_a
>>>>>>> branch_a

To resolve this, replace the text above with the final desired version (it could be one or the other, or a combination of both). Once you've resolved all of the conflicts in (and saved) each file, stage the files and create a new commit. This completes the merge process.

Afterwards, you may delete branch_a.

git branch -d branch_a
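
The whole merge workflow, including a conflict, can be reproduced in a throwaway repository. The sketch below assumes git is installed. The identity values are placeholders, and the conflict is resolved by overwriting the file from the script, whereas in practice you would edit it in a text editor.

```shell
# Create a throwaway repository with one commit on main.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Identity for this throwaway repository only (placeholder values).
git config user.name "Example Student"
git config user.email "student@example.com"

echo "original line" > my_file.txt
git add my_file.txt
git commit -q -m "Add my_file"
git branch -M main            # make sure the branch is called main

# Change the same line on branch_a...
git checkout -q -b branch_a
echo "Change in branch_a" > my_file.txt
git commit -q -a -m "Edit my_file on branch_a"

# ...and on main.
git checkout -q main
echo "Change in the main branch" > my_file.txt
git commit -q -a -m "Edit my_file on main"

# The merge stops with a conflict, so git merge returns a non-zero status.
if ! git merge branch_a > /dev/null 2>&1; then
    # Resolve by writing the final desired content, then stage and commit.
    echo "Change in the main branch and in branch_a" > my_file.txt
    git add my_file.txt
    git commit -q -m "Merge branch_a into main"
fi

# The branch is fully merged now, so -d is allowed to delete it.
git branch -d branch_a
git log --oneline
```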