Forem: Augusto Pascutti

Shell for anxious developers

Augusto Pascutti — Tue, 10 Jun 2025 03:16:55 +0000

There are great guides on bash (or other Bourne-compatible shells: sh, zsh, ksh) out there. I don't want to teach you bash, or any special trick. I want to ~~convince~~ show you why I think it is worth learning. It won't be much but, hopefully, it is enough if the following piques your curiosity:

$ git log --name-only --pretty="format:" \
  | sed '/^\s*$/'d \
  | sort \
  | uniq -c \
  | sort -rn \
  | head

I assume you already know how to use a shell to run commands, and that you have git installed.

Composition using Pipes

On a POSIX shell, bash for example, you can use pipes (|) to use the output of a program as input of another:

$ seq 1 5
1
2
3
4
5

$ seq 1 5 | sort -n -r
5
4
3
2
1

To learn what a command does you can use man <command>, <command> --help, info <command> or help <command>. An excerpt from the man pages of commands above shows:

seq <first> <last> prints a sequence of numbers from first to last.
sort [options] [file] sort lines of text files. Without a file, it reads from standard input.

Notice the things between [brackets] and <angle brackets>? They mean [optional] and <required>, a convention almost everyone follows. All programs used in these examples are available even on the most basic distributions. Even Alpine, which is known for being very small and lean:

$ docker run --rm -it alpine sh
# seq 1 3 | sort -n -r
3
2
1

It is worth noting that man (and its counterparts) works offline. Getting to know them and the pager will give you access to invaluable knowledge (git man pages are a treat).

Loops and Conditionals

You can think about a shell as a "place to run other programs", when it really is an infinite loop running one command: readline. Once you wrap your head around that, you can quickly develop and debug small programs. Like a never-ending running test-suite.

I like to approach this by using history expansion (zsh, the default shell on macOS, also has it):

!! executes the last command.
!<prefix> executes the last command that starts with prefix.
!$ expands to the last argument of the last executed command ($ is used in regex as "the end of a string").

Some great CLI citizens are designed to work well with them. After a git clone <repo> [dir], for example, you can cd !$ to enter the directory you've just cloned. Notice how the last argument is the one that is useful for other commands. That, my great comrade, is good design. Remember this and you will remember the order of arguments for some pretty useful programs:

ln -s <path/to/file> <path/to/symlink>: The symlink is the useful part, so it is last. You can !$ to run the binary or cd !$ if it is a directory.
cp <source [source [source]]> <dest>: You can copy multiple files and directories to one destination, which is the useful part. So it is last.

Back to our "never-ending test-suite": I try a command until I am satisfied with its result and then pass it on with history expansion to a loop or another command.

Suppose you want to update all Git repositories inside your $HOME directory. The outline of the idea: (1) find all directories with .git inside of them, (2) for every repository cd <repo> into it and (3) run git pull.

$ find "$HOME" -type d -name ".git"
/home/augustohp/.tmux/plugins/tpm/.git
/home/augustohp/.vim/bundle/vim-nerdtree-tabs/.git
/home/augustohp/.vim/bundle/nvim-lspconfig/.git
/home/augustohp/.vim/bundle/trouble.nvim/.git
/home/augustohp/src/github.com/expressjs/.git

The command above lists all directories (-type d) named .git (-name) inside $HOME. Note that the results end in .git, and we want the parent directory. So I will use sed to remove .git from the end of each line, and keep trying until I have:

$ !find | sed 's/\/\.git$//'
/home/augustohp/.tmux/plugins/tpm
/home/augustohp/.vim/bundle/vim-nerdtree-tabs
/home/augustohp/.vim/bundle/nvim-lspconfig
/home/augustohp/.vim/bundle/trouble.nvim
/home/augustohp/src/github.com/expressjs

sed accepts almost any character as the delimiter of a regular expression. We are using / (as most examples you will see do), but when dealing with paths (which use / as the directory separator) it is useful to pick another one and avoid the escape (\). The dot is also a special character we need to escape (\), since it is used to match "any character". Using another delimiter the command becomes:

$ find "$HOME" -type d -name ".git" | sed 's#/\.git$##'
/home/augustohp/.tmux/plugins/tpm
/home/augustohp/.vim/bundle/vim-nerdtree-tabs
/home/augustohp/.vim/bundle/nvim-lspconfig
/home/augustohp/.vim/bundle/trouble.nvim
/home/augustohp/src/github.com/expressjs

Bash, like other shells, has conditionals and loops. With variables and command substitution, we can start to compose more complex instructions:

$ find "$HOME" -type d -name ".git" | sed 's/\/\.git$//'
$ repositories=$(!!)
$ for repo in $repositories
do
  cd "$repo"
  git pull --autostash
  cd -
done

$(!!) executes the previous command (!!) inside a sub-shell and returns its output.
repositories=$(!!) stores the output of the previous command ($(!!)) in the repositories variable.
for name [ [in [words …] ] ; ] do commands; done executes a loop:
- cd "$repo" enters the repository. It is good to always quote (") paths because they might have spaces in their names.
- git pull --autostash will update the repository, stashing any uncommitted changes first and restoring them afterwards.
- cd - returns to previous directory, before the first cd was made.
- If you want to do that in one line, you need to change \n (new line) to ;. If you search the command using history, you will see it on that short format.
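
One caveat: a for loop over command output splits it on whitespace, so a repository path containing a space is broken into pieces before the quoted cd ever sees it. A minimal sketch of one way around that, reading one path per line instead. The names list_repos and visit_repos are made up for this example:

```shell
# list_repos (hypothetical helper): prints the parent directory
# of every .git directory found below the given root.
list_repos() {
  find "$1" -type d -name ".git" | sed 's#/\.git$##'
}

# visit_repos (hypothetical helper): reads one path per line, so
# spaces inside a path survive. The parentheses run cd in a
# sub-shell, which makes "cd -" unnecessary.
visit_repos() {
  list_repos "$1" | while IFS= read -r repo
  do
    (cd "$repo" && pwd)
  done
}
```

Calling visit_repos "$HOME" prints every repository directory. Swap pwd for git pull once the list looks right.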

Let's say you don't want to update repositories that have uncommitted changes in them. For that, the output of git status --porcelain should be empty, which can be tested with test -z (man test to see available operators for if conditions):

$ for repo in $(find "$HOME" -type d -name ".git" | sed 's/\/\.git$//')
do
  cd "$repo"
  # --porcelain prints one line per changed file, and nothing on a clean tree
  git_status_output="$(git status --porcelain)"
  if [ -z "$git_status_output" ]
  then
    git pull --autostash
  else
    echo "Error: $repo has uncommitted changes."
  fi
  cd -
done

Conditionals and exit codes

You know conditionals, right? On shells they look the same but they have a twist, one that is useful for running commands: the return code of a command can always be evaluated as a conditional. If it runs successfully, it is true. Every command that returns 0 (zero) is successful, and any other value is an error. So commands can have as many error codes as they want.
I've made the instructions above bigger to improve understanding, usually I'd one-line them with the && (AND) and || (OR) operators. First, let's look at a return code:

$ cd /tmp/non-existing-directory
-bash: cd: /tmp/non-existing-directory: No such file or directory
$ echo $?
1

The special variable $? holds the return code of the previous command. Since it is 1 it was an error, in case the error message did not give it away. As you've guessed, you can do this:

$ if cd /tmp/non-existing-directory
then
    echo "great success!"
else
    echo "not"
fi
-bash: cd: /tmp/non-existing-directory: No such file or directory
not

You can, of course, get rid of these error messages using redirections:

$ cd /tmp/non-existing-directory 2> /dev/null
$ echo $?
1

The 2> redirects file descriptor 2 (stderr) to /dev/null. You can also shorten every conditional using || and && operators:

$ test -z "$git_status_output" && git pull --autostash

This only executes git pull if test -z succeeds, that is, if it returns status code ($?) 0. With || instead, git pull would run only when test fails. As the shell already has conditionals built in, the test program just adds some handy operators:

-z for testing for empty strings and -n for non empty strings.
-f for existing file and -d for directories.
-lt for "less than" and -le for "less than or equal".
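
A small sketch putting these operators together with && and ||. The function name describe_path is made up for this example, and it relies only on test and on exit codes:

```shell
# describe_path (hypothetical helper): reports what a path is,
# using test operators and the && short-circuit.
describe_path() {
  test -z "$1" && { echo "empty argument"; return 1; }
  test -d "$1" && { echo "directory"; return 0; }
  test -f "$1" && { echo "file"; return 0; }
  echo "missing"
  return 1
}

# The right side of || only runs when the function returns non-zero.
describe_path / || echo "exit code was $?"
describe_path /non-existing-path || echo "exit code was $?"
```

The first call prints directory and nothing else. The second prints missing, followed by exit code was 1.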

How do you see other conditional operators? Since it is a program: man test.

What can you do with it?

This may look like "too much" at first glance but think about it: how many things could you automate, since everything is a program and follows the same conventions?

If, for example, you have gh (GitHub CLI program) installed, you can clone all the repositories of an organisation with:

for repo in $(gh repo list --limit 200 --source --no-archived "$owner" | awk '{print $1 }')
do
  gh repo clone "$repo"
done

As long as programs return text (spoiler alert: they will) you can compose them with other programs. If you need to transform text, for example, you have some great tools already available. Here are the ones I've used the most:

$ alias rank="sort | uniq -c | sort -nr"
$ alias second_column_only='awk "{ print \$2 }"'
$ alias top10="rank | head -n 10 | second_column_only"
$ history | second_column_only | top10
awk
column
sed
cut
cat
tr
split
mktemp
fg
z - (zoxide, this one needs installation)
fzf - (fuzzy finder, this too needs installation)

What seems like a limitation at first, the output is just text, is actually great software design. You will notice everything is already done for you: from getting the nth column of an output to splitting a huge file into smaller ones (with split).
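
To make "getting the nth column" concrete, here is a minimal sketch. The sample data and the function name sample are made up for this example:

```shell
# sample (hypothetical): login:shell pairs, one per line.
sample() {
  printf '%s\n' 'ana:/bin/bash' 'bob:/bin/zsh' 'carl:/bin/bash'
}

# cut picks the 2nd field using ":" as delimiter, then the
# sort | uniq -c | sort -rn idea ranks the shells by usage.
sample | cut -d: -f2 | sort | uniq -c | sort -rn

# tr translates characters: here every ":" becomes a tab.
sample | tr ':' '\t'
```

The first pipeline lists /bin/bash with a count of 2 above /bin/zsh with a count of 1. The exact padding of the counts depends on your uniq.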

What now?

Time to make your own history. Make sure it is configured right on your shell, I like to:

Keep it big, disk space is cheap. The default usually only holds a few hundred commands. I like to keep a lot more. You can use CTRL-R to search it and, since its output is text, you can... you get the idea.
Ignore entries that start with space. You will always type something (e.g.: API Key) you don't want to keep saved in a file somewhere.
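
For bash, both preferences could look like the sketch below in ~/.bashrc. The numbers are assumptions, pick your own:

```shell
# Assumed values, tune to taste. These are bash variables;
# zsh uses HISTSIZE, SAVEHIST and "setopt HIST_IGNORE_SPACE" instead.
HISTSIZE=100000          # commands kept in memory for the session
HISTFILESIZE=200000      # lines kept in the history file
HISTCONTROL=ignorespace  # skip commands typed with a leading space
```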

I know it is tempting to Google for one-liners and such, try not to. The best feature of a shell is that you can make it your own. Unlike an IDE or GUI, it expects you to customize it: to make its output your own. So use it: find a pattern, create a shortcut for it and learn something new (man pages). All shells allow you to load custom files on startup, use them.

The shell is a program. If you are a programmer, make it a good one. The journey will teach you a lot.

Cleaning Quake server logs to generate score boards

Augusto Pascutti — Mon, 03 Mar 2025 21:29:09 +0000

It is a common challenge for technical interviews to parse Quake 3 server logs and display:

Players in a match
Player score card, listing player names and kill count:
1. Ignore <world> as a player
2. If <world> kills a player, add -1 to player's kill count
(optional) Group outputs above by match
(optional) Death cause report by match

Working with files is a common practice for any developer. Using awk not so much, even though it is IMHO one of the best tools for doing so:

The language is built for (1) text matching and (2) manipulation.
Working with small files is as easy as it is working with very large files.

Intending to spread knowledge of the tool to more people, let's solve the challenge with AWK and see how you can effectively start using it in your workflow today. I assume you know a programming language well, know your way around a (*nix) CLI and that we are using GNU awk.

The beginning of a not so usual program

As is common with other Unix tools, it is better to break the program into smaller pieces: awk programs bigger than ~150 lines are difficult to maintain.
Here are the different programs we are going to create:

clean.awk will read the input files, which are the original log files, and output a cleaner version of their content, containing just the data we need to manipulate and use.
scoreboard.awk will use the output from the previous program to produce the score boards for each game.

Let's create a walking skeleton to run and debug our progress while tackling the challenge:

$ mkdir /tmp/awk-quake
$ cd !$
$ curl --remote-name -L https://gist.githubusercontent.com/augustohp/073936cc213fe96bc99a498932c18be7/raw/9e52e4da221f2f0ce1dfc11f57c1679a2cdb77f5/qgames.log
$ tail qgames.log
 13:55 Kill: 3 4 6: Oootsimo killed Dono da Bola by MOD_ROCKET
 13:55 Exit: Fraglimit hit.
 13:55 score: 20  ping: 8  client: 3 Oootsimo
 13:55 score: 19  ping: 14  client: 6 Zeh
 13:55 score: 17  ping: 1  client: 2 Isgalamido
 13:55 score: 13  ping: 0  client: 5 Assasinu Credi
 13:55 score: 10  ping: 8  client: 4 Dono da Bola
 13:55 score: 6  ping: 19  client: 7 Mal
 14:11 ShutdownGame:
 14:11 ------------------------------------------------------------
$ cat clean.awk
{ print }
$ watch gawk -f clean.awk qgames.log

Above we:

Downloaded qgames.log
Created clean.awk that prints everything passed to it
Executed the program every couple of seconds (with watch) to see its result while we change it in another session (to stop watch, use CTRL-C)

Let's change clean.awk to filter just the lines useful to us, and help us debug what to do with them:

BEGIN {
    FS = " "
    RS = "\n"
}
/Init/ { print }
/kill/ { debug_fields() }

function debug_fields()
{
    for (i = 1; i <= NF; i++) {
        printf("%d: %s\n", i, $i)
    }
}

Don't despair yet, what we are doing is pretty simple:

BEGIN is a special block that gets executed once, before any input is read:
1. We use it to (re-)define some special variables. Both values below are already the defaults, they are spelled out here to make them visible:
  1. FS defines the field separator (space). It is used to break a matching line into smaller pieces (fields).
  2. RS defines the record separator (new line). Everything until that character will be treated as a line.
/match/ { action } blocks execute a set of actions when a match (regex supported) is found:
1. /Init/ { print } prints every line that has Init on it, without doing anything more.
2. /kill/ { debug_fields() } executes the debug_fields() function for every line that contains the string kill.
3. Every line that doesn't match the rules above is ignored.
function debug_fields() prints all fields identified after breaking the line with FS:
1. NF is a special variable containing the number of fields parsed for the current line.
2. $n is the field n parsed. Inside the loop $i will become $1, $2, $3 and so on, allowing us to retrieve the contents of every field on that line, displaying something like:
```
1: 20:54
2: Kill:
3: 1022
4: 2
5: 22:
6: <world>
7: killed
8: Isgalamido
9: by
10: MOD_TRIGGER_HURT
```
3. The output above is useful to see which parts of the current line we can work with. Try changing the /kill/ action from debug_fields() to print $6 " killed " $8.

With little changes, we can use $6 (killer) and $8 (killed) to display who killed who, which is pretty much everything we need.

🐛 If player names did not contain spaces we'd be ready. But Assasinu Credi, for example, breaks our algorithm because we use spaces to separate fields.
When he kills someone, $8 will be killed instead of the other player's name.

Let's see this happening:

BEGIN {
    FS = " "
    RS = "\n"
}
/Init/ { next }
/Assas/ { print $6 " killed " $8 }

The program above ignores (with next action) lines matching Init and prints just lines matching Assas.

$ awk -f clean.awk qgames.log
Zeh killed Assasinu
<world> killed Assasinu
Isgalamido killed Assasinu
Zeh killed Assasinu
Assasinu killed killed

Note that the Assasinu killed killed line is wrong. It doesn't have the name of the killed player. Let's fix this!

Making things more reliable with regex

The final clean.awk program is below. It replaces some strings with the nfs (new field separator) variable and removes the prefix on lines that notify of a kill:

BEGIN {
    FS = " "
    RS = "\n"
    nfs = "|"
    current_game = 0
}
/Init/ { current_game++ }
/kill/ {
    sub(/^[ 0-9:]+ Kill: [0-9: ]+/, "", $0)
    sub(/ killed /, nfs, $0)
    sub(/ by /, nfs, $0)
    print $0 nfs current_game
}

The BEGIN section declares 2 new variables:
1. nfs to separate output by something other than spaces, so the next programs easily support player names with spaces in them.
2. current_game is a variable that gets incremented every time a new game starts.
/Init/ marks a new game:
1. Increments the variable current_game for the next time it gets used.
For every /kill/:
1. sub(regex, replacement, target) replaces the first match of regex in target with replacement, modifying target in place. $0 is the whole current line.
2. sub(/^[ ... removes the prefix of the line up to the player name.
3. sub(/ killed... and sub(/ by... replace these matches with nfs (the new field separator), allowing us to easily identify ($1) the killer, ($2) who got killed and ($3) how they got killed.
4. print will print the current line ($0) with the current game as a suffix:
  - As every sub() modifies the current line ($0), we now have only what we need.
  - As awk programs operate on lines, it is easier to have everything we need on them. That is why we add the current game to every line.
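
To watch the substitutions happen one at a time, the sketch below feeds a single line from the log to the same three sub() calls and prints the line after each step. The function name trace_clean is made up for this example:

```shell
# trace_clean (hypothetical helper): the same substitutions as
# clean.awk, printing the current line after each one.
trace_clean() {
  awk '{
    sub(/^[ 0-9:]+ Kill: [0-9: ]+/, "")  # drop timestamp and client ids
    print "1: " $0
    sub(/ killed /, "|")                 # separates killer from the rest
    print "2: " $0
    sub(/ by /, "|")                     # separates victim from the cause
    print "3: " $0
  }'
}

printf '%s\n' ' 13:55 Kill: 3 4 6: Oootsimo killed Dono da Bola by MOD_ROCKET' | trace_clean
```

The last line printed is 3: Oootsimo|Dono da Bola|MOD_ROCKET, with the spaces of the player name intact.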

Executing the program above produces:

$ awk -f clean.awk qgames.log | tee qgames-clean.log
<world>|Isgalamido|MOD_TRIGGER_HURT|2
<world>|Isgalamido|MOD_TRIGGER_HURT|2
<world>|Isgalamido|MOD_TRIGGER_HURT|2
Isgalamido|Mocinha|MOD_ROCKET_SPLASH|2
Isgalamido|Isgalamido|MOD_ROCKET_SPLASH|2
Isgalamido|Isgalamido|MOD_ROCKET_SPLASH|2
<world>|Isgalamido|MOD_TRIGGER_HURT|2
<world>|Isgalamido|MOD_TRIGGER_HURT|2
<world>|Isgalamido|MOD_TRIGGER_HURT|2
<world>|Isgalamido|MOD_FALLING|2
<world>|Isgalamido|MOD_TRIGGER_HURT|2
Isgalamido|Mocinha|MOD_ROCKET|3
<world>|Zeh|MOD_TRIGGER_HURT|3

With the qgames-clean.log file we can now easily achieve every objective of the original challenge without having to deal with:

Unneeded context.
Space separators. With FS = "|" we use | as field separator and have:
- $1 as the killer
- $2 who got killed
- $3 how the killer killed the victim
- $4 in which game that happened
A "checkpoint". If the log changes format, or we discover a bug, the next programs keep working as long as we produce output conforming to the current format.
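
A quick sketch of reading those fields back, using two lines copied from the cleaned output. The function name describe_kills is made up for this example, and -F'|' is the command-line way of setting FS:

```shell
# describe_kills (hypothetical helper): turns each cleaned line
# back into a sentence, one field at a time.
describe_kills() {
  awk -F'|' '{ printf "game %s: %s killed %s (%s)\n", $4, $1, $2, $3 }'
}

printf '%s\n' \
  'Isgalamido|Mocinha|MOD_ROCKET_SPLASH|2' \
  '<world>|Zeh|MOD_TRIGGER_HURT|3' \
  | describe_kills
```

This prints game 2: Isgalamido killed Mocinha (MOD_ROCKET_SPLASH) and then game 3: <world> killed Zeh (MOD_TRIGGER_HURT).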

Next steps

How about you try to figure out the rest? I will post my solution and, if you learned something from this, I promise you will learn something else on the next one as well.

The GNU awk manual is really good, from a time when technical documents were worth reading. You don't need to read everything, the index will take you where you need. Pinky promise!

I won't leave you without anything though, here is a beginning for scoreboard.awk:

BEGIN {
    FS = "|"
}
{
    # Sets a player as a key in the players array
    players[$1] = $1
    players[$2] = $2
}
END {
    # Prints every player, skipping <world>
    for (name in players) {
        if (name == "<world>")
            continue
        print name
    }
}

Let me know of your solution, suggestions or doubts in the comments! ❤️

Files with the most changes on Git repository

Augusto Pascutti — Sun, 26 Apr 2020 22:31:34 +0000

Reading the history of a repository is useful for multiple things. There are many ways to go through it. Below we will list the files with the most changes and then filter the history to just those files:

$ git log --name-only --pretty="format:" | sed '/^\s*$/'d | sort | uniq -c | sort -r | head
$ git log --stat -- $(!!)

Other than $(!!), which tells bash to run the previous command (!!) inside a "sub-shell" ($()), I will detail what we executed below.

Using the history

You can read commits on the current branch using git log, but what if you want to focus on the changes of a single file?

$ git log -- src/The/Path/To/The/File/With/Most/Changes.js

Explaining the command above:

git log shows the changes, from the most recent to older, made to the repository. By default it displays only the commit message of each change;
-- tells Git to stop trying to parse options (stuff like -p or --reverse) and start parsing arguments. For git log, the arguments are paths of the repository (one or more);
src/The/Path/To/The/File/With/Most/Changes.js is a file that exists on our hypothetical repository, it makes git log filter changes affecting only that path. You could use other things, instead of a single path pointing to a file:
- src/** to filter changes made just inside this path,
- *.txt to filter changes made to files with the txt extension

Focusing on one or more paths allows you to go deeper into the history of a single part of the project, which could provide a rough idea of what the team can achieve and in what time frame, for example.

How to list files with the most changes?

You know how to navigate the history of specific files, now we want to know which files changed the most on our repository. That can be achieved in 3 steps:

List files changed in a commit, for every commit;
Count how many times each file appears on that list;
Display only the top ones

List files changed in a commit

git log has the option --name-only which will display the path to all files changed in a commit. Formatting the commit message to an empty format will only display the files:

$ git log --name-only --pretty="format:"

If you try the command above, you will notice that for every commit an empty line appears. Those empty lines are where the commit messages we removed used to be. To get rid of them we can use sed '/^\s*$/d', making the whole command:

$ git log --name-only --pretty="format:" | sed '/^\s*$/'d

Count how many times each file appears in the list

You can use uniq to collapse duplicate lines; with the -c option it also counts their occurrences.

$ git log --name-only --pretty="format:" | sed '/^\s*$/'d | uniq -c

Since uniq only joins consecutive lines, we need to sort our list before passing it to uniq:

$ git log --name-only --pretty="format:" | sed '/^\s*$/'d | sort | uniq -c
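
The difference is easy to see on a tiny, made-up list of paths (the function name changed_files and its contents are assumptions for this example, no repository needed):

```shell
# changed_files (hypothetical): paths as "git log --name-only" could print them.
changed_files() {
  printf '%s\n' 'src/app.js' 'README.md' 'src/app.js' 'src/app.js' 'README.md'
}

# Without sort: files show up more than once, because uniq only
# merges lines that are next to each other.
changed_files | uniq -c

# With sort: one line per file, with its total count.
changed_files | sort | uniq -c
```

The first pipeline prints four lines, the second only two.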

The output of the command above will be <count> <path>, so we can use sort with the --reverse option (-r) to display the files with the most occurrences first:

$ git log --name-only --pretty="format:" | sed '/^\s*$/'d | sort | uniq -c | sort -r

Limiting output

We can use head to keep only the first lines, or tail to keep only the last ones. The -n <items> option tells how many lines we want:

$ git log --name-only --pretty="format:" | sed '/^\s*$/'d | sort | uniq -c | sort -r | head -n 10

What else?

I usually limit it to changes made in the last year (git log --since "1 year ago"). I use this every time I get in touch with a new team, as it allows me to get to know them better.

I also don't blindly go into the "most changed files" in the project. As I want to know more about the project and people, I try to focus on controllers or models first so I get a grasp on what kind of changes they go through.

Do you think this will help you? In what way?