Ops Scripting with Perl: Frequency

Update (May 15, 2021): updated for clarity

Automation with scripting languages is a core skill for operations-oriented roles. Long ago, Unix (and later Linux) scripting meant shell programming combined with a handful of standard command-line tools; later, Perl became the popular go-to language for systems administrators.

Perl is still popular today for systems automation and operating system configuration. On many Linux distributions it cannot easily be removed, for example, because much of the underlying automation depends on Perl to function. For these examples, Perl still outperforms other languages: 1100% faster than Python and 800% faster than Ruby.

Below is a small problem and solution using a concept called a frequency hash, or frequency counting using a hash. Two interesting features worthy of attention are that Perl can automatically initialize variables when they are first used (Perl has three variable types: scalars, arrays, and hashes), and that Perl can modify a data structure while it is still being built in memory. Python and Ruby do not offer the first of these out of the box for plain dictionaries or hashes.

The Problem

The problem is to print a formatted summary of data from a passwd file, whose records (lines) consist of fields (columns) separated by colons (:). The summary shows the number of users that use each shell.

In Perl, the data structure used to capture this will be a frequency hash called %counts, which stores the counts.

The Data

Here’s the local copy of the passwd file used for this exercise:

The Output

Given the data above, the report would show these counts:

Shell Summary Report:
==================================================
Shell             # of Users
----------------- ------------
/bin/bash           3 users
/bin/false          7 users
/bin/sync           1 users
/usr/sbin/nologin  17 users

The Starting Code

Here’s some starting code which, given a %counts hash, prints out a formatted report.

Output Report with %count hash

For those familiar with the printf function from the C programming language, this should look familiar. Perl also has a repetition operator (x) to repeat a string several times.
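As a rough illustration of formatted output and the repetition operator, here is a minimal sketch of such a report. It is not the original listing: the counts are hard-coded sample values, and the lines are collected with sprintf (which takes the same format strings as printf) so they can be inspected before printing.

```perl
use strict;
use warnings;

# Hard-coded sample counts (an assumption for illustration);
# in the article these come from the passwd file.
my %counts = (
    '/bin/bash'         => 3,
    '/bin/false'        => 7,
    '/bin/sync'         => 1,
    '/usr/sbin/nologin' => 17,
);

my @report;
push @report, "Shell Summary Report:\n";
push @report, '=' x 50, "\n";                        # repetition: 50 '=' characters
push @report, sprintf("%-17s %s\n", 'Shell', '# of Users');
push @report, '-' x 17, ' ', '-' x 12, "\n";

# Postfix foreach: one formatted line per shell, sorted by shell name.
# %-17s left-justifies in 17 columns; %3d right-justifies in 3 columns.
push @report, sprintf("%-17s %3d users\n", $_, $counts{$_}) foreach sort keys %counts;

print @report;
```

The postfix foreach on the last push is the single-line loop form discussed next.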

What might surprise those unfamiliar with Perl is the inline loop, a loop written on a single line (line #9). This form is quite common for a loop whose body is a single statement. It can be unfolded into the following:

foreach (sort keys %counts) {
    printf "%-17s %3d users\n", $_, $counts{$_};
}

Other Perl language features include:

  • the implicit scalar $_, which is used as the default loop variable.
  • passing parameters after a space instead of enclosing them in parentheses.

If we used an explicit scalar and parentheses, this could be rewritten as:

foreach my $shell (sort keys %counts) {
    printf("%-17s %3d users\n", $shell, $counts{$shell});
}

The Solutions

Unlike the previous articles I wrote for Ruby and Python, I decided to put this in a single article. I assume most of the audience is looking at this article out of curiosity to see how Perl compares to either Ruby or Python.

Solution 1: Slicing Shell First

Here’s a common method to open a file, check for an error, and then use the diamond operator (<>) on the file handle to feed lines, one at a time, to a conditional loop.

Each iteration of the loop will have the current line available in the implicit scalar $_.
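The sketch below shows how reading from a file handle behaves in the two contexts used in this article. It assumes made-up sample records and an in-memory file handle (opening a reference to a string) so that it runs without a real file. It also uses a lexical handle, $fh, rather than a bareword handle; that is my choice for the sketch.

```perl
use strict;
use warnings;

# Made-up sample records standing in for the real file
my $data = "root:x:0:0:root:/root:/bin/bash\n"
         . "sync:x:4:65534:sync:/bin:/bin/sync\n";

# Opening a reference to a string gives an in-memory file handle;
# with a real file you would pass a path instead.
open(my $fh, '<', \$data) or die "Cannot open sample data: $!";

my $line_count = 0;
while (<$fh>) {          # scalar context: one line per iteration, stored in $_
    $line_count++;
}
close($fh);

open($fh, '<', \$data) or die "Cannot open sample data: $!";
my @all_lines = <$fh>;   # list context: every remaining line at once
close($fh);

print "while loop saw $line_count lines, list read got ", scalar(@all_lines), "\n";
```

The while loop form reads one record at a time, which keeps memory use flat for large files; the list form is what a grep or map pipeline consumes.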

Slice out the Shell

For each cycle of this loop, we process the line saved in the scalar $_:

  1. chomp off the newline character from the line
  2. split the line into an anonymous list
  3. list slice off the 7th item (indexed by 6) and save it as $shell.
  4. conditionally increment the frequency count for that shell by 1.
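The four steps above can be sketched as follows. This is a minimal version under stated assumptions, not the original listing: the records are made up, the last one has an empty shell field, and the data is read from an in-memory file handle instead of a real passwd file.

```perl
use strict;
use warnings;

# Made-up sample records; the last one has an empty shell field
my $data = join '', map { "$_\n" } (
    'root:x:0:0:root:/root:/bin/bash',
    'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin',
    'bin:x:2:2:bin:/bin:/usr/sbin/nologin',
    'nobody:x:65534:65534:nobody:/nonexistent:',
);

open(my $fh, '<', \$data) or die "Cannot open sample data: $!";

my %counts;
while (<$fh>) {
    chomp;                                  # 1. strip the trailing newline from $_
    my $shell = (split /:/)[6];             # 2-3. split $_ on colons, keep the 7th field
    $counts{$shell}++ if defined $shell;    # 4. count it, skipping missing shells
}
close($fh);

print "$_ => $counts{$_}\n" foreach sort keys %counts;
```

Because split drops trailing empty fields, the record that ends in a colon yields only six fields, so the slice returns undef and the defined test skips it.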

Step four packs at least three operations into a single line:

$counts{$shell}++ if defined $shell

This single line will do the following:

  1. Test if $shell is valid and not an undefined value
  2. Fetch the hash data indexed by $shell, returning its value if it is already set.
  3. Increment the existing hash data value by 1

Perl is kind enough to use an intelligent default of 0 if we have not initialized the value. This saves the branch logic otherwise needed to check whether the key exists and initialize it to a default, as one typically has to do with a plain hash in Ruby, a plain dict in Python, and in several other languages.
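To make the default concrete, here is a small sketch that counts the same made-up list twice: once with an explicit existence check, and once relying on the increment operator treating a missing value as 0. The variable names are only for illustration.

```perl
use strict;
use warnings;

my @shells = ('/bin/bash', '/bin/false', '/bin/bash');

# Explicit version: check for the key and initialize it before counting
my %explicit;
foreach my $shell (@shells) {
    $explicit{$shell} = 0 unless exists $explicit{$shell};
    $explicit{$shell} = $explicit{$shell} + 1;
}

# Idiomatic version: ++ treats the missing (undef) value as 0
my %implicit;
$implicit{$_}++ foreach @shells;

print "$_: explicit=$explicit{$_} implicit=$implicit{$_}\n" foreach sort keys %implicit;
```

Both hashes end up with the same counts, and the increment on a missing key does not trigger an "uninitialized value" warning even with warnings enabled.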

Solution 2: Regex Filter

This is similar to the solution before, except that we first check whether we have a valid string, using a regex (regular expression) to filter out invalid lines. The default scratch variable $_ is used here to represent the line.

Regex to filter out invalid lines

Because we filter out invalid lines that do not have a shell specified, we can just slice out the shell key and both create the key-value pair as well as increment it in one line, essentially doing at least four operations in one line:

$counts{(split /:/)[6]}++ if ($_ !~ /:$/);

This would be the same as this:

if ($line !~ /:$/) {
    my $shell = (split(/:/, $line))[6];
    $counts{$shell}++;
}
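Here is a short sketch of the filter in action, assuming three made-up records held in an array rather than read from a file. The second record has no shell and therefore ends with a colon.

```perl
use strict;
use warnings;

# Made-up records: the second has no shell, so it ends with a colon
my @lines = (
    'root:x:0:0:root:/root:/bin/bash',
    'nobody:x:65534:65534:nobody:/nonexistent:',
    'sync:x:4:65534:sync:/bin:/bin/sync',
);

my %counts;
foreach (@lines) {
    # !~ is true when $_ does NOT match; /:$/ matches a colon at the end of the line
    $counts{(split /:/)[6]}++ if ($_ !~ /:$/);
}

print join(', ', map { "$_=$counts{$_}" } sort keys %counts), "\n";
```

Only the two records with a shell are counted, so no undefined key ever reaches the hash.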

Solution 3: Functional Style

Instead of using a conditional loop, this solution uses a function pipeline built from grep and map.

Grep and Map

This will run the following pipeline logic:

lines_from_file | filter_valid_lines | extract_shell | build_hash

Alright, so this is something truly magical, and may need some explaining. We’ll dissect each link of this pipeline, as if it were written like this:

my @valid_lines = grep { $_ !~ /:$/ } (<PASSWD_FILE>);
my @shell_list = map { chomp; (split /:/)[6] } @valid_lines;
%counts = map { $_ => ++$counts{$_} } @shell_list;

The first link, reading the pipeline from back to front, filters out bad lines, that is, ones that have no 7th column and so end with a colon character:

@valid_lines = grep { $_ !~ /:$/ } (<PASSWD_FILE>);

The next pipeline link transforms the incoming items to just the shell column:

@shell_list = map { chomp; (split /:/)[6] } @valid_lines;

Perl's split uses the default $_ fed into this code block, so this could be rewritten more explicitly as:

@shell_list = map { chomp; (split(/:/,$_))[6] } @valid_lines;
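The first two links can be tried on their own. The sketch below assumes a made-up array of records (with trailing newlines, as they would arrive from a file handle) in place of the real file.

```perl
use strict;
use warnings;

# Made-up records with trailing newlines, as they would arrive from a file handle
my @lines = (
    "root:x:0:0:root:/root:/bin/bash\n",
    "nobody:x:65534:65534:nobody:/nonexistent:\n",
    "sync:x:4:65534:sync:/bin:/bin/sync\n",
);

# grep keeps only the elements for which the block is true
my @valid_lines = grep { $_ !~ /:$/ } @lines;

# map returns the block's last value for each element
my @shell_list = map { chomp; (split /:/)[6] } @valid_lines;

print scalar(@valid_lines), " valid lines, shells: @shell_list\n";
```

Note that /:$/ still matches the record without a shell even though the newline has not been chomped yet, because $ also matches just before a trailing newline.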

In the last link, the block returns a key-value pair for each element, instead of a regular single element.

The map itself still returns a flat list. Because that list is assigned to a hash, Perl reads it as alternating keys and values and builds the hash from them. In other languages, you would need to explicitly convert the result to a hash (or dictionary).

Thus we are essentially doing this:

my %hash = map { $key => $value } @list;

Here are some examples of using this method:

my %squares = map { $_ => $_**2 } @numbers;
my %capitals = map { $_ => uc $_ } @words;
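Filled in with some made-up input lists, these examples behave as sketched below. The extra @pairs array is only there to show the flat list that map returns before it is turned into a hash.

```perl
use strict;
use warnings;

my @numbers = (1, 2, 3);
my @words   = ('perl', 'ruby');

my %squares  = map { $_ => $_**2 } @numbers;   # (1 => 1, 2 => 4, 3 => 9)
my %capitals = map { $_ => uc $_ } @words;     # (perl => 'PERL', ruby => 'RUBY')

# The same map assigned to an array is just a flat list of pairs
my @pairs = map { $_ => $_**2 } @numbers;      # (1, 1, 2, 4, 3, 9)

print "3 squared is $squares{3}; perl becomes $capitals{perl}; flat list: @pairs\n";
```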

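So in our example, we produce key-value pairs and send them to our hash:

%counts = map { $_ => ++$counts{$_} } @shell_list;

But wait, there’s something else interesting happening here. Can you catch it?

Here, map is modifying %counts while it is still producing its list: each pass increments the shell's count in place and emits the shell together with its running total. The assignment then overwrites %counts with that list, and for a repeated key the last pair wins, leaving the final total. The pre-increment (++ before the variable) matters: a post-increment would emit the value from before the increment, so every count would come out one too low.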
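To see the running totals, the sketch below captures the flat list that map emits before it is assigned to a hash, using a made-up shell list. For comparison it also runs a post-increment variant against a separate hash (%tally, a name introduced only for this sketch).

```perl
use strict;
use warnings;

my @shell_list = ('/bin/bash', '/bin/sync', '/bin/bash', '/bin/bash');

my %counts;
# Capture the flat list that map emits, to see the running counts
my @pairs = map { $_ => ++$counts{$_} } @shell_list;
# @pairs is now: /bin/bash 1 /bin/sync 1 /bin/bash 2 /bin/bash 3

# Assigning that list to a hash keeps the last value seen for each key
my %final = @pairs;

# Post-increment returns the value from before the increment,
# so the last pair for each key is one short of the true count
my %tally;
my %off_by_one = map { $_ => $tally{$_}++ } @shell_list;

print "$_ => $final{$_} (post-increment would give $off_by_one{$_})\n"
    foreach sort keys %final;
```

The pre-increment version yields the true totals, while the post-increment version reports each shell one short.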

The Conclusion

I wanted to give people a sense of the richness of Perl and show how it may compare to Ruby and Python for similar problems, as well as offer an introduction to Perl.

These are some general takeaways for the language and this tutorial:

  • Variables in Perl are scalars ($), arrays (@), or hashes (%), and an unset value will act as 0 or the empty string depending on context (in case that was not obvious).
  • A scalar variable can hold strings, integers, or floats depending on how it is used.
  • An implicit scratch scalar variable, $_, is used in looping constructs and is the default input for many functions and operators in Perl.
  • Formatted output with printf and the repetition operator x
  • Opening a file with open and handling errors with die
  • Diamond operator (<>) to read lines from a file handle
  • Splitting strings into lists with split
  • Slicing off list elements
  • Regex match operators with =~ and !~
  • filtering elements with grep
  • transforming elements with map
  • building hashes with map
