Ops Scripting w. Python: Frequency 2
Updated: 2020–05–24 for clarity
Previously, I presented the problem of how to count frequency using a frequency hash, or
dict in Python.
In this article, I will present solutions and discussion about Python language features.
These solutions will use a collection loop, for. I call it a collection loop because the loop construct iterates over a collection, which in our case is a collection of lines from the data file.
Solution 1: Basic Collection loop
We open the file and iterate line by line in this example. To keep things simple, we will not handle errors:
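Here is a minimal sketch of that loop. The sample data and the temporary file are assumptions of mine so that the example runs anywhere; in practice you would open your own data file instead.

```python
import os
import tempfile

# Hypothetical sample data in a passwd-style, colon-delimited format.
sample = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n"
)

# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

lines_seen = []
with open(path) as datafile:
    # A file object is a collection of lines, so `for` can iterate over it.
    for line in datafile:
        lines_seen.append(line)

os.remove(path)  # clean up the temporary file
print(len(lines_seen))  # 2
```

Note that each line still carries its trailing newline at this point.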
For every line, we only care about the shell, without a newline character polluting our string. So we do a few operations: strip off the newline, split the string into a list, and index into that list. This can be broken up into these steps:
line = line.rstrip() # strip newline
line_items = line.split(':') # split up line by ':' divider
shell = line_items[6] # take the 7th item (index 6)
This can all be done in a single line.
shell = line.rstrip().split(':')[6]
Now that we have the shell, we need to check whether we actually got one. Sometimes, though rarely, there may not be a shell defined for that user, in which case the field is an empty string.
if shell:
    pass # do stuff with that shell as a key
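To see the check in action, here is a small sketch. The two passwd-style lines are made up for illustration; the second one has an empty shell field.

```python
# Hypothetical passwd-style lines; the second has no shell defined.
lines = [
    "alice:x:1000:1000:Alice:/home/alice:/bin/zsh\n",
    "svc:x:999:999:Service account:/var/empty:\n",
]

shells = []
for line in lines:
    # Strip the newline, split on ':', and take the 7th field (index 6).
    shell = line.rstrip().split(':')[6]
    # An empty string is falsy, so users without a shell are skipped.
    if shell:
        shells.append(shell)

print(shells)  # ['/bin/zsh']
```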
Each item in the counts dictionary will have a key that represents the shell, and a value that represents the frequency of that shell in our data file.
We simply need to increment the value. As Python does not initialize values when first used, we have to do this manually.
# initialize new key if key doesn't exist in dict
if shell not in counts:
    counts[shell] = 0
# increment the count
counts[shell] += 1
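Putting the pieces of Solution 1 together gives something like the sketch below. To keep it runnable on its own, I iterate over a list of made-up passwd-style lines rather than an open file; the loop body would be the same either way.

```python
# Hypothetical passwd-style lines standing in for the data file.
lines = [
    "root:x:0:0:root:/root:/bin/bash\n",
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n",
    "alice:x:1000:1000:Alice:/home/alice:/bin/bash\n",
    "svc:x:999:999:Service account:/var/empty:\n",  # no shell defined
]

counts = {}
for line in lines:
    shell = line.rstrip().split(':')[6]
    if shell:
        # initialize new key if key doesn't exist in dict
        if shell not in counts:
            counts[shell] = 0
        # increment the count
        counts[shell] += 1

print(counts)  # {'/bin/bash': 2, '/usr/sbin/nologin': 1}
```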
Solution 2: Dict get Method
Instead of conditionally setting the frequency count, we can use the get method that comes with the dict class. This returns the stored value if the key exists; otherwise it returns a default that we supply, which should be 0. Either way, we add one to the result and store it back to increase the count.
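A minimal sketch of this approach, again using made-up passwd-style lines in place of the data file:

```python
# Hypothetical passwd-style lines standing in for the data file.
lines = [
    "root:x:0:0:root:/root:/bin/bash\n",
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n",
    "alice:x:1000:1000:Alice:/home/alice:/bin/bash\n",
]

counts = {}
for line in lines:
    shell = line.rstrip().split(':')[6]
    if shell:
        # get() returns 0 when the key is missing, so no explicit
        # initialization step is needed before incrementing.
        counts[shell] = counts.get(shell, 0) + 1

print(counts)  # {'/bin/bash': 2, '/usr/sbin/nologin': 1}
```

Unlike indexing with counts[shell], calling get on a missing key does not raise a KeyError, and it does not add the key to the dict by itself; the assignment does that.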
Solution 3: DefaultDict
Another method is to auto-initialize every key that is referenced for the first time to 0, using a dict subclass called defaultdict from the collections module.
With this, Python now behaves like languages that initialize values on first use, but it is more powerful, as we can control the default with a custom lambda.
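Here is a short sketch showing both forms: defaultdict(int), where int() supplies 0, and an equivalent custom lambda. The sample lines are made up for illustration.

```python
from collections import defaultdict

# Hypothetical passwd-style lines standing in for the data file.
lines = [
    "root:x:0:0:root:/root:/bin/bash\n",
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n",
    "alice:x:1000:1000:Alice:/home/alice:/bin/bash\n",
]

counts = defaultdict(int)               # int() returns 0 for new keys
custom_counts = defaultdict(lambda: 0)  # same default via a custom lambda

for line in lines:
    shell = line.rstrip().split(':')[6]
    if shell:
        # Missing keys are created with the default on first access.
        counts[shell] += 1
        custom_counts[shell] += 1

print(dict(counts))  # {'/bin/bash': 2, '/usr/sbin/nologin': 1}
```

One thing to keep in mind: merely reading a missing key, such as counts['/bin/zsh'], also inserts it with the default value, so use the in operator if you only want to test for membership.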
From these solutions, you should have picked up the following takeaways for Python:
- Collection Loop (for)
- Splitting a String
- List Slicing (or indexing in this case)
- Testing whether a variable has a value
- Three ways to initialize a default value in the dict class
In the next article, I will show how to use lambda and dict comprehensions to solve the same problem.