Different Approaches to Filtering a CSV file

Using the best tool you have for the job

Courtney Zhan
Nov 24, 2024

At work, I was tasked with extracting a column of data from a CSV file, like the one below:

A sample enrolments table
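
The original table is an image; a hypothetical enrolments.csv with the same shape (the values below are made up for illustration) might contain:

CourseID,StudentID,EnrolledOn
1,1001,2024-01-15
1,1002,2024-01-16
2,1003,2024-01-16
3,1004,2024-01-17
2,1005,2024-01-18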

The purpose was to get the number of unique course IDs that were being offered. For the example above, there are three (1, 2 and 3).

I used a one-line command to accomplish the task:

% cut -f 1 -d "," enrolments.csv | tail -n +2 | sort | uniq | wc -l

The output is 3.

Script Design

  1. Remove the header line (CourseID …)
    This can be achieved using unix’s tail -n +2 <file>
  2. Extract the first column only
    Using unix command cut -f 1 -d ",". -f is the column number to keep (1 => first column), -d specifies the delimiter (in a CSV, a comma).
  3. Remove the duplicate rows
    In unix use sort followed by uniq. uniq only removes the neighbouring duplicates, so you must sort first.
  4. Count the result
    In unix use wc -l. -l means count the number of lines returned.

The unix pipes (|) pass the output of each command on to the next.
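
To make the pipeline concrete, here is a rough trace against the hypothetical enrolments.csv above (the output values are illustrative, not taken from the real file):

% cut -f 1 -d "," enrolments.csv
CourseID
1
1
2
3
2
% cut -f 1 -d "," enrolments.csv | tail -n +2
1
1
2
3
2
% cut -f 1 -d "," enrolments.csv | tail -n +2 | sort | uniq
1
2
3
% cut -f 1 -d "," enrolments.csv | tail -n +2 | sort | uniq | wc -l
3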

Alternative Approaches

A junior developer on our team had the same task, and they wrote a Python script to remove the duplicates and count the course IDs. This was not a bad solution; we went to the same university and I fully understand why they took this approach. And a senior engineer who saw my approach recommended that I also try out xcut (an enhanced cut)!
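
I don’t have their script, but a minimal Python sketch of the same idea (assuming the file and column names used above) might look like this:

import csv

# Collect the unique course IDs from the CourseID column,
# letting csv.DictReader skip the header row for us.
with open("enrolments.csv", newline="") as f:
    course_ids = {row["CourseID"] for row in csv.DictReader(f)}

print(len(course_ids))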

My father told me a story about a .NET project he worked on. The tech lead, a senior .NET contractor, proposed a solution for a task that could have been completed with a simple one-line shell command. Instead, the suggestion was to develop a C# application that would let users select an input CSV file, process it, and save the output to a destination file!

This story tells us that software engineers, like professionals in many other fields, should have a variety of tools in their toolbox, and use the best one for the job.
