Over a million developers have joined DZone.
Platinum Partner

csvfilter - A Python Command-line Tool for Manipulating CSV Data

· Web Dev Zone

The Web Dev Zone is brought to you in partnership with Mendix. Download this Forrester Report to gain a better understanding of the low-code platform market and how to make a strategic platform selection you won’t regret.

Problem

You want a unix-like tool for manipulating CSV data from the command-line.

The standard tools cut and awk aren't always suitable as they don't handle quoting and escaping which are common in CSVs.

Solution

Use the CSV manipulation function csvfilter, a simple Python library I've put together.

Install with:

pip install csvfilter

Sample usage:

# Pluck columns 2, 5 and 6
cat in.csv | csvfilter -f 2,5,6 > out.csv

# Pluck all columns except 4
cat in.csv | csvfilter -f 4 -i > out.csv

# Skip header row
cat in.csv | csvfilter -s 1 > out.csv

# Work with pipe-separated data
cat in.csv | csvfilter -s 1,3 --delimiter="|" > out.csv

The above examples show csvfilter processing sys.STDIN but it can also act directly on a file:

csvfilter -f 2,5,6 in.csv  > out.csv

Help:

$ csvfilter --help
Usage: csvfilter [options]

Options:
-h, --help            show this help message and exit
-f FIELDS, --fields=FIELDS
                        Specify which fields to pluck
-s SKIP, --skip=SKIP  Number of rows to skip
-d DELIMITER, --delimiter=DELIMITER
                        Delimiter of incoming CSV data
-i, --inverse         Invert the filter - ie drop the selected fields
--out-delimiter=OUT_DELIMITER
                        Delimiter to use for output
--out-quotechar=OUT_QUOTECHAR
                        Quote character to use for output

There is also a simple python API that allows you to add validators to determine which rows are filtered out:

from csvfilter import Processor

def contains_cheese(row):
    return 'cheese' in row

processor = Processor(fields=[1,2,3])
processor.add_validator(contains_cheese)
generator = processor.process(sys.stdin)

for cheesy_row in generator:
    do_something(cheesy_row)

Discussion

It's possible to do basic CSV manipulation from the command-line using cut or awk - for example:

cat in.csv | cut -d "," -f 0,1,2 > out.csv

or :

cat in.csv | awk 'BEGIN {FS=","} {print $1,$2,$3}' > out.csv

However neither cut or awk make it easy to handle CSVs with escaped characters - hence the motivation for this tool.

I'm not the first to write such a utility - there are several others out there (although none had quite the API I was looking for):

Source available on Github.

 

The Web Dev Zone is brought to you in partnership with Mendix. Better understand the aPaaS landscape and how the right platform can accelerate your software delivery cadence and capacity with the Gartner 2015 Magic Quadrant for Enterprise Application Platform as a Service.

Topics:

Published at DZone with permission of David Winterbottom , DZone MVB .

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}