sequential.dev

Using the IFS variable to do things with CSV files

February 05, 2020

The Problem

Earlier this week, I had to do a mass-insert into a database table from a CSV file.

The CSV file had ~20 columns and ~500k rows. I only really cared about 4 of the columns.

Visually, the problem can represented accurately as the following:

A diagram showing a CSV file and a database with witty labels

Previously I had seen this problem solved with massive amounts of over-engineering, and sometimes with clever usage of Microsoft Excel. However, in my never-ending quest to use the tools available to me more effectively. I challenged my self to see if I could generate the inserts I needed without leaving my terminal window.

First Steps

Some Example Data

For this post I am going to use the heart.csv data set sourced from Kaggle

It has the following (14) columns:

Enter bash and IFS

Let’s read this using a small script that leverages IFS:

We can use this script by running the following:

Which should yield:

Let’s further assume that we have a database schema DATA that contains a table called HEART in it.

The table was created with the following SQL statement:

Generating SQL Inserts

Let’s modify our pre-existing script to create a bunch of insert statements from our heart.csv file for this table:

We can now run this with a second parameter (the output file name):

./<SCRIPT_NAME> <IN_FILE_NAME> <OUT_FILE_NAME>

Which gives us something like this:

And you’re done! (kind of, you’d still have to actually run the INSERT script)

Caveats

This post doesn’t cover how to deal with internal commas and special character escaping weirdness. But it is a good starting point for a bare-bones CSV parser.

I might follow this up with another small post about using this script to easily pull out and aggregate different columns.

One of the things I’ve learned from using this approach to deal with CSV files is that the simplest tool is often the best and can yield great results without much overhead.

If you enjoyed this post or have any thoughts, feel free to comment or find me on Twitter.

Thanks for reading!

References

IFS Usage - https://bash.cyberciti.biz/guide/$IFS


A collection of thoughts from Shivan Moodley regarding software engineering, startups and products. Find more thoughts on Twitter.

© 2021