Using the IFS variable to do things with CSV filesprogrammingbashold
The Problem #
Earlier this week, I had to do a mass-insert into a database table from a CSV file.
The CSV file had ~20 columns and ~500k rows. I only really cared about 4 of the columns.
Visually, the problem can represented accurately as the following:
Previously I had seen this problem solved with massive amounts of over-engineering, and sometimes with clever usage of Microsoft Excel. However, in my never-ending quest to use the tools available to me more effectively. I challenged my self to see if I could generate the inserts I needed without leaving my terminal window.
First Steps #
Some Example Data #
For this post I am going to use the heart.csv data set sourced from Kaggle
It has the following (14) columns:
Enter bash and IFS #
Let's read this using a small script that leverages
We can use this script by running the following:
Which should yield:
Let's further assume that we have a database schema DATA that contains a table called HEART in it.
The table was created with the following SQL statement:
Generating SQL Inserts #
Let's modify our pre-existing script to create a bunch of insert statements from our
heart.csv file for this table:
We can now run this with a second parameter (the output file name):
./<SCRIPT_NAME> <IN_FILE_NAME> <OUT_FILE_NAME>
Which gives us something like this:
And you're done! (kind of, you'd still have to actually run the INSERT script)
This post doesn't cover how to deal with internal commas and special character escaping weirdness. But it is a good starting point for a bare-bones CSV parser.
I might follow this up with another small post about using this script to easily pull out and aggregate different columns.
One of the things I've learned from using this approach to deal with CSV files is that the simplest tool is often the best and can yield great results without much overhead.
If you enjoyed this post or have any thoughts, feel free to comment or find me on Twitter.
Thanks for reading!
IFS Usage - https://bash.cyberciti.biz/guide/$IFS