Using the IFS variable to do things with CSV files
February 05, 2020
Earlier this week, I had to do a mass-insert into a database table from a CSV file.
The CSV file had ~20 columns and ~500k rows. I only really cared about 4 of the columns.
Visually, the problem can be represented as taking a very wide CSV and keeping only the handful of columns you actually care about.
Previously, I had seen this problem solved with massive amounts of over-engineering, and sometimes with clever usage of Microsoft Excel. However, in my never-ending quest to use the tools available to me more effectively, I challenged myself to see if I could generate the inserts I needed without leaving my terminal window.
Some Example Data
For this post I'll use a sample file, heart.csv, which has the following 14 columns:
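Assuming the widely used heart-disease dataset from the UCI repository (an assumption on my part; your file's header may differ), the header row would be:

```
age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
```

Columns like age, chol (cholesterol), and thalach (max heart rate) are the kind of fields you'd want to pull out.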
Enter bash and IFS
Let’s read this using a small script that leverages IFS:
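A minimal sketch of such a script. The column names assume the 14-column heart.csv above, and the two sample rows are written inline with made-up values purely so the sketch runs standalone; a real run would just read your actual heart.csv.

```shell
#!/usr/bin/env bash
# Sketch: read a CSV with IFS. Column names assume the 14-column
# heart.csv; the sample rows below are illustrative only.

cat > heart.csv <<'EOF'
age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
EOF

IN_FILE="${1:-heart.csv}"

# Skip the header row, then let IFS=',' split each line into one
# shell variable per column.
tail -n +2 "$IN_FILE" | while IFS=',' read -r age sex cp trestbps chol fbs \
    restecg thalach exang oldpeak slope ca thal target; do
  echo "age=$age sex=$sex chol=$chol thalach=$thalach"
done
```

Setting IFS only for the `read` builtin (rather than globally) keeps the rest of the script's word splitting untouched.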
We can use this script by running the following:
Which should yield:
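Assuming the 14-column heart.csv and a script that echoes a few of the parsed fields per row, the output would look something like this (values are illustrative, not from a real run):

```
age=63 sex=1 chol=233 thalach=150
age=37 sex=1 chol=250 thalach=187
```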
Let’s further assume that we have a database schema DATA that contains a table called HEART.
The table was created with the following SQL statement:
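The original statement isn't reproduced here, but a plausible version, assuming the 14 heart.csv columns (the types are my guesses), might look like:

```sql
-- Hypothetical schema; names and types assume the 14-column heart.csv
CREATE TABLE DATA.HEART (
    age      INT,
    sex      INT,
    cp       INT,
    trestbps INT,
    chol     INT,
    fbs      INT,
    restecg  INT,
    thalach  INT,
    exang    INT,
    oldpeak  DECIMAL(4,1),
    slope    INT,
    ca       INT,
    thal     INT,
    target   INT
);
```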
Generating SQL Inserts
Let’s modify our pre-existing script to create a bunch of insert statements from our
heart.csv file for this table:
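A sketch of the modified script. As before, two illustrative rows are written inline so the sketch runs standalone; a real version would simply take the CSV path as $1 and the output path as $2.

```shell
#!/usr/bin/env bash
# Sketch: turn heart.csv rows into INSERT statements for DATA.HEART.
# The sample rows are illustrative, included only so this runs as-is.

cat > heart.csv <<'EOF'
age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
EOF

IN_FILE="${1:-heart.csv}"
OUT_FILE="${2:-inserts.sql}"

# Start with an empty output file.
: > "$OUT_FILE"

# Skip the header, split each row on commas, and emit one INSERT per row.
tail -n +2 "$IN_FILE" | while IFS=',' read -r age sex cp trestbps chol fbs \
    restecg thalach exang oldpeak slope ca thal target; do
  echo "INSERT INTO DATA.HEART VALUES ($age, $sex, $cp, $trestbps, $chol," \
       "$fbs, $restecg, $thalach, $exang, $oldpeak, $slope, $ca, $thal," \
       "$target);" >> "$OUT_FILE"
done
```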
We can now run this with a second parameter (the output file name):
./<SCRIPT_NAME> <IN_FILE_NAME> <OUT_FILE_NAME>
Which gives us something like this:
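With illustrative values, the generated file would contain statements of this shape, one per CSV row:

```sql
INSERT INTO DATA.HEART VALUES (63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1, 1);
INSERT INTO DATA.HEART VALUES (37, 1, 2, 130, 250, 0, 1, 187, 0, 3.5, 0, 0, 2, 1);
```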
And you’re done! (Kind of: you’d still have to actually run the INSERT script.)
This post doesn’t cover how to deal with embedded commas, quoted fields, or special-character escaping weirdness, but it is a good starting point for a bare-bones CSV parser.
I might follow this up with another small post about using this script to easily pull out and aggregate different columns.
One of the things I’ve learned from using this approach to deal with CSV files is that the simplest tool is often the best and can yield great results without much overhead.
If you enjoyed this post or have any thoughts, feel free to comment or find me on Twitter.
Thanks for reading!
IFS Usage - https://bash.cyberciti.biz/guide/$IFS