Home Posts Github

Importing CSV data as an Org Mode table for use in a Babel environment

The Premise

For a few years now I have used the Org Babel literate programming environment to record personal finances. At its base is a simple log of cashflows, as well as a record of what money we have, by asset class. From there I can run analyses and visualizations on the source and derived data. The “notebook” environment has been beneficial in allowing me to document the various derivative calcs, so as to understand them again in future.

The source data has been stored in Org tables, which are starting to become unwieldy. The table editor in Org Mode is cool, but it can feel very clunky to work with as the dataset increases; there is no filtering functionality, and the formulae are less understandable than a code block. There’s also the prudence in keeping source data in a more widely parseable format, and there is little better than CSV for this job. Getting an Org table out to a CSV file is as simple as putting point on the table and invoking M-x org-table-export. Org will write out to the file path you provide.

Now, reading a CSV file into a Babel environment is a trivial task in that you can open a Python code block and use the csv APIs. However, I want the source data imported into the environment as a data frame, without the overhead of pulling in pandas just to do this. When the data was stored within the Org file as a table, this was achieved by naming the table, which made it available as an input to subsequent code blocks via the :var header arg:

#+name: piggy-bank
|  Cash | Equities | Fixed Income |
|-------+----------+--------------|
| 10000 |    30000 |        20000 |
| 10000 |    33000 |        22000 |
| 11000 |    34000 |        23000 |

#+begin_src python :var piggy_bank=piggy-bank
  cash = piggy_bank[0] # read the data from the table as an array
#+end_src

So really, what I am after is for the data to be stored in a CSV file, pulled into the Org file as an Org table, then made available to any code blocks as needed. From the point of view of the derivative code blocks, there should be no idea that the source data has relocated. Time for some Elisp.

Reading the data back in

Emacs manages to provide an immensely powerful programming environment within itself, and then forces you to run round the houses to perform what seems the most menial task. I presumed, being a text editor, that Emacs would have a built-in function to read file contents into a value, but Emacs, being a text editor, doesn’t. Instead, you must insert the contents into a buffer, and then operate on or return the buffer’s value. It is very easy to forget that buffers are the stores of values in Emacs. Yes, you can have variables via setq, but quite often the environment will force you down the route of buffers.

Anyway, for this operation I don’t need a full-blown buffer, so we will insert the contents into a temp one:

(with-temp-buffer
    (insert-file-contents "piggy-bank.csv"))

Elisp will now see the contents of piggy-bank.csv as one big string, so to get it into an Org table, we’ll need to naively parse it. As I’m dealing with my own well-formatted numeric data, we can ignore quoted delimiters for this.

(defun parse-csv/naive (input)
  (mapcar
   (lambda (s) (string-split s ",")) ; iterate over each row string and create a list of cells
   (string-split input "\n"))) ; create a list of strings for each row

(with-temp-buffer
    (insert-file-contents "piggy-bank.csv")
    (parse-csv/naive (buffer-string)))

For the above table as CSV, Emacs would return the below

(("Cash" "Equities" "Fixed Income") ("10000" "30000" "20000") ("10000" "33000" "22000") ("11000" "34000" "23000"))

Formatting

Notice in the above list that all values are strings, which will mean needing to cast them inside any derived code blocks. It would be better to treat them as numbers before we output this list as an Org table. In Elisp we can do this using string-to-number, but we’ll need to skip the first element in the list, the header row. cdr is our friend here. Assuming the list above is called parsed-data:

(mapcar
 (lambda (x)
   (mapcar 'string-to-number x))
 (cdr parsed-data))

This will give us the list below, with the numeric values converted, and the “header” removed.

((10000 30000 20000) (10000 33000 22000) (11000 34000 23000))

We can then restore the headers where they belong by joining two lists together. One containing the car from the original parsed-data, the headers, followed by the numeric data.

(append
 (list (car parsed-data))
 (mapcar
  (lambda (x)
    (mapcar 'string-to-number x))
  (cdr parsed-data)))

(("Cash" "Equities" "Fixed Income") (10000 30000 20000) (10000 33000 22000) (11000 34000 23000))

We can now output the results of our code block to an Org table using the header args :results value table:

#+begin_src emacs-lisp :results value table
  (append
     (list (car parsed-data))
     (mapcar
      (lambda (x)
        (mapcar 'string-to-number x))
      (cdr parsed-data)))
#+end_src

#+RESULTS:
|  Cash | Equities | Fixed Income |
| 10000 |    30000 |        20000 |
| 10000 |    33000 |        22000 |
| 11000 |    34000 |        23000 |

From here the csv data are available for display within the target Org file, while also parsed as a multidimensional array for use as input to subsequent code blocks. This allows the use of appropriate tools for the task in hand: a widely-used text format, any arbitrary language for derived operations.