Publishing tabular data as blog post

CSV in many ways is for data what Markdown is for text documents: a very simple format that is both human- and machine-readable, and that – despite a number of shortcomings – is widely used. Given the popularity of Markdown for writing blog posts, using CSV to publish blog posts with tabular data should be an obvious thing to do, and we have just published our first blog post using CSV data. The blog post shows Table 3 from the DataCite Metadata Schema [@], describing the mandatory properties.

Periodic table of elements. From: Wikipedia

The DataCite blog uses the Jekyll static site generator, and all blog posts are written in Markdown format. All posts have their metadata in YAML format at the beginning of the file (separated by --- from the main text).

---
layout: post
title: Publishing tabular data as blog post
author: mfenner
tags:
 - csv
 - metadata
 - blog
---

Markdown is a nice format for writing text, but it doesn’t work so well for tabular data: with the current Markdown table implementations, all but the simplest tables are difficult for humans to read and edit. CSV is a much better fit for tabular data, and can be written either with a general text editor or with a spreadsheet program or other specialized tool.

To add the metadata required for every Jekyll blog post, we again add a YAML header to the file. The resulting file format is CSVY, about which we have talked before [@]. Jekyll can be extended to understand many file formats beyond Markdown. Because a CSVY converter for Jekyll didn’t exist yet, we have written one and released it as the jekyll-csvy Ruby gem [@], so that CSVY support can easily be added to any Jekyll-powered blog.
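To make the file layout concrete, here is a minimal Python sketch that splits a CSVY document into its header and its CSV rows. It is not the jekyll-csvy implementation (which is a Ruby gem); the function name `split_csvy`, the sample content, and the naive `key: value` header parsing are assumptions for illustration, and a real tool would use a proper YAML parser.

```python
import csv
import io

# Hypothetical CSVY content: a YAML header between '---' lines, then plain CSV.
SAMPLE = """---
layout: post
title: Example table
---
Name,Symbol
Hydrogen,H
Helium,He
"""


def split_csvy(text):
    """Split CSVY text into (header dict, list of CSV rows)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        # No YAML header: treat the whole text as plain CSV.
        return {}, list(csv.reader(io.StringIO(text)))

    # Find the closing '---' delimiter of the header.
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        raise ValueError("YAML header is not closed with '---'")

    header = {}
    for line in lines[1:end]:
        # Simplification: only flat 'key: value' pairs are understood here.
        if ":" in line:
            key, _, value = line.partition(":")
            header[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:])
    return header, list(csv.reader(io.StringIO(body)))


header, rows = split_csvy(SAMPLE)
print(header["title"])
print(rows[0])
```

The point of the sketch is that the CSV part stays untouched: everything after the closing `---` can be handed to any ordinary CSV reader.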

On the web, tabular data are typically displayed as HTML tables, and this is what the CSVY converter produces. This works well for tables that are not too wide, and the converter supports inline Markdown formatting (bold, italic, links, etc.) in table cells. Block formatting (e.g. lists) is on our list of future improvements, and we will polish the converter based on user feedback. We are of course also interested in embedding CSV tables within Markdown documents, as this is a common use case.
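The general idea of turning CSV into an HTML table can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how jekyll-csvy works internally: the helper names `inline_markdown` and `csv_to_html_table` are hypothetical, the first CSV row is assumed to be the table header, and only bold and italic are handled rather than full inline Markdown.

```python
import csv
import html
import io
import re


def inline_markdown(cell):
    """Escape HTML, then convert a small subset of inline Markdown."""
    text = html.escape(cell)
    # Bold must be handled before italic, since '**' also contains '*'.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


def csv_to_html_table(csv_text):
    """Render CSV text as an HTML table; the first row becomes the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return "<table></table>"

    head, body = rows[0], rows[1:]
    parts = ["<table>", "  <thead>"]
    header_cells = "".join(f"<th>{inline_markdown(c)}</th>" for c in head)
    parts.append(f"    <tr>{header_cells}</tr>")
    parts += ["  </thead>", "  <tbody>"]
    for row in body:
        cells = "".join(f"<td>{inline_markdown(c)}</td>" for c in row)
        parts.append(f"    <tr>{cells}</tr>")
    parts += ["  </tbody>", "</table>"]
    return "\n".join(parts)


table = csv_to_html_table("Name,Symbol\n**Hydrogen**,H\n*Helium*,He\n")
print(table)
```

Escaping each cell before applying the Markdown substitutions keeps stray `<` or `&` characters in the data from breaking the generated table.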

One important feature of using CSVY for blog posts is that the CSV remains available, and can be ingested and processed by tools that can read CSVY, e.g. using the R rio [@] package.


Martin Fenner
Technical Director at DataCite