April 21, 2024 - The source code for this blog post contains a dataset about the planets and generates this HTML file as well as a CSV, a TSV, and a JSON file. It demonstrates Scroll Datasets.
Scroll Datasets are normal plain text blog posts written in Scroll that also contain structured data and output that data into formats ready for data visualization and analysis tools.
Scroll Datasets are line-oriented but represent one or more tables. You might call them deconstructed CSVs or deconstructed spreadsheets.
This dataset has 2 measures (columns) and 2 concepts (rows).
Documentation, column definitions, rows and *any notes/markup/content* can go in the same file.
# Measures (aka Header, aka Columns, aka Schema)
id: string
moons: int
# Concepts (aka Rows)
::
id: mars
moons: 2
I verified moon count with Google. - BY
::
id: jupiter
moons: 63
The moons of Jupiter have their own Wikipedia Page
https://en.wikipedia.org/wiki/Moons_of_Jupiter moons of Jupiter
::
writeDataset demo.csv
The writeDataset demo.csv line above generates a file, demo.csv, that contains this:
id,moons
mars,2
jupiter,63
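To make the flat-file-to-CSV step concrete, here is a minimal Python sketch. It is not how Scroll itself implements writeDataset; it is a guess at the parsing rules based only on the example above. It assumes `name: type` lines before the first `::` define measures, that `::` delimits concepts, and that any other line is a note. The function names are hypothetical.

```python
import csv
import io

SAMPLE = """\
# Measures (aka Header, aka Columns, aka Schema)
id: string
moons: int
# Concepts (aka Rows)
::
id: mars
moons: 2
I verified moon count with Google. - BY
::
id: jupiter
moons: 63
The moons of Jupiter have their own Wikipedia Page
::
"""


def parse_flat_dataset(text):
    """Return (measures, concepts) from the flat format (assumed rules, see lead-in)."""
    measures = {}   # measure name -> declared type, in declaration order
    concepts = []   # one dict of measurements per concept
    current = None
    seen_delimiter = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "::":
            seen_delimiter = True
            if current:
                concepts.append(current)
            current = {}
            continue
        name, sep, value = stripped.partition(": ")
        if not sep or stripped.startswith("#") or " " in name:
            continue  # heading, note, or other free-form content
        if not seen_delimiter:
            measures[name] = value
        elif name in measures:
            current[name] = int(value) if measures[name] == "int" else value
    if current:
        concepts.append(current)
    return measures, concepts


def to_csv(measures, concepts):
    """Serialize concepts as CSV, with measures as the header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(measures), lineterminator="\n")
    writer.writeheader()
    writer.writerows(concepts)
    return out.getvalue()


measures, concepts = parse_flat_dataset(SAMPLE)
print(to_csv(measures, concepts), end="")
```

Notes and links are simply skipped here, which matches the idea that documentation and data can share a file without the documentation leaking into the CSV.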
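Concepts are delimited by a line containing only ::
Measures are declared with a name and a type and are written like: appeared: int
A measurement for a concept is written like: appeared: 2024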
Almost certainly. Using Scroll for datasets will be much slower and worse than future spreadsheet apps with carefully crafted LLM integrations.
However, it's important to also have simple, lower tech, timeless tools and Scroll Datasets is one of those.
Yes! You can easily achieve the same thing as LLMs & Scroll Datasets using LLMs & YAML, or LLMs & YAML & Markdown.
For YAML, just put your documentation and schema in YAML comments up top and then have a tiny script to read that YAML and dump CSV/TSV/JSON or whatever. YAML gives you loads of data structures to use and is widely supported in many languages. But generating HTML from the same file would require more work.
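The "tiny script" for the YAML route might look something like the sketch below. It assumes a hypothetical planets.yaml whose top level is a list of mappings, with the documentation and schema kept in comments. `yaml.safe_load` comes from the third-party PyYAML package; everything else is standard library. The function names are my own.

```python
import csv
import io
import json

# planets.yaml (hypothetical layout):
#   # Documentation and schema live in comments up top.
#   # id: string, moons: int
#   - id: mars
#     moons: 2
#   - id: jupiter
#     moons: 63


def load_rows(path):
    """Read a YAML file whose top level is a list of mappings (assumed layout)."""
    import yaml  # third-party PyYAML; imported lazily so the helpers below work without it
    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f)


def to_delimited(rows, delimiter=","):
    """Serialize rows as CSV (default) or TSV (delimiter='\\t')."""
    columns = []
    for row in rows:  # collect column names in first-seen order
        for key in row:
            if key not in columns:
                columns.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def to_json(rows):
    """Serialize rows as pretty-printed JSON."""
    return json.dumps(rows, indent=2)
```

Usage would be along the lines of `rows = load_rows("planets.yaml")` followed by writing `to_delimited(rows)`, `to_delimited(rows, "\t")`, and `to_json(rows)` to disk.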
If you want to intermix markup content with your datasets, you can use Markdown to add the marked up content and then have code sections embedding the YAML and a tiny script to parse out those YAML blocks and write your data to disk.
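For the Markdown route, the extraction step could be sketched as below. This assumes the data lives in fenced code blocks tagged yaml (or yml) and uses a simple regular expression, not a full Markdown parser, so it will miss edge cases such as indented or tilde fences. The names are hypothetical, and parsing the extracted text again relies on the third-party PyYAML package.

```python
import re

FENCE = "`" * 3  # built programmatically so this example can sit inside a fenced block

# Matches a fence opened with "yaml" or "yml" and captures everything up to the closing fence.
YAML_BLOCK = re.compile(
    r"^" + FENCE + r"ya?ml[ \t]*\n(.*?)^" + FENCE + r"[ \t]*$",
    re.DOTALL | re.MULTILINE,
)

SAMPLE = "\n".join([
    "# Planets",
    "",
    "Some prose about Mars.",
    "",
    FENCE + "yaml",
    "id: mars",
    "moons: 2",
    FENCE,
    "",
    "More prose, this time about Jupiter.",
    "",
    FENCE + "yaml",
    "id: jupiter",
    "moons: 63",
    FENCE,
    "",
])


def extract_yaml_blocks(markdown):
    """Return the raw text of every yaml-tagged fenced block, in document order."""
    return [match.group(1) for match in YAML_BLOCK.finditer(markdown)]


def parse_blocks(markdown):
    """Parse each extracted block into Python data (requires PyYAML)."""
    import yaml  # third-party; imported lazily so extraction works without it
    return [yaml.safe_load(block) for block in extract_yaml_blocks(markdown)]


for block in extract_yaml_blocks(SAMPLE):
    print(block, end="")
```

From there, the parsed rows can be written to CSV, TSV, or JSON with the standard library.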
Either can do the job. I expect the Scroll design to end up being more ergonomic, but that might not be true or may be unimportant.
If you don't like Scroll's (evolving) version and want to switch, it will always be straightforward to automatically refactor to YAML.
This is a simple pattern to implement, so it has likely been done a few times before. Please let me know so I can include links to, and learn from, any prior art.
+ Planned.
LLM dataset generation is a major breakthrough in datasets. Scroll Datasets are, at best, a minor improvement. They are designed to work alongside LLMs to help solve the Dataset Needed problem.
Scroll Datasets evolved out of TrueBase. Scroll Datasets have eliminated the need for the TrueBase software (and existing TrueBase sites should be migrated to Scroll Datasets), but were informed by the TrueBase build experience.
Although Scroll Datasets are designed for a world with LLMs, the design is meant to be useful without them as well, and would also have been mildly useful 30 years ago.
import keyword). The normal way to implement this in Scroll would be something like:
measures
 id string
 moons int
concept
 id mars
 moons 2
concept
 id jupiter
 moons 63
The flat design was chosen for ergonomic reasons. Datasets seem like they might be useful enough to be worth breaking from Scroll convention a bit. Like all things in Scroll, Datasets are an experiment, and maybe this design will evolve.
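For comparison, here is a rough Python sketch of reading the conventional nested form. It assumes children are indented by one space under a measures or concept parent and that each child line is a name followed by a space and a value. Both are assumptions about the layout, not a description of Scroll's parser.

```python
NESTED = """\
measures
 id string
 moons int
concept
 id mars
 moons 2
concept
 id jupiter
 moons 63
"""


def parse_nested(text):
    """Return (measures, concepts) from the nested form (assumed one-space indent)."""
    measures, concepts = {}, []
    section = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line.startswith(" "):
            # An unindented line opens a new parent block.
            section = line.strip()
            if section == "concept":
                concepts.append({})
            continue
        name, _, value = line.strip().partition(" ")
        if section == "measures":
            measures[name] = value
        elif section == "concept":
            concepts[-1][name] = int(value) if measures.get(name) == "int" else value
    return measures, concepts


measures, concepts = parse_nested(NESTED)
print(measures)
print(concepts)
```

The nested form needs the parser to track which parent block it is in, while the flat form only needs to watch for the `::` delimiter. That is a small difference for a machine; the ergonomic argument is mostly about what people type.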
Below is the dataset embedded in this Scroll file.
id | title | diameter | surfaceGravity | yearsToOrbitSun | moons | aka |
---|---|---|---|---|---|---|
mars | Mars | 6794 | 4 | 1.881 | 2 | |
jupiter | Jupiter | 142984 | 25 | 11.86 | 63 | |
earth | Earth | 12756 | 10 | 1 | 1 | Pale Blue Dot |
mercury | Mercury | 4879 | 4 | 0.241 | 0 | |
saturn | Saturn | 120536 | 9 | 29.46 | 64 | |
uranus | Uranus | 51118 | 8 | 84.01 | 27 | |
venus | Venus | 12104 | 9 | 0.615 | 0 | |
neptune | Neptune | 49572 | 11 | 164.79 | 14 | |
What is the diameter of the planet?
What is the surface gravity of the planet?
How many Earth years does it take for the planet to orbit the Sun?
How many moons does the planet have?
What are the alternative names for the planet?
The moons of Jupiter have their own Wikipedia Page
Note: It was only during the 19th century that geologists realized Earth's age was at least many millions of years.