ScrollSets: source code for CSVs

More examples of ScrollSets from sets.scroll.pub.

April 21, 2024 - The source code for this blog post contains a ScrollSet about the planets and generates this HTML file as well as a CSV, a TSV, and a JSON file. This page demonstrates ScrollSets.

ScrollSets are useful for small single-day projects and for large multi-year projects with thousands of concepts, like PLDB (a Programming Language Database).

*

ScrollSets are normal plain text files written in Scroll that also contain measurements of concepts and output that data into formats ready for data visualization and analysis tools.

ScrollSets are line-oriented but represent one or more tables. You might call them deconstructed CSVs or deconstructed spreadsheets.

Quick Code Example:

    This ScrollSet has 2 measures (columns) and 2 concepts (rows). Documentation, column definitions, rows and *any notes/markup/content* can go in the same file.

    # Measures (aka Header, aka Columns, aka Schema)
    idParser
     // Every concept needs an "id".
     extends abstractIdParser
    moonsParser
     extends abstractIntegerParser

    # Concepts (aka Rows)
    id mars
    moons 2
    // I verified moon count with Google. - BY

    id jupiter
    moons 63
    // Note: the moons of Jupiter have their own Wikipedia Page https://en.wikipedia.org/wiki/Moons_of_Jupiter moons of Jupiter

    writeConcepts demo.csv

The code above generates an HTML page and demo.csv that contains this:

    id,moons
    mars,2
    jupiter,63
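The mapping from concept lines to CSV rows can be sketched in a few lines of Python. This is only an illustration of the idea, not how Scroll itself is implemented: the names `parse_concepts` and `to_csv` are hypothetical, and the sketch assumes that the measure names are already known and that every `id` line starts a new concept.

```python
import csv
import io

# Hypothetical sample mirroring the concept lines of the quick example.
SCROLLSET = """\
id mars
moons 2
// I verified moon count with Google. - BY

id jupiter
moons 63
// Note: the moons of Jupiter have their own Wikipedia Page

writeConcepts demo.csv
"""

# Assumption: in a real ScrollSet these come from the measure definitions.
MEASURES = ["id", "moons"]


def parse_concepts(text, measures):
    """Group 'measure value' lines into one dict per concept."""
    rows = []
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if name not in measures:
            continue  # notes, comments, blank lines, and commands are not data
        if name == "id":
            rows.append({})  # an id line starts a new concept (row)
        if rows:
            rows[-1][name] = value
    return rows


def to_csv(rows, measures):
    """Serialize the rows with one column per measure."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=measures, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


rows = parse_concepts(SCROLLSET, MEASURES)
print(to_csv(rows, MEASURES), end="")
```

Note that this naive version would misread a prose line that happens to begin with a measure name, so a real parser needs a stricter rule for telling measurements apart from notes.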

Overview:

How to use

FAQ

Isn't it a better idea to enhance existing spreadsheet GUIs with LLM generation capabilities?

Almost certainly. Using ScrollSets will be much slower and worse than future spreadsheet apps with carefully crafted LLM integrations.

However, it's important to also have simple, lower tech, timeless tools and ScrollSets is one of those.

Can't you do this same thing with YAML and/or Markdown?

Yes! You can easily achieve the same thing as LLMs & ScrollSets using LLMs & YAML, or LLMs & YAML & Markdown.

For YAML, just put your documentation and schema in YAML comments up top and then have a tiny script to read that YAML and dump CSV/TSV/JSON or whatever. YAML gives you loads of data structures to use and is widely supported in many languages. But generating HTML from the same file would require more work.

If you want to intermix markup content with your data, you can use Markdown to add the marked up content and then have code sections embedding the YAML and a tiny script to parse out those YAML blocks and write your data to disk.
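As a rough sketch of that "tiny script", the Python below pulls the `yaml` code sections out of a Markdown string and emits CSV. Everything here is an assumption made for illustration: the sample document, the function names, and in particular `parse_flat_mapping`, which is a stand-in that only understands flat `key: value` lines. A real script would hand each block to a proper YAML parser instead.

```python
import csv
import io
import re

FENCE = "`" * 3  # built programmatically so the sample can contain code fences

# Hypothetical Markdown document with data embedded in yaml code sections.
MARKDOWN = "\n".join([
    "# Planets",
    "",
    "Mars has two small moons.",
    "",
    FENCE + "yaml",
    "id: mars",
    "moons: 2",
    FENCE,
    "",
    "Jupiter has many more.",
    "",
    FENCE + "yaml",
    "id: jupiter",
    "moons: 63",
    FENCE,
])

BLOCK_RE = re.compile(FENCE + r"yaml\n(.*?)\n" + FENCE, re.DOTALL)


def parse_flat_mapping(block):
    """Stand-in for a YAML parser: handles only flat 'key: value' lines."""
    row = {}
    for line in block.splitlines():
        key, sep, value = line.partition(":")
        if sep and not line.lstrip().startswith("#"):
            row[key.strip()] = value.strip()
    return row


def markdown_to_csv(markdown, columns):
    """Turn every yaml block into one CSV row; prose is ignored."""
    rows = [parse_flat_mapping(block) for block in BLOCK_RE.findall(markdown)]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


print(markdown_to_csv(MARKDOWN, ["id", "moons"]), end="")
```

Writing the result to disk is then a single `open(...).write(...)` call on the returned string.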

So, why use Scroll for storing data instead of YAML?

Either can do the job. I expect the Scroll design to end up being more ergonomic, but that might not be true or may be unimportant.

If you don't like Scroll's (evolving) version and want to switch, it will always be straightforward to automatically refactor to YAML.

What other related work is out there?

This is a simple pattern to implement, so it has likely been done a few times before. Please let me know so I can include links to, and learn from, any prior art.

What are the advanced features?

Planned.

What is the origin of ScrollSets?

LLM dataset generation is a major breakthrough in datasets. ScrollSets are, at best, a minor improvement. They are designed to work alongside LLMs to help solve the Dataset Needed problem.

ScrollSets evolved out of TrueBase. ScrollSets have eliminated the need for the TrueBase software (and existing TrueBase sites should be migrated to ScrollSets), but were informed by the TrueBase build experience.

Although ScrollSets are designed for a world with LLMs, the design is meant to be useful without them as well, and would also have been mildly useful 30 years ago.

What were the design goals?

Why are measures and concepts root-level features and not indented?

The normal way to implement this in Scroll would be something like:

    measures
     id string
     moons int
    concept
     id mars
     moons 2
    concept
     id jupiter
     moons 63

The flat design was chosen for ergonomic reasons. ScrollSets seem like they might be useful enough to be worth breaking from Scroll convention a bit. Like all things in Scroll, ScrollSets are an experiment, and maybe this design will evolve.

Extended Example: a Planets ScrollSet

Below is the ScrollSet embedded in this Scroll file.

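| id | moons | wikipedia | aka | diameter | surfaceGravity | age | yearsToOrbitSun | hasLife |
|---|---|---|---|---|---|---|---|---|
| Mars | 2 | | | 6794 | 4 | | 1.881 | |
| Jupiter | 63 | | | 142984 | 25 | | 11.86 | |
| Earth | 1 | https://en.wikipedia.org/wiki/Earth | Pale Blue Dot | 12756 | 10 | 4500000000 | 1 | true |
| Mercury | 0 | | | 4879 | 4 | | 0.241 | |
| Saturn | 64 | | | 120536 | 9 | | 29.46 | |
| Uranus | 27 | | | 51118 | 8 | | 84.01 | |
| Venus | 0 | | | 12104 | 9 | | 0.615 | |
| Neptune | 14 | | | 49572 | 11 | | 164.79 | |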

Measurements of the measures

| Name | Values | Coverage | Question | Example | Type | Source | SortIndex | IsComputed | IsRequired |
|---|---|---|---|---|---|---|---|---|---|
| id | 8 | 100% | What is the ID of this concept? | Mars | string | | 1 | false | true |
| moons | 8 | 100% | How many moons does the planet have? | 2 | number | | 1.1 | false | true |
| wikipedia | 1 | 12% | URL to the Wikipedia page. | https://en.wikipedia.org/wiki/Earth | string | | 1.9 | false | |
| aka | 1 | 12% | What are the alternative names for the planet? | Pale Blue Dot | string | | 1.9 | false | |
| diameter | 8 | 100% | What is the diameter of the planet? | 6794 | number | | 1.9 | false | |
| surfaceGravity | 8 | 100% | What is the surface gravity of the planet? | 4 | number | | 1.9 | false | |
| age | 1 | 12% | How old is this planet? | 4500000000 | number | | 1.9 | false | |
| yearsToOrbitSun | 8 | 100% | How many Earth years does it take for the planet to orbit the Sun? | 1.881 | number | | 1.9 | false | |
| hasLife | 1 | 12% | Does this planet have life? | true | boolean | | 1.9 | false | |
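The Values, Coverage, and Example columns can be derived from the concepts alone. The Python below is a minimal sketch of one way to compute them, not the code Scroll uses: `measure_stats` is a hypothetical helper, only three measures are included for brevity, and the truncating percentage is an assumption chosen because the table reports 1 of 8 as 12% rather than 12.5% or 13%.

```python
# Concepts from the planets example, reduced to three measures for brevity.
CONCEPTS = [
    {"id": "Mars", "moons": "2"},
    {"id": "Jupiter", "moons": "63"},
    {"id": "Earth", "moons": "1", "aka": "Pale Blue Dot"},
    {"id": "Mercury", "moons": "0"},
    {"id": "Saturn", "moons": "64"},
    {"id": "Uranus", "moons": "27"},
    {"id": "Venus", "moons": "0"},
    {"id": "Neptune", "moons": "14"},
]


def measure_stats(concepts, measure):
    """Return (values, coverage, example) for one measure."""
    # A value of "0" counts as present; only missing or empty values do not.
    present = [c[measure] for c in concepts if c.get(measure, "") != ""]
    # Assumption: integer division, so 1 of 8 (12.5%) is reported as 12%.
    coverage = 100 * len(present) // len(concepts)
    example = present[0] if present else ""
    return len(present), f"{coverage}%", example


for name in ["id", "moons", "aka"]:
    print(name, *measure_stats(CONCEPTS, name))
```

Here the example is simply the first value found in file order, which happens to agree with the Example column for these three measures.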

Extended Measures Example

    idParser
     extends abstractIdParser
    diameterParser
     extends abstractIntegerMeasureParser
     description What is the diameter of the planet?
    surfaceGravityParser
     extends abstractIntegerMeasureParser
     description What is the surface gravity of the planet?
    yearsToOrbitSunParser
     extends abstractFloatMeasureParser
     description How many Earth years does it take for the planet to orbit the Sun?
    moonsParser
     extends abstractIntegerMeasureParser
     description How many moons does the planet have?
     boolean isMeasureRequired true
     float sortIndex 1.1
    akaParser
     extends abstractStringMeasureParser
     description What are the alternative names for the planet?
    ageParser
     extends abstractIntegerMeasureParser
     description How old is this planet?
    hasLifeParser
     extends abstractBooleanMeasureParser
     description Does this planet have life?
    wikipediaParser
     extends abstractUrlMeasureParser
     description URL to the Wikipedia page.
    // end measures

Extended Concepts Example

    id Mars
    moons 2
    // Til Mars has 2 moons!
    diameter 6794
    surfaceGravity 4
    yearsToOrbitSun 1.881

    id Jupiter
    moons 63
    // The moons of Jupiter have their own Wikipedia Page https://en.wikipedia.org/wiki/Moons_of_Jupiter moons of Jupiter
    diameter 142984
    surfaceGravity 25
    yearsToOrbitSun 11.86

    id Earth
    moons 1
    diameter 12756
    surfaceGravity 10
    yearsToOrbitSun 1
    aka Pale Blue Dot
    hasLife true
    wikipedia https://en.wikipedia.org/wiki/Earth
    age 4500000000
    // Note: It was only during the 19th century that geologists realized Earth's age was at least many millions of years.

    id Mercury
    moons 0
    diameter 4879
    surfaceGravity 4
    yearsToOrbitSun 0.241

    id Saturn
    moons 64
    diameter 120536
    surfaceGravity 9
    yearsToOrbitSun 29.46

    id Uranus
    moons 27
    diameter 51118
    surfaceGravity 8
    yearsToOrbitSun 84.01

    id Venus
    moons 0
    diameter 12104
    surfaceGravity 9
    yearsToOrbitSun 0.615

    id Neptune
    moons 14
    diameter 49572
    surfaceGravity 11
    yearsToOrbitSun 164.79
    // end concepts
