
Data Science with Scroll

by Breck Yunits

January 6, 2025

A Tutorial

This tutorial walks you through using Scroll for data analysis and visualization, from basic concepts to advanced techniques.

What makes Scroll great for data science?

Scroll combines the simplicity of markdown-style syntax with powerful data transformation and visualization capabilities.

Let's dive in!


Part 1: Getting Started with Data

Loading Sample Datasets

Scroll comes with several sample datasets. Let's start with the famous iris dataset:

iris printTable
sepal_length sepal_width petal_length petal_width species
6.1 3 4.9 1.8 virginica
5.6 2.7 4.2 1.3 versicolor
5.6 2.8 4.9 2 virginica
6.2 2.8 4.8 1.8 virginica
7.7 3.8 6.7 2.2 virginica
5.3 3.7 1.5 0.2 setosa
6.2 3.4 5.4 2.3 virginica
4.9 2.5 4.5 1.7 virginica
5.1 3.5 1.4 0.2 setosa
5 3.4 1.5 0.2 setosa

You can also load datasets from Vega's collection:

sampleData zipcodes.csv limit 0 5 printTable
[Figure: line chart titled "Maximum Temperature in Seattle", plotting temp_max against date]

Bar Charts

Let's create a bar chart showing average precipitation by weather type:

sampleData seattle-weather.csv groupBy weather reduce precipitation mean precip_avg barchart x weather y precip_avg fill teal title Average Precipitation by Weather Type

Part 3: Advanced Data Transformations

Grouping and Aggregation

Let's look at some more complex transformations:

sampleData weather.csv groupBy weather reduce temp_max mean avg_max_temp reduce temp_min mean avg_min_temp orderBy -avg_max_temp printTable
count avg_max_temp avg_min_temp
13 NaN NaN
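
For readers who think in general-purpose languages, the group, reduce, and order steps can be sketched in plain Python. This is only an illustration of the logic, not how Scroll implements it. The helper names `group_mean` and `order_desc` are hypothetical, the stand-in data is the ten-row iris sample printed earlier in this tutorial, and the sketch assumes the leading minus in `orderBy -avg_max_temp` requests descending order.

```python
from collections import defaultdict

# Illustrative sketch only: group rows by a key, average one column, sort descending.
# Stand-in data: (species, sepal_length) pairs from the iris sample shown earlier.
ROWS = [
    ("virginica", 6.1), ("versicolor", 5.6), ("virginica", 5.6),
    ("virginica", 6.2), ("virginica", 7.7), ("setosa", 5.3),
    ("virginica", 6.2), ("virginica", 4.9), ("setosa", 5.1),
    ("setosa", 5.0),
]

def group_mean(rows):
    """Collect values per group key, then reduce each group to its mean."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: sum(values) / len(values) for key, values in groups.items()}

def order_desc(means):
    """Return (key, mean) pairs sorted from largest mean to smallest."""
    return sorted(means.items(), key=lambda item: item[1], reverse=True)

ranked = order_desc(group_mean(ROWS))
for species, avg in ranked:
    print(species, round(avg, 2))
```

Keeping the grouping and the ordering as separate functions mirrors the way the Scroll commands are chained one after another.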

Creating New Columns

Let's add some computed columns:

iris compute ratio {sepal_length}/{sepal_width} where ratio > 2 printTable
sepal_length sepal_width petal_length petal_width species ratio
6.1 3 4.9 1.8 virginica 2.033333333333333
5.6 2.7 4.2 1.3 versicolor 2.074074074074074
6.2 2.8 4.8 1.8 virginica 2.2142857142857144
7.7 3.8 6.7 2.2 virginica 2.0263157894736845
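
The compute-then-filter step above can be sketched in plain Python as well. Again, this only illustrates the logic and is not how Scroll works internally. The function name `compute_ratio_and_filter` and the dictionary layout are assumptions, and the rows are the ten-row iris sample printed earlier in this tutorial.

```python
# Illustrative sketch only: mirrors "compute ratio ... where ratio > 2" in plain Python.
# Rows are (sepal_length, sepal_width, species) from the iris sample shown earlier.
SAMPLE = [
    (6.1, 3.0, "virginica"), (5.6, 2.7, "versicolor"), (5.6, 2.8, "virginica"),
    (6.2, 2.8, "virginica"), (7.7, 3.8, "virginica"), (5.3, 3.7, "setosa"),
    (6.2, 3.4, "virginica"), (4.9, 2.5, "virginica"), (5.1, 3.5, "setosa"),
    (5.0, 3.4, "setosa"),
]

def compute_ratio_and_filter(rows, threshold=2.0):
    """Add a ratio column (sepal_length / sepal_width), keep rows above threshold."""
    kept = []
    for sepal_length, sepal_width, species in rows:
        ratio = sepal_length / sepal_width
        if ratio > threshold:  # strict comparison, so a ratio of exactly 2 is dropped
            kept.append({
                "sepal_length": sepal_length,
                "sepal_width": sepal_width,
                "species": species,
                "ratio": ratio,
            })
    return kept

kept = compute_ratio_and_filter(SAMPLE)
for row in kept:
    print(row["species"], round(row["ratio"], 3))
```

Four of the ten rows pass the filter, matching the table above. The row with sepal_length 5.6 and sepal_width 2.8 has a ratio of exactly 2, so the strict comparison excludes it.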

Part 4: Advanced Visualizations

Heatmaps

Let's create a heatmap of mean precipitation per year:

sampleData seattle-weather.csv splitYear groupBy year reduce precipitation mean precipitation_mean select year precipitation_mean transpose heatrix
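
The data preparation behind this heatmap (extract the year, average per year, then transpose) can be sketched in plain Python. This is a rough illustration under stated assumptions, not Scroll's implementation: the helper names `yearly_mean` and `transpose` are hypothetical, dates are assumed to be ISO formatted, and the four rows are made-up values rather than data from seattle-weather.csv.

```python
from collections import defaultdict

# Hypothetical rows of (date, precipitation). The values are made up for
# illustration and are NOT taken from seattle-weather.csv.
ROWS = [
    ("2012-01-01", 1.0),
    ("2012-06-15", 3.0),
    ("2013-02-10", 4.0),
    ("2013-11-30", 6.0),
]

def yearly_mean(rows):
    """Split the year off each date, then average the values per year."""
    by_year = defaultdict(list)
    for date, value in rows:
        year = int(date[:4])  # assumes ISO dates (YYYY-MM-DD)
        by_year[year].append(value)
    return [(year, sum(vals) / len(vals)) for year, vals in sorted(by_year.items())]

def transpose(table):
    """Turn a list of rows into a list of columns."""
    return [list(column) for column in zip(*table)]

grid = transpose(yearly_mean(ROWS))
print(grid)  # one row of years, one row of yearly means
```

After the transpose, each year becomes a column, which is the wide layout a heatmap grid typically expects.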

Multiple Views

You can create multiple visualizations from the same dataset:

iris scatterplot x sepal_length y sepal_width fill species barchart x species y sepal_length fill teal title Sepal Length by Species

Conclusion

This tutorial covered the basics of data science with Scroll. Some key takeaways:

  • Scroll makes it easy to load and manipulate data
  • Visualizations are simple to create and customize
  • Complex transformations can be done with simple commands
  • Everything is readable and version-controllable