This tutorial walks you through using Scroll for data analysis and visualization, from basic concepts to advanced techniques.
Scroll combines the simplicity of markdown-style syntax with powerful data transformation and visualization capabilities.
Let's dive in!
Scroll comes with several sample datasets. Let's start with the famous iris dataset:
iris
printTable
sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|
6.1 | 3 | 4.9 | 1.8 | virginica |
5.6 | 2.7 | 4.2 | 1.3 | versicolor |
5.6 | 2.8 | 4.9 | 2 | virginica |
6.2 | 2.8 | 4.8 | 1.8 | virginica |
7.7 | 3.8 | 6.7 | 2.2 | virginica |
5.3 | 3.7 | 1.5 | 0.2 | setosa |
6.2 | 3.4 | 5.4 | 2.3 | virginica |
4.9 | 2.5 | 4.5 | 1.7 | virginica |
5.1 | 3.5 | 1.4 | 0.2 | setosa |
5 | 3.4 | 1.5 | 0.2 | setosa |
You can also load datasets from Vega's collection:
sampleData zipcodes.csv
limit 0 5
printTable
zip_code | latitude | longitude | city | state | county |
---|---|---|---|---|---|
501 | 40.922326 | -72.637078 | Holtsville | NY | Suffolk |
544 | 40.922326 | -72.637078 | Holtsville | NY | Suffolk |
601 | 18.165273 | -66.722583 | Adjuntas | PR | Adjuntas |
602 | 18.393103 | -67.180953 | Aguada | PR | Aguada |
603 | 18.455913 | -67.14578 | Aguadilla | PR | Aguadilla |
Let's explore some basic operations on the iris dataset:
iris
summarize
printTable
name | type | incompleteCount | uniqueCount | count | sum | median | mean | min | max | mode |
---|---|---|---|---|---|---|---|---|---|---|
sepal_length | number | 0 | 8 | 10 | 57.699999999999996 | 5.6 | 5.77 | 4.9 | 7.7 | 5.6 |
sepal_width | number | 0 | 8 | 10 | 31.599999999999998 | 3.2 | 3.1599999999999997 | 2.5 | 3.8 | 2.8 |
petal_length | number | 0 | 8 | 10 | 39.8 | 4.65 | 3.9799999999999995 | 1.4 | 6.7 | 4.9 |
petal_width | number | 0 | 7 | 10 | 13.699999999999996 | 1.75 | 1.3699999999999997 | 0.2 | 2.3 | 0.2 |
species | string | 0 | 3 | 10 | | | | | | virginica |
This gives us summary statistics for each column.
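To make the numbers in that table easier to check, here is a minimal Python sketch that recomputes several of the `sepal_length` statistics from the ten sample rows printed earlier. It uses only the standard library; the helper name `summarize_column` is hypothetical, and this is not a description of how Scroll computes these values internally.

```python
import statistics

# sepal_length values copied from the ten iris sample rows printed above.
sepal_length = [6.1, 5.6, 5.6, 6.2, 7.7, 5.3, 6.2, 4.9, 5.1, 5.0]


def summarize_column(values):
    """Recompute a few of the per-column statistics (hypothetical helper)."""
    return {
        "count": len(values),
        "uniqueCount": len(set(values)),
        "median": statistics.median(values),
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
        # With ties, statistics.mode (Python 3.8+) returns the first
        # most-common value it encounters.
        "mode": statistics.mode(values),
    }


summary = summarize_column(sepal_length)
print(summary)
```

The long decimals in the table (such as 57.699999999999996) are ordinary binary floating-point rounding, not errors in the data.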
Let's look at filtering:
iris
where species = setosa
printTable
sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|
5.3 | 3.7 | 1.5 | 0.2 | setosa |
5.1 | 3.5 | 1.4 | 0.2 | setosa |
5 | 3.4 | 1.5 | 0.2 | setosa |
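For comparison, the same row filter can be sketched in plain Python. This is only an illustration of what `where species = setosa` selects, using the ten sample rows shown earlier with two columns kept for brevity; it says nothing about how Scroll implements the filter.

```python
# Two columns from the ten iris sample rows printed earlier.
rows = [
    {"sepal_length": 6.1, "species": "virginica"},
    {"sepal_length": 5.6, "species": "versicolor"},
    {"sepal_length": 5.6, "species": "virginica"},
    {"sepal_length": 6.2, "species": "virginica"},
    {"sepal_length": 7.7, "species": "virginica"},
    {"sepal_length": 5.3, "species": "setosa"},
    {"sepal_length": 6.2, "species": "virginica"},
    {"sepal_length": 4.9, "species": "virginica"},
    {"sepal_length": 5.1, "species": "setosa"},
    {"sepal_length": 5.0, "species": "setosa"},
]

# Keep only the rows whose species column equals "setosa".
setosa = [row for row in rows if row["species"] == "setosa"]

for row in setosa:
    print(row)
```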
Let's start with a simple scatterplot of the iris data:
iris
scatterplot
x sepal_width
y sepal_length
title Sepal Length vs Width
fill species
Let's look at some time series data:
sampleData seattle-weather.csv
parseDate date
linechart
x date
y temp_max
title Maximum Temperature in Seattle
stroke steelblue
Let's create a bar chart showing precipitation:
sampleData seattle-weather.csv
groupBy weather
reduce precipitation mean precip_avg
barchart
x weather
y precip_avg
fill teal
title Average Precipitation by Weather Type
Let's look at some more complex transformations:
sampleData weather.csv
groupBy weather
reduce temp_max mean avg_max_temp
reduce temp_min mean avg_min_temp
orderBy -avg_max_temp
printTable
count | weather | avg_max_temp | avg_min_temp |
---|---|---|---|
129 | drizzle | 18.555813953488368 | 10.143410852713178 |
459 | rain | 15.535294117647041 | 9.04727668845315 |
1674 | sun | 18.064157706093184 | 8.87275985663083 |
78 | snow | 4.528205128205127 | -1.4346153846153844 |
582 | fog | 15.261855670103111 | 8.527319587628869 |
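The group-then-average pattern used here can be sketched in Python. Because the individual weather rows are not reproduced in this tutorial, the sketch applies the same steps (group, take the mean, sort descending) to the ten iris sample rows instead. The output column name `avg_sepal_length` is made up for this illustration, and the code is an analogy rather than Scroll's implementation.

```python
from collections import defaultdict
from statistics import fmean

# (species, sepal_length) pairs from the ten iris sample rows printed earlier.
rows = [
    ("virginica", 6.1), ("versicolor", 5.6), ("virginica", 5.6),
    ("virginica", 6.2), ("virginica", 7.7), ("setosa", 5.3),
    ("virginica", 6.2), ("virginica", 4.9), ("setosa", 5.1),
    ("setosa", 5.0),
]

# Step 1: group the values by the key column.
groups = defaultdict(list)
for species, sepal_length in rows:
    groups[species].append(sepal_length)

# Step 2: reduce each group to a row count and a mean.
result = [
    {"species": species, "count": len(values), "avg_sepal_length": fmean(values)}
    for species, values in groups.items()
]

# Step 3: sort by the mean, largest first.
result.sort(key=lambda r: r["avg_sepal_length"], reverse=True)

for r in result:
    print(r)
```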
Let's add some computed columns:
iris
compute ratio {sepal_length}/{sepal_width}
where ratio > 2
printTable
sepal_length | sepal_width | petal_length | petal_width | species | ratio |
---|---|---|---|---|---|
6.1 | 3 | 4.9 | 1.8 | virginica | 2.033333333333333 |
5.6 | 2.7 | 4.2 | 1.3 | versicolor | 2.074074074074074 |
6.2 | 2.8 | 4.8 | 1.8 | virginica | 2.2142857142857144 |
7.7 | 3.8 | 6.7 | 2.2 | virginica | 2.0263157894736845 |
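A short Python sketch of the same compute-then-filter step shows why only four of the ten sample rows survive. It is an illustration under the assumption that the comparison is a strict greater-than, which matches the output above. The row with 5.6 and 2.8 has a ratio of exactly 2, so it is excluded.

```python
# (sepal_length, sepal_width) pairs from the ten iris sample rows printed earlier.
rows = [
    (6.1, 3.0), (5.6, 2.7), (5.6, 2.8), (6.2, 2.8), (7.7, 3.8),
    (5.3, 3.7), (6.2, 3.4), (4.9, 2.5), (5.1, 3.5), (5.0, 3.4),
]

# Add the computed column, then keep rows where it is strictly greater than 2.
with_ratio = [
    {"sepal_length": length, "sepal_width": width, "ratio": length / width}
    for length, width in rows
]
kept = [row for row in with_ratio if row["ratio"] > 2]

for row in kept:
    print(row)
```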
Let's create a heatmap of mean precipitation per year:
sampleData seattle-weather.csv
splitYear
groupBy year
reduce precipitation mean precipitation_mean
select year precipitation_mean
transpose
heatrix
You can create multiple visualizations from the same dataset:
iris
scatterplot
x sepal_length
y sepal_width
fill species
barchart
x species
y sepal_length
fill teal
title Sepal Length by Species
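This tutorial covered the basics of data science with Scroll: loading sample datasets, summarizing and filtering rows, grouping and computing columns, and drawing scatterplots, line charts, bar charts, and heatmaps.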