Visualization
Tier: Intermediate
Commands covered: viz
Note
Per-command flag reference lives in /docs/help/viz.md. This page is the workflow layer — when to reach for each chart and how viz smart builds a dashboard for you.
viz turns a CSV/TSV straight into interactive Plotly charts — no notebook, no spreadsheet round-trip. Output is a self-contained HTML file (the Plotly runtime is embedded, so it works offline) written to --output or stdout. With a viz_static build you can export PNG / SVG / PDF / JPEG / WebP instead (needs a local Chrome/Firefox for Plotly's renderer). Pass --open to pop the result straight into your browser.
viz is behind the viz feature flag (✨) — it ships in the prebuilt qsv binaries but not in qsvlite or qsvdp. See Binary Variants.
A single page rendering every chart type below, generated from the sample datasets in examples/viz/:
- ▶ View the gallery (rendered): https://dathere.github.io/qsv/gallery.html
- Raw HTML (downloads / shows source): https://raw.githubusercontent.com/dathere/qsv/master/examples/viz/gallery.html
- The datasets + a copy-paste command per chart: examples/viz/README.md
The sunburst figures (the standalone sunburst and the --dictionary infer, sunburst dashboard) render their in-sector labels radially — running along each ring's spoke — a plotly.js 3.6 refinement that keeps deep-path labels legible instead of clipping them tangentially.
Note
The gallery is served from GitHub Pages, published from examples/viz/ by a workflow on every push to master. It renders with the correct text/html type, so the charts and the embedded smart-dashboard iframes load directly — no third-party proxy. The raw.githubusercontent.com link above serves .html as text/plain (shows source), so use it only to view or download the file.
| If you want to… | Use | Notes |
|---|---|---|
| A one-shot overview of an unfamiliar dataset | viz smart | Auto-picks a panel per column from the stats cache |
| Compare a measure across categories | viz bar | --agg sum\|mean\|count\|min\|max when x repeats |
| Show a trend over time / an ordered x | viz line | x is sorted numerically/temporally |
| Relate two numeric columns | viz scatter | optional --series to color by group |
| Relate three numeric columns | viz scatter3d | 3D scatter; --color/--size encodings |
| See the 2D density of two numerics | viz contour | binned density grid; --bins |
| See the distribution of one numeric column | viz histogram | --bins (default auto) |
| Summarize spread (quartiles, outliers) | viz box | optional grouping x column |
| Show parts of a whole | viz pie | --donut for a ring; --y value or counts |
| Show a nested part-to-whole as sized tiles | viz treemap | nested rectangles where area encodes size (best for size comparison); --cols dim1,dim2… (levels, outer first), optional --value/--agg |
| Show a nested hierarchy as rings | viz sunburst | concentric rings that emphasize parent-child structure (best for deeper paths); same inputs as treemap |
| Spot correlations across many numerics | viz heatmap | Pearson matrix (RdBu, −1…1); no --x/--y/--z |
| Cross-tabulate two categories by a measure | viz heatmap | give --x, --y and --z (pivot mode) |
| Plot OHLC price action | viz candlestick / viz ohlc | --ohlc-open --high --low --close |
| Show flows between nodes | viz sankey | --source --target [--value] |
| Compare entities across many axes | viz radar | --cols a,b,c… + optional --series |
| Plot lat/lon points on a map | viz map | tile basemap; --color/--size encodings or --density heatmap |
| Plot lat/lon on an offline projection map | viz geo | no tiles/token; --projection; same encodings as map |
smart profiles the dataset from qsv's stats cache (🪄) and lays out one panel per column in a grid:
- Correlation heatmap: a Pearson matrix when there are 2+ continuous numeric columns, plus a correlated-pair drill-down of the strongest pair beside it. The drill-down is a scatter or, for large datasets where a scatter would overplot, a 2D density contour (which embeds only a fixed bin grid). With 3+ numeric columns, a 3D scatter of the strongest-correlation triple is added as well.
- Time-series trend: a line of the first continuous numeric column over a detected date/datetime column.
- Part-to-whole hierarchy: when 2+ low-cardinality dimensions are present, a full-width treemap (shallow 2-level hierarchy, where area encodes size) or sunburst (deeper 3-level hierarchy, drawn as concentric rings) nesting them. `auto` follows best practice for the depth; override with `--hierarchy-style auto|treemap|sunburst`. A panel is only auto-built when the dimensions are actually associated (bias-corrected Cramér's V ≥ 0.10 across every pair). Nesting statistically independent categoricals just replicates each level's marginal at every branch and tells you nothing the separate frequency bars don't, so that case is skipped with a note on stderr. An explicit `--hierarchy-style treemap|sunburst` bypasses the screen and forces the panel.
- Geographic map: when a latitude/longitude column pair is detected (by header name + numeric type + plausible coordinate ranges), a map leads the dashboard on a full-width row: a Mapbox tile map for a local extent, or an offline `ScatterGeo` projection world-overview (no tiles or token) when the coordinates span a continental/global area. Points far from the cluster centroid (beyond the Tukey far-out fence of all point-to-centroid distances) are flagged as geographic outliers: they are drawn with a distinct marker and excluded from the core extent, so a few strays don't inflate the zoom. With the `geocode` feature, the core extent's bounding box is reverse-geocoded against the local Geonames index and drawn as a labeled box plus a location summary (e.g. "New York & New Jersey, United States"); when outliers exist, a second dotted box marks the full extent and the HTML map gains "Core extent" / "Full extent" buttons to jump between views. In static image export the map renders as an offline `ScatterGeo` projection fit to the data extent (US-spanning data uses `albers-usa`); only the Mapbox tile map and 3D scatter stay HTML-only.
- Box plots: for continuous numeric columns (quartiles drawn straight from the cache). Sample points are overlaid by a size heuristic: every point for small data (≤ 1,000 rows), Tukey outliers for medium (≤ 10,000), and none above that (the box stays a fast cache-only quartile summary). Pass `--box-points outliers|all|suspected|none` to force a mode (now accepted by `smart`, not just `box`).
- Frequency bars: for low-cardinality categoricals, booleans, and ratings. Each bar chart shows the top-N categories (`--limit`, default 10) plus two aggregate buckets that mirror `qsv frequency`: a `(NULL)` bar for empty cells and an `Other (N)` bar collecting the categories beyond the limit (N = how many distinct values were rolled up). Both are drawn in muted grey to read as summaries, set apart from the palette-colored real categories, and are shown by default; suppress them with `--no-nulls` / `--no-other`.
When one bucket dwarfs the rest (e.g. a dominant (NULL) or Other), the panel switches its y-axis to a logarithmic scale so the small categories stay readable. This is controlled by --log-scale auto|on|off (default auto, which kicks in only on high-dynamic-range panels; on forces it on every frequency panel, off keeps everything linear). A log panel carries two redundant cues so it's never mistaken for linear: a count (log) y-axis title, and diagonal hatching on the muted-grey (NULL)/Other bars.
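The bucketing described for frequency bars can be sketched in a few lines of Python. This is only an illustration of the top-N plus `(NULL)` plus `Other (N)` idea, not qsv's implementation: the function name `frequency_buckets`, the treatment of empty strings as nulls, and the order in which the two summary buckets are appended are all assumptions.

```python
from collections import Counter

def frequency_buckets(values, limit=10):
    """Return (label, count) rows: top-`limit` categories, then summary buckets.

    Hypothetical helper for illustration; None and "" are treated as empty cells.
    """
    nulls = sum(1 for v in values if v is None or v == "")
    counts = Counter(v for v in values if v is not None and v != "")
    ranked = counts.most_common()          # sorted by count, descending
    top, rest = ranked[:limit], ranked[limit:]
    rows = list(top)
    if rest:
        # N in "Other (N)" is the number of distinct values rolled up
        rows.append((f"Other ({len(rest)})", sum(c for _, c in rest)))
    if nulls:
        rows.append(("(NULL)", nulls))
    return rows
```

With `limit=2`, a column holding five `a`, three `b`, two `c`, one `d` and two empty cells yields the two real categories followed by `Other (2)` with a count of 3 and `(NULL)` with a count of 2.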
ID-like (near-unique) and high-cardinality text columns are reported and skipped. By default the dashboard fits the data (up to 64 panels) for both HTML and static image export: up to 8 cartesian panels render as one typed subplot grid; beyond 8, HTML switches to an inline-div grid of independent plots and static export uses domain-positioned axes to fit them all in a single image. Cap the count with --max-charts.
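Returning to the box-plot point overlay described in the list above, the size heuristic amounts to a threshold function. The sketch below uses the row-count cut-offs stated in the text; the function names are hypothetical, and the 1.5 × IQR multiplier is the conventional Tukey inner fence, assumed here rather than confirmed for qsv.

```python
def auto_box_points(row_count):
    """Pick a box-points mode from the dataset size (thresholds from the text)."""
    if row_count <= 1_000:
        return "all"        # small data: overlay every point
    if row_count <= 10_000:
        return "outliers"   # medium data: overlay Tukey outliers only
    return "none"           # large data: quartile summary only

def tukey_fences(q1, q3, k=1.5):
    """Lower and upper fences; values outside them count as outliers.

    k=1.5 is the conventional inner fence and k=3.0 the far-out fence
    (both are assumptions about the multipliers, not taken from qsv).
    """
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr
```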
To keep the self-contained HTML small and fast even on millions of rows, the point-heavy viz smart panels embed at most 50,000 points per series. Panels that summarize rather than plot raw points are unaffected: box plots read quartiles straight from the stats cache (the point overlay follows its own size heuristic — see above), frequency bars embed category counts, the correlation heatmap embeds a fixed coefficient matrix, and the density contour embeds a fixed bin grid. Two algorithms do the thinning, picked by panel:
| Panel | When it downsamples | Algorithm |
|---|---|---|
| Time-series trend (line over a date/datetime) | > 50,000 points | LTTB (Largest-Triangle-Three-Buckets) |
| Scatter drill-down pair, 3D scatter triple, map/geo lat-lon points, histogram source values | > 50,000 points | Uniform stride (every n-th row) |
- LTTB is shape-preserving: it selects points by largest triangle area in (x, y) space, so brief spikes and peaks survive instead of being stepped over, and the earliest and latest observation are always kept. It applies only to the time-series panel, whose x-axis is chronologically sorted (LTTB requires monotonic x). The algorithm is based on the paper Downsampling Time Series for Visual Representation.
- Uniform stride keeps every n-th row (endpoint-inclusive, so the first and last row are retained). It's used where x isn't ordered — a scatter cloud has no monotonic axis — and where two columns must share the same row indices (e.g. a bubble chart's color and size encodings).
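To make the difference between the two thinning strategies concrete, here is a minimal Python sketch of each. It follows the published LTTB algorithm and a simple endpoint-inclusive stride; the function names, the bucket-boundary arithmetic, and the handling of tiny thresholds are illustrative assumptions and may differ from what qsv does internally.

```python
def lttb(points, threshold):
    """Downsample (x, y) points sorted by x to at most `threshold` points."""
    n = len(points)
    if threshold >= n or threshold < 3:
        return list(points)
    every = (n - 2) / (threshold - 2)   # bucket width, excluding the endpoints
    sampled = [points[0]]               # the first point is always kept
    a = 0                               # index of the last selected point
    for i in range(threshold - 2):
        # Average of the *next* bucket acts as the third triangle vertex.
        nxt_start = int((i + 1) * every) + 1
        nxt_end = min(int((i + 2) * every) + 1, n)
        nxt = points[nxt_start:nxt_end]
        avg_x = sum(p[0] for p in nxt) / len(nxt)
        avg_y = sum(p[1] for p in nxt) / len(nxt)
        # Pick the point in the current bucket forming the largest triangle.
        start, end = int(i * every) + 1, int((i + 1) * every) + 1
        ax, ay = points[a]
        best_area, best_idx = -1.0, start
        for j in range(start, end):
            px, py = points[j]
            area = abs((ax - avg_x) * (py - ay) - (ax - px) * (avg_y - ay)) / 2
            if area > best_area:
                best_area, best_idx = area, j
        sampled.append(points[best_idx])
        a = best_idx
    sampled.append(points[-1])          # the last point is always kept
    return sampled

def uniform_stride_indices(n, max_points):
    """Row indices for an endpoint-inclusive stride (requires max_points >= 2)."""
    if n <= max_points:
        return list(range(n))
    step = -(-(n - 1) // (max_points - 1))   # ceiling division
    return list(range(0, n - 1, step)) + [n - 1]
```

Because `uniform_stride_indices` returns indices rather than values, the same list can be applied to several columns, which is what keeps paired encodings such as color and size aligned row for row.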
Note
Downsampling is a viz smart behavior only. The explicit chart subcommands (viz scatter, viz line, viz histogram, …) plot every row you give them — cap the input yourself with sample, slice, or a sqlp LIMIT if a single chart gets too heavy.
```bash
# auto-fit dashboard for an unfamiliar file
qsv viz smart sales_sample.csv -o dashboard.html
# cap panels, lay out 3 columns, top-5 categories per frequency bar
qsv viz smart sales_sample.csv --max-charts 6 --grid-cols 3 --limit 5 -o dashboard.html
```

The sample above uses sales_sample.csv (500 e-commerce orders, designed so the correlation heatmap spans the full red→blue scale).
If you run moarstats before viz smart, the dashboard reads the extended statistics it adds to the stats cache and makes better chart choices. Without a moarstats run the behavior is unchanged — the extra stats are simply absent, so smart falls back to its defaults.
Tip
Pass --smarter to do this in one step — viz smart runs moarstats --advanced for you (one extra pass over the data, writing the .stats.csv sidecars + an .idx index) before building the dashboard.
- Bimodal → histogram. A continuous column flagged bimodal/multimodal (high bimodality coefficient) renders as a histogram instead of a box plot, because a box plot summarizes by quartiles and would hide the separate peaks. Needs `moarstats --advanced` (the bimodality coefficient derives from kurtosis).
- Box shape hints. Box panels are annotated with the column's skew direction and outlier share, e.g. `account_age_days (right-skewed, 4.7% outliers)`, from the Pearson skewness and outlier-percentage stats. These work from a plain `qsv moarstats` run.
- Concentrated categoricals. A high-cardinality categorical that would normally be skipped as ID-like noise is kept as a top-N bar when its normalized entropy is low (a few dominant categories rather than near-uniform noise).
```bash
# one step: --smarter runs moarstats --advanced for you first
qsv viz smart customer_spend.csv --smarter -o spend_dashboard.html
# or do it manually: extend the stats cache, then let viz smart reuse it
qsv moarstats --advanced customer_spend.csv
qsv viz smart customer_spend.csv -o spend_dashboard.html
# monthly_spend (bimodal) -> histogram; account_age_days (skewed) -> annotated box
```

The sample above uses customer_spend.csv (300 customers with a bimodal monthly_spend and a right-skewed account_age_days). Under the hood, moarstats regenerates the stats cache with the extra columns and viz smart reuses it; see Stats Cache & Caching.
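Two of the statistics behind these choices are easy to compute by hand, which helps when checking why a column landed in a given panel. The sketch below uses the large-sample form of Sarle's bimodality coefficient, (skewness² + 1) / kurtosis, and Shannon entropy divided by its maximum. Whether moarstats uses exactly these forms, and what cut-offs viz smart applies, are assumptions; the function names are hypothetical.

```python
import math

def bimodality_coefficient(values):
    """Large-sample Sarle coefficient: (skew^2 + 1) / kurtosis (non-excess)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return 0.0                      # constant column: no shape to measure
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return (skew ** 2 + 1) / kurtosis

def normalized_entropy(counts):
    """Shannon entropy of category counts, scaled to [0, 1] by log(k)."""
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return entropy / math.log(k)
```

Values above roughly 5/9 (the coefficient of a uniform distribution) are the conventional hint of bimodality, and a normalized entropy near 0 means a few categories dominate while a value near 1 means the categories are close to evenly spread.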
Grab the sample datasets first:
```bash
curl -LO https://raw.githubusercontent.com/dathere/qsv/master/examples/viz/sales_sample.csv
curl -LO https://raw.githubusercontent.com/dathere/qsv/master/examples/viz/stock_prices.csv
curl -LO https://raw.githubusercontent.com/dathere/qsv/master/examples/viz/web_flows.csv
curl -LO https://raw.githubusercontent.com/dathere/qsv/master/examples/viz/product_ratings.csv
curl -LO https://raw.githubusercontent.com/dathere/qsv/master/examples/viz/quakes.csv
```

```bash
# bar: revenue by region (aggregated)
qsv viz bar sales_sample.csv --x region --y revenue --agg sum -o bar.html
# line: closing price over time
qsv viz line stock_prices.csv --x date --y close -o line.html
# scatter: units sold vs revenue
qsv viz scatter sales_sample.csv --x units_sold --y revenue -o scatter.html
# scatter3d: three numeric columns in 3D, colored by a fourth
qsv viz scatter3d sales_sample.csv --x units_sold --y revenue --z shipping_cost --color profit_margin_pct -o scatter3d.html
# contour: 2D density of two numeric columns (binned)
qsv viz contour sales_sample.csv --x units_sold --y revenue --bins 20 -o contour.html
# histogram: distribution of unit price
qsv viz histogram sales_sample.csv --x unit_price -o histogram.html
# box: spread of revenue
qsv viz box sales_sample.csv --y revenue -o box.html
# pie (donut): revenue share by product category
qsv viz pie sales_sample.csv --x product_category --y revenue --donut -o pie.html
# treemap: nested part-to-whole as sized tiles (--cols are the levels, outer first), sized by a --value sum
qsv viz treemap sales_sample.csv --cols region,product_category --value revenue --agg sum -o treemap.html
# sunburst: deeper hierarchy as concentric rings; sized by row count when --value is omitted
qsv viz sunburst sales_sample.csv --cols region,product_category,payment_method -o sunburst.html
# heatmap (correlation): Pearson matrix over all numeric columns
qsv viz heatmap sales_sample.csv -o heatmap_corr.html
# heatmap (pivot): region x category grid of revenue (give --x, --y AND --z)
qsv viz heatmap sales_sample.csv --x region --y product_category --z revenue -o heatmap_pivot.html
# candlestick / ohlc: OHLC price action
qsv viz candlestick stock_prices.csv --x date --ohlc-open open --high high --low low --close close -o candlestick.html
qsv viz ohlc stock_prices.csv --x date --ohlc-open open --high high --low low --close close -o ohlc.html
# sankey: session funnel (duplicate source→target pairs are aggregated)
qsv viz sankey web_flows.csv --source source --target target --value sessions -o sankey.html
# radar: multi-axis brand comparison (one polygon per --series value, per-axis mean)
qsv viz radar product_ratings.csv --cols battery,camera,performance,display,value,design --series brand -o radar.html
# map: point map on token-free OpenStreetMap tiles; color by magnitude, size by depth
qsv viz map quakes.csv --lat lat --lon lon --color magnitude --size depth_km -o map.html
# map (density): DensityMapbox heatmap of the same points, on a light Carto basemap
qsv viz map quakes.csv --lat lat --lon lon --density --style carto-positron -o map_density.html
# geo: offline projection map (no tiles/token); color by magnitude
qsv viz geo quakes.csv --lat lat --lon lon --color magnitude --projection natural-earth -o geo.html
```

Note
--ohlc-open is spelled out (not --open) because --open already means "open the result in a browser".
viz map plots --lat/--lon coordinates on a tile basemap and auto-centers/zooms to the data's bounding box (with antimeridian wrap, so clusters straddling the 180° line frame correctly).
- Encodings (point maps): `--color <numeric>` for a continuous Viridis colorscale + colorbar, `--size <numeric>` for bubble sizes, or `--series <category>` for one trace per group. `--text <col>` adds per-point hover labels.
- `--density` switches to a `DensityMapbox` heatmap, weighted by `--color`/`--size` (or uniform).
- `--style` picks the basemap. The token-free styles (the default `open-street-map`, `carto-positron`, `carto-darkmatter`, `stamen-*`, `white-bg`) need no account. The Mapbox-hosted styles (`satellite`, `streets`, `dark`, …) require `--mapbox-token`.
viz geo is the tile-free alternative: it draws the same --lat/--lon points (and the same --color/--size/--series/--text encodings) on a ScatterGeo projection basemap — coastlines, land, and country borders rendered locally, so it works fully offline with no tiles and no token. Choose the projection with --projection (natural-earth default, plus mercator, orthographic, equirectangular, albers-usa, robinson, winkel-tripel, mollweide, hammer, azimuthal-equal-area). It's the better fit for continental/global data, which is exactly when viz smart auto-picks it over the tile map. There's no --density mode for geo.
Tip
Pair these with geocode — turn city names or addresses into the lat/lon columns viz map and viz geo plot.
viz treemap and viz sunburst both render a part-to-whole hierarchy from the --cols you list — each column is one nesting level, outermost first (e.g. --cols region,product_category nests categories inside regions). They differ only in shape, so pick by what you want to read:
- `viz treemap` draws nested rectangles where area encodes size; best when you want to compare magnitudes across the hierarchy.
- `viz sunburst` draws concentric rings that emphasize parent-child structure; best for deeper hierarchies (3+ levels) where ring lineage reads more clearly than nested boxes. In-sector labels orient radially (along each ring's spoke), so even a crowded deep-path ring stays legible rather than clipping tangential text.
Sizing:
- By default each leaf is sized by its row count (how many rows fall under that path).
- Add `--value <numeric>` to size by a measure instead, and `--agg sum|count` to choose how repeated paths roll up (`sum` is the default when `--value` is given). Unlike `viz bar`, these charts accept only the additive aggregations `sum` and `count`; `mean`/`min`/`max` would break the part-to-whole invariant (a parent must equal the sum of its children) and are rejected.
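The part-to-whole invariant is easiest to see by rolling a few rows up by hand. The Python sketch below is an illustration only: `rollup` is a hypothetical helper, and the column names in the usage note are made up for the example.

```python
from collections import defaultdict

def rollup(rows, levels, value=None):
    """Total every path prefix, by `value` sum or by row count when value is None."""
    totals = defaultdict(float)
    for row in rows:
        weight = float(row[value]) if value else 1.0
        path = tuple(row[col] for col in levels)
        # Credit the row to each ancestor, so a parent equals the sum of its children.
        for depth in range(1, len(path) + 1):
            totals[path[:depth]] += weight
    return dict(totals)
```

For three rows with revenue 10 (East, A), 30 (East, B) and 5 (West, A), the East node totals 40, exactly the sum of its two children. A mean would give East 20 while its children show 10 and 30, so the tiles or rings would no longer add up.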
In viz smart, the same hierarchy appears as an auto-built panel governed by the association screen described above; the standalone subcommands have no such screen — they always build the chart you ask for.
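If you want to check ahead of time whether a set of dimensions would pass that association screen, the statistic can be approximated outside qsv. The sketch below uses the Bergsma bias correction for Cramér's V and the 0.10 threshold quoted earlier; that qsv uses this exact correction is an assumption, and both function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V (Bergsma correction) for two categorical lists."""
    n = len(xs)
    rows, cols = Counter(xs), Counter(ys)
    r, k = len(rows), len(cols)
    if n < 2 or r < 2 or k < 2:
        return 0.0
    joint = Counter(zip(xs, ys))
    chi2 = 0.0
    for a, row_total in rows.items():
        for b, col_total in cols.items():
            expected = row_total * col_total / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected
    phi2 = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return math.sqrt(phi2 / denom) if denom > 0 else 0.0

def passes_association_screen(columns, threshold=0.10):
    """True when every pair of columns reaches the threshold."""
    return all(
        cramers_v_corrected(a, b) >= threshold
        for a, b in combinations(list(columns.values()), 2)
    )
```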
```bash
# treemap: revenue by region then product category, summed
qsv viz treemap sales_sample.csv --cols region,product_category --value revenue --agg sum -o treemap.html
# sunburst: a deeper 3-level hierarchy, sized by row count (no --value)
qsv viz sunburst sales_sample.csv --cols region,product_category,payment_method -o sunburst.html
```

--theme applies a built-in Plotly theme (background, fonts, and axis styling) to any chart type, including viz smart. Choose from default, plotly_white, plotly_dark, seaborn, seaborn_whitegrid, seaborn_dark, matplotlib, and plotnine (case-insensitive; hyphens accepted). When omitted, qsv's built-in look is used.

```bash
qsv viz smart sales_sample.csv --theme plotly_dark -o dashboard_dark.html
qsv viz bar sales_sample.csv --x region --y revenue --agg sum --theme seaborn -o bar.html
```

Don't confuse `--theme` (the chart's visual style, all chart types) with `--style` (the map basemap, `viz map` only).
Any chart can be written as a static image instead of HTML — just change the extension. This needs a viz_static build (the prebuilt full binaries qualify) and a local Chrome/Firefox for Plotly's image renderer.
viz smart dashboards export in full — all eligible panels (beyond 8, laid out with domain-positioned axes), including the offline ScatterGeo map fit to the data extent. Only the Mapbox tile map (it needs network tiles) and 3D scatter panels stay HTML-only.
```bash
qsv viz scatter sales_sample.csv --x units_sold --y revenue -o scatter.png
qsv viz smart sales_sample.csv -o overview.svg --width 1400 --height 900
```

- `docs/help/viz.md`: full flag reference
- Metadata Profiling (profile): the stats profiling `viz smart` builds on
- Stats Cache & Caching: why `viz smart` is fast
- Binary Variants: which builds include the `viz` feature
- Geospatial: `geocode` produces the lat/lon columns `viz map` plots
- Recipe: Inspect an Unknown CSV, which pairs well with `viz smart`
qsv — GitHub · Releases · Discussions · qsv pro · Try it online · Benchmarks · datHere · DeepWiki · Dual-licensed MIT / Unlicense