Joel Natividad edited this page Jun 27, 2026 · 12 revisions

Visualization

Tier: Intermediate · Commands covered: viz

Note

Per-command flag reference lives in /docs/help/viz.md. This page is the workflow layer — when to reach for each chart and how viz smart builds a dashboard for you.

viz turns a CSV/TSV straight into interactive Plotly charts — no notebook, no spreadsheet round-trip. Output is a self-contained HTML file (the Plotly runtime is embedded, so it works offline) written to --output or stdout. With a viz_static build you can export PNG / SVG / PDF / JPEG / WebP instead (needs a local Chrome/Firefox for Plotly's renderer). Pass --open to pop the result straight into your browser.

viz is behind the viz feature flag (✨) — it ships in the prebuilt qsv binaries but not in qsvlite or qsvdp. See Binary Variants.

Gallery

A single page rendering every chart type below, generated from the sample datasets in examples/viz/:

The sunburst figures (the standalone sunburst and the --dictionary infer, sunburst dashboard) render their in-sector labels radially — running along each ring's spoke — a plotly.js 3.6 refinement that keeps deep-path labels legible instead of clipping them tangentially.

Note

The gallery is served from GitHub Pages, published from examples/viz/ by a workflow on every push to master. It renders with the correct text/html type, so the charts and the embedded smart-dashboard iframes load directly — no third-party proxy. The raw.githubusercontent.com link above serves .html as text/plain (shows source), so use it only to view or download the file.

Quick decision table

| If you want to… | Use | Notes |
|---|---|---|
| A one-shot overview of an unfamiliar dataset | viz smart | Auto-picks a panel per column from the stats cache |
| Compare a measure across categories | viz bar | --agg (sum, mean, count, min, or max) when x repeats |
| Show a trend over time / an ordered x | viz line | x is sorted numerically/temporally |
| Relate two numeric columns | viz scatter | optional --series to color by group |
| Relate three numeric columns | viz scatter3d | 3D scatter; --color/--size encodings |
| See the 2D density of two numerics | viz contour | binned density grid; --bins |
| See the distribution of one numeric column | viz histogram | --bins (default auto) |
| Summarize spread (quartiles, outliers) | viz box | optional grouping x column |
| Show parts of a whole | viz pie | --donut for a ring; --y value or counts |
| Show a nested part-to-whole as sized tiles | viz treemap | nested rectangles, area encodes size (best for size comparison); --cols dim1,dim2… (levels, outer first), optional --value/--agg |
| Show a nested hierarchy as rings | viz sunburst | concentric rings that emphasize parent-child structure (best for deeper paths); same inputs as treemap |
| Spot correlations across many numerics | viz heatmap | Pearson matrix (RdBu, −1…1); takes no --x/--y/--z |
| Cross-tabulate two categories by a measure | viz heatmap | give --x, --y and --z (pivot mode) |
| Plot OHLC price action | viz candlestick / viz ohlc | --ohlc-open --high --low --close |
| Show flows between nodes | viz sankey | --source --target [--value] |
| Compare entities across many axes | viz radar | --cols a,b,c… + optional --series |
| Plot lat/lon points on a map | viz map | tile basemap; --color/--size encodings or --density heatmap |
| Plot lat/lon on an offline projection map | viz geo | no tiles/token; --projection; same encodings as map |

viz smart — the auto-dashboard

smart profiles the dataset from qsv's stats cache (🪄) and lays out one panel per column in a grid:

  • Correlation heatmap — a Pearson matrix when there are 2+ continuous numeric columns, plus a correlated-pair drill-down of the strongest pair beside it: a scatter, or — for large datasets where a scatter would overplot — a 2D density contour (which embeds only a fixed bin grid). With 3+ numeric columns, a 3D scatter of the strongest-correlation triple is added as well.
  • Time-series trend — a line of the first continuous numeric column over a detected date/datetime column.
  • Part-to-whole hierarchy — when 2+ low-cardinality dimensions are present, a full-width treemap (shallow 2-level hierarchy — area encodes size) or sunburst (deeper 3-level hierarchy — concentric rings) nesting them. auto follows best practice for the depth; override with --hierarchy-style auto|treemap|sunburst. A panel is only auto-built when the dimensions are actually associated (bias-corrected Cramér's V ≥ 0.10 across every pair) — nesting statistically independent categoricals just replicates each level's marginal at every branch and tells you nothing the separate frequency bars don't, so that case is skipped with a note on stderr. An explicit --hierarchy-style treemap|sunburst bypasses the screen and forces the panel.
  • Geographic map — when a latitude/longitude column pair is detected (by header name + numeric type + plausible coordinate ranges), a map leads the dashboard on a full-width row: a Mapbox tile map for a local extent, or an offline ScatterGeo projection world-overview (no tiles or token) when the coordinates span a continental/global area. Points far from the cluster centroid (beyond the Tukey far-out fence of all point-to-centroid distances) are flagged as geographic outliers — drawn with a distinct marker and excluded from the core extent, so a few strays don't inflate the zoom. With the geocode feature, the core extent's bounding box is reverse-geocoded against the local Geonames index and drawn as a labeled box plus a location summary (e.g. "New York & New Jersey, United States"); when outliers exist, a second dotted box marks the full extent and the HTML map gains "Core extent" / "Full extent" buttons to jump between views. In static image export the map renders as an offline ScatterGeo projection fit to the data extent (US-spanning data uses albers-usa) — only the Mapbox tile map and 3D scatter stay HTML-only.
  • Box plots — for continuous numeric columns (quartiles drawn straight from the cache). Sample points are overlaid by a size heuristic: every point for small data (≤ 1,000 rows), Tukey outliers for medium (≤ 10,000), and none above that (the box stays a fast cache-only quartile summary). Pass --box-points outliers|all|suspected|none to force a mode (now accepted by smart, not just box).
  • Frequency bars — for low-cardinality categoricals, booleans, and ratings. Each bar chart shows the top-N categories (--limit, default 10) plus two aggregate buckets that mirror qsv frequency: a (NULL) bar for empty cells and an Other (N) bar collecting the categories beyond the limit (N = how many distinct values were rolled up). Both are drawn in muted grey to read as summaries, set apart from the palette-colored real categories, and are shown by default — suppress them with --no-nulls / --no-other.
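The bucket logic of a frequency panel can be sketched in a few lines of Python. This is a minimal illustration, not qsv's implementation: the function name `frequency_buckets`, its parameters, and the order in which the two summary bars are appended are assumptions made for the example. Only the top-N limit, the (NULL) bucket, and the Other (N) label come from the description above.

```python
from collections import Counter

def frequency_buckets(values, limit=10, show_nulls=True, show_other=True):
    """Return (label, count) bars: top-`limit` categories, then summary buckets.

    Hypothetical helper: empty strings stand in for empty CSV cells.
    """
    nulls = sum(1 for v in values if v == "")
    counts = Counter(v for v in values if v != "")
    ranked = counts.most_common()            # highest count first
    top, rest = ranked[:limit], ranked[limit:]
    bars = list(top)
    if show_other and rest:
        # N in "Other (N)" is the number of distinct values rolled up
        bars.append((f"Other ({len(rest)})", sum(c for _, c in rest)))
    if show_nulls and nulls:
        bars.append(("(NULL)", nulls))
    return bars

cells = ["east"] * 5 + ["west"] * 3 + ["north"] * 2 + ["south"] + [""] * 4
bars = frequency_buckets(cells, limit=2)
print(bars)
# → [('east', 5), ('west', 3), ('Other (2)', 3), ('(NULL)', 4)]
```

Passing `show_nulls=False` or `show_other=False` in this sketch plays the role of --no-nulls / --no-other.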

When one bucket dwarfs the rest (e.g. a dominant (NULL) or Other), the panel switches its y-axis to a logarithmic scale so the small categories stay readable. This is controlled by --log-scale auto|on|off (default auto, which kicks in only on high-dynamic-range panels; on forces it on every frequency panel, off keeps everything linear). A log panel carries two redundant cues so it's never mistaken for linear: a count (log) y-axis title, and diagonal hatching on the muted-grey (NULL)/Other bars.

ID-like (near-unique) and high-cardinality text columns are reported and skipped. By default the dashboard fits the data (up to 64 panels) for both HTML and static image export: up to 8 cartesian panels render as one typed subplot grid; beyond 8, HTML switches to an inline-div grid of independent plots and static export uses domain-positioned axes to fit them all in a single image. Cap the count with --max-charts.
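As a rough mental model of the layout rule just described, the decision can be written as a small function. The names `plan_layout`, `GRID_LIMIT`, and `MAX_PANELS` are hypothetical; the numbers 8 and 64 and the three layout modes are taken from the paragraph above, and the real logic may weigh panel types differently.

```python
MAX_PANELS = 64   # dashboard ceiling stated above
GRID_LIMIT = 8    # largest panel count drawn as one typed subplot grid

def plan_layout(eligible_panels, max_charts=None, static_export=False):
    """Return (panels_shown, layout_mode) for a dashboard (illustrative only)."""
    shown = min(eligible_panels, MAX_PANELS)
    if max_charts is not None:          # --max-charts caps the count further
        shown = min(shown, max_charts)
    if shown <= GRID_LIMIT:
        mode = "typed subplot grid"
    elif static_export:
        mode = "domain-positioned axes"
    else:
        mode = "inline-div grid"
    return shown, mode

print(plan_layout(5))
print(plan_layout(20))
print(plan_layout(20, static_export=True))
print(plan_layout(20, max_charts=6))
```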

Data size & downsampling

To keep the self-contained HTML small and fast even on millions of rows, the point-heavy viz smart panels embed at most 50,000 points per series. Panels that summarize rather than plot raw points are unaffected: box plots read quartiles straight from the stats cache (the point overlay follows its own size heuristic — see above), frequency bars embed category counts, the correlation heatmap embeds a fixed coefficient matrix, and the density contour embeds a fixed bin grid. Two algorithms do the thinning, picked by panel:

| Panel | When it downsamples | Algorithm |
|---|---|---|
| Time-series trend (line over a date/datetime) | > 50,000 points | LTTB (Largest-Triangle-Three-Buckets) |
| Scatter drill-down pair, 3D scatter triple, map/geo lat-lon points, histogram source values | > 50,000 points | Uniform stride (every n-th row) |

  • LTTB is shape-preserving: it selects points by largest triangle area in (x, y) space, so brief spikes and peaks survive instead of being stepped over, and the earliest and latest observation are always kept. It applies only to the time-series panel, whose x-axis is chronologically sorted (LTTB requires monotonic x). The algorithm is based on the paper Downsampling Time Series for Visual Representation.
  • Uniform stride keeps every n-th row (endpoint-inclusive, so the first and last row are retained). It's used where x isn't ordered — a scatter cloud has no monotonic axis — and where two columns must share the same row indices (e.g. a bubble chart's color and size encodings).
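Both thinning strategies are short enough to sketch. The Python below is a textbook-style illustration under stated assumptions, not qsv's Rust code: the function names `uniform_stride` and `lttb` are made up for the example, the stride rounding is one of several reasonable choices, and the demo uses a cap of 10 points instead of 50,000 to stay readable.

```python
from math import ceil

def uniform_stride(rows, limit):
    """Keep every n-th row, always retaining the first and last row."""
    n = len(rows)
    if limit < 2:
        raise ValueError("limit must be at least 2")
    if n <= limit:
        return list(rows)
    step = ceil((n - 1) / (limit - 1))
    idx = list(range(0, n - 1, step)) + [n - 1]   # endpoint-inclusive
    return [rows[i] for i in idx]

def lttb(points, threshold):
    """Largest-Triangle-Three-Buckets over (x, y) pairs sorted by x."""
    n = len(points)
    if threshold >= n or threshold < 3:
        return list(points)
    every = (n - 2) / (threshold - 2)             # bucket width
    sampled = [points[0]]                         # first point is always kept
    a = 0                                         # index of last selected point
    for i in range(threshold - 2):
        # average of the NEXT bucket acts as the triangle's third corner
        nxt = points[int((i + 1) * every) + 1 : min(int((i + 2) * every) + 1, n)]
        avg_x = sum(p[0] for p in nxt) / len(nxt)
        avg_y = sum(p[1] for p in nxt) / len(nxt)
        ax, ay = points[a]
        start, end = int(i * every) + 1, int((i + 1) * every) + 1
        best_area, best = -1.0, start
        for j in range(start, end):
            px, py = points[j]
            area = abs((ax - avg_x) * (py - ay) - (ax - px) * (avg_y - ay)) * 0.5
            if area > best_area:
                best_area, best = area, j
        sampled.append(points[best])
        a = best
    sampled.append(points[-1])                    # last point is always kept
    return sampled

series = [(i, 0.0) for i in range(100)]
series[37] = (37, 50.0)                           # a one-sample spike
thinned = lttb(series, 10)
strided = uniform_stride(series, 10)
print(len(thinned), (37, 50.0) in thinned, (37, 50.0) in strided)
```

On this flat series the spike at x = 37 survives LTTB, while the stride steps over it; that difference is why the shape-preserving method is reserved for the ordered time-series panel.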

Note

Downsampling is a viz smart behavior only. The explicit chart subcommands (viz scatter, viz line, viz histogram, …) plot every row you give them — cap the input yourself with sample, slice, or a sqlp LIMIT if a single chart gets too heavy.

# auto-fit dashboard for an unfamiliar file
qsv viz smart sales_sample.csv -o dashboard.html

# cap panels, lay out 3 columns, top-5 categories per frequency bar
qsv viz smart sales_sample.csv --max-charts 6 --grid-cols 3 --limit 5 -o dashboard.html

The sample above uses sales_sample.csv (500 e-commerce orders, designed so the correlation heatmap spans the full red→blue scale).

moarstats-informed dashboards 🪄

If you run moarstats before viz smart, the dashboard reads the extended statistics it adds to the stats cache and makes better chart choices. Without a moarstats run the behavior is unchanged — the extra stats are simply absent, so smart falls back to its defaults.

Tip

Pass --smarter to do this in one step — viz smart runs moarstats --advanced for you (one extra pass over the data, writing the .stats.csv sidecars + an .idx index) before building the dashboard.

  • Bimodal → histogram. A continuous column flagged bimodal/multimodal (high bimodality coefficient) renders as a histogram instead of a box plot — a box plot summarizes by quartiles and would hide the separate peaks. Needs moarstats --advanced (the bimodality coefficient derives from kurtosis).
  • Box shape hints. Box panels are annotated with the column's skew direction and outlier share — e.g. account_age_days (right-skewed, 4.7% outliers) — from the Pearson skewness and outlier-percentage stats. These work from a plain qsv moarstats run.
  • Concentrated categoricals. A high-cardinality categorical that would normally be skipped as ID-like noise is kept as a top-N bar when its normalized entropy is low (a few dominant categories rather than near-uniform noise).

# one step: --smarter runs moarstats --advanced for you first
qsv viz smart customer_spend.csv --smarter -o spend_dashboard.html

# or do it manually — extend the stats cache, then let viz smart reuse it
qsv moarstats --advanced customer_spend.csv
qsv viz smart customer_spend.csv -o spend_dashboard.html
# monthly_spend (bimodal) -> histogram; account_age_days (skewed) -> annotated box

The sample above uses customer_spend.csv (300 customers with a bimodal monthly_spend and a right-skewed account_age_days). Under the hood, moarstats regenerates the stats cache with the extra columns and viz smart reuses it — see Stats Cache & Caching.
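To see why a bimodality coefficient needs kurtosis, here is a minimal sketch of Sarle's coefficient in its population-moment form, (skewness² + 1) / kurtosis. The function name and the toy data are invented for illustration. The 5/9 reference value is the coefficient of a uniform distribution and is the conventional cutoff; the exact estimator and threshold that moarstats uses are not stated on this page, so treat both as assumptions.

```python
def bimodality_coefficient(xs):
    """Sarle's bimodality coefficient from population moments (illustrative)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2          # non-excess kurtosis (normal = 3)
    return (skewness ** 2 + 1) / kurtosis

UNIFORM_REFERENCE = 5 / 9            # conventional cutoff, assumed here

two_peaks = [10.0] * 50 + [90.0] * 50
one_peak = [0.0] + [1.0] * 4 + [2.0] * 6 + [3.0] * 4 + [4.0]

bc_two = bimodality_coefficient(two_peaks)
bc_one = bimodality_coefficient(one_peak)
print(bc_two > UNIFORM_REFERENCE, bc_one > UNIFORM_REFERENCE)
# → True False
```

A box plot of `two_peaks` would show a median near 50 where no data actually sits, which is the situation the histogram switch is meant to avoid.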

Chart subcommands

Grab the sample datasets first:

curl -LO https://raw.githubusercontent.com/dathere/qsv/master/examples/viz/sales_sample.csv
curl -LO https://raw.githubusercontent.com/dathere/qsv/master/examples/viz/stock_prices.csv
curl -LO https://raw.githubusercontent.com/dathere/qsv/master/examples/viz/web_flows.csv
curl -LO https://raw.githubusercontent.com/dathere/qsv/master/examples/viz/product_ratings.csv
curl -LO https://raw.githubusercontent.com/dathere/qsv/master/examples/viz/quakes.csv
# bar — revenue by region (aggregated)
qsv viz bar sales_sample.csv --x region --y revenue --agg sum -o bar.html

# line — closing price over time
qsv viz line stock_prices.csv --x date --y close -o line.html

# scatter — units sold vs revenue
qsv viz scatter sales_sample.csv --x units_sold --y revenue -o scatter.html

# scatter3d — three numeric columns in 3D, colored by a fourth
qsv viz scatter3d sales_sample.csv --x units_sold --y revenue --z shipping_cost --color profit_margin_pct -o scatter3d.html

# contour — 2D density of two numeric columns (binned)
qsv viz contour sales_sample.csv --x units_sold --y revenue --bins 20 -o contour.html

# histogram — distribution of unit price
qsv viz histogram sales_sample.csv --x unit_price -o histogram.html

# box — spread of revenue
qsv viz box sales_sample.csv --y revenue -o box.html

# pie (donut) — revenue share by product category
qsv viz pie sales_sample.csv --x product_category --y revenue --donut -o pie.html

# treemap — nested part-to-whole as sized tiles (--cols are the levels, outer first), sized by a --value sum
qsv viz treemap sales_sample.csv --cols region,product_category --value revenue --agg sum -o treemap.html

# sunburst — deeper hierarchy as concentric rings; sized by row count when --value is omitted
qsv viz sunburst sales_sample.csv --cols region,product_category,payment_method -o sunburst.html

# heatmap (correlation) — Pearson matrix over all numeric columns
qsv viz heatmap sales_sample.csv -o heatmap_corr.html

# heatmap (pivot) — region x category grid of revenue (give --x, --y AND --z)
qsv viz heatmap sales_sample.csv --x region --y product_category --z revenue -o heatmap_pivot.html

# candlestick / ohlc — OHLC price action
qsv viz candlestick stock_prices.csv --x date --ohlc-open open --high high --low low --close close -o candlestick.html
qsv viz ohlc        stock_prices.csv --x date --ohlc-open open --high high --low low --close close -o ohlc.html

# sankey — session funnel (duplicate source→target pairs are aggregated)
qsv viz sankey web_flows.csv --source source --target target --value sessions -o sankey.html

# radar — multi-axis brand comparison (one polygon per --series value, per-axis mean)
qsv viz radar product_ratings.csv --cols battery,camera,performance,display,value,design --series brand -o radar.html

# map — point map on token-free OpenStreetMap tiles; color by magnitude, size by depth
qsv viz map quakes.csv --lat lat --lon lon --color magnitude --size depth_km -o map.html

# map (density) — DensityMapbox heatmap of the same points, on a light Carto basemap
qsv viz map quakes.csv --lat lat --lon lon --density --style carto-positron -o map_density.html

# geo — offline projection map (no tiles/token); color by magnitude
qsv viz geo quakes.csv --lat lat --lon lon --color magnitude --projection natural-earth -o geo.html

Note

--ohlc-open is spelled out (not --open) because --open already means "open the result in a browser".

Maps in depth

viz map plots --lat/--lon coordinates on a tile basemap and auto-centers/zooms to the data's bounding box (with antimeridian wrap, so clusters straddling the 180° line frame correctly).
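The antimeridian case is easy to get wrong with a plain min/max, so a short sketch may help. One common approach, assumed here rather than taken from qsv's source, is to find the widest empty gap between sorted longitudes and treat everything outside that gap as the extent. The names `wrap180` and `lon_extent` are hypothetical.

```python
def wrap180(lon):
    """Normalize a longitude to the range [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0

def lon_extent(lons):
    """Return (west, east, center, span) of the narrowest arc covering lons."""
    if not lons:
        raise ValueError("need at least one longitude")
    pts = sorted(lon % 360.0 for lon in lons)     # map onto [0, 360)
    widest, east_i = -1.0, 0
    for i, p in enumerate(pts):
        gap = (pts[(i + 1) % len(pts)] - p) % 360.0
        if gap > widest:                          # widest empty stretch
            widest, east_i = gap, i
    east = pts[east_i]                            # data ends before the gap
    west = pts[(east_i + 1) % len(pts)]           # and resumes after it
    span = (east - west) % 360.0
    return wrap180(west), wrap180(east), wrap180(west + span / 2.0), span

fiji_like = [178.0, 179.5, -179.0, -177.5]        # cluster straddling 180°
west, east, center, span = lon_extent(fiji_like)
print(west, east, center, span)
# A plain min/max would instead report a span of 358.5° centered near 0°.
```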

  • Encodings (point maps): --color <numeric> for a continuous Viridis colorscale + colorbar, --size <numeric> for bubble sizes, or --series <category> for one trace per group. --text <col> adds per-point hover labels.
  • --density switches to a DensityMapbox heatmap, weighted by --color/--size (or uniform).
  • --style picks the basemap. The token-free styles (the default open-street-map, carto-positron, carto-darkmatter, stamen-*, white-bg) need no account. The Mapbox-hosted styles (satellite, streets, dark, …) require --mapbox-token.

viz geo is the tile-free alternative: it draws the same --lat/--lon points (and the same --color/--size/--series/--text encodings) on a ScatterGeo projection basemap — coastlines, land, and country borders rendered locally, so it works fully offline with no tiles and no token. Choose the projection with --projection (natural-earth default, plus mercator, orthographic, equirectangular, albers-usa, robinson, winkel-tripel, mollweide, hammer, azimuthal-equal-area). It's the better fit for continental/global data, which is exactly when viz smart auto-picks it over the tile map. There's no --density mode for geo.

Tip

Pair these with geocode — turn city names or addresses into the lat/lon columns viz map and viz geo plot.

Hierarchy charts in depth

viz treemap and viz sunburst both render a part-to-whole hierarchy from the --cols you list — each column is one nesting level, outermost first (e.g. --cols region,product_category nests categories inside regions). They differ only in shape, so pick by what you want to read:

  • viz treemap draws nested rectangles where area encodes size — best when you want to compare magnitudes across the hierarchy.
  • viz sunburst draws concentric rings that emphasize parent-child structure — best for deeper hierarchies (3+ levels) where ring lineage reads more clearly than nested boxes. In-sector labels orient radially (along each ring's spoke), so even a crowded deep-path ring stays legible rather than clipping tangential text.

Sizing:

  • By default each leaf is sized by its row count (how many rows fall under that path).
  • Add --value <numeric> to size by a measure instead, and --agg sum|count to choose how repeated paths roll up (sum is the default when --value is given). Unlike viz bar, these charts accept only the additive aggregations sum and count: mean/min/max would break the part-to-whole invariant (a parent must equal the sum of its children) and are rejected.
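The part-to-whole invariant is easiest to see in a small roll-up. The sketch below is an illustration only: `rollup` and the sample rows are invented, and it is not how qsv builds the figure. It accumulates a weight into every prefix of each row's path, so a parent is the sum of its children by construction.

```python
from collections import defaultdict

def rollup(rows, cols, value=None):
    """Total each hierarchy path; weight is the row count unless `value` is set."""
    totals = defaultdict(float)
    for row in rows:
        weight = float(row[value]) if value else 1.0
        path = ()
        for col in cols:                 # outermost level first
            path = path + (row[col],)
            totals[path] += weight       # every ancestor gets the same weight
    return dict(totals)

rows = [
    {"region": "East", "product_category": "Toys",  "revenue": "10"},
    {"region": "East", "product_category": "Books", "revenue": "5"},
    {"region": "West", "product_category": "Toys",  "revenue": "7"},
    {"region": "East", "product_category": "Toys",  "revenue": "3"},
]

by_sum = rollup(rows, ["region", "product_category"], value="revenue")
by_count = rollup(rows, ["region", "product_category"])
print(by_sum[("East",)], by_sum[("East", "Toys")] + by_sum[("East", "Books")])
# → 18.0 18.0
```

With a mean the same check fails: East averages 6.0 over its three rows, while its children average 6.5 (Toys) and 5.0 (Books), which add up to 11.5.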

In viz smart, the same hierarchy appears as an auto-built panel governed by the association screen described above; the standalone subcommands have no such screen — they always build the chart you ask for.
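For readers curious about the association screen, here is a hedged sketch of a bias-corrected Cramér's V (the correction commonly attributed to Bergsma) and an every-pair check against 0.10. The function names are hypothetical and the code is a plain-Python illustration; qsv's actual computation may differ in detail.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length categorical columns."""
    n = len(xs)
    rows, cols, joint = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    r, k = len(rows), len(cols)
    if n < 2 or r < 2 or k < 2:
        return 0.0
    chi2 = 0.0
    for a in rows:
        for b in cols:
            expected = rows[a] * cols[b] / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    phi2 = max(0.0, chi2 / n - (r - 1) * (k - 1) / (n - 1))   # bias correction
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2 / denom) if denom > 0 else 0.0

def passes_screen(columns, threshold=0.10):
    """True when every pair of columns reaches the association threshold."""
    return all(cramers_v_corrected(a, b) >= threshold
               for a, b in combinations(columns, 2))

region   = ["east", "west"] * 50
currency = ["usd", "eur"] * 50                  # fully determined by region
channel  = ["web", "web", "app", "app"] * 25    # independent of region

print(passes_screen([region, currency]), passes_screen([region, channel]))
# → True False
```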

# treemap — revenue by region then product category, summed
qsv viz treemap sales_sample.csv --cols region,product_category --value revenue --agg sum -o treemap.html

# sunburst — a deeper 3-level hierarchy, sized by row count (no --value)
qsv viz sunburst sales_sample.csv --cols region,product_category,payment_method -o sunburst.html

Theming

--theme applies a built-in Plotly theme — background, fonts, and axis styling — to any chart type, including viz smart. Choose from default, plotly_white, plotly_dark, seaborn, seaborn_whitegrid, seaborn_dark, matplotlib, and plotnine (case-insensitive; hyphens accepted). When omitted, qsv's built-in look is used.

qsv viz smart sales_sample.csv --theme plotly_dark -o dashboard_dark.html
qsv viz bar sales_sample.csv --x region --y revenue --agg sum --theme seaborn -o bar.html

Don't confuse --theme (the chart's visual style, all chart types) with --style (the map basemap, viz map only).

Static image export

Any chart can be written as a static image instead of HTML — just change the extension. This needs a viz_static build (the prebuilt full binaries qualify) and a local Chrome/Firefox for Plotly's image renderer.

viz smart dashboards export in full — all eligible panels (beyond 8, laid out with domain-positioned axes), including the offline ScatterGeo map fit to the data extent. Only the Mapbox tile map (it needs network tiles) and 3D scatter panels stay HTML-only.

qsv viz scatter sales_sample.csv --x units_sold --y revenue -o scatter.png
qsv viz smart   sales_sample.csv -o overview.svg --width 1400 --height 900

See also
