diff --git a/documentation/playbook/demo-data-schema.md b/documentation/playbook/demo-data-schema.md new file mode 100644 index 000000000..a8e6910d7 --- /dev/null +++ b/documentation/playbook/demo-data-schema.md @@ -0,0 +1,250 @@ +--- +title: Demo Data Schema +sidebar_label: Demo data schema +description: Schema and structure of the FX market data and cryptocurrency trades available on demo.questdb.io +--- + +The [QuestDB demo instance at demo.questdb.io](https://demo.questdb.io) contains two datasets that you can query directly: simulated FX market data and real cryptocurrency trades. This page describes the available tables and their structure. + +## Overview + +The demo instance provides two independent datasets: + +1. **FX Market Data (Simulated)** - Foreign exchange prices and order books +2. **Cryptocurrency Trades (Real)** - Live cryptocurrency trades from OKX exchange + +--- + +# FX Market Data (Simulated) + +The FX dataset contains simulated foreign exchange market data for 30 currency pairs. We fetch real reference prices from Yahoo Finance every few seconds, but all order book levels and price updates are generated algorithmically based on these reference prices. + +## core_price Table + +The `core_price` table contains individual FX price updates from various liquidity providers. Each row represents a bid/ask quote update for a specific currency pair from a specific ECN. + +### Schema + +```sql title="core_price table structure" +CREATE TABLE 'core_price' ( + timestamp TIMESTAMP, + symbol SYMBOL CAPACITY 16384 CACHE, + ecn SYMBOL CAPACITY 256 CACHE, + bid_price DOUBLE, + bid_volume LONG, + ask_price DOUBLE, + ask_volume LONG, + reason SYMBOL CAPACITY 256 CACHE, + indicator1 DOUBLE, + indicator2 DOUBLE +) timestamp(timestamp) PARTITION BY HOUR TTL 3 DAYS WAL; +``` + +### Columns + +- **`timestamp`** - Time of the price update (designated timestamp) +- **`symbol`** - Currency pair from the 30 tracked symbols (see list below) +- **`ecn`** - Electronic Communication Network providing the quote: **LMAX**, **EBS**, **Currenex**, or **Hotspot** +- **`bid_price`** - Bid price (price at which market makers are willing to buy) +- **`bid_volume`** - Volume available at the bid price +- **`ask_price`** - Ask price (price at which market makers are willing to sell) +- **`ask_volume`** - Volume available at the ask price +- **`reason`** - Reason for the price update: "normal", "liquidity_event", or "news_event" +- **`indicator1`**, **`indicator2`** - Additional market indicators + +The table tracks **30 currency pairs**: EURUSD, GBPUSD, USDJPY, USDCHF, AUDUSD, USDCAD, NZDUSD, EURJPY, GBPJPY, EURGBP, AUDJPY, CADJPY, NZDJPY, EURAUD, EURNZD, AUDNZD, GBPAUD, GBPNZD, AUDCAD, NZDCAD, EURCAD, EURCHF, GBPCHF, USDNOK, USDSEK, USDZAR, USDMXN, USDSGD, USDHKD, USDTRY. 
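For a quick look at the freshest quote per pair and venue, a `LATEST ON` query along these lines should work against the demo instance:

```questdb-sql demo title="Latest core_price quote per symbol and ECN"
SELECT timestamp, symbol, ecn, bid_price, ask_price
FROM core_price
LATEST ON timestamp PARTITION BY symbol, ecn;
```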
+ +### Sample Data + +```questdb-sql demo title="Recent core_price updates" +SELECT * FROM core_price +WHERE timestamp IN today() +LIMIT -10; +``` + +**Results:** + +| timestamp | symbol | ecn | bid_price | bid_volume | ask_price | ask_volume | reason | indicator1 | indicator2 | +| --------------------------- | ------ | -------- | --------- | ---------- | --------- | ---------- | --------------- | ---------- | ---------- | +| 2025-12-18T11:46:13.059566Z | USDCHF | LMAX | 0.7959 | 219884 | 0.7971 | 223174 | liquidity_event | 0.641 | | +| 2025-12-18T11:46:13.060542Z | USDSGD | Currenex | 1.291 | 295757049 | 1.2982 | 301215620 | normal | 0.034 | | +| 2025-12-18T11:46:13.061853Z | EURAUD | LMAX | 1.7651 | 6207630 | 1.7691 | 5631029 | liquidity_event | 0.027 | | +| 2025-12-18T11:46:13.064138Z | AUDNZD | LMAX | 1.1344 | 227668 | 1.1356 | 212604 | liquidity_event | 0.881 | | +| 2025-12-18T11:46:13.065041Z | GBPNZD | LMAX | 2.3307 | 2021166 | 2.3337 | 1712096 | normal | 0.308 | | +| 2025-12-18T11:46:13.065187Z | USDCAD | EBS | 1.3837 | 2394978 | 1.3869 | 2300556 | normal | 0.084 | | +| 2025-12-18T11:46:13.065722Z | USDZAR | EBS | 16.7211 | 28107021 | 16.7263 | 23536519 | liquidity_event | 0.151 | | +| 2025-12-18T11:46:13.066128Z | EURAUD | EBS | 1.763 | 810471822 | 1.7712 | 883424752 | news_event | 0.027 | | +| 2025-12-18T11:46:13.066700Z | CADJPY | Currenex | 113.63 | 20300827 | 114.11 | 19720915 | normal | 0.55 | | +| 2025-12-18T11:46:13.071607Z | NZDJPY | Currenex | 89.95 | 35284228 | 90.46 | 30552528 | liquidity_event | 0.69 | | + +## market_data Table + +The `market_data` table contains order book snapshots for currency pairs. Each row represents a complete view of the order book at a specific timestamp, with bid and ask prices and volumes stored as 2D arrays. + +### Schema + +```sql title="market_data table structure" +CREATE TABLE 'market_data' ( + timestamp TIMESTAMP, + symbol SYMBOL CAPACITY 16384 CACHE, + bids DOUBLE[][], + asks DOUBLE[][] +) timestamp(timestamp) PARTITION BY HOUR TTL 3 DAYS; +``` + +### Columns + +- **`timestamp`** - Time of the order book snapshot (designated timestamp) +- **`symbol`** - Currency pair (e.g., EURUSD, GBPJPY) +- **`bids`** - 2D array containing bid prices and volumes: `[[price1, price2, ...], [volume1, volume2, ...]]` +- **`asks`** - 2D array containing ask prices and volumes: `[[price1, price2, ...], [volume1, volume2, ...]]` + +The arrays are structured so that: +- `bids[1]` contains bid prices (descending order - highest first) +- `bids[2]` contains corresponding bid volumes +- `asks[1]` contains ask prices (ascending order - lowest first) +- `asks[2]` contains corresponding ask volumes + +### Sample Query + +```questdb-sql demo title="Recent order book snapshots" +SELECT timestamp, symbol, + array_count(bids[1]) as bid_levels, + array_count(asks[1]) as ask_levels +FROM market_data +WHERE timestamp IN today() +LIMIT -5; +``` + +**Results:** + +| timestamp | symbol | bid_levels | ask_levels | +| --------------------------- | ------ | ---------- | ---------- | +| 2025-12-18T12:04:07.071512Z | EURAUD | 40 | 40 | +| 2025-12-18T12:04:07.072060Z | USDJPY | 40 | 40 | +| 2025-12-18T12:04:07.072554Z | USDMXN | 40 | 40 | +| 2025-12-18T12:04:07.072949Z | USDCAD | 40 | 40 | +| 2025-12-18T12:04:07.073002Z | USDSEK | 40 | 40 | + +Each order book snapshot contains 40 bid levels and 40 ask levels. 
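Since `bids[1]` holds prices sorted highest-first and `asks[1]` holds prices sorted lowest-first, the top of the book can be read from the first element of each price array. A sketch, assuming 1-based element access as in the slicing described above:

```questdb-sql demo title="Top of book and spread from the order book arrays"
SELECT timestamp, symbol,
       bids[1][1] AS best_bid,
       asks[1][1] AS best_ask,
       asks[1][1] - bids[1][1] AS spread
FROM market_data
WHERE timestamp IN today()
LIMIT -5;
```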
+ +## FX Materialized Views + +The FX dataset includes several materialized views providing pre-aggregated data at different time intervals: + +### Best Bid/Offer (BBO) Views + +- **`bbo_1s`** - Best bid and offer aggregated every 1 second +- **`bbo_1m`** - Best bid and offer aggregated every 1 minute +- **`bbo_1h`** - Best bid and offer aggregated every 1 hour +- **`bbo_1d`** - Best bid and offer aggregated every 1 day + +### Core Price Aggregations + +- **`core_price_1s`** - Core prices aggregated every 1 second +- **`core_price_1d`** - Core prices aggregated every 1 day + +### Market Data OHLC + +- **`market_data_ohlc_1m`** - Open, High, Low, Close candlesticks at 1-minute intervals +- **`market_data_ohlc_15m`** - OHLC candlesticks at 15-minute intervals +- **`market_data_ohlc_1d`** - OHLC candlesticks at 1-day intervals + +These views are continuously updated and optimized for dashboard and analytics queries on FX data. + +### FX Data Volume + +- **`market_data`**: Approximately **160 million rows** per day (order book snapshots) +- **`core_price`**: Approximately **73 million rows** per day (price updates across all ECNs and symbols) + +--- + +# Cryptocurrency Trades (Real) + +The cryptocurrency dataset contains **real market data** streamed live from the OKX exchange using FeedHandler. These are actual executed trades, not simulated data. + +## trades Table + +The `trades` table contains real cryptocurrency trade data. Each row represents an actual executed trade for a cryptocurrency pair. + +### Schema + +```sql title="trades table structure" +CREATE TABLE 'trades' ( + symbol SYMBOL CAPACITY 256 CACHE, + side SYMBOL CAPACITY 256 CACHE, + price DOUBLE, + amount DOUBLE, + timestamp TIMESTAMP +) timestamp(timestamp) PARTITION BY DAY WAL; +``` + +### Columns + +- **`timestamp`** - Time when the trade was executed (designated timestamp) +- **`symbol`** - Cryptocurrency trading pair from the 12 tracked symbols (see list below) +- **`side`** - Trade side: **buy** or **sell** +- **`price`** - Execution price of the trade +- **`amount`** - Trade size (volume in base currency) + +The table tracks **12 cryptocurrency pairs**: ADA-USDT, AVAX-USD, BTC-USDT, DAI-USD, DOT-USD, ETH-BTC, ETH-USDT, LTC-USD, SOL-BTC, SOL-USD, UNI-USD, XLM-USD. 
+ +### Sample Data + +```questdb-sql demo title="Recent cryptocurrency trades" +SELECT * FROM trades +LIMIT -10; +``` + +**Results:** + +| symbol | side | price | amount | timestamp | +| -------- | ---- | ------- | ---------- | --------------------------- | +| BTC-USDT | buy | 85721.6 | 0.00045714 | 2025-12-18T19:31:11.203000Z | +| BTC-USD | buy | 85721.6 | 0.00045714 | 2025-12-18T19:31:11.203000Z | +| BTC-USDT | buy | 85726.6 | 0.00001501 | 2025-12-18T19:31:11.206000Z | +| BTC-USD | buy | 85726.6 | 0.00001501 | 2025-12-18T19:31:11.206000Z | +| BTC-USDT | buy | 85726.9 | 0.000887 | 2025-12-18T19:31:11.206000Z | +| BTC-USD | buy | 85726.9 | 0.000887 | 2025-12-18T19:31:11.206000Z | +| BTC-USDT | buy | 85731.3 | 0.00004393 | 2025-12-18T19:31:11.206000Z | +| BTC-USD | buy | 85731.3 | 0.00004393 | 2025-12-18T19:31:11.206000Z | +| ETH-USDT | sell | 2827.54 | 0.006929 | 2025-12-18T19:31:11.595000Z | +| ETH-USD | sell | 2827.54 | 0.006929 | 2025-12-18T19:31:11.595000Z | + +## Cryptocurrency Materialized Views + +The cryptocurrency dataset includes materialized views for aggregated trade data: + +### Trades Aggregations + +- **`trades_latest_1d`** - Latest trade data aggregated daily +- **`trades_OHLC_15m`** - OHLC candlesticks for cryptocurrency trades at 15-minute intervals + +These views are continuously updated and provide faster query performance for cryptocurrency trade analysis. + +### Cryptocurrency Data Volume + +- **`trades`**: Approximately **3.7 million rows** per day (real cryptocurrency trades) + +--- + +## Data Retention + +**FX tables** (`core_price` and `market_data`) use a **3-day TTL (Time To Live)**, meaning data older than 3 days is automatically removed. This keeps the demo instance responsive while providing sufficient recent data. + +**Cryptocurrency trades table** has **no retention policy** and contains historical data dating back to **March 8, 2022**. This provides over 3 years of real cryptocurrency trade history for long-term analysis and backtesting. + +## Using the Demo Data + +You can run queries against both datasets directly on [demo.questdb.io](https://demo.questdb.io). Throughout the Playbook, recipes using demo data will include a direct link to execute the query. + +:::tip +The demo instance is read-only. For testing write operations (INSERT, UPDATE, DELETE), you'll need to run QuestDB locally. See the [Quick Start guide](/docs/quick-start/) for installation instructions. +::: + +:::info Related Documentation +- [SYMBOL type](/docs/concept/symbol/) +- [Arrays in QuestDB](/docs/concept/array/) +- [Designated timestamp](/docs/concept/designated-timestamp/) +- [Time-series aggregations](/docs/reference/function/aggregation/) +::: diff --git a/documentation/playbook/integrations/grafana/dynamic-table-queries.md b/documentation/playbook/integrations/grafana/dynamic-table-queries.md new file mode 100644 index 000000000..babc9eaf7 --- /dev/null +++ b/documentation/playbook/integrations/grafana/dynamic-table-queries.md @@ -0,0 +1,225 @@ +--- +title: Query Multiple Tables Dynamically in Grafana +sidebar_label: Dynamic table queries +description: Use Grafana variables to dynamically query multiple tables with the same schema for time-series visualization +--- + +Query multiple QuestDB tables dynamically in Grafana using dashboard variables. This is useful when you have many tables with identical schemas (e.g., sensor data, metrics from different sources) and want to visualize them together without hardcoding table names in your queries. 
+ +## Problem: Visualize Many Similar Tables + +You have 100+ tables with the same structure (e.g., `sensor_1`, `sensor_2`, ..., `sensor_n`) and want to: +1. Display data from all tables on a single Grafana chart +2. Avoid manually updating queries when tables are added or removed +3. Allow users to select which tables to visualize via dashboard controls + +## Solution: Use Grafana Variables with Dynamic SQL + +Create Grafana dashboard variables that query QuestDB for table names, then use string aggregation functions to build the SQL query dynamically. + +### Step 1: Get Table Names + +First, query QuestDB to get all relevant table names: + +```sql +SELECT table_name FROM tables() +WHERE table_name LIKE 'sensor_%'; +``` + +This returns a list of all tables matching the pattern. + +### Step 2: Create Grafana Variables + +Create two dashboard variables to construct the dynamic query: + +**Variable 1: `$table_list`** - Build the JOIN clause + +```sql +WITH tbs AS ( + SELECT string_agg(table_name, ',') as names + FROM tables() + WHERE table_name LIKE 'sensor_%' +) +SELECT replace(names, ',', ' ASOF JOIN ') FROM tbs; +``` + +**Output:** `sensor_1 ASOF JOIN sensor_2 ASOF JOIN sensor_3 ASOF JOIN sensor_4` + +This creates the table list with ASOF JOIN operators between them. + +**Variable 2: `$column_avgs`** - Build the column list + +```sql +SELECT string_agg(concat('avg(', table_name, '.value)'), ',') as columns +FROM tables() +WHERE table_name LIKE 'sensor_%'; +``` + +**Output:** `avg(sensor_1.value),avg(sensor_2.value),avg(sensor_3.value),avg(sensor_4.value)` + +This creates the column selection list with aggregation functions. + +### Step 3: Use Variables in Dashboard Query + +Now reference these variables in your Grafana chart query: + +```sql +SELECT sensor_1.timestamp, $column_avgs +FROM $table_list +SAMPLE BY 1s FROM $__fromTime TO $__toTime FILL(PREV); +``` + +When Grafana executes this query, it interpolates the variables: + +```sql +SELECT sensor_1.timestamp, avg(sensor_1.value),avg(sensor_2.value),avg(sensor_3.value),avg(sensor_4.value) +FROM sensor_1 ASOF JOIN sensor_2 ASOF JOIN sensor_3 ASOF JOIN sensor_4 +SAMPLE BY 1s FROM cast(1571176800000000 as timestamp) TO cast(1571349600000000 as timestamp) FILL(PREV); +``` + +## How It Works + +The solution uses three key QuestDB features: + +1. **`tables()` function**: Returns metadata about all tables in the database +2. **`string_agg()`**: Concatenates multiple rows into a single comma-separated string +3. **`replace()`**: Swaps commas for JOIN operators to build the FROM clause + +Combined with Grafana's variable interpolation: +- `$column_avgs`: Replaced with the aggregated column list +- `$table_list`: Replaced with the joined table expression +- `$__fromTime` / `$__toTime`: Grafana macros for the dashboard's time range + +### Understanding ASOF JOIN + +`ASOF JOIN` is ideal for time-series data with different update frequencies: +- Joins tables on timestamp +- For each row in the first table, finds the closest past timestamp in other tables +- Works like a LEFT JOIN but with time-based matching + +This ensures that even if tables update at different rates, you get a complete dataset with the most recent known value from each table. 
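As a minimal illustration, with just two of the hypothetical sensor tables the interpolated query reduces to something like this:

```sql
-- Sketch: time-based join of two same-schema tables, holding the last known sensor_2 value
SELECT sensor_1.timestamp, avg(sensor_1.value), avg(sensor_2.value)
FROM sensor_1 ASOF JOIN sensor_2
SAMPLE BY 1s FILL(PREV);
```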
+ +## Adapting the Pattern + +**Filter by different patterns:** +```sql +-- Tables starting with "metrics_" +WHERE table_name LIKE 'metrics_%' + +-- Tables matching a regex pattern +WHERE table_name ~ 'sensor_[0-9]+' + +-- Exclude certain tables +WHERE table_name LIKE 'sensor_%' + AND table_name NOT IN ('sensor_test', 'sensor_backup') +``` + +**Different aggregation functions:** +```sql +-- Maximum values +SELECT string_agg(concat('max(', table_name, '.value)'), ',') + +-- Sum values +SELECT string_agg(concat('sum(', table_name, '.value)'), ',') + +-- Last values (no aggregation needed) +SELECT string_agg(concat(table_name, '.value'), ',') +``` + +**Different join strategies:** +```sql +-- INNER JOIN (only rows with data in all tables) +SELECT replace(names, ',', ' INNER JOIN ') + +-- LEFT JOIN (all rows from first table) +SELECT replace(names, ',', ' LEFT JOIN ') + +-- Add ON clause for explicit joins +SELECT replace(names, ',', ' LEFT JOIN ') || ' ON timestamp' +``` + +**Custom column names:** +```sql +-- Cleaner column names in the chart +SELECT string_agg( + concat('avg(', table_name, '.value) AS ', replace(table_name, 'sensor_', '')), + ',' +) +``` + +Output: `avg(sensor_1.value) AS 1,avg(sensor_2.value) AS 2,...` + +## Programmatic Alternative + +If you're not using Grafana, you can achieve the same result programmatically: + +1. **Query for table names:** + ```sql + SELECT table_name FROM tables() WHERE table_name LIKE 'sensor_%'; + ``` + +2. **Build the query on the client side:** + ```python + # Python example + tables = ['sensor_1', 'sensor_2', 'sensor_3'] + + # Build JOIN clause + join_clause = ' ASOF JOIN '.join(tables) + + # Build column list + columns = ','.join([f'avg({t}.value)' for t in tables]) + + # Final query + query = f""" + SELECT {tables[0]}.timestamp, {columns} + FROM {join_clause} + SAMPLE BY 1s FILL(PREV) + """ + ``` + +## Handling Different Sampling Intervals + +When tables have different update frequencies, use FILL to handle gaps: + +```sql +-- Fill with previous value (holds last known value) +SAMPLE BY 1s FILL(PREV) + +-- Fill with linear interpolation +SAMPLE BY 1s FILL(LINEAR) + +-- Fill with NULL (show actual gaps) +SAMPLE BY 1s FILL(NULL) + +-- Fill with zero +SAMPLE BY 1s FILL(0) +``` + +**Choose based on your data:** +- **PREV**: Best for metrics that persist (temperatures, prices, statuses) +- **LINEAR**: Best for continuous values that change smoothly +- **NULL**: Best when you want to see actual data gaps +- **0 or constant**: Best for counting or rate metrics + +:::tip Performance Optimization +Joining many tables can be expensive. To improve performance: +- Use `SAMPLE BY` to reduce the number of rows +- Add timestamp filters early in the query +- Consider pre-aggregating data into a single table for frequently-accessed views +- Limit the number of tables joined (split into multiple charts if needed) +::: + +:::warning Table Schema Consistency +This pattern assumes all tables have identical schemas. 
If schemas differ: +- The query will fail at runtime +- You'll need to handle missing columns explicitly +- Consider using separate queries for tables with different structures +::: + +:::info Related Documentation +- [ASOF JOIN](/docs/reference/sql/join/#asof-join) +- [tables() function](/docs/reference/function/meta/#tables) +- [string_agg()](/docs/reference/function/aggregation/#string_agg) +- [SAMPLE BY](/docs/reference/sql/sample-by/) +- [Grafana QuestDB data source](https://grafana.com/grafana/plugins/questdb-questdb-datasource/) +::: diff --git a/documentation/playbook/integrations/grafana/overlay-timeshift.md b/documentation/playbook/integrations/grafana/overlay-timeshift.md new file mode 100644 index 000000000..f3ae233e3 --- /dev/null +++ b/documentation/playbook/integrations/grafana/overlay-timeshift.md @@ -0,0 +1,72 @@ +--- +title: Overlay Two Time Series with Time Shift +sidebar_label: Overlay with timeshift +description: Overlay yesterday's and today's data on the same Grafana chart using time shift +--- + +Compare yesterday's data against today's data on the same Grafana chart by overlaying them. + +## Problem + +You have a query with Grafana's `timeshift` set to `1d/d` to display yesterday's data. You want to overlay today's data on the same chart, starting from scratch each day, so you can compare the shapes of both time series. + +## Solution + +Leave the timeshift as `1d/d` to cover yesterday, and add a new query to the same chart. In this new query, filter for timestamp plus 1 day to cover today's datapoints, then shift them back by 1 day for display. + +**Query 1 (Yesterday's data):** +```sql +DECLARE + @symbol := 'BTC-USDT' +WITH sampled AS ( + SELECT + timestamp, symbol, + volume AS volume, + ((open+close)/2) * volume AS traded_value + FROM trades_OHLC_15m + WHERE $__timeFilter(timestamp) + AND symbol = @symbol +), cumulative AS ( + SELECT timestamp, symbol, + SUM(traded_value) + OVER (ORDER BY timestamp) AS cumulative_value, + SUM(volume) + OVER (ORDER BY timestamp) AS cumulative_volume + FROM sampled +) +SELECT timestamp as time, cumulative_value/cumulative_volume AS vwap_yesterday FROM cumulative; +``` + +**Query 2 (Today's data, shifted back):** +```sql +DECLARE + @symbol := 'BTC-USDT' +WITH sampled AS ( + SELECT + timestamp, symbol, + volume AS volume, + ((open+close)/2) * volume AS traded_value + FROM trades_OHLC_15m + WHERE timestamp BETWEEN dateadd('d',1,$__unixEpochFrom()*1000000) + AND dateadd('d',1,$__unixEpochTo() * 1000000) + AND symbol = @symbol +), cumulative AS ( + SELECT timestamp, symbol, + SUM(traded_value) + OVER (ORDER BY timestamp) AS cumulative_value, + SUM(volume) + OVER (ORDER BY timestamp) AS cumulative_volume + FROM sampled +) +SELECT dateadd('d',-1,timestamp) as time, cumulative_value/cumulative_volume AS vwap_today FROM cumulative; +``` + +**Note:** This example uses `$__unixEpochFrom()` and `$__unixEpochTo()` macros from the PostgreSQL Grafana plugin. When using the QuestDB plugin, the equivalent macros are `$__fromTime` and `$__toTime` and don't need epoch conversion as those are native timestamps. + +This creates an overlay chart where yesterday's and today's data align on the same time axis, allowing direct comparison. 
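With the QuestDB data source plugin, the time filter in Query 2 can be sketched directly against those macros, since they resolve to native timestamps:

```sql
WHERE timestamp BETWEEN dateadd('d', 1, $__fromTime)
                    AND dateadd('d', 1, $__toTime)
  AND symbol = @symbol
```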
+ +:::info Related Documentation +- [UNION ALL](/docs/reference/sql/union-except-intersect/) +- [Window functions](/docs/reference/sql/over/) +- [Grafana integration](/docs/third-party-tools/grafana/) +::: diff --git a/documentation/playbook/integrations/grafana/read-only-user.md b/documentation/playbook/integrations/grafana/read-only-user.md new file mode 100644 index 000000000..cbc232c9f --- /dev/null +++ b/documentation/playbook/integrations/grafana/read-only-user.md @@ -0,0 +1,82 @@ +--- +title: Configure Read-Only User for Grafana +sidebar_label: Read-only user +description: Set up a read-only PostgreSQL user for Grafana dashboards while maintaining admin access for DDL operations +--- + +Configure a dedicated read-only user for Grafana to improve security by preventing accidental data modifications through dashboards. This allows you to maintain separate credentials for visualization (read-only) and administration (full access), following the principle of least privilege. + +:::note QuestDB Enterprise +For QuestDB Enterprise, use the comprehensive [Role-Based Access Control (RBAC)](/docs/operations/rbac/) system to create granular user permissions and roles. The configuration below applies to QuestDB Open Source. +::: + +## Problem: Separate Read and Write Access + +You want to: +1. Connect Grafana with read-only credentials +2. Prevent accidental `INSERT`, `UPDATE`, `DELETE`, or `DROP` operations from dashboards +3. Still be able to execute DDL statements (`CREATE TABLE`, etc.) from the QuestDB web console + +However, QuestDB's PostgreSQL wire protocol doesn't support standard PostgreSQL user management commands like `CREATE USER` or `GRANT`. + +## Solution: Enable the Read-Only User + +QuestDB Open Source supports a built-in read-only user that can be enabled via configuration. 
This gives you two users: +- **Admin user** (default: `admin`): Full access for DDL and DML operations +- **Read-only user** (default: `user`): Query-only access for dashboards + +### Configuration + +Add these settings to your `server.conf` file or set them as environment variables: + +**Via server.conf:** +```ini +# Enable the read-only user +pg.readonly.user.enabled=true + +# Optional: Customize username (default is "user") +pg.readonly.user=grafana_reader + +# Optional: Customize password (default is "quest") +pg.readonly.password=secure_password_here +``` + +**Via environment variables:** +```bash +export QDB_PG_READONLY_USER_ENABLED=true +export QDB_PG_READONLY_USER=grafana_reader +export QDB_PG_READONLY_PASSWORD=secure_password_here +``` + +**Via Docker:** +```bash +docker run \ + -p 9000:9000 -p 8812:8812 \ + -e QDB_PG_READONLY_USER_ENABLED=true \ + -e QDB_PG_READONLY_USER=grafana_reader \ + -e QDB_PG_READONLY_PASSWORD=secure_password_here \ + questdb/questdb:latest +``` + +### Using the Read-Only User + +After enabling, you have two separate users: + +**Admin user (web console):** +- Username: `admin` (default) +- Password: `quest` (default) +- Permissions: Full access - `SELECT`, `INSERT`, `UPDATE`, `DELETE`, `CREATE`, `DROP`, `ALTER` +- Use for: QuestDB web console, administrative tasks, schema changes + +**Read-only user (Grafana):** +- Username: `grafana_reader` (or whatever you configured) +- Password: `secure_password_here` (or whatever you configured) +- Permissions: `SELECT` queries only +- Use for: Grafana dashboards, monitoring tools, analytics applications + +:::info Related Documentation +- [PostgreSQL wire protocol](/docs/reference/api/postgres/) +- [QuestDB Enterprise RBAC](/docs/operations/rbac/) +- [Configuration reference](/docs/configuration/) +- [Grafana QuestDB data source](https://grafana.com/grafana/plugins/questdb-questdb-datasource/) +::: diff --git a/documentation/playbook/integrations/grafana/variable-dropdown.md b/documentation/playbook/integrations/grafana/variable-dropdown.md new file mode 100644 index 000000000..a78735a91 --- /dev/null +++ b/documentation/playbook/integrations/grafana/variable-dropdown.md @@ -0,0 +1,270 @@ +--- +title: Grafana Variable Dropdown with Name and Value +sidebar_label: Variable dropdown +description: Create Grafana variable dropdowns that display one value but use another in queries using regex filters +--- + +Create Grafana variable dropdowns where the displayed label differs from the value used in queries. This is useful when you want to show user-friendly names in the dropdown while using different values (like IDs, prices, or technical identifiers) in your actual SQL queries. + +## Problem: Separate Display and Query Values + +You want a Grafana variable dropdown that: +- **Displays:** Readable labels like `"BTC-USDT"`, `"ETH-USDT"`, `"SOL-USDT"` +- **Uses in queries:** Different values like prices (`37779.62`, `2615.54`, `98.23`) or IDs + +For example, with this query result: + +| symbol | price | +|------------|---------| +| BTC-USDT | 37779.62| +| ETH-USDT | 2615.54 | +| SOL-USDT | 98.23 | + +You want the dropdown to show `"BTC-USDT"` but use `37779.62` in your queries. + +## Solution: Use Regex Variable Filters + +When using the QuestDB data source plugin, you can use Grafana's regex variable filters to parse a concatenated string into separate `text` and `value` fields. 
### Step 1: Concatenate Columns in Query

First, combine both columns into a single string with a separator that doesn't appear in your data:

```sql
WITH t AS (
  SELECT symbol, first(price) as price
  FROM trades
  WHERE symbol LIKE '%BTC%'
)
SELECT concat(symbol, '#', price) FROM t;
```

**Query results:**
```
DOGE-BTC#0.00000204
ETH-BTC#0.05551
BTC-USDT#37779.62
SOL-BTC#0.0015282
MATIC-BTC#0.00002074
BTC-USDC#60511.1
```

Each row is now a single string with symbol and price separated by `#`.

### Step 2: Apply Regex Filter in Grafana Variable

In your Grafana variable configuration:

**Query:**
```sql
WITH t AS (
  SELECT symbol, first(price) as price
  FROM trades
  WHERE symbol LIKE '%BTC%'
)
SELECT concat(symbol, '#', price) FROM t;
```

**Regex Filter:**
```regex
/(?<text>[^#]+)#(?<value>.*)/
```

This regex pattern:
- `(?<text>[^#]+)`: Captures everything before `#` into the `text` group (the display label)
- `#`: Matches the separator
- `(?<value>.*)`: Captures everything after `#` into the `value` group (the query value)

### Step 3: Use Variable in Queries

Now you can reference the variable in your dashboard queries:

```sql
SELECT timestamp, price
FROM trades
WHERE price = $your_variable_name
  AND timestamp >= $__fromTime
  AND timestamp <= $__toTime;
```

When a user selects "BTC-USDT" from the dropdown, Grafana will substitute the corresponding price value (`37779.62`) into the query.

## How It Works

Grafana's regex filter with named capture groups enables the separation:

1. **Named capture groups**: `(?<text>...)` and `(?<value>...)` tell Grafana which parts to use
2. **`text` group**: Becomes the visible label in the dropdown
3. **`value` group**: Becomes the interpolated value in queries
4. **Pattern matching**: The regex must match the entire string returned by your query

### Regex Pattern Breakdown

```regex
/(?<text>[^#]+)#(?<value>.*)/
```

- `/`: Regex delimiters
- `(?<text>...)`: Named capture group called "text"
- `[^#]+`: One or more characters that are NOT `#` (greedy match)
- `#`: Literal separator character
- `(?<value>...)`: Named capture group called "value"
- `.*`: Zero or more characters of any type (captures rest of string)

## Choosing a Separator

Pick a separator that **never** appears in your data:

**Good separators:**
- `#` - Uncommon in most data
- `|` - Clear visual separator
- `::` - Two characters, unlikely to appear
- `~` - Rarely used in trading symbols or prices
- `^^^` - Multi-character separator for extra safety

**Bad separators:**
- `-` - Common in trading pairs (BTC-USDT)
- `.` - Common in decimal numbers
- `,` - Common in CSV-like data
- Space - Can cause parsing issues

## Alternative Patterns

### Multiple Data Fields

If you need more than two fields, use additional separators:

```sql
SELECT concat(symbol, '#', price, '#', volume) FROM trades;
```

```regex
/(?<text>[^#]+)#(?<value>[^#]+)#(.*)/
```

Now you have three captured groups, though Grafana's variable system typically only uses `text` and `value`.

### Numeric IDs with Descriptions

Common pattern for entity selection:

```sql
SELECT concat(name, '#', id) FROM users;
```

```regex
/(?<text>[^#]+)#(?<value>\d+)/
```

Output in dropdown: User sees "John Doe", query uses `42`.
### Escaping Special Characters

If your data contains regex special characters, escape them in the pattern:

```sql
-- If data contains parentheses
SELECT concat(name, ' (', id, ')', '#', id) FROM users;
-- Result: "John Doe (42)#42"
```

```regex
/(?<text>.*?)#(?<value>\d+)/
```

## PostgreSQL Data Source Alternative

If using the PostgreSQL data source (instead of the QuestDB plugin), you can use special column aliases:

```sql
SELECT
  symbol AS __text,
  price AS __value
FROM trades
WHERE symbol LIKE '%BTC%';
```

The PostgreSQL data source recognizes `__text` and `__value` as special column names for dropdown variables.

**Note:** This works with the PostgreSQL data source plugin pointing to QuestDB, but NOT with the native QuestDB data source plugin.

## Adapting the Pattern

**Different filter conditions:**
```sql
-- Filter by time range
WHERE timestamp IN yesterday()

-- Filter by multiple criteria
WHERE symbol LIKE '%USDT' AND price > 1000

-- Dynamic filter using another variable
WHERE symbol LIKE concat('%', $base_currency, '%')
```

**Sorting the dropdown:**
```sql
-- Sort alphabetically by symbol
SELECT concat(symbol, '#', price) FROM trades
ORDER BY symbol;

-- Sort by price (highest first)
SELECT concat(symbol, '#', price) FROM trades
ORDER BY price DESC;

-- Sort by volume
WITH t AS (
  SELECT symbol, first(price) as price, sum(amount) as volume
  FROM trades
  GROUP BY symbol
)
SELECT concat(symbol, '#', price) FROM t
ORDER BY volume DESC;
```

**Include additional context in label:**
```sql
-- Show symbol and volume in the label
SELECT concat(symbol, ' (Vol: ', round(sum(amount), 2), ')', '#', first(price))
FROM trades
GROUP BY symbol;
```

Result: "BTC-USDT (Vol: 1234.56)#37779.62"

## Troubleshooting

**Dropdown shows concatenated string:**
- Verify the regex pattern is correct
- Check that the regex delimiters are `/.../` (forward slashes)
- Ensure named capture groups are spelled correctly: `(?<text>...)` and `(?<value>...)`

**Variable not interpolating in queries:**
- Verify you're using `$variable_name` syntax in queries
- Check that the variable is defined at the dashboard level
- Test the query manually with a hardcoded value

**Regex not matching:**
- Test your regex pattern with a regex tester (regex101.com)
- Verify your separator doesn't appear in the data itself
- Check for trailing whitespace in query results

**Dropdown is empty:**
- Verify the query returns data
- Check that QuestDB is accessible from Grafana
- Review Grafana logs for error messages

:::tip Multi-Select Variables
This pattern works with multi-select variables too. Enable "Multi-value" in the variable configuration, and users can select multiple options. Use `IN ($variable)` in your queries to handle multiple selected values.
:::

:::tip Variable Preview
Grafana shows a preview of what the dropdown will look like when you configure the regex filter. Use this to verify your pattern is working correctly before applying it.
+::: + +:::info Related Documentation +- [Grafana variables documentation](https://grafana.com/docs/grafana/latest/dashboards/variables/) +- [Grafana regex filters](https://grafana.com/docs/grafana/latest/dashboards/variables/add-template-variables/#filter-variables-with-regex) +- [concat() function](/docs/reference/function/text/#concat) +- [Grafana QuestDB data source](https://grafana.com/grafana/plugins/questdb-questdb-datasource/) +::: diff --git a/documentation/playbook/integrations/opcua-dense-format.md b/documentation/playbook/integrations/opcua-dense-format.md new file mode 100644 index 000000000..4278cd3fc --- /dev/null +++ b/documentation/playbook/integrations/opcua-dense-format.md @@ -0,0 +1,305 @@ +--- +title: Collect OPC-UA Data with Telegraf in Dense Format +sidebar_label: OPC-UA dense format +description: Configure Telegraf to merge sparse OPC-UA metrics into dense rows for efficient storage and querying in QuestDB +--- + +Configure Telegraf to collect OPC-UA industrial automation data and insert it into QuestDB in a dense format. By default, Telegraf creates one row per metric with sparse columns, but for QuestDB it's more efficient to merge all metrics from the same timestamp into a single dense row. + +## Problem: Sparse Data Format + +When using Telegraf's OPC-UA input plugin with the default configuration, each metric value generates a separate row. Even when multiple metrics are collected at the same timestamp, they arrive as individual sparse rows: + +**Sparse format (inefficient):** + +| timestamp | ServerLoad | ServerRAM | ServerIO | +|------------------------------|------------|-----------|----------| +| 2024-01-15T10:00:00.000000Z | 45.2 | NULL | NULL | +| 2024-01-15T10:00:00.000000Z | NULL | 8192.0 | NULL | +| 2024-01-15T10:00:00.000000Z | NULL | NULL | 1250.5 | + +This wastes storage space and makes queries more complex. + +**Dense format (efficient):** + +| timestamp | ServerLoad | ServerRAM | ServerIO | +|------------------------------|------------|-----------|----------| +| 2024-01-15T10:00:00.000000Z | 45.2 | 8192.0 | 1250.5 | + +## Solution: Use Telegraf's Merge Aggregator + +Configure Telegraf to merge metrics with matching timestamps and tags before sending to QuestDB. This requires two key changes: + +1. Add a common tag to all metrics +2. Use the `merge` aggregator to combine rows + +### Complete Configuration + +```toml +[agent] + omit_hostname = true + +# OPC-UA Input Plugin +[[inputs.opcua]] + endpoint = "${OPCUA_ENDPOINT}" + connect_timeout = "30s" + request_timeout = "30s" + security_policy = "None" + security_mode = "None" + auth_method = "Anonymous" + name_override = "${METRICS_TABLE_NAME}" + + [[inputs.opcua.nodes]] + name = "ServerLoad" + namespace = "2" + identifier_type = "s" + identifier = "Server/Load" + default_tags = { source="opcua_merge" } + + [[inputs.opcua.nodes]] + name = "ServerRAM" + namespace = "2" + identifier_type = "s" + identifier = "Server/RAM" + default_tags = { source="opcua_merge" } + + [[inputs.opcua.nodes]] + name = "ServerIO" + namespace = "2" + identifier_type = "s" + identifier = "Server/IO" + default_tags = { source="opcua_merge" } + +# Merge Aggregator +[[aggregators.merge]] + drop_original = true + tags = ["source"] + +# QuestDB Output via ILP +[[outputs.influxdb_v2]] + urls = ["${QUESTDB_HTTP_ENDPOINT}"] + token = "${QUESTDB_HTTP_TOKEN}" + content_encoding = "identity" +``` + +### Key Configuration Elements + +**1. 
Common Tag** + +```toml +default_tags = { source="opcua_merge" } +``` + +Adds the same tag value (`source="opcua_merge"`) to all metrics. The merge aggregator uses this to identify which metrics should be combined. + +**2. Merge Aggregator** + +```toml +[[aggregators.merge]] + drop_original = true + tags = ["source"] +``` + +- `drop_original = true`: Discards the original sparse rows after merging +- `tags = ["source"]`: Merges metrics with matching `source` tag values and the same timestamp + +**3. QuestDB Output** + +```toml +[[outputs.influxdb_v2]] + urls = ["${QUESTDB_HTTP_ENDPOINT}"] + content_encoding = "identity" +``` + +- Uses the InfluxDB Line Protocol (ILP) over HTTP +- `content_encoding = "identity"`: Disables gzip compression (QuestDB doesn't require it) + +## How It Works + +The data flow is: + +1. **OPC-UA server** → Telegraf collects metrics +2. **Telegraf input** → Creates separate rows for each metric with the `source="opcua_merge"` tag +3. **Merge aggregator** → Combines rows with matching timestamp + `source` tag +4. **QuestDB output** → Sends merged dense rows via ILP + +### Merging Logic + +The merge aggregator combines metrics when: +- **Timestamps match**: Metrics collected at the same moment +- **Tags match**: All specified tags (in this case, `source`) have the same values + +If metrics have different timestamps or tag values, they won't be merged. + +## Handling Tag Conflicts + +If your OPC-UA nodes have additional tags with **different** values, those tags will prevent merging. Solutions: + +### Remove Conflicting Tags + +Use the `override` processor to remove unwanted tags: + +```toml +[[processors.override]] + [processors.override.tags] + node_id = "" # Removes the 'node_id' tag + namespace = "" # Removes the 'namespace' tag +``` + +### Convert Tags to Fields + +Use the `converter` processor to convert tags to fields (fields don't affect merging): + +```toml +[[processors.converter]] + [processors.converter.tags] + string = ["node_id", "namespace"] +``` + +This converts the tags to string fields, which won't interfere with the merge aggregator. 
+ +### Remove the Common Tag After Merging + +If you don't want the `source` tag in your final QuestDB table: + +```toml +# Place this AFTER the merge aggregator +[[processors.override]] + [processors.override.tags] + source = "" # Removes the 'source' tag +``` + +## Environment Variables + +Use environment variables for sensitive configuration: + +```bash +export OPCUA_ENDPOINT="opc.tcp://your-opcua-server:4840" +export METRICS_TABLE_NAME="industrial_metrics" +export QUESTDB_HTTP_ENDPOINT="http://questdb-host:9000" +export QUESTDB_HTTP_TOKEN="your_token_here" +``` + +Alternatively, use a `.env` file: + +```bash +# .env file +OPCUA_ENDPOINT=opc.tcp://localhost:4840 +METRICS_TABLE_NAME=opcua_metrics +QUESTDB_HTTP_ENDPOINT=http://localhost:9000 +QUESTDB_HTTP_TOKEN= +``` + +Then start Telegraf with: + +```bash +telegraf --config telegraf.conf +``` + +## Verification + +Query QuestDB to verify the data format: + +```sql +SELECT * FROM opcua_metrics +ORDER BY timestamp DESC +LIMIT 10; +``` + +**Expected: Dense rows** with all metrics populated: + +| timestamp | source | ServerLoad | ServerRAM | ServerIO | +|------------------------------|-------------|------------|-----------|----------| +| 2024-01-15T10:05:00.000000Z | opcua_merge | 47.8 | 8256.0 | 1305.2 | +| 2024-01-15T10:04:00.000000Z | opcua_merge | 45.2 | 8192.0 | 1250.5 | + +**Problem: Sparse rows** with NULL values: + +| timestamp | source | ServerLoad | ServerRAM | ServerIO | +|------------------------------|-------------|------------|-----------|----------| +| 2024-01-15T10:05:00.000000Z | opcua_merge | 47.8 | NULL | NULL | +| 2024-01-15T10:05:00.000000Z | opcua_merge | NULL | 8256.0 | NULL | + +If you see sparse rows, check: +- All nodes have the same `default_tags` +- The merge aggregator is configured correctly +- Timestamps are identical (not just close) + +## Alternative: TCP Output + +For higher throughput, use TCP instead of HTTP: + +```toml +[[outputs.socket_writer]] + address = "tcp://questdb-host:9009" +``` + +**Differences:** +- **TCP**: Higher throughput, no acknowledgments, potential data loss on connection failure +- **HTTP**: Reliable delivery, acknowledgments, slightly lower throughput + +Choose TCP when: +- You need maximum performance +- Occasional data loss is acceptable +- You're on a reliable local network + +Choose HTTP when: +- Data integrity is critical +- You need error feedback +- You're sending over the internet + +## Multiple OPC-UA Sources + +To collect from multiple OPC-UA servers into separate tables: + +```toml +# Server 1 +[[inputs.opcua]] + endpoint = "opc.tcp://server1:4840" + name_override = "server1_metrics" + [[inputs.opcua.nodes]] + name = "Temperature" + namespace = "2" + identifier_type = "s" + identifier = "Sensor/Temp" + default_tags = { source="server1" } + +# Server 2 +[[inputs.opcua]] + endpoint = "opc.tcp://server2:4840" + name_override = "server2_metrics" + [[inputs.opcua.nodes]] + name = "Pressure" + namespace = "2" + identifier_type = "s" + identifier = "Sensor/Press" + default_tags = { source="server2" } + +# Merge by source tag +[[aggregators.merge]] + drop_original = true + tags = ["source"] +``` + +This creates two tables (`server1_metrics`, `server2_metrics`) with merged metrics from each server. 
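In either setup, what reaches QuestDB after the aggregator runs is one dense ILP line per timestamp and tag set, roughly of this shape (hypothetical values):

```
industrial_metrics,source=opcua_merge ServerLoad=45.2,ServerRAM=8192.0,ServerIO=1250.5 1705312800000000000
```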
+ +:::tip Performance Tuning +For high-frequency OPC-UA data: +- Increase Telegraf's `flush_interval` to batch more data +- Use `aggregators.merge.period` to specify merge window duration +- Monitor QuestDB's ingestion rate and adjust accordingly +::: + +:::warning Timestamp Precision +OPC-UA timestamps may have different precision than QuestDB expects. Ensure: +- Telegraf agent precision matches your requirements (default: nanoseconds) +- OPC-UA server timestamps are synchronized (use NTP) +- Clock drift between systems is minimal +::: + +:::info Related Documentation +- [Telegraf OPC-UA plugin](https://github.com/influxdata/telegraf/tree/master/plugins/inputs/opcua) +- [Telegraf merge aggregator](https://github.com/influxdata/telegraf/tree/master/plugins/aggregators/merge) +- [QuestDB ILP reference](/docs/reference/api/ilp/overview/) +- [InfluxDB Line Protocol](/docs/reference/api/ilp/overview/) +::: diff --git a/documentation/playbook/operations/check-transaction-applied.md b/documentation/playbook/operations/check-transaction-applied.md new file mode 100644 index 000000000..d52c4026d --- /dev/null +++ b/documentation/playbook/operations/check-transaction-applied.md @@ -0,0 +1,35 @@ +--- +title: Check Transaction Applied After Ingestion +sidebar_label: Check transaction applied +description: Verify that all ingested rows to a WAL table are visible for queries in QuestDB +--- + +When ingesting data to a WAL table using ILP protocol, inserts are asynchronous. This playbook shows how to ensure all ingested rows are visible for read-only queries. + +## Problem + +You're performing a single-time ingestion of a large data volume using ILP protocol to a table that uses Write-Ahead Log (WAL). Since inserts are asynchronous, you need to confirm that all ingested rows are visible for read-only queries before proceeding with operations. + +## Solution + +Query the `wal_tables()` function to check if the writer transaction matches the sequencer transaction. When these values match, all rows have become visible: + +```questdb-sql demo title="Check applied transactions from WAL files" +SELECT * +FROM wal_tables() +WHERE name = 'core_price' AND writerTxn = sequencerTxn; +``` + +This query returns a row when `writerTxn` equals `sequencerTxn` for your table: +- `writerTxn` is the last committed transaction available for read-only queries +- `sequencerTxn` is the last transaction committed to WAL + +When they match, all WAL transactions have been applied and all rows are visible for queries. + +Another viable approach is to run `SELECT count(*) FROM my_table` and verify the expected row count. + +:::info Related Documentation +- [Write-Ahead Log concept](/docs/concept/write-ahead-log/) +- [Meta functions reference](/docs/reference/function/meta/) +- [InfluxDB Line Protocol overview](/docs/reference/api/ilp/overview/) +::: diff --git a/documentation/playbook/operations/copy-data-between-instances.md b/documentation/playbook/operations/copy-data-between-instances.md new file mode 100644 index 000000000..94b74b9e0 --- /dev/null +++ b/documentation/playbook/operations/copy-data-between-instances.md @@ -0,0 +1,39 @@ +--- +title: Copy Data Between QuestDB Instances +sidebar_label: Copy data between instances +description: Copy a subset of data from production to development QuestDB instances +--- + +Copy a subset of data from one QuestDB instance to another for testing or development purposes. + +## Problem + +You want to copy data between QuestDB instances. 
This method allows you to copy any arbitrary query result, but if you want a full database copy please check the [backup and restore documentation](/docs/operations/backup/). + +## Solution: Table2Ilp Utility + +QuestDB ships with a `utils` folder that includes a tool to read from one instance (using the PostgreSQL protocol) and write into another (using ILP). + +You would need to [compile the jar](https://github.com/questdb/questdb/tree/master/utils), and then use it like this: + +```shell +java -cp utils.jar io.questdb.cliutil.Table2Ilp \ + -d trades \ + -dilp "https::addr=localhost:9000;username=admin;password=quest;" \ + -s "trades WHERE start_time in '2022-06'" \ + -sc "jdbc:postgresql://localhost:8812/qdb?user=account&password=secret&ssl=false" \ + -sym "ticker,exchange" \ + -sts start_time +``` + +This reads from the source instance using PostgreSQL wire protocol and writes to the destination using ILP. + +## Alternative: Export Endpoint + +You can also use [the export endpoint](/docs/reference/api/rest/#exp---export-data) to export data to CSV or other formats. + +:::info Related Documentation +- [ILP ingestion](/docs/ingestion-overview/) +- [PostgreSQL wire protocol](/docs/reference/api/postgres/) +- [REST API export](/docs/reference/api/rest/#exp---export-data) +::: diff --git a/documentation/playbook/operations/csv-import-milliseconds.md b/documentation/playbook/operations/csv-import-milliseconds.md new file mode 100644 index 000000000..4ea485be2 --- /dev/null +++ b/documentation/playbook/operations/csv-import-milliseconds.md @@ -0,0 +1,67 @@ +--- +title: Import CSV with Millisecond Timestamps +sidebar_label: CSV import with milliseconds +description: Import CSV files with epoch millisecond timestamps into QuestDB +--- + +Import CSV files containing epoch timestamps in milliseconds into QuestDB, which expects microseconds. + +## Problem + +QuestDB does not support flags for timestamp conversion during CSV import. + +## Solution Options + +Here are the options available: + +### Option 1: Pre-process the Dataset + +Convert timestamps from milliseconds to microseconds before import. If importing lots of data, create Parquet files, copy them to the QuestDB import folder, and read them with `read_parquet('file.parquet')`. Then use `INSERT INTO SELECT` to copy to another table. + +### Option 2: Staging Table + +Import into a non-partitioned table as DATE, then `INSERT INTO` a partitioned table as TIMESTAMP: + +```sql +-- Create staging table +CREATE TABLE trades_staging ( + timestamp_ms LONG, + symbol SYMBOL, + price DOUBLE, + amount DOUBLE +); + +-- Import CSV to staging table (via web console or REST API) + +-- Create final table +CREATE TABLE trades ( + timestamp TIMESTAMP, + symbol SYMBOL INDEX, + price DOUBLE, + amount DOUBLE +) TIMESTAMP(timestamp) PARTITION BY DAY; + +-- Convert and insert +INSERT INTO trades +SELECT + cast(timestamp_ms * 1000 AS TIMESTAMP) as timestamp, + symbol, + price, + amount +FROM trades_staging; + +-- Drop staging table +DROP TABLE trades_staging; +``` + +You would be using twice the storage temporarily, but then you can drop the initial staging table. + +### Option 3: ILP Client + +Read the CSV line-by-line and convert, then send via the ILP client. 
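A minimal sketch of this approach with the QuestDB Python client, assuming a CSV whose columns are named `timestamp_ms`, `symbol`, `price`, and `amount` (adjust to your file):

```python
import csv
from questdb.ingress import Sender, TimestampNanos

# HTTP endpoint of the target QuestDB instance
conf = "http::addr=localhost:9000;"

with Sender.from_conf(conf) as sender, open("trades.csv", newline="") as f:
    for row in csv.DictReader(f):
        sender.row(
            "trades",
            symbols={"symbol": row["symbol"]},
            columns={"price": float(row["price"]), "amount": float(row["amount"])},
            # Convert epoch milliseconds to nanoseconds for the designated timestamp
            at=TimestampNanos(int(row["timestamp_ms"]) * 1_000_000),
        )
# Buffered rows are flushed automatically when the sender closes
```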
+ +:::info Related Documentation +- [CSV import](/docs/web-console/import-csv/) +- [ILP ingestion](/docs/ingestion-overview/) +- [read_parquet()](/docs/reference/function/parquet/) +::: diff --git a/documentation/playbook/operations/docker-compose-config.md b/documentation/playbook/operations/docker-compose-config.md new file mode 100644 index 000000000..86b739755 --- /dev/null +++ b/documentation/playbook/operations/docker-compose-config.md @@ -0,0 +1,137 @@ +--- +title: Configure QuestDB with Docker Compose +sidebar_label: Docker Compose config +description: Override QuestDB configuration parameters using environment variables in Docker Compose +--- + +You can override any QuestDB configuration parameter using environment variables in Docker Compose. This is useful for setting custom ports, authentication credentials, memory limits, and other operational settings without modifying configuration files. + +## Environment Variable Format + +To override configuration parameters via environment variables: + +1. **Prefix with `QDB_`**: Add `QDB_` before the parameter name +2. **Capitalize**: Convert to uppercase +3. **Replace dots with underscores**: Change `.` to `_` + +For example: +- `pg.user` becomes `QDB_PG_USER` +- `pg.password` becomes `QDB_PG_PASSWORD` +- `cairo.sql.copy.buffer.size` becomes `QDB_CAIRO_SQL_COPY_BUFFER_SIZE` + +## Example: Custom PostgreSQL Credentials + +This Docker Compose file overrides the default PostgreSQL wire protocol credentials: + +```yaml title="docker-compose.yml - Override pg.user and pg.password" +version: "3.9" + +services: + questdb: + image: questdb/questdb + container_name: custom_questdb + restart: always + ports: + - "8812:8812" + - "9000:9000" + - "9009:9009" + - "9003:9003" + extra_hosts: + - "host.docker.internal:host-gateway" + environment: + - QDB_PG_USER=borat + - QDB_PG_PASSWORD=clever_password + volumes: + - ./questdb/questdb_root:/var/lib/questdb/ +``` + +This configuration: +- Sets PostgreSQL wire protocol username to `borat` +- Sets password to `clever_password` +- Persists data to `./questdb/questdb_root` on the host machine +- Exposes all QuestDB ports (web console, HTTP, ILP, PostgreSQL wire) + +## Common Configuration Examples + +### Increase Write Buffer Size + +```yaml title="Increase buffer sizes for high-throughput writes" +environment: + - QDB_CAIRO_SQL_COPY_BUFFER_SIZE=4194304 + - QDB_LINE_TCP_MAINTENANCE_JOB_INTERVAL=500 +``` + +### Configure Memory Limits + +```yaml title="Set memory allocation for query execution" +environment: + - QDB_CAIRO_SQL_PAGE_FRAME_MAX_ROWS=1000000 + - QDB_CAIRO_SQL_PAGE_FRAME_MIN_ROWS=100000 +``` + +### Enable Debug Logging + +```yaml title="Enable verbose logging for troubleshooting" +environment: + - QDB_LOG_LEVEL=DEBUG + - QDB_LOG_BUFFER_SIZE=1048576 +``` + +### Custom Data Directory Permissions + +```yaml title="Run with specific user/group for volume permissions" +services: + questdb: + image: questdb/questdb + user: "1000:1000" + environment: + - QDB_CAIRO_ROOT=/var/lib/questdb + volumes: + - ./questdb_data:/var/lib/questdb +``` + +## Complete Configuration Reference + +For a full list of available configuration parameters, see: +- [Server Configuration Reference](/docs/configuration/) - All configurable parameters with descriptions +- [Docker Deployment Guide](/docs/deployment/docker/) - Docker-specific setup instructions + +## Verifying Configuration + +After starting QuestDB with custom configuration, verify the settings: + +1. 
**Check logs**: View container logs to confirm configuration was applied: + ```bash + docker compose logs questdb + ``` + +2. **Query system tables**: Connect via PostgreSQL wire protocol and query configuration: + ```sql + SELECT * FROM sys.config; + ``` + +3. **Web console**: Access the web console at `http://localhost:9000` and check the "Configuration" tab + +:::tip +Keep sensitive configuration like passwords in a `.env` file and reference them in `docker-compose.yml`: + +```yaml +environment: + - QDB_PG_PASSWORD=${QUESTDB_PASSWORD} +``` + +Then create a `.env` file: +``` +QUESTDB_PASSWORD=your_secure_password +``` +::: + +:::warning Volume Permissions +If you encounter permission errors with mounted volumes, ensure the QuestDB container user has write access to the host directory. You may need to set ownership with `chown -R 1000:1000 ./questdb_root` or run the container with a specific user ID. +::: + +:::info Related Documentation +- [Server Configuration](/docs/configuration/) +- [Docker Deployment Guide](/docs/deployment/docker/) +- [PostgreSQL Wire Protocol](/docs/reference/api/postgres/) +::: diff --git a/documentation/playbook/operations/optimize-many-tables.md b/documentation/playbook/operations/optimize-many-tables.md new file mode 100644 index 000000000..3d0b4e570 --- /dev/null +++ b/documentation/playbook/operations/optimize-many-tables.md @@ -0,0 +1,37 @@ +--- +title: Optimize Disk and Memory Usage with Many Tables +sidebar_label: Optimize for many tables +description: Reduce memory and disk usage when running QuestDB with many tables by adjusting memory allocation and disk chunk sizes +--- + +When operating QuestDB with many tables, the default settings may consume more memory and disk space than necessary. This playbook shows how to optimize these resources. + +## Problem + +QuestDB allocates memory for out-of-order inserts per column and table. With the default setting of `cairo.o3.column.memory.size=256K`, each table and column uses 512K of memory (2x the configured size). When you have many tables, this memory overhead can become significant. + +Similarly, QuestDB allocates disk space in chunks for columns and indexes. While larger chunks make sense for a single large table, multiple smaller tables benefit from smaller allocation sizes, which can noticeably decrease disk storage usage. + +## Solution + +Reduce memory allocation for out-of-order inserts by setting a smaller `cairo.o3.column.memory.size`. Start with 128K and adjust based on your needs: + +``` +cairo.o3.column.memory.size=128K +``` + +Reduce disk space allocation by configuring smaller page sizes for data and indexes: + +``` +cairo.system.writer.data.append.page.size=128K +cairo.writer.data.append.page.size=128K +cairo.writer.data.index.key.append.page.size=128K +cairo.writer.data.index.value.append.page.size=128K +``` + +These settings should be added to your `server.conf` file or set as environment variables. 
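For example, when configuring via environment variables instead of `server.conf`, the same settings follow the usual `QDB_` naming convention (uppercase, dots replaced by underscores):

```bash
export QDB_CAIRO_O3_COLUMN_MEMORY_SIZE=128K
export QDB_CAIRO_SYSTEM_WRITER_DATA_APPEND_PAGE_SIZE=128K
export QDB_CAIRO_WRITER_DATA_APPEND_PAGE_SIZE=128K
export QDB_CAIRO_WRITER_DATA_INDEX_KEY_APPEND_PAGE_SIZE=128K
export QDB_CAIRO_WRITER_DATA_INDEX_VALUE_APPEND_PAGE_SIZE=128K
```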
+ +:::info Related Documentation +- [Configuration reference](/docs/configuration/) +- [Capacity planning](/docs/operations/capacity-planning/) +::: diff --git a/documentation/playbook/operations/query-times-histogram.md b/documentation/playbook/operations/query-times-histogram.md new file mode 100644 index 000000000..c1e4ee87a --- /dev/null +++ b/documentation/playbook/operations/query-times-histogram.md @@ -0,0 +1,113 @@ +--- +title: Query Performance Histogram +sidebar_label: Query times histogram +description: Create histogram of query execution times using _query_trace table +--- + +Create a histogram of query execution times using the `_query_trace` system table. + +## Solution: Percentile-Based Histogram + +We can create a subquery that first calculates the percentiles for each bucket, in this case at 10% intervals. Then on a second query we can do a `UNION` of 10 subqueries where each is doing a `CROSS JOIN` against the calculated percentiles and finding how many queries are below the threshold for the bucket. + +Note in this case the histogram is cumulative, and each bucket includes the results from the smaller buckets as well. If we prefer non-cumulative, the condition would change from less than to `BETWEEN`. + +```sql +WITH quantiles AS ( + SELECT + approx_percentile(execution_micros, 0.10, 5) AS p10, + approx_percentile(execution_micros, 0.20, 5) AS p20, + approx_percentile(execution_micros, 0.30, 5) AS p30, + approx_percentile(execution_micros, 0.40, 5) AS p40, + approx_percentile(execution_micros, 0.50, 5) AS p50, + approx_percentile(execution_micros, 0.60, 5) AS p60, + approx_percentile(execution_micros, 0.70, 5) AS p70, + approx_percentile(execution_micros, 0.80, 5) AS p80, + approx_percentile(execution_micros, 0.90, 5) AS p90, + approx_percentile(execution_micros, 1.0, 5) AS p100 + FROM _query_trace +), cumulative_hist AS ( +SELECT '10' AS bucket, p10 as micros_threshold, count(*) AS frequency +FROM _query_trace CROSS JOIN quantiles +WHERE execution_micros < p10 + +UNION ALL + +SELECT '20', p20 as micros_threshold, count(*) +FROM _query_trace CROSS JOIN quantiles +WHERE execution_micros < p20 + +UNION ALL + +SELECT '30', p30 as micros_threshold, count(*) +FROM _query_trace CROSS JOIN quantiles +WHERE execution_micros < p30 + +UNION ALL + +SELECT '40', p40 as micros_threshold, count(*) +FROM _query_trace CROSS JOIN quantiles +WHERE execution_micros < p40 + +UNION ALL + +SELECT '50', p50 as micros_threshold, count(*) +FROM _query_trace CROSS JOIN quantiles +WHERE execution_micros < p50 + +UNION ALL + +SELECT '60', p60 as micros_threshold, count(*) +FROM _query_trace CROSS JOIN quantiles +WHERE execution_micros < p60 + +UNION ALL + +SELECT '70', p70 as micros_threshold, count(*) +FROM _query_trace CROSS JOIN quantiles +WHERE execution_micros < p70 + +UNION ALL + +SELECT '80', p80 as micros_threshold, count(*) +FROM _query_trace CROSS JOIN quantiles +WHERE execution_micros < p80 + +UNION ALL + +SELECT '90', p90 as micros_threshold, count(*) +FROM _query_trace CROSS JOIN quantiles +WHERE execution_micros < p90 + +UNION ALL + +SELECT '100', p100 as micros_threshold, count(*) +FROM _query_trace CROSS JOIN quantiles + ) + SELECT * FROM cumulative_hist; +``` + +**Output:** + +```csv +"bucket","micros_threshold","frequency" +"10",215.0,26 +"20",348.0,53 +"30",591.0,80 +"40",819.0,106 +"50",1088.0,133 +"60",1527.0,160 +"70",2293.0,186 +"80",4788.0,213 +"90",23016.0,240 +"100",1078759.0,267 +``` + +:::note Enable Query Tracing +Query tracing needs to be enabled for the `_query_trace` 
table to be populated. See the [configuration documentation](/docs/configuration/) for details.
+:::
+
+:::info Related Documentation
+- [Query tracing](/docs/concept/query-tracing/)
+- [approx_percentile() function](/docs/reference/function/aggregation/#approx_percentile)
+:::
diff --git a/documentation/playbook/operations/show-non-default-params.md b/documentation/playbook/operations/show-non-default-params.md
new file mode 100644
index 000000000..3ca535401
--- /dev/null
+++ b/documentation/playbook/operations/show-non-default-params.md
@@ -0,0 +1,27 @@
+---
+title: Show Parameters with Non-Default Values
+sidebar_label: Show non-default params
+description: List all QuestDB configuration parameters that have been modified from their default values
+---
+
+When troubleshooting or auditing your QuestDB configuration, it's useful to see which parameters have been changed from their defaults.
+
+## Problem
+
+You need to identify which configuration parameters have been explicitly set via the configuration file or environment variables, filtering out all parameters that are still using their default values.
+
+## Solution
+
+Query the `SHOW PARAMETERS` command and filter by `value_source` to exclude defaults:
+
+```questdb-sql demo title="Find which params were modified from default values"
+-- Show all parameters modified from their defaults, via conf file or env variable
+(SHOW PARAMETERS) WHERE value_source <> 'default';
+```
+
+This query returns only the parameters that have been explicitly configured, showing their current values and the source of the configuration (e.g., `conf` file or `env` variable).
+
+:::info Related Documentation
+- [SHOW PARAMETERS reference](/docs/reference/sql/show/#show-parameters)
+- [Configuration reference](/docs/configuration/)
+:::
diff --git a/documentation/playbook/operations/store-questdb-metrics.md b/documentation/playbook/operations/store-questdb-metrics.md
new file mode 100644
index 000000000..2fef0c346
--- /dev/null
+++ b/documentation/playbook/operations/store-questdb-metrics.md
@@ -0,0 +1,57 @@
+---
+title: Store QuestDB Metrics in QuestDB
+sidebar_label: Store QuestDB metrics
+description: Scrape QuestDB Prometheus metrics using Telegraf and store them in QuestDB
+---
+
+Store QuestDB's operational metrics in QuestDB itself by scraping Prometheus metrics using Telegraf.
+
+## Solution: Telegraf Configuration
+
+You could use Prometheus to scrape those metrics, but any server agent that understands the Prometheus format will also work. Telegraf has an input plugin for Prometheus and output plugins (such as the socket writer used below) that can send the data on to QuestDB, so you can use it to read the metrics from the endpoint and insert them into a QuestDB table.
+
+This is a `telegraf.conf` configuration that works (using default ports):
+
+```toml
+# Configuration for Telegraf agent
+[agent]
+  ## Default data collection interval for all inputs
+  interval = "5s"
+  omit_hostname = true
+  precision = "1ms"
+  flush_interval = "5s"
+
+# -- INPUT PLUGINS ------------------------------------------------------ #
+[[inputs.prometheus]]
+  ## An array of urls to scrape metrics from.
+  urls = ["http://questdb-origin:9003/metrics"]
+  url_tag=""
+  metric_version = 2 # all entries will be in a single table
+  ignore_timestamp = false
+
+# -- AGGREGATOR PLUGINS ------------------------------------------------- #
+# Merge metrics into multifield metrics by series key
+[[aggregators.merge]]
+  ## If true, the original metric will be dropped by the
+  ## aggregator and will not get sent to the output plugins.
+  drop_original = true
+
+
+# -- OUTPUT PLUGINS ----------------------------------------------------- #
+[[outputs.socket_writer]]
+  # Write metrics to a local QuestDB instance over TCP
+  address = "tcp://questdb-target:9009"
+```
+
+A few things to note:
+* I omit the hostname so I don't end up with an extra column I don't need. If I were monitoring several QuestDB instances, it would be good to keep it.
+* I set `url_tag` to blank for the same reason. By default the Prometheus plugin for Telegraf adds the URL as an extra column, and we don't need it.
+* I am using `metric_version = 2` for the input plugin. This makes sure all the metrics land in a single table, rather than one table per metric, which I find annoying.
+* I am using the aggregator so metrics get rolled up into a single row per data point (with multiple columns), rather than one row per metric. It works fine without the aggregator, but you end up with a very sparse table.
+* In this config, I used a different hostname for the QuestDB output, so the metrics can be collected on a separate instance. For production this is a best practice, but for development you can simply use the same host you are monitoring.
+
+:::info Related Documentation
+- [QuestDB metrics](/docs/operations/logging-metrics/)
+- [ILP ingestion](/docs/ingestion-overview/)
+- [Telegraf documentation](https://docs.influxdata.com/telegraf/)
+:::
diff --git a/documentation/playbook/operations/tls-pgbouncer.md b/documentation/playbook/operations/tls-pgbouncer.md
new file mode 100644
index 000000000..0840f4274
--- /dev/null
+++ b/documentation/playbook/operations/tls-pgbouncer.md
@@ -0,0 +1,55 @@
+---
+title: TLS with PgBouncer for QuestDB
+sidebar_label: TLS with PgBouncer
+description: Configure PgBouncer to provide TLS termination for QuestDB PostgreSQL connections
+---
+
+Configure PgBouncer to provide TLS termination for QuestDB Open Source PostgreSQL wire protocol connections.
+
+## Solution: TLS Termination at PgBouncer
+
+QuestDB Open Source does not implement TLS on the PostgreSQL wire protocol, so TLS termination needs to happen at the PgBouncer level.
+
+Configure PgBouncer with:
+
+```ini
+[databases]
+questdb = host=127.0.0.1 port=8812 dbname=questdb user=admin password=quest

+[pgbouncer]
+listen_addr = 127.0.0.1
+listen_port = 5432
+auth_type = trust
+auth_file = /path/to/pgbouncer/userlist.txt
+
+client_tls_sslmode = require
+client_tls_key_file = /path/to/pgbouncer/pgbouncer.key
+client_tls_cert_file = /path/to/pgbouncer/pgbouncer.crt
+client_tls_ca_file = /etc/ssl/cert.pem
+
+server_tls_sslmode = disable
+logfile = /path/to/pgbouncer/pgbouncer.log
+pidfile = /path/to/pgbouncer/pgbouncer.pid
+```
+
+Two settings do the work here: `client_tls_sslmode = require` makes psql connect to PgBouncer over TLS, while `server_tls_sslmode = disable` makes PgBouncer connect to your QuestDB instance without TLS.
+
+Connect with:
+
+```bash
+psql "host=127.0.0.1 port=5432 dbname=questdb user=admin sslmode=require"
+```
+
+:::warning Unencrypted Traffic
+Traffic will be unencrypted between PgBouncer and QuestDB. This setup is only suitable when both services run on the same host or within a trusted network.
+:::
+
+:::note QuestDB Enterprise
+For QuestDB Enterprise, there is native TLS support, so you can connect directly with TLS or use PgBouncer with full end-to-end TLS encryption.
+:::
+
+:::info Related Documentation
+- [PostgreSQL wire protocol](/docs/reference/api/postgres/)
+- [QuestDB security](/docs/guides/architecture/security/)
+- [PgBouncer documentation](https://www.pgbouncer.org/config.html)
+:::
diff --git a/documentation/playbook/overview.md b/documentation/playbook/overview.md
new file mode 100644
index 000000000..897672388
--- /dev/null
+++ b/documentation/playbook/overview.md
@@ -0,0 +1,44 @@
+---
+title: Playbook Overview
+sidebar_label: Overview
+description: Quick recipes and practical examples for common QuestDB tasks and queries
+---
+
+The Playbook is a collection of **short, actionable recipes** that demonstrate how to accomplish specific tasks with QuestDB. Each recipe follows a problem-solution-result format, making it easy to find and apply solutions quickly.
+
+## What is the Playbook?
+
+Unlike comprehensive reference documentation, the Playbook focuses on practical examples for:
+
+- **Common SQL patterns** - Window functions, pivoting, time-series aggregations
+- **Programmatic integration** - Language-specific client examples
+- **Operations** - Deployment and configuration tasks
+
+Each recipe provides a focused solution to a specific problem, with working code examples and expected results.
+
+## Structure
+
+The Playbook is organized into three main sections:
+
+- **SQL Recipes** - Common SQL patterns, window functions, and time-series queries
+- **Programmatic** - Language-specific client examples and integration patterns
+- **Operations** - Deployment, configuration, and operational tasks
+
+## Running the Examples
+
+**Most recipes run directly on our [live demo instance at demo.questdb.io](https://demo.questdb.io)** without any local setup. Queries that can be executed on the demo site are marked with a direct link to run them.
+
+For recipes that require write operations or specific configuration, the recipe will indicate what setup is needed.
+
+The demo instance contains live FX market data and real cryptocurrency trades, with tables for core prices, order book snapshots, and trades. See the [Demo Data Schema](/docs/playbook/demo-data-schema/) page for details about available tables and their structure.
+
+## Using the Playbook
+
+Each recipe follows a consistent format:
+
+1. **Problem statement** - What you're trying to accomplish
+2. **Solution** - Code example with explanation
+3. **Results** - Expected output or verification
+4. **Additional context** - Tips, variations, or related documentation links
+
+Start by browsing the SQL Recipes section for common patterns, or jump directly to the recipe that matches your needs.
diff --git a/documentation/playbook/programmatic/cpp/missing-columns.md b/documentation/playbook/programmatic/cpp/missing-columns.md
new file mode 100644
index 000000000..9340c6643
--- /dev/null
+++ b/documentation/playbook/programmatic/cpp/missing-columns.md
@@ -0,0 +1,117 @@
+---
+title: Handle Missing Columns in C++ Client
+sidebar_label: Missing columns
+description: Send rows with optional columns using the QuestDB C++ client by conditionally calling column methods
+---
+
+Send rows with missing or optional columns to QuestDB using the C++ client.
+
+## Problem
+
+In Python, you can handle missing columns easily with dictionaries:
+
+```python
+{"price1": 10.0, "price2": 10.1}
+```
+
+And if price2 is not available:
+
+```python
+{"price1": 10.0, "price2": None}
+```
+
+Which is equivalent to:
+
+```python
+{"price1": 10.0}
+```
+
+You can pass the dict as the `columns` argument to `sender.row` and the client transparently sends the row, with or without missing columns, to the server.
+
+In C++, the buffer API requires explicit method calls:
+
+```cpp
+buffer
+    .table("trades")
+    .symbol("symbol", "ETH-USD")
+    .symbol("side", "sell")
+    .column("price", 2615.54)
+    .column("amount", 0.00044)
+    .at(questdb::ingress::timestamp_nanos::now());
+
+sender.flush(buffer);
+```
+
+How do you handle "ragged" rows with missing columns in C++?
+
+## Solution
+
+You need to call `at` at the end of the buffer so the data gets queued to be sent, but you can call `symbol` and `column` as many times as needed for each row, and you can do this conditionally.
+
+The example below builds a vector with three rows, one of which is missing a price. It then iterates over the vector and checks whether the optional `price` field holds a value; if not, it skips the `column` call for that row.
+
+```cpp
+#include <questdb/ingress/line_sender.hpp>
+
+#include <chrono>
+#include <iostream>
+#include <optional>
+#include <string>
+#include <vector>
+
+int main()
+{
+    try
+    {
+        auto sender = questdb::ingress::line_sender::from_conf(
+            "http::addr=localhost:9000;username=admin;password=quest;retry_timeout=20000;");
+
+        auto now = std::chrono::system_clock::now();
+        auto duration = now.time_since_epoch();
+        auto nanos =
+            std::chrono::duration_cast<std::chrono::nanoseconds>(duration).count();
+
+        struct Row {
+            std::string symbol;
+            std::string side;
+            std::optional<double> price;
+            double amount;
+        };
+
+        std::vector<Row> rows = {
+            {"ETH-USD", "sell", 2615.54, 0.00044},
+            {"BTC-USD", "sell", 39269.98, 0.001},
+            {"SOL-USD", "sell", std::nullopt, 5.5} // Missing price
+        };
+
+        questdb::ingress::line_sender_buffer buffer;
+
+        for (const auto& row : rows) {
+            buffer.table("trades")
+                .symbol("symbol", row.symbol)
+                .symbol("side", row.side);
+
+            // Only add the optional column when a value is present
+            if (row.price.has_value()) {
+                buffer.column("price", row.price.value());
+            }
+
+            buffer.column("amount", row.amount)
+                .at(questdb::ingress::timestamp_nanos(nanos));
+        }
+
+        sender.flush(buffer);
+        sender.close();
+
+        std::cout << "Data successfully sent!" << std::endl;
+        return 0;
+    }
+    catch (const questdb::ingress::line_sender_error& err)
+    {
+        std::cerr << "Error running example: " << err.what() << std::endl;
+        return 1;
+    }
+}
+```
+
+:::info Related Documentation
+- [QuestDB C++ client documentation](https://github.com/questdb/c-questdb-client)
+- [ILP reference](/docs/reference/api/ilp/overview/)
+:::
diff --git a/documentation/playbook/programmatic/php/inserting-ilp.md b/documentation/playbook/programmatic/php/inserting-ilp.md
new file mode 100644
index 000000000..00d5c28a6
--- /dev/null
+++ b/documentation/playbook/programmatic/php/inserting-ilp.md
@@ -0,0 +1,357 @@
+---
+title: Insert Data from PHP Using ILP
+sidebar_label: Inserting via ILP
+description: Send time-series data from PHP to QuestDB using the InfluxDB Line Protocol
+---
+
+QuestDB doesn't maintain an official PHP library, but since the ILP (InfluxDB Line Protocol) is text-based, you can easily send your data using PHP's built-in HTTP or socket functions, or use the official InfluxDB PHP client library.
+
+## Available Approaches
+
+This guide covers three methods for sending ILP data to QuestDB from PHP:
+
+1. 
**HTTP with cURL** (recommended for most use cases)
+   - Full control over ILP formatting and timestamps
+   - No external dependencies beyond PHP's built-in cURL
+   - Requires manual ILP string construction
+
+2. **InfluxDB v2 PHP Client** (easiest to use)
+   - Clean Point builder API
+   - Automatic batching and error handling
+   - **Limitation:** Cannot use custom timestamps with QuestDB (must use server timestamps)
+   - Requires Composer packages: `influxdata/influxdb-client-php` and `guzzlehttp/guzzle`
+
+3. **TCP Socket** (highest throughput)
+   - Best performance for high-volume scenarios
+   - No acknowledgments - data loss possible
+   - Manual implementation required
+
+## ILP Protocol Overview
+
+The ILP protocol allows you to send data to QuestDB using a simple line-based text format:
+
+```
+table_name,comma_separated_symbols comma_separated_non_symbols optional_timestamp\n
+```
+
+Each line represents one row of data. For example, these two lines are well-formed ILP messages:
+
+```
+readings,city=London,make=Omron temperature=23.5,humidity=0.343 1465839830100400000\n
+readings,city=Bristol,make=Honeywell temperature=23.2,humidity=0.443\n
+```
+
+The format consists of:
+- **Table name**: The target table for the data
+- **Symbols** (tags): Comma-separated key-value pairs for indexed categorical data
+- **Columns** (fields): Space-separated, then comma-separated key-value pairs for numerical or string data
+- **Timestamp** (optional): Nanosecond-precision timestamp; if omitted, QuestDB uses server time
+
+For complete ILP specification, see the [ILP reference documentation](/docs/reference/api/ilp/overview/).
+
+## ILP Over HTTP
+
+QuestDB supports ILP data via HTTP or TCP. **HTTP is the recommended approach** for most use cases as it provides better reliability and easier debugging.
+
+To send data via HTTP:
+1. Send a POST request to `http://localhost:9000/write` (or your QuestDB instance endpoint)
+2. Set `Content-Type: text/plain` header
+3. Include ILP-formatted rows in the request body
+4. For higher throughput, batch multiple rows in a single request
+
+### HTTP Buffering Example
+
+The following PHP class provides buffered insertion with automatic flushing based on either row count or elapsed time:
+
+```php title="Buffered ILP insertion via HTTP"
+<?php
+class DataInserter {
+    // QuestDB ILP-over-HTTP write endpoint (adjust host/port as needed)
+    private $endpoint = 'http://localhost:9000/write';
+    private $buffer = [];
+    private $bufferSize;
+    private $flushInterval;
+    private $lastFlushTime;
+
+    public function __construct($bufferSize = 10, $flushInterval = 30) {
+        $this->bufferSize = $bufferSize;
+        $this->flushInterval = $flushInterval;
+        $this->lastFlushTime = time();
+    }
+
+    public function __destruct() {
+        // Attempt to flush any remaining data when script is terminating
+        $this->flush();
+    }
+
+    public function insertRow($tableName, $symbols, $columns, $timestamp = null) {
+        $row = $this->formatRow($tableName, $symbols, $columns, $timestamp);
+        $this->buffer[] = $row;
+        $this->checkFlushConditions();
+    }
+
+    private function formatRow($tableName, $symbols, $columns, $timestamp) {
+        $escape = function($value) {
+            return str_replace([' ', ',', "\n"], ['\ ', '\,', '\n'], $value);
+        };
+
+        $symbolString = implode(',', array_map(
+            function($k, $v) use ($escape) { return "$k={$escape($v)}"; },
+            array_keys($symbols), $symbols
+        ));
+
+        $columnString = implode(',', array_map(
+            function($k, $v) use ($escape) { return "$k={$escape($v)}"; },
+            array_keys($columns), $columns
+        ));
+
+        // Check if timestamp is provided
+        $timestampPart = is_null($timestamp) ? 
'' : " $timestamp"; + + return "$tableName,$symbolString $columnString$timestampPart"; + } + + private function checkFlushConditions() { + if (count($this->buffer) >= $this->bufferSize || (time() - $this->lastFlushTime) >= $this->flushInterval) { + $this->flush(); + } + } + + private function flush() { + if (empty($this->buffer)) { + return; // Nothing to flush + } + $data = implode("\n", $this->buffer); + $this->buffer = []; + $this->lastFlushTime = time(); + + $ch = curl_init(); + curl_setopt($ch, CURLOPT_URL, $this->endpoint); + curl_setopt($ch, CURLOPT_POST, true); + curl_setopt($ch, CURLOPT_POSTFIELDS, $data); + curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); + curl_setopt($ch, CURLOPT_HTTPHEADER, ['Content-Type: text/plain']); + curl_exec($ch); + curl_close($ch); + } +} + +// Usage example: +$inserter = new DataInserter(10, 30); + +// Inserting rows for London +$inserter->insertRow("test_readings", ["city" => "London", "make" => "Omron"], ["temperature" => 23.5, "humidity" => 0.343], "1650573480100400000"); +$inserter->insertRow("test_readings", ["city" => "London", "make" => "Sony"], ["temperature" => 21.0, "humidity" => 0.310]); +$inserter->insertRow("test_readings", ["city" => "London", "make" => "Philips"], ["temperature" => 22.5, "humidity" => 0.333], "1650573480100500000"); +$inserter->insertRow("test_readings", ["city" => "London", "make" => "Samsung"], ["temperature" => 24.0, "humidity" => 0.350]); + +// Inserting rows for Madrid +$inserter->insertRow("test_readings", ["city" => "Madrid", "make" => "Omron"], ["temperature" => 25.5, "humidity" => 0.360], "1650573480100600000"); +$inserter->insertRow("test_readings", ["city" => "Madrid", "make" => "Sony"], ["temperature" => 23.0, "humidity" => 0.340]); +$inserter->insertRow("test_readings", ["city" => "Madrid", "make" => "Philips"], ["temperature" => 26.0, "humidity" => 0.370], "1650573480100700000"); +$inserter->insertRow("test_readings", ["city" => "Madrid", "make" => "Samsung"], ["temperature" => 22.0, "humidity" => 0.355]); + +// Inserting rows for New York +$inserter->insertRow("test_readings", ["city" => "New York", "make" => "Omron"], ["temperature" => 20.5, "humidity" => 0.330], "1650573480100800000"); +$inserter->insertRow("test_readings", ["city" => "New York", "make" => "Sony"], ["temperature" => 19.0, "humidity" => 0.320]); +$inserter->insertRow("test_readings", ["city" => "New York", "make" => "Philips"], ["temperature" => 21.0, "humidity" => 0.340], "1650573480100900000"); +$inserter->insertRow("test_readings", ["city" => "New York", "make" => "Samsung"], ["temperature" => 18.5, "humidity" => 0.335]); +?> +``` + +This class: +- Buffers rows until either 10 rows are accumulated or 30 seconds have elapsed +- Properly escapes special characters (spaces, commas, newlines) in values +- Automatically flushes remaining data when the script terminates +- Uses cURL for HTTP communication + +:::tip +For production use, consider adding error handling to check the HTTP response status and implement retry logic for failed requests. +::: + +## Using the InfluxDB v2 PHP Client + +Another approach is to use the official [InfluxDB PHP client library](https://github.com/influxdata/influxdb-client-php), which supports the InfluxDB v2 write API. QuestDB is compatible with this API, making the client library a convenient option. 
+
+### Installation
+
+Install the required packages via Composer:
+
+```bash
+composer require influxdata/influxdb-client-php guzzlehttp/guzzle
+```
+
+**Required dependencies:**
+- `influxdata/influxdb-client-php` - The InfluxDB v2 PHP client library
+- `guzzlehttp/guzzle` - A PSR-18 compatible HTTP client (required by the InfluxDB client)
+
+:::info Alternative HTTP Clients
+The InfluxDB client requires a PSR-18 compatible HTTP client. While we recommend Guzzle, you can use alternatives like `php-http/guzzle7-adapter` or `symfony/http-client` if preferred.
+:::
+
+### Configuration
+
+When using the InfluxDB client with QuestDB:
+
+- **URL**: Use your QuestDB HTTP endpoint (default: `http://localhost:9000`)
+- **Token**: Not required - can be left empty or use any string
+- **Bucket**: Not required - can be any string (ignored by QuestDB)
+- **Organization**: Not required - can be any string (ignored by QuestDB)
+
+:::warning Write API Only
+QuestDB only supports the **InfluxDB v2 write API** when using this client. Query operations are not supported through the InfluxDB client - use QuestDB's PostgreSQL wire protocol or REST API for queries instead.
+:::
+
+### Example Code
+
+```php title="Using InfluxDB v2 PHP client with QuestDB"
+<?php
+require 'vendor/autoload.php'; // Composer autoloader
+
+use InfluxDB2\Client;
+use InfluxDB2\Model\WritePrecision;
+use InfluxDB2\Point;
+
+$client = new Client([
+    "url" => "http://localhost:9000",
+    "token" => "",           // Not required for QuestDB
+    "bucket" => "default",   // Not used by QuestDB
+    "org" => "default",      // Not used by QuestDB
+    "precision" => WritePrecision::NS
+]);
+
+$writeApi = $client->createWriteApi();
+
+// Write points using the Point builder
+// Note: Omit ->time() to let QuestDB assign server timestamps
+$point = Point::measurement("readings")
+    ->addTag("city", "London")
+    ->addTag("make", "Omron")
+    ->addField("temperature", 23.5)
+    ->addField("humidity", 0.343);
+
+$writeApi->write($point);
+
+// Write multiple points
+$points = [
+    Point::measurement("readings")
+        ->addTag("city", "Madrid")
+        ->addTag("make", "Sony")
+        ->addField("temperature", 25.5)
+        ->addField("humidity", 0.360),
+
+    Point::measurement("readings")
+        ->addTag("city", "New York")
+        ->addTag("make", "Philips")
+        ->addField("temperature", 20.5)
+        ->addField("humidity", 0.330)
+];
+
+$writeApi->write($points);
+
+// Always close the client
+$client->close();
+?>
+```
+
+### Benefits and Limitations
+
+The Point builder provides several advantages:
+- **Automatic ILP formatting and escaping** - No need to manually construct ILP strings
+- **Built-in error handling** - The client handles HTTP errors and retries
+- **Batching support** - Automatically batches writes for better performance
+- **Clean API** - Fluent Point builder interface is easy to use
+
+:::warning Timestamp Limitation
+The InfluxDB PHP client **cannot be used with custom timestamps** when writing to QuestDB. When you call `->time()` with a nanosecond timestamp, the client serializes it in scientific notation (e.g., `1.76607297E+18`), which QuestDB's ILP parser rejects.
+
+**Solution:** Always omit the `->time()` call and let QuestDB assign server-side timestamps automatically. This is the only reliable way to use the InfluxDB PHP client with QuestDB.
+
+**If you need client-side timestamps:** Use the raw HTTP cURL approach (documented above) where you manually format the ILP string with full control over timestamp formatting.
+:::
+
+## ILP Over TCP Socket
+
+Sending ILP over a raw TCP socket provides higher throughput but is less reliable than HTTP. The message format is identical; only the transport changes.
+
+Use TCP when:
+- You need maximum ingestion throughput
+- Your application can handle potential data loss on connection failures
+- You're willing to implement your own connection management and error handling
+
+### TCP Socket Example
+
+Here's a basic example using PHP's socket functions:
+
+```php title="Send ILP data via TCP socket"
+<?php
+// Connect to QuestDB's ILP TCP port (default 9009)
+$socket = fsockopen('localhost', 9009, $errno, $errstr, 10);
+if (!$socket) {
+    die("Could not connect: $errstr ($errno)\n");
+}
+
+// Send a single ILP row; QuestDB assigns the timestamp server-side
+fwrite($socket, "test_readings,city=London,make=Omron temperature=23.5,humidity=0.343\n");
+
+// Close the connection
+fclose($socket);
+?>
+```
+
+This basic example:
+- Connects to QuestDB's ILP port (default 9009)
+- Sends a single row of data
+- Closes the connection
+
+For production use with TCP, you should:
+- Keep connections open and reuse them for multiple rows
+- Implement batching to reduce network overhead
+- Add proper error handling and reconnection logic
+- Consider using a connection pool for concurrent writes
+
+:::warning TCP Considerations
+TCP ILP does not provide acknowledgments for successful writes. If the connection drops, you may lose data without notification. For critical data, use HTTP ILP instead.
+:::
+
+## Choosing the Right Approach
+
+| Feature | HTTP (cURL) | HTTP (InfluxDB Client) | TCP Socket |
+|---------|-------------|------------------------|------------|
+| **Reliability** | High - responses indicate success/failure | High - responses indicate success/failure | Low - no acknowledgment |
+| **Throughput** | Good | Good | Excellent |
+| **Error handling** | Manual via cURL | Built-in via client library | Manual implementation required |
+| **Ease of use** | Medium - manual ILP formatting | High - Point builder API | Low - manual everything |
+| **Custom timestamps** | ✅ Full control | ❌ Must use server timestamps | ✅ Full control |
+| **Dependencies** | None (cURL built-in) | `influxdb-client-php`
`guzzlehttp/guzzle` | None (sockets built-in) | +| **Authentication** | Standard HTTP auth | Standard HTTP auth | Limited options | +| **Recommended for** | Custom timestamps required | Ease of development, server timestamps acceptable | High-volume, loss-tolerant scenarios | + +:::info Related Documentation +- [ILP reference documentation](/docs/reference/api/ilp/overview/) +- [HTTP REST API](/docs/reference/api/rest/) +- [Authentication and security](/docs/operations/rbac/) +::: diff --git a/documentation/playbook/programmatic/ruby/inserting-ilp.md b/documentation/playbook/programmatic/ruby/inserting-ilp.md new file mode 100644 index 000000000..219bd71ab --- /dev/null +++ b/documentation/playbook/programmatic/ruby/inserting-ilp.md @@ -0,0 +1,355 @@ +--- +title: Insert Data from Ruby Using ILP +sidebar_label: Inserting via ILP +description: Send time-series data from Ruby to QuestDB using the InfluxDB Line Protocol over HTTP +--- + +Send time-series data from Ruby to QuestDB using the InfluxDB Line Protocol (ILP). While QuestDB doesn't maintain an official Ruby client, you can easily use the official InfluxDB Ruby gem to send data via ILP over HTTP, which QuestDB fully supports. + +## Available Approaches + +Two methods for sending ILP data from Ruby: + +1. **InfluxDB v2 Ruby Client** (recommended) + - Official InfluxDB gem with clean API + - Automatic batching and error handling + - Compatible with QuestDB's ILP endpoint + - Requires: `influxdb-client` gem + +2. **TCP Socket** (for custom implementations) + - Direct socket communication + - Manual ILP message formatting + - Higher throughput, no dependencies + - Requires: Built-in Ruby socket library + +## Using the InfluxDB v2 Ruby Client + +The InfluxDB v2 client provides a convenient Point builder API that works with QuestDB. + +### Installation + +```bash +gem install influxdb-client +``` + +Or add to your `Gemfile`: + +```ruby +gem 'influxdb-client', '~> 3.1' +``` + +### Example Code + +```ruby +require 'influxdb-client' + +# Create client +client = InfluxDB2::Client.new( + 'http://localhost:9000', + 'ignore-token', # Token not required for QuestDB + bucket: 'ignore-bucket', # Bucket not used by QuestDB + org: 'ignore-org', # Organization not used by QuestDB + precision: InfluxDB2::WritePrecision::NANOSECOND, + use_ssl: false +) + +write_api = client.create_write_api + +# Write a single point +point = InfluxDB2::Point.new(name: 'readings') + .add_tag('city', 'London') + .add_tag('make', 'Omron') + .add_field('temperature', 23.5) + .add_field('humidity', 0.343) + +write_api.write(data: point) + +# Write multiple points +points = [ + InfluxDB2::Point.new(name: 'readings') + .add_tag('city', 'Madrid') + .add_tag('make', 'Sony') + .add_field('temperature', 25.5) + .add_field('humidity', 0.360), + + InfluxDB2::Point.new(name: 'readings') + .add_tag('city', 'New York') + .add_tag('make', 'Philips') + .add_field('temperature', 20.5) + .add_field('humidity', 0.330) +] + +write_api.write(data: points) + +# Always close the client +client.close! 
+``` + +### Configuration Notes + +When using the InfluxDB client with QuestDB: + +- **`token`**: Not required - can be empty string or any value +- **`bucket`**: Ignored by QuestDB - can be any string +- **`org`**: Ignored by QuestDB - can be any string +- **`precision`**: Use `NANOSECOND` for compatibility (QuestDB's native precision) +- **`use_ssl`**: Set to `false` for local development, `true` for production with TLS + +### Data Types + +The InfluxDB client automatically handles type conversions: + +```ruby +point = InfluxDB2::Point.new(name: 'measurements') + .add_tag('sensor_id', '001') # SYMBOL in QuestDB + .add_field('temperature', 23.5) # DOUBLE + .add_field('humidity', 0.343) # DOUBLE + .add_field('pressure', 1013) # LONG (integer) + .add_field('status', 'active') # STRING + .add_field('online', true) # BOOLEAN +``` + +## TCP Socket Approach + +For maximum control and performance, send ILP messages directly via TCP sockets. + +### Basic TCP Example + +```ruby +require 'socket' + +HOST = 'localhost' +PORT = 9009 + +# Helper method to get current time in nanoseconds +def time_in_nsec + now = Time.now + return now.to_i * (10 ** 9) + now.nsec +end + +begin + s = TCPSocket.new(HOST, PORT) + + # Single record with timestamp + s.puts "trades,symbol=BTC-USDT,side=buy price=37779.62,amount=0.5 #{time_in_nsec}\n" + + # Omitting timestamp - server assigns one + s.puts "trades,symbol=ETH-USDT,side=sell price=2615.54,amount=1.2\n" + + # Multiple records (newline-delimited) + s.puts "trades,symbol=SOL-USDT,side=buy price=98.23,amount=10.0\n" + + "trades,symbol=BTC-USDT,side=sell price=37800.00,amount=0.3\n" + +rescue SocketError => ex + puts "Socket error: #{ex.inspect}" +ensure + s.close if s +end +``` + +### ILP Message Format + +The ILP format is: + +``` +table_name,tag1=value1,tag2=value2 field1=value1,field2=value2 timestamp\n +``` + +Breaking it down: +- **Table name**: Target table (created automatically if doesn't exist) +- **Tags** (symbols): Comma-separated key=value pairs for indexed categorical data +- **Space separator**: Separates tags from fields +- **Fields** (columns): Comma-separated key=value pairs for numerical or string data +- **Space separator**: Separates fields from timestamp +- **Timestamp** (optional): Nanosecond-precision timestamp; if omitted, server assigns + +**Example:** +``` +readings,city=London,make=Omron temperature=23.5,humidity=0.343 1465839830100400000\n +``` + +### Escaping Special Characters + +ILP requires escaping for certain characters: + +```ruby +def escape_ilp(value) + value.to_s + .gsub(' ', '\\ ') # Space + .gsub(',', '\\,') # Comma + .gsub('=', '\\=') # Equals + .gsub("\n", '\\n') # Newline +end + +# Usage +tag_value = "London, UK" +escaped = escape_ilp(tag_value) # "London\\, UK" + +s.puts "readings,city=#{escaped} temperature=23.5\n" +``` + +### Batching for Performance + +Send multiple rows in a single TCP write: + +```ruby +require 'socket' + +HOST = 'localhost' +PORT = 9009 + +def time_in_nsec + now = Time.now + return now.to_i * (10 ** 9) + now.nsec +end + +begin + s = TCPSocket.new(HOST, PORT) + + # Build batch of rows + batch = [] + (1..1000).each do |i| + timestamp = time_in_nsec + i * 1000000 # 1ms apart + batch << "readings,sensor_id=#{i} value=#{rand(100.0)},status=\"ok\" #{timestamp}" + end + + # Send entire batch at once + s.puts batch.join("\n") + "\n" + s.flush + +rescue SocketError => ex + puts "Socket error: #{ex.inspect}" +ensure + s.close if s +end +``` + +## Comparison: InfluxDB Client vs TCP Socket + +| Feature | InfluxDB 
Client | TCP Socket | +|---------|----------------|------------| +| **Ease of use** | High - Point builder API | Medium - Manual ILP formatting | +| **Dependencies** | Requires `influxdb-client` gem | None (stdlib only) | +| **Error handling** | Automatic with retries | Manual implementation | +| **Batching** | Automatic | Manual | +| **Performance** | Good | Excellent (direct TCP) | +| **Type safety** | Automatic type conversion | Manual string formatting | +| **Reliability** | HTTP with acknowledgments | No acknowledgments (fire and forget) | +| **Escaping** | Automatic | Manual implementation required | +| **Recommended for** | Most applications | High-throughput scenarios, custom needs | + +## Best Practices + +### Connection Management + +**InfluxDB Client:** +```ruby +# Reuse client for multiple writes +client = InfluxDB2::Client.new(...) +write_api = client.create_write_api + +# ... perform many writes ... + +client.close! # Always close when done +``` + +**TCP Socket:** +```ruby +# Keep connection open for multiple writes +socket = TCPSocket.new(HOST, PORT) + +begin + # ... send multiple batches ... +ensure + socket.close if socket +end +``` + +### Error Handling + +**InfluxDB Client:** +```ruby +begin + write_api.write(data: points) +rescue InfluxDB2::InfluxError => e + puts "Failed to write to QuestDB: #{e.message}" + # Implement retry logic or logging +end +``` + +**TCP Socket:** +```ruby +begin + socket.puts(ilp_messages) + socket.flush +rescue Errno::EPIPE, Errno::ECONNRESET => e + puts "Connection lost: #{e.message}" + # Reconnect and retry +rescue StandardError => e + puts "Unexpected error: #{e.message}" +end +``` + +### Timestamp Generation + +Use nanosecond precision for maximum compatibility: + +```ruby +# Current time in nanoseconds +def current_nanos + now = Time.now + now.to_i * 1_000_000_000 + now.nsec +end + +# Specific time to nanoseconds +def time_to_nanos(time) + time.to_i * 1_000_000_000 + time.nsec +end + +# Usage +timestamp = current_nanos +# or +timestamp = time_to_nanos(Time.parse("2024-09-05 14:30:00 UTC")) +``` + +### Batching Strategy + +For high-throughput scenarios: + +```ruby +BATCH_SIZE = 1000 +FLUSH_INTERVAL = 5 # seconds + +batch = [] +last_flush = Time.now + +data_stream.each do |record| + batch << format_ilp_message(record) + + if batch.size >= BATCH_SIZE || (Time.now - last_flush) >= FLUSH_INTERVAL + socket.puts batch.join("\n") + "\n" + socket.flush + batch.clear + last_flush = Time.now + end +end + +# Flush remaining records +socket.puts batch.join("\n") + "\n" unless batch.empty? +``` + +:::tip Choosing an Approach +- **Use InfluxDB client** for most Ruby applications - it's easier, safer, and handles edge cases +- **Use TCP sockets** only when you need maximum throughput and can handle reliability concerns +::: + +:::warning Data Loss with TCP +TCP ILP has no acknowledgments. If the connection drops, data may be lost silently. For critical data, use HTTP (via the InfluxDB client) which provides delivery confirmation. 
+::: + +:::info Related Documentation +- [ILP reference](/docs/reference/api/ilp/overview/) +- [ILP over HTTP](/docs/reference/api/ilp/overview/#transport-selection) +- [ILP over TCP](/docs/reference/api/ilp/overview/#transport-selection) +- [InfluxDB Ruby client](https://github.com/influxdata/influxdb-client-ruby) +::: diff --git a/documentation/playbook/programmatic/tls-ca-configuration.md b/documentation/playbook/programmatic/tls-ca-configuration.md new file mode 100644 index 000000000..edf2d9d54 --- /dev/null +++ b/documentation/playbook/programmatic/tls-ca-configuration.md @@ -0,0 +1,102 @@ +--- +title: Configure TLS Certificate Authorities +sidebar_label: TLS CA configuration +description: Configure TLS certificate authority validation for QuestDB clients +--- + +Configure TLS certificate authority (CA) validation when connecting QuestDB clients to TLS-enabled instances. + +## Problem + +You are using a QuestDB client (Rust, Python, C++, etc.) to insert data. It works when using QuestDB without TLS, but when you enable TLS on your QuestDB instance using a self-signed certificate, you get an error of "certificate unknown". + +When using the PostgreSQL wire interface, you can insert data passing `sslmode=require`, and it works, so you can discard any problems with QuestDB recognizing the certificate. But you need to figure out the equivalent for your ILP client. + +## Solution: Configure TLS CA + +QuestDB clients support the `tls_ca` parameter, which has multiple values to configure certificate authority validation: + +### Option 1: Use WebPKI and OS Certificate Roots (Recommended for Production) + +If you want to accept both the webpki-root certificates plus whatever you have on the OS, pass `tls_ca=webpki_and_os_roots`: + +``` +https::addr=localhost:9000;username=admin;password=quest;tls_ca=webpki_and_os_roots; +``` + +This will work with certificates signed by standard certificate authorities. + +### Option 2: Use a Custom PEM File + +Point to a PEM-encoded certificate file for self-signed or custom CA certificates: + +``` +https::addr=localhost:9000;username=admin;password=quest;tls_ca=pem_file;tls_roots=/path/to/cert.pem; +``` + +This is useful for self-signed certificates or internal CAs. + +### Option 3: Skip Verification (Development Only) + +For development environments with self-signed certificates, you might be tempted to disable verification by passing `tls_verify=unsafe_off`: + +``` +https::addr=localhost:9000;username=admin;password=quest;tls_verify=unsafe_off; +``` + +:::danger +This is a very bad idea for production and should only be used for testing on a development environment with a self-signed certificate. It disables all certificate validation. +::: + +**Note:** Some clients require enabling an optional feature (like `insecure-skip-verify` in Rust) before the `tls_verify=unsafe_off` parameter will work. Check your client's documentation for details. + +## Available tls_ca Values + +| Value | Description | +|-------|-------------| +| `webpki_roots` | Mozilla's WebPKI root certificates only | +| `os_roots` | Operating system certificate store only | +| `webpki_and_os_roots` | Both WebPKI and OS roots (recommended) | +| `pem_file` | Load from a PEM file (requires `tls_roots` parameter) | + +## Example: Rust Client + +```rust +use questdb::ingress::{Sender, SenderBuilder}; + +#[tokio::main] +async fn main() -> Result<(), Box> { + let sender = SenderBuilder::new("https", "localhost", 9000)? + .username("admin")? + .password("quest")? + .tls_ca("webpki_and_os_roots")? 
// Use standard CAs + .build() + .await?; + + // Use sender... + + sender.close().await?; + Ok(()) +} +``` + +For self-signed certificates with a PEM file: + +```rust +let sender = SenderBuilder::new("https", "localhost", 9000)? + .username("admin")? + .password("quest")? + .tls_ca("pem_file")? + .tls_roots("/path/to/questdb.crt")? + .build() + .await?; +``` + +The examples are in Rust but the concepts are similar in other languages. Check the documentation for your specific client. + +:::info Related Documentation +- [QuestDB Rust client](https://docs.rs/questdb/) +- [QuestDB Python client](/docs/clients/ingest-python/) +- [QuestDB C++ client](/docs/clients/ingest-c-and-cpp/) +- [QuestDB TLS configuration](/docs/operations/tls/) +::: diff --git a/documentation/playbook/sql/advanced/array-from-string.md b/documentation/playbook/sql/advanced/array-from-string.md new file mode 100644 index 000000000..1091dc58e --- /dev/null +++ b/documentation/playbook/sql/advanced/array-from-string.md @@ -0,0 +1,32 @@ +--- +title: Create Arrays from String Literals +sidebar_label: Array from string literal +description: Cast string literals to array types in QuestDB +--- + +Cast string literals to array types for use with functions that accept array parameters. + +## Solution + +To cast an array from a string you need to cast to `double[]` for a vector, or to `double[][]` for a two-dimensional array. You can just keep adding brackets for as many dimensions as the literal has. + +This query shows how to convert a string literal into an array, even when there are new lines: + +```questdb-sql demo title="Cast string to array" +SELECT CAST('[ + [ 1.0, 2.0, 3.0 ], + [ + 4.0, + 5.0, + 6.0 + ] +]' AS double[][]), +cast('[[1,2,3],[4,5,6]]' as double[][]); +``` + +Note if you add the wrong number of brackets (for example, in this case if you try casting to `double[]` or `double[][][][]`), it will not error, but will instead convert as null. + +:::info Related Documentation +- [CAST function](/docs/reference/sql/cast/) +- [Data types](/docs/reference/sql/datatypes/) +::: diff --git a/documentation/playbook/sql/advanced/conditional-aggregates.md b/documentation/playbook/sql/advanced/conditional-aggregates.md new file mode 100644 index 000000000..ba3c0cc27 --- /dev/null +++ b/documentation/playbook/sql/advanced/conditional-aggregates.md @@ -0,0 +1,68 @@ +--- +title: Multiple Conditional Aggregates +sidebar_label: Conditional aggregates +description: Calculate multiple conditional aggregates in a single query using CASE expressions +--- + +Calculate multiple aggregates with different conditions in a single pass through the data using CASE expressions. + +## Problem + +You need to calculate various metrics from the same dataset with different conditions: +- Count of buy orders +- Count of sell orders +- Average buy price +- Average sell price +- Total volume for large trades (> 1.0) +- Total volume for small trades (≤ 1.0) + +Running separate queries is inefficient. 
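+
+For example, a naive version might issue one query per metric. Here is a rough sketch of just the buy and sell counts, using the same `trades` columns as the solution below:
+
+```sql
+-- One round trip (and one table scan) per metric
+SELECT count() FROM trades
+WHERE side = 'buy' AND timestamp >= dateadd('d', -1, now());
+
+SELECT count() FROM trades
+WHERE side = 'sell' AND timestamp >= dateadd('d', -1, now());
+```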
+ +## Solution: CASE Within Aggregate Functions + +Use CASE expressions inside aggregates to calculate all metrics in one query: + +```questdb-sql demo title="Multiple conditional aggregates in single query" +SELECT + symbol, + count(CASE WHEN side = 'buy' THEN 1 END) as buy_count, + count(CASE WHEN side = 'sell' THEN 1 END) as sell_count, + avg(CASE WHEN side = 'buy' THEN price END) as avg_buy_price, + avg(CASE WHEN side = 'sell' THEN price END) as avg_sell_price, + sum(CASE WHEN amount > 1.0 THEN amount END) as large_trade_volume, + sum(CASE WHEN amount <= 1.0 THEN amount END) as small_trade_volume, + sum(amount) as total_volume +FROM trades +WHERE timestamp >= dateadd('d', -1, now()) + AND symbol IN ('BTC-USDT', 'ETH-USDT') +GROUP BY symbol; +``` + +## How It Works + +### CASE Returns NULL for Non-Matching Rows + +```sql +count(CASE WHEN side = 'buy' THEN 1 END) +``` + +- When `side = 'buy'`: CASE returns 1 +- When `side != 'buy'`: CASE returns NULL (implicit ELSE NULL) +- `count()` only counts non-NULL values +- Result: counts only rows where side is 'buy' + +### Aggregate Functions Ignore NULL + +```sql +avg(CASE WHEN side = 'buy' THEN price END) +``` + +- `avg()` calculates average of non-NULL values only +- Only includes price when side is 'buy' +- Automatically skips all other rows + +:::info Related Documentation +- [CASE expressions](/docs/reference/sql/case/) +- [Aggregate functions](/docs/reference/function/aggregation/) +- [count()](/docs/reference/function/aggregation/#count) +::: diff --git a/documentation/playbook/sql/advanced/consistent-histogram-buckets.md b/documentation/playbook/sql/advanced/consistent-histogram-buckets.md new file mode 100644 index 000000000..8daa6d9d7 --- /dev/null +++ b/documentation/playbook/sql/advanced/consistent-histogram-buckets.md @@ -0,0 +1,424 @@ +--- +title: Consistent Histogram Buckets +sidebar_label: Histogram buckets +description: Generate histogram data with fixed bucket boundaries for consistent time-series distribution analysis +--- + +Create histograms with consistent bucket boundaries across different time periods. This ensures that distributions are comparable over time, essential for monitoring metric distributions, latency percentiles, and value ranges in dashboards. + +## Problem: Inconsistent Histogram Buckets + +You want to track the distribution of trade sizes over time: + +**Naive approach (inconsistent buckets):** +```sql +SELECT + CASE + WHEN amount < 1.0 THEN 'small' + WHEN amount < 10.0 THEN 'medium' + ELSE 'large' + END as bucket, + count(*) as count +FROM trades +GROUP BY bucket; +``` + +This works for a single query, but comparing histograms across different time periods or symbols becomes difficult when bucket boundaries aren't precisely defined. + +## Solution: Fixed Numeric Buckets + +Define consistent bucket boundaries using integer division: + +```questdb-sql demo title="Histogram with fixed 0.5 BTC buckets" +SELECT + (cast(amount / 0.5 AS INT) * 0.5) as bucket_start, + ((cast(amount / 0.5 AS INT) + 1) * 0.5) as bucket_end, + count(*) as count +FROM trades +WHERE symbol = 'BTC-USDT' + AND timestamp >= dateadd('d', -1, now()) +GROUP BY bucket_start, bucket_end +ORDER BY bucket_start; +``` + +**Results:** + +| bucket_start | bucket_end | count | +|--------------|------------|-------| +| 0.0 | 0.5 | 1,234 | +| 0.5 | 1.0 | 890 | +| 1.0 | 1.5 | 456 | +| 1.5 | 2.0 | 234 | +| 2.0 | 2.5 | 123 | + +## How It Works + +### Bucket Calculation + +```sql +cast(amount / 0.5 AS INT) * 0.5 +``` + +**Step by step:** +1. 
`amount / 0.5`: Divide by bucket width (amount 1.3 → 2.6) +2. `cast(... AS INT)`: Truncate to integer (2.6 → 2) +3. `* 0.5`: Multiply back by bucket width (2 → 1.0) + +**Examples:** +- amount = 0.3 → 0.3/0.5=0.6 → INT(0.6)=0 → 0*0.5=0.0 +- amount = 1.3 → 1.3/0.5=2.6 → INT(2.6)=2 → 2*0.5=1.0 +- amount = 2.7 → 2.7/0.5=5.4 → INT(5.4)=5 → 5*0.5=2.5 + +### Bucket End + +```sql +(cast(amount / 0.5 AS INT) + 1) * 0.5 +``` + +Add 1 before multiplying back to get the upper boundary. + +## Dynamic Bucket Width + +Use a variable for easy adjustment: + +```questdb-sql demo title="Configurable bucket width" +WITH bucketed AS ( + SELECT + amount, + 0.25 as bucket_width, -- Change this to adjust granularity + (cast(amount / 0.25 AS INT) * 0.25) as bucket_start + FROM trades + WHERE symbol = 'BTC-USDT' + AND timestamp >= dateadd('d', -1, now()) +) +SELECT + bucket_start, + (bucket_start + bucket_width) as bucket_end, + count(*) as count, + sum(amount) as total_volume +FROM bucketed +GROUP BY bucket_start, bucket_width +ORDER BY bucket_start; +``` + +**Bucket widths by use case:** +- Latency (milliseconds): 10ms, 50ms, 100ms +- Trade sizes: 0.1, 0.5, 1.0 +- Prices: 100, 500, 1000 +- Temperatures: 1°C, 5°C, 10°C + +## Time-Series Histogram + +Track distribution changes over time: + +```questdb-sql demo title="Hourly histogram evolution" +SELECT + timestamp_floor('h', timestamp) as hour, + (cast(amount / 0.5 AS INT) * 0.5) as bucket, + count(*) as count +FROM trades +WHERE symbol = 'BTC-USDT' + AND timestamp >= dateadd('d', -7, now()) +GROUP BY hour, bucket +ORDER BY hour DESC, bucket; +``` + +**Results:** + +| hour | bucket | count | +|------|--------|-------| +| 2025-01-15 23:00 | 0.0 | 345 | +| 2025-01-15 23:00 | 0.5 | 234 | +| 2025-01-15 23:00 | 1.0 | 123 | +| 2025-01-15 22:00 | 0.0 | 312 | +| 2025-01-15 22:00 | 0.5 | 245 | + +This shows how the distribution shifts over time. + +## Grafana Heatmap Visualization + +Format for Grafana heatmap: + +```questdb-sql demo title="Heatmap data for Grafana" +SELECT + timestamp_floor('5m', timestamp) as time, + (cast(latency_ms / 10 AS INT) * 10) as bucket, + count(*) as count +FROM api_requests +WHERE $__timeFilter(timestamp) +GROUP BY time, bucket +ORDER BY time, bucket; +``` + +**Grafana configuration:** +- Visualization: Heatmap +- X-axis: time +- Y-axis: bucket (latency range) +- Cell value: count + +Creates a heatmap showing latency distribution evolution over time. 
+
+## Logarithmic Buckets
+
+For data spanning multiple orders of magnitude:
+
+```questdb-sql demo title="Logarithmic buckets for wide value ranges"
+SELECT
+    POWER(10, cast(log10(amount) AS INT)) as bucket_start,
+    POWER(10, cast(log10(amount) AS INT) + 1) as bucket_end,
+    count(*) as count
+FROM trades
+WHERE symbol = 'BTC-USDT'
+  AND amount > 0
+  AND timestamp >= dateadd('d', -1, now())
+GROUP BY bucket_start, bucket_end
+ORDER BY bucket_start;
+```
+
+**Results:**
+
+| bucket_start | bucket_end | count |
+|--------------|------------|-------|
+| 0.01 | 0.1 | 1,234 |
+| 0.1 | 1.0 | 4,567 |
+| 1.0 | 10.0 | 2,345 |
+| 10.0 | 100.0 | 123 |
+
+**Use cases:**
+- Response times (1ms to 10s)
+- File sizes (1KB to 1GB)
+- Memory usage (1MB to 10GB)
+
+## Percentile Buckets
+
+Create buckets representing percentile ranges:
+
+```questdb-sql demo title="Percentile-based buckets"
+WITH percentiles AS (
+    SELECT
+        approx_percentile(price, 0.10, 5) as p10,
+        approx_percentile(price, 0.25, 5) as p25,
+        approx_percentile(price, 0.50, 5) as p50,
+        approx_percentile(price, 0.75, 5) as p75,
+        approx_percentile(price, 0.90, 5) as p90
+    FROM trades
+    WHERE symbol = 'BTC-USDT'
+      AND timestamp >= dateadd('d', -30, now())
+)
+SELECT
+    CASE
+        WHEN price < p10 THEN '< P10'
+        WHEN price < p25 THEN 'P10-P25'
+        WHEN price < p50 THEN 'P25-P50'
+        WHEN price < p75 THEN 'P50-P75'
+        WHEN price < p90 THEN 'P75-P90'
+        ELSE '> P90'
+    END as percentile_bucket,
+    count(*) as count,
+    (count(*) * 100.0 / sum(count(*)) OVER ()) as percentage
+FROM trades CROSS JOIN percentiles
+WHERE symbol = 'BTC-USDT'
+  AND timestamp >= dateadd('d', -1, now())
+GROUP BY percentile_bucket, p10, p25, p50, p75, p90
+ORDER BY
+    CASE percentile_bucket
+        WHEN '< P10' THEN 1
+        WHEN 'P10-P25' THEN 2
+        WHEN 'P25-P50' THEN 3
+        WHEN 'P50-P75' THEN 4
+        WHEN 'P75-P90' THEN 5
+        ELSE 6
+    END;
+```
+
+This shows what percentage of recent trades fall into each historical percentile range.
+
+## Cumulative Distribution
+
+Calculate cumulative counts for CDF visualization:
+
+```questdb-sql demo title="Cumulative distribution function"
+WITH histogram AS (
+    SELECT
+        (cast(amount / 0.5 AS INT) * 0.5) as bucket,
+        count(*) as count
+    FROM trades
+    WHERE symbol = 'BTC-USDT'
+      AND timestamp >= dateadd('d', -1, now())
+    GROUP BY bucket
+)
+SELECT
+    bucket,
+    count,
+    sum(count) OVER (ORDER BY bucket) as cumulative_count,
+    (sum(count) OVER (ORDER BY bucket) * 100.0 /
+     sum(count) OVER ()) as cumulative_percentage
+FROM histogram
+ORDER BY bucket;
+```
+
+**Results:**
+
+| bucket | count | cumulative_count | cumulative_percentage |
+|--------|-------|------------------|----------------------|
+| 0.0 | 1,234 | 1,234 | 40.2% |
+| 0.5 | 890 | 2,124 | 69.1% |
+| 1.0 | 456 | 2,580 | 84.0% |
+| 1.5 | 234 | 2,814 | 91.6% |
+
+Shows that 84% of trades are 1.5 BTC or less.
+ +## Multi-Dimensional Histogram + +Bucket by two dimensions: + +```questdb-sql demo title="2D histogram: amount vs price range" +SELECT + (cast(amount / 0.5 AS INT) * 0.5) as amount_bucket, + (cast(price / 1000 AS INT) * 1000) as price_bucket, + count(*) as count +FROM trades +WHERE symbol = 'BTC-USDT' + AND timestamp >= dateadd('d', -1, now()) +GROUP BY amount_bucket, price_bucket +HAVING count > 10 -- Filter sparse buckets +ORDER BY amount_bucket, price_bucket; +``` + +**Results:** + +| amount_bucket | price_bucket | count | +|---------------|--------------|-------| +| 0.0 | 61000 | 234 | +| 0.0 | 62000 | 345 | +| 0.5 | 61000 | 123 | +| 0.5 | 62000 | 156 | + +## Adaptive Bucketing + +Adjust bucket width based on data density: + +```questdb-sql demo title="Fine-grained buckets for common ranges" +SELECT + CASE + WHEN amount < 1.0 THEN cast(amount / 0.1 AS INT) * 0.1 -- 0.1 BTC buckets + WHEN amount < 10.0 THEN cast(amount / 1.0 AS INT) * 1.0 -- 1 BTC buckets + ELSE cast(amount / 10.0 AS INT) * 10.0 -- 10 BTC buckets + END as bucket, + count(*) as count +FROM trades +WHERE symbol = 'BTC-USDT' + AND timestamp >= dateadd('d', -1, now()) +GROUP BY bucket +ORDER BY bucket; +``` + +Provides more detail in common ranges, broader buckets for rare large trades. + +## Comparison Across Symbols + +Compare distributions using consistent buckets: + +```questdb-sql demo title="Compare trade size distributions" +SELECT + symbol, + (cast(amount / 0.5 AS INT) * 0.5) as bucket, + count(*) as count, + avg(price) as avg_price +FROM trades +WHERE symbol IN ('BTC-USDT', 'ETH-USDT') + AND timestamp >= dateadd('d', -1, now()) +GROUP BY symbol, bucket +ORDER BY symbol, bucket; +``` + +Shows whether trade size patterns differ between assets. + +## Performance Optimization + +**Index usage:** +```sql +-- Ensure timestamp and symbol are indexed +CREATE TABLE trades ( + timestamp TIMESTAMP, + symbol SYMBOL INDEX, -- SYMBOL type has implicit index + price DOUBLE, + amount DOUBLE +) TIMESTAMP(timestamp) PARTITION BY DAY; +``` + +**Pre-aggregate for dashboards:** +```sql +-- Create hourly histogram summary +CREATE TABLE trade_histogram_hourly AS +SELECT + timestamp_floor('h', timestamp) as hour, + symbol, + (cast(amount / 0.5 AS INT) * 0.5) as bucket, + count(*) as count, + sum(amount) as total_volume +FROM trades +SAMPLE BY 1h; + +-- Query summary instead of raw data +SELECT * FROM trade_histogram_hourly WHERE hour >= dateadd('d', -7, now()); +``` + +**Limit bucket range:** +```sql +-- Exclude extreme outliers +WHERE amount BETWEEN 0.01 AND 100 +``` + +Prevents single extreme values from creating many empty buckets. + +## Common Pitfalls + +**Empty buckets not shown:** +```sql +-- This only returns buckets with data +SELECT bucket, count(*) FROM ... 
GROUP BY bucket; + +-- To include empty buckets, use generate_series or CROSS JOIN +``` + +**Floating point precision:** +```sql +-- Bad: May have precision issues +cast(amount / 0.1 AS INT) * 0.1 + +-- Better: Use integers where possible +cast(amount * 10 AS INT) / 10.0 +``` + +**Negative values:** +```sql +-- Handle negative values correctly +SIGN(value) * (cast(ABS(value) / bucket_width AS INT) * bucket_width) +``` + +:::tip Choosing Bucket Width +Select bucket width based on: +- **Data range**: 10-50 buckets typically ideal for visualization +- **Precision needed**: Smaller buckets for detailed analysis +- **Query performance**: Fewer buckets = faster aggregation +- **Visual clarity**: Too many buckets create cluttered charts + +Formula: `bucket_width = (max - min) / target_bucket_count` +::: + +:::warning Grafana Heatmap Requirements +Grafana heatmaps require: +1. Time column named `time` +2. Numeric bucket column +3. Count/value column +4. Data sorted by time, then bucket +5. Consistent bucket boundaries across all time periods +::: + +:::info Related Documentation +- [Aggregate functions](/docs/reference/function/aggregation/) +- [CAST function](/docs/reference/sql/cast/) +- [percentile()](/docs/reference/function/aggregation/#approx_percentile) +- [Window functions](/docs/reference/sql/over/) +::: diff --git a/documentation/playbook/sql/advanced/general-and-sampled-aggregates.md b/documentation/playbook/sql/advanced/general-and-sampled-aggregates.md new file mode 100644 index 000000000..4582262b1 --- /dev/null +++ b/documentation/playbook/sql/advanced/general-and-sampled-aggregates.md @@ -0,0 +1,102 @@ +--- +title: General and Sampled Aggregates +sidebar_label: General + sampled aggregates +description: Combine overall statistics with time-bucketed aggregates using CROSS JOIN +--- + +Combine overall (unsampled) aggregates with sampled aggregates in the same query. 
+ +## Problem + +You have a query with three aggregates: + +```sql +SELECT max(price), avg(price), min(price) +FROM trades_2024 +WHERE timestamp IN '2024-08'; +``` + +This returns: +``` +max avg min +======== =========== ======== +61615.43 31598.71891 58402.01 +``` + +And another query to get event count per second, then select the maximum: + +```sql +SELECT max(count_sec) FROM ( + SELECT count() as count_sec FROM trades + WHERE timestamp IN '2024-08' + SAMPLE BY 1s +); +``` + +This returns: +``` +max +==== +1241 +``` + +You want to combine both results in a single row: + +``` +max avg min max_count +======== =========== ======== ========= +61615.43 31598.71891 58402.01 1241 +``` + +## Solution: CROSS JOIN + +A `CROSS JOIN` can join every row from the first query (1 row) with every row from the second (1 row), so you get a single row with all the aggregates combined: + +```questdb-sql demo title="Combine general and sampled aggregates" +WITH +sampled AS ( + SELECT timestamp, count() as count_sec FROM trades + WHERE timestamp IN '2024-08' + SAMPLE BY 1s + ORDER BY 2 DESC + LIMIT -1 +) +SELECT max(price), avg(price), min(price), count_sec as max_count +FROM trades_2024 CROSS JOIN sampled +WHERE trades_2024.timestamp IN '2024-08'; +``` + +## Grafana Baseline Visualization + +Format for Grafana with baseline reference line: + +```questdb-sql demo title="Time-series with baseline for Grafana" +WITH baseline AS ( + SELECT avg(response_time_ms) as avg_response_time + FROM api_metrics + WHERE timestamp >= dateadd('d', -7, now()) +), +timeseries AS ( + SELECT + timestamp as time, + avg(response_time_ms) as current_response_time + FROM api_metrics + WHERE $__timeFilter(timestamp) + SAMPLE BY $__interval +) +SELECT + timeseries.time, + timeseries.current_response_time as "Current", + baseline.avg_response_time as "7-Day Average" +FROM timeseries +CROSS JOIN baseline +ORDER BY timeseries.time; +``` + +Grafana will plot both series, making it easy to see when current values deviate from baseline. + +:::info Related Documentation +- [CROSS JOIN](/docs/reference/sql/join/#cross-join) +- [SAMPLE BY](/docs/reference/sql/sample-by/) +- [Grafana integration](/docs/third-party-tools/grafana/) +::: diff --git a/documentation/playbook/sql/advanced/pivot-table.md b/documentation/playbook/sql/advanced/pivot-table.md new file mode 100644 index 000000000..9018d656b --- /dev/null +++ b/documentation/playbook/sql/advanced/pivot-table.md @@ -0,0 +1,100 @@ +--- +title: Pivoting Query Results +sidebar_label: Pivoting results +description: Transform rows into columns using CASE statements to pivot time-series data +--- + +Pivoting transforms row-based data into column-based data, where values from one column become column headers. This is useful for creating wide-format reports or comparison tables. 
+ +## Problem: Long-format Results + +When you aggregate data with `SAMPLE BY`, you get one row per time interval and grouping value: + +```questdb-sql demo title="Query returning rows per symbol and timestamp" +SELECT timestamp, symbol, SUM(bid_price) AS total_bid +FROM core_price +WHERE timestamp IN today() +SAMPLE BY 1m +LIMIT 20; +``` + +**Results:** + +| timestamp | symbol | total_bid | +| --------------------------- | ------ | ------------------ | +| 2025-12-18T00:00:00.000000Z | AUDUSD | 1146.7547999999995 | +| 2025-12-18T00:00:00.000000Z | USDTRY | 77545.1637 | +| 2025-12-18T00:00:00.000000Z | USDSEK | 15655.122000000012 | +| 2025-12-18T00:00:00.000000Z | USDCHF | 1308.9189999999994 | +| 2025-12-18T00:00:00.000000Z | AUDCAD | 1533.120900000004 | +| 2025-12-18T00:00:00.000000Z | EURNZD | 3502.5426999999922 | +| 2025-12-18T00:00:00.000000Z | AUDNZD | 2014.2881000000089 | +| 2025-12-18T00:00:00.000000Z | USDMXN | 31111.124799999983 | +| 2025-12-18T00:00:00.000000Z | EURGBP | 1501.919500000002 | +| 2025-12-18T00:00:00.000000Z | EURJPY | 305747.47 | +| 2025-12-18T00:00:00.000000Z | USDZAR | 28375.69069999998 | +| 2025-12-18T00:00:00.000000Z | EURUSD | 2034.6741000000018 | +| 2025-12-18T00:00:00.000000Z | NZDCAD | 1365.2795000000028 | +| 2025-12-18T00:00:00.000000Z | USDCAD | 2318.794500000005 | +| 2025-12-18T00:00:00.000000Z | GBPNZD | 4033.9539000000054 | +| 2025-12-18T00:00:00.000000Z | NZDUSD | 977.1505000000012 | +| 2025-12-18T00:00:00.000000Z | USDHKD | 13200.823400000027 | +| 2025-12-18T00:00:00.000000Z | GBPCHF | 1856.3431999999962 | +| 2025-12-18T00:00:00.000000Z | NZDJPY | 152123.41999999998 | +| 2025-12-18T00:00:00.000000Z | GBPJPY | 348693.1200000006 | + +This format has multiple rows per timestamp, one for each symbol. + +## Solution: Pivot Using CASE Statements + +To get one row per timestamp with a column for each symbol, use conditional aggregation with `CASE` statements: + +```questdb-sql demo title="Pivot symbols into columns" +SELECT timestamp, + SUM(CASE WHEN symbol='EURUSD' THEN bid_price END) AS EURUSD, + SUM(CASE WHEN symbol='GBPUSD' THEN bid_price END) AS GBPUSD, + SUM(CASE WHEN symbol='USDJPY' THEN bid_price END) AS USDJPY, + SUM(CASE WHEN symbol='USDCHF' THEN bid_price END) AS USDCHF, + SUM(CASE WHEN symbol='AUDUSD' THEN bid_price END) AS AUDUSD, + SUM(CASE WHEN symbol='USDCAD' THEN bid_price END) AS USDCAD, + SUM(CASE WHEN symbol='NZDUSD' THEN bid_price END) AS NZDUSD +FROM core_price +WHERE timestamp IN today() +SAMPLE BY 1m +LIMIT 5; +``` + +Now each timestamp has a single row with all symbols as columns, making cross-symbol comparison much easier. + +## How It Works + +The `CASE` statement conditionally includes values: + +```sql +SUM(CASE WHEN symbol='EURUSD' THEN bid_price END) AS EURUSD +``` + +This means: +1. For each row, if `symbol='EURUSD'`, include the `bid_price` value +2. Otherwise, include `NULL` (implicit) +3. `SUM()` aggregates only the non-NULL values for each timestamp + +The same pattern applies to each symbol, creating one column per unique value. + +## Use Cases + +Pivoting is useful for: +- **Comparison tables**: Side-by-side comparison of metrics across categories +- **Dashboard exports**: Wide-format data for spreadsheets or BI tools +- **Correlation analysis**: Computing correlations between time-series in different columns +- **Report generation**: Creating fixed-width reports with known categories + +:::tip +For unknown or dynamic column lists, you'll need to generate the CASE statements programmatically in your application code. 
SQL doesn't support dynamic column generation. +::: + +:::info Related Documentation +- [CASE expressions](/docs/reference/sql/case/) +- [SAMPLE BY aggregation](/docs/reference/sql/sample-by/) +- [Aggregation functions](/docs/reference/function/aggregation/) +::: diff --git a/documentation/playbook/sql/advanced/rows-before-after-value-match.md b/documentation/playbook/sql/advanced/rows-before-after-value-match.md new file mode 100644 index 000000000..816a95db8 --- /dev/null +++ b/documentation/playbook/sql/advanced/rows-before-after-value-match.md @@ -0,0 +1,137 @@ +--- +title: Find Rows Before and After Value Match +sidebar_label: Rows before/after match +description: Use LAG and LEAD window functions to access values from surrounding rows +--- + +Access values from rows before and after the current row to find patterns, detect changes, or provide context around events. This is useful for comparing values across adjacent rows or detecting local minimums and maximums. + +## Problem: Need Surrounding Context + +You want to find all rows in the `core_price` table where the bid price is lower than the prices in the surrounding rows (5 rows before and 5 rows after). This helps identify local price drops or troughs in the EURUSD time series. + +## Solution: Use LAG and LEAD Functions + +Use `LAG()` to access rows before the current row and `LEAD()` to access rows after: + +```questdb-sql demo title="Find rows where bid price is lower than surrounding rows" +WITH framed AS ( + SELECT timestamp, bid_price, + LAG(bid_price, 1) OVER () AS bidprice_1up, + LAG(bid_price, 2) OVER () AS bidprice_2up, + LAG(bid_price, 3) OVER () AS bidprice_3up, + LAG(bid_price, 4) OVER () AS bidprice_4up, + LAG(bid_price, 5) OVER () AS bidprice_5up, + LEAD(bid_price, 1) OVER () AS bidprice_1down, + LEAD(bid_price, 2) OVER () AS bidprice_2down, + LEAD(bid_price, 3) OVER () AS bidprice_3down, + LEAD(bid_price, 4) OVER () AS bidprice_4down, + LEAD(bid_price, 5) OVER () AS bidprice_5down + FROM core_price + WHERE timestamp >= dateadd('m', -1, now()) AND symbol = 'EURUSD' +) +SELECT timestamp, bid_price +FROM framed +WHERE bid_price < bidprice_1up AND bid_price < bidprice_2up AND bid_price < bidprice_3up AND bid_price < bidprice_4up AND bid_price < bidprice_5up + AND bid_price < bidprice_1down AND bid_price < bidprice_2down AND bid_price < bidprice_3down AND bid_price < bidprice_4down AND bid_price < bidprice_5down +LIMIT 20; +``` + +This returns all rows where the current bid price is lower than ALL of the surrounding 10 rows (5 before and 5 after), identifying local minimums for EURUSD in the last minute. + +## How It Works + +The query uses a two-step approach: + +1. **Access surrounding rows**: The CTE `framed` uses `LAG()` and `LEAD()` to access values from surrounding rows: + - `LAG(bid_price, N)`: Gets the bid price from N rows **before** the current row + - `LEAD(bid_price, N)`: Gets the bid price from N rows **after** the current row + +2. **Filter for local minimums**: The outer query uses `AND` conditions to find rows where the current price is lower than ALL surrounding prices, identifying true local minimums + +### LAG vs LEAD + +- **`LAG(column, offset)`** - Accesses the value from `offset` rows **before** (earlier in time) +- **`LEAD(column, offset)`** - Accesses the value from `offset` rows **after** (later in time) + +Both functions return `NULL` for rows where the offset goes beyond the dataset boundaries (e.g., `LAG(5)` returns `NULL` for the first 5 rows). 
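+
+The boundary behaviour is easy to see directly. Below is a minimal sketch (same filter as above) that exposes the one-row offsets: the first row returned has a `NULL` `prev_bid`, and the final row of the range would similarly have a `NULL` `next_bid`, which is why boundary rows can never satisfy all ten comparisons in the query above.
+
+```sql
+-- Minimal sketch: expose one-row LAG/LEAD values so the NULL boundaries are visible
+SELECT timestamp, bid_price,
+    LAG(bid_price, 1) OVER () AS prev_bid,
+    LEAD(bid_price, 1) OVER () AS next_bid
+FROM core_price
+WHERE timestamp >= dateadd('m', -1, now()) AND symbol = 'EURUSD'
+LIMIT 10;
+```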
+
+:::warning Symbol Filter Required
+When using window functions without `PARTITION BY`, you must filter by a specific symbol. This ensures the window frame operates on a single symbol's time series, preventing incorrect comparisons across different symbols.
+:::
+
+## Viewing Surrounding Values
+
+To see all surrounding values for debugging or analysis, select all the LAG/LEAD columns:
+
+```questdb-sql demo title="Show all surrounding values for inspection"
+WITH framed AS (
+    SELECT row_number() OVER () as rownum, timestamp, bid_price,
+        LAG(bid_price, 1) OVER () AS bidprice_1up,
+        LAG(bid_price, 2) OVER () AS bidprice_2up,
+        LAG(bid_price, 3) OVER () AS bidprice_3up,
+        LAG(bid_price, 4) OVER () AS bidprice_4up,
+        LAG(bid_price, 5) OVER () AS bidprice_5up,
+        LEAD(bid_price, 1) OVER () AS bidprice_1down,
+        LEAD(bid_price, 2) OVER () AS bidprice_2down,
+        LEAD(bid_price, 3) OVER () AS bidprice_3down,
+        LEAD(bid_price, 4) OVER () AS bidprice_4down,
+        LEAD(bid_price, 5) OVER () AS bidprice_5down
+    FROM core_price
+    WHERE timestamp >= dateadd('m', -1, now()) AND symbol = 'EURUSD'
+)
+SELECT rownum, timestamp, bid_price,
+    bidprice_1up, bidprice_2up, bidprice_3up, bidprice_4up, bidprice_5up,
+    bidprice_1down, bidprice_2down, bidprice_3down, bidprice_4down, bidprice_5down
+FROM framed
+WHERE bid_price < bidprice_1up AND bid_price < bidprice_2up AND bid_price < bidprice_3up AND bid_price < bidprice_4up AND bid_price < bidprice_5up
+  AND bid_price < bidprice_1down AND bid_price < bidprice_2down AND bid_price < bidprice_3down AND bid_price < bidprice_4down AND bid_price < bidprice_5down
+LIMIT 20;
+```
+
+This shows each matching row with all its surrounding bid prices as separate columns, making it easy to verify the local minimum detection.
+
+## Advanced: Checking Against Aggregate Over Large Ranges
+
+For more complex scenarios where you need to compare against the **maximum or minimum** value across a large range (e.g., 100 rows before and after), you can use window aggregates such as `MAX()` or `MIN()` with a reversed-ordering trick:
+
+```questdb-sql demo title="Find rows where price is below the max of surrounding 100 rows"
+WITH framed AS (
+    SELECT timestamp, bid_price,
+        -- Max of 100 rows before
+        MAX(bid_price) OVER (ROWS BETWEEN 100 PRECEDING AND 1 PRECEDING) AS max_100_before,
+        -- Max of 100 rows after (using DESC ordering trick)
+        MAX(bid_price) OVER (ORDER BY timestamp DESC ROWS BETWEEN 100 PRECEDING AND 1 PRECEDING) AS max_100_after
+    FROM core_price
+    WHERE timestamp >= dateadd('h', -1, now()) AND symbol = 'EURUSD'
+)
+SELECT timestamp, bid_price, max_100_before, max_100_after
+FROM framed
+WHERE bid_price < max_100_before AND bid_price < max_100_after
+LIMIT 20;
+```
+
+This pattern is useful when you need to:
+- Check against **aggregates** (MAX, MIN, AVG) over a range rather than individual values
+- Work with **large ranges** (50-100+ rows) where listing individual LAG/LEAD calls would be impractical
+- Find rows where the current value is below the maximum or above the minimum in a large window
+
+### The Reversed Ordering Trick
+
+To access rows **after** the current row using aggregate functions, use `ORDER BY timestamp DESC`:
+- Normal order: `ROWS BETWEEN 100 PRECEDING AND 1 PRECEDING` gives you the 100 rows **before**
+- Reversed order: `ORDER BY timestamp DESC ROWS BETWEEN 100 PRECEDING AND 1 PRECEDING` gives you the 100 rows **after** (because descending order reverses what "preceding" means)
+
+This is a workaround since QuestDB doesn't have `ROWS FOLLOWING` syntax yet.
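+
+As a concrete application of the same trick, the sketch below swaps `MAX()` for `MIN()` so that a row only qualifies when it is lower than every one of the 100 rows on either side, i.e. a strict local minimum over a much wider window than the LAG/LEAD version:
+
+```sql
+-- Sketch: strict local minimum against the lowest price in the 100 rows before and after
+WITH framed AS (
+    SELECT timestamp, bid_price,
+        -- Lowest price among the 100 rows before
+        MIN(bid_price) OVER (ROWS BETWEEN 100 PRECEDING AND 1 PRECEDING) AS min_100_before,
+        -- Lowest price among the 100 rows after (same DESC ordering trick)
+        MIN(bid_price) OVER (ORDER BY timestamp DESC ROWS BETWEEN 100 PRECEDING AND 1 PRECEDING) AS min_100_after
+    FROM core_price
+    WHERE timestamp >= dateadd('h', -1, now()) AND symbol = 'EURUSD'
+)
+SELECT timestamp, bid_price, min_100_before, min_100_after
+FROM framed
+WHERE bid_price < min_100_before AND bid_price < min_100_after
+LIMIT 20;
+```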
+ +:::tip When to Use Each Approach +- **Use LAG/LEAD**: When you need to compare against **specific individual rows** (e.g., the previous 5 rows, the next 3 rows) +- **Use aggregate with window frames**: When you need to compare against an **aggregate value** (MAX, MIN, AVG) over a **large range** of rows (e.g., highest price in the last 100 rows) +::: + +:::info Related Documentation +- [LAG window function](/docs/reference/function/window/#lag) +- [LEAD window function](/docs/reference/function/window/#lead) +- [Window functions overview](/docs/reference/sql/over/) +- [Window frame clauses](/docs/reference/sql/over/#frame-types-and-behavior) +::: diff --git a/documentation/playbook/sql/advanced/sankey-funnel.md b/documentation/playbook/sql/advanced/sankey-funnel.md new file mode 100644 index 000000000..c84cce2ff --- /dev/null +++ b/documentation/playbook/sql/advanced/sankey-funnel.md @@ -0,0 +1,122 @@ +--- +title: Sankey and Funnel Diagrams +sidebar_label: Sankey/funnel diagrams +description: Create session-based analytics for Sankey diagrams and conversion funnels +--- + +Build user journey flow data for Sankey diagrams and conversion funnels by sessionizing event data and tracking state transitions. + +## Problem + +You want to build a user-flow or Sankey diagram to find out which pages contribute visits to others, and in which proportion. You'd like to track elapsed time, number of pages in a single session, entry/exit pages, etc., similar to web analytics tools. + +Your issue is that you only capture a flat table with events, with no concept of session. For analytics purposes, you want to define a session as a visit that was more than 1 hour apart from the last one for the same user. + +Your simplified table schema: + +```sql +CREATE TABLE events ( + visitor_id SYMBOL, + pathname SYMBOL, + timestamp TIMESTAMP, + metric_name SYMBOL +) TIMESTAMP(timestamp) PARTITION BY MONTH WAL; +``` + +## Solution: Session Window Functions + +By combining window functions and `CASE` statements: + +1. Sessionize the data by identifying gaps longer than 1 hour +2. Generate unique session ids for aggregations +3. Assign sequence numbers to each hit within a session +4. Assign the session initial timestamp +5. Check next page in the sequence + +With that, you can count page hits for the next page from current, identify elapsed time between hits or since the start of the session, count sessions per user, or power navigation funnels and Sankey diagrams. 
+ +```questdb-sql demo title="Sessionize events and track page flows" +WITH PrevEvents AS ( + SELECT + visitor_id, + pathname, + timestamp, + first_value(timestamp::long) OVER ( + PARTITION BY visitor_id ORDER BY timestamp + ROWS 1 PRECEDING EXCLUDE CURRENT ROW + ) AS prev_ts + FROM + events WHERE timestamp > dateadd('d', -7, now()) + AND metric_name = 'page_view' +), VisitorSessions AS ( + SELECT *, + SUM(CASE WHEN datediff('h', timestamp, prev_ts::timestamp)>1 THEN 1 END) + OVER( + PARTITION BY visitor_id + ORDER BY timestamp + ) as local_session_id FROM PrevEvents + +), GlobalSessions AS ( + SELECT visitor_id, pathname, timestamp, prev_ts, + concat(visitor_id, '#', coalesce(local_session_id,0)::int) AS session_id + FROM VisitorSessions +), EventSequences AS ( + SELECT *, row_number() OVER ( + PARTITION BY session_id ORDER BY timestamp + ) as session_sequence, + row_number() OVER ( + PARTITION BY session_id ORDER BY timestamp DESC + ) as reverse_session_sequence, + first_value(timestamp::long) OVER ( + PARTITION BY session_id ORDER BY timestamp + ) as session_ts + FROM GlobalSessions +), EventsFullInfo AS ( + SELECT e1.session_id, e1.session_ts::timestamp as session_ts, e1.visitor_id, + e1.timestamp, e1.pathname, e1.session_sequence, + CASE WHEN e1.session_sequence = 1 THEN true END is_entry_page, + e2.pathname as next_pathname, datediff('T', e1.timestamp, e1.prev_ts::timestamp)::double as elapsed, + e2.reverse_session_sequence, + CASE WHEN e2.reverse_session_sequence = 1 THEN true END is_exit_page + FROM EventSequences e1 + LEFT JOIN EventSequences e2 ON (e1.session_id = e2.session_id) + WHERE e2.session_sequence - e1.session_sequence = 1 +) +SELECT * FROM EventsFullInfo; +``` + +## Visualizing in Grafana + +Format output for Sankey diagram tools: + +```questdb-sql demo title="Sankey diagram data format" +WITH transitions AS ( + SELECT + pathname as current_state, + lag(pathname) OVER (PARTITION BY visitor_id ORDER BY timestamp) as previous_state + FROM events + WHERE timestamp >= dateadd('d', -1, now()) + AND metric_name = 'page_view' +) +SELECT + previous_state as source, + current_state as target, + count(*) as value +FROM transitions +WHERE previous_state IS NOT NULL + AND previous_state != current_state +GROUP BY previous_state, current_state +HAVING count(*) >= 10 +ORDER BY value DESC; +``` + +This format works directly with: +- **Plotly**: `go.Sankey(node=[...], link=[source, target, value])` +- **D3.js**: Standard Sankey input format +- **Grafana Flow plugin**: Source/target/value format + +:::info Related Documentation +- [Window functions](/docs/reference/sql/over/) +- [LAG function](/docs/reference/function/window/#lag) +- [Grafana integration](/docs/third-party-tools/grafana/) +::: diff --git a/documentation/playbook/sql/advanced/top-n-plus-others.md b/documentation/playbook/sql/advanced/top-n-plus-others.md new file mode 100644 index 000000000..d7b9c9ab2 --- /dev/null +++ b/documentation/playbook/sql/advanced/top-n-plus-others.md @@ -0,0 +1,366 @@ +--- +title: Top N Plus Others Row +sidebar_label: Top N + Others +description: Group query results into top N rows plus an aggregated "Others" row using rank() and CASE expressions +--- + +Create aggregated results showing the top N items individually, with all remaining items combined into a single "Others" row. This pattern is useful for dashboards and reports where you want to highlight the most important items while still showing the total. 
+ +## Problem: Show Top Items Plus Remainder + +You want to display results like: + +| Browser | Count | +|--------------------|-------| +| Chrome | 450 | +| Firefox | 380 | +| Safari | 320 | +| Edge | 280 | +| Opera | 190 | +| -Others- | 380 | ← Combined total of all other browsers + +Instead of listing all browsers (which might be dozens), show the top 5 individually and aggregate the rest. + +## Solution: Use rank() with CASE Statement + +Use `rank()` to identify top N rows, then use `CASE` to group remaining rows: + +```questdb-sql demo title="Top 5 symbols plus Others" +WITH totals AS ( + SELECT + symbol, + count() as total + FROM trades + WHERE timestamp >= dateadd('d', -1, now()) +), +ranked AS ( + SELECT + *, + rank() OVER (ORDER BY total DESC) as ranking + FROM totals +) +SELECT + CASE + WHEN ranking <= 5 THEN symbol + ELSE '-Others-' + END as symbol, + SUM(total) as total_trades +FROM ranked +GROUP BY 1 +ORDER BY total_trades DESC; +``` + +**Results:** + +| symbol | total_trades | +|------------|--------------| +| BTC-USDT | 15234 | +| ETH-USDT | 12890 | +| SOL-USDT | 8945 | +| MATIC-USDT | 6723 | +| AVAX-USDT | 5891 | +| -Others- | 23456 | ← Sum of all other symbols + +## How It Works + +The query uses a three-step approach: + +1. **Aggregate data** (`totals` CTE): + - Count or sum values by the grouping column + - Creates base data for ranking + +2. **Rank rows** (`ranked` CTE): + - `rank() OVER (ORDER BY total DESC)`: Assigns rank based on count (1 = highest) + - Ties receive the same rank + +3. **Conditional grouping** (outer query): + - `CASE WHEN ranking <= 5`: Keep top 5 with original names + - `ELSE '-Others-'`: Rename all others to "-Others-" + - `SUM(total)`: Aggregate counts (combines all "Others" into one row) + - `GROUP BY 1`: Group by the CASE expression result + +### Understanding rank() + +`rank()` assigns ranks with gaps for ties: + +| symbol | total | rank | +|------------|-------|------| +| BTC-USDT | 1000 | 1 | +| ETH-USDT | 900 | 2 | +| SOL-USDT | 900 | 2 | ← Tie at rank 2 +| AVAX-USDT | 800 | 4 | ← Next rank is 4 (skips 3) +| MATIC-USDT | 700 | 5 | + +If there are ties at the boundary (rank 5), all tied items will be included in top N. + +## Adapting the Pattern + +**Different top N:** +```sql +-- Top 10 instead of top 5 +WHEN ranking <= 10 THEN symbol + +-- Top 3 +WHEN ranking <= 3 THEN symbol +``` + +**Different aggregations:** +```sql +-- Sum instead of count +WITH totals AS ( + SELECT symbol, SUM(amount) as total_volume + FROM trades +) +... +``` + +**Multiple levels:** +```sql +SELECT + CASE + WHEN ranking <= 5 THEN symbol + WHEN ranking <= 10 THEN '-Top 10-' + ELSE '-Others-' + END as category, + SUM(total) as count +FROM ranked +GROUP BY 1; +``` + +Results in three groups: top 5 individual, ranks 6-10 combined, rest combined. 
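+
+A fully runnable version of that three-tier variant, reusing the same `totals` and `ranked` CTEs against the demo `trades` table, might look like this:
+
+```sql
+-- Runnable sketch of the three-tier grouping: top 5 named, ranks 6-10 combined, the rest combined
+WITH totals AS (
+  SELECT symbol, count() as total
+  FROM trades
+  WHERE timestamp >= dateadd('d', -1, now())
+),
+ranked AS (
+  SELECT *, rank() OVER (ORDER BY total DESC) as ranking
+  FROM totals
+)
+SELECT
+  CASE
+    WHEN ranking <= 5 THEN symbol
+    WHEN ranking <= 10 THEN '-Top 10-'
+    ELSE '-Others-'
+  END as category,
+  SUM(total) as total_trades
+FROM ranked
+GROUP BY 1
+ORDER BY total_trades DESC;
+```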
+
+**Different grouping columns:**
+```questdb-sql demo title="Top 3 ECNs plus Others from core_price"
+WITH totals AS (
+  SELECT
+    ecn,
+    count() as total
+  FROM core_price
+  WHERE timestamp >= dateadd('h', -1, now())
+),
+ranked AS (
+  SELECT *, rank() OVER (ORDER BY total DESC) as ranking
+  FROM totals
+)
+SELECT
+  CASE WHEN ranking <= 3 THEN ecn ELSE '-Others-' END as ecn,
+  SUM(total) as message_count
+FROM ranked
+GROUP BY 1
+ORDER BY message_count DESC;
+```
+
+**With percentage:**
+```questdb-sql demo title="Top 5 symbols with percentage of total"
+WITH totals AS (
+  SELECT symbol, count() as total
+  FROM trades
+  WHERE timestamp >= dateadd('d', -1, now())
+),
+ranked AS (
+  SELECT *, rank() OVER (ORDER BY total DESC) as ranking
+  FROM totals
+),
+summed AS (
+  SELECT SUM(total) as grand_total FROM totals
+),
+grouped AS (
+  SELECT
+    CASE WHEN ranking <= 5 THEN symbol ELSE '-Others-' END as symbol,
+    SUM(total) as total_trades
+  FROM ranked
+  GROUP BY 1
+)
+SELECT
+  symbol,
+  total_trades,
+  round(100.0 * total_trades / grand_total, 2) as percentage
+FROM grouped CROSS JOIN summed
+ORDER BY total_trades DESC;
+```
+
+## Alternative: Using row_number()
+
+If you don't want to handle ties and always want exactly N rows in top tier:
+
+```sql
+WITH totals AS (
+  SELECT symbol, count() as total
+  FROM trades
+),
+ranked AS (
+  SELECT *, row_number() OVER (ORDER BY total DESC) as rn
+  FROM totals
+)
+SELECT
+  CASE WHEN rn <= 5 THEN symbol ELSE '-Others-' END as symbol,
+  SUM(total) as total_trades
+FROM ranked
+GROUP BY 1
+ORDER BY total_trades DESC;
+```
+
+**Difference:**
+- `rank()`: May include more than N if there are ties at position N
+- `row_number()`: Always exactly N in top tier (breaks ties arbitrarily)
+
+## Multiple Grouping Columns
+
+Show top N for multiple dimensions:
+
+```sql
+WITH totals AS (
+  SELECT
+    symbol,
+    side,
+    count() as total
+  FROM trades
+  WHERE timestamp >= dateadd('d', -1, now())
+),
+ranked AS (
+  SELECT
+    *,
+    rank() OVER (PARTITION BY side ORDER BY total DESC) as ranking
+  FROM totals
+)
+SELECT
+  side,
+  CASE WHEN ranking <= 3 THEN symbol ELSE '-Others-' END as symbol,
+  SUM(total) as total_trades
+FROM ranked
+GROUP BY side, 2
+ORDER BY side, total_trades DESC;
+```
+
+This shows top 3 symbols separately for buy and sell sides.
+
+## Visualization Considerations
+
+This pattern is particularly useful for charts:
+
+**Pie/Donut charts:**
+```sql
+-- Top 5 slices plus "Others" slice
+CASE WHEN ranking <= 5 THEN symbol ELSE '-Others-' END
+```
+
+**Bar charts:**
+```sql
+-- Top 10 bars, sorted by value
+CASE WHEN ranking <= 10 THEN symbol ELSE '-Others-' END
+ORDER BY total_trades DESC
+```
+
+**Time series:**
+```questdb-sql demo title="Top 5 symbols over time with Others"
+WITH totals AS (
+  SELECT
+    timestamp_floor('h', timestamp) as hour,
+    symbol,
+    count() as total
+  FROM trades
+  WHERE timestamp >= dateadd('d', -1, now())
+  SAMPLE BY 1h
+),
+overall_ranks AS (
+  SELECT symbol, SUM(total) as grand_total
+  FROM totals
+  GROUP BY symbol
+),
+ranked_symbols AS (
+  SELECT symbol, rank() OVER (ORDER BY grand_total DESC) as ranking
+  FROM overall_ranks
+)
+SELECT
+  t.hour,
+  CASE WHEN rs.ranking <= 5 THEN t.symbol ELSE '-Others-' END as symbol,
+  SUM(t.total) as hourly_total
+FROM totals t
+LEFT JOIN ranked_symbols rs ON t.symbol = rs.symbol
+GROUP BY t.hour, 2
+ORDER BY t.hour, hourly_total DESC;
+```
+
+This shows how top 5 symbols trade over time, with all others combined.
+ +## Filtering Out Low Values + +Add a minimum threshold to exclude negligible values: + +```sql +WITH totals AS ( + SELECT symbol, count() as total + FROM trades + WHERE timestamp >= dateadd('d', -1, now()) +), +ranked AS ( + SELECT *, rank() OVER (ORDER BY total DESC) as ranking + FROM totals + WHERE total >= 10 -- Exclude symbols with less than 10 trades +) +SELECT + CASE WHEN ranking <= 5 THEN symbol ELSE '-Others-' END as symbol, + SUM(total) as total_trades +FROM ranked +GROUP BY 1 +ORDER BY total_trades DESC; +``` + +## Performance Tips + +**Pre-filter data:** +```sql +-- Good: Filter before aggregation +WITH totals AS ( + SELECT symbol, count() as total + FROM trades + WHERE timestamp >= dateadd('d', -1, now()) -- Filter early + AND symbol IN (SELECT DISTINCT symbol FROM watched_symbols) +) +... + +-- Less efficient: Filter after aggregation +WITH totals AS ( + SELECT symbol, count() as total + FROM trades -- No filter +) +, filtered AS ( + SELECT * FROM totals + WHERE ... -- Late filter +) +... +``` + +**Limit ranking scope:** +```sql +-- If you only need top 5, don't rank beyond what's needed +WITH totals AS ( + SELECT symbol, count() as total + FROM trades + WHERE timestamp >= dateadd('d', -1, now()) + ORDER BY total DESC + LIMIT 100 -- Rank only top 100, not all thousands +) +... +``` + +:::tip Custom Labels +Customize the "Others" label for your domain: +- `-Others-` (generic) +- `~Rest~` (shorter) +- `Other Symbols` (explicit) +- `Remaining Browsers` (domain-specific) + +Choose a label that sorts appropriately and is clear in your context. +::: + +:::warning Empty Others Row +If there are N or fewer distinct values, the "Others" row won't appear (or will have 0 count). Handle this in your visualization logic if needed. +::: + +:::info Related Documentation +- [rank() window function](/docs/reference/function/window/#rank) +- [row_number() window function](/docs/reference/function/window/#row_number) +- [CASE expressions](/docs/reference/sql/case/) +- [Window functions](/docs/reference/sql/over/) +::: diff --git a/documentation/playbook/sql/advanced/unpivot-table.md b/documentation/playbook/sql/advanced/unpivot-table.md new file mode 100644 index 000000000..25e4883bf --- /dev/null +++ b/documentation/playbook/sql/advanced/unpivot-table.md @@ -0,0 +1,177 @@ +--- +title: Unpivoting Query Results +sidebar_label: Unpivoting results +description: Convert wide-format data to long format using UNION ALL +--- + +Transform wide-format data (multiple columns) into long format (rows) using UNION ALL. 
+ +## Problem: Wide Format to Long Format + +You have query results with multiple columns where only one column has a value per row: + +**Wide format (sparse):** + +| timestamp | symbol | buy | sell | +|-----------|-----------|--------|--------| +| 08:10:00 | ETH-USDT | NULL | 3678.25| +| 08:10:00 | ETH-USDT | NULL | 3678.25| +| 08:10:00 | ETH-USDT | 3678.01| NULL | +| 08:10:00 | ETH-USDT | NULL | 3678.00| + +You want to convert this to a format where side and price are explicit: + +**Long format (dense):** + +| timestamp | symbol | side | price | +|-----------|-----------|------|---------| +| 08:10:00 | ETH-USDT | sell | 3678.25 | +| 08:10:00 | ETH-USDT | sell | 3678.25 | +| 08:10:00 | ETH-USDT | buy | 3678.01 | +| 08:10:00 | ETH-USDT | sell | 3678.00 | + +## Solution: UNION ALL with Literal Values + +Use UNION ALL to stack columns as rows, then filter NULL values: + +```questdb-sql demo title="UNPIVOT buy/sell columns to side/price rows" +WITH pivoted AS ( + SELECT + timestamp, + symbol, + CASE WHEN side = 'buy' THEN price END as buy, + CASE WHEN side = 'sell' THEN price END as sell + FROM trades + WHERE timestamp >= dateadd('m', -5, now()) + AND symbol = 'ETH-USDT' +), +unpivoted AS ( + SELECT timestamp, symbol, 'buy' as side, buy as price + FROM pivoted + + UNION ALL + + SELECT timestamp, symbol, 'sell' as side, sell as price + FROM pivoted +) +SELECT * FROM unpivoted +WHERE price IS NOT NULL +ORDER BY timestamp; +``` + +**Results:** + +| timestamp | symbol | side | price | +|-----------|-----------|------|---------| +| 08:10:00 | ETH-USDT | sell | 3678.25 | +| 08:10:00 | ETH-USDT | sell | 3678.25 | +| 08:10:00 | ETH-USDT | buy | 3678.01 | +| 08:10:00 | ETH-USDT | sell | 3678.00 | + +## How It Works + +### Step 1: Create Wide Format (if needed) + +If your data is already in narrow format, you may need to pivot first: + +```sql +CASE WHEN side = 'buy' THEN price END as buy, +CASE WHEN side = 'sell' THEN price END as sell +``` + +This creates NULL values for the opposite side. + +### Step 2: UNION ALL + +```sql +SELECT timestamp, symbol, 'buy' as side, buy as price FROM pivoted +UNION ALL +SELECT timestamp, symbol, 'sell' as side, sell as price FROM pivoted +``` + +This creates two copies of every row: +- First copy: Has 'buy' literal with buy column value +- Second copy: Has 'sell' literal with sell column value + +### Step 3: Filter NULLs + +```sql +WHERE price IS NOT NULL +``` + +Removes rows where the price column is NULL (the opposite side). 
+ +## Unpivoting Multiple Columns + +Transform multiple numeric columns to name-value pairs: + +```questdb-sql demo title="UNPIVOT sensor readings" +WITH sensor_data AS ( + SELECT + timestamp, + sensor_id, + temperature, + humidity, + pressure + FROM sensors + WHERE timestamp >= dateadd('h', -1, now()) +) +SELECT timestamp, sensor_id, 'temperature' as metric, temperature as value FROM sensor_data +WHERE temperature IS NOT NULL + +UNION ALL + +SELECT timestamp, sensor_id, 'humidity' as metric, humidity as value FROM sensor_data +WHERE humidity IS NOT NULL + +UNION ALL + +SELECT timestamp, sensor_id, 'pressure' as metric, pressure as value FROM sensor_data +WHERE pressure IS NOT NULL + +ORDER BY timestamp, sensor_id, metric; +``` + +**Results:** + +| timestamp | sensor_id | metric | value | +|-----------|-----------|-------------|-------| +| 10:00:00 | S001 | humidity | 65.2 | +| 10:00:00 | S001 | pressure | 1013.2| +| 10:00:00 | S001 | temperature | 22.5 | + +## Performance Considerations + +**UNION ALL vs UNION:** +```sql +-- Fast: UNION ALL (no deduplication) +SELECT ... UNION ALL SELECT ... + +-- Slower: UNION (deduplicates rows) +SELECT ... UNION SELECT ... +``` + +Always use `UNION ALL` for unpivoting unless you specifically need deduplication. + +## Reverse: Pivot (Long to Wide) + +To go back from long to wide format, use aggregation with CASE: + +```sql +SELECT + timestamp, + sensor_id, + MAX(CASE WHEN metric = 'temperature' THEN value END) as temperature, + MAX(CASE WHEN metric = 'humidity' THEN value END) as humidity, + MAX(CASE WHEN metric = 'pressure' THEN value END) as pressure +FROM sensor_readings_long +GROUP BY timestamp, sensor_id; +``` + +See the [Pivoting](/docs/playbook/sql/advanced/pivot-table/) guide for more details. + +:::info Related Documentation +- [UNION](/docs/reference/sql/union-except-intersect/) +- [CASE expressions](/docs/reference/sql/case/) +- [Pivoting (opposite operation)](/docs/playbook/sql/advanced/pivot-table/) +::: diff --git a/documentation/playbook/sql/finance/bollinger-bands.md b/documentation/playbook/sql/finance/bollinger-bands.md new file mode 100644 index 000000000..7822ba4a9 --- /dev/null +++ b/documentation/playbook/sql/finance/bollinger-bands.md @@ -0,0 +1,175 @@ +--- +title: Bollinger Bands +sidebar_label: Bollinger Bands +description: Calculate Bollinger Bands using window functions for volatility analysis and mean reversion trading strategies +--- + +Calculate Bollinger Bands for volatility analysis and mean reversion trading. Bollinger Bands consist of a moving average with upper and lower bands set at a specified number of standard deviations above and below it. They help identify overbought/oversold conditions and measure market volatility. + +## Problem: Calculate Rolling Bands with Standard Deviation + +You want to calculate Bollinger Bands with a 20-period simple moving average (SMA) and bands at ±2 standard deviations. The challenge is that QuestDB doesn't support `STDDEV` as a window function, so you need a workaround using the mathematical relationship between variance and standard deviation. 
+ +## Solution: Calculate Variance Using Window Functions + +Since standard deviation is the square root of variance, and variance is the average of squared differences from the mean, we can calculate it using window functions: + +```questdb-sql demo title="Calculate Bollinger Bands with 20-period SMA" +WITH OHLC AS ( + SELECT + timestamp, symbol, + first(price) AS open, + max(price) as high, + min(price) as low, + last(price) AS close, + sum(amount) AS volume + FROM trades + WHERE symbol = 'BTC-USDT' AND timestamp IN yesterday() + SAMPLE BY 15m +), stats AS ( + SELECT + timestamp, + close, + AVG(close) OVER ( + ORDER BY timestamp + ROWS BETWEEN 19 PRECEDING AND CURRENT ROW + ) AS sma20, + AVG(close * close) OVER ( + ORDER BY timestamp + ROWS BETWEEN 19 PRECEDING AND CURRENT ROW + ) AS avg_close_sq + FROM OHLC +) +SELECT + timestamp, + close, + sma20, + sqrt(avg_close_sq - (sma20 * sma20)) as stdev20, + sma20 + 2 * sqrt(avg_close_sq - (sma20 * sma20)) as upper_band, + sma20 - 2 * sqrt(avg_close_sq - (sma20 * sma20)) as lower_band +FROM stats +ORDER BY timestamp; +``` + +This query: +1. Aggregates trades into 15-minute OHLC candles +2. Calculates a 20-period simple moving average of closing prices +3. Calculates the average of squared closing prices over the same 20-period window +4. Computes standard deviation using the mathematical identity: `σ = √(E[X²] - E[X]²)` +5. Adds/subtracts 2× standard deviation to create upper and lower bands + +## How It Works + +The mathematical relationship used here is: + +``` +Variance(X) = E[X²] - (E[X])² +StdDev(X) = √(E[X²] - (E[X])²) +``` + +Where: +- `E[X]` is the average (SMA) of closing prices +- `E[X²]` is the average of squared closing prices +- `√` is the square root function + +Breaking down the calculation: +1. **`AVG(close)`**: Simple moving average over 20 periods +2. **`AVG(close * close)`**: Average of squared prices over 20 periods +3. **`sqrt(avg_close_sq - (sma20 * sma20))`**: Standard deviation derived from variance +4. **Upper/Lower bands**: SMA ± (multiplier × standard deviation) + +### Window Frame Clause + +`ROWS BETWEEN 19 PRECEDING AND CURRENT ROW` creates a sliding window of exactly 20 rows (19 previous + current), which gives us the 20-period moving calculations required for standard Bollinger Bands. 
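+
+If you want to convince yourself the identity holds, a quick sanity check is to compare it against QuestDB's `stddev()` aggregate (available as an aggregate, just not as a window function) over yesterday's BTC-USDT trades. The identity yields the population figure, so expect only a negligible difference on large samples. A hedged sketch:
+
+```sql
+-- Sanity-check sketch: the squared-mean identity vs. the stddev() aggregate
+-- (population vs. sample formula, so tiny differences are expected on small n)
+SELECT
+  sqrt(avg(price * price) - avg(price) * avg(price)) AS stddev_identity,
+  stddev(price) AS stddev_aggregate
+FROM trades
+WHERE symbol = 'BTC-USDT' AND timestamp IN yesterday();
+```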
+ +## Adapting the Parameters + +**Different period lengths:** +```sql +-- 10-period Bollinger Bands (change 19 to 9) +AVG(close) OVER (ORDER BY timestamp ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS sma10, +AVG(close * close) OVER (ORDER BY timestamp ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS avg_close_sq +``` + +**Different band multipliers:** +```sql +-- 1 standard deviation bands (tighter) +sma20 + 1 * sqrt(avg_close_sq - (sma20 * sma20)) as upper_band, +sma20 - 1 * sqrt(avg_close_sq - (sma20 * sma20)) as lower_band + +-- 3 standard deviation bands (wider) +sma20 + 3 * sqrt(avg_close_sq - (sma20 * sma20)) as upper_band, +sma20 - 3 * sqrt(avg_close_sq - (sma20 * sma20)) as lower_band +``` + +**Different time intervals:** +```sql +-- 5-minute candles +SAMPLE BY 5m + +-- 1-hour candles +SAMPLE BY 1h +``` + +**Multiple symbols:** +```questdb-sql demo title="Bollinger Bands for multiple symbols" +WITH OHLC AS ( + SELECT + timestamp, symbol, + first(price) AS open, + last(price) AS close, + sum(amount) AS volume + FROM trades + WHERE symbol IN ('BTC-USDT', 'ETH-USDT') + AND timestamp IN yesterday() + SAMPLE BY 15m +), stats AS ( + SELECT + timestamp, + symbol, + close, + AVG(close) OVER ( + PARTITION BY symbol + ORDER BY timestamp + ROWS BETWEEN 19 PRECEDING AND CURRENT ROW + ) AS sma20, + AVG(close * close) OVER ( + PARTITION BY symbol + ORDER BY timestamp + ROWS BETWEEN 19 PRECEDING AND CURRENT ROW + ) AS avg_close_sq + FROM OHLC +) +SELECT + timestamp, + symbol, + close, + sma20, + sma20 + 2 * sqrt(avg_close_sq - (sma20 * sma20)) as upper_band, + sma20 - 2 * sqrt(avg_close_sq - (sma20 * sma20)) as lower_band +FROM stats +ORDER BY symbol, timestamp; +``` + +Note the addition of `PARTITION BY symbol` to calculate separate Bollinger Bands for each symbol. + +:::tip Trading Signals +- **Bollinger Squeeze**: When bands narrow, it indicates low volatility and often precedes significant price moves +- **Band Walk**: Price consistently touching the upper band suggests strong uptrend; lower band suggests downtrend +- **Mean Reversion**: Price touching or exceeding bands often signals potential reversals back to the mean +- **Volatility Measure**: Width between bands indicates market volatility - wider bands mean higher volatility +::: + +:::tip Parameter Selection +- **Standard settings**: 20-period SMA with 2σ bands (captures ~95% of price action) +- **Day trading**: Use shorter periods (10 or 15) for more responsive bands +- **Swing trading**: Use standard 20-period or longer (50-period) for smoother signals +- **Volatility adjustment**: Use 2.5σ or 3σ bands in highly volatile markets +::: + +:::info Related Documentation +- [Window functions](/docs/reference/sql/over/) +- [AVG window function](/docs/reference/function/window/#avg) +- [SQRT function](/docs/reference/function/numeric/#sqrt) +- [Window frame clauses](/docs/reference/sql/over/#frame-types-and-behavior) +::: diff --git a/documentation/playbook/sql/finance/compound-interest.md b/documentation/playbook/sql/finance/compound-interest.md new file mode 100644 index 000000000..a8882e530 --- /dev/null +++ b/documentation/playbook/sql/finance/compound-interest.md @@ -0,0 +1,112 @@ +--- +title: Calculate Compound Interest +sidebar_label: Compound interest +description: Calculate compound interest over time using POWER and window functions +--- + +Calculate compound interest over multiple periods using SQL, where each period's interest is calculated on the previous period's ending balance. 
This is useful for financial modeling, investment projections, and interest calculations. + +:::info Generated Data +This query uses generated data from `long_sequence()` to create a time series of years, so it can run directly on the demo instance without requiring any existing tables. +::: + +## Problem: Need Year-by-Year Growth + +You want to calculate compound interest over 5 years, starting with an initial principal of 1000, with an annual interest rate of 0.1 (10%). Each year's interest should be calculated on the previous year's ending balance. + +## Solution: Use POWER Function with Window Functions + +Combine the `POWER()` function with `FIRST_VALUE()` window function to calculate compound interest: + +```questdb-sql demo title="Calculate compound interest over 5 years" +WITH +year_series AS ( + SELECT 2000 as start_year, 2000 + (x - 1) AS timestamp, + 0.1 AS interest_rate, 1000.0 as initial_principal + FROM long_sequence(5) +), +compounded_values AS ( + SELECT + timestamp, + initial_principal, + interest_rate, + initial_principal * + POWER( + 1 + interest_rate, + timestamp - start_year + 1 + ) AS compounding + FROM + year_series +), compounding_year_before AS ( +SELECT + timestamp, + initial_principal, + interest_rate, + FIRST_VALUE(cv.compounding) + OVER ( + ORDER BY timestamp + ROWS between 1 preceding and 1 preceding + ) AS year_principal, + cv.compounding as compounding_amount +FROM + compounded_values cv +ORDER BY + timestamp + ) +select timestamp, initial_principal, interest_rate, +coalesce(year_principal, initial_principal) as year_principal, +compounding_amount +from compounding_year_before +``` + +**Results:** + +| timestamp | initial_principal | interest_rate | year_principal | compounding_amount | +|-----------|-------------------|---------------|----------------|-------------------| +| 2000 | 1000.0 | 0.1 | 1000.0 | 1100.0 | +| 2001 | 1000.0 | 0.1 | 1100.0 | 1210.0 | +| 2002 | 1000.0 | 0.1 | 1210.0 | 1331.0 | +| 2003 | 1000.0 | 0.1 | 1331.0 | 1464.1 | +| 2004 | 1000.0 | 0.1 | 1464.1 | 1610.51 | + +Each row shows how the principal grows year over year, with interest compounding on the previous year's ending balance. + +## How It Works + +The query uses a multi-step CTE approach: + +1. **Generate year series**: Use `long_sequence(5)` to create 5 rows representing years 2000-2004 +2. **Calculate compound amount**: Use `POWER(1 + interest_rate, years)` to compute the ending balance for each year +3. **Get previous year's balance**: Use `FIRST_VALUE()` with window frame `ROWS between 1 preceding and 1 preceding` to access the previous row's compounding amount +4. **Handle first year**: Use `COALESCE()` to show the initial principal for the first year + +The `POWER()` function calculates the compound interest formula: `principal * (1 + rate)^periods` + +## Customizing the Calculation + +You can modify the parameters: +- **Start year**: Change `2000` to your desired start year (appears twice in the query) +- **Initial principal**: Change `1000.0` to your starting amount +- **Interest rate**: Change `0.1` to your rate (0.1 = 10%) +- **Number of periods**: Change `long_sequence(5)` to your desired number of years + +```questdb-sql demo title="Example with different parameters" +WITH +year_series AS ( + SELECT 2025 as start_year, 2025 + (x - 1) AS timestamp, + 0.05 AS interest_rate, 5000.0 as initial_principal + FROM long_sequence(10) +), +-- ... 
rest of query remains the same +``` + +:::tip +For more complex scenarios like monthly or quarterly compounding, adjust the time period generation and the exponent in the POWER function accordingly. +::: + +:::info Related Documentation +- [POWER function](/docs/reference/function/numeric/#power) +- [Window functions](/docs/reference/sql/over/) +- [FIRST_VALUE window function](/docs/reference/function/window/#first_value) +- [long_sequence](/docs/reference/function/row-generator/#long_sequence) +::: diff --git a/documentation/playbook/sql/finance/cumulative-product.md b/documentation/playbook/sql/finance/cumulative-product.md new file mode 100644 index 000000000..be4053f9a --- /dev/null +++ b/documentation/playbook/sql/finance/cumulative-product.md @@ -0,0 +1,123 @@ +--- +title: Cumulative Product for Random Walk +sidebar_label: Cumulative product +description: Calculate cumulative product to simulate stock price paths from daily returns +--- + +Calculate the cumulative product of daily returns to simulate a stock's price path (random walk). This is useful for financial modeling, backtesting trading strategies, and portfolio analysis where you need to compound returns over time. + +## Problem: Compound Daily Returns + +You have a table with daily returns for a stock and want to calculate the cumulative price starting from an initial value (e.g., $100). Each day's price is calculated by multiplying the previous price by `(1 + return)`. + +For example, with these daily returns: + +| Date | Daily Return (%) | +|------------|------------------| +| 2024-09-05 | 2.00 | +| 2024-09-06 | -1.00 | +| 2024-09-07 | 1.50 | +| 2024-09-08 | -3.00 | + +You want to calculate: + +| Date | Daily Return (%) | Stock Price | +|------------|------------------|-------------| +| 2024-09-05 | 2.00 | 102.00 | +| 2024-09-06 | -1.00 | 100.98 | +| 2024-09-07 | 1.50 | 102.49 | +| 2024-09-08 | -3.00 | 99.42 | + +## Solution: Use Logarithm Mathematics Trick + +Since QuestDB doesn't allow functions on top of window function results, we use a mathematical trick: **the exponential of the sum of logarithms equals the product**. + +```questdb-sql demo title="Calculate cumulative product via logarithms" +WITH ln_values AS ( + SELECT + date, + return, + SUM(ln(1 + return)) OVER (ORDER BY date) AS ln_value + FROM daily_returns +) +SELECT + date, + return, + 100 * exp(ln_value) AS stock_price +FROM ln_values; +``` + +This query: +1. Calculates `ln(1 + return)` for each day +2. Uses a cumulative `SUM` window function to add up the logarithms +3. Applies `exp()` to convert back to the product + +## How It Works + +The mathematical identity used here is: + +``` +product(1 + r₁, 1 + r₂, ..., 1 + rₙ) = exp(sum(ln(1 + r₁), ln(1 + r₂), ..., ln(1 + rₙ))) +``` + +Breaking it down: +- `ln(1 + return)` converts each multiplicative factor to an additive one +- `SUM(...) OVER (ORDER BY date)` creates a cumulative sum +- `exp(ln_value)` converts the cumulative sum back to a cumulative product +- Multiply by 100 to apply the starting price of $100 + +### Why This Works + +QuestDB doesn't support direct window functions like `PRODUCT() OVER()`, and attempting `exp(SUM(ln(1 + return)) OVER ())` fails with a "dangling literal" error because you can't nest functions around window functions. + +The workaround is to use a CTE to compute the cumulative sum first, then apply `exp()` in the outer query where it's operating on a regular column, not a window function result. 
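+
+Because the demo instance has no `daily_returns` table, here is a self-contained sketch that generates the four example returns from the tables above with `long_sequence()` and applies the same log-sum-exp pattern; it should reproduce the 102.00 → 100.98 → 102.49 → 99.42 price path:
+
+```sql
+-- Self-contained sketch: generate the four example returns, then compound them
+WITH daily_returns AS (
+  SELECT
+    x AS day,
+    CASE WHEN x = 1 THEN 0.02 WHEN x = 2 THEN -0.01 WHEN x = 3 THEN 0.015 ELSE -0.03 END AS return
+  FROM long_sequence(4)
+),
+ln_values AS (
+  SELECT
+    day,
+    return,
+    SUM(ln(1 + return)) OVER (ORDER BY day) AS ln_value
+  FROM daily_returns
+)
+SELECT day, return, round(100 * exp(ln_value), 2) AS stock_price
+FROM ln_values;
+```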
+ +## Adapting to Your Data + +You can easily modify this pattern: + +**Different starting price:** +```sql +SELECT date, return, 1000 * exp(ln_value) AS stock_price -- Start at $1000 +FROM ln_values; +``` + +**Different time granularity:** +```sql +-- For hourly returns +WITH ln_values AS ( + SELECT + timestamp, + return, + SUM(ln(1 + return)) OVER (ORDER BY timestamp) AS ln_value + FROM hourly_returns +) +SELECT timestamp, 100 * exp(ln_value) AS price FROM ln_values; +``` + +**Multiple assets:** +```sql +WITH ln_values AS ( + SELECT + date, + symbol, + return, + SUM(ln(1 + return)) OVER (PARTITION BY symbol ORDER BY date) AS ln_value + FROM daily_returns +) +SELECT + date, + symbol, + 100 * exp(ln_value) AS stock_price +FROM ln_values; +``` + +:::tip Use Case: Monte Carlo Simulation +This pattern is essential for Monte Carlo simulations in finance. Generate random returns, apply this cumulative product calculation, and run thousands of iterations to model possible future price paths. +::: + +:::info Related Documentation +- [Window functions](/docs/reference/sql/over/) +- [Mathematical functions](/docs/reference/function/numeric/) +- [SUM aggregate](/docs/reference/function/aggregation/#sum) +::: diff --git a/documentation/playbook/sql/finance/rolling-stddev.md b/documentation/playbook/sql/finance/rolling-stddev.md new file mode 100644 index 000000000..3769ef0cc --- /dev/null +++ b/documentation/playbook/sql/finance/rolling-stddev.md @@ -0,0 +1,73 @@ +--- +title: Rolling Standard Deviation +sidebar_label: Rolling std dev +description: Calculate rolling standard deviation using window functions and CTEs +--- + +Calculate rolling standard deviation to measure price volatility over time. + +## Problem + +You want to calculate the standard deviation in a time window. QuestDB supports stddev as an aggregate function, but not as a window function. + +## Solution + +The standard deviation can be calculated from the variance, which is the average of the square differences from the mean. + +In general we could write it in SQL like this: + +```sql +SELECT + symbol, + price, + AVG(price) OVER (PARTITION BY symbol ORDER BY timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rolling_mean, + SQRT(AVG(POWER(price - AVG(price) OVER (PARTITION BY symbol ORDER BY timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 2)) + OVER (PARTITION BY symbol ORDER BY timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) AS rolling_stddev +FROM + trades +WHERE timestamp IN yesterday() +``` + +But in QuestDB we cannot do any operations on the return value of a window function, so we need to do this using CTEs: + +```questdb-sql demo title="Calculate rolling standard deviation" +WITH rolling_avg_cte AS ( + SELECT + timestamp, + symbol, + price, + AVG(price) OVER (PARTITION BY symbol ORDER BY timestamp) AS rolling_avg + FROM + trades + WHERE + timestamp IN yesterday() +), +variance_cte AS ( + SELECT + timestamp, + symbol, + price, + rolling_avg, + AVG(POWER(price - rolling_avg, 2)) OVER (PARTITION BY symbol ORDER BY timestamp) AS rolling_variance + FROM + rolling_avg_cte +) +SELECT + timestamp, + symbol, + price, + rolling_avg, + rolling_variance, + SQRT(rolling_variance) AS rolling_stddev +FROM + variance_cte; +``` + +I first get the rolling average/mean, then from that I get the variance, and then I can do the `sqrt` to get the standard deviation as requested. 
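+
+An alternative sketch uses the squared-mean identity from the Bollinger Bands recipe (σ = √(E[X²] - E[X]²)), which needs one less CTE and extends naturally to fixed-length frames such as `ROWS BETWEEN 19 PRECEDING AND CURRENT ROW` if you want the deviation over the last N trades rather than a cumulative one:
+
+```sql
+-- Sketch: rolling standard deviation via the squared-mean identity
+WITH stats AS (
+  SELECT
+    timestamp,
+    symbol,
+    price,
+    AVG(price) OVER (PARTITION BY symbol ORDER BY timestamp) AS rolling_mean,
+    AVG(price * price) OVER (PARTITION BY symbol ORDER BY timestamp) AS rolling_mean_sq
+  FROM trades
+  WHERE timestamp IN yesterday()
+)
+SELECT
+  timestamp,
+  symbol,
+  price,
+  rolling_mean,
+  SQRT(rolling_mean_sq - rolling_mean * rolling_mean) AS rolling_stddev
+FROM stats;
+```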
+ +:::info Related Documentation +- [Window functions](/docs/reference/sql/over/) +- [AVG window function](/docs/reference/function/window/#avg) +- [POWER function](/docs/reference/function/numeric/#power) +- [SQRT function](/docs/reference/function/numeric/#sqrt) +::: diff --git a/documentation/playbook/sql/finance/tick-trin.md b/documentation/playbook/sql/finance/tick-trin.md new file mode 100644 index 000000000..07621c1aa --- /dev/null +++ b/documentation/playbook/sql/finance/tick-trin.md @@ -0,0 +1,191 @@ +--- +title: Cumulative Tick and Trin Indicators +sidebar_label: Tick & Trin +description: Calculate cumulative Tick and Trin (ARMS Index) for market sentiment analysis and breadth indicators +--- + +Calculate cumulative Tick and Trin (also known as the ARMS Index) to measure market sentiment and breadth. These indicators compare advancing versus declining trades in terms of both count and volume, helping identify overbought/oversold conditions and potential market reversals. + +## Problem: Calculate Running Market Breadth + +You have a table with trade data including `side` (buy/sell) and `amount`, and want to calculate cumulative Tick and Trin values throughout the trading day. Tick measures the ratio of upticks to downticks, while Trin (Trading Index) adjusts this ratio by volume to identify divergences between price action and volume. + +**Sample data:** + +| timestamp | side | amount | +|------------------------------|------|--------| +| 2023-12-01T10:00:00.000000Z | sell | 100 | +| 2023-12-01T10:01:00.000000Z | buy | 50 | +| 2023-12-01T10:02:00.000000Z | sell | 150 | +| 2023-12-01T10:03:00.000000Z | buy | 100 | +| 2023-12-01T10:04:00.000000Z | buy | 200 | + +## Solution: Use Window Functions with CASE Statements + +Use `SUM` as a window function combined with `CASE` statements to compute running totals of upticks, downticks, and their respective volumes: + +```questdb-sql demo title="Calculate cumulative Tick and Trin indicators" +WITH tick_vol AS ( + SELECT + timestamp, + side, + amount, + SUM(CASE WHEN side = 'sell' THEN 1.0 END) OVER (ORDER BY timestamp) as downtick, + SUM(CASE WHEN side = 'buy' THEN 1.0 END) OVER (ORDER BY timestamp) as uptick, + SUM(CASE WHEN side = 'sell' THEN amount END) OVER (ORDER BY timestamp) as downvol, + SUM(CASE WHEN side = 'buy' THEN amount END) OVER (ORDER BY timestamp) as upvol + FROM trades + WHERE timestamp IN yesterday() AND symbol = 'BTC-USDT' +) +SELECT + timestamp, + side, + amount, + uptick, + downtick, + upvol, + downvol, + uptick / downtick as tick, + (uptick / downtick) / (upvol / downvol) as trin +FROM tick_vol; +``` + +**Results:** + +| timestamp | side | amount | downtick | uptick | downvol | upvol | tick | trin | +|------------------------------|------|--------|----------|--------|---------|-------|------|----------------| +| 2023-12-01T10:00:00.000000Z | sell | 100.0 | 1.0 | NULL | 100.0 | NULL | NULL | NULL | +| 2023-12-01T10:01:00.000000Z | buy | 50.0 | 1.0 | 1.0 | 100.0 | 50.0 | 1.0 | 2.0 | +| 2023-12-01T10:02:00.000000Z | sell | 150.0 | 2.0 | 1.0 | 250.0 | 50.0 | 0.5 | 2.5 | +| 2023-12-01T10:03:00.000000Z | buy | 100.0 | 2.0 | 2.0 | 250.0 | 150.0 | 1.0 | 1.666666666666 | +| 2023-12-01T10:04:00.000000Z | buy | 200.0 | 2.0 | 3.0 | 250.0 | 350.0 | 1.5 | 1.071428571428 | + +Each row shows the cumulative values from the start of the day, with Tick and Trin calculated at every trade. 
+ +## How It Works + +The indicators are calculated using these formulas: + +``` +Tick = Upticks / Downticks + +Trin = (Upticks / Downticks) / (Upvol / Downvol) + = Tick / Volume Ratio +``` + +Where: +- **Upticks**: Cumulative count of buy transactions +- **Downticks**: Cumulative count of sell transactions +- **Upvol**: Cumulative volume of buy transactions +- **Downvol**: Cumulative volume of sell transactions + +The query uses: +1. **Window functions**: `SUM(...) OVER (ORDER BY timestamp)` creates running totals from the start of the period +2. **CASE statements**: Conditionally sum only trades matching the specified side +3. **Type casting**: Using `1.0` instead of `1` ensures results are doubles, avoiding explicit casting + +### Interpreting the Indicators + +**Tick Indicator:** +- **Tick > 1.0**: More buying pressure (bullish sentiment) +- **Tick < 1.0**: More selling pressure (bearish sentiment) +- **Tick = 1.0**: Neutral market (equal buying and selling) + +**Trin (ARMS Index):** +- **Trin < 1.0**: Strong market (volume flowing into advancing trades) +- **Trin > 1.0**: Weak market (volume flowing into declining trades) +- **Trin = 1.0**: Balanced market +- **Extreme readings**: Trin > 2.0 suggests oversold conditions; Trin < 0.5 suggests overbought + +**Divergences:** +When Tick and Trin move in opposite directions, it can signal important market conditions: +- High Tick + High Trin: Advances lack volume confirmation (bearish divergence) +- Low Tick + Low Trin: Declines lack volume confirmation (bullish divergence) + +## Adapting the Query + +**Multiple symbols:** +```questdb-sql demo title="Tick and Trin for multiple symbols" +WITH tick_vol AS ( + SELECT + timestamp, + symbol, + side, + amount, + SUM(CASE WHEN side = 'sell' THEN 1.0 END) + OVER (PARTITION BY symbol ORDER BY timestamp) as downtick, + SUM(CASE WHEN side = 'buy' THEN 1.0 END) + OVER (PARTITION BY symbol ORDER BY timestamp) as uptick, + SUM(CASE WHEN side = 'sell' THEN amount END) + OVER (PARTITION BY symbol ORDER BY timestamp) as downvol, + SUM(CASE WHEN side = 'buy' THEN amount END) + OVER (PARTITION BY symbol ORDER BY timestamp) as upvol + FROM trades + WHERE timestamp IN yesterday() +) +SELECT + timestamp, + symbol, + uptick / downtick as tick, + (uptick / downtick) / (upvol / downvol) as trin +FROM tick_vol; +``` + +**Intraday periods (reset at intervals):** +```questdb-sql demo title="Tick and Trin reset every hour" +WITH tick_vol AS ( + SELECT + timestamp, + side, + amount, + SUM(CASE WHEN side = 'sell' THEN 1.0 END) + OVER (PARTITION BY timestamp_floor('h', timestamp) ORDER BY timestamp) as downtick, + SUM(CASE WHEN side = 'buy' THEN 1.0 END) + OVER (PARTITION BY timestamp_floor('h', timestamp) ORDER BY timestamp) as uptick, + SUM(CASE WHEN side = 'sell' THEN amount END) + OVER (PARTITION BY timestamp_floor('h', timestamp) ORDER BY timestamp) as downvol, + SUM(CASE WHEN side = 'buy' THEN amount END) + OVER (PARTITION BY timestamp_floor('h', timestamp) ORDER BY timestamp) as upvol + FROM trades + WHERE timestamp IN yesterday() AND symbol = 'BTC-USDT' +) +SELECT + timestamp, + uptick / downtick as tick, + (uptick / downtick) / (upvol / downvol) as trin +FROM tick_vol; +``` + +**Daily summary values only:** +```sql +WITH tick_vol AS ( + SELECT + SUM(CASE WHEN side = 'sell' THEN 1.0 END) as downtick, + SUM(CASE WHEN side = 'buy' THEN 1.0 END) as uptick, + SUM(CASE WHEN side = 'sell' THEN amount END) as downvol, + SUM(CASE WHEN side = 'buy' THEN amount END) as upvol + FROM trades + WHERE timestamp IN yesterday() +) 
+SELECT + uptick / downtick as tick, + (uptick / downtick) / (upvol / downvol) as trin +FROM tick_vol; +``` + +:::tip Market Analysis Applications +- **Intraday momentum**: Track Tick throughout the day to identify accumulation/distribution patterns +- **Overbought/oversold**: Extreme Trin readings often precede short-term reversals +- **Market breadth**: Persistently high/low values indicate broad market strength or weakness +- **Divergence trading**: When price makes new highs/lows but Trin doesn't confirm, it suggests weakening momentum +::: + +:::warning Handling NULL Values +The first buy or sell transaction will produce NULL values for some calculations since there's no previous opposite-side transaction yet. You can filter these out with `WHERE uptick IS NOT NULL AND downtick IS NOT NULL` if needed. +::: + +:::info Related Documentation +- [Window functions](/docs/reference/sql/over/) +- [SUM aggregate](/docs/reference/function/aggregation/#sum) +- [CASE expressions](/docs/reference/sql/case/) +::: diff --git a/documentation/playbook/sql/finance/volume-profile.md b/documentation/playbook/sql/finance/volume-profile.md new file mode 100644 index 000000000..69c750968 --- /dev/null +++ b/documentation/playbook/sql/finance/volume-profile.md @@ -0,0 +1,54 @@ +--- +title: Volume Profile +sidebar_label: Volume profile +description: Calculate volume profile by grouping trades into price bins +--- + +Calculate volume profile to show the distribution of trading volume across different price levels. + +## Solution + +Group trades into price bins using `FLOOR` and a tick size parameter: + +```questdb-sql demo title="Calculate volume profile with fixed tick size" +DECLARE @tick_size := 1.0 +SELECT + floor(price / @tick_size) * @tick_size AS price_bin, + round(SUM(amount), 2) AS volume +FROM trades +WHERE symbol = 'BTC-USDT' + AND timestamp IN today() +ORDER BY price_bin; +``` + +Since QuestDB does an implicit GROUP BY on all non-aggregated columns, you can omit the explicit GROUP BY clause. + +## Dynamic Tick Size + +For consistent histograms across different price ranges, calculate the tick size dynamically to always produce approximately 50 bins: + +```questdb-sql demo title="Volume profile with dynamic 50-bin distribution" +WITH raw_data AS ( + SELECT price, amount + FROM trades + WHERE symbol = 'BTC-USDT' AND timestamp IN today() +), +tick_size AS ( + SELECT (max(price) - min(price)) / 49 as tick_size + FROM raw_data +) +SELECT + floor(price / tick_size) * tick_size AS price_bin, + round(SUM(amount), 2) AS volume +FROM raw_data CROSS JOIN tick_size +ORDER BY 1; +``` + +This will produce a histogram with a maximum of 50 buckets. If you have enough price difference between the first and last price for the interval, and if there are enough events with different prices, then you will get the full 50 buckets. If price difference is too small or if there are buckets with no events, then you might get less than 50. 
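+
+A common follow-up is to find the price level where most volume changed hands, often called the point of control. A small sketch reusing the fixed tick-size profile from above, just re-ordered by volume:
+
+```sql
+-- Sketch: the highest-volume price bin ("point of control") for today's BTC-USDT trades
+DECLARE @tick_size := 1.0
+SELECT
+    floor(price / @tick_size) * @tick_size AS price_bin,
+    round(SUM(amount), 2) AS volume
+FROM trades
+WHERE symbol = 'BTC-USDT'
+  AND timestamp IN today()
+ORDER BY volume DESC
+LIMIT 1;
+```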
+ +:::info Related Documentation +- [FLOOR function](/docs/reference/function/numeric/#floor) +- [SUM aggregate](/docs/reference/function/aggregation/#sum) +- [DECLARE variables](/docs/reference/sql/declare/) +- [CROSS JOIN](/docs/reference/sql/join/#cross-join) +::: diff --git a/documentation/playbook/sql/finance/volume-spike.md b/documentation/playbook/sql/finance/volume-spike.md new file mode 100644 index 000000000..340cad85d --- /dev/null +++ b/documentation/playbook/sql/finance/volume-spike.md @@ -0,0 +1,53 @@ +--- +title: Volume Spike Detection +sidebar_label: Volume spikes +description: Detect volume spikes by comparing current volume against previous volume using LAG +--- + +Detect volume spikes by comparing current trading volume against the previous candle's volume. + +## Problem + +You have candles aggregated at 30 seconds intervals, and you want to show a flag 'spike' if volume is bigger than twice the latest record for the same symbol. Otherwise it should display 'normal'. + +## Solution + +Use the `LAG` window function to retrieve the previous candle's volume, then compare with a `CASE` statement: + +```questdb-sql demo title="Detect volume spikes exceeding 2x previous volume" +DECLARE + @anchor_date := timestamp_floor('30s', now()), + @start_date := dateadd('h', -7, @anchor_date), + @symbol := 'BTC-USDT' +WITH candles AS ( + SELECT + timestamp, + symbol, + sum(amount) AS volume + FROM trades + WHERE timestamp >= @start_date + AND symbol = @symbol + SAMPLE BY 30s +), +prev_volumes AS ( + SELECT + timestamp, + symbol, + volume, + LAG(volume) OVER (PARTITION BY symbol ORDER BY timestamp) AS prev_volume + FROM candles +) +SELECT + *, + CASE + WHEN volume > 2 * prev_volume THEN 'spike' + ELSE 'normal' + END AS spike_flag +FROM prev_volumes; +``` + +:::info Related Documentation +- [LAG window function](/docs/reference/function/window/#lag) +- [SAMPLE BY](/docs/reference/sql/sample-by/) +- [CASE expressions](/docs/reference/sql/case/) +::: diff --git a/documentation/playbook/sql/finance/vwap.md b/documentation/playbook/sql/finance/vwap.md new file mode 100644 index 000000000..9f3c30ced --- /dev/null +++ b/documentation/playbook/sql/finance/vwap.md @@ -0,0 +1,130 @@ +--- +title: Volume Weighted Average Price (VWAP) +sidebar_label: VWAP +description: Calculate cumulative volume weighted average price using window functions for intraday trading analysis +--- + +Calculate the cumulative Volume Weighted Average Price (VWAP) for intraday trading analysis. VWAP is a trading benchmark that represents the average price at which an asset has traded throughout the day, weighted by volume. It's widely used by institutional traders to assess execution quality and identify trend strength. + +## Problem: Calculate Running VWAP + +You want to calculate the cumulative VWAP for a trading day, where each point shows the average price weighted by volume from market open until that moment. This helps traders determine if current prices are above or below the day's volume-weighted average. 
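+
+As a quick illustration with made-up numbers (these are not demo-instance data), two trades are enough to see the weighting:
+
+```
+Trade 1: 0.5 BTC @ 60,000  →  traded value = 30,000
+Trade 2: 1.5 BTC @ 61,000  →  traded value = 91,500
+
+VWAP = (30,000 + 91,500) / (0.5 + 1.5) = 121,500 / 2 = 60,750
+```
+
+The larger trade pulls the average towards its price, which a plain average of the two prices (60,500) would miss.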
+ +## Solution: Use Window Functions for Cumulative Sums + +While QuestDB doesn't have a built-in VWAP window function, we can calculate it using cumulative `SUM` window functions for both traded value and volume: + +```questdb-sql demo title="Calculate cumulative VWAP over 10-minute intervals" +WITH sampled AS ( + SELECT + timestamp, symbol, + SUM(amount) AS volume, + SUM(price * amount) AS traded_value + FROM trades + WHERE timestamp IN yesterday() + AND symbol = 'BTC-USDT' + SAMPLE BY 10m +), cumulative AS ( + SELECT timestamp, symbol, + SUM(traded_value) + OVER (ORDER BY timestamp) AS cumulative_value, + SUM(volume) + OVER (ORDER BY timestamp) AS cumulative_volume + FROM sampled +) +SELECT timestamp, symbol, cumulative_value/cumulative_volume AS vwap FROM cumulative; +``` + +This query: +1. Aggregates trades into 10-minute intervals, calculating total volume and total traded value (price × amount) for each interval +2. Uses window functions to compute running totals of both traded value and volume from the start of the day +3. Divides cumulative traded value by cumulative volume to get VWAP at each timestamp + +## How It Works + +VWAP is calculated as: + +``` +VWAP = Total Traded Value / Total Volume + = Σ(Price × Volume) / Σ(Volume) +``` + +The key insight is using `SUM(...) OVER (ORDER BY timestamp)` to create running totals: +- `cumulative_value`: Running sum of (price × amount) from market open +- `cumulative_volume`: Running sum of volume from market open +- Final VWAP: Dividing these cumulative values gives the volume-weighted average at each point in time + +### Window Function Behavior + +When using `SUM() OVER (ORDER BY timestamp)` without specifying a frame clause, QuestDB defaults to summing from the first row to the current row, which is exactly what we need for cumulative VWAP. + +## Adapting the Query + +**Different time intervals:** +```questdb-sql demo title="VWAP with 1-minute resolution" +WITH sampled AS ( + SELECT + timestamp, symbol, + SUM(amount) AS volume, + SUM(price * amount) AS traded_value + FROM trades + WHERE timestamp IN yesterday() + AND symbol = 'BTC-USDT' + SAMPLE BY 1m -- Changed from 10m to 1m +), cumulative AS ( + SELECT timestamp, symbol, + SUM(traded_value) OVER (ORDER BY timestamp) AS cumulative_value, + SUM(volume) OVER (ORDER BY timestamp) AS cumulative_volume + FROM sampled +) +SELECT timestamp, symbol, cumulative_value/cumulative_volume AS vwap FROM cumulative; +``` + +**Multiple symbols:** +```questdb-sql demo title="VWAP for multiple symbols" +WITH sampled AS ( + SELECT + timestamp, symbol, + SUM(amount) AS volume, + SUM(price * amount) AS traded_value + FROM trades + WHERE timestamp IN yesterday() + AND symbol IN ('BTC-USDT', 'ETH-USDT', 'SOL-USDT') + SAMPLE BY 10m +), cumulative AS ( + SELECT timestamp, symbol, + SUM(traded_value) + OVER (PARTITION BY symbol ORDER BY timestamp) AS cumulative_value, + SUM(volume) + OVER (PARTITION BY symbol ORDER BY timestamp) AS cumulative_volume + FROM sampled +) +SELECT timestamp, symbol, cumulative_value/cumulative_volume AS vwap FROM cumulative; +``` + +Note the addition of `PARTITION BY symbol` to calculate separate VWAP values for each symbol. 
+ +**Different time ranges:** +```sql +-- Current trading day (today) +WHERE timestamp IN today() + +-- Specific date +WHERE timestamp IN '2024-09-05' + +-- Last hour +WHERE timestamp >= dateadd('h', -1, now()) +``` + +:::tip Trading Use Cases +- **Execution quality**: Institutional traders compare their execution prices against VWAP to assess trade quality +- **Trend identification**: Price consistently above VWAP suggests bullish momentum; below suggests bearish +- **Support/resistance**: VWAP often acts as dynamic support or resistance during the trading day +- **Mean reversion**: Traders use deviations from VWAP to identify potential reversal points +::: + +:::info Related Documentation +- [Window functions](/docs/reference/sql/over/) +- [SUM aggregate](/docs/reference/function/aggregation/#sum) +- [SAMPLE BY](/docs/reference/sql/sample-by/) +::: diff --git a/documentation/playbook/sql/time-series/distribute-discrete-values.md b/documentation/playbook/sql/time-series/distribute-discrete-values.md new file mode 100644 index 000000000..a69e9b166 --- /dev/null +++ b/documentation/playbook/sql/time-series/distribute-discrete-values.md @@ -0,0 +1,90 @@ +--- +title: Distribute Discrete Values Across Time Intervals +sidebar_label: Distribute discrete values +description: Spread cumulative measurements across time intervals using sessions and window functions +--- + +Distribute discrete cumulative measurements across the time intervals between observations. When devices report cumulative values at irregular timestamps, you can spread those values proportionally across the intervals to get per-period averages. + +This pattern is useful for scenarios like energy consumption, data transfer volumes, accumulated costs, or any metric where a cumulative value needs to be attributed to the intervals that contributed to it. + +## Problem + +You have IoT devices reporting watt-hour (Wh) values at irregular timestamps, identified by an `operationId`. You want to plot the sum of average power per operation, broken down by hour. + +When an IoT device sends a `wh` value at discrete timestamps, you need to distribute that energy across the hours between measurements to visualize average power consumption per hour. + +Raw data: + +| timestamp | operationId | wh | +|-----------------------------|-------------|-----| +| 2025-04-01T14:10:59.000000Z | 1001 | 0 | +| 2025-04-01T14:20:01.000000Z | 1002 | 0 | +| 2025-04-01T15:06:29.000000Z | 1003 | 0 | +| 2025-04-01T18:18:05.000000Z | 1001 | 200 | +| 2025-04-01T20:06:36.000000Z | 1003 | 200 | +| 2025-04-01T22:20:10.000000Z | 1002 | 300 | + +For operation 1001: 200 Wh consumed between 14:10:59 and 18:18:05 should be distributed across hours 14:00, 15:00, 16:00, 17:00, 18:00. 
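+
+Worked through for operation 1001 (the same numbers as the table above), the distribution looks like this:
+
+```
+Hourly buckets touched: 14:00, 15:00, 16:00, 17:00, 18:00  →  5 buckets
+Value attributed per bucket: 200 Wh / 5 = 40 Wh
+```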
+ +## Solution + +```questdb-sql demo title="Distribute watt-hours across hourly intervals" +WITH +sampled AS ( + SELECT timestamp, operationId, sum(wh) as wh + FROM meter + SAMPLE BY 1h + FILL(0) +), +sessions AS ( + SELECT *, + SUM(CASE WHEN wh > 0 THEN 1 END) + OVER (PARTITION BY operationId ORDER BY timestamp DESC) as session + FROM sampled +), +counts AS ( + SELECT timestamp, operationId, + FIRST_VALUE(wh) OVER (PARTITION BY operationId, session ORDER BY timestamp DESC) as wh, + COUNT(*) OVER (PARTITION BY operationId, session) as attributable_hours + FROM sessions +) +SELECT + timestamp, + operationId, + wh / attributable_hours as wh_avg +FROM counts; +``` + +**How it works:** + +The `sampled` subquery creates an entry for each operationId and missing hourly interval, filling with 0 wh for interpolated rows. + +The key trick is dividing the data into "sessions". A session is defined by all the rows with no value for wh before a row with a value for wh. Or, if we reverse the timestamp order, a session would be defined by a row with a value for wh, followed by several rows with zero value for the same operationId: + +```sql +SUM(CASE WHEN wh > 0 THEN 1 END) OVER (PARTITION BY operationId ORDER BY timestamp DESC) as session +``` + +For each operationId we get multiple sessions (1, 2, 3...). If we did: + +```sql +COUNT() as attributable_hours GROUP BY operationId, session +``` + +We would get how many attributable rows each session has. + +The `counts` subquery uses a window function to `COUNT` the number of rows per session (notice the count window function is not using `order by` so this will not be a running count, but all rows for the same session will have the same value as `attributable_hours`). + +It also gets `FIRST_VALUE` for the session sorted by reverse timestamp, which is the `wh` value for the only row with value in each session. + +The final query divides the `wh` reported in the session by the number of `attributable_hours`. + +**Note:** If you want to filter the results by timestamp or operationId, you should add the filter at the first query (the one named `sampled`), so the rest of the process is done on the relevant subset of data. + +:::info Related Documentation +- [SAMPLE BY](/docs/reference/sql/sample-by/) +- [FILL](/docs/reference/sql/select/#fill) +- [Window functions](/docs/reference/sql/over/) +- [FIRST_VALUE](/docs/reference/function/window/#first_value) +::: diff --git a/documentation/playbook/sql/time-series/epoch-timestamps.md b/documentation/playbook/sql/time-series/epoch-timestamps.md new file mode 100644 index 000000000..ceedda95c --- /dev/null +++ b/documentation/playbook/sql/time-series/epoch-timestamps.md @@ -0,0 +1,30 @@ +--- +title: Query with Epoch Timestamps +sidebar_label: Epoch timestamps +description: Use epoch timestamps for timestamp filtering in QuestDB +--- + +Query using epoch timestamps instead of timestamp literals. + +## Problem + +You want to query data using an epoch time interval rather than using timestamp literals or timestamp_ns data types. + +## Solution + +Use epoch values directly in your WHERE clause. QuestDB expects microseconds by default for `timestamp` columns: + +```questdb-sql demo title="Query with epoch microseconds" +SELECT * +FROM trades +WHERE timestamp BETWEEN 1746552420000000 AND 1746811620000000; +``` + +**Note:** If you have epoch values in milliseconds, you need to multiply by 1000 to convert to microseconds. + +Nanoseconds can be used when the timestamp column is of type `timestamp_ns`. 
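+
+For example, if your client hands you the same interval as epoch milliseconds, the conversion can be done inline. A minimal sketch (the literals are simply the microsecond bounds above divided by 1000):
+
+```sql
+SELECT *
+FROM trades
+WHERE timestamp BETWEEN 1746552420000 * 1000 AND 1746811620000 * 1000;
+```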
+ +:::info Related Documentation +- [Timestamp types](/docs/reference/sql/datatypes/#timestamp-and-date-considerations) +- [WHERE clause](/docs/reference/sql/where/) +::: diff --git a/documentation/playbook/sql/time-series/fill-from-one-column.md b/documentation/playbook/sql/time-series/fill-from-one-column.md new file mode 100644 index 000000000..f23e79e74 --- /dev/null +++ b/documentation/playbook/sql/time-series/fill-from-one-column.md @@ -0,0 +1,55 @@ +--- +title: Fill Missing Intervals with Value from Another Column +sidebar_label: Fill from one column +description: Use window functions to propagate values from one column to fill multiple columns in SAMPLE BY queries +--- + +Fill missing intervals using the previous value from a specific column to populate multiple columns. + +## Problem + +You have a query like this: + +```sql +SELECT timestamp, symbol, avg(bid_price) as bid_price, avg(ask_price) as ask_price +FROM core_price +WHERE symbol = 'EURUSD' AND timestamp IN today() +SAMPLE BY 1s FILL(PREV, PREV); +``` + +But when there is an interpolation, instead of getting the PREV value for `bid_price` and previous for `ask_price`, you want both prices to show the PREV known value for the `ask_price`. Imagine this SQL was valid: + +```sql +SELECT timestamp, symbol, avg(bid_price) as bid_price, avg(ask_price) as ask_price +FROM core_price +WHERE symbol = 'EURUSD' AND timestamp IN today() +SAMPLE BY 1s FILL(PREV(ask_price), PREV); +``` + +## Solution + +The only way to do this is in multiple steps within a single query: first get the sampled data interpolating with null values, then use a window function to get the last non-null value for the reference column, and finally coalesce the missing columns with this filler value. + +```questdb-sql demo title="Fill bid and ask prices with value from ask price" +WITH sampled AS ( + SELECT timestamp, symbol, avg(bid_price) as bid_price, avg(ask_price) as ask_price + FROM core_price + WHERE symbol = 'EURUSD' AND timestamp IN today() + SAMPLE BY 1s FILL(null) +), with_previous_vals AS ( + SELECT *, + last_value(ask_price) IGNORE NULLS OVER(PARTITION BY symbol ORDER BY timestamp) as filler + FROM sampled +) +SELECT timestamp, symbol, coalesce(bid_price, filler) as bid_price, coalesce(ask_price, filler) as ask_price +FROM with_previous_vals; +``` + +Note the use of `IGNORE NULLS` modifier on the window function to make sure we always look back for a value, rather than just over the previous row. + +:::info Related Documentation +- [SAMPLE BY](/docs/reference/sql/sample-by/) +- [FILL keyword](/docs/reference/sql/sample-by/#fill-keywords) +- [Window functions](/docs/reference/sql/over/) +- [last_value()](/docs/reference/function/window/#last_value) +::: diff --git a/documentation/playbook/sql/time-series/fill-keyed-arbitrary-interval.md b/documentation/playbook/sql/time-series/fill-keyed-arbitrary-interval.md new file mode 100644 index 000000000..05ca5edd3 --- /dev/null +++ b/documentation/playbook/sql/time-series/fill-keyed-arbitrary-interval.md @@ -0,0 +1,64 @@ +--- +title: FILL on Keyed Queries with Arbitrary Intervals +sidebar_label: FILL keyed arbitrary interval +description: Use FILL with keyed queries across arbitrary time intervals by sandwiching data with null boundary rows +--- + +When using `SAMPLE BY` with `FILL` on keyed queries (queries with non-aggregated columns like symbol), the `FROM/TO` syntax doesn't work. This playbook shows how to fill gaps across an arbitrary time interval for keyed queries. 
+ +## Problem + +Keyed queries - queries that include non-aggregated columns beyond the timestamp - do not support the `SAMPLE BY FROM x TO y` syntax when using `FILL`. Without this feature, gaps are only filled between the first and last existing row in the filtered results, not across your desired time interval. + +For example, if you want to sample by symbol and timestamp bucket with `FILL` for a specific time range, standard approaches will not fill gaps at the beginning or end of your interval. + +## Solution + +"Sandwich" your data by adding artificial boundary rows at the start and end of your time interval using `UNION ALL`. These rows contain your target timestamps with nulls for all other columns: + +```questdb-sql demo title="FILL arbitrary interval with keyed SAMPLE BY" + +DECLARE + @start_ts := dateadd('m', -2, now()), + @end_ts := dateadd('m', 2, now()) +WITH +sandwich AS ( + SELECT * FROM ( + SELECT @start_ts AS timestamp, null AS symbol, null AS open, null AS high, null AS close, null AS low + UNION ALL + SELECT timestamp, symbol, open_mid AS open, high_mid AS high, close_mid AS close, low_mid AS low + FROM core_price_1s + WHERE timestamp BETWEEN @start_ts AND @end_ts + UNION ALL + SELECT @end_ts AS timestamp, null AS symbol, null AS open, null AS high, null AS close, null AS low + ) ORDER BY timestamp +), +sampled AS ( + SELECT + timestamp, + symbol, + first(open) AS open, + first(high) AS high, + first(low) AS low, + first(close) AS close + FROM sandwich + SAMPLE BY 30s + FILL(PREV, PREV, PREV, PREV, 0) +) +SELECT * FROM sampled WHERE open IS NOT NULL AND symbol IN ('EURUSD', 'GBPUSD'); +``` + +This query: +1. Creates boundary rows with null values at the start and end timestamps +2. Combines them with filtered data using `UNION ALL` +3. Applies `ORDER BY timestamp` to preserve the designated timestamp +4. Performs `SAMPLE BY` with `FILL` - gaps are filled across the full interval +5. Filters out the artificial boundary rows using `open IS NOT NULL` + +The boundary rows ensure that gaps are filled from the beginning to the end of your specified interval, not just between existing data points. + +:::info Related Documentation +- [SAMPLE BY aggregation](/docs/reference/sql/sample-by/) +- [FILL keyword](/docs/reference/sql/sample-by/#fill-options) +- [Designated timestamp](/docs/concept/designated-timestamp/) +::: diff --git a/documentation/playbook/sql/time-series/fill-prev-with-history.md b/documentation/playbook/sql/time-series/fill-prev-with-history.md new file mode 100644 index 000000000..4b3a17d75 --- /dev/null +++ b/documentation/playbook/sql/time-series/fill-prev-with-history.md @@ -0,0 +1,63 @@ +--- +title: FILL PREV with Historical Values +sidebar_label: FILL PREV with history +description: Use FILL(PREV) with a filler row to carry historical values into a filtered time interval +--- + +When using `FILL(PREV)` with `SAMPLE BY` on a filtered time interval, gaps at the beginning may have null values because `PREV` only uses values from within the filtered interval. This playbook shows how to carry forward the last known value from before the interval. + +## Problem + +When you filter a time range and use `FILL(PREV)` or `FILL(LINEAR)`, QuestDB only considers values within the filtered interval. If the first sample bucket has no data, it will be null instead of carrying forward the last known value from before the interval. 
+ +## Solution + +Use a "filler row" by querying the latest value before the filtered interval with `LIMIT -1`, then combine it with your filtered data using `UNION ALL`. The filler row provides the initial value for `FILL(PREV)` to use: + +```questdb-sql demo title="FILL with PREV values carried over last row before the time range in the WHERE" +DECLARE + @start_ts := dateadd('s', -3, now()), + @end_ts := now() +WITH +filler_row AS ( + SELECT timestamp, open_mid AS open, high_mid AS high, close_mid AS close, low_mid AS low + FROM core_price_1s + WHERE timestamp < @start_ts + LIMIT -1 +), +sandwich AS ( + SELECT * FROM ( + SELECT * FROM filler_row + UNION ALL + SELECT timestamp, open_mid AS open, high_mid AS high, close_mid AS close, low_mid AS low + FROM core_price_1s + WHERE timestamp BETWEEN @start_ts AND @end_ts + ) ORDER BY timestamp +), +sampled AS ( + SELECT + timestamp, + first(open) AS open, + first(high) AS high, + first(low) AS low, + first(close) AS close + FROM sandwich + SAMPLE BY 100T + FILL(PREV, PREV, PREV, PREV, 0) +) +SELECT * FROM sampled WHERE timestamp >= @start_ts; +``` + +This query: +1. Gets the latest row before the filtered interval using `LIMIT -1` (last row) +2. Combines it with filtered interval data using `UNION ALL` +3. Applies `SAMPLE BY` with `FILL(PREV)` - the filler row provides initial values +4. Filters results to exclude the filler row, keeping only the requested interval + +The filler row ensures that gaps at the beginning of the interval carry forward the last known value rather than showing nulls. + +:::info Related Documentation +- [SAMPLE BY aggregation](/docs/reference/sql/sample-by/) +- [FILL keyword](/docs/reference/sql/sample-by/#fill-options) +- [LIMIT keyword](/docs/reference/sql/limit/) +::: diff --git a/documentation/playbook/sql/time-series/filter-by-week.md b/documentation/playbook/sql/time-series/filter-by-week.md new file mode 100644 index 000000000..b0f241605 --- /dev/null +++ b/documentation/playbook/sql/time-series/filter-by-week.md @@ -0,0 +1,51 @@ +--- +title: Filter Data by Week Number +sidebar_label: Filter by week +description: Query data by ISO week number using week_of_year() or dateadd() for better performance +--- + +Filter time-series data by week number using either the built-in `week_of_year()` function or `dateadd()` for better performance on large tables. + +## Solution 1: Using week_of_year() + +There is a built-in `week_of_year()` function, so this could be solved as: + +```sql +SELECT * FROM trades +WHERE week_of_year(timestamp) = 24; +``` + +## Solution 2: Using dateadd() (Faster) + +However, depending on your table size, especially if you are not filtering by any timestamp, you might prefer this alternative, as it executes faster: + +```sql +SELECT * FROM trades +WHERE timestamp >= dateadd('w', 23, '2024-12-30') + AND timestamp < dateadd('w', 24, '2024-12-30'); +``` + +You need to be careful with that query, as it will start counting time from Jan 1st 1970, which is not a Monday. 
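+
+In other words, `dateadd('w', ...)` has no notion of ISO weeks, so the anchor has to be a Monday you pick yourself. In the query above that anchor is `2024-12-30`, the Monday that starts ISO week 1 of 2025. Worked out for week 24 (dates computed by hand, so cross-check them against `week_of_year()` on your own data):
+
+```
+Week 24 start: dateadd('w', 23, '2024-12-30') = 2025-06-09 (Monday)
+Week 24 end:   dateadd('w', 24, '2024-12-30') = 2025-06-16 (exclusive)
+```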
+ +## Solution 3: Start at First Monday of Year + +This alternative would start at the Monday of the week that includes January 1st: + +```questdb-sql demo title="Filter by week using first Monday calculation" +DECLARE + @year := '2025', + @week := 24, + @first_monday := dateadd('d', -1 * day_of_week(@year) + 1, @year), + @week_start := dateadd('w', @week - 1, @first_monday), + @week_end := dateadd('w', @week, @first_monday) +SELECT * FROM trades +WHERE timestamp >= @week_start + AND timestamp < @week_end; +``` + +:::info Related Documentation +- [week_of_year()](/docs/reference/function/date-time/#week_of_year) +- [dateadd()](/docs/reference/function/date-time/#dateadd) +- [day_of_week()](/docs/reference/function/date-time/#day_of_week) +- [DECLARE](/docs/reference/sql/declare/) +::: diff --git a/documentation/playbook/sql/time-series/force-designated-timestamp.md b/documentation/playbook/sql/time-series/force-designated-timestamp.md new file mode 100644 index 000000000..3d4e0f7f7 --- /dev/null +++ b/documentation/playbook/sql/time-series/force-designated-timestamp.md @@ -0,0 +1,92 @@ +--- +title: Force a Designated Timestamp +sidebar_label: Force designated timestamp +description: Learn how to explicitly set a designated timestamp column in QuestDB queries using the TIMESTAMP keyword +--- + +Sometimes you need to force a designated timestamp in your query. This happens when you want to run operations like `SAMPLE BY` with a non-designated timestamp column, or when QuestDB applies certain functions or joins and loses track of the designated timestamp. + +## Problem: Lost Designated Timestamp + +When you run this query on the demo instance, you'll notice the `time` column is not recognized as a designated timestamp because we cast it to a string and back: + +```questdb-sql demo title="Query without designated timestamp" +SELECT + TO_TIMESTAMP(timestamp::STRING, 'yyyy-MM-ddTHH:mm:ss.SSSUUUZ') time, + symbol, + ecn, + bid_price +FROM + core_price +WHERE timestamp IN today() +LIMIT 10; +``` + +Without a designated timestamp, you cannot use time-series operations like `SAMPLE BY`. + +## Solution: Use the TIMESTAMP Keyword + +You can force the designated timestamp using the `TIMESTAMP()` keyword, which allows you to run time-series operations: + +```questdb-sql demo title="Force designated timestamp with TIMESTAMP keyword" +WITH t AS ( + ( + SELECT + TO_TIMESTAMP(timestamp::STRING, 'yyyy-MM-ddTHH:mm:ss.SSSUUUZ') time, + symbol, + ecn, + bid_price + FROM + core_price + WHERE timestamp >= dateadd('h', -1, now()) + ORDER BY time + ) TIMESTAMP (time) +) +SELECT * FROM t LATEST BY symbol; +``` + +The `TIMESTAMP(time)` clause explicitly tells QuestDB which column to use as the designated timestamp, enabling `LATEST BY` and other time-series operations. This query gets the most recent price for each symbol in the last hour. + +## Common Case: UNION Queries + +The designated timestamp is often lost when using `UNION` or `UNION ALL`. This is intentional - QuestDB cannot guarantee that the combined results are in order, and designated timestamps must always be in ascending order. 
+ +You can restore the designated timestamp by applying `ORDER BY` and then using `TIMESTAMP()`: + +```questdb-sql demo title="Restore designated timestamp after UNION ALL" +( + SELECT * FROM + ( + SELECT timestamp, symbol FROM core_price WHERE timestamp >= dateadd('m', -1, now()) + UNION ALL + SELECT timestamp, symbol FROM core_price WHERE timestamp >= dateadd('m', -1, now()) + ) ORDER BY timestamp +) +TIMESTAMP(timestamp) +LIMIT 10; +``` + +This query combines the last minute of data twice using `UNION ALL`, then restores the designated timestamp. + +## Querying External Parquet Files + +When querying external parquet files using `read_parquet()`, the result does not have a designated timestamp. You need to force it using `TIMESTAMP()` to enable time-series operations like `SAMPLE BY`: + +```questdb-sql demo title="Query parquet file with designated timestamp" +SELECT timestamp, avg(price) +FROM ((read_parquet('trades.parquet') ORDER BY timestamp) TIMESTAMP(timestamp)) +SAMPLE BY 1m; +``` + +This query reads from a parquet file, applies ordering, forces the designated timestamp, and then performs time-series aggregation. + +:::warning Order is Required +The `TIMESTAMP()` keyword requires that the data is already sorted by the timestamp column. If the data is not in order, the query will fail. Always include `ORDER BY` before applying `TIMESTAMP()`. +::: + +:::info Related Documentation +- [Designated Timestamp concept](/docs/concept/designated-timestamp/) +- [TIMESTAMP keyword reference](/docs/reference/sql/select/#timestamp) +- [SAMPLE BY aggregation](/docs/reference/sql/sample-by/) +- [Parquet functions](/docs/reference/function/parquet/) +::: diff --git a/documentation/playbook/sql/time-series/latest-activity-window.md b/documentation/playbook/sql/time-series/latest-activity-window.md new file mode 100644 index 000000000..e06fbf5ee --- /dev/null +++ b/documentation/playbook/sql/time-series/latest-activity-window.md @@ -0,0 +1,42 @@ +--- +title: Query Last N Minutes of Activity +sidebar_label: Latest activity window +description: Get rows from the last N minutes of recorded activity using subqueries with LIMIT -1 +--- + +Query data from the last N minutes of recorded activity in a table, regardless of the current time. + +## Problem + +You want to get data from a table for the last 15 minutes of activity. + +You know you could do: + +```sql +SELECT * FROM my_tb +WHERE timestamp > dateadd('m', -15, now()); +``` + +But that would give you the last 15 minutes, not the last 15 minutes of activity in your table. Supposing the last timestamp recorded in your table was `2025-03-23T07:24:37.000000Z`, then you would like to get the data from `2025-03-23T07:09:37.000000Z` to `2025-03-23T07:24:37.000000Z`. + +## Solution + +Use a correlated subquery to find the latest timestamp, then filter relative to it: + +```questdb-sql demo title="Last 15 minutes of recorded activity" +SELECT * +FROM my_table +WHERE timestamp >= ( + SELECT dateadd('m', -15, timestamp) + FROM my_table + LIMIT -1 +); +``` + +QuestDB supports correlated subqueries when asking for a timestamp if the query returns a scalar value. Using `LIMIT -1` we get the latest row in the table (sorted by designated timestamp), and we apply the `dateadd` function on that date, so it needs to be executed just once. If we placed the `dateadd` on the left, the calculation would need to be applied once for each row on the main table. This query should return in just a few milliseconds, independently of table size. 
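+
+If the lookback window changes often, it can be hoisted into a `DECLARE` variable. A minimal sketch against the same hypothetical `my_table`, assuming declared variables can be used inside the `dateadd()` count argument:
+
+```sql
+DECLARE @lookback_minutes := 15
+SELECT *
+FROM my_table
+WHERE timestamp >= (
+    SELECT dateadd('m', -1 * @lookback_minutes, timestamp)
+    FROM my_table
+    LIMIT -1
+);
+```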
+ +:::info Related Documentation +- [LIMIT](/docs/reference/sql/select/#limit) +- [dateadd()](/docs/reference/function/date-time/#dateadd) +- [Designated timestamp](/docs/concept/designated-timestamp/) +::: diff --git a/documentation/playbook/sql/time-series/latest-n-per-partition.md b/documentation/playbook/sql/time-series/latest-n-per-partition.md new file mode 100644 index 000000000..f2f679e38 --- /dev/null +++ b/documentation/playbook/sql/time-series/latest-n-per-partition.md @@ -0,0 +1,267 @@ +--- +title: Get Latest N Records Per Partition +sidebar_label: Latest N per partition +description: Retrieve the most recent N rows for each distinct value using window functions and filtering +--- + +Retrieve the most recent N rows for each distinct partition value (e.g., latest 5 trades per symbol, last 10 readings per sensor). While `LATEST ON` returns only the single most recent row per partition, this pattern extends it to get multiple recent rows per partition. + +## Problem: Need Multiple Recent Rows Per Group + +You want to get the latest N rows for each distinct value in a column. For example: +- Latest 5 trades for each trading symbol +- Last 10 sensor readings per device +- Most recent 3 log entries per service + +`LATEST ON` only returns one row per partition: + +```sql +-- Gets only 1 latest row per symbol +SELECT * FROM trades +LATEST ON timestamp PARTITION BY symbol; +``` + +But you need multiple rows per symbol. + +## Solution: Use ROW_NUMBER() Window Function + +Use `row_number()` to rank rows within each partition, then filter to keep only the top N: + +```questdb-sql demo title="Get latest 5 trades for each symbol" +WITH ranked AS ( + SELECT + *, + row_number() OVER (PARTITION BY symbol ORDER BY timestamp DESC) as rn + FROM trades + WHERE timestamp >= dateadd('d', -1, now()) +) +SELECT timestamp, symbol, side, price, amount +FROM ranked +WHERE rn <= 5 +ORDER BY symbol, timestamp DESC; +``` + +This returns up to 5 most recent trades for each symbol from the last day. + +## How It Works + +The query uses a two-step approach: + +1. **Ranking step (CTE):** + - `row_number() OVER (...)`: Assigns sequential numbers to rows within each partition + - `PARTITION BY symbol`: Separate ranking for each symbol + - `ORDER BY timestamp DESC`: Newest rows get lower numbers (1, 2, 3, ...) + - Result: Each row gets a rank within its symbol group + +2. **Filtering step (outer query):** + - `WHERE rn <= 5`: Keep only rows ranked 1-5 (the 5 most recent) + - `ORDER BY symbol, timestamp DESC`: Sort final results + +### Understanding row_number() + +`row_number()` assigns a unique sequential number within each partition: + +| timestamp | symbol | price | (row number) | +|-----------|-----------|-------|--------------| +| 10:03:00 | BTC-USDT | 63000 | 1 (newest) | +| 10:02:00 | BTC-USDT | 62900 | 2 | +| 10:01:00 | BTC-USDT | 62800 | 3 | +| 10:03:30 | ETH-USDT | 3100 | 1 (newest) | +| 10:02:30 | ETH-USDT | 3095 | 2 | + +With `WHERE rn <= 3`, we keep rows 1-3 for each symbol. 
+ +## Adapting the Query + +**Different partition columns:** +```sql +-- Latest 10 per sensor_id +PARTITION BY sensor_id + +-- Latest 5 per combination of symbol and exchange +PARTITION BY symbol, exchange + +-- Latest N per user_id +PARTITION BY user_id +``` + +**Different sort orders:** +```sql +-- Oldest N rows per partition +ORDER BY timestamp ASC + +-- Highest prices first +ORDER BY price DESC + +-- Alphabetically +ORDER BY name ASC +``` + +**Dynamic N value:** +```sql +-- Latest N trades where N is specified by user +DECLARE @limit := 10 + +WITH ranked AS ( + SELECT *, row_number() OVER (PARTITION BY symbol ORDER BY timestamp DESC) as rn + FROM trades + WHERE timestamp >= dateadd('d', -1, now()) +) +SELECT * FROM ranked WHERE rn <= @limit; +``` + +**Include additional filtering:** +```questdb-sql demo title="Latest 5 buy orders per symbol" +WITH ranked AS ( + SELECT + *, + row_number() OVER (PARTITION BY symbol ORDER BY timestamp DESC) as rn + FROM trades + WHERE timestamp >= dateadd('d', -1, now()) + AND side = 'buy' -- Additional filter before ranking +) +SELECT timestamp, symbol, side, price, amount +FROM ranked +WHERE rn <= 5; +``` + +**Show rank in results:** +```sql +WITH ranked AS ( + SELECT *, row_number() OVER (PARTITION BY symbol ORDER BY timestamp DESC) as rn + FROM trades + WHERE timestamp >= dateadd('d', -1, now()) +) +SELECT timestamp, symbol, price, rn as rank +FROM ranked +WHERE rn <= 5; +``` + +## Alternative: Use Negative LIMIT + +For a simpler approach when you need the latest N rows **total** (not per partition), use negative LIMIT: + +```questdb-sql demo title="Latest 100 trades overall (all symbols)" +SELECT * FROM trades +WHERE symbol = 'BTC-USDT' +ORDER BY timestamp DESC +LIMIT 100; +``` + +Or more efficiently with QuestDB's negative LIMIT feature: + +```questdb-sql demo title="Latest 100 trades using negative LIMIT" +SELECT * FROM trades +WHERE symbol = 'BTC-USDT' +LIMIT -100; +``` + +**But this doesn't work per partition** - it returns 100 total rows, not 100 per symbol. 
+ +## Performance Optimization + +**Filter by timestamp first:** +```sql +-- Good: Reduces dataset before windowing +WITH ranked AS ( + SELECT *, row_number() OVER (PARTITION BY symbol ORDER BY timestamp DESC) as rn + FROM trades + WHERE timestamp >= dateadd('h', -24, now()) -- Filter early +) +SELECT * FROM ranked WHERE rn <= 5; + +-- Less efficient: Windows over entire table +WITH ranked AS ( + SELECT *, row_number() OVER (PARTITION BY symbol ORDER BY timestamp DESC) as rn + FROM trades -- No filter +) +SELECT * FROM ranked WHERE rn <= 5 AND timestamp >= dateadd('h', -24, now()); +``` + +**Limit partitions:** +```sql +-- Process only specific symbols +WHERE timestamp >= dateadd('d', -1, now()) + AND symbol IN ('BTC-USDT', 'ETH-USDT', 'SOL-USDT') +``` + +## Top N with Aggregates + +Combine with aggregates to get summary statistics for top N: + +```questdb-sql demo title="Average price of latest 10 trades per symbol" +WITH ranked AS ( + SELECT + timestamp, + symbol, + price, + row_number() OVER (PARTITION BY symbol ORDER BY timestamp DESC) as rn + FROM trades + WHERE timestamp >= dateadd('d', -1, now()) +) +SELECT + symbol, + count(*) as trade_count, + avg(price) as avg_price, + min(price) as min_price, + max(price) as max_price +FROM ranked +WHERE rn <= 10 +GROUP BY symbol; +``` + +## Comparison with LATEST ON + +| Feature | LATEST ON | row_number() + Filter | +|---------|-----------|----------------------| +| **Rows per partition** | Exactly 1 | Any number (N) | +| **Performance** | Very fast (optimized) | Moderate (requires ranking) | +| **Flexibility** | Limited | High (custom ordering, filtering) | +| **Use case** | Single latest value | Multiple recent values | + +**When to use LATEST ON:** +```sql +-- Get current price for each symbol (1 row per symbol) +SELECT * FROM trades LATEST ON timestamp PARTITION BY symbol; +``` + +**When to use row_number():** +```sql +-- Get latest 5 trades for each symbol (up to 5 rows per symbol) +WITH ranked AS ( + SELECT *, row_number() OVER (PARTITION BY symbol ORDER BY timestamp DESC) as rn + FROM trades +) +SELECT * FROM ranked WHERE rn <= 5; +``` + +:::tip Combining with LATEST ON +For very large tables, use `LATEST ON` to reduce the dataset first, then apply row_number(): + +```sql +WITH recent AS ( + -- Get latest 1000 rows overall + SELECT * FROM trades + ORDER BY timestamp DESC + LIMIT 1000 +) +, ranked AS ( + SELECT *, row_number() OVER (PARTITION BY symbol ORDER BY timestamp DESC) as rn + FROM recent +) +SELECT * FROM ranked WHERE rn <= 5; +``` + +This approach is faster when you only need recent data across all partitions. +::: + +:::warning Row Count +The number of rows returned is `N × number_of_partitions`. If you have 100 symbols and request top 5, you'll get up to 500 rows. Some partitions may have fewer than N rows if insufficient data exists. 
+::: + +:::info Related Documentation +- [row_number() window function](/docs/reference/function/window/#row_number) +- [LATEST ON](/docs/reference/sql/latest-on/) +- [Window functions](/docs/reference/sql/over/) +- [LIMIT](/docs/reference/sql/select/#limit) +::: diff --git a/documentation/playbook/sql/time-series/remove-outliers.md b/documentation/playbook/sql/time-series/remove-outliers.md new file mode 100644 index 000000000..031ec2e0a --- /dev/null +++ b/documentation/playbook/sql/time-series/remove-outliers.md @@ -0,0 +1,66 @@ +--- +title: Remove Outliers from Candle Data +sidebar_label: Remove outliers +description: Filter outliers using window functions to compare against moving averages +--- + +Remove outlier trades that differ significantly from recent average prices. + +## Problem + +You have candle data from trading pairs where some markets have very low volume trades that move the candle significantly. These are usually single trades with very low volume where the exchange rate differs a lot from the actual exchange rate. This makes charts hard to use and you would like to remove those from the chart. + +Current query: + +```sql +SELECT + timestamp, symbol, + first(price) AS open, + last(price) AS close, + min(price), + max(price), + sum(amount) AS volume +FROM trades +WHERE timestamp > dateadd('M', -1, now()) +SAMPLE BY 1d ALIGN TO CALENDAR; +``` + +The question is: is there a way to only select trades where the traded amount deviates significantly from recent patterns? + +## Solution + +Use a window function to get the moving average for the amount, then `SAMPLE BY` in an outer query and compare the value of the sampled interval against the moving data. You can do this for the whole interval (when you don't specify `ORDER BY` and `RANGE` in the window definition), or you can make it relative to an interval in the past. + +This query compares with the average of the past 6 days (7 days ago, but excluding the current row): + +```questdb-sql demo title="Filter outliers using 7-day moving average" +WITH moving_trades AS ( + SELECT timestamp, symbol, price, amount, + avg(amount) OVER ( + PARTITION BY symbol + ORDER BY timestamp + RANGE BETWEEN 7 days PRECEDING AND 1 day PRECEDING + ) moving_avg_7_days + FROM trades + WHERE timestamp > dateadd('d', -37, now()) +) +SELECT + timestamp, symbol, + first(price) AS open, + last(price) AS close, + min(price), + max(price), + sum(amount) AS volume +FROM moving_trades +WHERE timestamp > dateadd('M', -1, now()) + AND moving_avg_7_days IS NOT NULL + AND ABS(moving_avg_7_days - price) > moving_avg_7_days * 0.01 +SAMPLE BY 1d ALIGN TO CALENDAR; +``` + +:::info Related Documentation +- [Window functions](/docs/reference/sql/over/) +- [AVG window function](/docs/reference/function/window/#avg) +- [SAMPLE BY](/docs/reference/sql/sample-by/) +- [ALIGN TO CALENDAR](/docs/reference/sql/sample-by/#align-to-calendar) +::: diff --git a/documentation/playbook/sql/time-series/sample-by-interval-bounds.md b/documentation/playbook/sql/time-series/sample-by-interval-bounds.md new file mode 100644 index 000000000..2036aaa14 --- /dev/null +++ b/documentation/playbook/sql/time-series/sample-by-interval-bounds.md @@ -0,0 +1,52 @@ +--- +title: Right Interval Bound with SAMPLE BY +sidebar_label: Interval bounds +description: Shift SAMPLE BY timestamps to use right interval bound instead of left bound +--- + +Use the right interval bound (end of interval) instead of the left bound (start of interval) for SAMPLE BY timestamps. 
+ +## Problem + +Records are grouped in a 15-minute interval. For example, records between 2025-03-22T00:00:00.000000Z and 2025-03-22T00:15:00.000000Z are aggregated with timestamp 2025-03-22T00:00:00.000000Z. + +You want the aggregation to show 2025-03-22T00:15:00.000000Z (the right bound of the interval rather than left). + +## Solution + +Simply shift the timestamp in the SELECT: + +```questdb-sql demo title="SAMPLE BY with right bound" +SELECT + dateadd('m', 15, timestamp) AS timestamp, symbol, + first(price) AS open, + last(price) AS close, + min(price), + max(price), + sum(amount) AS volume +FROM trades +WHERE symbol = 'BTC-USDT' AND timestamp IN today() +SAMPLE BY 15m; +``` + +Note that on executing this query, QuestDB is not displaying the timestamp in green on the web console. This is because we are not outputting the original designated timestamp, but a derived column. If you are not going to use this query in a subquery, then you are good to go. But if you want to use the output of this query in a subquery that requires a designated timestamp, you could do something like this to force sort order by the derived timestamp column: + +```sql +( +SELECT + dateadd('m', 15, timestamp) AS timestamp, symbol, + first(price) AS open, + last(price) AS close, + min(price), + max(price), + sum(amount) AS volume +FROM trades +WHERE symbol = 'BTC-USDT' AND timestamp IN today() +SAMPLE BY 15m +) ORDER BY timestamp; +``` + +:::info Related Documentation +- [SAMPLE BY](/docs/reference/sql/sample-by/) +- [dateadd()](/docs/reference/function/date-time/#dateadd) +::: diff --git a/documentation/playbook/sql/time-series/session-windows.md b/documentation/playbook/sql/time-series/session-windows.md new file mode 100644 index 000000000..c0e17870f --- /dev/null +++ b/documentation/playbook/sql/time-series/session-windows.md @@ -0,0 +1,334 @@ +--- +title: Calculate Sessions and Elapsed Time +sidebar_label: Session windows +description: Identify sessions by detecting state changes and calculate elapsed time between events using window functions +--- + +Calculate sessions and elapsed time by identifying when state changes occur in time-series data. This "flip-flop" or "session" pattern is useful for analyzing user sessions, vehicle rides, machine operating cycles, or any scenario where you need to track duration between state transitions. + +## Problem: Track Time Between State Changes + +You have a table tracking vehicle lock status over time and want to calculate ride duration. A ride starts when `lock_status` changes from `true` (locked) to `false` (unlocked), and ends when it changes back to `true`. + +**Table schema:** +```sql +CREATE TABLE vehicle_events ( + vehicle_id SYMBOL, + lock_status BOOLEAN, + timestamp TIMESTAMP +) TIMESTAMP(timestamp) PARTITION BY DAY; +``` + +**Sample data:** + +| timestamp | vehicle_id | lock_status | +|-----------|------------|-------------| +| 10:00:00 | V001 | true | +| 10:05:00 | V001 | false | ← Ride starts +| 10:25:00 | V001 | true | ← Ride ends (20 min) +| 10:30:00 | V001 | false | ← Next ride starts +| 10:45:00 | V001 | true | ← Ride ends (15 min) + +You want to calculate the duration of each ride. 
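+
+Worked out from the sample data above, the output you are after is the two unlock-to-lock spans:
+
+```
+Ride 1: unlocked 10:05:00 → locked 10:25:00  =  20 minutes (1200 s)
+Ride 2: unlocked 10:30:00 → locked 10:45:00  =  15 minutes (900 s)
+```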
+
+## Solution: Session Detection with Window Functions
+
+Use window functions to detect state changes, assign session IDs, then calculate durations:
+
+```questdb-sql demo title="Calculate ride duration from lock status changes"
+WITH prevEvents AS (
+    SELECT *,
+        first_value(CASE WHEN lock_status=false THEN 0 WHEN lock_status=true THEN 1 END)
+        OVER (
+            PARTITION BY vehicle_id ORDER BY timestamp
+            ROWS 1 PRECEDING EXCLUDE CURRENT ROW
+        ) as prev_status
+    FROM vehicle_events
+    WHERE timestamp IN today()
+),
+ride_sessions AS (
+    SELECT *,
+        SUM(CASE
+            WHEN lock_status = true AND prev_status = 0 THEN 1
+            WHEN lock_status = false AND prev_status = 1 THEN 1
+            ELSE 0
+        END) OVER (PARTITION BY vehicle_id ORDER BY timestamp) as ride
+    FROM prevEvents
+),
+global_sessions AS (
+    SELECT *, concat(vehicle_id, '#', ride) as session
+    FROM ride_sessions
+),
+totals AS (
+    SELECT
+        first(timestamp) as timestamp,
+        session,
+        FIRST(lock_status) as lock_status,
+        first(vehicle_id) as vehicle_id
+    FROM global_sessions
+    GROUP BY session
+),
+prev_ts AS (
+    SELECT *,
+        first_value(timestamp::long) OVER (
+            PARTITION BY vehicle_id ORDER BY timestamp
+            ROWS 1 PRECEDING EXCLUDE CURRENT ROW
+        ) as prev_ts
+    FROM totals
+)
+SELECT
+    timestamp as ride_end,
+    vehicle_id,
+    (timestamp::long - prev_ts) / 1000000 as duration_seconds
+FROM prev_ts
+WHERE lock_status = true AND prev_ts IS NOT NULL;
+```
+
+**Results:**
+
+| ride_end | vehicle_id | duration_seconds |
+|----------|------------|------------------|
+| 10:25:00 | V001       | 1200             |
+| 10:45:00 | V001       | 900              |
+
+## How It Works
+
+The query uses a five-step approach:
+
+### 1. Get Previous Status (`prevEvents`)
+
+```sql
+first_value(...) OVER (... ROWS 1 PRECEDING EXCLUDE CURRENT ROW)
+```
+
+For each row, get the status from the previous row. Convert the boolean to a number (0/1) since `first_value` requires numeric types.
+
+### 2. Detect State Changes (`ride_sessions`)
+
+```sql
+SUM(CASE WHEN lock_status != prev_status THEN 1 ELSE 0 END)
+  OVER (PARTITION BY vehicle_id ORDER BY timestamp)
+```
+
+Whenever the status changes, increment a counter (the full query spells this out as the two explicit transition cases, since `prev_status` is numeric). This creates sequential session IDs for each vehicle:
+- Ride 0: Initial state
+- Ride 1: After first state change
+- Ride 2: After second state change
+- ...
+
+### 3. Create Global Session IDs (`global_sessions`)
+
+```sql
+concat(vehicle_id, '#', ride)
+```
+
+Combine vehicle_id with the ride number to create unique session identifiers across all vehicles.
+
+### 4. Get Session Start Times (`totals`)
+
+```sql
+SELECT first(timestamp) as timestamp, ...
+FROM global_sessions
+GROUP BY session
+```
+
+For each session, get the timestamp and status at the beginning of that session.
+
+### 5. Calculate Duration (`prev_ts`)
+
+```sql
+first_value(timestamp::long) OVER (... ROWS 1 PRECEDING)
+```
+
+Get the timestamp from the previous session (for the same vehicle), then calculate the duration by subtracting.
+
+### Filter for Rides
+
+```sql
+WHERE lock_status = true
+```
+
+Only keep sessions that begin with a lock event (`true`), which marks the end of a ride. The duration runs from the previous session start (the unlock that began the ride) to this session start (the lock that ended it).
+
+## Monthly Aggregation
+
+Calculate total ride duration per vehicle per month:
+
+```questdb-sql demo title="Monthly ride duration by vehicle"
+WITH prevEvents AS (
+    SELECT *,
+        first_value(CASE WHEN lock_status=false THEN 0 WHEN lock_status=true THEN 1 END)
+        OVER (
+            PARTITION BY vehicle_id ORDER BY timestamp
+            ROWS 1 PRECEDING EXCLUDE CURRENT ROW
+        ) as prev_status
+    FROM vehicle_events
+    WHERE timestamp >= dateadd('M', -3, now())
+),
+ride_sessions AS (
+    SELECT *,
+        SUM(CASE
+            WHEN lock_status = true AND prev_status = 0 THEN 1
+            WHEN lock_status = false AND prev_status = 1 THEN 1
+            ELSE 0
+        END) OVER (PARTITION BY vehicle_id ORDER BY timestamp) as ride
+    FROM prevEvents
+),
+global_sessions AS (
+    SELECT *, concat(vehicle_id, '#', ride) as session
+    FROM ride_sessions
+),
+totals AS (
+    SELECT
+        first(timestamp) as timestamp,
+        session,
+        FIRST(lock_status) as lock_status,
+        first(vehicle_id) as vehicle_id
+    FROM global_sessions
+    GROUP BY session
+),
+prev_ts AS (
+    SELECT *,
+        first_value(timestamp::long) OVER (
+            PARTITION BY vehicle_id ORDER BY timestamp
+            ROWS 1 PRECEDING EXCLUDE CURRENT ROW
+        ) as prev_ts
+    FROM totals
+)
+SELECT
+    timestamp_floor('M', timestamp) as month,
+    vehicle_id,
+    SUM((timestamp::long - prev_ts) / 1000000) as total_ride_duration_seconds,
+    COUNT(*) as ride_count
+FROM prev_ts
+WHERE lock_status = true AND prev_ts IS NOT NULL
+GROUP BY month, vehicle_id
+ORDER BY month, vehicle_id;
+```
+
+## Adapting to Different Use Cases
+
+**User website sessions (1 hour timeout):**
+```sql
+WITH prevEvents AS (
+    SELECT *,
+        first_value(timestamp::long) OVER (
+            PARTITION BY user_id ORDER BY timestamp
+            ROWS 1 PRECEDING EXCLUDE CURRENT ROW
+        ) as prev_ts
+    FROM page_views
+),
+sessions AS (
+    SELECT *,
+        SUM(CASE
+            WHEN datediff('h', prev_ts::timestamp, timestamp) > 1 THEN 1
+            ELSE 0
+        END) OVER (PARTITION BY user_id ORDER BY timestamp) as session_id
+    FROM prevEvents
+)
+SELECT
+    user_id,
+    session_id,
+    min(timestamp) as session_start,
+    max(timestamp) as session_end,
+    datediff('s', min(timestamp), max(timestamp)) as session_duration_seconds,
+    count(*) as page_views
+FROM sessions
+GROUP BY user_id, session_id;
+```
+
+**Machine operating cycles:**
+```sql
+-- When machine changes from 'off' to 'running' to 'off'
+WITH prevStatus AS (
+    SELECT *,
+        first_value(status) OVER (
+            PARTITION BY machine_id ORDER BY timestamp
+            ROWS 1 PRECEDING EXCLUDE CURRENT ROW
+        ) as prev_status
+    FROM machine_status
+),
+cycles AS (
+    SELECT *,
+        SUM(CASE
+            WHEN status != prev_status THEN 1
+            ELSE 0
+        END) OVER (PARTITION BY machine_id ORDER BY timestamp) as cycle
+    FROM prevStatus
+)
+SELECT
+    machine_id,
+    cycle,
+    min(timestamp) as cycle_start,
+    max(timestamp) as cycle_end
+FROM cycles
+WHERE status = 'running'
+GROUP BY machine_id, cycle;
+```
+
+## Performance Considerations
+
+**Filter by timestamp first:**
+```sql
+-- Good: Reduce dataset before windowing
+WHERE timestamp >= dateadd('M', -1, now())
+```
+
+**Partition by high-cardinality column:**
+```sql
+-- Good: Each vehicle processed independently
+PARTITION BY vehicle_id
+
+-- Avoid: All vehicles in one partition (slow)
+-- (no PARTITION BY)
+```
+
+**Limit output:**
+```sql
+-- For testing, limit to specific vehicles
+WHERE vehicle_id IN ('V001', 'V002', 'V003')
+```
+
+## Alternative: Using LAG (QuestDB 8.0+)
+
+With the `LAG` function, the query is simpler:
+
+```sql
+WITH prevEvents AS (
+    SELECT *,
+        LAG(lock_status) OVER (PARTITION BY vehicle_id ORDER BY timestamp) as prev_status,
+        LAG(timestamp) OVER (PARTITION BY vehicle_id ORDER BY timestamp) as prev_timestamp
+    FROM vehicle_events
+    WHERE timestamp IN today()
+)
+SELECT
+    timestamp as ride_end,
+    vehicle_id,
+    datediff('s', prev_timestamp, timestamp) as duration_seconds
+FROM prevEvents
+WHERE lock_status = true -- Ride ended (locked)
+  AND prev_status = false -- Previous state was unlocked (riding)
+  AND prev_timestamp IS NOT NULL;
+```
+
+This directly accesses the previous row's values without converting to numbers.
+
+:::tip Common Session Patterns
+This pattern applies to many scenarios:
+- **User sessions**: Time between last action and timeout
+- **IoT device cycles**: Power on/off cycles
+- **Vehicle trips**: Ignition on/off periods
+- **Connection sessions**: Login/logout tracking
+- **Process steps**: Start/complete state transitions
+:::
+
+:::warning First Row Handling
+The first row in each partition will have `NULL` for previous values. Always filter these out with `WHERE prev_ts IS NOT NULL` or similar conditions.
+:::
+
+:::info Related Documentation
+- [first_value() window function](/docs/reference/function/window/#first_value)
+- [LAG window function](/docs/reference/function/window/#lag)
+- [Window functions](/docs/reference/sql/over/)
+- [datediff()](/docs/reference/function/date-time/#datediff)
+:::
diff --git a/documentation/playbook/sql/time-series/sparse-sensor-data.md b/documentation/playbook/sql/time-series/sparse-sensor-data.md
new file mode 100644
index 000000000..06013328c
--- /dev/null
+++ b/documentation/playbook/sql/time-series/sparse-sensor-data.md
@@ -0,0 +1,131 @@
+---
+title: Join Strategies for Sparse Sensor Data
+sidebar_label: Sparse sensor data
+description: Compare CROSS JOIN, LEFT JOIN, and ASOF JOIN strategies for combining data from sensors stored in separate tables
+---
+
+Efficiently query sparse sensor data by splitting wide tables into narrow tables and joining them with different strategies.
+
+## Problem
+
+You have a sparse sensor table with 120 sensor columns, in which you are getting just a few sensor values at any given timestamp, so most values are null.
+
+When you want to query data from any given sensor, you first SAMPLE the data with an `avg` or a `last_not_null` function aggregation, and then often build a CTE and call `LATEST ON` to get results:
+
+```sql
+SELECT
+    timestamp,
+    vehicle_id,
+    avg(sensor_1) AS avg_sensor_1, avg(sensor_2) AS avg_sensor_2,
+    ...
+    avg(sensor_119) AS avg_sensor_119, avg(sensor_120) AS avg_sensor_120
+FROM
+    vehicle_sensor_data
+-- WHERE vehicle_id = 'AAA0000'
+SAMPLE BY 30s
+LIMIT 100000;
+```
+
+This works, but it is not super fast (around 1 second for 10 million rows, in a table with 120 sensor columns and with 10k different vehicle_ids), and it is also not very efficient because `null` columns take some bytes on disk.
+
+## Solution: Multiple Narrow Tables with Joins
+
+A single table works, but there is a more efficient (although a bit more cumbersome if you compose queries by hand) way to do this.
+
+You can create 120 tables, one per sensor, rather than a table with 120 columns. Well, technically you probably want 121 tables: one with the common dimensions, then 1 per sensor. Or maybe you want N tables, one for the common dimensions, then N depending on how many sensor groups you have, as some groups might always send in sync. In any case, rather than a wide table you would end up with several narrow tables that you would need to join.
+ +Now for joining the tables there are three potential ways, depending on the results you are after: + * To see the _LATEST_ known value for all the metrics _for a given series_, use a `CROSS JOIN` strategy (example below). This returns a single row. + * To see the _LATEST_ known value for all the metrics and _for all or several series_, use a `LEFT JOIN` strategy. This returns a single row per series (example below). + * To see the _rolling view of all the latest known values_ regarding the current row for one of the metrics, use an `ASOF JOIN` strategy. This returns as many rows as you have in the main metric you are querying (example below). + +### Performance + +The three approaches perform well. The three queries were executed on a table like the initial one, with 10 million rows representing sparse data from 10k series and across 120 metrics, so 120 tables. Each of the 120 tables had ~83k records (which times 120 is ~10 million rows). + +`CROSS JOIN` is the fastest, executing in 23ms, `ASOF JOIN` is second with 123 ms, and `LEFT JOIN` is the slowest at 880ms. Still not too bad, as you probably will not want to get all the sensors from all the devices all the time, and joining fewer tables would perform better. + +## Strategy 1: CROSS JOIN + +We first find the latest point in each of the 120 tables for the given series (AAA0000), so we get a value per table, and then do a `CROSS JOIN`, to get a single row. + +```sql +WITH +s1 AS (SELECT timestamp, vehicle_id, value FROM vehicle_sensor_1 + WHERE vehicle_id = 'AAA0000' LATEST ON timestamp PARTITION BY vehicle_id), +s2 AS (SELECT timestamp, vehicle_id, value FROM vehicle_sensor_2 + WHERE vehicle_id = 'AAA0000' LATEST ON timestamp PARTITION BY vehicle_id), +... +s119 AS (SELECT timestamp, vehicle_id, value FROM vehicle_sensor_119 + WHERE vehicle_id = 'AAA0000' LATEST ON timestamp PARTITION BY vehicle_id), +s120 AS (SELECT timestamp, vehicle_id, value FROM vehicle_sensor_120 + WHERE vehicle_id = 'AAA0000' LATEST ON timestamp PARTITION BY vehicle_id) +SELECT s1.timestamp, s1.vehicle_id, s1.value AS value_1, +s2.value AS value_2, +... +s119.value AS value_119, +s120.value AS value_120 +FROM s1 +CROSS JOIN s2 +CROSS JOIN ... +CROSS JOIN s119 +CROSS JOIN s120; +``` + +## Strategy 2: LEFT JOIN + +We first find the latest point in each of the 120 tables for each series, so we get a value per table and series, and then do a `LEFT JOIN` on the series ID, to get a single row for each different series (10K rows in our example). + +```sql +WITH +s1 AS (SELECT timestamp, vehicle_id, value FROM vehicle_sensor_1 + LATEST ON timestamp PARTITION BY vehicle_id), +s2 AS (SELECT timestamp, vehicle_id, value FROM vehicle_sensor_2 + LATEST ON timestamp PARTITION BY vehicle_id), +... +s119 AS (SELECT timestamp, vehicle_id, value FROM vehicle_sensor_119 + LATEST ON timestamp PARTITION BY vehicle_id), +s120 AS (SELECT timestamp, vehicle_id, value FROM vehicle_sensor_120 + LATEST ON timestamp PARTITION BY vehicle_id) +SELECT s1.timestamp, s1.vehicle_id, s1.value AS value_1, +s2.value AS value_2, +... +s119.value AS value_119, +s120.value AS value_120 +FROM s1 +LEFT JOIN s2 ON s1.vehicle_id = s2.vehicle_id +LEFT JOIN ... +LEFT JOIN s119 ON s1.vehicle_id = s119.vehicle_id +LEFT JOIN s120 ON s1.vehicle_id = s120.vehicle_id; +``` + +## Strategy 3: ASOF JOIN + +We get all the rows in all the tables, then do an `ASOF JOIN` on the series ID, so we get a row for each row of the first table in the query, in our example ~83K results. 
+
+```sql
+WITH
+s1 AS (SELECT timestamp, vehicle_id, value FROM vehicle_sensor_1),
+s2 AS (SELECT timestamp, vehicle_id, value FROM vehicle_sensor_2),
+...
+s118 AS (SELECT timestamp, vehicle_id, value FROM vehicle_sensor_118),
+s119 AS (SELECT timestamp, vehicle_id, value FROM vehicle_sensor_119),
+s120 AS (SELECT timestamp, vehicle_id, value FROM vehicle_sensor_120)
+SELECT s1.timestamp, s1.vehicle_id, s1.value AS value_1,
+       s2.value AS value_2,
+       ...
+       s119.value AS value_119,
+       s120.value AS value_120
+FROM s1
+ASOF JOIN s2 ON s1.vehicle_id = s2.vehicle_id
+ASOF JOIN ...
+ASOF JOIN s119 ON s1.vehicle_id = s119.vehicle_id
+ASOF JOIN s120 ON s1.vehicle_id = s120.vehicle_id;
+```
+
+:::info Related Documentation
+- [ASOF JOIN](/docs/reference/sql/join/#asof-join)
+- [LEFT JOIN](/docs/reference/sql/join/#left-outer-join)
+- [CROSS JOIN](/docs/reference/sql/join/#cross-join)
+- [LATEST ON](/docs/reference/sql/select/#latest-on)
+:::
diff --git a/documentation/sidebars.js b/documentation/sidebars.js
index 59751550b..ebde7c4d3 100644
--- a/documentation/sidebars.js
+++ b/documentation/sidebars.js
@@ -546,6 +546,137 @@ module.exports = {
       "web-console/create-table",
     ],
   },
+  {
+    label: "Playbook (useful snippets)",
+    type: "category",
+    collapsed: false,
+    items: [
+      "playbook/overview",
+      "playbook/demo-data-schema",
+      {
+        type: "category",
+        label: "SQL Recipes",
+        collapsed: true,
+        items: [
+          {
+            type: "category",
+            label: "Capital Markets",
+            collapsed: true,
+            items: [
+              "playbook/sql/finance/compound-interest",
+              "playbook/sql/finance/cumulative-product",
+              "playbook/sql/finance/vwap",
+              "playbook/sql/finance/bollinger-bands",
+              "playbook/sql/finance/tick-trin",
+              "playbook/sql/finance/volume-profile",
+              "playbook/sql/finance/volume-spike",
+              "playbook/sql/finance/rolling-stddev",
+            ],
+          },
+          {
+            type: "category",
+            label: "Time-Series Patterns",
+            collapsed: true,
+            items: [
+              "playbook/sql/time-series/force-designated-timestamp",
+              "playbook/sql/time-series/latest-n-per-partition",
+              "playbook/sql/time-series/session-windows",
+              "playbook/sql/time-series/latest-activity-window",
+              "playbook/sql/time-series/filter-by-week",
+              "playbook/sql/time-series/distribute-discrete-values",
+              "playbook/sql/time-series/epoch-timestamps",
+              "playbook/sql/time-series/sample-by-interval-bounds",
+              "playbook/sql/time-series/remove-outliers",
+              "playbook/sql/time-series/fill-from-one-column",
+              "playbook/sql/time-series/fill-prev-with-history",
+              "playbook/sql/time-series/fill-keyed-arbitrary-interval",
+              "playbook/sql/time-series/sparse-sensor-data",
+            ],
+          },
+          {
+            type: "category",
+            label: "Advanced SQL",
+            collapsed: true,
+            items: [
+              "playbook/sql/advanced/rows-before-after-value-match",
+              "playbook/sql/advanced/top-n-plus-others",
+              "playbook/sql/advanced/pivot-table",
+              "playbook/sql/advanced/unpivot-table",
+              "playbook/sql/advanced/sankey-funnel",
+              "playbook/sql/advanced/conditional-aggregates",
+              "playbook/sql/advanced/general-and-sampled-aggregates",
+              "playbook/sql/advanced/consistent-histogram-buckets",
+              "playbook/sql/advanced/array-from-string",
+            ],
+          },
+        ],
+      },
+      {
+        type: "category",
+        label: "Integrations",
+        collapsed: true,
+        items: [
+          "playbook/integrations/opcua-dense-format",
+          {
+            type: "category",
+            label: "Grafana",
+            collapsed: true,
+            items: [
+              "playbook/integrations/grafana/dynamic-table-queries",
+              "playbook/integrations/grafana/read-only-user",
+              "playbook/integrations/grafana/variable-dropdown",
+              "playbook/integrations/grafana/overlay-timeshift",
+            ],
+          },
+        ],
+      },
+      {
+        type: "category",
+        label: "Programmatic",
+        collapsed: true,
+        items: [
+          "playbook/programmatic/tls-ca-configuration",
+          {
+            type: "category",
+            label: "PHP",
+            items: [
+              "playbook/programmatic/php/inserting-ilp",
+            ],
+          },
+          {
+            type: "category",
+            label: "Ruby",
+            items: [
+              "playbook/programmatic/ruby/inserting-ilp",
+            ],
+          },
+          {
+            type: "category",
+            label: "C++",
+            items: [
+              "playbook/programmatic/cpp/missing-columns",
+            ],
+          },
+        ],
+      },
+      {
+        type: "category",
+        label: "Operations",
+        collapsed: true,
+        items: [
+          "playbook/operations/docker-compose-config",
+          "playbook/operations/store-questdb-metrics",
+          "playbook/operations/csv-import-milliseconds",
+          "playbook/operations/tls-pgbouncer",
+          "playbook/operations/copy-data-between-instances",
+          "playbook/operations/query-times-histogram",
+          "playbook/operations/optimize-many-tables",
+          "playbook/operations/check-transaction-applied",
+          "playbook/operations/show-non-default-params",
+        ],
+      },
+    ],
+  },
   {
     label: "Blog tutorials 🔗",
     type: "link",