diff --git a/docs/superpowers/specs/2026-03-27-wire-type-oid-fix-design.md b/docs/superpowers/specs/2026-03-27-wire-type-oid-fix-design.md new file mode 100644 index 00000000..639d03cf --- /dev/null +++ b/docs/superpowers/specs/2026-03-27-wire-type-oid-fix-design.md @@ -0,0 +1,111 @@ +# Wire Protocol Type OID Fix + +**Issue**: [#68](https://github.com/erans/pgsqlite/issues/68) - Wire protocol: type oid is incorrect for integer and float types +**Date**: 2026-03-27 +**Scope**: Fix fallback type OID inference for pre-existing SQLite tables that lack `__pgsqlite_schema` metadata + +## Problem + +When returning query results, pgsqlite sends OID 25 (TEXT) for integer, bigint, smallint, boolean, and float columns. This happens because the type inference chain defaults to TEXT when `__pgsqlite_schema` has no entry for a column. + +Tables created through pgsqlite's `CREATE TABLE` already work correctly — they populate `__pgsqlite_schema` with PG type names. But pre-existing SQLite tables (or tables created by other tools) have no metadata, so every column gets OID 25. + +PostgreSQL client libraries use the wire OID to deserialize values into the correct host language type (e.g., Python `int`, Go `int64`). With OID 25, all values arrive as strings. + +## Root Cause + +Multiple code paths default to OID 25 when `__pgsqlite_schema` has no entry: + +- **executor.rs line 1249**: final fallback in FieldDescription builder +- **extended.rs lines 851, 855, 899, 903**: Parse-time `inferred_types` construction — when `get_schema_type_with_session` returns `None` for source table.column lookups +- **extended.rs lines 4487-4511**: Execute-time `schema_types` population — same `get_schema_type_with_session` fallback + +The codebase already has `sqlite_type_info.rs` which contains a complete `PRAGMA table_info` → PG OID mapping, but it's never called from the main query paths. + +## Design + +### New DbHandler method + +**File**: `src/session/db_handler.rs` + +Add `get_column_types_from_pragma` — uses `with_session_connection` to run `PRAGMA table_info()` and returns `HashMap` mapping column names to PG type name strings. Internally calls `sqlite_type_info::sqlite_type_to_pg_type_name` to convert SQLite declared types to PG type names. + +```rust +pub async fn get_column_types_from_pragma( + &self, + session_id: &Uuid, + table_name: &str, +) -> Result, PgSqliteError> +``` + +Returns PG type name strings (e.g., `"integer"`, `"double precision"`, `"boolean"`) so callers can insert them directly into `schema_types` HashMaps. + +### New helper function + +**File**: `src/types/sqlite_type_info.rs` + +Add `sqlite_type_to_pg_type_name` that maps SQLite declared types to PG type name strings. Uses the same logic as the existing `sqlite_type_to_pg_oid` but returns `&'static str`: + +- `"INTEGER"` → `"integer"` +- `"INT8"` / `"BIGINT"` → `"bigint"` +- `"INT2"` / `"SMALLINT"` → `"smallint"` +- `"REAL"` / `"FLOAT"` / `"DOUBLE"` → `"double precision"` +- `"BOOLEAN"` / `"BOOL"` → `"boolean"` +- `"TEXT"` / `"VARCHAR"` / `"CHAR"` → `"text"` +- `"BLOB"` → `"bytea"` +- `"DATE"` → `"date"` +- `"TIMESTAMP"` → `"timestamp"` +- `"NUMERIC"` / `"DECIMAL"` → `"numeric"` +- Default → `"text"` + +### Simple query protocol fix + +**File**: `src/query/executor.rs` + +In the `schema_types` population loop (lines 1088-1155), after attempting `get_schema_type_with_session` for each column and finding no match, fall back to PRAGMA-based lookup. Fetch PRAGMA results once per table (before the column loop) and cache in a local `HashMap`. For each column without schema metadata, look up its type in the PRAGMA cache and insert the PG type name into `schema_types`. + +This ensures the final fallback at line 1249 is rarely reached — only for computed expressions or columns not in any table. + +### Extended query protocol fix + +**File**: `src/query/extended.rs` + +Three locations need the PRAGMA fallback: + +**1. Parse-time `schema_types` population (lines 563-640):** After `get_schema_type_with_session` returns `None`, fall back to PRAGMA lookup. Same pattern as executor.rs. + +**2. Parse-time `inferred_types` construction (lines 654-912):** This is the critical path for clients using the extended protocol (e.g., psycopg3). The fallback chain at lines 848-903 calls `get_schema_type_with_session` multiple times via alias resolution. At each `Ok(None)` branch (lines 851, 899) and `Err` branch (lines 855, 903), instead of pushing `PgType::Text.to_oid()`, call `get_column_types_from_pragma` and use `pg_type_string_to_oid` to get the correct OID. + +To avoid calling PRAGMA per-column, fetch the PRAGMA results once per table when the table name is first resolved (line 837 or 886) and cache in a local variable. + +**3. Execute-time `schema_types` population (lines 4487-4511):** Same pattern as #1 — after `get_schema_type_with_session` returns `None`, fall back to PRAGMA lookup. + +The downstream consumers (lines 2799, 2832, 3783 in `send_select_response` and `encode_row`) receive `field_types` from the Parse or Execute paths above. Once those paths produce correct OIDs, no changes are needed at these downstream locations. + +## Files Changed + +| File | Change | +|------|--------| +| `src/session/db_handler.rs` | Add `get_column_types_from_pragma` method | +| `src/types/sqlite_type_info.rs` | Add `sqlite_type_to_pg_type_name` function + tests | +| `src/query/executor.rs` | Add PRAGMA fallback in schema_types population loop | +| `src/query/extended.rs` | Add PRAGMA fallback in 3 locations: schema_types (2) + inferred_types (1) | + +## Testing + +- Pre-existing SQLite table with INTEGER column → OID 23 (int4) +- Pre-existing SQLite table with REAL column → OID 701 (float8) +- Pre-existing SQLite table with BOOLEAN column → OID 16 (bool) +- Pre-existing SQLite table with TEXT column → OID 25 (text) +- Pre-existing SQLite table with BLOB column → OID 17 (bytea) +- Pre-existing SQLite table with BIGINT column → OID 20 (int8) +- Pre-existing SQLite table with SMALLINT column → OID 21 (int2) +- Tables created via pgsqlite `CREATE TABLE` still work correctly (no regression) +- `sqlite_type_to_pg_type_name` unit tests for all type mappings +- `get_column_types_from_pragma` unit test with a test table + +## Known Limitations + +- Computed expressions without a table (e.g., `SELECT 1+2`) still default to TEXT — would require value-based inference, which is out of scope +- SQLite's type affinity system means `INTEGER` column could contain any type at runtime; we trust the declared type +- `PRAGMA table_info` doesn't distinguish between `INT4` and `INT8` when the declared type is just `INTEGER` — defaults to `int4` diff --git a/src/query/executor.rs b/src/query/executor.rs index a9c71ab4..4b5b2857 100644 --- a/src/query/executor.rs +++ b/src/query/executor.rs @@ -1087,7 +1087,10 @@ impl QueryExecutor { if let Some(ref table) = table_name { debug!("Type inference: Found table name '{}', looking up schema for {} columns", table, response.columns.len()); - + + // Pre-fetch PRAGMA table_info as fallback for tables without __pgsqlite_schema + let pragma_types = db.get_column_types_from_pragma(&session.id, table).await.unwrap_or_default(); + // Extract column mappings from query if possible let column_mappings = extract_column_mappings_from_query(query, table); @@ -1161,6 +1164,13 @@ impl QueryExecutor { debug!("Type inference: No schema type found for '{}.{}'", table, col_name); } + // PRAGMA table_info fallback for pre-existing SQLite tables + if !schema_types.contains_key(col_name) { + if let Some(pg_type) = pragma_types.get(col_name) { + debug!("Type inference: Found PRAGMA type for '{}.{}' -> {}", table, col_name, pg_type); + schema_types.insert(col_name.clone(), pg_type.clone()); + } + } } } else { debug!("Type inference: No table name extracted from query, using fallback logic"); diff --git a/src/query/extended.rs b/src/query/extended.rs index 8eca4f38..695b294f 100644 --- a/src/query/extended.rs +++ b/src/query/extended.rs @@ -563,6 +563,7 @@ impl ExtendedQueryHandler { let mut schema_types = std::collections::HashMap::new(); if let Some(ref table) = table_name { info!("PARSE: Fetching schema types for table '{}'", table); + let pragma_types = db.get_column_types_from_pragma(&session.id, table).await.unwrap_or_default(); // For aliased columns, try to find the source column for col_name in &response.columns { // First try direct lookup @@ -646,13 +647,25 @@ impl ExtendedQueryHandler { } } } + // PRAGMA table_info fallback + if !schema_types.contains_key(col_name) { + if let Some(pg_type) = pragma_types.get(col_name) { + info!("PARSE: Found PRAGMA type for '{}.{}' -> {}", table, col_name, pg_type); + schema_types.insert(col_name.clone(), pg_type.clone()); + } + } } } - + // Try to infer types from the first row if available // We need to handle this asynchronously for schema lookup let mut inferred_types = Vec::new(); - + let pragma_types_for_inferred = if let Some(ref table) = table_name { + db.get_column_types_from_pragma(&session.id, table).await.unwrap_or_default() + } else { + std::collections::HashMap::new() + }; + for (i, col_name) in response.columns.iter().enumerate() { info!("PARSE: Processing column {}: '{}'", i, col_name); let inferred_type = { @@ -846,13 +859,23 @@ impl ExtendedQueryHandler { inferred_types.push(type_oid); } Ok(None) => { - info!("Column '{}': no schema type found for '{}.{}', defaulting to text", - col_name, source_table, source_col); - inferred_types.push(PgType::Text.to_oid()); + if let Some(pg_type) = pragma_types_for_inferred.get(col_name.as_str()) { + let type_oid = crate::types::SchemaTypeMapper::pg_type_string_to_oid(pg_type); + info!("Column '{}': resolved type from PRAGMA -> {} (OID {})", col_name, pg_type, type_oid); + inferred_types.push(type_oid); + } else { + info!("Column '{}': no schema type found for '{}.{}', defaulting to text", + col_name, source_table, source_col); + inferred_types.push(PgType::Text.to_oid()); + } } Err(_) => { - // Schema lookup error, defaulting to text - inferred_types.push(PgType::Text.to_oid()); + if let Some(pg_type) = pragma_types_for_inferred.get(col_name.as_str()) { + let type_oid = crate::types::SchemaTypeMapper::pg_type_string_to_oid(pg_type); + inferred_types.push(type_oid); + } else { + inferred_types.push(PgType::Text.to_oid()); + } } } } else { @@ -894,13 +917,23 @@ impl ExtendedQueryHandler { inferred_types.push(type_oid); } Ok(None) => { - info!("Column '{}': no schema type found for '{}.{}', defaulting to text", - col_name, table_name, col_name); - inferred_types.push(PgType::Text.to_oid()); + if let Some(pg_type) = pragma_types_for_inferred.get(col_name.as_str()) { + let type_oid = crate::types::SchemaTypeMapper::pg_type_string_to_oid(pg_type); + info!("Column '{}': resolved type from PRAGMA -> {} (OID {})", col_name, pg_type, type_oid); + inferred_types.push(type_oid); + } else { + info!("Column '{}': no schema type found for '{}.{}', defaulting to text", + col_name, table_name, col_name); + inferred_types.push(PgType::Text.to_oid()); + } } Err(_) => { - // Schema lookup error, defaulting to text - inferred_types.push(PgType::Text.to_oid()); + if let Some(pg_type) = pragma_types_for_inferred.get(col_name.as_str()) { + let type_oid = crate::types::SchemaTypeMapper::pg_type_string_to_oid(pg_type); + inferred_types.push(type_oid); + } else { + inferred_types.push(PgType::Text.to_oid()); + } } } } else { @@ -4486,6 +4519,7 @@ impl ExtendedQueryHandler { // Pre-fetch schema types for all columns if we have a table name let mut schema_types = std::collections::HashMap::new(); if let Some(ref table) = table_name { + let pragma_types = db.get_column_types_from_pragma(&session.id, table).await.unwrap_or_default(); for col_name in &response.columns { // Try direct lookup first if let Ok(Some(pg_type)) = db.get_schema_type_with_session(&session.id, table, col_name).await { @@ -4508,6 +4542,13 @@ impl ExtendedQueryHandler { } } } + // PRAGMA table_info fallback + if !schema_types.contains_key(col_name) { + if let Some(pg_type) = pragma_types.get(col_name) { + info!("Found PRAGMA type for '{}.{}' -> {}", table, col_name, pg_type); + schema_types.insert(col_name.clone(), pg_type.clone()); + } + } } } diff --git a/src/session/db_handler.rs b/src/session/db_handler.rs index eb033fe9..9d2ff85b 100644 --- a/src/session/db_handler.rs +++ b/src/session/db_handler.rs @@ -2270,7 +2270,38 @@ impl DbHandler { } } } - + + /// Get column types from PRAGMA table_info for tables without __pgsqlite_schema metadata. + /// Returns a HashMap mapping column names to PostgreSQL type name strings. + pub async fn get_column_types_from_pragma( + &self, + session_id: &Uuid, + table_name: &str, + ) -> Result, PgSqliteError> { + let table_name = table_name.to_string(); + self.with_session_connection(session_id, move |conn| { + let mut result = std::collections::HashMap::new(); + let query = format!("PRAGMA table_info(\"{}\")", table_name); + let mut stmt = conn.prepare(&query)?; + let mut rows = stmt.query_map([], |row| { + let col_name: String = row.get(1)?; + let col_type: String = row.get(2)?; + Ok((col_name, col_type)) + })?; + while let Some(Ok((col_name, col_type))) = rows.next() { + if col_type.is_empty() { + continue; + } + let pg_type_name = crate::types::sqlite_type_info::sqlite_type_to_pg_type_name(&col_type); + if pg_type_name == "text" { + continue; + } + result.insert(col_name, pg_type_name.to_string()); + } + Ok(result) + }).await + } + /// Try fast path execution with parameters pub async fn try_execute_fast_path_with_params( &self, diff --git a/src/types/sqlite_type_info.rs b/src/types/sqlite_type_info.rs index b6290182..f68d91f4 100644 --- a/src/types/sqlite_type_info.rs +++ b/src/types/sqlite_type_info.rs @@ -115,6 +115,59 @@ pub fn sqlite_type_to_pg_oid(sqlite_type: &str) -> i32 { PgType::Text.to_oid() // text } +/// Convert SQLite type declaration to PostgreSQL type name string +pub fn sqlite_type_to_pg_type_name(sqlite_type: &str) -> &'static str { + let type_upper = sqlite_type.to_uppercase(); + + if type_upper.contains("BLOB") { + return "bytea"; + } + + if type_upper.contains("REAL") || type_upper.contains("FLOAT") || type_upper.contains("DOUBLE") { + return "double precision"; + } + + if type_upper.contains("INT") { + if type_upper.contains("INT2") || type_upper.contains("SMALLINT") { + return "smallint"; + } else if type_upper.contains("INT8") || type_upper.contains("BIGINT") { + return "bigint"; + } else { + return "integer"; + } + } + + if type_upper.contains("BOOL") { + return "boolean"; + } + + if type_upper.contains("DATE") && !type_upper.contains("TIME") { + return "date"; + } + + if type_upper.contains("TIME") && !type_upper.contains("STAMP") { + return "time"; + } + + if type_upper.contains("TIMESTAMP") { + return "timestamp"; + } + + if type_upper.contains("NUMERIC") || type_upper.contains("DECIMAL") { + return "numeric"; + } + + if type_upper.contains("UUID") { + return "uuid"; + } + + if type_upper.contains("JSON") { + return "json"; + } + + "text" +} + /// Infer PostgreSQL type from a text value pub fn infer_pg_type_from_text(value: &str) -> i32 { // Try boolean - but be careful not to confuse with regular integers @@ -161,7 +214,38 @@ pub fn infer_pg_type_from_text(value: &str) -> i32 { // For date/time patterns, return text type since SQLite stores them as text // and we can't be sure about the format without more context - + // Default to text PgType::Text.to_oid() // text +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_sqlite_type_to_pg_type_name() { + assert_eq!(sqlite_type_to_pg_type_name("INTEGER"), "integer"); + assert_eq!(sqlite_type_to_pg_type_name("integer"), "integer"); + assert_eq!(sqlite_type_to_pg_type_name("INT"), "integer"); + assert_eq!(sqlite_type_to_pg_type_name("BIGINT"), "bigint"); + assert_eq!(sqlite_type_to_pg_type_name("INT8"), "bigint"); + assert_eq!(sqlite_type_to_pg_type_name("SMALLINT"), "smallint"); + assert_eq!(sqlite_type_to_pg_type_name("INT2"), "smallint"); + assert_eq!(sqlite_type_to_pg_type_name("REAL"), "double precision"); + assert_eq!(sqlite_type_to_pg_type_name("FLOAT"), "double precision"); + assert_eq!(sqlite_type_to_pg_type_name("DOUBLE PRECISION"), "double precision"); + assert_eq!(sqlite_type_to_pg_type_name("BOOLEAN"), "boolean"); + assert_eq!(sqlite_type_to_pg_type_name("BOOL"), "boolean"); + assert_eq!(sqlite_type_to_pg_type_name("TEXT"), "text"); + assert_eq!(sqlite_type_to_pg_type_name("VARCHAR(50)"), "text"); + assert_eq!(sqlite_type_to_pg_type_name("BLOB"), "bytea"); + assert_eq!(sqlite_type_to_pg_type_name("DATE"), "date"); + assert_eq!(sqlite_type_to_pg_type_name("TIMESTAMP"), "timestamp"); + assert_eq!(sqlite_type_to_pg_type_name("NUMERIC"), "numeric"); + assert_eq!(sqlite_type_to_pg_type_name("DECIMAL"), "numeric"); + assert_eq!(sqlite_type_to_pg_type_name("UUID"), "uuid"); + assert_eq!(sqlite_type_to_pg_type_name("JSON"), "json"); + assert_eq!(sqlite_type_to_pg_type_name("SOMETHING_UNKNOWN"), "text"); + } } \ No newline at end of file diff --git a/tests/pg_index_test.rs b/tests/pg_index_test.rs index ecf3c5bf..d487dd79 100644 --- a/tests/pg_index_test.rs +++ b/tests/pg_index_test.rs @@ -37,10 +37,10 @@ async fn test_pg_index_basic() { for row in &rows { let indexrelid: i32 = row.get(0); let indrelid: i32 = row.get(1); - let indnatts: i32 = row.get(2); - let indnkeyatts: i32 = row.get(3); - let indisunique: i32 = row.get(4); - let indisprimary: i32 = row.get(5); + let indnatts: i16 = row.get(2); + let indnkeyatts: i16 = row.get(3); + let indisunique: bool = row.get(4); + let indisprimary: bool = row.get(5); let indkey: Option = row.get(6); let indkey_str = indkey.unwrap_or_else(|| "NULL".to_string());