Skip to content

Commit 56fa9a7

Browse files
eransclaude
andauthored
fix: correct wire protocol type OIDs for pre-existing SQLite tables (#68) (#74)
* docs: add design spec for wire protocol type OID fix (#68) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add sqlite_type_to_pg_type_name helper (#68) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add get_column_types_from_pragma to DbHandler (#68) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add PRAGMA type fallback in simple query protocol (#68) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add PRAGMA type fallback in extended query protocol (#68) Add PRAGMA table_info fallback for type inference in the extended protocol's Parse-time schema_types, Execute-time schema_types, and Parse-time inferred_types paths. Skip empty and default-text types from PRAGMA to preserve value-based inference for untyped columns. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 085c34d commit 56fa9a7

6 files changed

Lines changed: 296 additions & 19 deletions

File tree

Lines changed: 111 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,111 @@
1+
# Wire Protocol Type OID Fix
2+
3+
**Issue**: [#68](https://github.com/erans/pgsqlite/issues/68) - Wire protocol: type oid is incorrect for integer and float types
4+
**Date**: 2026-03-27
5+
**Scope**: Fix fallback type OID inference for pre-existing SQLite tables that lack `__pgsqlite_schema` metadata
6+
7+
## Problem
8+
9+
When returning query results, pgsqlite sends OID 25 (TEXT) for integer, bigint, smallint, boolean, and float columns. This happens because the type inference chain defaults to TEXT when `__pgsqlite_schema` has no entry for a column.
10+
11+
Tables created through pgsqlite's `CREATE TABLE` already work correctly — they populate `__pgsqlite_schema` with PG type names. But pre-existing SQLite tables (or tables created by other tools) have no metadata, so every column gets OID 25.
12+
13+
PostgreSQL client libraries use the wire OID to deserialize values into the correct host language type (e.g., Python `int`, Go `int64`). With OID 25, all values arrive as strings.
14+
15+
## Root Cause
16+
17+
Multiple code paths default to OID 25 when `__pgsqlite_schema` has no entry:
18+
19+
- **executor.rs line 1249**: final fallback in FieldDescription builder
20+
- **extended.rs lines 851, 855, 899, 903**: Parse-time `inferred_types` construction — when `get_schema_type_with_session` returns `None` for source table.column lookups
21+
- **extended.rs lines 4487-4511**: Execute-time `schema_types` population — same `get_schema_type_with_session` fallback
22+
23+
The codebase already has `sqlite_type_info.rs` which contains a complete `PRAGMA table_info` → PG OID mapping, but it's never called from the main query paths.
24+
25+
## Design
26+
27+
### New DbHandler method
28+
29+
**File**: `src/session/db_handler.rs`
30+
31+
Add `get_column_types_from_pragma` — uses `with_session_connection` to run `PRAGMA table_info(<table>)` and returns `HashMap<String, String>` mapping column names to PG type name strings. Internally calls `sqlite_type_info::sqlite_type_to_pg_type_name` to convert SQLite declared types to PG type names.
32+
33+
```rust
34+
pub async fn get_column_types_from_pragma(
35+
&self,
36+
session_id: &Uuid,
37+
table_name: &str,
38+
) -> Result<HashMap<String, String>, PgSqliteError>
39+
```
40+
41+
Returns PG type name strings (e.g., `"integer"`, `"double precision"`, `"boolean"`) so callers can insert them directly into `schema_types` HashMaps.
42+
43+
### New helper function
44+
45+
**File**: `src/types/sqlite_type_info.rs`
46+
47+
Add `sqlite_type_to_pg_type_name` that maps SQLite declared types to PG type name strings. Uses the same logic as the existing `sqlite_type_to_pg_oid` but returns `&'static str`:
48+
49+
- `"INTEGER"` → `"integer"`
50+
- `"INT8"` / `"BIGINT"` → `"bigint"`
51+
- `"INT2"` / `"SMALLINT"` → `"smallint"`
52+
- `"REAL"` / `"FLOAT"` / `"DOUBLE"` → `"double precision"`
53+
- `"BOOLEAN"` / `"BOOL"` → `"boolean"`
54+
- `"TEXT"` / `"VARCHAR"` / `"CHAR"` → `"text"`
55+
- `"BLOB"` → `"bytea"`
56+
- `"DATE"` → `"date"`
57+
- `"TIMESTAMP"` → `"timestamp"`
58+
- `"NUMERIC"` / `"DECIMAL"` → `"numeric"`
59+
- Default → `"text"`
60+
61+
### Simple query protocol fix
62+
63+
**File**: `src/query/executor.rs`
64+
65+
In the `schema_types` population loop (lines 1088-1155), after attempting `get_schema_type_with_session` for each column and finding no match, fall back to PRAGMA-based lookup. Fetch PRAGMA results once per table (before the column loop) and cache in a local `HashMap`. For each column without schema metadata, look up its type in the PRAGMA cache and insert the PG type name into `schema_types`.
66+
67+
This ensures the final fallback at line 1249 is rarely reachedonly for computed expressions or columns not in any table.
68+
69+
### Extended query protocol fix
70+
71+
**File**: `src/query/extended.rs`
72+
73+
Three locations need the PRAGMA fallback:
74+
75+
**1. Parse-time `schema_types` population (lines 563-640):** After `get_schema_type_with_session` returns `None`, fall back to PRAGMA lookup. Same pattern as executor.rs.
76+
77+
**2. Parse-time `inferred_types` construction (lines 654-912):** This is the critical path for clients using the extended protocol (e.g., psycopg3). The fallback chain at lines 848-903 calls `get_schema_type_with_session` multiple times via alias resolution. At each `Ok(None)` branch (lines 851, 899) and `Err` branch (lines 855, 903), instead of pushing `PgType::Text.to_oid()`, call `get_column_types_from_pragma` and use `pg_type_string_to_oid` to get the correct OID.
78+
79+
To avoid calling PRAGMA per-column, fetch the PRAGMA results once per table when the table name is first resolved (line 837 or 886) and cache in a local variable.
80+
81+
**3. Execute-time `schema_types` population (lines 4487-4511):** Same pattern as #1after `get_schema_type_with_session` returns `None`, fall back to PRAGMA lookup.
82+
83+
The downstream consumers (lines 2799, 2832, 3783 in `send_select_response` and `encode_row`) receive `field_types` from the Parse or Execute paths above. Once those paths produce correct OIDs, no changes are needed at these downstream locations.
84+
85+
## Files Changed
86+
87+
| File | Change |
88+
|------|--------|
89+
| `src/session/db_handler.rs` | Add `get_column_types_from_pragma` method |
90+
| `src/types/sqlite_type_info.rs` | Add `sqlite_type_to_pg_type_name` function + tests |
91+
| `src/query/executor.rs` | Add PRAGMA fallback in schema_types population loop |
92+
| `src/query/extended.rs` | Add PRAGMA fallback in 3 locations: schema_types (2) + inferred_types (1) |
93+
94+
## Testing
95+
96+
- Pre-existing SQLite table with INTEGER columnOID 23 (int4)
97+
- Pre-existing SQLite table with REAL columnOID 701 (float8)
98+
- Pre-existing SQLite table with BOOLEAN columnOID 16 (bool)
99+
- Pre-existing SQLite table with TEXT columnOID 25 (text)
100+
- Pre-existing SQLite table with BLOB columnOID 17 (bytea)
101+
- Pre-existing SQLite table with BIGINT columnOID 20 (int8)
102+
- Pre-existing SQLite table with SMALLINT columnOID 21 (int2)
103+
- Tables created via pgsqlite `CREATE TABLE` still work correctly (no regression)
104+
- `sqlite_type_to_pg_type_name` unit tests for all type mappings
105+
- `get_column_types_from_pragma` unit test with a test table
106+
107+
## Known Limitations
108+
109+
- Computed expressions without a table (e.g., `SELECT 1+2`) still default to TEXTwould require value-based inference, which is out of scope
110+
- SQLite's type affinity system means `INTEGER` column could contain any type at runtime; we trust the declared type
111+
- `PRAGMA table_info` doesn't distinguish between `INT4` and `INT8` when the declared type is just `INTEGER` — defaults to `int4`

src/query/executor.rs

Lines changed: 11 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1087,7 +1087,10 @@ impl QueryExecutor {
10871087

10881088
if let Some(ref table) = table_name {
10891089
debug!("Type inference: Found table name '{}', looking up schema for {} columns", table, response.columns.len());
1090-
1090+
1091+
// Pre-fetch PRAGMA table_info as fallback for tables without __pgsqlite_schema
1092+
let pragma_types = db.get_column_types_from_pragma(&session.id, table).await.unwrap_or_default();
1093+
10911094
// Extract column mappings from query if possible
10921095
let column_mappings = extract_column_mappings_from_query(query, table);
10931096

@@ -1161,6 +1164,13 @@ impl QueryExecutor {
11611164

11621165
debug!("Type inference: No schema type found for '{}.{}'", table, col_name);
11631166
}
1167+
// PRAGMA table_info fallback for pre-existing SQLite tables
1168+
if !schema_types.contains_key(col_name) {
1169+
if let Some(pg_type) = pragma_types.get(col_name) {
1170+
debug!("Type inference: Found PRAGMA type for '{}.{}' -> {}", table, col_name, pg_type);
1171+
schema_types.insert(col_name.clone(), pg_type.clone());
1172+
}
1173+
}
11641174
}
11651175
} else {
11661176
debug!("Type inference: No table name extracted from query, using fallback logic");

src/query/extended.rs

Lines changed: 53 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -563,6 +563,7 @@ impl ExtendedQueryHandler {
563563
let mut schema_types = std::collections::HashMap::new();
564564
if let Some(ref table) = table_name {
565565
info!("PARSE: Fetching schema types for table '{}'", table);
566+
let pragma_types = db.get_column_types_from_pragma(&session.id, table).await.unwrap_or_default();
566567
// For aliased columns, try to find the source column
567568
for col_name in &response.columns {
568569
// First try direct lookup
@@ -646,13 +647,25 @@ impl ExtendedQueryHandler {
646647
}
647648
}
648649
}
650+
// PRAGMA table_info fallback
651+
if !schema_types.contains_key(col_name) {
652+
if let Some(pg_type) = pragma_types.get(col_name) {
653+
info!("PARSE: Found PRAGMA type for '{}.{}' -> {}", table, col_name, pg_type);
654+
schema_types.insert(col_name.clone(), pg_type.clone());
655+
}
656+
}
649657
}
650658
}
651-
659+
652660
// Try to infer types from the first row if available
653661
// We need to handle this asynchronously for schema lookup
654662
let mut inferred_types = Vec::new();
655-
663+
let pragma_types_for_inferred = if let Some(ref table) = table_name {
664+
db.get_column_types_from_pragma(&session.id, table).await.unwrap_or_default()
665+
} else {
666+
std::collections::HashMap::new()
667+
};
668+
656669
for (i, col_name) in response.columns.iter().enumerate() {
657670
info!("PARSE: Processing column {}: '{}'", i, col_name);
658671
let inferred_type = {
@@ -846,13 +859,23 @@ impl ExtendedQueryHandler {
846859
inferred_types.push(type_oid);
847860
}
848861
Ok(None) => {
849-
info!("Column '{}': no schema type found for '{}.{}', defaulting to text",
850-
col_name, source_table, source_col);
851-
inferred_types.push(PgType::Text.to_oid());
862+
if let Some(pg_type) = pragma_types_for_inferred.get(col_name.as_str()) {
863+
let type_oid = crate::types::SchemaTypeMapper::pg_type_string_to_oid(pg_type);
864+
info!("Column '{}': resolved type from PRAGMA -> {} (OID {})", col_name, pg_type, type_oid);
865+
inferred_types.push(type_oid);
866+
} else {
867+
info!("Column '{}': no schema type found for '{}.{}', defaulting to text",
868+
col_name, source_table, source_col);
869+
inferred_types.push(PgType::Text.to_oid());
870+
}
852871
}
853872
Err(_) => {
854-
// Schema lookup error, defaulting to text
855-
inferred_types.push(PgType::Text.to_oid());
873+
if let Some(pg_type) = pragma_types_for_inferred.get(col_name.as_str()) {
874+
let type_oid = crate::types::SchemaTypeMapper::pg_type_string_to_oid(pg_type);
875+
inferred_types.push(type_oid);
876+
} else {
877+
inferred_types.push(PgType::Text.to_oid());
878+
}
856879
}
857880
}
858881
} else {
@@ -894,13 +917,23 @@ impl ExtendedQueryHandler {
894917
inferred_types.push(type_oid);
895918
}
896919
Ok(None) => {
897-
info!("Column '{}': no schema type found for '{}.{}', defaulting to text",
898-
col_name, table_name, col_name);
899-
inferred_types.push(PgType::Text.to_oid());
920+
if let Some(pg_type) = pragma_types_for_inferred.get(col_name.as_str()) {
921+
let type_oid = crate::types::SchemaTypeMapper::pg_type_string_to_oid(pg_type);
922+
info!("Column '{}': resolved type from PRAGMA -> {} (OID {})", col_name, pg_type, type_oid);
923+
inferred_types.push(type_oid);
924+
} else {
925+
info!("Column '{}': no schema type found for '{}.{}', defaulting to text",
926+
col_name, table_name, col_name);
927+
inferred_types.push(PgType::Text.to_oid());
928+
}
900929
}
901930
Err(_) => {
902-
// Schema lookup error, defaulting to text
903-
inferred_types.push(PgType::Text.to_oid());
931+
if let Some(pg_type) = pragma_types_for_inferred.get(col_name.as_str()) {
932+
let type_oid = crate::types::SchemaTypeMapper::pg_type_string_to_oid(pg_type);
933+
inferred_types.push(type_oid);
934+
} else {
935+
inferred_types.push(PgType::Text.to_oid());
936+
}
904937
}
905938
}
906939
} else {
@@ -4486,6 +4519,7 @@ impl ExtendedQueryHandler {
44864519
// Pre-fetch schema types for all columns if we have a table name
44874520
let mut schema_types = std::collections::HashMap::new();
44884521
if let Some(ref table) = table_name {
4522+
let pragma_types = db.get_column_types_from_pragma(&session.id, table).await.unwrap_or_default();
44894523
for col_name in &response.columns {
44904524
// Try direct lookup first
44914525
if let Ok(Some(pg_type)) = db.get_schema_type_with_session(&session.id, table, col_name).await {
@@ -4508,6 +4542,13 @@ impl ExtendedQueryHandler {
45084542
}
45094543
}
45104544
}
4545+
// PRAGMA table_info fallback
4546+
if !schema_types.contains_key(col_name) {
4547+
if let Some(pg_type) = pragma_types.get(col_name) {
4548+
info!("Found PRAGMA type for '{}.{}' -> {}", table, col_name, pg_type);
4549+
schema_types.insert(col_name.clone(), pg_type.clone());
4550+
}
4551+
}
45114552
}
45124553
}
45134554

src/session/db_handler.rs

Lines changed: 32 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2270,7 +2270,38 @@ impl DbHandler {
22702270
}
22712271
}
22722272
}
2273-
2273+
2274+
/// Get column types from PRAGMA table_info for tables without __pgsqlite_schema metadata.
2275+
/// Returns a HashMap mapping column names to PostgreSQL type name strings.
2276+
pub async fn get_column_types_from_pragma(
2277+
&self,
2278+
session_id: &Uuid,
2279+
table_name: &str,
2280+
) -> Result<std::collections::HashMap<String, String>, PgSqliteError> {
2281+
let table_name = table_name.to_string();
2282+
self.with_session_connection(session_id, move |conn| {
2283+
let mut result = std::collections::HashMap::new();
2284+
let query = format!("PRAGMA table_info(\"{}\")", table_name);
2285+
let mut stmt = conn.prepare(&query)?;
2286+
let mut rows = stmt.query_map([], |row| {
2287+
let col_name: String = row.get(1)?;
2288+
let col_type: String = row.get(2)?;
2289+
Ok((col_name, col_type))
2290+
})?;
2291+
while let Some(Ok((col_name, col_type))) = rows.next() {
2292+
if col_type.is_empty() {
2293+
continue;
2294+
}
2295+
let pg_type_name = crate::types::sqlite_type_info::sqlite_type_to_pg_type_name(&col_type);
2296+
if pg_type_name == "text" {
2297+
continue;
2298+
}
2299+
result.insert(col_name, pg_type_name.to_string());
2300+
}
2301+
Ok(result)
2302+
}).await
2303+
}
2304+
22742305
/// Try fast path execution with parameters
22752306
pub async fn try_execute_fast_path_with_params(
22762307
&self,

0 commit comments

Comments
 (0)