fordN commented Feb 2, 2026

Adds documentation aimed at LLM coding agents so they can implement new loaders with minimal context (no reverse-engineering of existing code required). It supplements the human-oriented documentation we already have in /docs.

We'll continue to refine and optimize this setup as we use it.

Summary

  • Add NEW_LOADER_GUIDE.md with step-by-step instructions for implementing new data loaders
  • Update CLAUDE.md with quick reference and corrected method names
  • Document all 7 existing loaders and their characteristics

- Create NEW_LOADER_GUIDE.md with step-by-step instructions, code templates,
  type mapping reference, and validation checklist
- Update CLAUDE.md with quick reference, correct method names, and links
  to the detailed guide
- Document all 7 existing loaders with their characteristics

This enables agents to implement new loaders without reverse-engineering
existing implementations.
fordN self-assigned this Feb 3, 2026
fordN added the documentation and agents labels Feb 3, 2026
fordN merged commit bcaf064 into main Feb 9, 2026
5 checks passed
fordN deleted the ford/loader-guide branch February 9, 2026 22:01

github-actions bot commented Feb 9, 2026

Performance Benchmark Results

Test Summary: 19 passed, 4 deselected, 3 warnings
Git Commit: bcaf064e
Environment: GitHub Actions

Results

| Loader | Test | Throughput (rows/sec) | Memory (MB) | Duration (s) | Dataset Size |
|---|---|---|---|---|---|
| postgresql | large_table_loading_performance | 141232 | 271.57 | 0.35 | 50,000 |
| redis | pipeline_performance | 38525 | 0.00 | 0.00 | 50,000 |
| redis | data_structure_performance_hash | 36739 | 0.00 | 0.00 | 50,000 |
| redis | data_structure_performance_string | 54191 | 0.00 | 0.00 | 50,000 |
| redis | data_structure_performance_sorted_set | 82400 | 0.00 | 0.00 | 50,000 |
| redis | memory_efficiency | 36944 | 13.98 | 1.35 | 50,000 |
| delta_lake | large_file_write_performance | 302023 | 283.56 | 0.17 | 50,000 |
| lmdb | large_table_loading_performance | 38981 | 669.32 | 1.28 | 50,000 |
| lmdb | key_generation_strategy_performance_pattern_based | 38607 | 0.00 | 0.00 | 50,000 |
| lmdb | key_generation_strategy_performance_single_column | 42476 | 0.00 | 0.00 | 50,000 |
| lmdb | key_generation_strategy_performance_composite_key | 38247 | 0.00 | 0.00 | 50,000 |
| lmdb | writemap_performance_with | 43016 | 0.00 | 0.00 | 50,000 |
| lmdb | writemap_performance_without | 48356 | 0.00 | 0.00 | 50,000 |
| lmdb | memory_efficiency | 42966 | 177.73 | 1.16 | 50,000 |
| lmdb | concurrent_read_performance | 134600 | 0.00 | 0.37 | 50,000 |
| lmdb | large_value_performance | 31459 | 0.04 | 0.03 | 1,000 |
| postgresql | throughput_comparison | 141859 | 0.00 | 0.00 | 10,000 |
| redis | throughput_comparison | 37329 | 0.00 | 0.00 | 10,000 |
| lmdb | throughput_comparison | 52067 | 0.00 | 0.00 | 10,000 |
| delta_lake | throughput_comparison | 450768 | 0.00 | 0.00 | 10,000 |
| iceberg | large_file_write_performance | 332227 | 433.07 | 0.15 | 50,000 |
Raw JSON Results
{
  "postgresql_large_table_loading_performance": {
    "test_name": "large_table_loading_performance",
    "loader_type": "postgresql",
    "throughput_rows_per_sec": 141232.05769015785,
    "memory_mb": 271.56640625,
    "duration_seconds": 0.35402727127075195,
    "dataset_size": 50000,
    "timestamp": "2026-02-09T22:01:53.928905",
    "git_commit": "bcaf064e",
    "environment": "github-actions"
  },
  "redis_pipeline_performance": {
    "test_name": "pipeline_performance",
    "loader_type": "redis",
    "throughput_rows_per_sec": 38525.24938560631,
    "memory_mb": 0,
    "duration_seconds": 0,
    "dataset_size": 50000,
    "timestamp": "2026-02-09T22:02:08.031184",
    "git_commit": "bcaf064e",
    "environment": "github-actions"
  },
  "redis_data_structure_performance_hash": {
    "test_name": "data_structure_performance_hash",
    "loader_type": "redis",
    "throughput_rows_per_sec": 36738.74026962564,
    "memory_mb": 0,
    "duration_seconds": 0,
    "dataset_size": 50000,
    "timestamp": "2026-02-09T22:02:11.015529",
    "git_commit": "bcaf064e",
    "environment": "github-actions"
  },
  "redis_data_structure_performance_string": {
    "test_name": "data_structure_performance_string",
    "loader_type": "redis",
    "throughput_rows_per_sec": 54191.472477404415,
    "memory_mb": 0,
    "duration_seconds": 0,
    "dataset_size": 50000,
    "timestamp": "2026-02-09T22:02:11.018273",
    "git_commit": "bcaf064e",
    "environment": "github-actions"
  },
  "redis_data_structure_performance_sorted_set": {
    "test_name": "data_structure_performance_sorted_set",
    "loader_type": "redis",
    "throughput_rows_per_sec": 82400.40077404398,
    "memory_mb": 0,
    "duration_seconds": 0,
    "dataset_size": 50000,
    "timestamp": "2026-02-09T22:02:11.020565",
    "git_commit": "bcaf064e",
    "environment": "github-actions"
  },
  "redis_memory_efficiency": {
    "test_name": "memory_efficiency",
    "loader_type": "redis",
    "throughput_rows_per_sec": 36944.04519072545,
    "memory_mb": 13.981819152832031,
    "duration_seconds": 1.353398084640503,
    "dataset_size": 50000,
    "timestamp": "2026-02-09T22:02:12.435376",
    "git_commit": "bcaf064e",
    "environment": "github-actions"
  },
  "delta_lake_large_file_write_performance": {
    "test_name": "large_file_write_performance",
    "loader_type": "delta_lake",
    "throughput_rows_per_sec": 302022.6997461004,
    "memory_mb": 283.55859375,
    "duration_seconds": 0.16555047035217285,
    "dataset_size": 50000,
    "timestamp": "2026-02-09T22:02:12.670521",
    "git_commit": "bcaf064e",
    "environment": "github-actions"
  },
  "lmdb_large_table_loading_performance": {
    "test_name": "large_table_loading_performance",
    "loader_type": "lmdb",
    "throughput_rows_per_sec": 38981.295724654745,
    "memory_mb": 669.32421875,
    "duration_seconds": 1.2826664447784424,
    "dataset_size": 50000,
    "timestamp": "2026-02-09T22:02:15.736904",
    "git_commit": "bcaf064e",
    "environment": "github-actions"
  },
  "lmdb_key_generation_strategy_performance_pattern_based": {
    "test_name": "key_generation_strategy_performance_pattern_based",
    "loader_type": "lmdb",
    "throughput_rows_per_sec": 38606.86625908723,
    "memory_mb": 0,
    "duration_seconds": 0,
    "dataset_size": 50000,
    "timestamp": "2026-02-09T22:02:19.728109",
    "git_commit": "bcaf064e",
    "environment": "github-actions"
  },
  "lmdb_key_generation_strategy_performance_single_column": {
    "test_name": "key_generation_strategy_performance_single_column",
    "loader_type": "lmdb",
    "throughput_rows_per_sec": 42475.52163431206,
    "memory_mb": 0,
    "duration_seconds": 0,
    "dataset_size": 50000,
    "timestamp": "2026-02-09T22:02:19.730906",
    "git_commit": "bcaf064e",
    "environment": "github-actions"
  },
  "lmdb_key_generation_strategy_performance_composite_key": {
    "test_name": "key_generation_strategy_performance_composite_key",
    "loader_type": "lmdb",
    "throughput_rows_per_sec": 38247.20493745294,
    "memory_mb": 0,
    "duration_seconds": 0,
    "dataset_size": 50000,
    "timestamp": "2026-02-09T22:02:19.733045",
    "git_commit": "bcaf064e",
    "environment": "github-actions"
  },
  "lmdb_writemap_performance_with": {
    "test_name": "writemap_performance_with",
    "loader_type": "lmdb",
    "throughput_rows_per_sec": 43016.0671929138,
    "memory_mb": 0,
    "duration_seconds": 0,
    "dataset_size": 50000,
    "timestamp": "2026-02-09T22:02:28.187691",
    "git_commit": "bcaf064e",
    "environment": "github-actions"
  },
  "lmdb_writemap_performance_without": {
    "test_name": "writemap_performance_without",
    "loader_type": "lmdb",
    "throughput_rows_per_sec": 48355.60212195842,
    "memory_mb": 0,
    "duration_seconds": 0,
    "dataset_size": 50000,
    "timestamp": "2026-02-09T22:02:28.190573",
    "git_commit": "bcaf064e",
    "environment": "github-actions"
  },
  "lmdb_memory_efficiency": {
    "test_name": "memory_efficiency",
    "loader_type": "lmdb",
    "throughput_rows_per_sec": 42966.01792067331,
    "memory_mb": 177.734375,
    "duration_seconds": 1.163710355758667,
    "dataset_size": 50000,
    "timestamp": "2026-02-09T22:02:29.495444",
    "git_commit": "bcaf064e",
    "environment": "github-actions"
  },
  "lmdb_concurrent_read_performance": {
    "test_name": "concurrent_read_performance",
    "loader_type": "lmdb",
    "throughput_rows_per_sec": 134599.77600421035,
    "memory_mb": 0,
    "duration_seconds": 0.371471643447876,
    "dataset_size": 50000,
    "timestamp": "2026-02-09T22:02:31.132519",
    "git_commit": "bcaf064e",
    "environment": "github-actions"
  },
  "lmdb_large_value_performance": {
    "test_name": "large_value_performance",
    "loader_type": "lmdb",
    "throughput_rows_per_sec": 31459.24620288768,
    "memory_mb": 0.0390625,
    "duration_seconds": 0.03178715705871582,
    "dataset_size": 1000,
    "timestamp": "2026-02-09T22:02:31.682172",
    "git_commit": "bcaf064e",
    "environment": "github-actions"
  },
  "postgresql_throughput_comparison": {
    "test_name": "throughput_comparison",
    "loader_type": "postgresql",
    "throughput_rows_per_sec": 141859.05089171263,
    "memory_mb": 0,
    "duration_seconds": 0,
    "dataset_size": 10000,
    "timestamp": "2026-02-09T22:02:32.285993",
    "git_commit": "bcaf064e",
    "environment": "github-actions"
  },
  "redis_throughput_comparison": {
    "test_name": "throughput_comparison",
    "loader_type": "redis",
    "throughput_rows_per_sec": 37328.85728042406,
    "memory_mb": 0,
    "duration_seconds": 0,
    "dataset_size": 10000,
    "timestamp": "2026-02-09T22:02:32.288598",
    "git_commit": "bcaf064e",
    "environment": "github-actions"
  },
  "lmdb_throughput_comparison": {
    "test_name": "throughput_comparison",
    "loader_type": "lmdb",
    "throughput_rows_per_sec": 52066.934803118325,
    "memory_mb": 0,
    "duration_seconds": 0,
    "dataset_size": 10000,
    "timestamp": "2026-02-09T22:02:32.290767",
    "git_commit": "bcaf064e",
    "environment": "github-actions"
  },
  "delta_lake_throughput_comparison": {
    "test_name": "throughput_comparison",
    "loader_type": "delta_lake",
    "throughput_rows_per_sec": 450767.7757716447,
    "memory_mb": 0,
    "duration_seconds": 0,
    "dataset_size": 10000,
    "timestamp": "2026-02-09T22:02:32.292875",
    "git_commit": "bcaf064e",
    "environment": "github-actions"
  },
  "iceberg_large_file_write_performance": {
    "test_name": "large_file_write_performance",
    "loader_type": "iceberg",
    "throughput_rows_per_sec": 332227.3620176161,
    "memory_mb": 433.07421875,
    "duration_seconds": 0.1504993438720703,
    "dataset_size": 50000,
    "timestamp": "2026-02-09T22:02:32.804735",
    "git_commit": "bcaf064e",
    "environment": "github-actions"
  }
}
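When reading the raw results, note that many entries report `duration_seconds` and `memory_mb` as 0. For the entries that do record a duration, the reported throughput appears consistent with `dataset_size / duration_seconds`. The sketch below checks that on three entries copied from the JSON above (trimmed to the fields used). The function name `check_throughput` and the tolerance are illustrative assumptions, not part of the benchmark suite.

```python
import json
import math

# Three entries copied from the raw results above, trimmed to the fields used here.
RAW = """
{
  "postgresql_large_table_loading_performance": {
    "throughput_rows_per_sec": 141232.05769015785,
    "duration_seconds": 0.35402727127075195,
    "dataset_size": 50000
  },
  "redis_pipeline_performance": {
    "throughput_rows_per_sec": 38525.24938560631,
    "duration_seconds": 0,
    "dataset_size": 50000
  },
  "lmdb_large_value_performance": {
    "throughput_rows_per_sec": 31459.24620288768,
    "duration_seconds": 0.03178715705871582,
    "dataset_size": 1000
  }
}
"""


def check_throughput(results, rel_tol=1e-4):
    """Hypothetical helper: split entries into those whose throughput cannot be
    rederived (duration recorded as 0) and those whose reported throughput
    disagrees with dataset_size / duration_seconds."""
    zero_duration = []
    mismatches = []
    for name, entry in results.items():
        duration = entry["duration_seconds"]
        if duration == 0:
            # No recorded duration, so throughput cannot be recomputed from these fields.
            zero_duration.append(name)
            continue
        recomputed = entry["dataset_size"] / duration
        if not math.isclose(recomputed, entry["throughput_rows_per_sec"], rel_tol=rel_tol):
            mismatches.append(name)
    return zero_duration, mismatches


results = json.loads(RAW)
zero_duration, mismatches = check_throughput(results)
print("zero duration:", zero_duration)
print("mismatches:", mismatches)
```

Running the same check over the full JSON would list every test whose duration was not captured, which is a reasonable first filter before comparing throughput across loaders.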
