stagknee/lumen

Lumen

Lumen is a research-oriented programming language designed from the ground up for Large Language Models as the primary authors and manipulators of code.

"The LLM is the CPU."

Vision

Traditional programming languages were shaped for human cognition and von Neumann computer architectures. Lumen flips this: its fundamental representation and computational model are deliberately aligned with the internal mechanisms of modern transformer-based LLMs (residual streams, attention, latent space operations, and soft/gated computation).

The goal is to dramatically reduce the "impedance mismatch" that causes LLMs to produce subtle bugs when forced to emit classical code (Python, Rust, C, etc.).

Core Properties

  • Token-efficient by design — the native form is concise and regular for transformer tokenizers.
  • LLM-generative — optimized so frontier models produce correct, ambitious programs more reliably.
  • Not primarily human-readable — the surface is for the model.
  • Excellent explanation path — a co-produced structured narrative provides high-fidelity plain-English (or structured) glosses.
  • Portable execution — lowers cleanly to C (for ecosystem compatibility) or can be compiled directly to native code.
  • Hybrid architecture (see below) for both generative power and verifiability/auditability/durability.

This is pure research. We are free to be speculative.

Why "Lumen"?

I chose the name Lumen for this language (and the repository).

Etymology and Meaning

  • Lumen is Latin for "light". It is the root of the English "illuminate" and the namesake of the lumen, the SI unit of luminous flux.
  • In English, it directly evokes illumination, clarity, and enlightenment.

Latin Grammar: Nominative or Accusative?

"Lumen" is a third-declension neuter noun in Latin.

In Latin grammar, neuter nouns have identical forms for the nominative and accusative cases (in both singular and plural).

  • Singular:
    • Nominative: lumen (used as the subject, e.g., "Lumen est lux" — "Lumen is light")
    • Accusative: lumen (used as the direct object, e.g., "Video lumen" — "I see the light")
  • Plural:
    • Nominative/Accusative: lumina

So, as the name of the language, "Lumen" works equally well in either case:

  • Nominative: Treating it as the subject or the thing itself ("Lumen is a research language...").
  • Accusative: As an object ("I am building Lumen", "We use Lumen").

This dual usability is a nice linguistic bonus — the name is flexible in Latin phrases without needing declension changes. It fits the hybrid theme of the project (flexible between "dark" dense form and "light" explanations).

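For branding or future Latin mottos, the name can be used as-is wherever a nominative or accusative singular is called for; other cases decline (for example, genitive luminis).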

Why it fits perfectly for this project

The entire design revolves around a fundamental tension and its resolution:

  • The Native Surface is dense, non-human-readable, and optimized purely for the LLM's internal "CPU-like" mechanisms (residual streams, attention over latents, soft modulations). This is the "dark" or opaque layer from a human perspective — powerful for the model but hard to understand directly.
  • The Explicit Dual Core + Co-produced Narrative provides the "light": high-fidelity, structured English explanations, justifications, provenance, and invariants that are generated alongside the dense code (not as an after-the-fact decompiler pass).

Lumen captures the idea that the language produces light (clear, auditable, explainable artifacts) even when its primary form is dense and alien to humans.

Additional resonances:

  • Illumination from latent knowledge: LLMs operate in a "latent space" of distributed representations. The narrative layer "illuminates" or surfaces that hidden knowledge into human-understandable form.
  • The LLM as the CPU: Just as light is fundamental to how we perceive and reason about the world, the transformer's mechanisms (the "light" of attention and residual updates) are fundamental to how the model reasons. The language is built around that.
  • Short, elegant, modern, and research-friendly. Easy to say, type, and brand ("Lumen programs", "Lumen surface", "Lumen narrative").
  • Avoids being overly technical (like "Residua" or "Attentra") while still feeling precise and evocative.

In short: Lumen is the language that lets the LLM think in its native "dark" (efficient, aligned) mode while always generating the light of explanation for everything else.

Name confirmation: after reviewing the full rationale, the user kept the name ("no I like it"). Lumen is confirmed and finalized for the project.

The Hybrid Design (from the three-agent debate)

After a structured debate between two positions, Alpha (native alignment) and Beta (explicit dualism), with a third agent arbitrating, the synthesized recommendation is:

  • Native Surface (Alpha-inspired): What the LLM primarily emits and reasons over. Fluid, residual-stream updates, attention-like selection, latent fragments, gated modulations. This is the "LLM-CPU" instruction surface.
  • Explicit Dual Core (Beta-inspired): The durable ground truth. Every significant value has a named producer (SSA-style), memory and effects are explicitly region-scoped with witnesses, and linear discipline is enforced. A rich, structured narrative is co-produced in lockstep with the dense form.
  • Bidirectional projections: The layers can be translated in both directions. The explicit core is used for human audit, repair, verification, and lowering to real hardware.

This hybrid lets the model "speak its native tongue" for creation-time reliability while ensuring the artifact is trustworthy, explainable, and portable.
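To make the Explicit Dual Core properties more tangible, here is a minimal Python sketch of how named producers, region scoping, and a linear-use rule could be checked. The `Producer` record, its fields, and `check_program` are hypothetical names chosen for illustration; they are not taken from the Lumen documents. Witnesses are not modeled, so as a simplifying assumption any cross-region read is reported as a violation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Producer:
    """One SSA-style definition (hypothetical shape, not Lumen's real format)."""
    name: str             # value name, e.g. "%1"
    op: str               # operation, e.g. "init"
    operands: tuple       # names of the values this producer reads
    region: str           # region the result lives in, e.g. "R_main"
    linear: bool = False  # assumption: a linear value may be consumed at most once


def check_program(producers):
    """Return a list of violation messages for a straight-line program."""
    defined = {}  # name -> Producer
    uses = {}     # name -> number of times it has been read
    errors = []
    for p in producers:
        for operand in p.operands:
            src = defined.get(operand)
            if src is None:
                errors.append(f"{p.name}: {operand} used before definition")
                continue
            if src.region != p.region:
                errors.append(
                    f"{p.name}: {operand} crosses region {src.region} -> {p.region}"
                )
            uses[operand] = uses.get(operand, 0) + 1
            if src.linear and uses[operand] > 1:
                errors.append(f"{p.name}: linear value {operand} consumed twice")
        if p.name in defined:
            errors.append(f"{p.name}: defined more than once")
        defined[p.name] = p
    return errors
```

Calling `check_program` on a list of `Producer` records returns an empty list when every value has exactly one producer, is read only after it is defined, stays inside its region, and respects the linear rule.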

The LLM as the CPU

In Lumen, the transformer model is the processor.

Its "ISA" consists of operations it already excels at:

  • Adding modulations to residual streams
  • Content-based attention over latent fragments
  • Gated blending and routing
  • Recurrent state updates over context/history

The Native Surface syntax plays the role of high-level code, or assembly, for this "LLM-CPU".

Lowering to C or native silicon is cross-compilation from the LLM's execution model to traditional hardware.

See docs/hybrid-exploration.md for the full "LLM as the CPU" section and concrete syntax sketches.
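The four "ISA" operations listed above can be sketched numerically in plain Python. This is an illustrative toy under stated assumptions, not the repository's simulator: the function names `modulate`, `attend`, `gate`, and `run` are hypothetical, vectors are plain lists of floats, and every latent is assumed to have the same length as the query.

```python
import math

# Hypothetical helper names; none of these come from the Lumen repository.


def modulate(stream, delta):
    """Residual update: add a modulation vector to the stream."""
    return [s + d for s, d in zip(stream, delta)]


def attend(query, latents):
    """Content-based attention: softmax over dot-product scores,
    returning the weighted mix of the latent vectors."""
    names = list(latents)
    scores = [sum(q * k for q, k in zip(query, latents[n])) for n in names]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    mixed = [0.0] * len(query)
    for n, w in zip(names, weights):
        for i, v in enumerate(latents[n]):
            mixed[i] += (w / total) * v
    return mixed


def gate(candidate, previous, g):
    """Gated blend: g=1.0 keeps the candidate, g=0.0 keeps the previous state."""
    return [g * c + (1.0 - g) * p for c, p in zip(candidate, previous)]


def run(inputs, latents, state, g=0.5):
    """Recurrent update: for each input, attend with the current state,
    add input and context as residual updates, then gate against the old state."""
    for x in inputs:
        context = attend(state, latents)
        candidate = modulate(modulate(state, x), context)
        state = gate(candidate, state, g)
    return state
```

For example, `run([[1.0]], {"a": [2.0]}, [0.0])` attends to the single latent, adds the input and the attended context to the state, and blends the result halfway with the previous state.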

Quick Example

Native Surface (LLM emits this)

stream main += init(42)
stream main += modulate( attend( stream main, latent["scale_factor"] ) )
stream result = gate( stream main, threshold=0.8 ) |> blend_with( latent_fragment "transform" )

Projected Explicit Dual Core (durable form)

%1 = init 42 : i32 [in R_main]
%2 = attend %1 key="scale_factor" : f32 [in R_main]
%3 = mul %1 %2 : i32 [in R_main]
%4 = gate %3 threshold=0.8 : bool [in R_main]
%5 = blend %3 latent:"transform" [witness %w1] : i32 [in R_main]
%result = %5 [consumed in R_main]

Co-produced Narrative (generated together)

  • Producer %1: Initializes main stream with constant 42 in region R_main.
  • %2–%5: Attention-based scaling, gated decision, and safe blending (witness %w1 certifies non-mutating modulation).
  • Full provenance and invariants are explicit.

The model works in the fluid surface; everything else (English, verification, C lowering) comes from the explicit dual + narrative.
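As a rough illustration of why the explicit dual is convenient for audit tooling, the sketch below parses lines in the shape used by the example above and prints a one-line summary per producer. The regular expression, the `parse` and `summarize` helpers, and the summary wording are all assumptions made for this example; Lumen has no published grammar. The summary here is derived after the fact purely for demonstration, whereas the project's narrative is meant to be co-produced with the dense form.

```python
import re

# Assumed line shape, inferred from the example above (not an official grammar):
#   %name = op args : type [in Region]
LINE = re.compile(r'^(%\w+) = (\w+) (.*?) : (\w+) \[in (\w+)\]$')
WITNESS = re.compile(r'\[witness (%\w+)\]')


def parse(line):
    """Parse one explicit-dual line into a dict, or return None if it
    does not match the assumed shape (e.g. the final alias line)."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    name, op, args, ty, region = m.groups()
    witness = WITNESS.search(args)
    plain_args = WITNESS.sub("", args)  # keep witnesses out of the operand list
    return {
        "name": name,
        "op": op,
        "type": ty,
        "region": region,
        "operands": re.findall(r"%\w+", plain_args),
        "witness": witness.group(1) if witness else None,
    }


def summarize(rec):
    """Render a parsed record as a short audit line (hypothetical wording)."""
    reads = ", ".join(rec["operands"]) or "no earlier values"
    text = (f'Producer {rec["name"]}: {rec["op"]} reads {reads}, '
            f'yields {rec["type"]} in region {rec["region"]}')
    if rec["witness"]:
        text += f' (witness {rec["witness"]})'
    return text + "."
```

Running `summarize(parse('%1 = init 42 : i32 [in R_main]'))` produces a sentence naming the producer, its inputs, its type, and its region; lines that do not fit the assumed shape are skipped by returning `None`.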

Runnable LLM-CPU Simulator

To make the "LLM as the CPU" idea concrete and playable, there is a toy simulator:

cd examples
python llm-cpu-simulator.py

It executes native surface "programs" by simulating residual streams, attention over latents, gating, recurrent updates, etc. You can watch the state evolve step by step as the simulated LLM-CPU "runs" the code. It also demonstrates the projection step to the explicit dual.

The simulator lives in examples/llm-cpu-simulator.py.

Repository Contents

  • README.md — this file
  • docs/
    • summary.md — one-page digest of the entire project and debate (recommended starting point if overwhelmed)
    • debate-findings.md — full three-agent debate findings with explanations of AI concepts
    • hybrid-exploration.md — syntax sketches, "LLM as the CPU" deep dive, and research notes
    • arbitration-report.md — the raw arbitrator output from the debate
  • examples/
    • llm-cpu-simulator.py — executable demonstration of the native surface on a simulated LLM-CPU

Research Status & Next Directions

This repo captures the current state of the Lumen research (as of the initial document dump):

  • Structured debate between native alignment and explicit dualism
  • Hybrid recommendation with guiding principles
  • Concrete syntax explorations
  • "LLM as CPU" framing + runnable simulator
  • Key open questions around co-production fidelity, bidirectional projections, and empirical gains in LLM code correctness / token efficiency

See docs/summary.md for the prioritized research questions.

We are in pure ideation / exploration mode. No production commitments.

License & Contribution

This is early research material. Feel free to fork, experiment with the simulator, extend the sketches, or run your own debates on sub-topics.

If you have ideas, open an issue or PR with thoughts.

Name

Lumen (Latin for "light") — chosen because the language emphasizes high-fidelity explanation and narrative (illuminating the dense, non-human-readable surface) while being a vehicle for LLM-native computation.


Generated from the Grok-Projects research workspace. All documents and context preserved here for the Lumen language effort.

About

Lumen: Research language for LLM-native computation with hybrid residual-stream surface + explicit dual core (co-produced narrative). The LLM is the CPU.
