A TypeScript/JavaScript parser for 3md (Trilingual Markdown) files.
This is the reference implementation of the 3md specification, demonstrating how to correctly parse and split 3md documents into separate language files.
- ✅ Parse 3md documents with inline separators (~)
- ✅ Parse 3md documents with block separators (෴)
- ✅ Split into separate Sinhala, Tamil, and English markdown files
- ✅ Preserve markdown formatting (headings, lists, paragraphs, etc.)
- ✅ Handle multi-line content with hard line breaks
- ✅ CLI tool for batch processing
- ✅ Full TypeScript types
- ✅ Remark plugin for markdown processing
npm install 3md-parser
Or for development:
git clone https://github.com/mooniak/3md.git
cd 3md/parser
npm install
npm run build
import { splitToLanguageFiles } from '3md-parser';
const content = `{{langs|si|ta|en}}
# හැඳින්වීම~அறிமுகம~Introduction
මෙය සරල පරිච්ඡේදයකි.~இது எளிய பத்தி.~This is a simple paragraph.`;
const result = await splitToLanguageFiles(content);
console.log(result.si); // Sinhala markdown
console.log(result.ta); // Tamil markdown
console.log(result.en); // English markdown
Split a 3md file into three language files:
3md-split document.3md
This creates:
- document.si.md (Sinhala)
- document.ta.md (Tamil)
- document.en.md (English)
Specify a custom output basename:
3md-split document.3md my-output
This creates my-output.si.md, my-output.ta.md, and my-output.en.md.
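The naming rule in these two examples can be sketched as a small pure function. `outputFileNames` is a hypothetical helper written for illustration, not an export of `3md-parser`; it assumes the default basename is the input name with a trailing `.3md` removed, and it ignores any directory handling the real CLI may perform.

```typescript
// Hypothetical helper illustrating the CLI's output naming; not exported by 3md-parser.
const LANGS = ["si", "ta", "en"] as const;
type Lang = (typeof LANGS)[number];

function outputFileNames(inputFile: string, baseName?: string): Record<Lang, string> {
  // Assumption: without a custom basename, strip a trailing ".3md" from the input name.
  const base = baseName ?? inputFile.replace(/\.3md$/, "");
  const names = {} as Record<Lang, string>;
  for (const lang of LANGS) {
    names[lang] = `${base}.${lang}.md`;
  }
  return names;
}

console.log(outputFileNames("document.3md").ta); // document.ta.md
console.log(outputFileNames("document.3md", "my-output").en); // my-output.en.md
```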
import { unified } from 'unified';
import remarkParse from 'remark-parse';
import remarkStringify from 'remark-stringify';
import { remark3md, preprocessText } from '3md-parser';
const content = `{{langs|si|ta|en}}
# හැඳින්වීම~அறிமுகம~Introduction`;
// Preprocess to handle block separators
const preprocessed = preprocessText(content, 'si');
// Process with remark
const processor = unified()
.use(remarkParse)
.use(remark3md, { locale: 'si' })
.use(remarkStringify);
const file = await processor.process(preprocessed);
console.log(String(file)); // Sinhala markdown output
Splits a 3md document into three separate language strings.
Parameters:
content (string) - The 3md document content
Returns:
{
si: string, // Sinhala markdown
ta: string, // Tamil markdown
en: string // English markdown
}
Splits a 3md document and returns file objects ready to write.
Parameters:
content (string) - The 3md document content
baseName (string) - Base name for output files
Returns:
{
files: [
{ filename: string, content: string },
{ filename: string, content: string },
{ filename: string, content: string }
]
}
Preprocesses 3md text to handle block separators before markdown parsing.
Parameters:
text (string) - The 3md document content
locale (optional) - Target locale ('si', 'ta', 'en', or combined like 'si-ta')
Returns: Preprocessed text ready for markdown parsing
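The `locale` parameter accepts a single code or a combined one such as `'si-ta'`. The sketch below shows one way to validate such a string; `parseLocale` and `isLang` are hypothetical helpers, and it assumes combined locales are supported codes joined by `-`. How the parser actually renders a combined locale is not described here, so the sketch stops at validation.

```typescript
// Hypothetical locale validation; not an export of 3md-parser.
// Assumption: combined locales join language codes with "-", e.g. "si-ta".

type Lang = "si" | "ta" | "en";
const KNOWN_LANGS: readonly string[] = ["si", "ta", "en"];

// Type guard: narrows a string to one of the three supported codes.
function isLang(code: string): code is Lang {
  return KNOWN_LANGS.includes(code);
}

// Split a locale into language codes, rejecting any unknown code.
function parseLocale(locale: string): Lang[] {
  const result: Lang[] = [];
  for (const code of locale.split("-")) {
    if (!isLang(code)) {
      throw new Error(`unknown language code "${code}" in locale "${locale}"`);
    }
    result.push(code);
  }
  return result;
}

console.log(parseLocale("si-ta").join(", ")); // si, ta
```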
Remark plugin for processing inline separators.
Options:
{
locale?: string // Target locale: 'si', 'ta', 'en', or 'si-ta', etc.
}
The parser uses a two-phase processing architecture:
Phase 1 (preprocessing) handles block separators (෴) before markdown parsing:
- Extracts language declaration
- Splits multi-line blocks by block separator
- Selects content for target language
- Adds hard line breaks to preserve multi-line paragraphs
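The phase 1 steps above can be sketched without any dependencies. `parseDeclaration` and `preprocessBlocks` are hypothetical names, not the code in `src/remark/preprocess.ts`, and the sketch rests on assumptions this README does not confirm: the `{{langs|...}}` declaration is the first line, a block is a blank-line-delimited chunk, and `෴` stands alone on its own line between the language variants. It emits a trailing backslash, one of the two CommonMark hard-break forms, so a multi-line variant stays one paragraph; the real preprocessor may use the other form.

```typescript
// Dependency-free sketch of block-separator preprocessing (phase 1).
// Assumptions: the declaration is the first line, blocks are separated by
// blank lines, and "෴" sits alone on a line between language variants.

const BLOCK_SEPARATOR = "෴";

// Read the {{langs|...}} declaration and return the languages plus the remaining text.
function parseDeclaration(text: string): { langs: string[]; body: string } {
  const match = /^\{\{langs\|([^}]*)\}\}\r?\n?/.exec(text);
  if (match === null) {
    throw new Error("missing {{langs|...}} declaration");
  }
  return { langs: match[1].split("|"), body: text.slice(match[0].length) };
}

// Keep only the target language's variant of every block that uses "෴".
function preprocessBlocks(text: string, lang: string): string {
  const { langs, body } = parseDeclaration(text);
  const index = langs.indexOf(lang);
  if (index === -1) {
    throw new Error(`language "${lang}" is not declared`);
  }
  return body
    .split(/\n\s*\n/)
    .map((chunk) => {
      const lines = chunk.split("\n");
      if (!lines.some((line) => line.trim() === BLOCK_SEPARATOR)) {
        return chunk; // no block separator: left for the inline phase
      }
      // Start a new variant at every separator line.
      const variants: string[][] = [[]];
      for (const line of lines) {
        if (line.trim() === BLOCK_SEPARATOR) {
          variants.push([]);
        } else {
          variants[variants.length - 1].push(line);
        }
      }
      // A trailing backslash keeps each source line as a hard line break.
      return (variants[index] ?? []).join("\\\n");
    })
    .join("\n\n");
}

const example = [
  "{{langs|si|ta|en}}",
  "si line 1", "si line 2",
  "෴",
  "ta line 1", "ta line 2",
  "෴",
  "en line 1", "en line 2",
].join("\n");
console.log(preprocessBlocks(example, "en")); // → "en line 1\\\nen line 2"
```

Chunks without `෴` pass through unchanged, so inline `~` separators survive for the second phase.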
Phase 2 (remark plugin) handles inline separators (~) during remark processing:
- Parses markdown into AST
- Processes headings with inline separators
- Processes paragraphs with inline separators
- Processes list items with inline separators
See: src/remark/locale.ts
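Phase 2 can likewise be sketched as a recursive walk over an mdast-like tree. This is a dependency-free approximation, not the implementation in `src/remark/locale.ts` (the real plugin runs inside unified and depends on `unist-util-visit`); `MdNode`, `pickVariant`, and `selectLanguage` are hypothetical names. It assumes each `~`-separated run sits inside a single text node, so inline formatting that spans a separator is not handled.

```typescript
// Dependency-free sketch of inline-separator selection (phase 2).
// Assumption: approximates the behaviour described above; it is not the
// code in src/remark/locale.ts and handles only separators inside one text node.

interface MdNode {
  type: string;
  value?: string;
  children?: MdNode[];
}

const INLINE_SEPARATOR = "~";
// Node types whose text content is split, per the list above.
const SPLIT_TYPES = new Set(["heading", "paragraph", "listItem"]);

// Return the variant at `index`; text without "~" is shared by all languages.
function pickVariant(value: string, index: number): string {
  if (!value.includes(INLINE_SEPARATOR)) return value;
  const parts = value.split(INLINE_SEPARATOR);
  // Assumption: keep the original text if it has too few variants.
  return parts[index] ?? value;
}

// Rewrite text nodes that sit inside a heading, paragraph, or list item.
function selectLanguage(node: MdNode, index: number, inside = false): void {
  const splitHere = inside || SPLIT_TYPES.has(node.type);
  if (splitHere && node.type === "text" && node.value !== undefined) {
    node.value = pickVariant(node.value, index);
  }
  for (const child of node.children ?? []) {
    selectLanguage(child, index, splitHere);
  }
}

const tree: MdNode = {
  type: "root",
  children: [
    { type: "heading", children: [{ type: "text", value: "හැඳින්වීම~அறிமுகம~Introduction" }] },
  ],
};
selectLanguage(tree, 2); // 2 = position of "en" in {{langs|si|ta|en}}
console.log(tree.children?.[0].children?.[0].value); // Introduction
```

Passing `index` as the position in the declaration keeps the walk independent of language names; code blocks and other node types outside headings, paragraphs, and list items are left untouched.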
This two-phase approach correctly handles both separator types while preserving markdown structure.
parser/
├── src/
│ ├── index.ts # Main exports
│ ├── types.ts # TypeScript type definitions
│ ├── cli.ts # CLI tool entry point
│ ├── splitter.ts # High-level splitting logic
│ └── remark/
│ ├── index.ts # Remark plugin
│ ├── locale.ts # Inline separator processing
│ └── preprocess.ts # Block separator preprocessing
├── test/
│ ├── locale.test.ts # Tests for inline separators
│ └── splitter.test.ts # Tests for file splitting
├── lib/ # Compiled JavaScript output
├── package.json
├── tsconfig.json
└── jest.config.json
npm run build
Compiles TypeScript to JavaScript in the lib/ directory.
npm test
Runs the test suite with Jest.
npm run build:watch
Continuously rebuilds on file changes during development.
The parser includes comprehensive tests covering:
- Inline separators in headings
- Inline separators in paragraphs
- Inline separators in list items
- Block separators with multi-line content
- Mixed inline and block separators
- Language declaration parsing
- Edge cases and error conditions
All tests pass against the canonical examples in ../spec/examples/.
This implementation conforms to version 0.1.0 of the 3md specification:
- ✅ Correctly parses language declarations
- ✅ Handles inline separators (~)
- ✅ Handles block separators (෴)
- ✅ Preserves markdown formatting
- ✅ Passes all canonical examples
- ✅ Follows error handling spec
- unified - Markdown processing framework
- remark-parse - Markdown parser
- remark-stringify - Markdown generator
- unist-util-visit - AST traversal utility
Full TypeScript definitions are included:
import type { Lang, Locale, TrimdDocument } from '3md-parser';
const lang: Lang = 'si';
const locale: Locale = 'si-ta';
MIT License - See LICENSE file for details
This is a reference implementation. If you find bugs or have suggestions:
- Check the specification to confirm expected behavior
- Open an issue describing the problem
- Submit a pull request with tests
- 3md Specification - Format specification
- Examples - Canonical test documents
- Root README - Project overview and use cases
A reference implementation for parsing trilingual Sri Lankan content.