Summary
The OpenSpec markdown parser does not skip code blocks when searching for section headers. This causes lines starting with # inside triple-backtick code blocks to be incorrectly parsed as markdown headers, breaking the document structure and resulting in incorrect requirement counts.
Environment
- OpenSpec version: @fission-ai/openspec@0.15.0
- Node.js version: v20.19.3
- Platform: macOS (Darwin 24.6.0)
Expected Behavior
According to the CommonMark specification, content inside fenced code blocks should be treated as literal text and not parsed for markdown structure.
When a spec file contains code blocks with lines starting with # (bash comments, shell scripts, .env file examples, etc.), the parser should:
- Ignore those lines when detecting section headers
- Maintain the correct document hierarchy
- Count all requirements accurately
Actual Behavior
Lines starting with # inside code blocks are treated as level-1 markdown headers (#), which:
- Breaks the section hierarchy
- Causes the Requirements section to close prematurely
- Results in incorrect requirement counts
- Orphans subsequent requirements under unexpected parent sections
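The effect on the hierarchy can be illustrated with a small sketch. The nesting logic below is an assumption made for illustration (a header becomes a child of the nearest preceding header with a lower level); it is not taken from the OpenSpec source, and `buildTree` is a hypothetical name. The header list is a shortened version of the test case in the Reproduction section, with one bash comment misread as a level-1 header.

```javascript
// Assumed, simplified nesting logic (NOT OpenSpec's actual code): each header
// is attached to the nearest preceding header with a lower level.
function buildTree(headers) {
  const root = { level: 0, title: 'root', children: [] };
  const stack = [root];
  for (const h of headers) {
    const node = { ...h, children: [] };
    // Close every open section at the same or a deeper level.
    while (stack[stack.length - 1].level >= node.level) stack.pop();
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root;
}

const headers = [
  { level: 1, title: 'test-bug Specification' },
  { level: 2, title: 'Requirements' },
  { level: 3, title: 'Requirement: First' },
  { level: 1, title: 'This is a comment' }, // bash comment misread as a header
  { level: 3, title: 'Requirement: Second' },
  { level: 3, title: 'Requirement: Third' },
];

const tree = buildTree(headers);
const requirements = tree.children[0].children[0]; // the "Requirements" section
const stray = tree.children[1]; // the misread comment, now a top-level section

console.log(requirements.children.length); // 1
console.log(stray.children.map((c) => c.title));
// [ 'Requirement: Second', 'Requirement: Third' ]
```

Under this assumed model, the spurious level-1 header closes the Requirements section after the first requirement and adopts the remaining two, which is consistent with the count of 1 reported below.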
Reproduction
Minimal Test Case
Create a spec file with the following content:
# test-bug Specification
## Purpose
Minimal test case to reproduce the code block parsing bug.
## Requirements
### Requirement: First requirement before code block
This requirement comes before the problematic code block.
#### Scenario: Example with code block containing hash symbols
- **GIVEN** a code block with bash comments
\`\`\`bash
# This is a comment
echo "hello"
# Another comment
\`\`\`
- **THEN** the parser should ignore hash symbols inside code blocks
### Requirement: Second requirement after code block
This requirement comes after the code block.
#### Scenario: Another scenario
- **GIVEN** something
- **THEN** something happens
### Requirement: Third requirement
This is the third requirement.
#### Scenario: Third scenario
- **GIVEN** more conditions
- **THEN** more results
Verification
# Count actual requirements in the file
$ grep -c "^### Requirement:" test-bug/spec.md
3
# Check what OpenSpec counts
$ openspec list --specs | grep test-bug
test-bug requirements 1
Result: OpenSpec incorrectly reports 1 requirement instead of 3.
Control Test (Without Hash Symbols)
Create the same spec but remove the # symbols from the code block:
#### Scenario: Example with code block WITHOUT hash symbols
- **GIVEN** a code block without bash comments
\`\`\`bash
echo "hello"
echo "world"
\`\`\`
- **THEN** the parser works correctly
Result: OpenSpec correctly reports 3 requirements.
This confirms that the issue is caused specifically by # symbols inside code blocks.
Real-World Impact
This bug affects any spec that includes code examples with:
- Bash/shell script comments (# comment)
- Python comments (# comment)
- YAML comments (# comment)
- Environment file examples (.env files use # for comments)
- Configuration file examples
- Any other language/format that uses # for comments
Users are forced to either:
- Remove legitimate code examples from their specs
- Omit comment lines from code examples (reducing clarity)
- Work around the issue by avoiding # symbols entirely
Root Cause Analysis
After examining the source code in node_modules/@fission-ai/openspec/dist/core/parsers/markdown-parser.js:
File: markdown-parser.js
Method: parseSections() (lines 55-84)
The parser uses a simple regex to detect headers:
const headerMatch = line.match(/^(#{1,6})\s+(.+)$/);
This regex matches any line that starts with 1-6 hash symbols followed by whitespace and text, without checking whether the line is inside a code block.
Missing logic:
- No state tracking for code block boundaries (triple backticks)
- No check to skip header detection when inside code blocks
- The normalizeContent() method removes HTML comments but doesn't strip code block content
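The regex behaviour can be checked in isolation. The snippet below is a minimal sketch: the regex is the one quoted above, while `naiveHeaders` and the sample lines are hypothetical and only stand in for a line-by-line scan with no fence tracking. The fence marker is built with `repeat` purely to avoid writing literal triple backticks inside the example.

```javascript
// The header regex quoted above, applied line by line with no fence tracking.
const HEADER_RE = /^(#{1,6})\s+(.+)$/;
const FENCE = '`'.repeat(3);

const sample = [
  '## Requirements',
  FENCE + 'bash',
  '# This is a comment',
  'echo "hello"',
  FENCE,
  '### Requirement: Second requirement after code block',
];

// Hypothetical helper: collects every line the regex would treat as a header.
function naiveHeaders(lines) {
  const found = [];
  for (const line of lines) {
    const m = line.match(HEADER_RE);
    if (m) found.push({ level: m[1].length, title: m[2] });
  }
  return found;
}

const found = naiveHeaders(sample);
console.log(found.map((h) => `${h.level}: ${h.title}`));
// [ '2: Requirements',
//   '1: This is a comment',
//   '3: Requirement: Second requirement after code block' ]
```

The bash comment is reported as a level-1 header even though it sits between two fence lines.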
Suggested Fix
The parser should track code block state and skip header detection inside code blocks:
parseSections(content) {
  const lines = content.split('\n');
  const sections = [];
  let inCodeBlock = false;
  for (const line of lines) {
    // Track code block boundaries
    if (line.trim().startsWith('```')) {
      inCodeBlock = !inCodeBlock;
      continue;
    }
    // Skip header detection inside code blocks
    if (inCodeBlock) {
      continue;
    }
    // Existing header matching logic
    const headerMatch = line.match(/^(#{1,6})\s+(.+)$/);
    if (headerMatch) {
      // ... process header
    }
  }
  return sections;
}
Alternative approaches:
- Use a proper markdown AST parser library (e.g., remark, marked) that handles code blocks correctly
- Pre-process content to strip code block content in normalizeContent() before parsing sections
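The pre-processing approach could look roughly like the sketch below. `maskFencedCode` is a hypothetical helper, not an existing OpenSpec function, and how it would be wired into normalizeContent() is left open. It follows the CommonMark fence rules as I understand them: a fence is 3 or more backticks or tildes with up to 3 spaces of indent, and the closing fence must use the same character and be at least as long as the opening one. Masked lines are replaced with empty lines rather than removed, so line numbers stay stable.

```javascript
// Hypothetical helper (not part of OpenSpec): blanks out every line that
// belongs to a fenced code block, fence lines included.
function maskFencedCode(content) {
  // Opening fence: up to 3 spaces of indent, then 3+ backticks or tildes,
  // optionally followed by an info string such as "bash".
  const openRe = /^ {0,3}(`{3,}|~{3,})(.*)$/;
  // Closing fence: same shape, but nothing except whitespace may follow.
  const closeRe = /^ {0,3}(`{3,}|~{3,})\s*$/;
  let fence = null; // { char, length } while inside a block, otherwise null

  return content
    .split(/\r?\n/) // tolerate CRLF input; output is joined with '\n'
    .map((line) => {
      if (fence === null) {
        const m = line.match(openRe);
        // The info string of a backtick fence may not contain backticks.
        if (m && !(m[1][0] === '`' && m[2].includes('`'))) {
          fence = { char: m[1][0], length: m[1].length };
          return '';
        }
        return line;
      }
      const c = line.match(closeRe);
      if (c && c[1][0] === fence.char && c[1].length >= fence.length) {
        fence = null;
      }
      return '';
    })
    .join('\n');
}

// Usage: header detection on the masked copy only.
const FENCE = '`'.repeat(3);
const spec = [
  '## Requirements',
  '### Requirement: First',
  FENCE + 'bash',
  '# This is a comment',
  FENCE,
  '### Requirement: Second',
].join('\n');

const masked = maskFencedCode(spec);
const headerLines = masked
  .split('\n')
  .filter((line) => /^(#{1,6})\s+(.+)$/.test(line));

console.log(headerLines);
// [ '## Requirements', '### Requirement: First', '### Requirement: Second' ]
```

One design consideration: if section bodies are later shown to users or validated, the masked copy should probably be used only to locate headers, with section content still taken from the original text, so that code examples are not lost.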