Parser incorrectly treats hash symbols inside code blocks as markdown headers #312

@Brandon-Yang-Yu

Summary

The OpenSpec markdown parser does not skip code blocks when searching for section headers. This causes lines starting with # inside triple-backtick code blocks to be incorrectly parsed as markdown headers, breaking the document structure and resulting in incorrect requirement counts.

Environment

  • OpenSpec version: @fission-ai/openspec@0.15.0
  • Node.js version: v20.19.3
  • Platform: macOS (Darwin 24.6.0)

Expected Behavior

According to the CommonMark specification, content inside fenced code blocks should be treated as literal text and not parsed for markdown structure.

When a spec file contains code blocks with lines starting with # (bash comments, shell scripts, .env file examples, etc.), the parser should:

  1. Ignore those lines when detecting section headers
  2. Maintain the correct document hierarchy
  3. Count all requirements accurately

Actual Behavior

Lines starting with # inside code blocks are treated as markdown headers (a single # becomes a level-1 header), which:

  1. Breaks the section hierarchy
  2. Causes the Requirements section to close prematurely
  3. Results in incorrect requirement counts
  4. Orphans subsequent requirements under unexpected parent sections

Reproduction

Minimal Test Case

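Create a spec file with the following content (the backslashes before the backticks are only escaping for this issue; the actual file should contain plain triple backticks):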

# test-bug Specification

## Purpose
Minimal test case to reproduce the code block parsing bug.

## Requirements
### Requirement: First requirement before code block
This requirement comes before the problematic code block.

#### Scenario: Example with code block containing hash symbols
- **GIVEN** a code block with bash comments
\`\`\`bash
# This is a comment
echo "hello"
# Another comment
\`\`\`
- **THEN** the parser should ignore hash symbols inside code blocks

### Requirement: Second requirement after code block
This requirement comes after the code block.

#### Scenario: Another scenario
- **GIVEN** something
- **THEN** something happens

### Requirement: Third requirement
This is the third requirement.

#### Scenario: Third scenario
- **GIVEN** more conditions
- **THEN** more results

Verification

# Count actual requirements in the file
$ grep -c "^### Requirement:" test-bug/spec.md
3

# Check what OpenSpec counts
$ openspec list --specs | grep test-bug
test-bug               requirements 1

Result: OpenSpec incorrectly reports 1 requirement instead of 3.
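
One plausible explanation for the count of 1 can be sketched as follows. This is not OpenSpec's code: `buildSectionTree`, `findSection`, and `countRequirements` are hypothetical stand-ins that assume the parser nests sections by header level and counts the `Requirement:` children of the `Requirements` section. Under that assumption, the first bash comment is read as a level-1 header, closes `Requirements`, and the later requirements end up under `Another comment`.

```javascript
// Same header regex as quoted in the root cause analysis of this issue.
const HEADER_RE = /^(#{1,6})\s+(.+)$/;

// Hypothetical stand-in for the real parser: nest headers by level with a stack.
function buildSectionTree(content) {
  const root = { level: 0, title: '(root)', children: [] };
  const stack = [root];
  for (const line of content.split('\n')) {
    const match = line.match(HEADER_RE);
    if (!match) continue;
    const node = { level: match[1].length, title: match[2].trim(), children: [] };
    // Close every open section that is at the same depth or deeper.
    while (stack[stack.length - 1].level >= node.level) stack.pop();
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root;
}

// Find the first section with the given title, depth-first.
function findSection(node, title) {
  if (node.title === title) return node;
  for (const child of node.children) {
    const found = findSection(child, title);
    if (found) return found;
  }
  return null;
}

// Count "Requirement:" headers that are direct children of "Requirements".
function countRequirements(content) {
  const section = findSection(buildSectionTree(content), 'Requirements');
  if (!section) return 0;
  return section.children.filter((c) => c.title.startsWith('Requirement:')).length;
}

// Condensed version of the minimal test case, with real fences.
const spec = [
  '# test-bug Specification',
  '## Requirements',
  '### Requirement: First requirement before code block',
  '#### Scenario: Example with code block containing hash symbols',
  '```bash',
  '# This is a comment',
  'echo "hello"',
  '# Another comment',
  '```',
  '### Requirement: Second requirement after code block',
  '### Requirement: Third requirement',
].join('\n');

console.log(countRequirements(spec)); // 1
```

With no fence awareness, the tree has three top-level sections (`test-bug Specification`, `This is a comment`, `Another comment`), which matches the orphaning described under Actual Behavior.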

Control Test (Without Hash Symbols)

Create the same spec but remove the # symbols from the code block:

#### Scenario: Example with code block WITHOUT hash symbols
- **GIVEN** a code block without bash comments
\`\`\`bash
echo "hello"
echo "world"
\`\`\`
- **THEN** the parser works correctly

Result: OpenSpec correctly reports 3 requirements.

This confirms that the issue is caused specifically by # symbols inside code blocks.

Real-World Impact

This bug affects any spec that includes code examples with:

  • Bash/shell script comments (# comment)
  • Python comments (# comment)
  • YAML comments (# comment)
  • Environment file examples (.env files use # for comments)
  • Configuration file examples
  • Any other language/format that uses # for comments

Users are forced to either:

  1. Remove legitimate code examples from their specs
  2. Omit comment lines from code examples (reducing clarity)
  3. Work around the issue by avoiding # symbols entirely

Root Cause Analysis

Examining the source code in node_modules/@fission-ai/openspec/dist/core/parsers/markdown-parser.js shows the following:

File: markdown-parser.js
Method: parseSections() (lines 55-84)

The parser uses a simple regex to detect headers:

const headerMatch = line.match(/^(#{1,6})\s+(.+)$/);

This regex matches any line that starts with one to six hash symbols followed by whitespace and text, without checking whether the line is inside a code block.
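
To make the regex behavior concrete, here is a small standalone check. The helper name `headerLevel` is made up for this illustration; only the regex itself comes from the parser.

```javascript
// Regex copied from the parser; headerLevel is a hypothetical helper.
const HEADER_RE = /^(#{1,6})\s+(.+)$/;

function headerLevel(line) {
  const match = line.match(HEADER_RE);
  return match ? match[1].length : null;
}

const samples = [
  '## Requirements',      // genuine header
  '# This is a comment',  // bash comment: indistinguishable from a level-1 header
  '#!/bin/bash',          // no whitespace after '#', so it does not match
  '  # indented comment', // leading spaces, so the ^ anchor prevents a match
];

for (const line of samples) {
  console.log(JSON.stringify(line), '->', headerLevel(line));
}
```

Only unindented comments with a space after the `#` are affected, which is the usual style for shell, Python, YAML, and .env comments.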

Missing logic:

  • No state tracking for code block boundaries (triple backticks)
  • No check to skip header detection when inside code blocks
  • The normalizeContent() method removes HTML comments but doesn't strip code block content

Suggested Fix

The parser should track code block state and skip header detection inside code blocks:

parseSections(content) {
  const lines = content.split('\n');
  const sections = [];
  let inCodeBlock = false;

  for (const line of lines) {
    // Track code block boundaries
    if (line.trim().startsWith('```')) {
      inCodeBlock = !inCodeBlock;
      continue;
    }

    // Skip header detection inside code blocks
    if (inCodeBlock) {
      continue;
    }

    // Existing header matching logic
    const headerMatch = line.match(/^(#{1,6})\s+(.+)$/);
    if (headerMatch) {
      // ... process header
    }
  }

  return sections;
}
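
A somewhat stricter variant of the same idea, as a standalone sketch rather than a patch: toggling on every line that starts with triple backticks would also close a block on a shorter inner fence and would miss tilde fences. CommonMark allows fences of three or more backticks or tildes, indented by up to three spaces, and a closing fence must use the same character and be at least as long as the opener. `findHeaders` and its return shape are hypothetical and not part of OpenSpec; fences nested inside list items or blockquotes, and indented code blocks, are not handled.

```javascript
// Opening fence: up to 3 spaces, then 3+ backticks or 3+ tildes, then an optional info string.
const FENCE_RE = /^ {0,3}(`{3,}|~{3,})(.*)$/;
const HEADER_RE = /^(#{1,6})\s+(.+)$/;

// Hypothetical helper: returns headers found outside fenced code blocks.
function findHeaders(content) {
  const headers = [];
  let fence = null; // { char, length } while inside a fenced block

  content.split(/\r?\n/).forEach((line, index) => {
    const fenceMatch = line.match(FENCE_RE);

    if (fence) {
      // Closing fence: same character, at least as long, nothing but whitespace after it.
      if (
        fenceMatch &&
        fenceMatch[1][0] === fence.char &&
        fenceMatch[1].length >= fence.length &&
        fenceMatch[2].trim() === ''
      ) {
        fence = null;
      }
      return; // never look for headers inside a fence
    }

    // The info string of a backtick fence may not contain backticks.
    if (fenceMatch && !(fenceMatch[1][0] === '`' && fenceMatch[2].includes('`'))) {
      fence = { char: fenceMatch[1][0], length: fenceMatch[1].length };
      return;
    }

    const headerMatch = line.match(HEADER_RE);
    if (headerMatch) {
      headers.push({ level: headerMatch[1].length, title: headerMatch[2].trim(), line: index + 1 });
    }
  });

  return headers;
}

const sample = [
  '## Requirements',
  '### Requirement: First',
  '```bash',
  '# This is a comment',
  '```',
  '### Requirement: Second',
].join('\n');

console.log(findHeaders(sample).map((h) => h.title));
```

In a real fix, lines inside the fence would still need to be appended to the current section's content; this sketch only shows the header detection.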

Alternative approaches:

  1. Use a proper markdown AST parser library (e.g., remark, marked) that handles code blocks correctly
  2. Pre-process content to strip code block content in normalizeContent() before parsing sections
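
The second alternative could look roughly like the sketch below. `maskFencedCode` is a hypothetical helper, not an existing OpenSpec function, and where it would be called (for example from normalizeContent()) is an assumption. It blanks out fenced lines instead of deleting them so that line numbers stay stable, and the masked copy would be used only for header detection, not for the section content itself.

```javascript
// Hypothetical pre-processing step: replace every fenced line with an empty line.
function maskFencedCode(content) {
  let open = null; // the opening fence run (for example "```") while inside a block
  return content
    .split('\n')
    .map((line) => {
      const m = line.match(/^ {0,3}(`{3,}|~{3,})(.*)$/);
      if (open) {
        // Closing fence: same character, at least as long, no trailing text.
        const closes =
          m && m[1][0] === open[0] && m[1].length >= open.length && m[2].trim() === '';
        if (closes) open = null;
        return '';
      }
      // Opening fence (a backtick fence may not have backticks in its info string).
      if (m && !(m[1][0] === '`' && m[2].includes('`'))) {
        open = m[1];
        return '';
      }
      return line;
    })
    .join('\n');
}

const spec = [
  '## Requirements',
  '### Requirement: First',
  '```bash',
  '# This is a comment',
  'echo "hello"',
  '```',
  '### Requirement: Second',
  '### Requirement: Third',
].join('\n');

const masked = maskFencedCode(spec);
const requirementCount = masked
  .split('\n')
  .filter((line) => /^### Requirement:/.test(line)).length;

console.log(requirementCount); // 3
```

Compared with tracking state inside parseSections(), this keeps the existing header loop untouched, at the cost of holding a second copy of the document.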

Labels: bug