Skip to content

Latest commit

 

History

History
831 lines (661 loc) · 21.9 KB

File metadata and controls

831 lines (661 loc) · 21.9 KB

Setup

Here are the commands to create the basic structure:

mkdir -p mdtk/src/mdtk mdtk/tests
touch mdtk/src/mdtk/__init__.py
touch mdtk/src/mdtk/bookmarks.py
touch mdtk/tests/__init__.py
touch mdtk/pyproject.toml
touch mdtk/README.md
touch mdtk/.gitignore

Set GitHub username as environment variable

export GITHUB_USER=tommcd

Move to project directory if not already there

cd mdtk

Initialize git and create repo

git init
gh repo create $GITHUB_USER/mdtk --public --description "Markdown Toolkit - Tools for working with markdown files"

First, let's create a Python-specific .gitignore:

cat > .gitignore << 'EOF'
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
.env
.venv
venv/
ENV/
.idea/
.vscode/
EOF

Now let's commit and tag:

git add .
git commit -m "Initial commit: Basic project structure"
git tag -a v0.1.0 -m "Initial release"
# Create and switch to main branch
git branch -M main

# Add the remote repository (if not already added)
git remote add origin https://github.com/$GITHUB_USER/mdtk.git

# Now try pushing again
git push -u origin main --tags
cat > README.md << 'EOF'
# mdtk (Markdown Toolkit)

Tools for working with markdown files.

## Installation

```bash
pip install mdtk
```

## Features

- Chrome bookmarks to markdown converter
EOF

Let's start with the most minimal GitHub Action that just checks if the package builds. We'll create a simple workflow file:

mkdir -p .github/workflows
cat > .github/workflows/ci.yml << 'EOF'
name: CI

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.8'
    - name: Build package
      run: pip install build && python -m build
EOF

This is the simplest possible CI setup. Would you like me to explain what each part does before we commit and push it?

mkdir -p .github/workflows
cat > .github/workflows/ci.yml << 'EOF'
name: CI

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.8', '3.9', '3.10', '3.11', '3.12', '3.13']
    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v5
      with:
        python-version: ${{ matrix.python-version }}
        cache: 'pip'
    - name: Build package
      run: pip install build && python -m build
EOF

Would you like to commit and test this version now?

git add .github/workflows/ci.yml
git commit -m "Add GitHub Actions CI workflow"
git push origin main

Let's start with updating the code to work as a proper module. Here's the first step - I'll show you the basic structure for bookmarks.py:

cat > src/mdtk/bookmarks.py << 'EOF'
"""Convert Chrome bookmarks to markdown format."""

from bs4 import BeautifulSoup
from pathlib import Path
from typing import Optional

def convert_bookmarks(input_file: Path, output_file: Path, folder_name: Optional[str] = "EXPORT_FOLDER") -> None:
    """Convert Chrome bookmarks HTML file to markdown format.

    Args:
        input_file: Path to the Chrome bookmarks HTML file
        output_file: Path where to save the markdown file
        folder_name: Name of the folder to extract (defaults to "EXPORT_FOLDER")
    """
    pass  # We'll implement this next

def main():
    """Command line interface."""
    pass  # We'll implement this next
EOF

This gives us the basic structure. Would you like me to explain any part before we implement the functions?

Great! Let's implement the convert_bookmarks function first. Here's the updated code:

cat > src/mdtk/bookmarks.py << 'EOF'
"""Convert Chrome bookmarks to markdown format."""

from bs4 import BeautifulSoup
from pathlib import Path
from typing import Optional

def convert_bookmarks(input_file: Path, output_file: Path, folder_name: Optional[str] = "EXPORT_FOLDER") -> None:
    """Convert Chrome bookmarks HTML file to markdown format."""
    with open(input_file, 'r', encoding='utf-8') as f:
        soup = BeautifulSoup(f, 'html.parser')

    folders = soup.find_all('h3')
    target_folder = None
    for folder in folders:
        if folder.string == folder_name:
            target_folder = folder
            break

    if not target_folder:
        raise ValueError(f"Folder '{folder_name}' not found!")

    bookmarks_dl = target_folder.find_next('dl')
    bookmarks = bookmarks_dl.find_all('a')

    with open(output_file, 'w', encoding='utf-8') as f:
        for bookmark in bookmarks:
            f.write(f"- [{bookmark.string}]({bookmark['href']})\n")

def main():
    """Command line interface."""
    pass  # We'll implement this next

if __name__ == '__main__':
    main()
EOF

Would you like to:

  1. Add the command-line interface next, or
  2. Add some error handling and input validation first?

Let's add error handling and input validation. Here's the updated version:

cat > src/mdtk/bookmarks.py << 'EOF'
"""Convert Chrome bookmarks to markdown format."""

from bs4 import BeautifulSoup
from pathlib import Path
from typing import Optional

class BookmarkError(Exception):
    """Base exception for bookmark conversion errors."""
    pass

def convert_bookmarks(input_file: Path, output_file: Path, folder_name: Optional[str] = "EXPORT_FOLDER") -> None:
    """Convert Chrome bookmarks HTML file to markdown format."""
    # Validate inputs
    if not isinstance(input_file, Path):
        input_file = Path(input_file)
    if not isinstance(output_file, Path):
        output_file = Path(output_file)

    # Check if input file exists and is readable
    if not input_file.exists():
        raise BookmarkError(f"Input file not found: {input_file}")
    if not input_file.is_file():
        raise BookmarkError(f"Not a file: {input_file}")

    # Check if output directory exists and is writable
    if not output_file.parent.exists():
        raise BookmarkError(f"Output directory does not exist: {output_file.parent}")

    try:
        with open(input_file, 'r', encoding='utf-8') as f:
            soup = BeautifulSoup(f, 'html.parser')
    except Exception as e:
        raise BookmarkError(f"Failed to parse HTML file: {e}")

    # Find target folder
    folders = soup.find_all('h3')
    target_folder = None
    for folder in folders:
        if folder.string == folder_name:
            target_folder = folder
            break

    if not target_folder:
        raise BookmarkError(f"Folder '{folder_name}' not found!")

    try:
        bookmarks_dl = target_folder.find_next('dl')
        if not bookmarks_dl:
            raise BookmarkError(f"No bookmarks found in folder '{folder_name}'")
        bookmarks = bookmarks_dl.find_all('a')

        with open(output_file, 'w', encoding='utf-8') as f:
            for bookmark in bookmarks:
                title = bookmark.string or "Untitled"
                url = bookmark.get('href', '')
                f.write(f"- [{title}]({url})\n")
    except Exception as e:
        raise BookmarkError(f"Failed to process bookmarks: {e}")

def main():
    """Command line interface."""
    pass  # We'll implement this next

if __name__ == '__main__':
    main()
EOF

Let's add the command-line interface using argparse. Here's the updated version:

cat > src/mdtk/bookmarks.py << 'EOF'
"""Convert Chrome bookmarks to markdown format."""

from bs4 import BeautifulSoup
from pathlib import Path
from typing import Optional
import argparse
import sys

class BookmarkError(Exception):
    """Base exception for bookmark conversion errors."""
    pass

def convert_bookmarks(input_file: Path, output_file: Path, folder_name: Optional[str] = "EXPORT_FOLDER") -> None:
    """Convert Chrome bookmarks HTML file to markdown format."""
    # [Previous implementation remains the same]
    # Validate inputs
    if not isinstance(input_file, Path):
        input_file = Path(input_file)
    if not isinstance(output_file, Path):
        output_file = Path(output_file)

    # Check if input file exists and is readable
    if not input_file.exists():
        raise BookmarkError(f"Input file not found: {input_file}")
    if not input_file.is_file():
        raise BookmarkError(f"Not a file: {input_file}")

    # Check if output directory exists and is writable
    if not output_file.parent.exists():
        raise BookmarkError(f"Output directory does not exist: {output_file.parent}")

    try:
        with open(input_file, 'r', encoding='utf-8') as f:
            soup = BeautifulSoup(f, 'html.parser')
    except Exception as e:
        raise BookmarkError(f"Failed to parse HTML file: {e}")

    # Find target folder
    folders = soup.find_all('h3')
    target_folder = None
    for folder in folders:
        if folder.string == folder_name:
            target_folder = folder
            break

    if not target_folder:
        raise BookmarkError(f"Folder '{folder_name}' not found!")

    try:
        bookmarks_dl = target_folder.find_next('dl')
        if not bookmarks_dl:
            raise BookmarkError(f"No bookmarks found in folder '{folder_name}'")
        bookmarks = bookmarks_dl.find_all('a')

        with open(output_file, 'w', encoding='utf-8') as f:
            for bookmark in bookmarks:
                title = bookmark.string or "Untitled"
                url = bookmark.get('href', '')
                f.write(f"- [{title}]({url})\n")
    except Exception as e:
        raise BookmarkError(f"Failed to process bookmarks: {e}")

def main():
    """Command line interface."""
    parser = argparse.ArgumentParser(
        description="Convert Chrome bookmarks to markdown format"
    )
    parser.add_argument(
        'input_file',
        help="Chrome bookmarks HTML file"
    )
    parser.add_argument(
        'output_file',
        help="Output markdown file"
    )
    parser.add_argument(
        '--folder',
        default="EXPORT_FOLDER",
        help="Folder name to extract (default: EXPORT_FOLDER)"
    )

    args = parser.parse_args()

    try:
        convert_bookmarks(args.input_file, args.output_file, args.folder)
    except BookmarkError as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f"Unexpected error: {e}", file=sys.stderr)
        sys.exit(1)

if __name__ == '__main__':
    main()
EOF

Now you can use it from the command line like:

mdtk-bookmarks bookmarks.html output.md --folder "My Folder"

I'll show you how to create a minimal test bookmarks file with a known structure:

cat > test_bookmarks.html << 'EOF'
<!DOCTYPE NETSCAPE-Bookmark-file-1>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1>
<DL><p>
    <DT><H3>EXPORT_FOLDER</H3>
    <DL><p>
        <DT><A HREF="https://github.com">GitHub - Where the world builds software</A>
        <DT><A HREF="https://python.org">Welcome to Python.org</A>
    </DL><p>
    <DT><H3>Other Folder</H3>
    <DL><p>
        <DT><A HREF="https://example.com">Example Domain</A>
    </DL><p>
</DL><p>
EOF

Yes, using a venv is a good practice! And uv is indeed faster. Let's set it up:

# Install uv if not already installed
#pip install uv

# Create and activate a venv using uv
uv venv
source .venv/Scripts/activate


# Install package in editable mode
uv pip install -e .

Would you like me to explain any of these steps before we proceed with testing the package?

Yes, you're absolutely right! Pytest is more modern and provides better features. Let's update our approach:

First, let's add pytest to our project dependencies in pyproject.toml:

cat > pyproject.toml << 'EOF'
[build-system]
requires = ["setuptools>=45", "setuptools_scm[toml]>=6.2"]
build-backend = "setuptools.build_meta"

[project]
name = "mdtk"
dynamic = ["version"]
description = "Markdown Toolkit - Tools for working with markdown files"
readme = "README.md"
requires-python = ">=3.8"
license = {text = "MIT"}
dependencies = [
    "beautifulsoup4",
]

[project.optional-dependencies]
test = [
    "pytest",
]

[tool.setuptools_scm]
write_to = "src/mdtk/_version.py"

[project.scripts]
mdtk-bookmarks = "mdtk.bookmarks:main"

[tool.pytest.ini_options]
testpaths = ["tests"]
EOF

Now let's create a simpler pytest-style test file:

cat > tests/test_bookmarks.py << 'EOF'
from pathlib import Path
import pytest
from mdtk.bookmarks import convert_bookmarks, BookmarkError

def test_basic_conversion(tmp_path):
    """Test basic bookmark conversion with default folder"""
    test_html = Path('test_bookmarks.html')
    test_md = tmp_path / "output.md"

    convert_bookmarks(test_html, test_md)

    content = test_md.read_text().strip().split('\n')
    assert len(content) == 2
    assert '- [GitHub - Where the world builds software](https://github.com)' in content
    assert '- [Welcome to Python.org](https://python.org)' in content

def test_nonexistent_folder(tmp_path):
    """Test error handling for non-existent folder"""
    test_html = Path('test_bookmarks.html')
    test_md = tmp_path / "output.md"

    with pytest.raises(BookmarkError, match="Folder 'NonExistent' not found"):
        convert_bookmarks(test_html, test_md, "NonExistent")
EOF

Would you like me to:

  1. Explain the pytest-specific features being used
  2. Show how to install and run the tests
  3. Add more test cases?

Since we're using uv and already have our venv activated, let's install the package with test dependencies and run the tests:

# Install package with test dependencies
uv pip install -e ".[test]"

# Run pytest
pytest

Ah, I see the issue. The test is failing because the test_bookmarks.html file isn't present in the GitHub Actions environment. We need to create the test file as part of the test setup. Let's modify our test file to create its own test data:

cat > tests/test_bookmarks.py << 'EOF'
from pathlib import Path
import pytest
from mdtk.bookmarks import convert_bookmarks, BookmarkError

@pytest.fixture
def test_bookmarks_file(tmp_path):
    """Create a test bookmarks file."""
    bookmarks_content = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1>
<DL><p>
    <DT><H3>EXPORT_FOLDER</H3>
    <DL><p>
        <DT><A HREF="https://github.com">GitHub - Where the world builds software</A>
        <DT><A HREF="https://python.org">Welcome to Python.org</A>
    </DL><p>
    <DT><H3>Other Folder</H3>
    <DL><p>
        <DT><A HREF="https://example.com">Example Domain</A>
    </DL><p>
</DL><p>"""

    test_file = tmp_path / "test_bookmarks.html"
    test_file.write_text(bookmarks_content)
    return test_file

def test_basic_conversion(test_bookmarks_file, tmp_path):
    """Test basic bookmark conversion with default folder"""
    test_md = tmp_path / "output.md"

    convert_bookmarks(test_bookmarks_file, test_md)

    content = test_md.read_text().strip().split('\n')
    assert len(content) == 2
    assert '- [GitHub - Where the world builds software](https://github.com)' in content
    assert '- [Welcome to Python.org](https://python.org)' in content

def test_nonexistent_folder(test_bookmarks_file, tmp_path):
    """Test error handling for non-existent folder"""
    test_md = tmp_path / "output.md"

    with pytest.raises(BookmarkError, match="Folder 'NonExistent' not found"):
        convert_bookmarks(test_bookmarks_file, test_md, "NonExistent")
EOF

Key changes:

  1. Added a pytest fixture to create the test file
  2. Using tmp_path for all file operations
  3. Tests now use the fixture instead of expecting an external file

Would you like me to explain any part before we commit and try again?

Let's update our GitHub Actions workflow to include running the tests. We'll modify the ci.yml file:

cat > .github/workflows/ci.yml << 'EOF'
name: CI

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.8', '3.9', '3.10', '3.11', '3.12', '3.13']
    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v5
      with:
        python-version: ${{ matrix.python-version }}
        cache: 'pip'
    - name: Install package with test dependencies
      run: pip install -e ".[test]"
    - name: Run tests
      run: pytest -v
EOF

Key changes:

  1. Renamed job from 'build' to 'test'
  2. Added installation of test dependencies
  3. Added pytest run with verbose output

Would you like to commit and push this to see it in action?

Here are the commands to commit and push the changes:

git add .github/workflows/ci.yml
git commit -m "Update CI workflow to run tests"
git push origin main

After pushing, we can check the Actions tab on GitHub to see the tests running across all Python versions. Would you like me to explain what to look for in the test results?

Ah yes, we need to install beautifulsoup4 as well since it's a dependency. Let's update the workflow file to explicitly install all dependencies:

cat > .github/workflows/ci.yml << 'EOF'
name: CI

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.8', '3.9', '3.10', '3.11', '3.12', '3.13']
    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v5
      with:
        python-version: ${{ matrix.python-version }}
        cache: 'pip'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install pytest beautifulsoup4
    - name: Install package
      run: pip install -e .
    - name: Run tests
      run: pytest -v
EOF

The key change is adding the explicit installation of pytest and beautifulsoup4 before installing our package. Would you like to commit and try again?

git add tests/test_bookmarks.py .github/workflows/ci.yml
git commit -m "Fix CI: Add test fixtures and explicit dependency installation"
git push origin main

Let's check both files to make sure they're properly synchronized. First, let's verify the module exports the function correctly. We need to make sure the function name and imports match exactly.

cat > src/mdtk/bookmarks.py << 'EOF'
"""Convert Chrome bookmarks to markdown format."""

from bs4 import BeautifulSoup
from pathlib import Path
from typing import Optional

class BookmarkError(Exception):
    """Base exception for bookmark conversion errors."""
    pass

def convert_bookmarks(input_file: Path, output_file: Path, folder_name: Optional[str] = "EXPORT_FOLDER") -> None:
    """Convert Chrome bookmarks HTML file to markdown format."""
    # Validate inputs
    if not isinstance(input_file, Path):
        input_file = Path(input_file)
    if not isinstance(output_file, Path):
        output_file = Path(output_file)

    # Check if input file exists and is readable
    if not input_file.exists():
        raise BookmarkError(f"Input file not found: {input_file}")
    if not input_file.is_file():
        raise BookmarkError(f"Not a file: {input_file}")

    # Check if output directory exists and is writable
    if not output_file.parent.exists():
        raise BookmarkError(f"Output directory does not exist: {output_file.parent}")

    try:
        with open(input_file, 'r', encoding='utf-8') as f:
            soup = BeautifulSoup(f, 'html.parser')
    except Exception as e:
        raise BookmarkError(f"Failed to parse HTML file: {e}")

    # Find target folder
    folders = soup.find_all('h3')
    target_folder = None
    for folder in folders:
        if folder.string == folder_name:
            target_folder = folder
            break

    if not target_folder:
        raise BookmarkError(f"Folder '{folder_name}' not found!")

    try:
        bookmarks_dl = target_folder.find_next('dl')
        if not bookmarks_dl:
            raise BookmarkError(f"No bookmarks found in folder '{folder_name}'")
        bookmarks = bookmarks_dl.find_all('a')

        with open(output_file, 'w', encoding='utf-8') as f:
            for bookmark in bookmarks:
                title = bookmark.string or "Untitled"
                url = bookmark.get('href', '')
                f.write(f"- [{title}]({url})\n")
    except Exception as e:
        raise BookmarkError(f"Failed to process bookmarks: {e}")

def main():
    """Command line interface."""
    import argparse
    import sys

    parser = argparse.ArgumentParser(
        description="Convert Chrome bookmarks to markdown format"
    )
    parser.add_argument(
        'input_file',
        help="Chrome bookmarks HTML file"
    )
    parser.add_argument(
        'output_file',
        help="Output markdown file"
    )
    parser.add_argument(
        '--folder',
        default="EXPORT_FOLDER",
        help="Folder name to extract (default: EXPORT_FOLDER)"
    )

    args = parser.parse_args()

    try:
        convert_bookmarks(args.input_file, args.output_file, args.folder)
    except BookmarkError as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f"Unexpected error: {e}", file=sys.stderr)
        sys.exit(1)

if __name__ == '__main__':
    main()
EOF

Would you like me to show the test file again to make sure they match exactly? Or shall we commit this and see if it fixes the issue?

Let's handle these changes systematically:

# First, let's add the important files
git add pyproject.toml src/mdtk/bookmarks.py

# Ignore files that shouldn't be in git
cat >> .gitignore << 'EOF'
output.md
test_bookmarks.html
src/mdtk/_version.py
EOF

git add .gitignore

# Add README and Setup if they contain meaningful changes
git add README.md Setup.md

# Commit and push
git commit -m "Update package structure and dependencies"
git push origin main

After this, would you like to:

  1. Review what's in the commit before pushing
  2. Push directly and check the CI results
  3. See what changes are in README.md and Setup.md first?