Contributing Guide¶

Thank you for considering contributing to VLM OCR Pipeline! This guide will help you get started.

Development Setup¶

1. Fork and Clone¶

# Fork the repository on GitHub, then clone your fork
git clone https://github.com/YOUR_USERNAME/vlm-ocr-pipeline.git
cd vlm-ocr-pipeline

2. Set Up Development Environment¶

# Create virtual environment
uv venv --python 3.11 .venv
source .venv/bin/activate

# Install dependencies
uv pip install -r requirements.txt

# Install development dependencies
uv pip install pytest pytest-cov ruff pyright

# Run setup script
python setup.py

3. Create a Branch¶

git checkout -b feature/your-feature-name
# or
git checkout -b fix/your-bugfix-name

Code Quality Standards¶

Type Annotations¶

All functions and methods must have type annotations:

def process_blocks(
    self,
    image: np.ndarray,
    blocks: Sequence[Block]
) -> list[Block]:
    """Process blocks to extract text."""
    ...

Docstrings¶

Use Google-style docstrings for all public functions and classes:

def detect_layout(image: np.ndarray, confidence_threshold: float = 0.5) -> list[Block]:
    """Detect layout blocks in an image.

    Args:
        image: Input image as numpy array (H, W, C)
        confidence_threshold: Minimum confidence score for detection

    Returns:
        List of detected blocks with bounding boxes

    Raises:
        DetectionError: If detection fails

    Example:
        >>> detector = DocLayoutYOLO()
        >>> blocks = detector.detect(image, confidence_threshold=0.7)
        >>> len(blocks)
        15
    """

Code Style¶

We use ruff for linting and formatting:

# Format code
uv run ruff format .

# Check linting
uv run ruff check .

# Auto-fix linting issues
uv run ruff check . --fix

Configuration (ruff.toml): - Line length: 120 characters - Import order: isort compatible - First-party modules: ["pipeline", "models"]

Type Checking¶

We use pyright for static type checking:

# Type check entire project
npx pyright

# Type check specific files
npx pyright pipeline/__init__.py

Note: Use npx pyright, not global pyright

Testing¶

Running Tests¶

# Run all tests
uv run pytest

# Run specific test file
uv run pytest tests/test_types.py

# Run with coverage
uv run pytest --cov=pipeline --cov-report=term-missing

# Run verbose
uv run pytest -v

Writing Tests¶

Place tests in tests/ directory with naming convention test_*.py:

def test_bbox_from_yolo():
    """Test BBox conversion from YOLO format."""
    bbox = BBox.from_yolo([0.5, 0.5, 0.3, 0.4], 1000, 800)
    assert bbox.x0 == 350
    assert bbox.y0 == 240
    assert bbox.x1 == 650
    assert bbox.y1 == 640

Coverage Goal: 90%+ for new code

Test Fixtures¶

Use fixtures in tests/fixtures/: - Sample images - Sample PDFs - Expected outputs

BBox Handling Rules¶

Critical: BBox Standards

Always use BBox class - Never use raw lists/tuples
Internal operations use xyxy - Access via bbox.x0, bbox.y0, bbox.x1, bbox.y1
JSON serialization uses xywh - Call bbox.to_xywh_list()
Accept floats, output integers - All BBox methods round to nearest integer
PyPDF requires page height - Use BBox.from_pypdf_rect(rect, page_height)

Adding New Components¶

Adding a Detector¶

Create detector file in pipeline/layout/detection/
Implement Detector protocol from pipeline/types.py
Register in create_detector() factory
Add validation rule in validate_combination() if needed
Write tests in tests/test_detectors.py

Example:

# pipeline/layout/detection/my_detector.py
from pipeline.types import Block, Detector

class MyDetector:
    """My custom detector implementation."""

    def detect(self, image: np.ndarray) -> list[Block]:
        """Detect layout blocks.

        Args:
            image: Input image (H, W, C)

        Returns:
            List of detected blocks
        """
        # Your detection logic
        return blocks

# pipeline/layout/detection/__init__.py
def create_detector(name: str, **kwargs) -> Detector:
    if name == "my-detector":
        from .my_detector import MyDetector  # noqa: PLC0415
        return MyDetector(**kwargs)
    ...

Adding a Sorter¶

Similar process in pipeline/layout/ordering/:

from pipeline.types import Block, Sorter

class MySorter:
    """My custom sorter implementation."""

    def sort(self, blocks: list[Block], image: np.ndarray, **kwargs) -> list[Block]:
        """Sort blocks in reading order.

        Args:
            blocks: Detected blocks
            image: Original image for context

        Returns:
            Sorted blocks with order field
        """
        # Your sorting logic
        return sorted_blocks

Adding Prompts¶

Place YAML prompts in settings/prompts/{model}/:

# settings/prompts/my-model/text_extraction.yaml
system: |
  You are an expert OCR system.

user: |
  Extract text from this image.
  Preserve formatting and structure.

fallback: |
  [OCR failed]

Error Handling¶

Follow the error handling policy (see Error Handling Guide):

Custom Exceptions¶

Use specific exception types from pipeline/exceptions.py:

from pipeline.exceptions import DetectionError, InvalidConfigError

if confidence < 0 or confidence > 1:
    raise InvalidConfigError(f"Confidence must be between 0 and 1, got {confidence}")

try:
    blocks = self.detector.detect(image)
except Exception as e:
    raise DetectionError(f"Detection failed: {e}") from e

Error Logging¶

Use proper logging with %s formatting (not f-strings):

import logging

logger = logging.getLogger(__name__)

# ✅ Good
logger.error("Failed to load file %s: %s", file_path, error)

# ❌ Bad
logger.error(f"Failed to load file {file_path}: {error}")

Add exc_info=True for unexpected errors:

except Exception as e:
    logger.error("Unexpected error: %s", e, exc_info=True)

Commit Guidelines¶

Commit Messages¶

Follow conventional commits:

feat: add new detector for layout analysis
fix: resolve type error in BBox conversion
docs: update installation guide
test: add tests for multi-column detection
refactor: simplify block sorting logic
perf: optimize image preprocessing

Before Committing¶

# Format code
uv run ruff format .

# Check linting
uv run ruff check .

# Run tests
uv run pytest

# Type check
npx pyright

Creating a Pull Request¶

Ensure all tests pass
Update documentation if needed
Add entry to CHANGELOG (if exists)
Create PR with clear description:

## Summary

Brief description of changes

## Changes

- Added feature X
- Fixed bug Y
- Updated documentation Z

## Testing

- Tested on Python 3.11
- All existing tests pass
- Added new tests for feature X

## Breaking Changes

None (or list if applicable)

Common Pitfalls¶

Avoid These Mistakes

Don't use bare except: Catch specific exceptions
Don't create empty __init__.py: Use PEP 420 namespace packages
Don't install with -e: Never use editable mode from external directories
Don't mix xywh/xyxy: Always convert via BBox methods
Don't forget page_height for PyPDF: Y-axis flip required

Documentation¶

Building Docs Locally¶

# Install MkDocs
uv pip install mkdocs mkdocs-material mkdocstrings[python]

# Serve docs locally
mkdocs serve

# Build docs
mkdocs build

Writing Docs¶

Use Markdown with admonitions
Include code examples
Add mermaid diagrams where helpful
Cross-reference related pages

Getting Help¶

Issues: GitHub Issues
Discussions: GitHub Discussions
Documentation: This site

Code Review Process¶

All PRs require review
Address review comments
Keep PRs focused (one feature/fix per PR)
Maintain backward compatibility when possible
Update tests and docs

License¶

By contributing, you agree that your contributions will be licensed under the same license as the project.