# cjm-text-plugin-system


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Install

``` bash
pip install cjm_text_plugin_system
```

## Project Structure

    nbs/
    ├── core.ipynb             # DTOs for text processing with character-level span tracking
    ├── plugin_interface.ipynb # Domain-specific plugin interface for text processing operations
    └── storage.ipynb          # Standardized SQLite storage for text processing results with content hashing

Total: 3 notebooks

## Module Dependencies

``` mermaid
graph LR
    core[core<br/>Core Data Structures]
    plugin_interface[plugin_interface<br/>Text Processing Plugin Interface]
    storage[storage<br/>Text Processing Storage]

    plugin_interface --> core
```

*1 cross-module dependency detected*

## CLI Reference

No CLI commands found in this project.

## Module Overview

Detailed documentation for each module in the project:

### Core Data Structures (`core.ipynb`)

> DTOs for text processing with character-level span tracking

#### Import

``` python
from cjm_text_plugin_system.core import (
    TextSpan,
    TextProcessResult
)
```

#### Classes

``` python
@dataclass
class TextSpan:
    "Represents a segment of text with its original character coordinates."
    
    text: str  # The text content of this span
    start_char: int  # 0-indexed start position in original string
    end_char: int  # 0-indexed end position (exclusive)
    label: str = 'sentence'  # Span type: 'sentence', 'token', 'paragraph', etc.
    metadata: Dict[str, Any] = field(...)  # Additional span metadata
    
    def to_dict(self) -> Dict[str, Any]:  # Dictionary representation
        "Convert span to dictionary for serialization."
```

``` python
@dataclass
class TextProcessResult:
    "Container for text processing results."
    
    spans: List[TextSpan]  # List of text spans from processing
    metadata: Dict[str, Any] = field(...)  # Processing metadata
```
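
The span contract above (0-indexed `start_char`, exclusive `end_char`) means that slicing the original string with a span's offsets should give back the span's `text`. The sketch below illustrates that invariant with local stand-in dataclasses rather than the package's own classes. The `field(default_factory=dict)` defaults and the `asdict`-based `to_dict` body are assumptions, since the listing above elides them.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class TextSpan:  # stand-in mirroring the documented fields
    text: str
    start_char: int
    end_char: int
    label: str = "sentence"
    metadata: Dict[str, Any] = field(default_factory=dict)  # assumed default

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)  # assumed implementation


@dataclass
class TextProcessResult:  # stand-in mirroring the documented fields
    spans: List[TextSpan]
    metadata: Dict[str, Any] = field(default_factory=dict)  # assumed default


source = "Hello world. How are you?"
result = TextProcessResult(
    spans=[
        TextSpan(text="Hello world.", start_char=0, end_char=12),
        TextSpan(text="How are you?", start_char=13, end_char=25),
    ],
    metadata={"processor": "manual-example"},  # hypothetical metadata key
)

# Each span's offsets should slice the original string back to its text.
for span in result.spans:
    assert source[span.start_char:span.end_char] == span.text
```

Keeping offsets relative to the original string lets downstream code map a span back to its source even after the span text has been copied elsewhere.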

### Text Processing Plugin Interface (`plugin_interface.ipynb`)

> Domain-specific plugin interface for text processing operations

#### Import

``` python
from cjm_text_plugin_system.plugin_interface import (
    TextProcessingPlugin
)
```

#### Classes

``` python
class TextProcessingPlugin(PluginInterface):
    """
    Abstract base class for plugins that perform NLP operations.
    
    Extends PluginInterface with text processing requirements:
    - `execute`: Dispatch method for different text operations
    - `split_sentences`: Split text into sentence spans with character positions
    """
    
    def execute(
            self,
            action: str = "split_sentences",  # Operation to perform: 'split_sentences', 'tokenize', etc.
            **kwargs
        ) -> Dict[str, Any]:  # JSON-serializable result
        "Execute a text processing operation."
    
    def split_sentences(
            self,
            text: str,  # Input text to split
            **kwargs
        ) -> TextProcessResult:  # Result with TextSpan objects containing character indices
        "Split text into sentence spans with accurate character positions."
```
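
As a rough illustration of how an implementation might satisfy this interface, the sketch below uses a naive regular expression to split sentences while preserving character offsets. `RegexSentenceSplitter` is a hypothetical name, the DTOs are local stand-ins, and the class does not subclass the real `TextProcessingPlugin` so that the example runs on its own. The dictionary shape returned by `execute` is also an assumption.

```python
import re
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class TextSpan:  # stand-in for cjm_text_plugin_system.core.TextSpan
    text: str
    start_char: int
    end_char: int
    label: str = "sentence"
    metadata: Dict[str, Any] = field(default_factory=dict)  # assumed default


@dataclass
class TextProcessResult:  # stand-in for the documented result container
    spans: List[TextSpan]
    metadata: Dict[str, Any] = field(default_factory=dict)  # assumed default


class RegexSentenceSplitter:
    """Hypothetical splitter; a real plugin would subclass TextProcessingPlugin."""

    # Naive rule: a run of non-terminators followed by optional ., !, or ? characters.
    _SENTENCE = re.compile(r"[^.!?]+[.!?]*")

    def split_sentences(self, text: str, **kwargs) -> TextProcessResult:
        spans = []
        for match in self._SENTENCE.finditer(text):
            raw = match.group()
            stripped = raw.strip()
            if not stripped:
                continue  # skip whitespace-only matches
            # Shift the start past leading whitespace so offsets stay exact.
            start = match.start() + len(raw) - len(raw.lstrip())
            spans.append(TextSpan(stripped, start, start + len(stripped)))
        return TextProcessResult(spans=spans, metadata={"processor": "regex-sketch"})

    def execute(self, action: str = "split_sentences", **kwargs) -> Dict[str, Any]:
        if action == "split_sentences":
            result = self.split_sentences(**kwargs)
            # Assumed result shape: plain dicts so the payload is JSON-serializable.
            return {"spans": [asdict(s) for s in result.spans], "metadata": result.metadata}
        raise ValueError(f"Unsupported action: {action}")


text = "Hello world. How are you?"
output = RegexSentenceSplitter().execute(action="split_sentences", text=text)
for span in output["spans"]:
    print(span["start_char"], span["end_char"], span["text"])
```

A regex this simple will mis-split abbreviations such as "e.g."; it is only meant to show the offset bookkeeping that `split_sentences` is expected to do.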

### Text Processing Storage (`storage.ipynb`)

> Standardized SQLite storage for text processing results with content
> hashing

#### Import

``` python
from cjm_text_plugin_system.storage import (
    TextProcessRow,
    TextProcessStorage
)
```

#### Classes

``` python
@dataclass
class TextProcessRow:
    "A single row from the text_jobs table."
    
    job_id: str  # Unique job identifier
    input_text: str  # Original input text
    input_hash: str  # Hash of input text in "algo:hexdigest" format
    spans: Optional[List[Dict[str, Any]]]  # Processed text spans
    metadata: Optional[Dict[str, Any]]  # Processing metadata
    created_at: Optional[float]  # Unix timestamp
```
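
The `input_hash` field uses an `"algo:hexdigest"` format. A minimal sketch of producing and checking such a string with the standard library follows. The helper names `hash_text` and `matches_hash`, the `sha256` default, and the UTF-8 encoding are assumptions for illustration, not part of this package's documented API.

```python
import hashlib


def hash_text(text: str, algo: str = "sha256") -> str:
    """Return a hash string in "algo:hexdigest" form (hypothetical helper)."""
    digest = hashlib.new(algo, text.encode("utf-8")).hexdigest()
    return f"{algo}:{digest}"


def matches_hash(text: str, input_hash: str) -> bool:
    """Recompute the digest with the algorithm named in the prefix and compare."""
    algo, _, expected = input_hash.partition(":")
    return hashlib.new(algo, text.encode("utf-8")).hexdigest() == expected


stored = hash_text("Hello world. How are you?")
print(stored.split(":")[0])  # algorithm prefix
print(matches_hash("Hello world. How are you?", stored))
print(matches_hash("Hello world. How are you!", stored))
```

Storing the algorithm name alongside the digest lets a reader of the row recompute the hash without knowing in advance which algorithm was used.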

``` python
class TextProcessStorage:
    "Standardized SQLite storage for text processing results."
    
    def __init__(
            self,
            db_path: str  # Absolute path to the SQLite database file
        ):
        "Initialize storage and create table if needed."
    
    def save(
            self,
            job_id: str,       # Unique job identifier
            input_text: str,   # Original input text
            input_hash: str,   # Hash of input text in "algo:hexdigest" format
            spans: Optional[List[Dict[str, Any]]] = None,  # Processed text spans
            metadata: Optional[Dict[str, Any]] = None       # Processing metadata
        ) -> None:
        "Save a text processing result to the database."
    
    def get_by_job_id(
            self,
            job_id: str  # Job identifier to look up
        ) -> Optional[TextProcessRow]:  # Row or None if not found
        "Retrieve a text processing result by job ID."
    
    def list_jobs(
            self,
            limit: int = 100  # Maximum number of rows to return
        ) -> List[TextProcessRow]:  # List of text processing rows
        "List text processing jobs ordered by creation time (newest first)."
    
    def verify_input(
            self,
            job_id: str  # Job identifier to verify
        ) -> Optional[bool]:  # True if input matches, False if changed, None if not found
        "Verify the stored input text still matches its hash."
```
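
To make the three-valued result of `verify_input` concrete, here is a small stand-in built directly on `sqlite3`. It is a sketch only: the `SketchTextStore` name, the column types, the JSON encoding of `spans` and `metadata`, and the use of an in-memory database are assumptions, and the real `TextProcessStorage` may differ on all of them.

```python
import hashlib
import json
import sqlite3
import time
from typing import Any, Dict, List, Optional


class SketchTextStore:
    """Hypothetical stand-in that mirrors the documented save/verify_input methods."""

    def __init__(self, db_path: str = ":memory:"):
        self.conn = sqlite3.connect(db_path)
        # Assumed schema; only the table name text_jobs comes from the docs above.
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS text_jobs (
                   job_id TEXT PRIMARY KEY,
                   input_text TEXT NOT NULL,
                   input_hash TEXT NOT NULL,
                   spans TEXT,
                   metadata TEXT,
                   created_at REAL)"""
        )

    def save(self, job_id: str, input_text: str, input_hash: str,
             spans: Optional[List[Dict[str, Any]]] = None,
             metadata: Optional[Dict[str, Any]] = None) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO text_jobs VALUES (?, ?, ?, ?, ?, ?)",
            (job_id, input_text, input_hash,
             None if spans is None else json.dumps(spans),
             None if metadata is None else json.dumps(metadata),
             time.time()),
        )
        self.conn.commit()

    def verify_input(self, job_id: str) -> Optional[bool]:
        row = self.conn.execute(
            "SELECT input_text, input_hash FROM text_jobs WHERE job_id = ?", (job_id,)
        ).fetchone()
        if row is None:
            return None  # unknown job
        input_text, input_hash = row
        algo, _, expected = input_hash.partition(":")
        return hashlib.new(algo, input_text.encode("utf-8")).hexdigest() == expected


store = SketchTextStore()
text = "Hello world. How are you?"
digest = "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()
store.save("job-1", text, digest, spans=[{"text": "Hello world.", "start_char": 0, "end_char": 12}])

print(store.verify_input("job-1"))    # stored text still matches its hash
print(store.verify_input("missing"))  # no such job
```

Returning `None` for a missing job keeps "not found" distinct from "found but modified", which a plain boolean could not express.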
