cjm-transcription-plugin-system
A flexible plugin system for audio transcription, designed to make it easy to add support for multiple backends.
Install
pip install cjm_transcription_plugin_system
Project Structure
nbs/
├── core.ipynb # Standardized result DTO for audio transcription plugins
├── forced_alignment_core.ipynb # Data structures for word-level forced alignment results
├── forced_alignment_interface.ipynb # Domain-specific plugin interface for word-level audio-text alignment
├── forced_alignment_storage.ipynb # Standardized SQLite storage for forced alignment results with content hashing
├── plugin_interface.ipynb # Domain-specific plugin interface for audio transcription
└── storage.ipynb # Standardized SQLite storage for transcription results with content hashing
Total: 6 notebooks
Module Dependencies
graph LR
core["core<br/>Core Data Structures"]
forced_alignment_core["forced_alignment_core<br/>Forced Alignment Core"]
forced_alignment_interface["forced_alignment_interface<br/>Forced Alignment Plugin Interface"]
forced_alignment_storage["forced_alignment_storage<br/>Forced Alignment Storage"]
plugin_interface["plugin_interface<br/>Transcription Plugin Interface"]
storage["storage<br/>Transcription Storage"]
forced_alignment_interface --> forced_alignment_core
plugin_interface --> core
2 cross-module dependencies detected
CLI Reference
No CLI commands found in this project.
Module Overview
Detailed documentation for each module in the project:
Core Data Structures (core.ipynb)
Standardized result DTO for audio transcription plugins
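To make the DTO concrete, here is a minimal sketch that builds a result and round-trips it through JSON (the kind of plain-dict form a SQLite TEXT column can hold). It uses a local mirror of the `TranscriptionResult` fields listed below rather than importing the package. Two details are assumptions: `default_factory=dict` for `metadata` (the listing elides it as `field(...)`) and the `start`/`end`/`text` segment keys (the listing only says "Timestamped segments").

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


# Local mirror of the documented fields; the real class lives in
# cjm_transcription_plugin_system.core. default_factory=dict is assumed.
@dataclass
class TranscriptionResult:
    text: str
    confidence: Optional[float]
    segments: Optional[List[Dict[str, Any]]]
    metadata: Dict[str, Any] = field(default_factory=dict)


# Hypothetical segment shape: these keys are illustrative, not documented.
segments = [
    {"start": 0.0, "end": 1.2, "text": "Hello"},
    {"start": 1.2, "end": 2.5, "text": "world."},
]
result = TranscriptionResult(
    text=" ".join(s["text"] for s in segments),
    confidence=0.93,
    segments=segments,
    metadata={"model": "example-model"},  # hypothetical plugin metadata
)

# asdict() gives plain dicts/lists, so the whole result serializes to JSON
# and can be rebuilt field-for-field.
payload = json.dumps(asdict(result))
restored = TranscriptionResult(**json.loads(payload))
```

Because every field is a JSON-native type, equality survives the round trip; a plugin that stuffs non-serializable objects into `metadata` would break this.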
Import
from cjm_transcription_plugin_system.core import (
    TranscriptionResult
)
Classes
@dataclass
class TranscriptionResult:
    "Standardized output for all transcription plugins."
    text: str  # The transcribed text
    confidence: Optional[float]  # Overall confidence (0.0 to 1.0)
    segments: Optional[List[Dict[str, Any]]]  # Timestamped segments
    metadata: Dict[str, Any] = field(...)  # Additional metadata
Forced Alignment Core (forced_alignment_core.ipynb)
Data structures for word-level forced alignment results
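The sketch below mirrors the two dataclasses listed below and shows one way to turn a result into the list of plain dicts that the storage layer's `items` column describes as "Serialized ForcedAlignItems". The `validate_items` helper is hypothetical: it assumes words should have non-negative duration and should not overlap, which is typical of aligner output but is not stated by this library.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


# Local mirrors of the documented dataclasses (default_factory=dict assumed).
@dataclass
class ForcedAlignItem:
    text: str
    start_time: float  # seconds
    end_time: float  # seconds


@dataclass
class ForcedAlignResult:
    items: List[ForcedAlignItem]
    metadata: Dict[str, Any] = field(default_factory=dict)


def serialize_items(result: ForcedAlignResult) -> List[Dict[str, Any]]:
    """Convert items to plain dicts (assumed shape for the storage `items` column)."""
    return [asdict(item) for item in result.items]


def validate_items(items: List[ForcedAlignItem]) -> None:
    """Hypothetical sanity check: raise ValueError on negative durations or overlaps."""
    previous_end = 0.0
    for item in items:
        if item.end_time < item.start_time:
            raise ValueError(f"{item.text!r}: end_time is before start_time")
        if item.start_time < previous_end:
            raise ValueError(f"{item.text!r}: overlaps the previous word")
        previous_end = item.end_time


result = ForcedAlignResult(items=[
    ForcedAlignItem("hello", 0.10, 0.42),
    ForcedAlignItem("world", 0.50, 0.91),
])
validate_items(result.items)
rows = serialize_items(result)
```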
Import
from cjm_transcription_plugin_system.forced_alignment_core import (
    ForcedAlignItem,
    ForcedAlignResult
)
Classes
@dataclass
class ForcedAlignItem:
    "A single word-level alignment result."
    text: str  # The aligned word (punctuation typically stripped by model)
    start_time: float  # Start time in seconds
    end_time: float  # End time in seconds
@dataclass
class ForcedAlignResult:
    "Standardized output for all forced alignment plugins."
    items: List[ForcedAlignItem]  # Word-level alignments
    metadata: Dict[str, Any] = field(...)  # Plugin-specific metadata
Forced Alignment Plugin Interface (forced_alignment_interface.ipynb)
Domain-specific plugin interface for word-level audio-text alignment
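As a rough illustration of the interface listed below, here is a hypothetical toy plugin, `UniformWavAligner`, that implements `supported_formats` and `execute`. It is not real forced alignment: it reads the WAV duration with the standard-library `wave` module and gives every word an equal time slice, which can be handy for wiring tests. `PluginInterface` here is a local stand-in; the real base class may require additional members that this sketch omits.

```python
import tempfile
import wave
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Union


class PluginInterface(ABC):
    """Hypothetical stand-in for the real base class (other members omitted)."""


@dataclass
class ForcedAlignItem:
    text: str
    start_time: float
    end_time: float


@dataclass
class ForcedAlignResult:
    items: List[ForcedAlignItem]
    metadata: Dict[str, Any] = field(default_factory=dict)


class ForcedAlignmentPlugin(PluginInterface):
    """Mirrors the two documented members of the interface."""

    def supported_formats(self) -> List[str]:
        return []

    @abstractmethod
    def execute(self, audio: Union[str, Path], text: str, **kwargs) -> ForcedAlignResult:
        ...


class UniformWavAligner(ForcedAlignmentPlugin):
    """Hypothetical toy plugin: equal time slice per whitespace-separated word."""

    def supported_formats(self) -> List[str]:
        return ["wav"]

    def execute(self, audio: Union[str, Path], text: str, **kwargs) -> ForcedAlignResult:
        # Input contract: the caller has already produced a decodable WAV file.
        with wave.open(str(audio), "rb") as wav:
            duration = wav.getnframes() / wav.getframerate()
        words = text.split()
        step = duration / len(words) if words else 0.0
        items = [
            ForcedAlignItem(word, round(i * step, 3), round((i + 1) * step, 3))
            for i, word in enumerate(words)
        ]
        return ForcedAlignResult(items=items, metadata={"method": "uniform", "duration": duration})


# Usage: write 2 seconds of 16 kHz mono silence, then "align" four words to it.
with tempfile.TemporaryDirectory() as tmp_dir:
    wav_path = Path(tmp_dir) / "silence.wav"
    with wave.open(str(wav_path), "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 16-bit samples
        out.setframerate(16000)
        out.writeframes(b"\x00\x00" * 32000)
    result = UniformWavAligner().execute(wav_path, "one two three four")
```

Note that the plugin never resamples or converts; consistent with the input contract, that work belongs upstream of `execute`.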
Import
from cjm_transcription_plugin_system.forced_alignment_interface import (
    ForcedAlignmentPlugin
)
Classes
class ForcedAlignmentPlugin(PluginInterface):
    """
    Abstract base class for all forced alignment plugins.
    Extends PluginInterface with forced-alignment-specific requirements:
    - `supported_formats`: List of audio file extensions this plugin can handle
    - `execute`: Accepts an audio file path and transcript text, returns ForcedAlignResult
    Input contract: plugins receive a path to a decodable audio file. Producing a
    model-ready file (format / sample-rate / channel normalization) is the caller's
    responsibility (e.g. an upstream ffmpeg step in the orchestration pipeline),
    not the plugin's. This keeps the interface library dependency-light.
    """
    def supported_formats(self) -> List[str]:  # e.g., ['wav', 'mp3', 'flac']
        """List of supported audio file extensions (without the dot)."""
        ...
    @abstractmethod
    def execute(
        self,
        audio: Union[str, Path],  # Path to a decodable audio file
        text: str,  # Transcript text to align against
        **kwargs
    ) -> ForcedAlignResult:  # Word-level alignment result
        """
        Perform forced alignment of text against audio.
        `audio` is a path to a decodable audio file; the caller guarantees it is in
        a form the plugin/model can consume.
        """
        ...
Forced Alignment Storage (forced_alignment_storage.ipynb)
Standardized SQLite storage for forced alignment results with content hashing
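The storage rows below expect `audio_hash`, `text_hash`, and `config_hash` values in "algo:hexdigest" format. The following is a minimal sketch of how a caller might produce them; the helper names, the SHA-256 default, and hashing the config as canonical JSON (sorted keys, no whitespace) are all assumptions, and the library may compute these differently or ship its own helpers.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Union


def hash_bytes(data: bytes, algo: str = "sha256") -> str:
    """Return a digest in "algo:hexdigest" form."""
    return f"{algo}:{hashlib.new(algo, data).hexdigest()}"


def hash_text(text: str, algo: str = "sha256") -> str:
    return hash_bytes(text.encode("utf-8"), algo)


def hash_file(path: Union[str, Path], algo: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large audio files are never fully loaded."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return f"{algo}:{h.hexdigest()}"


def hash_config(config: Dict[str, Any], algo: str = "sha256") -> str:
    """Canonical JSON so configs with equal contents hash equally regardless of key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hash_text(canonical, algo)


with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "clip.wav"
    sample.write_bytes(b"hello")  # stand-in bytes, not real audio
    audio_hash = hash_file(sample)
text_hash = hash_text("hello")
config_hash = hash_config({"model": "base", "language": "en"})  # hypothetical config
```

Keeping the algorithm name in the prefix means a stored hash remains verifiable even if the default algorithm changes later.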
Import
from cjm_transcription_plugin_system.forced_alignment_storage import (
    ForcedAlignmentRow,
    ForcedAlignmentStorage
)
Classes
@dataclass
class ForcedAlignmentRow:
    "A single row from the forced_alignments table."
    job_id: str  # Unique job identifier
    audio_path: str  # Path to the source audio file
    audio_hash: str  # Hash of source audio in "algo:hexdigest" format
    text: str  # Input transcript text that was aligned
    text_hash: str  # Hash of input text in "algo:hexdigest" format
    config_hash: str  # Hash of the effective alignment config used
    items: Optional[List[Dict[str, Any]]]  # Serialized ForcedAlignItems
    metadata: Optional[Dict[str, Any]]  # Plugin metadata
    created_at: Optional[float]  # Unix timestamp
class ForcedAlignmentStorage:
    "Standardized SQLite storage for forced alignment results."
    def __init__(
        self,
        db_path: str  # Absolute path to the SQLite database file
    ):
        "Initialize storage, create table, run migrations, and build indexes."
    def save(
        self,
        job_id: str,  # Unique job identifier
        audio_path: str,  # Path to the source audio file
        audio_hash: str,  # Hash of source audio in "algo:hexdigest" format
        text: str,  # Input transcript text
        text_hash: str,  # Hash of input text in "algo:hexdigest" format
        config_hash: str,  # Hash of the effective alignment config
        items: Optional[List[Dict[str, Any]]] = None,  # Serialized ForcedAlignItems
        metadata: Optional[Dict[str, Any]] = None  # Plugin metadata
    ) -> None:
        "Save or replace a forced alignment result (upsert keyed on audio_path + text_hash + config_hash)."
    def save_with_logging(
        self,
        *,
        job_id: str,  # Unique job identifier
        audio_path: str,  # Path to the source audio file
        audio_hash: str,  # Hash of source audio in "algo:hexdigest" format
        text: str,  # Input transcript text
        text_hash: str,  # Hash of input text in "algo:hexdigest" format
        config_hash: str,  # Hash of the effective alignment config
        items: Optional[List[Dict[str, Any]]] = None,  # Serialized ForcedAlignItems
        metadata: Optional[Dict[str, Any]] = None,  # Plugin metadata
        logger: Optional[logging.Logger] = None  # Optional logger for success/failure messages
    ) -> bool:  # True if saved; False if the save failed (error logged, not raised)
        """
        Save a result and log success or failure. Failures are logged and swallowed (returns False).
        Centralizes the try/save/log/except block that every forced alignment plugin would otherwise reimplement.
        Returns True on success so callers can gate post-save side effects on the result.
        """
    def get_cached(
        self,
        audio_path: str,  # Path to the source audio file
        audio_hash: str,  # Content hash of the audio (cache miss if the file changed)
        text_hash: str,  # Hash of the input transcript text (part of the cache key)
        config_hash: str  # Hash of the effective alignment config
    ) -> Optional[ForcedAlignmentRow]:  # Cached row or None
        """
        Retrieve a content-correct cached alignment for an (audio, transcript) pair.
        Matches on audio_path + audio_hash + text_hash + config_hash. A changed audio
        file (new audio_hash) misses even if a stale row exists at the same
        (audio_path, text_hash, config_hash); the next save() replaces that row.
        """
    def get_by_job_id(
        self,
        job_id: str  # Job identifier to look up
    ) -> Optional[ForcedAlignmentRow]:  # Row or None if not found
        "Retrieve a forced alignment result by job ID."
    def list_jobs(
        self,
        limit: int = 100  # Maximum number of rows to return
    ) -> List[ForcedAlignmentRow]:  # List of forced alignment rows
        "List forced alignment jobs ordered by creation time (newest first)."
    def verify_audio(
        self,
        job_id: str  # Job identifier to verify
    ) -> Optional[bool]:  # True if audio matches, False if it has changed, None if job not found
        "Verify that the source audio file still matches its stored hash."
    def verify_text(
        self,
        job_id: str  # Job identifier to verify
    ) -> Optional[bool]:  # True if text matches, False if it has changed, None if job not found
        "Verify that the input text still matches its stored hash."
Transcription Plugin Interface (plugin_interface.ipynb)
Domain-specific plugin interface for audio transcription
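Because every plugin advertises `supported_formats` (extensions without the dot), a caller juggling several backends can route a file by its extension. The sketch below shows that idea with a hypothetical `pick_plugin` helper and two fake backends; the base class and DTO are local stand-ins for the ones listed below, and case-insensitive matching is an assumption.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional, Sequence, Union


@dataclass
class TranscriptionResult:  # local mirror of the documented DTO
    text: str
    confidence: Optional[float]
    segments: Optional[List[Dict[str, Any]]]
    metadata: Dict[str, Any] = field(default_factory=dict)


class TranscriptionPlugin(ABC):
    """Stand-in for the real base class; only the two documented members are modeled."""

    def supported_formats(self) -> List[str]:
        return []

    @abstractmethod
    def execute(self, audio: Union[str, Path], **kwargs) -> TranscriptionResult:
        ...


class FakeWavBackend(TranscriptionPlugin):  # hypothetical backend
    def supported_formats(self) -> List[str]:
        return ["wav"]

    def execute(self, audio: Union[str, Path], **kwargs) -> TranscriptionResult:
        return TranscriptionResult(f"(wav) {Path(audio).name}", None, None)


class FakeCompressedBackend(TranscriptionPlugin):  # hypothetical backend
    def supported_formats(self) -> List[str]:
        return ["mp3", "flac"]

    def execute(self, audio: Union[str, Path], **kwargs) -> TranscriptionResult:
        return TranscriptionResult(f"(compressed) {Path(audio).name}", None, None)


def pick_plugin(
    plugins: Sequence[TranscriptionPlugin], audio: Union[str, Path]
) -> Optional[TranscriptionPlugin]:
    """Return the first plugin whose supported_formats lists the file's extension."""
    ext = Path(audio).suffix.lower().lstrip(".")
    for plugin in plugins:
        if ext in plugin.supported_formats():
            return plugin
    return None


plugins = [FakeWavBackend(), FakeCompressedBackend()]
chosen = pick_plugin(plugins, "talk.FLAC")
result = chosen.execute("talk.FLAC") if chosen else None
```

Returning `None` for an unsupported extension leaves the decision (convert with ffmpeg first, or fail) to the orchestration layer, matching the input contract.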
Import
from cjm_transcription_plugin_system.plugin_interface import (
    TranscriptionPlugin
)
Classes
class TranscriptionPlugin(PluginInterface):
    """
    Abstract base class for all transcription plugins.
    Extends PluginInterface with transcription-specific requirements:
    - `supported_formats`: List of audio file extensions this plugin can handle
    - `execute`: Accepts an audio file path (str or Path), returns TranscriptionResult
    Input contract: plugins receive a path to a decodable audio file. Producing a
    model-ready file (format / sample-rate / channel normalization) is the caller's
    responsibility (e.g. an upstream ffmpeg step in the orchestration pipeline),
    not the plugin's. This keeps the interface library dependency-light (no audio
    I/O dependencies such as numpy/soundfile in the shared consumer environment).
    """
    def supported_formats(self) -> List[str]:  # e.g., ['wav', 'mp3', 'flac']
        """List of supported audio file extensions (without the dot)."""
        ...
    @abstractmethod
    def execute(
        self,
        audio: Union[str, Path],  # Path to a decodable audio file
        **kwargs
    ) -> TranscriptionResult:  # Transcription result with text, confidence, segments
        """
        Transcribe audio to text.
        `audio` is a path to a decodable audio file; the caller guarantees it is in
        a form the plugin/model can consume.
        """
        ...
Transcription Storage (storage.ipynb)
Standardized SQLite storage for transcription results with content hashing
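The caching rules documented below (upsert keyed on `audio_path + config_hash`, cache lookup on `audio_path + audio_hash + config_hash`) can be sketched with the standard `sqlite3` module. This is a hypothetical, trimmed-down schema and SQL written to illustrate those semantics, not the library's actual table or queries; the paths and hash strings are placeholders.

```python
import sqlite3
import time
from typing import Optional, Tuple

# Hypothetical minimal schema (the real table also stores text_hash, segments,
# metadata). The UNIQUE constraint mirrors the documented upsert key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS transcriptions (
    job_id TEXT PRIMARY KEY,
    audio_path TEXT NOT NULL,
    audio_hash TEXT NOT NULL,
    config_hash TEXT NOT NULL,
    text TEXT NOT NULL,
    created_at REAL,
    UNIQUE (audio_path, config_hash)
)
"""


def save(conn: sqlite3.Connection, job_id: str, audio_path: str,
         audio_hash: str, config_hash: str, text: str) -> None:
    # INSERT OR REPLACE deletes any row conflicting on the UNIQUE key first, so
    # re-transcribing the same (audio_path, config_hash) replaces the stale row.
    conn.execute(
        "INSERT OR REPLACE INTO transcriptions VALUES (?, ?, ?, ?, ?, ?)",
        (job_id, audio_path, audio_hash, config_hash, text, time.time()),
    )
    conn.commit()


def get_cached(conn: sqlite3.Connection, audio_path: str, audio_hash: str,
               config_hash: str) -> Optional[Tuple[str, str]]:
    # audio_hash is part of the lookup, so an edited file misses even though a
    # row still exists for the same (audio_path, config_hash).
    return conn.execute(
        "SELECT job_id, text FROM transcriptions "
        "WHERE audio_path = ? AND audio_hash = ? AND config_hash = ?",
        (audio_path, audio_hash, config_hash),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
save(conn, "job-1", "/data/a.wav", "sha256:aaa", "sha256:cfg", "first pass")
hit = get_cached(conn, "/data/a.wav", "sha256:aaa", "sha256:cfg")
miss = get_cached(conn, "/data/a.wav", "sha256:bbb", "sha256:cfg")  # file changed
save(conn, "job-2", "/data/a.wav", "sha256:bbb", "sha256:cfg", "second pass")
row_count = conn.execute("SELECT COUNT(*) FROM transcriptions").fetchone()[0]
```

The stale row is never served (the hash check rejects it) and never accumulates (the next save evicts it), so the table holds at most one result per file and config.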
Import
from cjm_transcription_plugin_system.storage import (
    TranscriptionRow,
    TranscriptionStorage
)
Classes
@dataclass
class TranscriptionRow:
    "A single row from the transcriptions table."
    job_id: str  # Unique job identifier
    audio_path: str  # Path to the source audio file
    audio_hash: str  # Hash of source audio in "algo:hexdigest" format
    config_hash: str  # Hash of the effective transcription config used
    text: str  # Transcribed text output
    text_hash: str  # Hash of transcribed text in "algo:hexdigest" format
    segments: Optional[List[Dict[str, Any]]]  # Timestamped segments
    metadata: Optional[Dict[str, Any]]  # Plugin metadata
    created_at: Optional[float]  # Unix timestamp
class TranscriptionStorage:
    "Standardized SQLite storage for transcription results."
    def __init__(
        self,
        db_path: str  # Absolute path to the SQLite database file
    ):
        "Initialize storage, create table, run migrations, and build indexes."
    def save(
        self,
        job_id: str,  # Unique job identifier
        audio_path: str,  # Path to the source audio file
        audio_hash: str,  # Hash of source audio in "algo:hexdigest" format
        config_hash: str,  # Hash of the effective transcription config
        text: str,  # Transcribed text output
        text_hash: str,  # Hash of transcribed text in "algo:hexdigest" format
        segments: Optional[List[Dict[str, Any]]] = None,  # Timestamped segments
        metadata: Optional[Dict[str, Any]] = None  # Plugin metadata
    ) -> None:
        "Save or replace a transcription result (upsert keyed on audio_path + config_hash)."
    def save_with_logging(
        self,
        *,
        job_id: str,  # Unique job identifier
        audio_path: str,  # Path to the source audio file
        audio_hash: str,  # Hash of source audio in "algo:hexdigest" format
        config_hash: str,  # Hash of the effective transcription config
        text: str,  # Transcribed text output
        text_hash: str,  # Hash of transcribed text in "algo:hexdigest" format
        segments: Optional[List[Dict[str, Any]]] = None,  # Timestamped segments
        metadata: Optional[Dict[str, Any]] = None,  # Plugin metadata
        logger: Optional[logging.Logger] = None  # Optional logger for success/failure messages
    ) -> bool:  # True if saved; False if the save failed (error logged, not raised)
        """
        Save a result and log success or failure. Failures are logged and swallowed (returns False).
        Centralizes the try/save/log/except block that every transcription plugin would otherwise reimplement.
        Returns True on success so callers can gate post-save side effects on the result.
        """
    def get_cached(
        self,
        audio_path: str,  # Path to the source audio file
        audio_hash: str,  # Content hash of the audio (cache miss if the file changed)
        config_hash: str  # Hash of the effective transcription config
    ) -> Optional[TranscriptionRow]:  # Cached row or None
        """
        Retrieve a content-correct cached transcription result.
        Matches on audio_path + audio_hash + config_hash. A changed audio file
        (new audio_hash) misses even if a stale row exists at the same
        (audio_path, config_hash); the next save() replaces that row.
        """
    def get_by_job_id(
        self,
        job_id: str  # Job identifier to look up
    ) -> Optional[TranscriptionRow]:  # Row or None if not found
        "Retrieve a transcription result by job ID."
    def list_jobs(
        self,
        limit: int = 100  # Maximum number of rows to return
    ) -> List[TranscriptionRow]:  # List of transcription rows
        "List transcription jobs ordered by creation time (newest first)."
    def verify_audio(
        self,
        job_id: str  # Job identifier to verify
    ) -> Optional[bool]:  # True if audio matches, False if it has changed, None if job not found
        "Verify that the source audio file still matches its stored hash."
    def verify_text(
        self,
        job_id: str  # Job identifier to verify
    ) -> Optional[bool]:  # True if text matches, False if it has changed, None if job not found
        "Verify that the transcription text still matches its stored hash."
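The three-valued return of `verify_text` (True, False, or None for an unknown job) can be sketched as follows. An in-memory dict stands in for the database, and re-hashing with the algorithm named in the "algo:" prefix is an assumption about how verification works, not something the listing confirms.

```python
import hashlib
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-in for stored rows: job_id -> (text, text_hash).
stored: Dict[str, Tuple[str, str]] = {}


def tagged_digest(text: str, algo: str = "sha256") -> str:
    """Hash text into "algo:hexdigest" form."""
    return f"{algo}:{hashlib.new(algo, text.encode('utf-8')).hexdigest()}"


def verify_text(job_id: str) -> Optional[bool]:
    """None if the job is unknown; otherwise whether the stored text still
    hashes to its stored text_hash (using the algorithm named in the prefix)."""
    row = stored.get(job_id)
    if row is None:
        return None
    text, text_hash = row
    algo = text_hash.split(":", 1)[0]
    return tagged_digest(text, algo) == text_hash


stored["job-1"] = ("hello world", tagged_digest("hello world"))
# Simulate a row whose text was edited after it was saved.
stored["job-2"] = ("hello, world!", tagged_digest("hello world"))
```

Callers should test the result with `is None` before treating it as a boolean, since an unknown job and a failed check are both falsy.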