Transcription Plugin Interface

Domain-specific plugin interface for audio transcription

TranscriptionPlugin


def TranscriptionPlugin(
    *args, **kwargs
):

Abstract base class for all transcription plugins.

Extends PluginInterface with transcription-specific requirements:

- supported_formats: List of audio file extensions this plugin can handle
- execute: Accepts an audio file path (str or Path), returns TranscriptionResult

Input contract: plugins receive a path to a decodable audio file. Producing a model-ready file (format / sample-rate / channel normalization) is the caller’s responsibility — e.g. an upstream ffmpeg step in the orchestration pipeline — not the plugin’s. This keeps the interface library dependency-light (no audio I/O deps such as numpy/soundfile in the shared consumer environment).
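As a rough illustration of how a caller might use supported_formats before submitting a file, the sketch below checks a path's extension against a plugin's declared formats. The helper name is_supported is hypothetical and not part of the interface, and it assumes formats are declared without a leading dot (e.g. "wav").

```python
from pathlib import Path
from typing import Iterable, Union


def is_supported(audio: Union[str, Path], supported_formats: Iterable[str]) -> bool:
    """Return True if the file's extension appears in supported_formats.

    Hypothetical helper, not part of the interface. Assumes formats are
    declared without a leading dot (e.g. "wav"); a leading dot is tolerated.
    """
    # Normalize the extension: ".WAV" -> "wav"; no extension -> "".
    suffix = Path(audio).suffix.lower().lstrip(".")
    allowed = {fmt.lower().lstrip(".") for fmt in supported_formats}
    return suffix in allowed
```

An extension match only suggests the plugin can handle the file. It does not prove the file is decodable or model-ready, which remains the caller's responsibility under the input contract.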

How It Works

The Host submits an audio file path to the plugin; the Worker process reads the file from disk and runs inference:

Host / Orchestration                      Worker Process (Isolated Env)
┌─────────────────────────┐              ┌─────────────────────────────┐
│ # caller ensures the    │              │  TranscriptionPlugin        │
│ # audio file is model-  │   HTTP/JSON  │    .execute(                │
│ # ready (e.g. via an    │ ────────────▶│       audio="/tmp/seg.wav"  │
│ # upstream ffmpeg step) │  (path str)  │    )                        │
│ plugin.execute(         │              │  # reads file from disk     │
│   audio="/tmp/seg.wav") │              │  # runs inference           │
└─────────────────────────┘              └─────────────────────────────┘
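
The diagram shows the path crossing the process boundary as a string over HTTP/JSON. The sketch below shows one way a Host could serialize that call. The payload shape (the "audio" and "kwargs" keys) and the function name build_request_body are assumptions for illustration; the actual wire format is not specified here.

```python
import json
from pathlib import Path
from typing import Any, Union


def build_request_body(audio: Union[str, Path], **kwargs: Any) -> str:
    """Serialize an execute() call into a JSON string (hypothetical shape)."""
    # pathlib.Path objects are not JSON serializable, so the path is
    # converted to a plain string before encoding.
    payload = {"audio": str(audio), "kwargs": kwargs}
    return json.dumps(payload)


body = build_request_body("/tmp/seg.wav", language="en")
```

Because only the path string is sent, the file has to be readable at that same path from inside the Worker process.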

Audio preparation (format conversion, resampling, channel downmix) is an upstream pipeline concern, not the plugin’s — keeping both the plugin and this interface library focused and dependency-light.
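
A minimal sketch of what such an upstream preparation step could look like, assuming the ffmpeg CLI is installed on the Host. The function names and the 16 kHz mono target are illustrative assumptions, not requirements of this interface; the right target depends on the model behind the plugin.

```python
import subprocess
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def build_ffmpeg_command(
    src: PathLike, dst: PathLike, sample_rate: int = 16000, channels: int = 1
) -> List[str]:
    """Build an ffmpeg command that resamples and downmixes src into dst."""
    return [
        "ffmpeg",
        "-y",                    # overwrite dst if it already exists
        "-i", str(src),          # input file
        "-ar", str(sample_rate), # output sample rate in Hz (assumed target)
        "-ac", str(channels),    # output channel count (1 = mono)
        str(dst),                # output container inferred from extension
    ]


def normalize_audio(src: PathLike, dst: PathLike, **options: int) -> Path:
    """Run ffmpeg and return the normalized file path (hypothetical helper)."""
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_command(src, dst, **options), check=True)
    return Path(dst)
```

The path returned by normalize_audio is what the Host would then pass to plugin.execute.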

Example Implementation

A minimal transcription plugin that demonstrates the interface:

from pathlib import Path
from typing import Any, Dict, List, Optional, Union

class ExampleTranscriptionPlugin(TranscriptionPlugin):
    """Example implementation showing how to create a transcription plugin."""
    
    def __init__(self):
        self._config: Dict[str, Any] = {}
        self._model = None

    @property
    def name(self) -> str:
        return "example-transcription"
    
    @property
    def version(self) -> str:
        return "1.0.0"
    
    @property
    def supported_formats(self) -> List[str]:
        return ["wav", "mp3", "flac"]

    def initialize(self, config: Optional[Dict[str, Any]] = None) -> None:
        """Initialize with configuration."""
        self._config = config or {"model": "base"}
        self._model = f"MockModel-{self._config.get('model', 'base')}"

    def execute(
        self,
        audio: Union[str, Path],
        **kwargs
    ) -> TranscriptionResult:
        """Transcribe audio from a file path."""
        audio_path = str(audio)
        
        return TranscriptionResult(
            text=f"Transcribed from {audio_path}",
            confidence=0.95,
            segments=[{"start": 0.0, "end": 1.0, "text": "Mock transcription"}],
            metadata={"model": self._config.get("model")}
        )

    def get_config_schema(self) -> Dict[str, Any]:
        """Return JSON Schema for configuration."""
        return {
            "type": "object",
            "properties": {
                "model": {
                    "type": "string",
                    "enum": ["tiny", "base", "small", "medium", "large"],
                    "default": "base"
                },
                "language": {
                    "type": "string",
                    "default": "en"
                }
            }
        }

    def get_current_config(self) -> Dict[str, Any]:
        """Return current configuration."""
        return self._config

    def cleanup(self) -> None:
        """Clean up resources."""
        self._model = None

# Test the example plugin
plugin = ExampleTranscriptionPlugin()
plugin.initialize({"model": "large", "language": "en"})

print(f"Plugin: {plugin.name} v{plugin.version}")
print(f"Supported formats: {plugin.supported_formats}")
print(f"Config schema: {plugin.get_config_schema()}")
print(f"Current config: {plugin.get_current_config()}")

# Test execution with a file path (as Worker would receive)
result = plugin.execute("/tmp/audio.wav")
print(f"\nResult: {result}")

# Cleanup
plugin.cleanup()

Plugin: example-transcription v1.0.0
Supported formats: ['wav', 'mp3', 'flac']
Config schema: {'type': 'object', 'properties': {'model': {'type': 'string', 'enum': ['tiny', 'base', 'small', 'medium', 'large'], 'default': 'base'}, 'language': {'type': 'string', 'default': 'en'}}}
Current config: {'model': 'large', 'language': 'en'}

Result: TranscriptionResult(text='Transcribed from /tmp/audio.wav', confidence=0.95, segments=[{'start': 0.0, 'end': 1.0, 'text': 'Mock transcription'}], metadata={'model': 'large'})
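
To show how a caller might consume the result, the sketch below formats the segments list into timestamped lines. TranscriptionResultStandIn is a hypothetical stand-in that mirrors only the four fields visible in the output above, so that the snippet runs on its own; the real TranscriptionResult may define more fields or different defaults.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TranscriptionResultStandIn:
    """Stand-in mirroring the fields printed above (an assumption)."""
    text: str
    confidence: Optional[float] = None
    segments: List[Dict[str, Any]] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)


def format_segments(result: Any) -> List[str]:
    """Render each segment as '[start-end] text' with times in seconds."""
    return [
        f"[{seg['start']:.2f}-{seg['end']:.2f}] {seg['text']}"
        for seg in result.segments
    ]


result = TranscriptionResultStandIn(
    text="Transcribed from /tmp/audio.wav",
    confidence=0.95,
    segments=[{"start": 0.0, "end": 1.0, "text": "Mock transcription"}],
    metadata={"model": "large"},
)
lines = format_segments(result)
# lines == ['[0.00-1.00] Mock transcription']
```

The function relies only on the segments attribute and the start, end, and text keys, so it should work with any result object that exposes the same shape.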