cjm-transcript-source-select

FastHTML source selection component for transcript decomposition workflows, with federated database browsing, drag-drop ordering, and keyboard navigation.

Install

pip install cjm_transcript_source_select

Project Structure

nbs/
├── components/ (6)
│   ├── helpers.ipynb          # Shared helper functions for the selection module
│   ├── local_files.ipynb      # Local files browser for importing external .db files
│   ├── preview_panel.ipynb    # Collapsible preview panel for displaying selected content
│   ├── selection_queue.ipynb  # Selection queue component with drag-drop reordering
│   ├── source_browser.ipynb   # Source browser components for displaying and filtering transcription sources
│   └── step_renderer.ipynb    # Phase 1 step renderer: Source Selection & Ordering with two-column layout and collapsible preview
├── routes/ (7)
│   ├── core.ipynb            # Selection step state management helpers
│   ├── filtering.ipynb       # Filtering, grouping, and keyboard navigation route handlers
│   ├── init.ipynb            # Router assembly for Phase 1 selection routes
│   ├── local_files.ipynb     # Local files browser route handlers
│   ├── queue.ipynb           # Selection queue route handlers for Phase 1
│   ├── source_browser.ipynb  # Source browser virtual collection router for Phase 1 selection
│   └── tabs.ipynb            # Tab switching route handlers
├── services/ (2)
│   ├── source.ipynb        # Source service for federated transcription queries via DuckDB
│   └── source_utils.ipynb  # Source record operations for metadata extraction, grouping, and validation
├── html_ids.ipynb  # HTML ID constants for Phase 1: Source Selection & Ordering
├── models.ipynb    # Data models and URL bundles for Phase 1: Source Selection & Ordering
└── utils.ipynb     # Display formatting and word counting utilities for the selection step

Total: 18 notebooks across 3 directories

Module Dependencies

graph LR
    components_helpers[components.helpers<br/>helpers]
    components_local_files[components.local_files<br/>local_files]
    components_preview_panel[components.preview_panel<br/>preview_panel]
    components_selection_queue[components.selection_queue<br/>selection_queue]
    components_source_browser[components.source_browser<br/>source_browser]
    components_step_renderer[components.step_renderer<br/>step_renderer]
    html_ids[html_ids<br/>html_ids]
    models[models<br/>models]
    routes_core[routes.core<br/>core]
    routes_filtering[routes.filtering<br/>filtering]
    routes_init[routes.init<br/>init]
    routes_local_files[routes.local_files<br/>local_files]
    routes_queue[routes.queue<br/>queue]
    routes_source_browser[routes.source_browser<br/>source_browser]
    routes_tabs[routes.tabs<br/>tabs]
    services_source[services.source<br/>source]
    services_source_utils[services.source_utils<br/>source_utils]
    utils[utils<br/>utils]

    components_helpers --> models
    components_local_files --> html_ids
    components_local_files --> components_helpers
    components_preview_panel --> html_ids
    components_source_browser --> utils
    components_source_browser --> services_source_utils
    components_source_browser --> html_ids
    components_step_renderer --> html_ids
    components_step_renderer --> components_source_browser
    components_step_renderer --> components_local_files
    components_step_renderer --> components_selection_queue
    components_step_renderer --> models
    components_step_renderer --> components_preview_panel
    components_step_renderer --> utils
    routes_core --> components_step_renderer
    routes_core --> models
    routes_core --> components_selection_queue
    routes_core --> services_source
    routes_core --> html_ids
    routes_filtering --> services_source_utils
    routes_filtering --> routes_core
    routes_filtering --> models
    routes_filtering --> services_source
    routes_init --> routes_queue
    routes_init --> routes_core
    routes_init --> routes_tabs
    routes_init --> routes_local_files
    routes_init --> models
    routes_init --> services_source
    routes_init --> routes_source_browser
    routes_init --> routes_filtering
    routes_local_files --> components_local_files
    routes_local_files --> models
    routes_local_files --> services_source
    routes_local_files --> routes_core
    routes_queue --> services_source_utils
    routes_queue --> routes_core
    routes_queue --> models
    routes_queue --> components_preview_panel
    routes_queue --> services_source
    routes_source_browser --> html_ids
    routes_source_browser --> components_source_browser
    routes_source_browser --> routes_core
    routes_source_browser --> models
    routes_source_browser --> components_preview_panel
    routes_source_browser --> services_source
    routes_source_browser --> services_source_utils
    routes_tabs --> routes_core
    routes_tabs --> models
    routes_tabs --> components_step_renderer
    routes_tabs --> services_source
    routes_tabs --> services_source_utils

52 cross-module dependencies detected

CLI Reference

No CLI commands found in this project.

Module Overview

Detailed documentation for each module in the project:

core (core.ipynb)

Selection step state management helpers

Import

from cjm_transcript_source_select.routes.core import (
    DEBUG_SELECTION_STATE,
    WorkflowStateStore
)

Functions

def _get_step_state(
    state_store: WorkflowStateStore,  # The workflow state store
    workflow_id: str,  # The workflow identifier
    session_id: str  # Session identifier string
) -> Dict[str, Any]:  # Step state dictionary
    "Get the selection step state from the workflow state store."
def _find_duplicate_media_source(
    source_service: SourceService,  # Source service for lookups
    record_id: str,  # Candidate record ID
    provider_id: str,  # Candidate provider ID
    selected_sources: List[Dict[str, str]],  # Current selections
) -> Optional[Dict[str, str]]:  # Conflicting source dict or None
    "Find an already-selected source that shares the same audio file."
def _render_duplicate_flash(
    candidate_row_id: str,  # DOM element ID of the candidate row
    existing_row_id: Optional[str] = None,  # DOM element ID of the conflicting row (None if off-screen)
) -> Div:  # OOB Div with flash script
    "Render a flash animation on one or two rows to indicate duplicate rejection."
def _get_active_source_tab(
    state_store: WorkflowStateStore,  # The workflow state store
    workflow_id: str,  # The workflow identifier
    session_id: str  # Session identifier string
) -> str:  # Active tab: "db" or "files"
    "Get the currently active source tab from workflow state."
def _build_queue_response(
    state_store: WorkflowStateStore,  # The workflow state store
    workflow_id: str,  # The workflow identifier
    source_service: SourceService,  # The source service for querying transcriptions
    session_id: str,  # Session identifier string
    selected_sources: List[Dict[str, str]],  # Current selected sources after mutation
    urls: SelectionUrls,  # URL bundle for rendering
    include_stats: bool = True,  # Include OOB stats swap
    include_checkbox_oobs: bool = True,  # Include OOB checkbox cells for visible rows
) -> Union[Any, Tuple]:  # Single component or tuple of components with OOB swaps
    "Build the standard response for queue-mutating handlers."
def _update_step_state(
    "Update the selection step state in the workflow state store."

Variables

DEBUG_SELECTION_STATE = False
_rebuild_and_render_ref: list
_sync_items_ref: list
_get_checkbox_oobs_ref: list
_get_checkbox_oob_for_ref: list
_get_vc_row_id_for_ref: list
_activate_toggle_ref: list
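
The duplicate check described by `_find_duplicate_media_source` can be pictured with a small standalone sketch. It assumes each selected source is a dict with `record_id` and `provider_id` keys, and that a media path can be looked up for any such pair. The `lookup_media_path` callable and the sample data are hypothetical stand-ins for the real `SourceService` lookups, so this shows the idea rather than the package's implementation.

```python
from typing import Callable, Dict, List, Optional


def find_duplicate_media_source(
    lookup_media_path: Callable[[str, str], Optional[str]],  # hypothetical: (record_id, provider_id) -> media path
    record_id: str,  # Candidate record ID
    provider_id: str,  # Candidate provider ID
    selected_sources: List[Dict[str, str]],  # Current selections
) -> Optional[Dict[str, str]]:
    """Return the first selected source that shares the candidate's media path, else None."""
    candidate_path = lookup_media_path(record_id, provider_id)
    if candidate_path is None:
        return None  # Unknown media path: nothing to compare against
    for source in selected_sources:
        if (source["record_id"], source["provider_id"]) == (record_id, provider_id):
            continue  # The candidate itself is not a conflict
        if lookup_media_path(source["record_id"], source["provider_id"]) == candidate_path:
            return source
    return None


# Stand-in lookup table used only for this example.
MEDIA = {
    ("job-1", "whisper"): "/audio/a.mp3",
    ("job-2", "voxtral"): "/audio/a.mp3",
    ("job-3", "whisper"): "/audio/b.mp3",
}


def lookup(record_id: str, provider_id: str) -> Optional[str]:
    return MEDIA.get((record_id, provider_id))


selected = [{"record_id": "job-1", "provider_id": "whisper"}]
conflict = find_duplicate_media_source(lookup, "job-2", "voxtral", selected)
no_conflict = find_duplicate_media_source(lookup, "job-3", "whisper", selected)
```

Here `conflict` is the already-selected `job-1` entry because both records point at the same audio file, while `no_conflict` is `None`.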

filtering (filtering.ipynb)

Filtering, grouping, and keyboard navigation route handlers
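
As an illustration of the keyboard reorder behaviour in this module (move an item "up" or "down" in the queue), here is a minimal standalone sketch. The function name `move_in_queue` is hypothetical, and leaving the queue unchanged at either end is an assumption about the edge-case behaviour, not something the source confirms.

```python
from typing import Dict, List


def move_in_queue(
    queue: List[Dict[str, str]],  # Selected sources in order
    record_id: str,  # Record ID of item to move
    provider_id: str,  # Provider ID of item to move
    direction: str,  # "up" or "down"
) -> List[Dict[str, str]]:
    """Return a new queue with the matching item swapped one position up or down."""
    if direction not in ("up", "down"):
        raise ValueError(f"Unknown direction: {direction!r}")
    index = next(
        (
            i
            for i, item in enumerate(queue)
            if item["record_id"] == record_id and item["provider_id"] == provider_id
        ),
        None,
    )
    if index is None:
        return list(queue)  # Item not in queue: nothing to move
    target = index - 1 if direction == "up" else index + 1
    if not 0 <= target < len(queue):
        return list(queue)  # Already at the edge (assumed to be a no-op)
    reordered = list(queue)
    reordered[index], reordered[target] = reordered[target], reordered[index]
    return reordered


queue = [
    {"record_id": "a", "provider_id": "p"},
    {"record_id": "b", "provider_id": "p"},
    {"record_id": "c", "provider_id": "p"},
]
moved_up = move_in_queue(queue, "b", "p", "up")
at_top = move_in_queue(queue, "a", "p", "up")
```

Returning a new list rather than mutating in place keeps the sketch easy to test; a real handler would then persist the result to the workflow state.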

Import

from cjm_transcript_source_select.routes.filtering import (
    init_filtering_router
)

Functions

def _handle_source_filter(
    state_store: WorkflowStateStore,  # The workflow state store
    workflow_id: str,  # The workflow identifier
    source_service: SourceService,  # The source service for queries
    request,  # FastHTML request object
    sess,  # FastHTML session object
    search: str,  # Search term from input
    urls: SelectionUrls,  # URL bundle for rendering
):  # VC content wrapper (direct swap, not OOB)
    "Filter transcription sources by search term."
def _handle_grouping_change(
    state_store: WorkflowStateStore,  # The workflow state store
    workflow_id: str,  # The workflow identifier
    source_service: SourceService,  # The source service for queries
    request,  # FastHTML request object
    sess,  # FastHTML session object
    grouping_mode: str,  # New grouping mode: "media_path" or "batch_id"
    urls: SelectionUrls,  # URL bundle for rendering
):  # VC content wrapper (direct swap, not OOB)
    "Change the grouping mode and re-render the VC content."
def _handle_selection_toggle_focused(
    state_store: WorkflowStateStore,  # The workflow state store
    workflow_id: str,  # The workflow identifier
    source_service: SourceService,  # The source service for queries
    request,  # FastHTML request object
    sess,  # FastHTML session object
    record_id: str,  # Job ID from focused row (via hx-include)
    provider_id: str,  # Plugin name from focused row (via hx-include)
    urls: SelectionUrls,  # URL bundle for rendering
):  # Queue component with OOB stats, optionally with OOB source list
    "Toggle selection of the focused row (keyboard shortcut handler)."
def _handle_keyboard_reorder(
    state_store: WorkflowStateStore,  # The workflow state store
    workflow_id: str,  # The workflow identifier
    source_service: SourceService,  # The source service for queries
    request,  # FastHTML request object
    sess,  # FastHTML session object
    record_id: str,  # Record ID of item to move
    provider_id: str,  # Provider ID of item to move
    direction: str,  # Direction to move: "up" or "down"
    urls: SelectionUrls,  # URL bundle for rendering
):  # Queue component, optionally with OOB source list
    "Move an item up or down in the selection queue via keyboard."
def init_filtering_router(
    state_store: WorkflowStateStore,  # The workflow state store
    workflow_id: str,  # The workflow identifier
    source_service: SourceService,  # The source service for queries
    prefix: str,  # Route prefix (e.g., "/workflow/selection/filtering")
    urls: SelectionUrls,  # URL bundle for rendering
) -> Tuple[APIRouter, Dict[str, Callable]]:  # (router, route_dict)
    "Initialize filtering and keyboard navigation routes."

helpers (helpers.ipynb)

Shared helper functions for the selection module

Import

from cjm_transcript_source_select.components.helpers import *

Functions

def _get_selection_state(
    ctx: InteractionContext  # Interaction context with state
) -> SelectionStepState:  # Typed selection step state
    "Get the full selection step state from context."
def _get_selected_sources(
    ctx: InteractionContext  # Interaction context with state
) -> List[SelectedSource]:  # List of selected source dicts
    "Get the list of selected sources from step state."
def _get_grouping_mode(
    ctx: InteractionContext  # Interaction context with state
) -> str:  # Grouping mode: "media_path" or "batch_id"
    "Get the current grouping mode from step state."

html_ids (html_ids.ipynb)

HTML ID constants for Phase 1: Source Selection & Ordering
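
To show how a class of ID helpers like this is typically used, the sketch below defines a stand-in class. Only the `#` prefix added by `as_selector` comes from the signatures in this section; the class name `ExampleHtmlIds`, the `sel-` prefix, and the ID layout are assumptions made for illustration and may differ from what `SelectionHtmlIds` actually produces.

```python
class ExampleHtmlIds:
    """Hypothetical stand-in for an HTML ID constants class."""

    SOURCE_LIST = "sel-source-list"  # Assumed static ID, for illustration only

    @staticmethod
    def as_selector(id_str: str) -> str:
        """Convert an ID to a CSS selector by adding the # prefix."""
        return f"#{id_str}"

    @staticmethod
    def source_row(record_id: str, provider_id: str) -> str:
        """Build a per-row ID from the provider and record identifiers (assumed layout)."""
        return f"sel-source-row-{provider_id}-{record_id}"


row_id = ExampleHtmlIds.source_row("job-1", "whisper")
row_selector = ExampleHtmlIds.as_selector(row_id)
list_selector = ExampleHtmlIds.as_selector(ExampleHtmlIds.SOURCE_LIST)
```

Centralizing IDs this way lets a renderer set `id=row_id` while a route handler targets `row_selector`, so the two cannot drift apart.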

Import

from cjm_transcript_source_select.html_ids import (
    SelectionHtmlIds
)

Classes

class SelectionHtmlIds:
    "HTML ID constants for Phase 1: Source Selection & Ordering."
    
    def as_selector(
            id_str:str  # The HTML ID to convert
        ) -> str:  # CSS selector with # prefix
        "Convert an ID to a CSS selector format."
    
    def source_checkbox(
            record_id:str,  # Record identifier
            provider_id:str  # Provider identifier
        ) -> str:  # HTML ID for the source checkbox
        "Generate HTML ID for a source selection checkbox."
    
    def source_row(
            record_id:str,  # Record identifier
            provider_id:str  # Provider identifier
        ) -> str:  # HTML ID for the source row
        "Generate HTML ID for a source browser row."
    
    def queue_item(
            record_id:str,  # Record identifier
            provider_id:str  # Provider identifier
        ) -> str:  # HTML ID for the queue item
        "Generate HTML ID for a queue item."

init (init.ipynb)

Router assembly for Phase 1 selection routes

Import

from cjm_transcript_source_select.routes.init import (
    init_selection_routers
)

Functions

def init_selection_routers(
    state_store: WorkflowStateStore,  # The workflow state store
    source_service: SourceService,  # The source service for queries
    workflow_id: str,  # The workflow identifier
    prefix: str,  # Base prefix for selection routes (e.g., "/workflow/selection")
) -> SelectionResult:  # Selection router result with routers, urls, routes, and restore
    "Initialize and return all selection routers with URL bundle."

local_files (local_files.ipynb)

Local files browser for importing external .db files

Import

from cjm_transcript_source_select.components.local_files import *

Functions

def _get_external_db_paths(
    ctx: InteractionContext  # Interaction context with state
) -> List[str]:  # List of external database paths
    "Get the list of external database paths from step state."
def _get_current_browse_path(
    ctx: InteractionContext  # Interaction context with state
) -> str:  # Current browse path
    "Get the current browse path from step state."
def _get_file_browser_state(
    step_state: Dict[str, Any],  # Selection step state dictionary
    default_path: Optional[str] = None  # Default path if no state exists
) -> BrowserState:  # BrowserState for file browser
    "Get or create BrowserState from step state."
def _create_db_browser_config() -> FileBrowserConfig:  # Configured FileBrowserConfig for .db file selection
    "Create file browser config for .db file selection."
def _render_external_sources_list(
    external_paths: List[str],  # List of added external database paths
    remove_url: str,  # URL for removing external source
    oob: bool = False,  # Whether to render as OOB swap
) -> Any:  # External sources section component (always rendered for OOB targeting)
    "Render the list of added external database sources with scrollable paths."
def _render_error_alert(
    error_message: Optional[str] = None,  # Error message to display (None = clear)
    oob: bool = False,  # Whether to render as OOB swap
) -> Any:  # Error alert container (always present for OOB targeting)
    "Render the error alert container for the local files browser."
def _render_local_files_browser(
    render_fn: Optional[Callable] = None,  # FileBrowserRouters.render callable
    external_paths: Optional[List[str]] = None,  # List of added external database paths
    remove_url: str = "",  # URL for removing external source
    error_message: Optional[str] = None,  # Error message to display
) -> Any:  # Local files browser component
    "Render the local files browser for adding external .db files."

local_files (local_files.ipynb)

Local files browser route handlers

Import

from cjm_transcript_source_select.routes.local_files import (
    init_local_files_router
)

Functions

def _get_local_files_provider() -> LocalFileSystemProvider:
    "Get or create the local files provider singleton."
def _handle_remove_external_source(
    state_store: WorkflowStateStore,  # The workflow state store
    workflow_id: str,  # The workflow identifier
    source_service: SourceService,  # The source service for external db ops
    sess,  # FastHTML session object
    db_path: str,  # Path to the .db file to remove
    external_db_paths_ref: List[str],  # Shared external paths list (mutated in place)
    fb_routers: FileBrowserRouters,  # File browser routers (for targeted OOB)
    remove_url: str,  # URL for remove button in external sources list
    urls: SelectionUrls,  # Full URL bundle for queue re-rendering
):  # Tuple of OOB elements (external sources list + checkbox cells + queue + stats)
    "Remove an external database source and clean up orphaned queue items."
def init_local_files_router(
    state_store: WorkflowStateStore,  # The workflow state store
    workflow_id: str,  # The workflow identifier
    source_service: SourceService,  # The source service for external db ops
    prefix: str,  # Route prefix (e.g., "/workflow/selection/local_files")
    urls: SelectionUrls,  # URL bundle for rendering
) -> LocalFilesResult:  # Router result with routers, routes, render, restore, and reset
    "Initialize local files browser routes with new file browser API."

Variables

_local_files_provider: Optional[LocalFileSystemProvider] = None

models (models.ipynb)

Data models and URL bundles for Phase 1: Source Selection & Ordering

Import

from cjm_transcript_source_select.models import (
    SelectionStepState,
    SelectionUrls,
    LocalFilesResult,
    SelectionResult
)

Functions

def _no_op_restore(session_id: str) -> None:
    "Default no-op for restore_state."
def _no_op_reset() -> None:
    "Default no-op for reset_state."

Classes

class SelectionStepState(TypedDict):
    "State for Phase 1: Source Selection & Ordering."
@dataclass
class SelectionUrls:
    "URL bundle for Phase 1 selection route handlers and renderers."
    
    add: str = ''  # Add source to queue
    remove: str = ''  # Remove source from queue
    toggle: str = ''  # Toggle source selection (add/remove based on current state)
    reorder: str = ''  # Reorder queue items
    clear: str = ''  # Clear all from queue
    select_all: str = ''  # Select all in a group
    preview: str = ''  # Preview source content
    toggle_focused: str = ''  # Toggle focused row selection
    keyboard_reorder: str = ''  # Keyboard reorder (Shift+Up/Down)
    filter: str = ''  # Filter source list
    grouping_change: str = ''  # Change grouping mode
    browse_directory: str = ''  # Browse directory
    add_external: str = ''  # Add external .db source
    remove_external: str = ''  # Remove external .db source
    tab_switch: str = ''  # Switch source tabs
@dataclass
class LocalFilesResult:
    "Return type from init_local_files_router."
    
    routers: List[APIRouter]  # Routers to register (custom + file browser + VC)
    routes: Dict[str, Callable]  # Named route handlers
    render_panel: Callable  # (error_message?, session_id?) -> rendered panel
    restore_state: Callable = field(...)  # (session_id) -> None, restore persisted state
    reset_state: Callable = field(...)  # () -> None, reset in-memory caches
@dataclass
class SelectionResult:
    "Return type from init_selection_routers."
    
    routers: List[APIRouter]  # All selection routers to register
    urls: 'SelectionUrls' = field(...)  # URL bundle
    routes: Dict[str, Callable] = field(...)  # All named route handlers
    render_local_files_panel: Optional[Callable]  # Render fn for local files tab
    sb_state: Any  # SourceBrowserRouterState
    restore_state: Callable = field(...)  # (session_id) -> None, restore persisted state
    reset_state: Callable = field(...)  # () -> None, reset in-memory caches
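
Because every `SelectionUrls` field defaults to an empty string, one plausible usage pattern is to create the bundle first, hand it to renderers, and fill in the URLs once all routes exist. The sketch below illustrates that late-binding idea with a cut-down stand-in dataclass. `ExampleUrls`, `make_clear_button`, and the route path are hypothetical and are not taken from the package.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExampleUrls:
    """Cut-down stand-in for a URL bundle; fields start empty."""

    toggle: str = ''  # Toggle source selection
    clear: str = ''  # Clear all from queue


def make_clear_button(urls: ExampleUrls) -> Callable[[], str]:
    """Return a renderer that reads the bundle at render time, not at creation time."""

    def render() -> str:
        return f'<button hx-post="{urls.clear}">Clear</button>'

    return render


urls = ExampleUrls()  # Created before any route is known
render_clear = make_clear_button(urls)  # Renderer holds a reference to the bundle
urls.clear = "/workflow/selection/queue/clear"  # Filled in later (hypothetical path)
html = render_clear()
```

The renderer picks up the final URL because it keeps a reference to the shared, mutable bundle instead of copying its values.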

preview_panel (preview_panel.ipynb)

Collapsible preview panel for displaying selected content

Import

from cjm_transcript_source_select.components.preview_panel import *

Functions

def _render_preview_panel(
    preview_record_id: Optional[str] = None,  # Job ID being previewed
    preview_text: Optional[str] = None,  # Text content to preview
    is_open: bool = False,  # Whether the collapse should be open
) -> Any:  # Preview panel component (collapsible, full-width)
    "Render the collapsible preview panel for displaying selected content."

queue (queue.ipynb)

Selection queue route handlers for Phase 1
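
The toggle semantics used by these handlers (add if absent, remove if present) can be sketched in a few lines. This is a standalone illustration that assumes queue entries are dicts keyed by `record_id` and `provider_id`; the helper name `toggle_source` is hypothetical, and the real handlers also deal with duplicate-audio checks, state persistence, and rendering.

```python
from typing import Dict, List


def toggle_source(
    selected: List[Dict[str, str]],  # Current queue, in order
    record_id: str,  # Record to toggle
    provider_id: str,  # Provider that owns the record
) -> List[Dict[str, str]]:
    """Return a new queue with the source removed if present, appended if absent."""
    key = (record_id, provider_id)
    remaining = [
        source
        for source in selected
        if (source["record_id"], source["provider_id"]) != key
    ]
    if len(remaining) < len(selected):
        return remaining  # Source was present: toggling removes it
    return selected + [{"record_id": record_id, "provider_id": provider_id}]


queue = [{"record_id": "job-1", "provider_id": "whisper"}]
after_add = toggle_source(queue, "job-2", "whisper")
after_remove = toggle_source(after_add, "job-1", "whisper")
```

New items are appended at the end, so the queue order reflects the order in which sources were selected until the user reorders them.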

Import

from cjm_transcript_source_select.routes.queue import (
    init_queue_router
)

Functions

def _handle_selection_toggle(
    state_store: WorkflowStateStore,  # The workflow state store
    workflow_id: str,  # The workflow identifier
    source_service: SourceService,  # The source service for queries
    request,  # FastHTML request object
    sess,  # FastHTML session object
    record_id: str,  # Job ID to toggle
    provider_id: str,  # Plugin name for the source
    urls: SelectionUrls,  # URL bundle for rendering
):  # Queue component with OOB stats (no checkbox OOBs -- checkbox already correct)
    "Toggle a source's selection state (add if absent, remove if present)."
def _handle_selection_add(
    state_store: WorkflowStateStore,  # The workflow state store
    workflow_id: str,  # The workflow identifier
    source_service: SourceService,  # The source service for queries
    request,  # FastHTML request object
    sess,  # FastHTML session object
    record_id: str,  # Job ID to add
    provider_id: str,  # Plugin name for the source
    urls: SelectionUrls,  # URL bundle for rendering
):  # Queue component with OOB stats and visible checkbox OOBs
    "Add a source to the selection queue."
def _handle_selection_remove(
    state_store: WorkflowStateStore,  # The workflow state store
    workflow_id: str,  # The workflow identifier
    source_service: SourceService,  # The source service for queries
    request,  # FastHTML request object
    sess,  # FastHTML session object
    key: str,  # Item key (record_id) to remove
    urls: SelectionUrls,  # URL bundle for rendering
):  # Queue component with OOB stats and visible checkbox OOBs
    "Remove a source from the selection queue by key."
async def _handle_selection_reorder(
    state_store: WorkflowStateStore,  # The workflow state store
    workflow_id: str,  # The workflow identifier
    source_service: SourceService,  # The source service for queries
    request,  # FastHTML request object
    sess,  # FastHTML session object
    urls: SelectionUrls,  # URL bundle for rendering
):  # Updated queue component
    "Reorder items in the selection queue based on SortableJS result."
def _handle_selection_clear(
    state_store: WorkflowStateStore,  # The workflow state store
    workflow_id: str,  # The workflow identifier
    source_service: SourceService,  # The source service for queries
    request,  # FastHTML request object
    sess,  # FastHTML session object
    urls: SelectionUrls,  # URL bundle for rendering
):  # Queue component with OOB stats, optionally with OOB source list
    "Clear all items from the selection queue."
def _handle_selection_select_all(
    state_store: WorkflowStateStore,  # The workflow state store
    workflow_id: str,  # The workflow identifier
    source_service: SourceService,  # The source service for queries
    request,  # FastHTML request object
    sess,  # FastHTML session object
    group_key: str,  # Group key to select all transcriptions for
    grouping_mode: str,  # Current grouping mode: "media_path" or "batch_id"
    urls: SelectionUrls,  # URL bundle for rendering
):  # Queue component with OOB stats, optionally with OOB source list
    "Select all transcriptions for a given group, skipping duplicate audio sources."
def _handle_selection_preview(
    source_service: SourceService,  # The source service for queries
    request,  # FastHTML request object
    record_id: str,  # Job ID to preview
    provider_id: str,  # Plugin name for the source
):  # Full preview panel component (collapsible, open with content)
    "Get preview panel for a selected source."
def init_queue_router(
    state_store: WorkflowStateStore,  # The workflow state store
    workflow_id: str,  # The workflow identifier
    source_service: SourceService,  # The source service for queries
    prefix: str,  # Route prefix (e.g., "/workflow/selection/queue")
    urls: SelectionUrls,  # URL bundle for rendering (populated after all routers created)
) -> Tuple[APIRouter, Dict[str, Callable]]:  # (router, route_dict)
    "Initialize queue management routes."

selection_queue (selection_queue.ipynb)

Selection queue component with drag-drop reordering

Import

from cjm_transcript_source_select.components.selection_queue import (
    SD_QUEUE_PREFIX,
    SD_QUEUE_CONFIG,
    SD_QUEUE_IDS
)

Functions

def _render_queue_content(
    item: dict,  # Source dict with record_id and provider_id
    index: int,  # 0-based position in queue
) -> Any:  # Custom content for the queue item
    "Render the job ID display as custom content for each queue item."
def _render_queue_empty() -> Any:  # Empty state element
    "Render the custom empty state for the source selection queue."
def _render_selection_queue(
    selected_sources: List[Dict[str, str]],  # List of selected sources in order
    remove_url: str,  # URL for removing from queue
    reorder_url: str,  # URL for reordering queue
    clear_url: str,  # URL for clearing all
) -> Any:  # Queue panel component
    "Render the selection queue panel via cjm-fasthtml-sortable-queue."

Variables

SD_QUEUE_PREFIX = 'sd'
SD_QUEUE_CONFIG
SD_QUEUE_IDS

source (source.ipynb)

Source service for federated transcription queries via DuckDB
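
To make "federated" concrete, the sketch below queries several database files and tags each record with the provider it came from, applying the limit per provider. It deliberately uses Python's standard `sqlite3` module rather than DuckDB so that it runs without extra dependencies. The `transcriptions` table, its `job_id` and `text` columns, and the provider names are hypothetical, since the real schema is not shown here.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Tuple


def make_example_db(path: Path, rows: List[Tuple[str, str]]) -> None:
    """Create a tiny database with a hypothetical transcriptions table."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE transcriptions (job_id TEXT PRIMARY KEY, text TEXT)")
        conn.executemany("INSERT INTO transcriptions VALUES (?, ?)", rows)
        conn.commit()
    finally:
        conn.close()


def query_all(db_paths: Dict[str, Path], limit: int = 100) -> List[Dict[str, Any]]:
    """Query each database in turn and merge the results into one list."""
    records: List[Dict[str, Any]] = []
    for provider_id, path in db_paths.items():
        conn = sqlite3.connect(path)
        try:
            cursor = conn.execute(
                "SELECT job_id, text FROM transcriptions ORDER BY job_id LIMIT ?",
                (limit,),  # The limit applies per provider, not to the merged list
            )
            for job_id, text in cursor:
                records.append(
                    {"record_id": job_id, "provider_id": provider_id, "text": text}
                )
        finally:
            conn.close()
    return records


with tempfile.TemporaryDirectory() as tmp:
    db_a = Path(tmp) / "a.db"
    db_b = Path(tmp) / "b.db"
    make_example_db(db_a, [("job-1", "hello world"), ("job-2", "second")])
    make_example_db(db_b, [("job-9", "from b")])
    results = query_all({"plugin-a": db_a, "external-b": db_b}, limit=1)
```

Carrying `provider_id` on every record is what lets later steps fetch a specific transcription with a `(record_id, provider_id)` pair.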

Import

from cjm_transcript_source_select.services.source import (
    VALID_DB_EXTENSIONS,
    TranscriptionDBProvider,
    SourceService,
    validate_and_toggle_external_db
)
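
The `(updated_paths, error_message)` return shape of `validate_and_toggle_external_db` suggests a validate-then-toggle flow, sketched below in standalone form. The helper name, the `[".db"]` default, and the error wording are assumptions. The sketch also leaves out the file-existence check and the provider-level duplicate detection that the real function performs through `SourceService`.

```python
from pathlib import Path
from typing import List, Optional, Tuple

EXAMPLE_DB_EXTENSIONS = [".db"]  # Assumed default; the real VALID_DB_EXTENSIONS may differ


def toggle_external_db(
    path: str,  # Path to the .db file
    external_paths: List[str],  # Current external database paths
    valid_extensions: Optional[List[str]] = None,  # Allowed suffixes
) -> Tuple[List[str], Optional[str]]:  # (updated_paths, error_message or None)
    """Remove the path if already listed; otherwise validate its suffix and add it."""
    extensions = EXAMPLE_DB_EXTENSIONS if valid_extensions is None else valid_extensions
    if path in external_paths:
        # Toggling an existing entry removes it without further validation
        return [p for p in external_paths if p != path], None
    suffix = Path(path).suffix.lower()
    if suffix not in extensions:
        return list(external_paths), f"Unsupported file type: {suffix or '(none)'}"
    return external_paths + [path], None


paths, error = toggle_external_db("/data/extra.db", [])
removed, removed_error = toggle_external_db("/data/extra.db", paths)
rejected, rejected_error = toggle_external_db("/data/notes.txt", paths)
```

Returning the error as a value instead of raising lets a route handler pass the message straight to an error alert while leaving the path list unchanged.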

Functions

def validate_and_toggle_external_db(
    source_service: SourceService,  # Source service for duplicate detection
    path: str,  # Path to the .db file
    external_paths: List[str],  # Current external database paths
    valid_extensions: List[str] = None,  # Valid file extensions (default: VALID_DB_EXTENSIONS)
) -> Tuple[List[str], Optional[str]]:  # (updated_paths, error_message or None)
    "Validate and toggle an external database path in the external paths list."

Classes

class TranscriptionDBProvider:
    "SourceProvider for transcription SQLite databases."
    
    def __init__(
            self,
            db_path: str,  # Path to SQLite database file
            name: str,  # Display name for this provider
            provider_id: Optional[str] = None  # Unique ID (defaults to db_path)
        )
        "Initialize provider for a transcription database."
    
    def provider_id(self) -> str:  # Unique identifier
        "Unique identifier for this provider instance."
    
    def provider_name(self) -> str:  # Display name
        "Human-readable name for display."
    
    def provider_type(self) -> str:  # Provider category ("transcription_db")
        "Provider type category."
    
    def db_path(self) -> Path:  # Database file path
        "Path to the underlying database file."
    
    def is_available(self) -> bool:  # Whether the database file exists and has a .db suffix
        "Check if the database file exists and is accessible."
    
    def validate_schema(self) -> Tuple[bool, str]:  # (is_valid, error_message)
        "Check if database has valid transcription schema."
    
    def query_records(
            self,
            limit: int = 100  # Maximum records to return
        ) -> List[SourceRecord]:  # List of source records
        "Query transcription records from the database."
    
    def get_source_block(
            self,
            record_id: str  # Job ID to fetch
        ) -> Optional[SourceBlock]:  # SourceBlock or None if not found
        "Fetch a specific transcription as a SourceBlock."
    
    def from_plugin(
            cls,
            meta: PluginMeta  # Plugin metadata with manifest containing db_path
        ) -> Optional["TranscriptionDBProvider"]:  # Provider or None if no valid db_path
        "Create provider from plugin metadata."
    
    def from_external_path(
            cls,
            path: str  # Path to external database file
        ) -> Optional["TranscriptionDBProvider"]:  # Provider or None if path invalid
        "Create provider from an external database path."
class SourceService:
    "Service for federated access to content sources via providers."
    
    def __init__(
            self,
            plugin_manager: PluginManager,  # Plugin manager for discovering plugin sources
            source_categories: List[str] = None,  # Plugin categories to query (default: ['transcription'])
            external_paths: List[str] = None  # External database paths
        )
        "Initialize the source service."
    
    def add_provider(
            self,
            provider: SourceProvider  # Provider instance to add
        ) -> bool:  # True if added, False if ID already exists
        "Add a source provider."
    
    def remove_provider(
            self,
            provider_id: str  # ID of provider to remove
        ) -> bool:  # True if removed, False if not found
        "Remove a source provider by ID."
    
    def get_provider(
            self,
            provider_id: str  # ID of provider to get
        ) -> Optional[SourceProvider]:  # Provider or None if not found
        "Get a provider by ID."
    
    def get_providers(self) -> List[SourceProvider]:  # List of all providers
        "Get all registered providers."
    
    def get_provider_by_name(
            self,
            name: str  # Provider name to search for
        ) -> Optional[SourceProvider]:  # Provider or None if not found
        "Find a provider by its display name."
    
    def has_provider_for_path(
            self,
            path: str  # Path to check
        ) -> Tuple[bool, Optional[str]]:  # (has_duplicate, existing_provider_name)
        "Check if any provider uses the same resolved database path."
    
    def add_plugin_providers(self) -> int:  # Number of providers added
        "Discover and add providers from loaded plugins."
    
    def set_external_paths(
            self,
            paths: List[str]  # List of external database paths to set
        ) -> None
        "Set external database paths (replaces existing external providers)."
    
    def add_external_path(
            self,
            path: str  # External database path to add
        ) -> bool:  # True if added, False if already exists or invalid
        "Add an external database as a provider."
    
    def remove_external_path(
            self,
            path: str  # External database path to remove
        ) -> bool:  # True if removed, False if not found
        "Remove an external database provider."
    
    def get_external_paths(self) -> List[str]:  # List of external database paths
        "Get list of external database paths."
    
    def get_available_sources(self) -> List[Dict[str, Any]]:  # List of source info dicts
        "Get list of available sources (for UI display)."
    
    def query_transcriptions(
            self,
            provider_name: Optional[str] = None,  # Filter by provider name (None for all)
            limit: int = 100  # Maximum number of results per provider
        ) -> List[Dict[str, Any]]:  # List of transcription records
        "Query records from all providers (or a specific one)."
    
    def get_transcription_by_id(
            self,
            record_id: str,  # Record ID to fetch
            provider_id: str  # Provider ID that owns this record
        ) -> Optional[SourceBlock]:  # SourceBlock or None if not found
        "Get a specific transcription as a SourceBlock."
    
    def get_source_blocks(
            self,
            selections: List[Dict[str, str]]  # List of {record_id, provider_id} dicts
        ) -> List[SourceBlock]:  # Ordered list of SourceBlocks
        "Fetch multiple records as SourceBlocks in order."
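
The return conventions of `add_provider`, `remove_provider`, and `get_provider` can be illustrated with a small stand-in registry. This is a minimal sketch, not the `SourceService` implementation: `DemoProvider`, `DemoRegistry`, and the `provider_id` and `name` attribute names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DemoProvider:
    """Hypothetical stand-in for a SourceProvider (only the fields used here)."""
    provider_id: str
    name: str


class DemoRegistry:
    """Illustrative registry mirroring the documented True/False/None conventions."""

    def __init__(self) -> None:
        # Providers are keyed by ID so duplicate IDs are easy to detect.
        self._providers: Dict[str, DemoProvider] = {}

    def add_provider(self, provider: DemoProvider) -> bool:
        # Refuse to overwrite an existing ID; the caller sees False.
        if provider.provider_id in self._providers:
            return False
        self._providers[provider.provider_id] = provider
        return True

    def remove_provider(self, provider_id: str) -> bool:
        # pop() with a default avoids a KeyError for unknown IDs.
        return self._providers.pop(provider_id, None) is not None

    def get_provider(self, provider_id: str) -> Optional[DemoProvider]:
        # dict.get returns None when the ID is not registered.
        return self._providers.get(provider_id)

    def get_providers(self) -> List[DemoProvider]:
        return list(self._providers.values())


registry = DemoRegistry()
first_add = registry.add_provider(DemoProvider("p1", "Whisper DB"))
second_add = registry.add_provider(DemoProvider("p1", "Duplicate"))
removed_missing = registry.remove_provider("unknown")
```

Here `first_add` is `True`, while `second_add` and `removed_missing` are `False`, matching the return comments in the signatures above.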

Variables

VALID_DB_EXTENSIONS = [3 items]

source_browser (source_browser.ipynb)

Source browser components for displaying and filtering transcription sources

Import

from cjm_transcript_source_select.components.source_browser import (
    SOURCE_BROWSER_COLUMNS,
    SB_SYSTEM_ID,
    SourceBrowserItem,
    build_source_items,
    is_source_item_skippable,
    create_source_cell_renderer,
    render_source_empty
)

Functions

def _render_grouping_selector(
    grouping_mode: str,  # Current grouping mode: "media_path" or "batch_id"
    grouping_change_url: str,  # URL for changing grouping mode
) -> Any:  # Grouping selector component
    "Render the dropdown for selecting grouping mode."
def build_source_items(
    transcriptions: List[Dict[str, Any]],  # Available transcription records
    selected_sources: List[Dict[str, str]],  # Currently selected sources
    grouping_mode: str = "media_path",  # Grouping mode: "media_path" or "batch_id"
) -> List[SourceBrowserItem]:  # Flat list with interleaved headers and records
    "Build the items list for the source browser virtual collection."
def is_source_item_skippable(
    item: SourceBrowserItem,  # Item to check
) -> bool:  # True if item is a group header (cursor should skip)
    "Predicate for virtual collection is_skippable parameter."
def _render_header_cell(
    item: SourceBrowserItem,  # Header item
    ctx: CellRenderContext,  # Cell render context
    select_all_url: str = "",  # URL for selecting all in group
) -> Any:  # Cell content for a group header row
    "Render cell content for a group header item."
def _render_record_cell(
    item: SourceBrowserItem,  # Record item
    ctx: CellRenderContext,  # Cell render context
    toggle_url: str = "",  # URL for toggling source selection
) -> Any:  # Cell content for a data record row
    "Render cell content for a data record item."
def create_source_cell_renderer(
    toggle_url: str = "",  # URL for toggling source selection
    select_all_url: str = "",  # URL for selecting all in a group
) -> Callable:  # render_cell(item: SourceBrowserItem, ctx: CellRenderContext) -> Any
    "Create a render_cell callback for the source browser virtual collection."
def render_source_empty() -> Any:  # Empty state component
    "Render empty state when no transcription sources are available."
def _render_source_browser_vc_content(
    sb_state: Any,  # SourceBrowserRouterState from routes.source_browser
) -> Any:  # VC content wrapper (without search/grouping header)
    "Render the VC content portion of the source browser."
def _render_source_browser_vc(
    sb_state: Any,  # SourceBrowserRouterState from routes.source_browser
    filter_url: str = "",  # URL for filtering sources
    grouping_mode: str = "media_path",  # Current grouping mode
    grouping_change_url: str = "",  # URL for changing grouping mode
) -> Any:  # Source browser component with virtual collection
    "Render the full source browser panel (header + VC content)."
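
The "flat list with interleaved headers and records" returned by `build_source_items` can be pictured with the sketch below. It is an illustration under assumptions rather than the package's code: `DemoItem` is a reduced stand-in for `SourceBrowserItem`, grouping is shown for `media_path` only, and the record dict keys (`record_id`, `provider_id`, `media_path`) are assumed.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class DemoItem:
    """Simplified stand-in for SourceBrowserItem (hypothetical)."""
    item_type: str  # "header" or "record"
    group_key: str = ""
    group_count: int = 0
    record: Optional[Dict[str, Any]] = None
    is_selected: bool = False


def build_items_sketch(
    transcriptions: List[Dict[str, Any]],
    selected_sources: List[Dict[str, str]],
) -> List[DemoItem]:
    # Group by media_path, preserving first-seen order (dicts keep insertion order).
    groups: Dict[str, List[Dict[str, Any]]] = {}
    for rec in transcriptions:
        groups.setdefault(rec.get("media_path", ""), []).append(rec)

    # A set of (record_id, provider_id) pairs gives O(1) selection lookups.
    selected = {(s["record_id"], s["provider_id"]) for s in selected_sources}

    items: List[DemoItem] = []
    for key, records in groups.items():
        # One header per group, followed by that group's records.
        items.append(DemoItem("header", group_key=key, group_count=len(records)))
        for rec in records:
            items.append(DemoItem(
                "record",
                group_key=key,
                record=rec,
                is_selected=(rec["record_id"], rec["provider_id"]) in selected,
            ))
    return items


def is_skippable_sketch(item: DemoItem) -> bool:
    # Headers are not selectable rows, so the cursor passes over them.
    return item.item_type == "header"


records = [
    {"record_id": "r1", "provider_id": "p1", "media_path": "/audio/a.mp3"},
    {"record_id": "r2", "provider_id": "p1", "media_path": "/audio/b.mp3"},
    {"record_id": "r3", "provider_id": "p2", "media_path": "/audio/a.mp3"},
]
items = build_items_sketch(records, [{"record_id": "r3", "provider_id": "p2"}])
layout = [item.item_type for item in items]
```

With this input, `layout` is `["header", "record", "record", "header", "record"]`: the two records for `/audio/a.mp3` sit under one header even though they were not adjacent in the input.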

Classes

@dataclass
class SourceBrowserItem:
    "Item in the source browser virtual collection (header or record)."
    
    item_type: str  # "header" or "record"
    group_key: str = ''  # Group key (media_path or batch_id value)
    group_display: str = ''  # Formatted display text for group header
    group_count: int = 0  # Number of records in this group
    grouping_mode: str = ''  # Grouping mode used ("media_path" or "batch_id")
    record: Optional[Dict[str, Any]] = None  # Original transcription record dict
    is_selected: bool = False  # Whether currently in queue

Variables

SOURCE_BROWSER_COLUMNS
_SB_CONTENT_ID = 'sb-content'
_SB_VC_WRAPPER_ID = 'sb-vc-wrapper'
SB_SYSTEM_ID = 'sb-collection'

source_browser (source_browser.ipynb)

Source browser virtual collection router for Phase 1 selection

Import

from cjm_transcript_source_select.routes.source_browser import (
    SourceBrowserRouterState,
    init_source_browser_router
)

Functions

def init_source_browser_router(
    source_service: SourceService,  # Source service for querying transcriptions
    urls: SelectionUrls,  # URL bundle (toggle, select_all, filter, grouping_change)
    prefix: str = "/browser",  # Route prefix for VC routes
) -> SourceBrowserRouterState:  # Router state with all VC objects and helpers
    "Initialize the source browser virtual collection router."

Classes

@dataclass
class SourceBrowserRouterState:
    "Return value from init_source_browser_router."
    
    router: APIRouter  # VC routes (nav, focus, activate, sort, viewport)
    urls: VirtualCollectionUrls  # VC URL bundle
    ids: VirtualCollectionHtmlIds  # VC HTML IDs
    btn_ids: VirtualCollectionButtonIds  # VC keyboard button IDs
    config: VirtualCollectionConfig  # VC config
    state: VirtualCollectionState  # VC state (mutable)
    items: List[SourceBrowserItem]  # Shared items list (mutable)
    render_cell: Callable  # Cell render callback
    rebuild_and_render: Callable  # (transcriptions, selected_sources, grouping_mode, content_only) -> Div
    rebuild_items: Callable  # (transcriptions, selected_sources, grouping_mode) -> None
    sync_items_selection: Callable  # (selected_sources) -> None
    get_visible_checkbox_oobs: Callable  # () -> tuple of OOB elements
    get_checkbox_oob_for: Callable  # (record_id, provider_id) -> OOB element or None
    get_vc_row_id_for: Callable  # (record_id, provider_id) -> str or None

source_utils (source_utils.ipynb)

Source record operations for metadata extraction, grouping, and validation

Import

from cjm_transcript_source_select.services.source_utils import (
    extract_batch_id,
    extract_model_name,
    group_transcriptions,
    group_transcriptions_by_audio,
    is_source_selected,
    get_selected_media_paths,
    filter_transcriptions,
    select_all_in_group,
    toggle_source_selection,
    reorder_item,
    reorder_sources,
    calculate_next_tab,
    check_audio_exists,
    validate_browse_path
)

Functions

def extract_batch_id(
    metadata: Any  # Metadata dict or JSON string
) -> str:  # Batch ID or "No Batch ID"
    "Extract batch_id from transcription metadata."
def extract_model_name(
    metadata: Any  # Metadata dict or JSON string
) -> str:  # Formatted model name for display
    "Extract and format model name from transcription metadata."
def group_transcriptions(
    transcriptions: List[Dict[str, Any]],  # List of transcription records
    group_by: str = "media_path"  # Grouping mode: "media_path" or "batch_id"
) -> Dict[str, List[Dict[str, Any]]]:  # Grouped transcriptions
    "Group transcription records by the specified field."
def group_transcriptions_by_audio(
    transcriptions: List[Dict[str, Any]]  # List of transcription records
) -> Dict[str, List[Dict[str, Any]]]:  # Grouped by media_path
    "Group transcription records by their source audio file."
def is_source_selected(
    record_id: str,  # Record ID to check
    provider_id: str,  # Provider ID to check
    selected_sources: List[Dict[str, str]]  # List of selected sources
) -> bool:  # True if source is selected
    "Check if a source is in the selected list by (record_id, provider_id) pair."
def get_selected_media_paths(
    selected_sources: List[Dict[str, str]],  # Current selections (record_id, provider_id)
    all_transcriptions: List[Dict[str, Any]],  # All available transcription records
) -> Set[str]:  # Media paths already represented in selections
    "Get the set of media_paths for currently selected sources."
def filter_transcriptions(
    transcriptions: List[Dict[str, Any]],  # List of transcription records to filter
    search_text: str,  # Search term for case-insensitive substring matching
) -> List[Dict[str, Any]]:  # Filtered transcription records
    "Filter transcriptions by substring match across record_id, media_path, and text fields."
def select_all_in_group(
    transcriptions: List[Dict[str, Any]],  # All transcription records
    group_key: str,  # Group key to match against
    grouping_mode: str,  # Grouping mode: "media_path" or "batch_id"
    selected_sources: List[Dict[str, str]],  # Current selections
    excluded_media_paths: Optional[Set[str]] = None,  # Media paths to skip (already selected)
) -> List[Dict[str, str]]:  # Updated selections with new items appended
    "Add all transcriptions matching a group key to the selection list, skipping duplicates."
def toggle_source_selection(
    record_id: str,  # Record ID to toggle
    provider_id: str,  # Provider ID for the source
    selected_sources: List[Dict[str, str]],  # Current selections
) -> List[Dict[str, str]]:  # Updated selections
    "Toggle a source in or out of the selection list by (record_id, provider_id) pair."
def reorder_item(
    selected_sources: List[Dict[str, str]],  # Current selections
    record_id: str,  # Record ID of item to move
    provider_id: str,  # Provider ID of item to move
    direction: str,  # Direction: "up" or "down"
) -> List[Dict[str, str]]:  # Reordered selections
    "Move an item up or down in the selection list by swapping with its neighbor."
def reorder_sources(
    selected_sources: List[Dict[str, str]],  # Current selections
    new_order_ids: List[str],  # Record IDs in desired order
) -> List[Dict[str, str]]:  # Reordered selections
    "Reorder sources to match the given record ID order."
def calculate_next_tab(
    direction: str,  # Direction: "prev", "next", or a direct tab name
    current_tab: str,  # Currently active tab name
    tabs: List[str],  # Available tab names in order
) -> str:  # New active tab name
    "Calculate the next tab based on direction or direct selection."
def check_audio_exists(
    media_path: str  # Path to audio file
) -> bool:  # True if file exists
    "Check if the audio file exists at the given path."
def validate_browse_path(
    path: str  # Path to validate
) -> str:  # Validated and resolved path, or home directory on error
    "Validate a browse path for security. Returns home directory on invalid input."
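
The selection list used throughout these helpers is an ordered list of `{record_id, provider_id}` dicts. The sketch below shows one way the toggle and neighbor-swap operations described for `toggle_source_selection` and `reorder_item` could behave. The `*_sketch` functions are hypothetical re-implementations; returning a new list and treating an unknown item or an out-of-range move as a no-op are assumptions, not documented behavior.

```python
from typing import Dict, List

Selection = Dict[str, str]


def _index_of(selected: List[Selection], record_id: str, provider_id: str) -> int:
    # Return the position of the matching (record_id, provider_id) pair, or -1.
    for i, s in enumerate(selected):
        if s["record_id"] == record_id and s["provider_id"] == provider_id:
            return i
    return -1


def toggle_sketch(
    record_id: str, provider_id: str, selected: List[Selection]
) -> List[Selection]:
    idx = _index_of(selected, record_id, provider_id)
    if idx >= 0:
        # Already selected: drop it, keeping the order of the rest.
        return selected[:idx] + selected[idx + 1:]
    # Not selected: append to the end of the queue.
    return selected + [{"record_id": record_id, "provider_id": provider_id}]


def reorder_item_sketch(
    selected: List[Selection], record_id: str, provider_id: str, direction: str
) -> List[Selection]:
    idx = _index_of(selected, record_id, provider_id)
    target = idx - 1 if direction == "up" else idx + 1
    # Assumption: unknown items and moves past either end leave the order unchanged.
    if idx < 0 or not 0 <= target < len(selected):
        return list(selected)
    result = list(selected)
    result[idx], result[target] = result[target], result[idx]
    return result


queue = toggle_sketch("r1", "p1", [])
queue = toggle_sketch("r2", "p1", queue)
moved = reorder_item_sketch(queue, "r2", "p1", "up")
after_untoggle = toggle_sketch("r1", "p1", moved)
```

Matching on the pair rather than on `record_id` alone matters in a federated setup, where two providers may hold records with the same ID.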

step_renderer (step_renderer.ipynb)

Phase 1 step renderer: Source Selection & Ordering with two-column layout and collapsible preview

Import

from cjm_transcript_source_select.components.step_renderer import (
    SD_TAB_PREV_BTN,
    SD_TAB_NEXT_BTN,
    SD_PREVIEW_BTN,
    FB_SYSTEM_ID,
    render_selection_step
)

Functions

def _create_parent_keyboard_manager() -> ZoneManager:  # Parent keyboard manager for hierarchy
    "Create the parent keyboard manager with two ghost zones for column switching."
def _render_selection_stats(
    selected_sources: List[Dict[str, str]],  # Selected sources
    transcriptions: List[Dict[str, Any]],  # All transcriptions (for word count)
    oob: bool = False,  # Whether to render as OOB swap
) -> Any:  # Stats component
    "Render the selection statistics (word count and source count)."
def _render_selection_footer(
    selected_sources: List[Dict[str, str]],  # Selected sources
    transcriptions: List[Dict[str, Any]],  # All transcriptions (for word count)
) -> Any:  # Footer component
    "Render the footer with statistics and continue button."
def _render_tab_headers(
    active_tab: str,  # Currently active tab ('db' or 'files')
    tab_switch_url: str = "",  # URL for switching tabs via HTMX
    oob: bool = False,  # Whether to render as OOB swap
) -> Any:  # Tab headers container
    "Render the tab header radio inputs."
def _render_source_tabs(
    active_tab: str,  # Currently active tab ('db' or 'files')
    active_content: Any,  # Content for the currently active tab
    tab_switch_url: str = "",  # URL for switching tabs via HTMX
) -> Any:  # Tabs header + separate content container
    "Render source type tabs with a single shared content container."
def _generate_hierarchy_js(
    active_tab: str,  # Active tab: "db" or "files"
) -> Script:  # Script element with hierarchy wiring and activation logic
    "Generate JavaScript for keyboard system hierarchy and child activation."
def render_selection_step(
    sources: List[Dict[str, Any]],  # Available source plugins
    transcriptions: List[Dict[str, Any]],  # Available transcription records
    selected_sources: List[Dict[str, str]],  # Ordered selection
    grouping_mode: str,  # Grouping mode: "media_path" or "batch_id"
    active_tab: str,  # Active tab: "db" or "files"
    urls: SelectionUrls,  # URL bundle for selection routes
    render_local_files_panel: Optional[Callable] = None,  # Render fn for Files tab content
    sb_state: Any = None,  # SourceBrowserRouterState for DB tab VC rendering
) -> Any:  # FastHTML component
    "Render Phase 1: Source Selection & Ordering step with two-column layout."

Variables

SD_TAB_PREV_BTN = 'sd-tab-prev-btn'
SD_TAB_NEXT_BTN = 'sd-tab-next-btn'
SD_PREVIEW_BTN = 'sd-preview-btn'
FB_SYSTEM_ID = 'lfb-collection'
_ZONE_FOCUS_CLASSES
_VIEWPORT_FIT_CONFIG

tabs (tabs.ipynb)

Tab switching route handlers

Import

from cjm_transcript_source_select.routes.tabs import (
    init_tabs_router
)

Functions

def _handle_tab_switch(
    source_service: SourceService,  # The source service for queries
    request,  # FastHTML request object
    sess,  # FastHTML session object
    direction: str,  # Direction: "prev", "next", "db", or "files"
    urls: SelectionUrls,  # URL bundle for rendering
    current_tab_ref: List[str],  # Mutable ref [current_tab] for closure-based tracking
    render_local_files_panel: Optional[Callable] = None,  # Render fn for Files tab
    sb_state: Any = None,  # SourceBrowserRouterState for DB tab VC rendering
    state_store: WorkflowStateStore = None,  # State store (for reading step state)
    workflow_id: str = "",  # Workflow ID (for reading step state)
):  # Tuple of inner content, OOB tab headers, and tab switch script
    "Switch between Plugin DB and Local Files tabs."
def init_tabs_router(
    state_store: WorkflowStateStore,  # The workflow state store
    workflow_id: str,  # The workflow identifier
    source_service: SourceService,  # The source service for queries
    prefix: str,  # Route prefix (e.g., "/workflow/selection/tabs")
    urls: SelectionUrls,  # URL bundle for rendering
    render_local_files_panel: Optional[Callable] = None,  # Render fn for Files tab content
    sb_state: Any = None,  # SourceBrowserRouterState for DB tab VC rendering
) -> Tuple[APIRouter, Dict[str, Callable]]:  # (router, route_dict)
    "Initialize tab switching routes."
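
The `direction` argument accepts either a relative move (`"prev"`, `"next"`) or a direct tab name (`"db"`, `"files"`). The sketch below shows how such a value could be resolved to a tab name, in the spirit of `calculate_next_tab` from `source_utils`. It is a hypothetical illustration: wrap-around at the ends and falling back to the current tab on unrecognized input are assumptions, and the real function may clamp or behave differently.

```python
from typing import List


def next_tab_sketch(direction: str, current_tab: str, tabs: List[str]) -> str:
    # A direct tab name wins over relative movement.
    if direction in tabs:
        return direction
    # Assumption: unrecognized input leaves the active tab unchanged.
    if current_tab not in tabs or direction not in ("prev", "next"):
        return current_tab
    step = -1 if direction == "prev" else 1
    # Assumption: moving past either end wraps around to the other end.
    return tabs[(tabs.index(current_tab) + step) % len(tabs)]


tabs = ["db", "files"]
forward = next_tab_sketch("next", "db", tabs)
wrapped = next_tab_sketch("next", "files", tabs)
direct = next_tab_sketch("files", "db", tabs)
```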

utils (utils.ipynb)

Display formatting and word counting utilities for the selection step

Import

from cjm_transcript_source_select.utils import (
    count_words,
    format_date,
    format_audio_filename
)

Functions

def count_words(
    text: str  # Text to count words in
) -> int:  # Word count
    "Count the number of whitespace-delimited words in text."
def format_date(
    created_at: str  # ISO date string, Unix timestamp, or similar
) -> str:  # Formatted date for display
    "Format a date string for human-readable display (e.g., 'Jan 20, 2026')."
def format_audio_filename(
    audio_path: str  # Full path to audio file
) -> str:  # Shortened filename for display
    "Extract and format the filename from a path."
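
As a rough illustration of the behavior described for `count_words` and `format_date`, the sketch below uses only the standard library. The `*_sketch` names are hypothetical. Treating numeric strings as Unix timestamps in UTC, returning unparseable input unchanged, and zero-padding the day are assumptions rather than documented behavior.

```python
from datetime import datetime, timezone


def count_words_sketch(text: str) -> int:
    # str.split() with no arguments splits on any whitespace run and drops empties.
    return len(text.split()) if text else 0


def format_date_sketch(created_at: str) -> str:
    try:
        # Assumption: numeric strings are Unix timestamps in seconds, read as UTC.
        parsed = datetime.fromtimestamp(float(created_at), tz=timezone.utc)
    except (ValueError, OverflowError, OSError):
        try:
            parsed = datetime.fromisoformat(created_at)
        except ValueError:
            # Assumption: unparseable input is shown unchanged.
            return created_at
    # %b is the abbreviated month name in the current locale ("Jan" by default).
    return parsed.strftime("%b %d, %Y")


words = count_words_sketch("hello   world\nagain")
iso_label = format_date_sketch("2026-01-20T10:30:00")
ts_label = format_date_sketch("1768910400")
```

Both `iso_label` and `ts_label` come out as `Jan 20, 2026` under the default locale, the display format quoted in the `format_date` docstring.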