Standardized SQLite storage for forced alignment results with content hashing
ForcedAlignmentRow
A dataclass representing a single row in the standardized forced alignments table. This provides a type-safe way to work with stored forced alignment results.
Row: job_id=fa_job_abc123, text=Hello world
Audio hash: sha256:aaaaaaaaaaaaa...
Text hash: sha256:bbbbbbbbbbbbb...
ForcedAlignmentStorage
Standardized SQLite storage that all forced alignment plugins should use. Defines the canonical schema for the forced_alignments table with content hash columns for traceability.
Schema:
CREATETABLEIFNOTEXISTS forced_alignments (idINTEGERPRIMARYKEY AUTOINCREMENT, job_id TEXT UNIQUENOTNULL, audio_path TEXT NOTNULL, audio_hash TEXT NOTNULL, text TEXT NOTNULL, text_hash TEXT NOTNULL, items JSON, metadata JSON, created_at REALNOTNULL);
The audio_hash and text_hash columns use the self-describing "algo:hexdigest" format (e.g., "sha256:a3f2b8..."), enabling downstream consumers to verify content integrity.
Retrieved: fa_test_001
Text: November the 10th, Wednesday, 9 p.m.
Items: 6 words
First item: {'text': 'November', 'start_time': 1.04, 'end_time': 1.6}
Created at: 1773970556.1129797
# Save another and test list_jobsstorage.save( job_id="fa_test_002", audio_path="/tmp/test_audio_2.mp3", audio_hash="sha256:"+"f"*64, text="Second alignment test.", text_hash=hash_bytes(b"Second alignment test."), items=[{"text": "Second", "start_time": 0.0, "end_time": 0.5}],)jobs = storage.list_jobs()assertlen(jobs) ==2# Newest firstassert jobs[0].job_id =="fa_test_002"assert jobs[1].job_id =="fa_test_001"print(f"list_jobs returned {len(jobs)} rows (newest first): {[j.job_id for j in jobs]}")
list_jobs returned 2 rows (newest first): ['fa_test_002', 'fa_test_001']
# Test text verificationassert storage.verify_text("fa_test_001") ==Trueprint("verify_text with unchanged text: True")# Tamper with text directly in DBwith sqlite3.connect(tmp_db.name) as con: con.execute("UPDATE forced_alignments SET text = 'TAMPERED' WHERE job_id = 'fa_test_001'")assert storage.verify_text("fa_test_001") ==Falseprint("verify_text after tampering: False")# Missing job returns Noneassert storage.verify_text("nonexistent") isNoneprint("verify_text for missing job: None")
verify_text with unchanged text: True
verify_text after tampering: False
verify_text for missing job: None