# Test TextProcessRow creationrow = TextProcessRow( job_id="job_abc123", input_text="Hello world. How are you?", input_hash="sha256:"+"a"*64, spans=[ {"text": "Hello world.", "start_char": 0, "end_char": 12, "label": "sentence"}, {"text": "How are you?", "start_char": 13, "end_char": 25, "label": "sentence"} ], metadata={"processor": "nltk"})print(f"Row: job_id={row.job_id}")print(f"Input: {row.input_text}")print(f"Spans: {len(row.spans)} spans")
Row: job_id=job_abc123
Input: Hello world. How are you?
Spans: 2 spans
TextProcessStorage
Standardized SQLite storage that all text processing plugins should use. Defines the canonical schema for the text_jobs table with input hashing for traceability.
Schema:
CREATETABLEIFNOTEXISTS text_jobs (idINTEGERPRIMARYKEY AUTOINCREMENT, job_id TEXT UNIQUENOTNULL, input_text TEXT NOTNULL, input_hash TEXT NOTNULL, spans JSON, metadata JSON, created_at REALNOTNULL);
The input_hash column stores a hash of the input text in "algo:hexdigest" format, enabling downstream consumers to verify that the source text hasn’t changed since processing.
TextProcessStorage
def TextProcessStorage( db_path:str, # Absolute path to the SQLite database file):
Standardized SQLite storage for text processing results.
Retrieved: job_test_001
Input: Hello world. How are you?
Spans: 2 spans
Input hash: sha256:1d473b202b6fea30ab890b1...
get_by_job_id returns None for missing job: OK
# Save another and test list_jobsstorage.save( job_id="job_test_002", input_text="Second text.", input_hash=hash_bytes(b"Second text."), spans=[{"text": "Second text.", "start_char": 0, "end_char": 12, "label": "sentence"}])jobs = storage.list_jobs()assertlen(jobs) ==2assert jobs[0].job_id =="job_test_002"# Newest firstprint(f"list_jobs returned {len(jobs)} rows: {[j.job_id for j in jobs]}")
list_jobs returned 2 rows: ['job_test_002', 'job_test_001']
# Test input verificationassert storage.verify_input("job_test_001") ==Trueprint("verify_input with unchanged text: True")# Tamper with input text directly in DBwith sqlite3.connect(tmp_db.name) as con: con.execute("UPDATE text_jobs SET input_text = 'TAMPERED' WHERE job_id = 'job_test_001'")assert storage.verify_input("job_test_001") ==Falseprint("verify_input after tampering: False")# Missing job returns Noneassert storage.verify_input("nonexistent") isNoneprint("verify_input for missing job: None")
verify_input with unchanged text: True
verify_input after tampering: False
verify_input for missing job: None