Batch Upload

Process content in bulk for governance, compliance, or performance optimization. Upload historical records or offload real-time extraction to an async pipeline.

When to Use Batch

Retroactive Governance

Process historical content that predates your DeepaData integration. With permission, existing records become governed artifacts with full audit trails.

  • Years of interaction history → governed artifacts
  • EU AI Act compliance for historical data
  • Populate Observe dashboards from day one

Async Processing

Offload extraction from the real-time request path. Upload content in batches for better performance, lower latency, and energy efficiency.

  • Non-blocking extraction for high-volume apps
  • Process during off-peak hours
  • Reduce API call overhead with bulk operations

Key Benefits

Day-1 Provenance

Don't wait to accumulate governed data. Process your historical corpus and have audit-ready records immediately.

Populate Observe

Historical extractions feed into Observe metrics. See drift, escalation patterns, and emotional exposure from your full history.

Compliance Ready

Retrospective pathway marks artifacts as backfill with explicit permission. Audit-ready governance for historical data.

Permission Model

Batch processing uses the retrospective issuance pathway. This requires explicit permission from either the data subject or organizational authority.

Retrospective pathway requirements

  • Content must be voluntarily expressed text (interpreted from meaning, not derived from behavioral signals)
  • Subject must have consented to the original collection
  • Reprocessing must be permitted under original consent or new explicit permission
  • Artifacts are marked with pathway: "retrospective"

Batch Operations

Batch Upload supports two operations, corresponding to the two capture modes.

extract

Full 96-field EDM artifact extraction. ~15 seconds per record. Creates artifacts that can be sealed into .ddna envelopes.

Max content: 50,000 characters per record

observe

Lightweight salience capture. ~5 seconds per record. Creates Salience Records for trigger/escalation analysis. Requires subject_id.

Max content: 20,000 characters per record
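The per-operation limits above can be enforced client-side before uploading, which avoids burning a batch slot on records that will be rejected. A minimal sketch (limits and field names taken from this page; nothing here is an official SDK):

```python
# Documented per-operation content limits, in characters.
LIMITS = {"extract": 50_000, "observe": 20_000}

def validate_record(record: dict, operation: str) -> list[str]:
    """Return a list of problems with one record; empty means it looks uploadable."""
    problems = []
    content = record.get("content", "")
    if not content:
        problems.append("missing content")
    elif len(content) > LIMITS[operation]:
        problems.append(f"content exceeds {LIMITS[operation]:,} chars for {operation}")
    # observe creates Salience Records keyed to a subject, so subject_id is required.
    if operation == "observe" and not record.get("subject_id"):
        problems.append("observe requires subject_id")
    return problems
```

For example, `validate_record({"content": "hi"}, "observe")` flags the missing `subject_id` before the batch ever leaves your infrastructure.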

API Usage

Upload returns immediately with a job_id. Poll the status endpoint to track progress.

# Upload batch
curl -X POST https://www.deepadata.com/api/v1/batch/upload \
  -H "Authorization: Bearer dda_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "extract",
    "name": "Q1 2025 therapy sessions",
    "records": [
      { "content": "First session transcript..." },
      { "content": "Second session transcript..." }
    ]
  }'

# Response includes job_id
# { "data": { "job_id": "job-01HZ...", "status": "pending" } }

# Poll for status
curl https://www.deepadata.com/api/v1/batch/status?job_id=job-01HZ... \
  -H "Authorization: Bearer dda_live_YOUR_KEY"
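The upload-then-poll flow above can be wrapped in a small loop. A sketch, with the HTTP call injected as a function so any client (curl subprocess, requests, urllib) can supply it; `fetch_status` is a hypothetical helper, not part of the API:

```python
import time

def poll_job(job_id: str, fetch_status, interval: float = 5.0, timeout: float = 600.0) -> dict:
    """Poll until the job reaches a terminal state ('completed' or 'failed').

    fetch_status(job_id) must return the parsed "data" object from
    GET /api/v1/batch/status?job_id=...
    """
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_status(job_id)
        if data["status"] in ("completed", "failed"):
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {data['status']} after {timeout}s")
        time.sleep(interval)
```

The 5-second default interval is an illustrative choice; pick one that suits your batch sizes given the per-record timings below.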

Input Formats

Batch Upload accepts JSON or CSV. CSV uploads pass the operation as a query parameter.

JSON format

{
  "operation": "observe",
  "name": "Historical chats",
  "records": [
    {
      "content": "User message content...",
      "subject_id": "user-123"
    },
    {
      "content": "Another message...",
      "subject_id": "user-456"
    }
  ]
}
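If records live in application data structures, the JSON body can be assembled programmatically rather than hand-written. A sketch using only the standard library (`build_upload_body` is a hypothetical helper name):

```python
import json

def build_upload_body(operation: str, name: str, records: list[dict]) -> str:
    """Assemble the JSON body for POST /api/v1/batch/upload."""
    return json.dumps({"operation": operation, "name": name, "records": records})
```

The result is passed as the request body with Content-Type: application/json, exactly as in the curl example above.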

CSV format

content,subject_id
"First passage text...",user-123
"Second passage text...",user-456
"Third passage text...",user-789

Upload with Content-Type: text/csv and pass the operation as a query parameter (e.g. ?operation=observe).
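Generating the CSV body with a proper serializer avoids quoting mistakes around commas and embedded quotes. A sketch using Python's csv module, with the column names from the example above:

```python
import csv
import io

def records_to_csv(records: list[dict]) -> str:
    """Serialize records to the two-column CSV body expected by Batch Upload."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["content", "subject_id"], lineterminator="\n")
    writer.writeheader()
    for rec in records:
        writer.writerow({"content": rec["content"], "subject_id": rec.get("subject_id", "")})
    return buf.getvalue()
```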

Limits & Timing

Limit                           Value
Max records per batch           1,000
Max content length (extract)    50,000 characters
Max content length (observe)    20,000 characters
Processing time (extract)       ~15 seconds/record
Processing time (observe)       ~5 seconds/record
Typical batch completion        30-120 seconds
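Because a single batch caps at 1,000 records, larger corpora have to be split before upload. A minimal chunking sketch:

```python
def chunk_records(records: list, max_per_batch: int = 1_000) -> list[list]:
    """Split a record list into batch-sized chunks (documented max: 1,000 per batch)."""
    return [records[i:i + max_per_batch] for i in range(0, len(records), max_per_batch)]
```

Each chunk then becomes the `records` array of one upload request.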

Status Response

{
  "success": true,
  "data": {
    "job_id": "job-01HZ3GKWP7XTJY9QN4RD",
    "name": "Q1 2025 therapy sessions",
    "operation": "extract",
    "status": "completed",
    "total_records": 150,
    "processed_records": 150,
    "failed_records": 2,
    "progress_percent": 100,
    "created_at": "2026-02-24T10:30:00.000Z",
    "completed_at": "2026-02-24T10:45:30.000Z"
  }
}
Status values:

  • pending: Job queued, not yet started
  • processing: Records being processed
  • completed: All records processed
  • failed: Job failed (see error_message)
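Note that a completed job can still contain per-record failures (failed_records above). A sketch of a health check that gates the next stage on an error-rate threshold; the 1% default is an illustrative assumption, not a platform recommendation:

```python
def batch_healthy(status_data: dict, max_error_rate: float = 0.01) -> bool:
    """True if the job completed and per-record failures stay under the threshold."""
    if status_data["status"] != "completed":
        return False
    total = status_data["total_records"]
    return total > 0 and status_data["failed_records"] / total <= max_error_rate
```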

Enterprise Workflow

For large-scale historical ingestion, we recommend a phased approach.

1

Sample extraction

Run a batch of 100 representative records. Review extraction quality and identify any content patterns that need preprocessing.

2

Permission audit

Verify consent basis for historical data. Document the permission model for retrospective processing.

3

Staged ingestion

Process in batches of 500-1000 records. Monitor job status and error rates. Address failures before proceeding.

4

Seal high-value records

After extraction, seal records that require long-term retention via /v1/issue with pathway: "retrospective".
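The staged-ingestion step can be sketched as a driver loop. Here `upload_batch` and `wait_for_job` are hypothetical stand-ins for your HTTP client (e.g. the upload and polling calls shown earlier), injected so the orchestration logic stays testable:

```python
def staged_ingest(records: list[dict], upload_batch, wait_for_job, batch_size: int = 500) -> list[dict]:
    """Upload records in fixed-size stages, halting if a stage fails outright."""
    results = []
    for start in range(0, len(records), batch_size):
        stage = records[start:start + batch_size]
        job_id = upload_batch(stage)       # POST /api/v1/batch/upload
        status = wait_for_job(job_id)      # poll GET /api/v1/batch/status
        results.append(status)
        if status["status"] == "failed":
            break                          # address failures before proceeding
    return results
```

The 500-record default matches the lower end of the recommended 500-1000 stage size.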
