Gene annotation
Finds genes in long DNA sequences. A two-stage pipeline first detects transcription start sites and polyadenylation signals (a 4-class task), then filters intragenic positions (a 6-class task) to produce transcript intervals. The model uses a ModernBERT backbone, is trained on human transcript annotations, and predicts on the plus strand.
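The two stages can be pictured as a decoding step that turns per-position calls into intervals. The sketch below is a toy illustration under stated assumptions, not the model's documented procedure. It assumes stage 1 yields positions flagged as transcription start sites (TSS) and polyadenylation signals (PAS), and that stage 2 yields one intragenic flag per position. The pairing rule, the min_fraction threshold, and the names decode_transcripts and Interval are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int  # 0-based, inclusive: the TSS position
    end: int    # 0-based, exclusive: one past the PAS position


def decode_transcripts(tss, pas, intragenic, min_fraction=0.5):
    """Toy stage-2 decoder (hypothetical, not the model's actual logic).

    tss, pas: 0-based positions flagged by a hypothetical stage-1 classifier.
    intragenic: one bool per position from a hypothetical stage-2 classifier.
    Each TSS is paired with the nearest PAS strictly downstream (plus strand),
    and the pair is kept if at least min_fraction of its positions are intragenic.
    """
    pas = sorted(pas)
    intervals = []
    j = 0
    for start in sorted(tss):
        # Skip PAS calls at or upstream of this TSS; tss is sorted, so j only moves forward.
        while j < len(pas) and pas[j] <= start:
            j += 1
        if j == len(pas):
            break  # no downstream PAS remains for this or any later TSS
        end = pas[j] + 1
        span = intragenic[start:end]
        if span and sum(span) / len(span) >= min_fraction:
            intervals.append(Interval(start, end))
    return intervals
```

In this toy version several start sites may share one downstream signal, which loosely mirrors alternative transcription starts; a real decoder would likely also score and deduplicate candidates.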
Input
A DNA sequence, passed inline in the sequence field or referenced by a server-side handle. The model processes long sequences (up to 100,000 bp) using an 8,192 bp context window with a 4,096 bp prediction window. Predictions are plus-strand oriented.
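One way to see how an 8,192 bp context window and a 4,096 bp prediction window can cover an input of up to 100,000 bp is to tile the sequence. This sketch assumes the prediction window sits at the center of the context window, leaving 2,048 bp of flank on each side, and that context is clipped at the sequence ends. The service may place or pad windows differently, and tile_windows is a hypothetical name.

```python
CONTEXT_BP = 8_192      # context window from the docs
PREDICT_BP = 4_096      # prediction window from the docs
MAX_INPUT_BP = 100_000  # input limit from the docs
# Assumption: the prediction window is centered in the context window.
FLANK_BP = (CONTEXT_BP - PREDICT_BP) // 2  # 2,048 bp per side


def tile_windows(seq_len):
    """Return (ctx_start, ctx_end, pred_start, pred_end) tuples, 0-based and
    half-open, whose prediction windows cover [0, seq_len) exactly once."""
    if not 0 < seq_len <= MAX_INPUT_BP:
        raise ValueError(f"sequence length must be 1..{MAX_INPUT_BP} bp, got {seq_len}")
    tiles = []
    for pred_start in range(0, seq_len, PREDICT_BP):
        pred_end = min(pred_start + PREDICT_BP, seq_len)
        # Assumption: context is clipped at the sequence ends rather than padded.
        ctx_start = max(0, pred_start - FLANK_BP)
        ctx_end = min(seq_len, pred_end + FLANK_BP)
        tiles.append((ctx_start, ctx_end, pred_start, pred_end))
    return tiles
```

Under these assumptions a 100,000 bp input yields 25 windows, and the last prediction window covers only the final 1,696 bp.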
Annotation is the longest-running task, so it runs asynchronously: submit with the Prefer: respond-async header, capture the job_id from the response, and poll GET /v1/tasks/jobs/{job_id} for progress until the result is ready.
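The submit-and-poll flow can be sketched with the Python standard library. Only the endpoint paths and the Prefer: respond-async header come from this page. The host in BASE_URL, the bearer-token Authorization header, the {"sequence": ...} request body, a top-level job_id in the submit response, and a status field with the values succeeded and failed are hypothetical placeholders to adapt to the real API reference.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host


def _call(method, path, token, body=None, headers=None):
    """Send one JSON request and return the decoded JSON response."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Accept", "application/json")
    req.add_header("Authorization", f"Bearer {token}")  # assumed auth scheme
    if data is not None:
        req.add_header("Content-Type", "application/json")
    for name, value in (headers or {}).items():
        req.add_header(name, value)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def submit_annotation(sequence, token):
    """POST the sequence asynchronously and return the job id."""
    reply = _call(
        "POST",
        "/v1/tasks/annotation/predict",
        token,
        body={"sequence": sequence},  # assumed request shape
        headers={"Prefer": "respond-async"},
    )
    return reply["job_id"]  # assumed to be a top-level field


def fetch_job(job_id, token):
    return _call("GET", f"/v1/tasks/jobs/{job_id}", token)


def wait_for_job(fetch, poll_seconds=5.0, timeout_seconds=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch() until the job succeeds, fails, or the timeout passes."""
    deadline = clock() + timeout_seconds
    while True:
        job = fetch()
        status = job.get("status")  # hypothetical field and values
        if status == "succeeded":
            return job
        if status == "failed":
            raise RuntimeError(f"annotation job failed: {job}")
        if clock() >= deadline:
            raise TimeoutError("annotation job did not finish before the timeout")
        sleep(poll_seconds)
```

Typical use would be job_id = submit_annotation(seq, token) followed by wait_for_job(lambda: fetch_job(job_id, token)). Passing sleep and clock in as parameters keeps the polling loop testable without a live server.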
Output
Transcript intervals, returned as BED and JSON in the {data, meta} envelope.
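The exact JSON field names are not spelled out here, so the most portable way to consume the intervals is the BED output. BED has a standard column layout: chromosome, 0-based start, exclusive end, then optional name, score, and strand. The sketch below parses BED text into records. It assumes the BED payload has already been pulled out of the {data, meta} envelope as a string, since where it sits in the envelope is not specified here. The names parse_bed and BedRecord are hypothetical.

```python
from typing import NamedTuple, Optional


class BedRecord(NamedTuple):
    chrom: str
    start: int              # 0-based, inclusive
    end: int                # 0-based, exclusive
    name: str
    score: Optional[float]  # None when the column is "." or absent
    strand: str


def parse_bed(text):
    """Parse tab-separated BED3 to BED6 lines, skipping blank, comment,
    track, and browser lines."""
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else "."
        score = float(fields[4]) if len(fields) > 4 and fields[4] != "." else None
        strand = fields[5] if len(fields) > 5 else "."
        records.append(BedRecord(chrom, start, end, name, score, strand))
    return records
```

Because BED intervals are half-open, end - start gives each transcript's length in bp directly.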
Try it
REST API: POST /v1/tasks/annotation/predict with Prefer: respond-async. See the getting-started guide and the async annotation recipe.
MCP: "Find the genes in chr8:127,680,000-127,800,000." See mcp.md.
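The MCP example region spans about 120 kb (127,800,000 minus 127,680,000). That is more than the 100,000 bp limit given under Input, and this page does not say whether the MCP tool splits such regions internally. A client calling the REST endpoint directly could parse the region and split it into chunks of at most 100,000 bp, as sketched below. The sketch assumes the region string follows genome-browser convention (1-based, inclusive) and converts it to 0-based half-open coordinates. The names parse_region and chunk_region are hypothetical. Because the chunks do not overlap, a transcript that crosses a chunk boundary would be cut in two.

```python
import re

MAX_INPUT_BP = 100_000  # input limit from the docs

_REGION = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def parse_region(region):
    """Parse 'chr8:127,680,000-127,800,000' (assumed 1-based, inclusive)
    into (chrom, start0, end0) with 0-based, half-open coordinates."""
    m = _REGION.match(region.strip())
    if m is None:
        raise ValueError(f"unrecognized region: {region!r}")
    start1 = int(m["start"].replace(",", ""))
    end1 = int(m["end"].replace(",", ""))
    if start1 < 1 or end1 < start1:
        raise ValueError(f"invalid coordinates in {region!r}")
    return m["chrom"], start1 - 1, end1


def chunk_region(start0, end0, max_len=MAX_INPUT_BP):
    """Split [start0, end0) into consecutive, non-overlapping pieces of at most max_len bp."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [(s, min(s + max_len, end0)) for s in range(start0, end0, max_len)]
```

For the MCP example this gives a 120,001 bp interval split into one 100,000 bp chunk and one 20,001 bp chunk.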