A voice session is a real-time call room backed by LiveKit and Gemini Live. You start a session against a voice agent, connect your client using the returned credentials, and then complete the session with a transcript to get structured extraction results.

Endpoints

| Method | Path | Description |
|---|---|---|
| POST | /api/v1/voice/sessions | Start a session |
| GET | /api/v1/voice/sessions/:id | Get session status |
| POST | /api/v1/voice/sessions/:id/complete | Complete with transcript and get structured results |
| POST | /api/v1/voice/webhooks/livekit | LiveKit-signed webhook receiver |

Authentication

Send your Pi API key as a bearer token on every request except the LiveKit webhook:

Authorization: Bearer <your_api_key>
POST /api/v1/voice/webhooks/livekit does not use Pi API key authentication. LiveKit signs the raw request body; Pi verifies the signature using your LiveKit webhook credentials.

Start a session

POST /api/v1/voice/sessions
Creates a LiveKit room, inserts a session record, and mints both a LiveKit user JWT and a Gemini Live ephemeral token. Returns 201 with status: "active".

Request parameters

agent_id
string
required
UUID of the voice agent to use for this session. The agent must be active.
participant
object
required
Identity of the call participant.
context
object
JSON object with any caller context you want to make available to the agent (e.g. account details, prior conversation history). Serialized length must be at most 16,000 characters.
ttl_seconds
number
Expiry window in seconds for the LiveKit room, JWT, and Gemini ephemeral token. Range: 60–3600. Default: 600. When you also set max_duration_seconds, ttl_seconds must be greater than or equal to max_duration_seconds; otherwise the request fails with voice_session_ttl_too_short.
max_duration_seconds
number
Maximum call length in seconds for this session. Range: 60–1800. Overrides the agent’s behaviors.max_duration_seconds for this session only. When set, your client is responsible for ending the call and disconnecting from LiveKit when this duration elapses.
voice
object
Override the agent’s voice configuration for this session only.
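Because the API rejects out-of-range values, a client may want to check a request body before sending it. The Python sketch below encodes the constraints listed above. The function name validate_start_session_body is hypothetical, and the sketch assumes two things the reference does not spell out: that "serialized length" means the length of compact JSON text, and that the default ttl_seconds of 600 is used in the ttl/max_duration comparison when ttl_seconds is omitted.

```python
import json
from typing import Any, Optional


def validate_start_session_body(body: dict[str, Any]) -> list[str]:
    """Return problems with a start-session body (hypothetical helper; empty list means OK)."""
    problems: list[str] = []
    if not body.get("agent_id"):
        problems.append("agent_id is required")
    if not isinstance(body.get("participant"), dict):
        problems.append("participant must be an object")

    context = body.get("context")
    if context is not None:
        # Assumption: "serialized length" is the length of compact JSON text.
        serialized = json.dumps(context, separators=(",", ":"))
        if len(serialized) > 16_000:
            problems.append("context serializes to more than 16,000 characters")

    # Documented default; assumption: it applies to the comparison below when omitted.
    ttl = body.get("ttl_seconds", 600)
    if not 60 <= ttl <= 3600:
        problems.append("ttl_seconds must be between 60 and 3600")

    max_duration: Optional[int] = body.get("max_duration_seconds")
    if max_duration is not None:
        if not 60 <= max_duration <= 1800:
            problems.append("max_duration_seconds must be between 60 and 1800")
        if ttl < max_duration:
            # The API reports this case as voice_session_ttl_too_short (400).
            problems.append("ttl_seconds must be >= max_duration_seconds")
    return problems
```

The example request that follows (ttl_seconds 900, max_duration_seconds 600) passes every check.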

Example request

```bash
curl -X POST "https://api.trypi.ai/api/v1/voice/sessions" \
  -H "Authorization: Bearer pi_live_***" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "participant": {
      "identity": "user_abc123",
      "name": "Alice"
    },
    "context": {
      "account_tier": "pro",
      "prior_purchases": 3
    },
    "ttl_seconds": 900,
    "max_duration_seconds": 600
  }'
```
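For clients written in Python, the same request can be built with the standard library alone. This is a minimal sketch: build_start_session_request and API_BASE are illustrative names, and the function only constructs the request; sending it is left to the caller.

```python
import json
import urllib.request

API_BASE = "https://api.trypi.ai/api/v1"  # base URL taken from the curl example


def build_start_session_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /voice/sessions request (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{API_BASE}/voice/sessions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen. That call raises urllib.error.HTTPError for non-2xx responses, such as the 404 and 502 error codes for this endpoint, so error handling belongs around it.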

Response (201)

status
string
Top-level session status: "active".
data.session_id
string
UUID of the created session.
data.agent_id
string
UUID of the agent used for this session.
data.connection.livekit.url
string
LiveKit server URL. Use the wss:// URL when connecting from a browser or mobile client.
data.connection.livekit.token
string
LiveKit user JWT. Pass this to the LiveKit SDK to join the room.
data.connection.gemini_live.url
string
Gemini Live ephemeral WebSocket URL.
data.connection.gemini_live.token
string
Gemini Live ephemeral token. Valid for the ttl_seconds window.
data.system_instruction
string
The compiled system instruction locked into the session.
data.expires_at
number
Unix timestamp when the session credentials expire.
data.max_duration_seconds
number | null
Effective call cap in seconds, or null if no duration limit is set.
When data.max_duration_seconds is non-null, your client must track elapsed time from the start of the call and disconnect from LiveKit no later than the moment that many seconds have elapsed. The server does not end the session automatically.
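Because the server does not end capped calls, the client needs both the cap and its own record of when the call started. The sketch below reads the documented response fields into a small dataclass and computes when to disconnect. VoiceSessionConnection is a hypothetical name, and the sketch assumes the client records the call start as a Unix timestamp in seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceSessionConnection:
    """Hypothetical client-side view of a start-session (201) response."""
    session_id: str
    livekit_url: str
    livekit_token: str
    max_duration_seconds: Optional[int]

    @classmethod
    def from_start_response(cls, payload: dict) -> "VoiceSessionConnection":
        data = payload["data"]
        livekit = data["connection"]["livekit"]
        return cls(
            session_id=data["session_id"],
            livekit_url=livekit["url"],
            livekit_token=livekit["token"],
            max_duration_seconds=data.get("max_duration_seconds"),
        )

    def disconnect_deadline(self, call_started_at: float) -> Optional[float]:
        """Unix time (seconds) at which to leave the room, or None if uncapped."""
        if self.max_duration_seconds is None:
            return None
        return call_started_at + self.max_duration_seconds

    def should_disconnect(self, call_started_at: float, now: float) -> bool:
        deadline = self.disconnect_deadline(call_started_at)
        return deadline is not None and now >= deadline
```

A client could call should_disconnect on a periodic timer and, when it returns True, invoke its LiveKit SDK's room-disconnect method.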

Error codes

| Code | HTTP | Description |
|---|---|---|
| voice_agent_not_found | 404 | No active agent with the given agent_id. |
| voice_gemini_ephemeral_failed | 502 | Failed to mint a Gemini Live ephemeral token. |
| voice_session_ttl_too_short | 400 | ttl_seconds is less than max_duration_seconds. |

Get session status

GET /api/v1/voice/sessions/:id
Returns the current state of a session, including transcript and extracted results when available.

Path parameters

id
string
required
UUID of the voice session.

Query parameters

wait_for_completion
boolean
Set to true to long-poll until the session reaches a terminal status ("completed" or "failed") or the timeout elapses.
timeout_seconds
number
Long-poll timeout in seconds. Range: 1–120.
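The query parameters above can be assembled as follows. This sketch assumes the boolean is sent as the string "true". session_status_url is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import quote, urlencode

API_BASE = "https://api.trypi.ai/api/v1"


def session_status_url(session_id: str, wait_for_completion: bool = False,
                       timeout_seconds: Optional[int] = None) -> str:
    """Build a GET /voice/sessions/:id URL with optional long-poll parameters."""
    url = f"{API_BASE}/voice/sessions/{quote(session_id, safe='')}"
    params: dict[str, str] = {}
    if wait_for_completion:
        params["wait_for_completion"] = "true"  # assumption: lowercase string form
    if timeout_seconds is not None:
        if not 1 <= timeout_seconds <= 120:
            raise ValueError("timeout_seconds must be between 1 and 120")
        params["timeout_seconds"] = str(timeout_seconds)
    return f"{url}?{urlencode(params)}" if params else url
```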

Response fields

data.session_id
string
UUID of the session.
data.status
string
Session status: "active", "completed", or "failed".
data.transcript
array
Array of transcript entries once available (set via POST .../complete).
data.results
object
Structured extraction results once available.
data.max_duration_seconds
number | null
Effective call cap, or null.
data.expires_at
number
Unix timestamp when the session credentials expire.
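A client waiting for results can repeat the status request until data.status is "completed" or "failed". In the sketch below, fetch_status is a hypothetical caller-supplied function that performs the GET and returns the parsed JSON. For example, it might use wait_for_completion=true so that each call blocks for up to timeout_seconds. The attempt limit and delay are arbitrary illustrative values.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_terminal(fetch_status: Callable[[], dict], max_attempts: int = 10,
                      delay_seconds: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_status until data.status is terminal; return that payload."""
    for attempt in range(max_attempts):
        payload = fetch_status()
        if payload["data"]["status"] in TERMINAL_STATUSES:
            return payload
        if attempt < max_attempts - 1:
            sleep(delay_seconds)  # pause between attempts, but not after the last
    raise TimeoutError("session did not reach a terminal status")
```

Injecting fetch_status and sleep keeps the loop independent of any HTTP library and makes it easy to test without network access.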

Complete a session

POST /api/v1/voice/sessions/:id/complete
Submit the call transcript to finalize the session. If the agent has output_schema keys or output_schema_strict set, the backend runs a non-live Gemini extraction pass and returns structured results. Otherwise it stores a transcript summary.

Path parameters

id
string
required
UUID of the active voice session to complete.

Request parameters

transcript
array
required
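Array of transcript entries from the call. Minimum 1 entry, maximum 5,000 entries. Each entry is an object; the example request below uses role ("agent" or "user"), text, and timestamp (a Unix timestamp).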
duration_seconds
number
Actual call duration in seconds. Range: 0–86400.
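The sketch below builds a completion body and checks the documented limits before sending. The TranscriptEntry field names come from the example request and are illustrative, since the full entry schema is not listed in this reference. build_complete_body is a hypothetical name.

```python
from typing import TypedDict


class TranscriptEntry(TypedDict):
    # Field names mirror the example request; treat them as illustrative.
    role: str
    text: str
    timestamp: int


def build_complete_body(entries: list[TranscriptEntry], duration_seconds: float) -> dict:
    """Return a POST .../complete body after checking the documented ranges."""
    if not 1 <= len(entries) <= 5000:
        raise ValueError("transcript must contain 1 to 5,000 entries")
    if not 0 <= duration_seconds <= 86400:
        raise ValueError("duration_seconds must be between 0 and 86400")
    return {"transcript": list(entries), "duration_seconds": duration_seconds}
```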

Example request

```bash
curl -X POST "https://api.trypi.ai/api/v1/voice/sessions/SESSION_ID/complete" \
  -H "Authorization: Bearer pi_live_***" \
  -H "Content-Type: application/json" \
  -d '{
    "transcript": [
      { "role": "agent", "text": "Hi! How can I help you today?", "timestamp": 1710000000 },
      { "role": "user", "text": "I am looking for a Pro plan upgrade.", "timestamp": 1710000005 },
      { "role": "agent", "text": "Great, what is your team size?", "timestamp": 1710000008 }
    ],
    "duration_seconds": 120
  }'
```

Response fields

The response mirrors GET /api/v1/voice/sessions/:id and also carries top-level results and extraction_warnings fields. The full set of fields:
results
object
Top-level structured extraction results aligned with the agent’s output_schema or output_schema_strict.
extraction_warnings
array
Present when strict-schema validation finds issues during extraction. Also mirrored under metadata.pi_extraction_warnings.
data.session_id
string
UUID of the session.
data.status
string
"completed" or "failed".
data.transcript
array
The stored transcript entries.
data.results
object
Structured results (same as top-level results).
data.duration_seconds
number
Stored call duration.
data.max_duration_seconds
number | null
Effective call cap.
data.error_log
array
Any non-fatal errors logged during completion.
data.expires_at
number
Unix timestamp of credential expiry.
data.created_at
number
Unix timestamp of session creation.
data.updated_at
number
Unix timestamp of the last update.
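Because results appears both at the top level and under data, and extraction_warnings is present only when strict-schema validation reports issues, a client may want to normalize the completion response. Here is a minimal sketch with a hypothetical helper name. It prefers the top-level results and defaults absent lists to empty.

```python
from typing import Any


def summarize_completion(payload: dict[str, Any]) -> dict[str, Any]:
    """Flatten a completion response into the fields a caller usually needs."""
    data = payload.get("data", {})
    return {
        "status": data.get("status"),
        # Top-level results mirror data.results; fall back to data.results if absent.
        "results": payload.get("results", data.get("results")),
        # Only present when strict-schema validation flags issues.
        "warnings": payload.get("extraction_warnings", []),
        "error_log": data.get("error_log", []),
    }
```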

Error codes

| Code | HTTP | Description |
|---|---|---|
| voice_session_not_active | 409 | The session is not in active status. |
| voice_result_extraction_failed | 502 | The Gemini extraction pass failed. |

LiveKit webhook

POST /api/v1/voice/webhooks/livekit
Receives signed event payloads from LiveKit. Configure this URL in your LiveKit project dashboard.
This route does not accept Pi API key authentication. LiveKit signs the raw request body using your webhook credentials (LIVEKIT_WEBHOOK_API_KEY / LIVEKIT_WEBHOOK_API_SECRET). Pi verifies the signature before processing the event.
On room_finished events, Pi may merge metadata into the matching session record. Final session completion and structured extraction still require an explicit call to POST /api/v1/voice/sessions/:id/complete.
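Pi performs this verification itself, so you do not need to implement it. The sketch below only illustrates why the raw body matters. It assumes, based on LiveKit's server SDKs rather than on this API, that LiveKit sends an HS256-signed JWT in the Authorization header and that the JWT's sha256 claim is the base64-encoded SHA-256 digest of the raw request body. It omits the expiry and issuer checks a real receiver needs, and verify_livekit_webhook is a hypothetical name.

```python
import base64
import hashlib
import hmac
import json


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_livekit_webhook(raw_body: bytes, auth_header: str, api_secret: str) -> dict:
    """Return JWT claims if signature and body hash match; raise ValueError otherwise.

    Sketch only (assumed format): skips exp/nbf/iss validation.
    """
    token = auth_header.removeprefix("Bearer ").strip()
    header_b64, payload_b64, signature_b64 = token.split(".")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected JWT algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(api_secret.encode(), signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    # The digest covers the exact bytes received, not a re-serialized JSON object.
    body_hash = base64.b64encode(hashlib.sha256(raw_body).digest()).decode("ascii")
    if not hmac.compare_digest(claims.get("sha256", ""), body_hash):
        raise ValueError("body hash mismatch")
    return claims
```

Under this scheme, any proxy between LiveKit and Pi that re-serializes or re-encodes the JSON body changes the digest, and verification fails.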