Voice Sessions

A voice session is a real-time call room backed by LiveKit and Gemini Live. You start a session against a voice agent, connect your client using the returned credentials, and then complete the session with a transcript to get structured extraction results.

Endpoints

Method	Path	Description
`POST`	`/api/v1/voice/sessions`	Start a session
`GET`	`/api/v1/voice/sessions/:id`	Get session status
`POST`	`/api/v1/voice/sessions/:id/complete`	Complete with transcript and get structured results
`POST`	`/api/v1/voice/webhooks/livekit`	LiveKit-signed webhook receiver

Authentication

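Authenticate every request except the LiveKit webhook by sending your Pi API key as a bearer token:

Authorization: Bearer <your_api_key>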

POST /api/v1/voice/webhooks/livekit does not use Pi API key authentication. LiveKit signs the raw request body; Pi verifies the signature using your LiveKit webhook credentials.

Start a session

POST /api/v1/voice/sessions

Creates a LiveKit room, inserts a session record, and mints both a LiveKit user JWT and a Gemini Live ephemeral token. Returns 201 with status: "active".

Request parameters

agent_id

string

required

UUID of the voice agent to use for this session. The agent must be active.

participant

object

required

Identity of the call participant.

participant.identity

string

required

Unique identifier for the participant within the room (e.g. a user ID). 1–256 characters.

participant.name

string

Human-readable display name for the participant. Maximum 256 characters.

context

object

JSON object with any caller context you want to make available to the agent (e.g. account details, prior conversation history). Serialized length must be at most 16,000 characters.

ttl_seconds

number

Expiry window in seconds for the LiveKit room, JWT, and Gemini ephemeral token. Range: 60–3600. Default: 600. When you also set max_duration_seconds, ttl_seconds must be greater than or equal to max_duration_seconds; otherwise the request fails with voice_session_ttl_too_short.

max_duration_seconds

number

Maximum call length in seconds for this session. Range: 60–1800. Overrides the agent’s behaviors.max_duration_seconds for this session only. When set, your client is responsible for ending the call and disconnecting from LiveKit when this duration elapses.

voice

object

Override the agent’s voice configuration for this session only.

voice.name

string

Gemini Live Chirp 3 HD prebuilt voice name (see GET /api/v1/voice/voices).

voice.language_code

string

BCP 47 speech language tag (e.g. "en-US", "fr-FR").
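Before calling the endpoint, you may want to catch out-of-range values on the client. The Python sketch below checks a request body against the limits listed above. The function name validate_start_session_body is illustrative and not part of any Pi SDK, and the server remains the source of truth. In particular, it assumes context length is measured on compact JSON, which may differ from how Pi serializes it.

```python
import json
from typing import Any

# Limits transcribed from the parameter reference above.
MAX_NAME_CHARS = 256
MAX_CONTEXT_CHARS = 16_000


def validate_start_session_body(body: dict[str, Any]) -> list[str]:
    """Return a list of problems with a start-session body (empty if none found)."""
    problems: list[str] = []

    if not isinstance(body.get("agent_id"), str):
        problems.append("agent_id is required and must be a string")

    participant = body.get("participant")
    if not isinstance(participant, dict):
        problems.append("participant is required and must be an object")
    else:
        identity = participant.get("identity")
        if not isinstance(identity, str) or not 1 <= len(identity) <= MAX_NAME_CHARS:
            problems.append("participant.identity must be 1-256 characters")
        name = participant.get("name")
        if name is not None and (not isinstance(name, str) or len(name) > MAX_NAME_CHARS):
            problems.append("participant.name must be at most 256 characters")

    context = body.get("context")
    if context is not None:
        if not isinstance(context, dict):
            problems.append("context must be an object")
        # Assumption: length is measured on compact JSON; Pi may serialize differently.
        elif len(json.dumps(context, separators=(",", ":"))) > MAX_CONTEXT_CHARS:
            problems.append("context serializes to more than 16,000 characters")

    ttl = body.get("ttl_seconds")  # the server applies its default (600) when omitted
    max_dur = body.get("max_duration_seconds")
    if ttl is not None and not 60 <= ttl <= 3600:
        problems.append("ttl_seconds must be between 60 and 3600")
    if max_dur is not None and not 60 <= max_dur <= 1800:
        problems.append("max_duration_seconds must be between 60 and 1800")
    if ttl is not None and max_dur is not None and ttl < max_dur:
        # The API rejects this combination with voice_session_ttl_too_short (400).
        problems.append("ttl_seconds must be >= max_duration_seconds")
    return problems
```

The ordering check only runs when both values are present, mirroring the rule that it applies when you also set max_duration_seconds.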

Example request

curl -X POST "https://api.trypi.ai/api/v1/voice/sessions" \
  -H "Authorization: Bearer pi_live_***" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "participant": {
      "identity": "user_abc123",
      "name": "Alice"
    },
    "context": {
      "account_tier": "pro",
      "prior_purchases": 3
    },
    "ttl_seconds": 900,
    "max_duration_seconds": 600
  }'
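The same request can be sent from Python using only the standard library. This is a minimal sketch: build_start_session_request and start_session are hypothetical helper names, and error handling is left to urllib, which raises urllib.error.HTTPError for the 4xx and 5xx responses listed under Error codes.

```python
import json
import urllib.request

API_BASE = "https://api.trypi.ai/api/v1"


def build_start_session_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /voice/sessions request."""
    return urllib.request.Request(
        f"{API_BASE}/voice/sessions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_session(api_key: str, body: dict) -> dict:
    """Send the request and return the parsed 201 response body."""
    req = build_start_session_request(api_key, body)
    # urlopen raises urllib.error.HTTPError for non-2xx responses (e.g. 400, 404, 502).
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect or log the request without network access.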

Response (201)

status

string

Top-level session status: "active".

data.session_id

string

UUID of the created session.

data.agent_id

string

UUID of the agent used for this session.

data.connection.livekit.url

string

LiveKit server URL. Use the wss:// URL when connecting from a browser or mobile client.

data.connection.livekit.token

string

LiveKit user JWT. Pass this to the LiveKit SDK to join the room.

data.connection.gemini_live.url

string

Gemini Live ephemeral WebSocket URL.

data.connection.gemini_live.token

string

Gemini Live ephemeral token. Valid for the ttl_seconds window.

data.system_instruction

string

The compiled system instruction locked into the session.

data.expires_at

number

Unix timestamp when the session credentials expire.

data.max_duration_seconds

number | null

Effective call cap in seconds, or null if no duration limit is set.

The server does not end the session automatically when this cap is reached. When max_duration_seconds is non-null, your client must track elapsed time from the start of the call and disconnect from LiveKit once that many seconds have passed.
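Because the server does not enforce the cap, a client needs to schedule its own disconnect. The sketch below reads the connection details from a 201 response and computes how long the client may stay connected, taking the earlier of the call cap and credential expiry. It assumes expires_at is in Unix seconds and that call_started_at is recorded by your client when the call begins; SessionConnection and its methods are illustrative names.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionConnection:
    session_id: str
    livekit_url: str
    livekit_token: str
    expires_at: float  # assumption: Unix seconds
    max_duration_seconds: Optional[int]

    @classmethod
    def from_response(cls, payload: dict) -> "SessionConnection":
        data = payload["data"]
        livekit = data["connection"]["livekit"]
        return cls(
            session_id=data["session_id"],
            livekit_url=livekit["url"],
            livekit_token=livekit["token"],
            expires_at=data["expires_at"],
            max_duration_seconds=data.get("max_duration_seconds"),
        )

    def seconds_until_disconnect(self, call_started_at: float, now: Optional[float] = None) -> float:
        """Seconds the client may stay connected, never negative."""
        now = time.time() if now is None else now
        # Credentials stop working at expires_at regardless of the call cap.
        remaining = self.expires_at - now
        if self.max_duration_seconds is not None:
            # The server does not hang up for you; enforce the cap client-side.
            remaining = min(remaining, call_started_at + self.max_duration_seconds - now)
        return max(0.0, remaining)
```

You could pass the returned value to a timer (for example threading.Timer) whose callback disconnects from the LiveKit room.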

Error codes

Code	HTTP	Description
`voice_agent_not_found`	404	No active agent with the given `agent_id`.
`voice_gemini_ephemeral_failed`	502	Failed to mint a Gemini Live ephemeral token.
`voice_session_ttl_too_short`	400	`ttl_seconds` is less than `max_duration_seconds`.

Get session status

GET /api/v1/voice/sessions/:id

Returns the current state of a session, including transcript and extracted results when available.

Path parameters

id

string

required

UUID of the voice session.

Query parameters

wait_for_completion

boolean

Set to true to long-poll until the session reaches a terminal status or the timeout elapses.

timeout_seconds

number

Long-poll timeout in seconds. Range: 1–120.

Response fields

data.session_id

string

UUID of the session.

data.status

string

Session status: "active", "completed", or "failed".

data.transcript

array

Array of transcript entries once available (set via POST .../complete).

data.results

object

Structured extraction results once available.

data.max_duration_seconds

number | null

Effective call cap in seconds, or null if no duration limit is set.

data.expires_at

number

Unix timestamp when the session credentials expire.
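To wait for a session to finish, combine wait_for_completion with timeout_seconds and re-issue the request until data.status is terminal. The sketch below assumes the boolean is passed as the string true and that the base URL matches the examples on this page. session_status_url, wait_for_session, and the injectable fetch parameter are hypothetical; fetch exists only so the loop can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable

API_BASE = "https://api.trypi.ai/api/v1"
TERMINAL_STATUSES = {"completed", "failed"}


def session_status_url(session_id: str, wait: bool = True, timeout_seconds: int = 60) -> str:
    if not 1 <= timeout_seconds <= 120:
        raise ValueError("timeout_seconds must be between 1 and 120")
    # Assumption: the boolean query parameter is sent as the literal string "true".
    query = urllib.parse.urlencode(
        {"wait_for_completion": "true" if wait else "false", "timeout_seconds": timeout_seconds}
    )
    return f"{API_BASE}/voice/sessions/{urllib.parse.quote(session_id)}?{query}"


def http_get_json(url: str, api_key: str) -> dict:
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    # Socket timeout longer than the longest (120 s) server-side long-poll window.
    with urllib.request.urlopen(req, timeout=130) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_session(
    session_id: str,
    api_key: str,
    max_polls: int = 5,
    fetch: Callable[[str, str], dict] = http_get_json,
) -> dict:
    """Long-poll until data.status is terminal; return the last payload seen."""
    url = session_status_url(session_id)
    payload: dict = {}
    for _ in range(max_polls):
        payload = fetch(url, api_key)
        if payload["data"]["status"] in TERMINAL_STATUSES:
            break
    return payload
```

Capping the number of polls keeps a session that never completes from blocking the caller indefinitely.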

Complete a session

POST /api/v1/voice/sessions/:id/complete

Submit the call transcript to finalize the session. If the agent defines output_schema keys or sets output_schema_strict, the backend runs a non-live Gemini extraction pass over the transcript and returns structured results. Otherwise, it stores a transcript summary.

Path parameters

id

string

required

UUID of the active voice session to complete.

Request parameters

transcript

array

required

Array of transcript entries from the call. Minimum 1 entry, maximum 5,000 entries. Each entry has:

transcript[].role

string

required

Speaker role: "agent" or "user".

transcript[].text

string

required

Spoken text for this turn. 1–32,000 characters.

transcript[].timestamp

number

Optional Unix timestamp (integer, non-negative) for when this turn occurred.

duration_seconds

number

Actual call duration in seconds. Range: 0–86400.
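A client that assembles transcripts from speech events can check them against these limits before submitting. This sketch (build_complete_body is a hypothetical helper) enforces the documented role values, text length, entry count, timestamp type, and duration range.

```python
from typing import Optional

ALLOWED_ROLES = {"agent", "user"}


def build_complete_body(turns: list[dict], duration_seconds: Optional[float] = None) -> dict:
    """Validate transcript entries against the documented limits and build the request body."""
    if not 1 <= len(turns) <= 5000:
        raise ValueError("transcript must contain 1-5,000 entries")
    transcript = []
    for i, turn in enumerate(turns):
        role, text = turn.get("role"), turn.get("text")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"transcript[{i}].role must be 'agent' or 'user'")
        if not isinstance(text, str) or not 1 <= len(text) <= 32_000:
            raise ValueError(f"transcript[{i}].text must be 1-32,000 characters")
        # Copy only the documented keys; anything else on the turn is dropped.
        entry = {"role": role, "text": text}
        ts = turn.get("timestamp")
        if ts is not None:
            if not isinstance(ts, int) or isinstance(ts, bool) or ts < 0:
                raise ValueError(f"transcript[{i}].timestamp must be a non-negative integer")
            entry["timestamp"] = ts
        transcript.append(entry)
    body: dict = {"transcript": transcript}
    if duration_seconds is not None:
        if not 0 <= duration_seconds <= 86_400:
            raise ValueError("duration_seconds must be between 0 and 86400")
        body["duration_seconds"] = duration_seconds
    return body
```

Copying only role, text, and timestamp means extra keys your speech pipeline attaches to each turn never reach the API.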

Example request

curl -X POST "https://api.trypi.ai/api/v1/voice/sessions/SESSION_ID/complete" \
  -H "Authorization: Bearer pi_live_***" \
  -H "Content-Type: application/json" \
  -d '{
    "transcript": [
      { "role": "agent", "text": "Hi! How can I help you today?", "timestamp": 1710000000 },
      { "role": "user", "text": "I am looking for a Pro plan upgrade.", "timestamp": 1710000005 },
      { "role": "agent", "text": "Great, what is your team size?", "timestamp": 1710000008 }
    ],
    "duration_seconds": 120
  }'

Response fields

The response mirrors GET /api/v1/voice/sessions/:id and also includes the top-level results and extraction_warnings fields. The complete set of fields is:

results

object

Top-level structured extraction results aligned with the agent’s output_schema or output_schema_strict.

extraction_warnings

array

Present when strict-schema validation finds issues during extraction. Also mirrored under metadata.pi_extraction_warnings.

data.session_id

string

UUID of the session.

data.status

string

"completed" or "failed".

data.transcript

array

The stored transcript entries.

data.results

object

Structured results (same as top-level results).

data.duration_seconds

number

Stored call duration in seconds.

data.max_duration_seconds

number | null

Effective call cap in seconds, or null if no duration limit is set.

data.error_log

array

Any non-fatal errors logged during completion.

data.expires_at

number

Unix timestamp of credential expiry.

data.created_at

number

Unix timestamp of session creation.

data.updated_at

number

Unix timestamp of the last update.
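After completion, most callers need the results, any warnings, and whether the session succeeded. The sketch below (extraction_outcome is an illustrative name) collects those from the response, preferring the top-level results copy and falling back to data.results. It treats extraction_warnings and data.error_log entries as opaque values, since their shape is not specified here.

```python
def extraction_outcome(payload: dict) -> dict:
    """Pull the pieces most callers need from a /complete response."""
    data = payload.get("data", {})
    # Top-level results mirror data.results; prefer the top-level copy when present.
    results = payload.get("results", data.get("results"))
    warnings = payload.get("extraction_warnings") or []
    return {
        "session_id": data.get("session_id"),
        "succeeded": data.get("status") == "completed",
        "results": results,
        "warnings": list(warnings),
        "errors": list(data.get("error_log") or []),
    }
```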

Error codes

Code	HTTP	Description
`voice_session_not_active`	409	The session is not in `active` status.
`voice_result_extraction_failed`	502	The Gemini extraction pass failed.

LiveKit webhook

POST /api/v1/voice/webhooks/livekit

Receives signed event payloads from LiveKit. Configure this URL in your LiveKit project dashboard.

This route does not accept Pi API key authentication. LiveKit signs the raw request body using your webhook credentials (LIVEKIT_WEBHOOK_API_KEY / LIVEKIT_WEBHOOK_API_SECRET). Pi verifies the signature before processing the event.

On room_finished events, Pi may merge metadata into the matching session record. Final session completion and structured extraction still require an explicit call to POST /api/v1/voice/sessions/:id/complete.
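Pi performs this signature check for you, but the scheme is useful to understand if you proxy or inspect LiveKit webhooks yourself. The sketch below assumes LiveKit's convention: the Authorization header carries an HS256 JWT signed with the webhook API secret, whose iss claim is the API key and whose sha256 claim is the base64-encoded SHA-256 digest of the raw body. verify_livekit_webhook is a hypothetical name; in production, prefer the webhook receiver helper in LiveKit's server SDKs.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_livekit_webhook(raw_body: bytes, auth_header: str, api_key: str, api_secret: str) -> dict:
    """Return the parsed event if the signature checks out; raise ValueError otherwise.

    Assumption: LiveKit's HS256 JWT + sha256-body-claim scheme.
    """
    token = auth_header.removeprefix("Bearer ").strip()
    try:
        header_b64, claims_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected signing algorithm")

    signing_input = f"{header_b64}.{claims_b64}".encode("ascii")
    expected = hmac.new(api_secret.encode(), signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")

    claims = json.loads(_b64url_decode(claims_b64))
    if claims.get("iss") != api_key:
        raise ValueError("token was not issued for this API key")
    if "exp" in claims and claims["exp"] < time.time():
        raise ValueError("token expired")

    # The token commits to the exact bytes received, so hash the raw body, not re-serialized JSON.
    body_hash = base64.b64encode(hashlib.sha256(raw_body).digest()).decode("ascii")
    if not hmac.compare_digest(str(claims.get("sha256", "")).encode(), body_hash.encode()):
        raise ValueError("body hash mismatch")
    return json.loads(raw_body)
```

Hashing the raw bytes is why the signature covers "the raw request body": any framework that parses and re-serializes JSON before verification would change the digest.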
