Voice Sessions
Pi’s voice API provisions two independent real-time connections per session — a LiveKit WebRTC room and a Gemini Live WebSocket — along with a structured completion step that runs a non-live extraction pass on the transcript once the call ends.
Two connections per session
POST /api/v1/voice/sessions returns both connections in a single 201 response:
| Connection | Purpose |
|---|---|
| data.connection.livekit | WebRTC room URL + JWT. Use with livekit-client (or mobile SDKs) for media, recording, or multi-party. |
| data.connection.gemini_live | WebSocket URL + ephemeral token for direct client access to Gemini Live (gemini-3.1-flash-live-preview only). |
Pi does not run a per-call server process that pipes audio between LiveKit and Gemini. Your app should:
- Connect the user to LiveKit (if you need WebRTC / room features).
- Open the Gemini Live WebSocket and stream mic audio / play model audio per the Live API.
- When the call ends, call POST /api/v1/voice/sessions/:id/complete with a transcript so Pi can run structured extraction using the agent’s output_schema.
Step 1 — Create a voice agent
A voice agent stores a reusable configuration: instructions, questions, behaviors, output schema, and voice settings. Create one before starting any session.
export BASE="https://api.example.com"
export API_KEY="pi_live_***"
agent_id=$(curl -sS -X POST "$BASE/api/v1/voice/agents" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Support Agent",
    "instructions": "You are a helpful customer support agent. Be concise and friendly.",
    "language": "en-US",
    "purpose": "customer_support",
    "questions": [
      "What issue are you experiencing today?",
      "How long has this been happening?"
    ],
    "behaviors": {
      "max_duration_seconds": 600,
      "speaking_pace": "normal",
      "response_length": "moderate",
      "closing_message": "Thank you for contacting support. Have a great day!"
    },
    "output_schema": {
      "issue_type": "string",
      "urgency": "string",
      "resolution_offered": "boolean"
    },
    "voice": { "name": "Charon", "language_code": "en-US" }
  }' | jq -r '.data.agent_id')
echo "Agent ID: $agent_id"
Key agent config fields:
| Field | Description |
|---|---|
| name | Required. Display name for the agent. |
| instructions | Required. System prompt for the model. |
| language | Default en-US. |
| behaviors.max_duration_seconds | Default max call length (60–1800). Overridable per session. |
| output_schema | JSON key hints for structured extraction at session completion. |
| output_schema_strict | JSON Schema for constrained extraction (Gemini-supported subset). |
| voice.name | One of the 30 prebuilt Gemini Live voices (see GET /api/v1/voice/voices). |
Step 2 — Start a session
session=$(curl -sS -X POST "$BASE/api/v1/voice/sessions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"agent_id\": \"$agent_id\",
    \"participant\": { \"identity\": \"user_abc123\", \"name\": \"Alex\" },
    \"ttl_seconds\": 900,
    \"max_duration_seconds\": 600
  }")
session_id=$(echo "$session" | jq -r '.data.session_id')
livekit_url=$(echo "$session" | jq -r '.data.connection.livekit.url')
livekit_token=$(echo "$session" | jq -r '.data.connection.livekit.token')
gemini_url=$(echo "$session" | jq -r '.data.connection.gemini_live.url')
gemini_token=$(echo "$session" | jq -r '.data.connection.gemini_live.token')
max_duration=$(echo "$session" | jq -r '.data.max_duration_seconds')
echo "Session: $session_id"
echo "LiveKit room URL: $livekit_url"
echo "Max duration (seconds): $max_duration"
Response fields (201):
| Field | Description |
|---|---|
| data.session_id | UUID for this session. |
| data.connection.livekit.url | wss:// LiveKit room URL. Pass to livekit-client. |
| data.connection.livekit.token | JWT to join the room. |
| data.connection.gemini_live.url | Ephemeral WebSocket URL for Gemini Live. |
| data.connection.gemini_live.token | Short-lived token (scoped to ttl_seconds). |
| data.max_duration_seconds | Resolved call cap (number or null if no cap). |
| data.expires_at | Unix timestamp when room tokens expire. |
The LiveKit room token returned here is what you pass to livekit-client to join the room from the browser or mobile SDK. It is a standard LiveKit JWT — not a Pi API key.
max_duration_seconds and client-side enforcement
Pi cannot forcibly hang up a browser's WebRTC connection or Gemini Live socket from a serverless handler, so enforce the call length in your client:
- Read max_duration_seconds from the session create response (null means no cap from Pi).
- Start a countdown timer when the session starts.
- When the timer fires: disconnect Gemini Live, leave the LiveKit room, then call POST .../complete with the transcript.
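The countdown described above can be sketched for a scripted client, such as a CLI test harness, as a small deadline helper. A browser client would use its own timer instead. The function name `seconds_remaining` and its arguments are illustrative and not part of Pi's API. It treats the string `null`, which is what `jq -r` prints for a JSON null, as "no cap".

```shell
# Hypothetical helper, not part of Pi's API.
# Prints the seconds left before the call cap is reached,
# or "none" when Pi returned null (no cap).
# Args: max_duration_seconds (number or "null"),
#       call start time and current time (both Unix seconds).
seconds_remaining() {
  max_duration=$1
  started_at=$2
  now=$3
  if [ -z "$max_duration" ] || [ "$max_duration" = "null" ]; then
    echo "none"
    return 0
  fi
  remaining=$(( started_at + max_duration - now ))
  # Clamp at zero so callers can treat "0" as "hang up now".
  if [ "$remaining" -lt 0 ]; then
    remaining=0
  fi
  echo "$remaining"
}
```

A polling loop could call `seconds_remaining "$max_duration" "$started_at" "$(date +%s)"` once per second and, when it prints 0, disconnect Gemini Live, leave the LiveKit room, and call POST .../complete.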
ttl_seconds vs max_duration_seconds:
- ttl_seconds (default 600, max 3600): the provisioning window. It sets the LiveKit room empty timeout, the LiveKit JWT TTL, and the Gemini ephemeral token expiry.
- max_duration_seconds (60–1800): your intended call cap, enforced by your client timer.
- Pi requires ttl_seconds >= max_duration_seconds. If ttl_seconds is lower, the session create call returns voice_session_ttl_too_short (400).
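To catch a bad combination before sending the request, a client could run a pre-flight check like the sketch below. The limits come from the text above. The function name `check_session_limits` and its messages are illustrative, and only the ttl_seconds >= max_duration_seconds rule is documented here as producing a specific Pi error.

```shell
# Hypothetical pre-flight check, not part of Pi's API.
# Limits are taken from the text above: ttl_seconds max 3600,
# max_duration_seconds 60-1800, and ttl_seconds >= max_duration_seconds.
# Prints "ok" or a short reason; returns non-zero when a value
# falls outside the documented limits.
check_session_limits() {
  ttl=$1
  max_duration=$2
  if [ "$ttl" -gt 3600 ]; then
    echo "ttl_seconds exceeds 3600"
    return 1
  fi
  if [ "$max_duration" -lt 60 ] || [ "$max_duration" -gt 1800 ]; then
    echo "max_duration_seconds outside 60-1800"
    return 1
  fi
  if [ "$ttl" -lt "$max_duration" ]; then
    # Same condition that Pi reports as a 400 on session create.
    echo "voice_session_ttl_too_short"
    return 1
  fi
  echo "ok"
}
```

With the values from the Step 2 request, `check_session_limits 900 600` prints ok.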
Step 3 — Complete the session
When the call ends, submit the transcript. If the agent has an output_schema or output_schema_strict, Pi runs a non-live Gemini extraction pass and returns structured results.
curl -sS -X POST "$BASE/api/v1/voice/sessions/$session_id/complete" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "transcript": [
      { "role": "agent", "text": "Hi, how can I help you today?" },
      { "role": "user", "text": "My account is locked and I cannot log in." },
      { "role": "agent", "text": "I can help with that. Let me look into your account." }
    ],
    "duration_seconds": 45
  }'
The response includes the full session state plus a top-level results object produced by structured extraction:
{
  "session_id": "...",
  "status": "completed",
  "results": {
    "issue_type": "account_access",
    "urgency": "high",
    "resolution_offered": true
  },
  "transcript": [...],
  "duration_seconds": 45
}
POST .../complete only succeeds when the session status is active. If the session has already been completed or failed, the call returns voice_session_not_active (409). Always complete sessions promptly after the call ends to avoid token expiry.
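Because a repeated completion returns 409 rather than a result, a script may want to branch on the HTTP status. The following is a minimal sketch, assuming BASE and API_KEY are exported as in Step 1 and that any 2xx status means success. The names `classify_complete_status` and `complete_session`, and the temporary file path, are illustrative and not part of Pi's API.

```shell
# Hypothetical helper, not part of Pi's API.
# Maps the HTTP status of POST .../complete to a short action label.
classify_complete_status() {
  case "$1" in
    2??) echo "completed" ;;
    409) echo "not_active" ;;  # voice_session_not_active: already completed or failed
    *)   echo "error" ;;
  esac
}

# Sketch of a wrapper: sends a JSON body file shaped like the request
# above, saves the response body, and prints the action label.
# Assumes BASE and API_KEY are set in the environment.
complete_session() {
  session_id=$1
  body_file=$2
  http_code=$(curl -sS -o /tmp/complete_response.json -w '%{http_code}' \
    -X POST "$BASE/api/v1/voice/sessions/$session_id/complete" \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    --data-binary @"$body_file")
  classify_complete_status "$http_code"
}
```

A caller could treat not_active as a signal to skip the retry, since the session has already been completed or has failed.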
LiveKit webhook
Configure POST /api/v1/voice/webhooks/livekit in your LiveKit project dashboard to receive room lifecycle events. Pi verifies the LiveKit-signed body using LIVEKIT_WEBHOOK_API_KEY / LIVEKIT_WEBHOOK_API_SECRET.
On room_finished, Pi merges room metadata into the matching voice_sessions row. Final completion and structured results still go through POST .../complete — the webhook does not auto-complete a session.
The LiveKit webhook route does not use Pi Bearer auth. LiveKit signs the raw request body; Pi verifies the signature independently.