Voice Sessions

Pi’s voice API provisions two independent real-time connections per session — a LiveKit WebRTC room and a Gemini Live WebSocket — along with a structured completion step that runs a non-live extraction pass on the transcript once the call ends.

Two connections per session

POST /api/v1/voice/sessions returns both connections in a single 201 response:

Connection	Purpose
`data.connection.livekit`	WebRTC room URL + JWT. Use with `livekit-client` (or the LiveKit mobile SDKs) for media, recording, or multi-party sessions.
`data.connection.gemini_live`	WebSocket URL + ephemeral token for direct client access to Gemini Live (`gemini-3.1-flash-live-preview` only).

Pi does not run a per-call server process that pipes audio between LiveKit and Gemini. Your app should:

1. Connect the user to LiveKit (if you need WebRTC or room features).
2. Open the Gemini Live WebSocket, then stream microphone audio and play model audio as described in the Gemini Live API.
3. When the call ends, call POST /api/v1/voice/sessions/:id/complete with the transcript so Pi can run structured extraction using the agent’s output_schema (or output_schema_strict).

Step 1 — Create a voice agent

A voice agent stores a reusable configuration: instructions, questions, behaviors, output schema, and voice settings. Create one before starting any session.

export BASE="https://api.example.com"
export API_KEY="pi_live_***"

agent_id=$(curl -sS -X POST "$BASE/api/v1/voice/agents" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Support Agent",
    "instructions": "You are a helpful customer support agent. Be concise and friendly.",
    "language": "en-US",
    "purpose": "customer_support",
    "questions": [
      "What issue are you experiencing today?",
      "How long has this been happening?"
    ],
    "behaviors": {
      "max_duration_seconds": 600,
      "speaking_pace": "normal",
      "response_length": "moderate",
      "closing_message": "Thank you for contacting support. Have a great day!"
    },
    "output_schema": {
      "issue_type": "string",
      "urgency": "string",
      "resolution_offered": "boolean"
    },
    "voice": { "name": "Charon", "language_code": "en-US" }
  }' | jq -r '.data.agent_id')

echo "Agent ID: $agent_id"

Key agent config fields:

Field	Description
`name`	Required. Display name for the agent.
`instructions`	Required. System prompt for the model.
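`language`	Language code for the agent. Default `en-US`.
`behaviors.max_duration_seconds`	Default maximum call length in seconds (60–1800). Can be overridden per session.
`output_schema`	Object mapping output keys to type hints (for example `"string"`, `"boolean"`), used for structured extraction when the session is completed.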
`output_schema_strict`	JSON Schema for constrained extraction (Gemini-supported subset).
`voice.name`	One of the 30 prebuilt Gemini Live voices (see `GET /api/v1/voice/voices`).

Step 2 — Start a session

session=$(curl -sS -X POST "$BASE/api/v1/voice/sessions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"agent_id\": \"$agent_id\",
    \"participant\": { \"identity\": \"user_abc123\", \"name\": \"Alex\" },
    \"ttl_seconds\": 900,
    \"max_duration_seconds\": 600
  }")

session_id=$(echo "$session" | jq -r '.data.session_id')
livekit_url=$(echo "$session" | jq -r '.data.connection.livekit.url')
livekit_token=$(echo "$session" | jq -r '.data.connection.livekit.token')
gemini_url=$(echo "$session" | jq -r '.data.connection.gemini_live.url')
gemini_token=$(echo "$session" | jq -r '.data.connection.gemini_live.token')
max_duration=$(echo "$session" | jq -r '.data.max_duration_seconds')

echo "Session: $session_id"
echo "LiveKit room URL: $livekit_url"
echo "Max duration (seconds): $max_duration"
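Each `jq -r` call above prints the literal string `null` when a path is missing, so a failed create call would leave `session_id=null` and the script would carry on. A minimal guard, assuming you keep the variable names used above; `require_fields` is a hypothetical helper, not part of Pi's API:

```shell
#!/usr/bin/env bash
# Sketch: fail fast if any value extracted with `jq -r` is empty or "null".
# require_fields is a hypothetical helper; pass variable NAMES, not values.
require_fields() {
  local name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable called $name
    if [[ -z "${!name:-}" || "${!name}" == "null" ]]; then
      echo "missing field: $name" >&2
      return 1
    fi
  done
}
```

For example, `require_fields session_id livekit_url livekit_token gemini_url gemini_token || exit 1`. `max_duration` is deliberately left out, since `null` is a valid value there (no cap from Pi).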

Response fields (201):

Field	Description
`data.session_id`	UUID for this session.
`data.connection.livekit.url`	`wss://` LiveKit room URL. Pass to `livekit-client`.
`data.connection.livekit.token`	JWT to join the room.
`data.connection.gemini_live.url`	Ephemeral WebSocket URL for Gemini Live.
`data.connection.gemini_live.token`	Short-lived token (scoped to `ttl_seconds`).
`data.max_duration_seconds`	Resolved call cap (number or `null` if no cap).
`data.expires_at`	Unix timestamp when room tokens expire.

Pass the LiveKit room token returned here to livekit-client (in the browser) or to a LiveKit mobile SDK to join the room. It is a standard LiveKit JWT, not a Pi API key.
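If your client might join some time after the session is created, it can help to check `data.expires_at` first. This page does not state the unit of that timestamp; the sketch below assumes Unix seconds. `seconds_until_expiry` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: seconds left before the session's room tokens expire.
# ASSUMPTION: data.expires_at is a Unix timestamp in seconds.
# seconds_until_expiry is a hypothetical helper name.
seconds_until_expiry() {
  local expires_at="$1" now remaining
  now=$(date +%s)
  remaining=$(( expires_at - now ))
  # Clamp to 0 once the deadline has passed
  if (( remaining < 0 )); then
    remaining=0
  fi
  echo "$remaining"
}
```

For example, read the value with `expires_at=$(echo "$session" | jq -r '.data.expires_at')`, then call `seconds_until_expiry "$expires_at"`.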

`max_duration_seconds` and client-side enforcement

Pi cannot forcibly hang up a browser's WebRTC connection or Gemini Live socket from a serverless handler, so enforce call length in your client:

1. Read max_duration_seconds from the session create response (null means Pi sets no cap).
2. Start a countdown timer when the call starts.
3. When the timer fires, disconnect Gemini Live, leave the LiveKit room, then call POST .../complete with the transcript.
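The steps above can be sketched in shell, for instance for a headless test client driven from a script; a browser app would use a JavaScript timer with the same logic. The names `start_call_timer`, `cancel_call_timer`, and `CALL_TIMER_PID` are hypothetical, and the teardown command is whatever your client uses to disconnect Gemini Live, leave the LiveKit room, and call POST .../complete:

```shell
#!/usr/bin/env bash
# Sketch of the client-side call cap. All names here are hypothetical.

# start_call_timer MAX_DURATION CMD [ARGS...]
#   MAX_DURATION: value of .data.max_duration_seconds ("null" = no cap from Pi)
#   CMD: your teardown (disconnect Gemini Live, leave LiveKit, POST .../complete)
start_call_timer() {
  local max="$1"
  shift
  CALL_TIMER_PID=""
  if [[ -z "$max" || "$max" == "null" ]]; then
    return 0  # no cap from Pi: nothing to arm
  fi
  # Background subshell: wait out the cap, then run the teardown command.
  ( sleep "$max"; "$@" ) &
  CALL_TIMER_PID=$!
}

# Kill the pending timer (e.g. the user hung up first) so teardown runs only once.
cancel_call_timer() {
  if [[ -n "${CALL_TIMER_PID:-}" ]]; then
    kill "$CALL_TIMER_PID" 2>/dev/null || true
    CALL_TIMER_PID=""
  fi
}
```

Arm it with `start_call_timer "$max_duration" end_call` right after connecting (`end_call` being your own hypothetical teardown function), and call `cancel_call_timer` if the call ends before the cap.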

ttl_seconds vs max_duration_seconds:

ttl_seconds (default 600, max 3600): the provisioning window. It sets the LiveKit room empty timeout, the LiveKit JWT TTL, and the Gemini ephemeral token expiry.
max_duration_seconds (60–1800): your intended call cap, enforced by your client timer.
Pi requires ttl_seconds >= max_duration_seconds. If ttl_seconds is lower, POST /api/v1/voice/sessions fails with voice_session_ttl_too_short (400).
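A local pre-flight check can catch voice_session_ttl_too_short before the request is sent. This sketch only encodes the limits stated above (ttl_seconds at most 3600, max_duration_seconds between 60 and 1800, ttl_seconds >= max_duration_seconds); Pi's server-side validation remains authoritative, and `check_session_timing` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: validate ttl_seconds / max_duration_seconds before POST /api/v1/voice/sessions.
# check_session_timing is a hypothetical helper; the server remains the authority.
check_session_timing() {
  local ttl="$1" max="$2"
  # Positive integers only (also avoids bash reading leading zeros as octal)
  if ! [[ "$ttl" =~ ^[1-9][0-9]*$ && "$max" =~ ^[1-9][0-9]*$ ]]; then
    echo "ttl_seconds and max_duration_seconds must be positive integers" >&2
    return 1
  fi
  if (( ttl > 3600 )); then
    echo "ttl_seconds $ttl exceeds 3600" >&2
    return 1
  fi
  if (( max < 60 || max > 1800 )); then
    echo "max_duration_seconds $max is outside 60-1800" >&2
    return 1
  fi
  if (( ttl < max )); then
    echo "ttl_seconds $ttl < max_duration_seconds $max (voice_session_ttl_too_short)" >&2
    return 1
  fi
  return 0
}
```

For the Step 2 values, `check_session_timing 900 600` passes, while `check_session_timing 300 600` fails with the ttl message.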

Step 3 — Complete the session

When the call ends, submit the transcript. If the agent has an output_schema or output_schema_strict, Pi runs a non-live Gemini extraction pass and returns structured results.

curl -sS -X POST "$BASE/api/v1/voice/sessions/$session_id/complete" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "transcript": [
      { "role": "agent", "text": "Hi, how can I help you today?" },
      { "role": "user",  "text": "My account is locked and I cannot log in." },
      { "role": "agent", "text": "I can help with that. Let me look into your account." }
    ],
    "duration_seconds": 45
  }'
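The transcript above is hard-coded. If your client logs each turn as a `role<TAB>text` line (an assumed format, not something Pi prescribes), a minimal pure-bash sketch can turn that log into the `transcript` array. `json_escape` and `transcript_json` are hypothetical helpers, and the escaping covers only backslash, double quote, tab, and carriage return:

```shell
#!/usr/bin/env bash
# Sketch: build the "transcript" JSON array from "role<TAB>text" lines on stdin.
# Log format and helper names are assumptions for illustration.

# Escape backslash, double quote, tab, and CR for a JSON string literal.
json_escape() {
  local s="$1" bs='\' dq='"'
  s=${s//"$bs"/"$bs$bs"}
  s=${s//"$dq"/"$bs$dq"}
  s=${s//$'\t'/"${bs}t"}
  s=${s//$'\r'/"${bs}r"}
  printf '%s' "$s"
}

transcript_json() {
  local role text sep="" out="["
  # IFS=tab splits role from text; -r keeps backslashes literal.
  # `|| [[ -n $role ]]` also handles a last line with no trailing newline.
  while IFS=$'\t' read -r role text || [[ -n "$role" ]]; do
    [[ -z "$role" ]] && continue  # skip blank lines
    out+="${sep}{\"role\":\"$(json_escape "$role")\",\"text\":\"$(json_escape "$text")\"}"
    sep=","
  done
  printf '%s]\n' "$out"
}
```

You could then send it as `-d "{\"transcript\": $(transcript_json < call.tsv), \"duration_seconds\": 45}"`, where `call.tsv` is a hypothetical log file. Since the examples on this page already depend on jq, building the body with jq is the more robust option when it is available.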

Response includes the full session state plus top-level results from structured extraction:

{
  "session_id": "...",
  "status": "completed",
  "results": {
    "issue_type": "account_access",
    "urgency": "high",
    "resolution_offered": true
  },
  "transcript": [...],
  "duration_seconds": 45
}

POST .../complete only succeeds when the session status is active. If the session has already been completed or failed, the call returns voice_session_not_active (409). Always complete sessions promptly after the call ends to avoid token expiry.
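Because a second POST .../complete on the same session returns voice_session_not_active (409), a client where several code paths can end the call (a timer, a user hang-up) may want to treat that 409 as "already done". A sketch, assuming a 409 from this endpoint always means voice_session_not_active and reusing BASE and API_KEY from Step 1; `complete_session` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: POST .../complete, treating 409 (voice_session_not_active) as
# "already completed or failed" rather than a hard error.
# complete_session is hypothetical; BASE and API_KEY come from Step 1.
#
# complete_session SESSION_ID JSON_BODY
#   2xx  -> prints the response body, returns 0
#   409  -> prints a note to stderr, returns 0
#   else -> prints status and body to stderr, returns 1
complete_session() {
  local session_id="$1" body="$2" resp_file status rc
  resp_file=$(mktemp)
  # -o writes the body to a file so -w prints only the HTTP status code;
  # on a connection failure curl typically reports 000 here.
  status=$(curl -sS -o "$resp_file" -w '%{http_code}' -X POST \
    "$BASE/api/v1/voice/sessions/$session_id/complete" \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d "$body") || true
  case "$status" in
    2??) cat "$resp_file"; rc=0 ;;
    409) echo "session $session_id is no longer active; nothing to do" >&2; rc=0 ;;
    *)   echo "complete failed (HTTP $status)" >&2; cat "$resp_file" >&2; rc=1 ;;
  esac
  rm -f "$resp_file"
  return "$rc"
}
```

Call it with the same JSON body as the Step 3 example.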

LiveKit webhook

In your LiveKit project dashboard, set the webhook URL to Pi's POST /api/v1/voice/webhooks/livekit endpoint to receive room lifecycle events. Pi verifies the LiveKit-signed body using LIVEKIT_WEBHOOK_API_KEY / LIVEKIT_WEBHOOK_API_SECRET. On room_finished, Pi merges room metadata into the matching voice_sessions row. The webhook does not auto-complete a session: final completion and structured results still go through POST .../complete.

The LiveKit webhook route does not use Pi Bearer auth. LiveKit signs the raw request body; Pi verifies the signature independently.
