
Voice Sessions

Pi’s voice API provisions two independent real-time connections per session — a LiveKit WebRTC room and a Gemini Live WebSocket — along with a structured completion step that runs a non-live extraction pass on the transcript once the call ends.

Two connections per session

POST /api/v1/voice/sessions returns both connections in a single 201 response:
| Connection | Purpose |
| --- | --- |
| data.connection.livekit | WebRTC room URL + JWT. Use with livekit-client (or mobile SDKs) for media, recording, or multi-party. |
| data.connection.gemini_live | WebSocket URL + ephemeral token for direct client access to Gemini Live (gemini-3.1-flash-live-preview only). |
Pi does not run a per-call server process that pipes audio between LiveKit and Gemini. Your app should:
  1. Connect the user to LiveKit (if you need WebRTC / room features).
  2. Open the Gemini Live WebSocket and stream mic audio / play model audio per the Live API.
  3. When the call ends, call POST /api/v1/voice/sessions/:id/complete with a transcript so Pi can run structured extraction using the agent’s output_schema.

Step 1 — Create a voice agent

A voice agent stores a reusable configuration: instructions, questions, behaviors, output schema, and voice settings. Create one before starting any session.
export BASE="https://api.example.com"
export API_KEY="pi_live_***"

agent_id=$(curl -sS -X POST "$BASE/api/v1/voice/agents" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Support Agent",
    "instructions": "You are a helpful customer support agent. Be concise and friendly.",
    "language": "en-US",
    "purpose": "customer_support",
    "questions": [
      "What issue are you experiencing today?",
      "How long has this been happening?"
    ],
    "behaviors": {
      "max_duration_seconds": 600,
      "speaking_pace": "normal",
      "response_length": "moderate",
      "closing_message": "Thank you for contacting support. Have a great day!"
    },
    "output_schema": {
      "issue_type": "string",
      "urgency": "string",
      "resolution_offered": "boolean"
    },
    "voice": { "name": "Charon", "language_code": "en-US" }
  }' | jq -r '.data.agent_id')

echo "Agent ID: $agent_id"
Key agent config fields:
| Field | Description |
| --- | --- |
| name | Required. Display name for the agent. |
| instructions | Required. System prompt for the model. |
| language | Default en-US. |
| behaviors.max_duration_seconds | Default max call length in seconds (60–1800). Overridable per session. |
| output_schema | JSON key hints for structured extraction at session completion. |
| output_schema_strict | JSON Schema for constrained extraction (Gemini-supported subset). |
| voice.name | One of the 30 prebuilt Gemini Live voices (see GET /api/v1/voice/voices). |
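
The constraints in the field table can be checked locally before the request is sent. The following is a minimal sketch, not part of Pi's API: the function name `agent_config_problems` is hypothetical, and it only encodes what the table states (two required fields and the 60–1800 second range). The server remains the authority on what is valid.

```python
from typing import Any, Dict, List

# Bounds taken from the field table: behaviors.max_duration_seconds is 60-1800.
MIN_CALL_SECONDS = 60
MAX_CALL_SECONDS = 1800


def agent_config_problems(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems: List[str] = []

    # name and instructions are the two fields the table marks as required.
    for field in ("name", "instructions"):
        value = config.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{field} is required")

    # behaviors is optional; only range-check the duration when it is present.
    behaviors = config.get("behaviors") or {}
    duration = behaviors.get("max_duration_seconds")
    if duration is not None and not (MIN_CALL_SECONDS <= duration <= MAX_CALL_SECONDS):
        problems.append(
            "behaviors.max_duration_seconds must be between "
            f"{MIN_CALL_SECONDS} and {MAX_CALL_SECONDS}"
        )
    return problems
```

Running this against the payload from the curl example should return an empty list, since it sets both required fields and a 600 second cap.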

Step 2 — Start a session

session=$(curl -sS -X POST "$BASE/api/v1/voice/sessions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"agent_id\": \"$agent_id\",
    \"participant\": { \"identity\": \"user_abc123\", \"name\": \"Alex\" },
    \"ttl_seconds\": 900,
    \"max_duration_seconds\": 600
  }")

session_id=$(echo "$session" | jq -r '.data.session_id')
livekit_url=$(echo "$session" | jq -r '.data.connection.livekit.url')
livekit_token=$(echo "$session" | jq -r '.data.connection.livekit.token')
gemini_url=$(echo "$session" | jq -r '.data.connection.gemini_live.url')
gemini_token=$(echo "$session" | jq -r '.data.connection.gemini_live.token')
max_duration=$(echo "$session" | jq -r '.data.max_duration_seconds')

echo "Session: $session_id"
echo "LiveKit room URL: $livekit_url"
echo "Max duration (seconds): $max_duration"
Response fields (201):
| Field | Description |
| --- | --- |
| data.session_id | UUID for this session. |
| data.connection.livekit.url | wss:// LiveKit room URL. Pass to livekit-client. |
| data.connection.livekit.token | JWT to join the room. |
| data.connection.gemini_live.url | Ephemeral WebSocket URL for Gemini Live. |
| data.connection.gemini_live.token | Short-lived token (scoped to ttl_seconds). |
| data.max_duration_seconds | Resolved call cap (number or null if no cap). |
| data.expires_at | Unix timestamp when room tokens expire. |
The LiveKit room token returned here is what you pass to livekit-client to join the room from the browser or mobile SDK. It is a standard LiveKit JWT — not a Pi API key.
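
If your client is not a shell script, the same fields can be pulled out of the 201 body in one place. This is a sketch under the assumption that the body has exactly the `data.*` layout listed above; the `VoiceSession` class and `parse_session_response` function are hypothetical names, not part of any Pi SDK.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceSession:
    """The session-create response fields a client needs to start a call."""
    session_id: str
    livekit_url: str
    livekit_token: str
    gemini_url: str
    gemini_token: str
    max_duration_seconds: Optional[int]  # None when Pi reports no cap
    expires_at: int                      # Unix timestamp


def parse_session_response(body: str) -> VoiceSession:
    """Parse the JSON body of a 201 from POST /api/v1/voice/sessions."""
    data = json.loads(body)["data"]
    connection = data["connection"]
    return VoiceSession(
        session_id=data["session_id"],
        livekit_url=connection["livekit"]["url"],
        livekit_token=connection["livekit"]["token"],
        gemini_url=connection["gemini_live"]["url"],
        gemini_token=connection["gemini_live"]["token"],
        max_duration_seconds=data.get("max_duration_seconds"),
        expires_at=data["expires_at"],
    )
```

Keeping the two tokens in separate fields makes it harder to pass the LiveKit JWT to the Gemini socket by mistake, or the reverse.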

max_duration_seconds and client-side enforcement

Pi cannot forcibly close a browser's WebRTC connection or Gemini Live socket from a serverless handler, so enforce call length in your client:
  1. Read max_duration_seconds from the session create response (null means no cap from Pi).
  2. Start a countdown timer when the session starts.
  3. When the timer fires: disconnect Gemini Live, leave the LiveKit room, then call POST .../complete with the transcript.
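
The three steps above can be sketched as a one-shot timer. Real clients are usually browser or mobile apps, so treat this Python version as an illustration of the ordering only. The three callbacks (`disconnect_gemini`, `leave_livekit`, `complete_session`) are hypothetical stand-ins for whatever your client uses to close each connection and call POST .../complete.

```python
import threading
from typing import Callable, Optional


def make_hang_up(
    disconnect_gemini: Callable[[], None],
    leave_livekit: Callable[[], None],
    complete_session: Callable[[], None],
) -> Callable[[], None]:
    """Bundle the teardown steps in the order the docs list them."""
    def hang_up() -> None:
        disconnect_gemini()   # 1. stop the Gemini Live socket
        leave_livekit()       # 2. leave the LiveKit room
        complete_session()    # 3. submit the transcript via POST .../complete
    return hang_up


def start_call_timer(
    max_duration_seconds: Optional[float],
    on_timeout: Callable[[], None],
) -> Optional[threading.Timer]:
    """Arm a one-shot timer; return None when Pi reports no cap (null)."""
    if max_duration_seconds is None:
        return None
    timer = threading.Timer(max_duration_seconds, on_timeout)
    timer.daemon = True  # do not keep the process alive just for the timer
    timer.start()
    return timer
```

If the user hangs up before the cap is reached, call `cancel()` on the returned timer and run the same teardown once, so the session is not completed twice.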
ttl_seconds vs max_duration_seconds:
  • ttl_seconds (default 600, max 3600): provisioning window — LiveKit room empty timeout, LiveKit JWT TTL, and Gemini ephemeral token expiry.
  • max_duration_seconds (60–1800): your intended call cap, enforced by your client timer.
  • Pi requires ttl_seconds >= max_duration_seconds. If ttl_seconds is lower, the session create call returns voice_session_ttl_too_short (400).
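
The relationship between the two values is easy to get wrong when only one is set: with the default ttl_seconds of 600, requesting a 900 second cap would trip the 400. The sketch below encodes only the limits stated in this section. The function name `session_timing_problems` is hypothetical, and the check cannot see an agent-level default cap that Pi may apply when the request omits max_duration_seconds.

```python
from typing import List, Optional

TTL_DEFAULT_SECONDS = 600   # documented default for ttl_seconds
TTL_MAX_SECONDS = 3600      # documented maximum for ttl_seconds
CALL_MIN_SECONDS = 60       # documented range for max_duration_seconds
CALL_MAX_SECONDS = 1800


def session_timing_problems(
    max_duration_seconds: Optional[int],
    ttl_seconds: int = TTL_DEFAULT_SECONDS,
) -> List[str]:
    """Flag timing values that the session create call is documented to reject."""
    problems: List[str] = []
    if ttl_seconds > TTL_MAX_SECONDS:
        problems.append("ttl_seconds exceeds the 3600 second maximum")
    if max_duration_seconds is None:
        # The request carries no override; an agent default may still apply
        # server-side, which this local check has no way to inspect.
        return problems
    if not (CALL_MIN_SECONDS <= max_duration_seconds <= CALL_MAX_SECONDS):
        problems.append("max_duration_seconds must be between 60 and 1800")
    if ttl_seconds < max_duration_seconds:
        # Same condition that Pi reports as a 400 with this error code.
        problems.append("voice_session_ttl_too_short")
    return problems
```

For the values used in Step 2 (ttl_seconds 900, max_duration_seconds 600) the list is empty. `session_timing_problems(900)` reports `voice_session_ttl_too_short` because the default TTL is shorter than the requested cap.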

Step 3 — Complete the session

When the call ends, submit the transcript. If the agent has an output_schema or output_schema_strict, Pi runs a non-live Gemini extraction pass and returns structured results.
curl -sS -X POST "$BASE/api/v1/voice/sessions/$session_id/complete" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "transcript": [
      { "role": "agent", "text": "Hi, how can I help you today?" },
      { "role": "user",  "text": "My account is locked and I cannot log in." },
      { "role": "agent", "text": "I can help with that. Let me look into your account." }
    ],
    "duration_seconds": 45
  }'
The response includes the full session state plus top-level results from structured extraction:
{
  "session_id": "...",
  "status": "completed",
  "results": {
    "issue_type": "account_access",
    "urgency": "high",
    "resolution_offered": true
  },
  "transcript": [...],
  "duration_seconds": 45
}
POST .../complete only succeeds when the session status is active. If the session has already been completed or failed, the call returns voice_session_not_active (409). Always complete sessions promptly after the call ends to avoid token expiry.
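
Because a second call to complete returns 409, a client that retries on network errors should treat that status as "already finished" rather than as a failure to retry. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and it assumes the success body carries `results` at the top level as in the example response above. How the error code appears in the 409 body is not documented here, so the sketch relies on the HTTP status alone.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict, List, Optional


def build_complete_request(
    base: str,
    api_key: str,
    session_id: str,
    transcript: List[Dict[str, str]],
    duration_seconds: int,
) -> urllib.request.Request:
    """Build the POST .../complete request without sending it."""
    body = json.dumps(
        {"transcript": transcript, "duration_seconds": duration_seconds}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base}/api/v1/voice/sessions/{session_id}/complete",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def complete_session(request: urllib.request.Request) -> Optional[Dict[str, Any]]:
    """Send the request; return results, or None if the session was not active."""
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response).get("results")
    except urllib.error.HTTPError as err:
        if err.code == 409:  # voice_session_not_active: completed or failed already
            return None
        raise
```

Splitting request construction from sending keeps the payload easy to inspect in tests and leaves the retry policy to the caller.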

LiveKit webhook

Configure POST /api/v1/voice/webhooks/livekit in your LiveKit project dashboard to receive room lifecycle events. Pi verifies the LiveKit-signed body using LIVEKIT_WEBHOOK_API_KEY / LIVEKIT_WEBHOOK_API_SECRET. On room_finished, Pi merges room metadata into the matching voice_sessions row. Final completion and structured results still go through POST .../complete — the webhook does not auto-complete a session.
The LiveKit webhook route does not use Pi Bearer auth. LiveKit signs the raw request body; Pi verifies the signature independently.
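
For readers curious what verifying a signed body involves, here is an illustrative sketch. It is not Pi's implementation. It assumes LiveKit's webhook scheme as generally described in LiveKit's documentation: the Authorization header carries an HS256 JWT signed with the API secret, and the token's `sha256` claim holds the base64-encoded SHA-256 digest of the raw request body. The function name `verify_signed_body` is hypothetical. In practice the LiveKit server SDKs ship a webhook receiver for this, which is preferable to hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict


def _b64url_decode(segment: str) -> bytes:
    """Decode a base64url JWT segment, restoring the stripped padding."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_signed_body(auth_token: str, raw_body: bytes, api_secret: str) -> Dict[str, Any]:
    """Check the JWT signature, then check that it commits to this exact body."""
    header_b64, payload_b64, signature_b64 = auth_token.split(".")

    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected JWT algorithm")

    # Recompute the HMAC over "header.payload" and compare in constant time.
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(api_secret.encode("utf-8"), signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad JWT signature")

    claims = json.loads(_b64url_decode(payload_b64))
    if "exp" in claims and claims["exp"] < time.time():
        raise ValueError("token expired")

    # Assumed claim layout: sha256 = base64(SHA-256(raw body)).
    body_hash = base64.b64encode(hashlib.sha256(raw_body).digest()).decode("ascii")
    if not hmac.compare_digest(body_hash, str(claims.get("sha256", ""))):
        raise ValueError("body hash mismatch")
    return claims
```

The hash must be computed over the raw bytes as received. Parsing the JSON and serializing it again before hashing would change whitespace or key order and make a valid request fail verification.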