deepgram-go-speech-to-text - SKILL.md Agent Skill

name: deepgram-go-speech-to-text description: Use when writing or reviewing Go code in this repo that transcribes prerecorded audio with Listen v1 REST or streams live audio with Listen v1 WebSockets. Route text generation to deepgram-go-text-to-speech, text analysis to deepgram-go-text-intelligence, audio analytics overlays to deepgram-go-audio-intelligence, and Flux or other v2 conversational work to deepgram-go-conversational-stt.

Using Deepgram Speech-to-Text from the Go SDK

When to use this product

Use this skill for pkg/client/listen work:

prerecorded transcription with FromURL, FromFile, or FromStream
live transcription with pkg/client/listen/v1/websocket
channel-based or callback-based streaming flows

Use a different skill when:

you need TTS output (deepgram-go-text-to-speech)
you need text analysis on plain text (deepgram-go-text-intelligence)
you need analytics overlays like summaries, topics, or sentiments (deepgram-go-audio-intelligence)
you need Flux / conversational STT v2 (deepgram-go-conversational-stt)

Authentication

Set DEEPGRAM_API_KEY before constructing clients.

export DEEPGRAM_API_KEY="your_api_key"

This SDK reads env-backed defaults via the client option layer. Prefer API key or token auth supported by the repo's client options; do not hardcode credentials.

Quick start

Prerecorded REST:

package main

import (
    "context"
    "fmt"
    "log"

    api "github.com/deepgram/deepgram-go-sdk/v3/pkg/api/listen/v1/rest"
    listen "github.com/deepgram/deepgram-go-sdk/v3/pkg/client/listen"
    interfaces "github.com/deepgram/deepgram-go-sdk/v3/pkg/client/interfaces"
)

func main() {
    if err := run(); err != nil {
        log.Fatal(err)
    }
}

func run() error {
    ctx := context.Background()

    client := listen.NewRESTWithDefaults()
    dg := api.New(client)

    resp, err := dg.FromURL(
        ctx,
        "https://dpgr.am/spacewalk.wav",
        &interfaces.PreRecordedTranscriptionOptions{
            Model:       "nova-3",
            SmartFormat: true,
            Punctuate:   true,
        },
    )
    if err != nil {
        return err
    }

    fmt.Println(resp.Results.Channels[0].Alternatives[0].Transcript)
    return nil
}

Live WebSocket with channel fan-out:

package main

import (
    "context"
    "fmt"
    "log"

    listenws "github.com/deepgram/deepgram-go-sdk/v3/pkg/api/listen/v1/websocket"
    listen "github.com/deepgram/deepgram-go-sdk/v3/pkg/client/listen"
    interfaces "github.com/deepgram/deepgram-go-sdk/v3/pkg/client/interfaces"
)

func main() {
    if err := run(); err != nil {
        log.Fatal(err)
    }
}

func run() error {
    ctx := context.Background()
    handler := listenws.NewDefaultChanHandler()

    conn, err := listen.NewWSUsingChanWithDefaults(
        ctx,
        &interfaces.LiveTranscriptionOptions{Model: "nova-3", InterimResults: true},
        handler,
    )
    if err != nil {
        return err
    }
    defer conn.Stop()

    if ok := conn.Connect(); !ok {
        return fmt.Errorf("connect failed")
    }

    conn.Start()

    // The handler receives Open/Message/Metadata/UtteranceEnd events.
    // In a real app, stream PCM/audio chunks from your mic or file reader here.
    // For example (pseudo-code):
    //   for chunk := range audioChunks {
    //       if err := conn.WriteBinary(chunk); err != nil { return err }
    //   }
    //
    // When the input stream ends, flush any trailing audio and close cleanly:
    //   if err := conn.Finalize(); err != nil { return err }

    return nil
}

Key parameters

interfaces.PreRecordedTranscriptionOptions
- common fields: Model, Language, Punctuate, SmartFormat, Diarize, DiarizeModel (batch diarization version: latest/v1/v2), Redact, Utterances
- use with pkg/api/listen/v1/rest: api.New(client).FromURL, FromFile, FromStream
interfaces.LiveTranscriptionOptions
- common fields: Model, Language, Encoding, SampleRate, Channels, InterimResults, Endpointing
constructor families
- REST: listen.NewRESTWithDefaults(), listen.NewREST(apiKey, options)
- WS callbacks: listen.NewWSUsingCallback...
- WS channels: listen.NewWSUsingChan...
lifecycle
- Connect() returns bool; call Start(), stream/write audio, KeepAlive() as needed, Finalize(), then defer conn.Stop()

API reference (layered)

In-repo reference
- README.md
- docs.go
- pkg/client/listen/client.go
- pkg/client/listen/v1/rest/client.go
- pkg/client/listen/v1/websocket/client_callback.go
- pkg/client/listen/v1/websocket/client_channel.go
- pkg/client/interfaces/v1/types-prerecorded.go
- pkg/client/interfaces/v1/types-stream.go
OpenAPI
- https://developers.deepgram.com/openapi.yaml
AsyncAPI
- https://developers.deepgram.com/asyncapi.yaml
Context7
- /llmstxt/developers_deepgram_llms_txt
Product docs
- https://developers.deepgram.com/reference/speech-to-text/listen-pre-recorded
- https://developers.deepgram.com/reference/speech-to-text/listen-streaming
- https://developers.deepgram.com/docs/speech-to-text

Gotchas

This repo uses listen package names for STT v1, not transcription.
Streaming code is split into callback and channel variants; copy the style that matches the surrounding package.
For WebSockets, pass a handler into NewWSUsingChan..., keep defer conn.Stop() near construction, and finalize before shutdown.
Live and prerecorded option structs are different; do not assume analytics-only prerecorded fields exist in live mode.
Use context.Context and return error; do not translate examples into exception-style control flow.

Example files in this repo

examples/speech-to-text/rest/url/main.go
examples/speech-to-text/rest/file/main.go
examples/speech-to-text/websocket/microphone_channel/main.go
examples/speech-to-text/websocket/microphone_callback/main.go
tests/edge_cases/keepalive/main.go
tests/edge_cases/reconnect_client/main.go

Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

npx skills add deepgram/skills

This SDK ships language-idiomatic code skills; deepgram/skills ships cross-language product knowledge (see api, docs, recipes, examples, starters, setup-mcp).