This tutorial shows how to build a hands-on, advanced workflow with the Deepgram Python SDK, bringing modern voice AI into a single Python environment. We use the SDK to perform transcription, text analysis, and speech generation, and extend the pipeline with async processing for faster, more efficient execution. Along the way we generate speech with multiple TTS voices, analyze text for sentiment, topics, and intents, and explore advanced transcription controls such as keyword search, word replacement, keyterm boosting, raw response access, and structured error handling. The result is an end-to-end Deepgram voice AI workflow that is technically precise and easily adaptable to real-world applications.
!pip install deepgram-sdk httpx --quiet
import os, asyncio, textwrap, urllib.request
from getpass import getpass
from deepgram import DeepgramClient, AsyncDeepgramClient
from deepgram.core.api_error import ApiError
from IPython.display import Audio, display
DEEPGRAM_API_KEY = getpass("🔑 Enter your Deepgram API key: ")
os.environ["DEEPGRAM_API_KEY"] = DEEPGRAM_API_KEY
client = DeepgramClient(api_key=DEEPGRAM_API_KEY)
async_client = AsyncDeepgramClient(api_key=DEEPGRAM_API_KEY)
AUDIO_URL = "https://dpgr.am/spacewalk.wav"
AUDIO_PATH = "/tmp/sample.wav"
urllib.request.urlretrieve(AUDIO_URL, AUDIO_PATH)
def read_audio(path=AUDIO_PATH):
    with open(path, "rb") as f:
        return f.read()
def _get(obj, key, default=None):
"""Get a field from either a dict or an object — v6 returns both."""
    if isinstance(obj, dict):
return obj.get(key, default)
return getattr(obj, key, default)
def get_model_name(meta):
    mi = _get(meta, "model_info")
    if mi is None:
        return "n/a"
    return _get(mi, "name", "n/a")
def tts_to_bytes(response) -> bytes:
"""v6 generate() returns a generator of chunks or an object with .stream."""
    if hasattr(response, "stream"):
        return response.stream.getvalue()
    return b"".join(chunk for chunk in response if isinstance(chunk, bytes))
def save_tts(response, path: str) -> str:
    with open(path, "wb") as f:
        f.write(tts_to_bytes(response))
    return path
print("✅ Deepgram client ready | sample audio downloaded")
print("n" + "="*60)
print("📼 SECTION 2: Pre-Recorded Transcription from URL")
print("="*60)
response = client.listen.v1.media.transcribe_url(
url=AUDIO_URL,
model="nova-3",
smart_format=True,
diarize=True,
language="en",
utterances=True,
filler_words=True,
)
transcript = response.results.channels[0].alternatives[0].transcript
print(f"n📝 Full Transcript:n{textwrap.fill(transcript, 80)}")
confidence = response.results.channels[0].alternatives[0].confidence
print(f"n🎯 Confidence: {confidence:.2%}")
words = response.results.channels[0].alternatives[0].words
print(f"n🔤 First 5 words with timing:")
For w in words[:5]:
print(f" '{w.word}' start={w.start:.2f}s end={w.end:.2f}s conf={w.confidence:.2f}")
print(f"n👥 Speaker Diarization (first 5 words):")
For w in words[:5]:
speaker = getattr(w, "speaker", None)
If speaker is None
print(f" Speaker {int(speaker)}: '{w.word}'")
The meta tag is used to indicate the response.metadata.
print(f"n📊 Metadata: duration={meta.duration:.2f}s channels={int(meta.channels)} model={get_model_name(meta)}")
Next, we install the Deepgram SDK with its dependencies and set up authentication securely using our API key. We initialize both the synchronous and asynchronous Deepgram clients, download a sample audio file, and define helper functions that make it easier to work with audio bytes, model metadata, and streamed TTS output. We then run our first pre-recorded transcription from a URL, inspecting the transcript, confidence score, word-level timestamps, speaker diarization, and metadata to understand the richness and structure of the response.
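Because the request above sets utterances=True, the response should also carry utterance-level segments. The following is a minimal sketch of reading them, assuming the response exposes a results.utterances list whose items carry start, end, speaker, and transcript fields; adjust the field access if your SDK version differs.

# Hedged sketch: iterate utterance segments returned when utterances=True.
utterances = getattr(response.results, "utterances", None) or []
for u in utterances[:3]:
    # Each segment may be dict- or object-shaped, so reuse the _get helper.
    print(f"   [Speaker {_get(u, 'speaker', 0)}] "
          f"{_get(u, 'start', 0):.2f}s–{_get(u, 'end', 0):.2f}s: "
          f"{_get(u, 'transcript', '')[:80]}")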
print("n" + "="*60)
print("📂 SECTION 3: Pre-Recorded Transcription from File")
print("="*60)
file_response = client.listen.v1.media.transcribe_file(
request=read_audio(),
model="nova-3",
smart_format=True,
diarize=True,
paragraphs=True,
summarize="v2",
)
alt = file_response.results.channels[0].alternatives[0]
paragraphs = getattr(alt, "paragraphs", None)
if paragraphs and _get(paragraphs, "paragraphs"):
    print("\n📄 Paragraph-Formatted Transcript:")
    for para in _get(paragraphs, "paragraphs")[:2]:
        sentences = " ".join(_get(s, "text", "") for s in (_get(para, "sentences") or []))
        print(f"   [Speaker {int(_get(para, 'speaker', 0))}, "
              f"{_get(para, 'start', 0):.1f}s–{_get(para, 'end', 0):.1f}s] {sentences[:120]}...")
else:
print(f"n📝 Transcript: {alt.transcript[:200]}...")
if getattr(file_response.results, "summary", None):
    short = _get(file_response.results.summary, "short", "")
    if short:
        print(f"\n📌 AI Summary: {short}")
print(f"n🎯 Confidence: {alt.confidence:.2%}")
print(f"🔤 Word count : {len(alt.words)}")
print("n" + "="*60)
print("⚡ SECTION 4: Async Parallel Transcription")
print("="*60)
async def transcribe_async():
    audio_bytes = read_audio()
    async def from_url(label):
r = await async_client.listen.v1.media.transcribe_url(
url=AUDIO_URL, model="nova-3", smart_format=True,
)
print(f" [{label}] {r.results.channels[0].alternatives[0].transcript[:100]}...")
    async def from_file(label):
r = await async_client.listen.v1.media.transcribe_file(
request=audio_bytes, model="nova-3", smart_format=True,
)
print(f" [{label}] {r.results.channels[0].alternatives[0].transcript[:100]}...")
await asyncio.gather(from_url("From URL"), from_file("From File"))
await transcribe_async()
By sending audio files directly to the Deepgram API instead of URLs, we unlock richer features such as summarization and paragraph formatting. The paragraph structure, speaker segmentation, summary output, confidence score, and word count returned here show how the SDK supports more readable, analysis-friendly transcription. We then introduce asynchronous processing and run URL-based and file-based transcriptions concurrently, a pattern that makes voice AI pipelines faster and more scalable.
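The same asyncio.gather pattern scales beyond two tasks. Below is a minimal sketch that transcribes a batch of URLs concurrently and uses return_exceptions=True so one bad source cannot sink the whole batch; the second URL in the usage line is intentionally bogus.

# Hedged sketch: batch async transcription with per-item error isolation.
async def transcribe_many(urls):
    async def one(url):
        r = await async_client.listen.v1.media.transcribe_url(
            url=url, model="nova-3", smart_format=True,
        )
        return r.results.channels[0].alternatives[0].transcript
    # return_exceptions=True keeps one failing URL from cancelling the rest.
    results = await asyncio.gather(*(one(u) for u in urls), return_exceptions=True)
    for url, res in zip(urls, results):
        status = "❌" if isinstance(res, Exception) else "✅"
        print(f"   {status} {url} → {str(res)[:80]}")

await transcribe_many([AUDIO_URL, "https://example.com/missing.wav"])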
print("n" + "="*60)
print("🔊 SECTION 5: Text-to-Speech")
print("="*60)
sample_text = (
"Welcome to the Deepgram advanced tutorial. "
"This SDK lets you transcribe audio, generate speech, "
"and analyse text — all with a simple Python interface."
)
tts_path = save_tts(
client.speak.v1.audio.generate(text=sample_text, model="aura-2-asteria-en"),
"/tmp/tts_output.mp3",
)
size_kb = os.path.getsize(tts_path) / 1024
print(f"✅ TTS audio saved → {tts_path} ({size_kb:.1f} KB)")
display(Audio(tts_path))
print("n" + "="*60)
print("🎭 SECTION 6: Multiple TTS Voices Comparison")
print("="*60)
voices = {
"aura-2-asteria-en": "Asteria (female, warm)",
"aura-2-orion-en": "Orion (male, deep)",
"aura-2-luna-en": "Luna (female, bright)",
}
for model_id, label in voices.items():
    try:
        path = save_tts(
            client.speak.v1.audio.generate(text="Hello! I am a Deepgram voice model.", model=model_id),
            f"/tmp/tts_{model_id}.mp3",
        )
        print(f"   ✅ {label}")
        display(Audio(path))
    except Exception as e:
        print(f"   ⚠️ {label} — {e}")
print("n" + "="*60)
print("🧠 SECTION 7: Text Intelligence — Sentiment, Topics, Intents")
print("="*60)
review_text = (
"I absolutely love this product! It arrived quickly, the quality is "
"outstanding, and customer support was incredibly helpful when I had "
"a question. I would definitely recommend it to anyone looking for "
"a reliable solution. Five stars!"
)
read_response = client.read.v1.text.analyze(
request={"text": review_text},
language="en",
sentiment=True,
topics=True,
intents=True,
summarize=True,
)
results = read_response.results
Next, we convert text into audio with Deepgram's Text-to-Speech API, saving the output as an MP3 file. We then compare several TTS voices to hear how they differ and to see how easily they can be swapped without changing the code pattern. Finally, we move into Deepgram's Text Intelligence features by sending a review text through the Read API.
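To see the sentiment scores actually move, it helps to run a contrasting input through the same analyze() call. Here is a minimal sketch reusing the exact parameters shown above, with a made-up negative review as input.

# Hedged sketch: re-run the Read API on a negative review to contrast sentiment.
negative_text = (
    "This was a frustrating purchase. It arrived late, the build quality "
    "feels cheap, and support never replied to my emails."
)
neg = client.read.v1.text.analyze(
    request={"text": negative_text},
    language="en",
    sentiment=True,
)
if getattr(neg.results, "sentiments", None):
    neg_avg = neg.results.sentiments.average
    print(f"   Negative sample → {_get(neg_avg, 'sentiment', '?')} "
          f"(score={_get(neg_avg, 'sentiment_score', 0):.3f})")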
if getattr(results, "sentiments", None):
overall = results.sentiments.average
print(f"😊 Sentiment: {_get(overall,'sentiment','?').upper()} "
The f"(score={_get(overall,'sentiment_score',0):.3f})")
for seg in (_get(results.sentiments, "segments"" or [])[:2]:
print(f" • "{_get(seg,'text','')[:60]}" → {_get(seg,'sentiment','?')}")
if getattr(results, "topics", None):
print(f"n🏷️ Topics Detected:")
for seg in (_get(results.topics, "segments"" or [])[:3]:
For t, (_get(seg) "topics"" or []):
print(f" • {_get(t,'topic','?')} (conf={_get(t,'confidence_score',0):.2f})")
if getattr(results, "intents", None):
print(f"n🎯 Intents Detected:")
for seg in (_get(results.intents, "segments"" or [])[:3]:
For intent in (_get()seg "intents"" or []):
print(f" • {_get(intent,'intent','?')} (conf={_get(intent,'confidence_score',0):.2f})")
if getattr(results, "summary", None):
text = _get(results.summary, "text", "")
If text
print(f"n📌 Summary: {text}")
print("n" + "="*60)
print("⚙️ SECTION 8: Advanced Options — Search, Replace, Boost")
print("="*60)
search_response = client.listen.v1.media.transcribe_url(
url=AUDIO_URL,
model="nova-3",
smart_format=True,
punctuate=True,
search=["spacewalk", "mission", "astronaut"],
replace=[{"find": "um", "replace": "[hesitation]"}],
keyterm=["spacewalk", "NASA"],
)
ch = search_response.results.channels[0]
if getattr(ch, "search", None):
print("🔍 Keyword Search Hits:")
Search for a hit_group using ch.search
To get the number of hits, use _get_hit_group() "hits"" or []
print(f" '{_get(hit_group,'query','?')}': {len(hits)} hit(s)")
For h, in Hits[:2]:
print(f" at {_get(h,'start',0):.2f}s–{_get(h,'end',0):.2f}s "
F"conf={_get(h,'confidence',0):.2f}")
print(f"n📝 Transcript:n{textwrap.fill(ch.alternatives[0].transcript, 80)}")
print("n" + "="*60)
print("🔩 SECTION 9: Raw HTTP Response Access")
print("="*60)
raw = client.listen.v1.media.with_raw_response.transcribe_url(
url=AUDIO_URL, model="nova-3",
)
print(f"Response type : {type(raw.data).__name__}")
request_id = raw.headers.get("dg-request-id", raw.headers.get("x-dg-request-id", "n/a"))
print(f"Request ID : {request_id}")
Continuing with Text Intelligence, we examine sentiment, topic, intent, and summary outputs to surface deeper language insights. We then explore advanced transcription features, including search terms, word replacement, and keyterm boosting, which improve accuracy and utility for domain-specific applications. Finally, we access the raw HTTP response and headers, giving a deeper view into the API interaction and making debugging, observability, and monitoring easier.
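A quick, illustrative way to confirm that the replace rule fired is to scan the final transcript for the substitution marker and tally the search hits; this sketch builds on the search_response and ch objects from Section 8.

# Hedged sketch: verify the find/replace rule and count total search hits.
final_text = ch.alternatives[0].transcript
if "[hesitation]" in final_text:
    print("   ✅ replace rule fired: 'um' → '[hesitation]'")
else:
    print("   ℹ️ no 'um' in this clip, so nothing was replaced")
total_hits = sum(len(_get(g, "hits") or []) for g in (getattr(ch, "search", None) or []))
print(f"   🔍 total search hits across all queries: {total_hits}")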
print("n" + "="*60)
print("🛡️ SECTION 10: Error Handling")
print("="*60)
def safe_transcribe(url: str, model: str = "nova-3"):
try:
r = client.listen.v1.media.transcribe_url(
url=url, model=model,
request_options={"timeout_in_seconds": 30, "max_retries": 2},
)
        return r.results.channels[0].alternatives[0].transcript
except ApiError as e:
print(f" ❌ ApiError {e.status_code}: {e.body}")
        return None
    except Exception as e:
        print(f"   ❌ {type(e).__name__}: {e}")
        return None
t = safe_transcribe(AUDIO_URL)
print(f"✅ Valid URL → '{t[:60]}...'")
t_bad = safe_transcribe("https://example.com/nonexistent_audio.wav")
if t_bad is None:
    print("✅ Invalid URL → error caught gracefully")
print("n" + "="*60)
print("🎉 Tutorial complete! Sections covered:")
for s in [
"2. transcribe_url(url=...) + diarization + word timing",
"3. transcribe_file(request=bytes) + paragraphs + summarize",
"4. Async parallel transcription",
"5. Text-to-Speech — generator-safe via save_tts()",
"6. Multi-voice TTS comparison",
"7. Text Intelligence — sentiment, topics, intents (dict-safe)",
"8. Advanced options — keyword search, word replacement, boosting",
"9. Raw HTTP response & request ID",
"10. Error handling with ApiError + retries"
]:
print(f" ✅ {s}")
print("="*60)
The safe_transcribe wrapper catches API-specific ApiError exceptions separately from generic ones, so failures surface with useful status codes and response bodies. Testing it against both a valid audio URL and an invalid one verifies that the workflow remains reliable even when errors occur. The tutorial closes by printing a summary of every section covered, a quick review of the complete Deepgram pipeline: transcription, TTS, text intelligence, advanced features, raw responses, and error handling.
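If the SDK's built-in max_retries is not enough (for example, under rate limiting), the same wrapper idea extends to a manual backoff loop. Here is a minimal sketch, assuming a simple exponential delay policy and treating only 5xx-style ApiErrors as retryable.

import time

# Hedged sketch: manual exponential backoff around the same transcribe_url call.
def transcribe_with_backoff(url, attempts=3, base_delay=1.0):
    for attempt in range(attempts):
        try:
            r = client.listen.v1.media.transcribe_url(url=url, model="nova-3")
            return r.results.channels[0].alternatives[0].transcript
        except ApiError as e:
            # Retry only on server-side failures; surface client errors immediately.
            if e.status_code and e.status_code < 500:
                raise
            delay = base_delay * (2 ** attempt)
            print(f"   retry {attempt + 1}/{attempts} in {delay:.0f}s (status={e.status_code})")
            time.sleep(delay)
    return None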
We conclude with a practical, end-to-end understanding of the Deepgram Python SDK and how it can power voice and language workflows. Beyond high-quality transcription and text-to-speech, we learned how to extract more value from audio and text through metadata inspection, summarization, sentiment analysis, and topic detection. The tutorial goes well beyond a simple SDK demonstration: by actively combining multiple capabilities, we built a pipeline that reflects how production-ready voice AI systems are typically assembled. We also saw that the SDK offers both ease of use and fine-grained control, which let us progress from basic examples to richer, more robust implementations. With this foundation, the Deepgram SDK is ready to support transcription tools and audio intelligence systems of our own.

