The performance of AI applications is important. You may have noticed that while working with Large Language Models (LLMs), a lot of time is spent waiting—waiting for an API response, waiting for multiple calls to finish, or waiting for I/O operations.
This is where asyncio comes in. Many developers work with LLMs without realizing that asynchronous programming can significantly speed up their applications.
In this tutorial, you will learn:
- What asyncio is
- How to get started with asynchronous Python
- How to use asyncio with an LLM in an AI application
What is Asyncio?
Asyncio is the Python library for concurrent programming using async/await; it allows multiple I/O-bound tasks to run within a single thread. At its core, asyncio works with awaitable objects, usually coroutines, that an event loop schedules and executes without blocking.
Put more simply, synchronous code runs tasks sequentially, like standing in a single grocery line, while asynchronous code performs tasks concurrently, like multiple self-checkouts. This is particularly useful when making API calls to providers such as OpenAI, Anthropic, or Hugging Face, where most of the time is spent waiting for responses, and it allows for much quicker execution.
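To make this concrete, here is a minimal sketch (the coroutine names and delays are illustrative): a coroutine is defined with async def, await hands control back to the event loop while waiting, and asyncio.run() starts the loop.

```python
import asyncio

async def fetch_greeting(name: str) -> str:
    # 'await' yields control to the event loop while we wait,
    # so other coroutines can run in the meantime.
    await asyncio.sleep(0.1)  # stand-in for an I/O wait such as an API call
    return f"Hello, {name}!"

async def main() -> list:
    # Both coroutines wait concurrently: total time is ~0.1s, not ~0.2s.
    return await asyncio.gather(fetch_greeting("Ada"), fetch_greeting("Alan"))

greetings = asyncio.run(main())
print(greetings)
```

Note that asyncio.gather() returns results in the order the coroutines were passed in, regardless of which finished first.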
Getting Started with Asynchronous Python
Synchronous vs. Asynchronous Tasks: An Example
In this example, we run the same function three times synchronously. Each call to say_hello() prints "Hello...", waits two seconds, then prints "...World!". Since the calls happen one after another, the wait time adds up: 2 seconds × 3 calls = 6 seconds total. Take a look at the FULL CODES here.
import time

def say_hello():
    print("Hello...")
    time.sleep(2)  # Simulate waiting, like an API call
    print("...World!")

def main():
    say_hello()
    say_hello()
    say_hello()

if __name__ == "__main__":
    start = time.time()
    main()
    print(f"Finished in {time.time() - start:.2f} seconds")
The code below shows that all three calls to the say_hello() function start at almost the same time. Each prints "Hello...", waits two seconds, and then prints "...World!". The total wait time (6 seconds in the synchronous variant) is reduced to a mere 2 seconds because the tasks run concurrently rather than sequentially. This makes asyncio the better choice for I/O-bound tasks.
import nest_asyncio, asyncio
nest_asyncio.apply()
import time

async def say_hello():
    print("Hello...")
    await asyncio.sleep(2)  # Simulate waiting, like an API call
    print("...World!")

async def main():
    # Run the tasks concurrently
    await asyncio.gather(
        say_hello(),
        say_hello(),
        say_hello()
    )

if __name__ == "__main__":
    start = time.time()
    asyncio.run(main())
    print(f"Finished in {time.time() - start:.2f} seconds")
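asyncio.gather() is not the only way to run coroutines concurrently. As a sketch of an alternative, asyncio.create_task() schedules a coroutine on the event loop immediately and returns a Task that can be awaited later:

```python
import asyncio
import time

async def say_hello():
    print("Hello...")
    await asyncio.sleep(2)  # Simulate waiting, like an API call
    print("...World!")

async def main() -> float:
    start = time.time()
    # create_task() schedules each coroutine right away...
    tasks = [asyncio.create_task(say_hello()) for _ in range(3)]
    # ...and awaiting the tasks waits for all of them to finish.
    for task in tasks:
        await task
    return time.time() - start

elapsed = asyncio.run(main())
print(f"Finished in {elapsed:.2f} seconds")
```

Because all three tasks sleep at the same time, the elapsed time is again about 2 seconds, just as with gather().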
Download Simulator
Imagine you want to download multiple files. While each download takes time, the program could be working on another download instead of sitting idle.
import asyncio
import random
import time

async def download_file(file_id: int):
    print(f"Start downloading file {file_id}")
    download_time = random.uniform(1, 3)  # Simulate variable download times
    await asyncio.sleep(download_time)  # non-blocking wait
    print(f"Finished downloading file {file_id} in {download_time:.2f} seconds")
    return f"File {file_id} content"

async def main():
    files = [1, 2, 3, 4, 5]
    start_time = time.time()
    # Run the downloads concurrently
    results = await asyncio.gather(*(download_file(f) for f in files))
    end_time = time.time()
    print("\nAll downloads completed.")
    print(f"Total time taken: {end_time - start_time:.2f} seconds")
    print("Results:", results)

if __name__ == "__main__":
    asyncio.run(main())
- As shown in the screenshot, all downloads began at almost the same time: the "Start downloading file X" lines appear all at once.
- Each simulated "download" (an asyncio.sleep() call) took a different amount of time, so the files finished at different times: file 3 finished first in 1.42 seconds, and file 1 last in 2.67 seconds.
- Because all downloads ran concurrently, the total time was approximately equal to the single longest download (about 2.68 seconds), not the sum of all of them.
This demonstrates the power of asyncio: when tasks involve waiting, they can run concurrently, greatly improving efficiency.
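A related tool worth knowing: if you want to handle each download as soon as it finishes rather than waiting for the whole batch, asyncio.as_completed() yields results in completion order. A small sketch with fixed (illustrative) delays so the order is predictable:

```python
import asyncio

async def download_file(file_id: int, delay: float) -> str:
    await asyncio.sleep(delay)  # non-blocking simulated download
    return f"File {file_id} content"

async def main() -> list:
    jobs = [download_file(1, 0.3), download_file(2, 0.1), download_file(3, 0.2)]
    finished = []
    # as_completed() yields each download the moment it is done,
    # so results arrive in completion order, not submission order.
    for coro in asyncio.as_completed(jobs):
        result = await coro
        finished.append(result)
        print("Done:", result)
    return finished

results = asyncio.run(main())
```

Here file 2 (0.1s) comes back first and file 1 (0.3s) last, unlike gather(), which would preserve submission order.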
Asyncio for AI with an LLM
Now that we understand the asyncio concept, let's use it in a real AI application. Applications built on Large Language Models such as OpenAI's GPT often require multiple API calls, each of which takes time. If we make those calls one after another, we waste time waiting on responses. We'll use 15 short prompts to clearly demonstrate the performance difference.
import asyncio
from openai import AsyncOpenAI
import os
from getpass import getpass

os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')

import time
from openai import OpenAI

# Synchronous client
client = OpenAI()

def ask_llm(prompt: str):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def main():
    prompts = [
        "Briefly explain quantum computing.",
        "Write a 3-line haiku about AI.",
        "List 3 startup ideas in agri-tech.",
        "Summarize Inception in 2 sentences.",
        "Explain blockchain in 2 sentences.",
        "Write a 3-line story about a robot.",
        "List 5 ways AI helps healthcare.",
        "Explain Higgs boson in simple terms.",
        "Describe neural networks in 2 sentences.",
        "List 5 blog post ideas on renewable energy.",
        "Give a short metaphor for time.",
        "List 3 emerging trends in ML.",
        "Write a short limerick about programming.",
        "Explain supervised vs unsupervised learning in one sentence.",
        "List 3 ways to reduce urban traffic."
    ]
    start = time.time()
    results = []
    for prompt in prompts:
        results.append(ask_llm(prompt))
    end = time.time()
    for i, res in enumerate(results, 1):
        print(f"\n--- Response {i} ---")
        print(res)
    print(f"\n[Synchronous] Finished in {end - start:.2f} seconds")

if __name__ == "__main__":
    main()
Each request was processed one at a time, so the total runtime equals the sum of the individual request durations. Since each request took time to complete, the overall runtime was much longer: 49.76 seconds in this example.
from openai import AsyncOpenAI

# Create the async client
client = AsyncOpenAI()

async def ask_llm(prompt: str):
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    prompts = [
        "Briefly explain quantum computing.",
        "Write a 3-line haiku about AI.",
        "List 3 startup ideas in agri-tech.",
        "Summarize Inception in 2 sentences.",
        "Explain blockchain in 2 sentences.",
        "Write a 3-line story about a robot.",
        "List 5 ways AI helps healthcare.",
        "Explain Higgs boson in simple terms.",
        "Describe neural networks in 2 sentences.",
        "List 5 blog post ideas on renewable energy.",
        "Give a short metaphor for time.",
        "List 3 emerging trends in ML.",
        "Write a short limerick about programming.",
        "Explain supervised vs unsupervised learning in one sentence.",
        "List 3 ways to reduce urban traffic."
    ]
    start = time.time()
    results = await asyncio.gather(*(ask_llm(p) for p in prompts))
    end = time.time()
    for i, res in enumerate(results, 1):
        print(f"\n--- Response {i} ---")
        print(res)
    print(f"\n[Asynchronous] Finished in {end - start:.2f} seconds")

if __name__ == "__main__":
    asyncio.run(main())
Instead of processing each prompt one at a time, the asynchronous version started them all almost simultaneously. As a result, the total runtime was close to the time of the slowest single request: 8.25 seconds, rather than the sum of all request times.
This is because in synchronous processing, every API call blocks the program until it completes. With asyncio's asynchronous implementation, API calls run concurrently, so the program can make progress on other tasks while waiting for responses. The time difference can be significant.
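One practical caveat when firing many API calls with asyncio.gather(): by default, the first exception propagates and the remaining results are lost. Passing return_exceptions=True keeps every result. Here is a sketch using a hypothetical flaky_llm() stand-in instead of a real API call:

```python
import asyncio

async def flaky_llm(prompt: str) -> str:
    # Hypothetical stand-in for an API call that sometimes fails.
    await asyncio.sleep(0.05)
    if prompt == "bad":
        raise RuntimeError("simulated API error")
    return f"answer to {prompt}"

async def main() -> list:
    prompts = ["ok-1", "bad", "ok-2"]
    # return_exceptions=True: failures come back as exception objects
    # in the results list instead of cancelling the other requests.
    return await asyncio.gather(
        *(flaky_llm(p) for p in prompts), return_exceptions=True
    )

results = asyncio.run(main())
for r in results:
    print("FAILED:" if isinstance(r, Exception) else "OK:", r)
```

Checking each result with isinstance(r, Exception) lets you retry or log failures without discarding the successful responses.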
What this means for AI applications
Waiting for each query to complete before moving on can become a bottleneck in real-world AI apps, particularly when there are multiple data sources or queries. It is common to see this in workflows like:
- Generating content for multiple users simultaneously — e.g., chatbots, recommendation engines, or multi-user dashboards.
- Calling the LLM several times in one workflow — such as for summarization, refinement, classification, or multi-step reasoning.
- Fetching data from multiple APIs — for example, combining LLM output with information from a vector database or external APIs.
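The last pattern above can be sketched as follows; both fetchers are simulated stand-ins (an assumed vector-store lookup and an assumed LLM call), but the overlap of their waits is the point:

```python
import asyncio

async def query_vector_db(query: str) -> list:
    await asyncio.sleep(0.2)  # simulated vector-store lookup
    return [f"doc about {query}"]

async def call_llm(query: str) -> str:
    await asyncio.sleep(0.3)  # simulated LLM call
    return f"draft answer for {query}"

async def answer(query: str) -> str:
    # The two I/O waits overlap, so total time is ~0.3s instead of ~0.5s.
    docs, draft = await asyncio.gather(query_vector_db(query), call_llm(query))
    return f"{draft} (grounded in {len(docs)} doc)"

result = asyncio.run(answer("asyncio"))
print(result)
```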
Asyncio is a great fit for these situations:
- Improved performance — by making parallel API calls instead of waiting for each one sequentially, your system can handle more work in less time.
- Cost effectiveness — faster execution can reduce operational costs, and batching requests where possible can further optimize usage of paid APIs.
- Better user experience — concurrency makes applications feel more responsive, which is crucial for real-time systems like AI assistants and chatbots.
- Scalability — asynchronous patterns allow your application to handle many more simultaneous requests without proportionally increasing resource consumption.
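One scalability caveat: most paid APIs enforce rate limits, so launching hundreds of requests with an unbounded gather() can backfire. An asyncio.Semaphore caps how many requests are in flight at once; the limit of 2 below is arbitrary and should be tuned to your provider's limits:

```python
import asyncio

async def limited_call(i: int, sem: asyncio.Semaphore, state: dict) -> int:
    async with sem:  # at most 'limit' bodies execute concurrently
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
        await asyncio.sleep(0.05)  # simulated API call
        state["in_flight"] -= 1
    return i

async def main(limit: int = 2):
    sem = asyncio.Semaphore(limit)  # cap on concurrent requests
    state = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(*(limited_call(i, sem, state) for i in range(6)))
    return results, state["peak"]

results, peak = asyncio.run(main())
print("results:", results, "| peak concurrency:", peak)
```

All six calls still run through one gather(), but the semaphore ensures no more than two are ever active at the same time.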


