The performance of AI applications is important. You may have noticed that while working with Large Language Models (LLMs), a lot of time is spent waiting—waiting for an API response, waiting for multiple calls to finish, or waiting for I/O operations.
This is where asyncio comes in. Many developers work with LLMs without realizing that asynchronous programming can significantly speed up their applications.
In this tutorial, you will learn:
- What asyncio is
- How to get started with asynchronous Python
- How to use asyncio with an LLM in an AI application
What is Asyncio?
Asyncio is the Python library for concurrent programming using async/await; it allows multiple I/O-bound tasks to run within a single thread. At its core, asyncio works with awaitable objects, usually coroutines, that an event loop schedules and executes without blocking.
Put more simply, synchronous code runs tasks sequentially, like standing in a single grocery line, while asynchronous code performs tasks concurrently, like multiple self-checkouts. This is particularly useful when making API calls to providers such as OpenAI, Anthropic, or Hugging Face, where most of the time is spent waiting for responses, and it allows for much quicker execution.
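To make this concrete, here is a minimal sketch (the coroutine names and delays are illustrative): a coroutine is defined with async def, await hands control back to the event loop while waiting, and asyncio.run() starts the loop.

```python
import asyncio

async def fetch_greeting(name: str) -> str:
    # 'await' yields control to the event loop while we wait,
    # so other coroutines can run in the meantime.
    await asyncio.sleep(0.1)  # stand-in for an I/O wait such as an API call
    return f"Hello, {name}!"

async def main() -> list:
    # Both coroutines wait concurrently: total time is ~0.1s, not ~0.2s.
    return await asyncio.gather(fetch_greeting("Ada"), fetch_greeting("Alan"))

greetings = asyncio.run(main())
print(greetings)
```

Note that asyncio.gather() returns results in the order the coroutines were passed in, regardless of which finished first.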
Getting Started with Asynchronous Python
Synchronous vs. Asynchronous Tasks: An Example
In this example, we run the same function three times synchronously. Each call to say_hello() prints "Hello...", waits two seconds, then prints "...World!". Since the calls happen one after another, the wait time adds up: 2 seconds × 3 calls = 6 seconds total. Take a look at the FULL CODES here.
import time

def say_hello():
    print("Hello...")
    time.sleep(2)  # Simulate waiting, like an API call
    print("...World!")

def main():
    say_hello()
    say_hello()
    say_hello()

if __name__ == "__main__":
    start = time.time()
    main()
    print(f"Finished in {time.time() - start:.2f} seconds")
The code below shows that all three calls to the say_hello() function start at almost the same time. Each prints "Hello...", waits two seconds, and then prints "...World!". The total wait time (6 seconds in the synchronous variant) is reduced to a mere 2 seconds because the tasks run concurrently rather than sequentially. This makes asyncio the better choice for I/O-bound tasks.
import nest_asyncio, asyncio
nest_asyncio.apply()
import time

async def say_hello():
    print("Hello...")
    await asyncio.sleep(2)  # Simulate waiting, like an API call
    print("...World!")

async def main():
    # Run the tasks concurrently
    await asyncio.gather(
        say_hello(),
        say_hello(),
        say_hello()
    )

if __name__ == "__main__":
    start = time.time()
    asyncio.run(main())
    print(f"Finished in {time.time() - start:.2f} seconds")
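asyncio.gather() is not the only way to run coroutines concurrently. As a sketch of an alternative, asyncio.create_task() schedules a coroutine on the event loop immediately and returns a Task that can be awaited later:

```python
import asyncio
import time

async def say_hello():
    print("Hello...")
    await asyncio.sleep(2)  # Simulate waiting, like an API call
    print("...World!")

async def main() -> float:
    start = time.time()
    # create_task() schedules each coroutine right away...
    tasks = [asyncio.create_task(say_hello()) for _ in range(3)]
    # ...and awaiting the tasks waits for all of them to finish.
    for task in tasks:
        await task
    return time.time() - start

elapsed = asyncio.run(main())
print(f"Finished in {elapsed:.2f} seconds")
```

Because all three tasks sleep at the same time, the elapsed time is again about 2 seconds, just as with gather().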
Download Simulator
Imagine you want to download multiple files. While each download takes time, the program could be working on another download instead of sitting idle.
import asyncio
import random
import time

async def download_file(file_id: int):
    print(f"Start downloading file {file_id}")
    download_time = random.uniform(1, 3)  # Simulate variable download times
    await asyncio.sleep(download_time)  # non-blocking wait
    print(f"Finished downloading file {file_id} in {download_time:.2f} seconds")
    return f"File {file_id} content"

async def main():
    files = [1, 2, 3, 4, 5]
    start_time = time.time()
    # Run the downloads concurrently
    results = await asyncio.gather(*(download_file(f) for f in files))
    end_time = time.time()
    print("\nAll downloads completed.")
    print(f"Total time taken: {end_time - start_time:.2f} seconds")
    print("Results:", results)

if __name__ == "__main__":
    asyncio.run(main())
- As shown in the screenshot, all downloads began at almost the same time: the "Start downloading file X" lines appear all at once.
- Each simulated "download" (an asyncio.sleep() call) took a different amount of time, so the files finished at different times: file 3 finished first in 1.42 seconds, and file 1 last in 2.67 seconds.
- Because all downloads ran concurrently, the total time was approximately equal to the single longest download (about 2.68 seconds), not the sum of all of them.
This demonstrates the power of asyncio: when tasks involve waiting, they can run concurrently, greatly improving efficiency.
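A related tool worth knowing: if you want to handle each download as soon as it finishes rather than waiting for the whole batch, asyncio.as_completed() yields results in completion order. A small sketch with fixed (illustrative) delays so the order is predictable:

```python
import asyncio

async def download_file(file_id: int, delay: float) -> str:
    await asyncio.sleep(delay)  # non-blocking simulated download
    return f"File {file_id} content"

async def main() -> list:
    jobs = [download_file(1, 0.3), download_file(2, 0.1), download_file(3, 0.2)]
    finished = []
    # as_completed() yields each download the moment it is done,
    # so results arrive in completion order, not submission order.
    for coro in asyncio.as_completed(jobs):
        result = await coro
        finished.append(result)
        print("Done:", result)
    return finished

results = asyncio.run(main())
```

Here file 2 (0.1s) comes back first and file 1 (0.3s) last, unlike gather(), which would preserve submission order.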
Asyncio for AI with an LLM
Now that we understand the asyncio concept, let's use it in a real AI application. Applications built on Large Language Models such as OpenAI's GPT often require multiple API calls, each of which takes time. If we make those calls one after another, we waste time waiting on responses. We'll use 15 short prompts to clearly demonstrate the performance difference.
import asyncio
from openai import AsyncOpenAI
import os
from getpass import getpass

os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')

import time
from openai import OpenAI

# Synchronous client
client = OpenAI()

def ask_llm(prompt: str):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def main():
    prompts = [
        "Briefly explain quantum computing.",
        "Write a 3-line haiku about AI.",
        "List 3 startup ideas in agri-tech.",
        "Summarize Inception in 2 sentences.",
        "Explain blockchain in 2 sentences.",
        "Write a 3-line story about a robot.",
        "List 5 ways AI helps healthcare.",
        "Explain Higgs boson in simple terms.",
        "Describe neural networks in 2 sentences.",
        "List 5 blog post ideas on renewable energy.",
        "Give a short metaphor for time.",
        "List 3 emerging trends in ML.",
        "Write a short limerick about programming.",
        "Explain supervised vs unsupervised learning in one sentence.",
        "List 3 ways to reduce urban traffic."
    ]
    start = time.time()
    results = []
    for prompt in prompts:
        results.append(ask_llm(prompt))
    end = time.time()
    for i, res in enumerate(results, 1):
        print(f"\n--- Response {i} ---")
        print(res)
    print(f"\n[Synchronous] Finished in {end - start:.2f} seconds")

if __name__ == "__main__":
    main()
Each request was processed one at a time, so the total runtime equals the sum of the individual request durations. Since each request took time to complete, the overall runtime was much longer: 49.76 seconds in this example.
from openai import AsyncOpenAI

# Create the async client
client = AsyncOpenAI()

async def ask_llm(prompt: str):
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    prompts = [
        "Briefly explain quantum computing.",
        "Write a 3-line haiku about AI.",
        "List 3 startup ideas in agri-tech.",
        "Summarize Inception in 2 sentences.",
        "Explain blockchain in 2 sentences.",
        "Write a 3-line story about a robot.",
        "List 5 ways AI helps healthcare.",
        "Explain Higgs boson in simple terms.",
        "Describe neural networks in 2 sentences.",
        "List 5 blog post ideas on renewable energy.",
        "Give a short metaphor for time.",
        "List 3 emerging trends in ML.",
        "Write a short limerick about programming.",
        "Explain supervised vs unsupervised learning in one sentence.",
        "List 3 ways to reduce urban traffic."
    ]
    start = time.time()
    results = await asyncio.gather(*(ask_llm(p) for p in prompts))
    end = time.time()
    for i, res in enumerate(results, 1):
        print(f"\n--- Response {i} ---")
        print(res)
    print(f"\n[Asynchronous] Finished in {end - start:.2f} seconds")

if __name__ == "__main__":
    asyncio.run(main())
Instead of processing each prompt one at a time, the asynchronous version started them all almost simultaneously. As a result, the total runtime was close to the time of the slowest single request: 8.25 seconds, rather than the sum of all request times.
This is because in synchronous processing, every API call blocks the program until it completes. With asyncio's asynchronous implementation, API calls run concurrently, so the program can make progress on other tasks while waiting for responses. The time difference can be significant.
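One practical caveat when firing many API calls with asyncio.gather(): by default, the first exception propagates and the remaining results are lost. Passing return_exceptions=True keeps every result. Here is a sketch using a hypothetical flaky_llm() stand-in instead of a real API call:

```python
import asyncio

async def flaky_llm(prompt: str) -> str:
    # Hypothetical stand-in for an API call that sometimes fails.
    await asyncio.sleep(0.05)
    if prompt == "bad":
        raise RuntimeError("simulated API error")
    return f"answer to {prompt}"

async def main() -> list:
    prompts = ["ok-1", "bad", "ok-2"]
    # return_exceptions=True: failures come back as exception objects
    # in the results list instead of cancelling the other requests.
    return await asyncio.gather(
        *(flaky_llm(p) for p in prompts), return_exceptions=True
    )

results = asyncio.run(main())
for r in results:
    print("FAILED:" if isinstance(r, Exception) else "OK:", r)
```

Checking each result with isinstance(r, Exception) lets you retry or log failures without discarding the successful responses.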
What this means for AI applications
Waiting for each query to complete before moving on can become a bottleneck in real-world AI apps, particularly when there are multiple data sources or queries. It is common to see this in workflows like:
- Generating content for multiple users simultaneously — e.g., chatbots, recommendation engines, or multi-user dashboards.
- Calling the LLM several times in one workflow — such as for summarization, refinement, classification, or multi-step reasoning.
- Fetching data from multiple APIs — for example, combining LLM output with information from a vector database or external APIs.
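The last pattern above can be sketched as follows; both fetchers are simulated stand-ins (an assumed vector-store lookup and an assumed LLM call), but the overlap of their waits is the point:

```python
import asyncio

async def query_vector_db(query: str) -> list:
    await asyncio.sleep(0.2)  # simulated vector-store lookup
    return [f"doc about {query}"]

async def call_llm(query: str) -> str:
    await asyncio.sleep(0.3)  # simulated LLM call
    return f"draft answer for {query}"

async def answer(query: str) -> str:
    # The two I/O waits overlap, so total time is ~0.3s instead of ~0.5s.
    docs, draft = await asyncio.gather(query_vector_db(query), call_llm(query))
    return f"{draft} (grounded in {len(docs)} doc)"

result = asyncio.run(answer("asyncio"))
print(result)
```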
Asyncio is a great fit for these situations:
- Improved performance — by making parallel API calls instead of waiting for each one sequentially, your system can handle more work in less time.
- Cost effectiveness — faster execution can reduce operational costs, and batching requests where possible can further optimize usage of paid APIs.
- Better user experience — concurrency makes applications feel more responsive, which is crucial for real-time systems like AI assistants and chatbots.
- Scalability — asynchronous patterns allow your application to handle many more simultaneous requests without proportionally increasing resource consumption.
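One scalability caveat: most paid APIs enforce rate limits, so launching hundreds of requests with an unbounded gather() can backfire. An asyncio.Semaphore caps how many requests are in flight at once; the limit of 2 below is arbitrary and should be tuned to your provider's limits:

```python
import asyncio

async def limited_call(i: int, sem: asyncio.Semaphore, state: dict) -> int:
    async with sem:  # at most 'limit' bodies execute concurrently
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
        await asyncio.sleep(0.05)  # simulated API call
        state["in_flight"] -= 1
    return i

async def main(limit: int = 2):
    sem = asyncio.Semaphore(limit)  # cap on concurrent requests
    state = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(*(limited_call(i, sem, state) for i in range(6)))
    return results, state["peak"]

results, peak = asyncio.run(main())
print("results:", results, "| peak concurrency:", peak)
```

All six calls still run through one gather(), but the semaphore ensures no more than two are ever active at the same time.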


