In this tutorial, we explore LitServe, an easy-to-use and powerful framework for serving machine-learning models as APIs. We develop and test several endpoints that demonstrate practical functionality, including text generation, batching, streaming, multi-task processing, and caching, all running locally without external APIs. By the end, we have a clear understanding of how to build scalable, flexible, and efficient ML-serving pipelines for production applications.
!pip install litserve torch transformers -q


import litserve as ls
import torch
from transformers import pipeline
import time
from typing import List
We begin by installing LitServe, PyTorch, and Transformers on Google Colab, then import the libraries and modules we need to define, test, and serve our APIs.
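Before defining any endpoints, it can help to confirm that the environment is ready. The following sanity check is our own addition, not part of the original notebook, and assumes each package exposes the usual __version__ attribute:

# Sanity check (our own addition): confirm installed versions and CUDA visibility
import torch
import transformers
import litserve as ls

print("litserve:", ls.__version__)
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())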
class TextGeneratorAPI(ls.LitAPI):
    def setup(self, device):
        # Load a local DistilGPT2 pipeline; fall back to CPU (-1) when CUDA is unavailable
        self.model = pipeline("text-generation", model="distilgpt2", device=0 if device == "cuda" and torch.cuda.is_available() else -1)
        self.device = device

    def decode_request(self, request):
        return request["prompt"]

    def predict(self, prompt):
        result = self.model(prompt, max_length=100, num_return_sequences=1, temperature=0.8, do_sample=True)
        return result[0]["generated_text"]

    def encode_response(self, output):
        return {"generated_text": output, "model": "distilgpt2"}
class BatchedSentimentAPI(ls.LitAPI):
    def setup(self, device):
        self.model = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english", device=0 if device == "cuda" and torch.cuda.is_available() else -1)

    def decode_request(self, request):
        return request["text"]

    def batch(self, inputs: List[str]) -> List[str]:
        # Group individual requests into one list for a single forward pass
        return inputs

    def predict(self, batch: List[str]):
        results = self.model(batch)
        return results

    def unbatch(self, output):
        # Split the batched results back into per-request outputs
        return output

    def encode_response(self, output):
        return {"label": output["label"], "score": float(output["score"]), "batched": True}
Here we create two LitServe APIs: one for text generation using a local DistilGPT2 model, and one for batched sentiment analysis. Each class defines how requests are decoded, how inference is performed, and how structured responses are returned.
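To actually serve either class over HTTP, you would wrap it in an ls.LitServer. The sketch below is our own illustration of the standard LitServe pattern, not part of the tutorial's local test flow; the port and batch settings are arbitrary values:

# Sketch: wiring the classes into an HTTP server (illustrative values)
if __name__ == "__main__":
    server = ls.LitServer(TextGeneratorAPI(), accelerator="auto")
    # For the batched API, max_batch_size/batch_timeout activate the batch()/unbatch() hooks:
    # server = ls.LitServer(BatchedSentimentAPI(), max_batch_size=8, batch_timeout=0.05)
    server.run(port=8000)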
class StreamingTextAPI(ls.LitAPI):
    def setup(self, device):
        self.model = pipeline("text-generation", model="distilgpt2", device=0 if device == "cuda" and torch.cuda.is_available() else -1)

    def decode_request(self, request):
        return request["prompt"]

    def predict(self, prompt):
        # Simulate token-by-token generation by yielding one word at a time
        words = ["Once", "upon", "a", "time", "in", "a", "digital", "world"]
        for word in words:
            time.sleep(0.1)
            yield word + " "

    def encode_response(self, output):
        for token in output:
            yield {"token": token}
Next, we define a streaming text-generation API that emits tokens in real time. By yielding one word at a time, it demonstrates how LitServe streams output continuously instead of returning a single response.
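To expose this behavior over HTTP, streaming must be enabled when constructing the server. The sketch below is our own illustration; the default /predict route and port 8000 are assumptions, and the client snippet expects a live server:

# Sketch: server side — stream=True tells LitServe to forward yielded chunks as produced
# server = ls.LitServer(StreamingTextAPI(), stream=True)
# server.run(port=8000)

# Sketch: client side — consumes the response incrementally (assumed /predict route)
import requests

with requests.post("http://127.0.0.1:8000/predict",
                   json={"prompt": "Once upon a time"}, stream=True) as resp:
    for chunk in resp.iter_lines():
        if chunk:
            print(chunk.decode(), flush=True)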
class MultiTaskAPI(ls.LitAPI):
    def setup(self, device):
        self.sentiment = pipeline("sentiment-analysis", device=-1)
        self.summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-6-6", device=-1)
        self.device = device

    def decode_request(self, request):
        # Default to sentiment analysis when no task is specified
        return {"task": request.get("task", "sentiment"), "text": request["text"]}

    def predict(self, inputs):
        task = inputs["task"]
        text = inputs["text"]
        if task == "sentiment":
            result = self.sentiment(text)[0]
            return {"task": "sentiment", "result": result}
        elif task == "summarize":
            # Reconstructed guard: very short inputs are returned as-is (threshold assumed)
            if len(text.split()) < 30:
                return {"task": "summarize", "result": {"summary_text": text}}
            result = self.summarizer(text, max_length=50, min_length=10)[0]
            return {"task": "summarize", "result": result}
        return {"task": task, "error": "unsupported task"}

    def encode_response(self, output):
        return output
Now we develop a multi-task API that handles both sentiment analysis and summarization through a single endpoint. It shows how to manage multiple pipelines behind one interface, with each request routed dynamically according to its task field.
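Exercising this routing locally only requires changing the task field in the request. The following is a small sketch of our own, with example texts that are not from the tutorial:

# Sketch: routing one endpoint to different tasks via the "task" field
api = MultiTaskAPI(); api.setup("cpu")

for payload in [
    {"task": "sentiment", "text": "LitServe makes serving painless."},
    {"task": "summarize", "text": "LitServe is a serving framework for ML models. " * 8},
]:
    out = api.predict(api.decode_request(payload))
    print(out["task"], "->", out["result"])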
class CachedAPI(ls.LitAPI):
    def setup(self, device):
        self.model = pipeline("sentiment-analysis", device=-1)
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def decode_request(self, request):
        return request["text"]

    def predict(self, text):
        # Return the stored result on a cache hit, skipping inference entirely
        if text in self.cache:
            self.hits += 1
            return self.cache[text], True
        self.misses += 1
        result = self.model(text)[0]
        self.cache[text] = result
        return result, False

    def encode_response(self, output):
        result, from_cache = output
        return {"label": result["label"], "score": float(result["score"]), "from_cache": from_cache, "cache_stats": {"hits": self.hits, "misses": self.misses}}
We then add caching to our API: previous inference results are stored so that repeated requests skip redundant computation. Cache hits and misses are tracked in real time, showing how caching can dramatically improve performance.
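Note that the dictionary cache above grows without bound and keys on the raw input string. In production you might cap its size; a minimal LRU variant, our own sketch rather than part of the tutorial, could look like this:

# Sketch: bounding the cache with an LRU policy using OrderedDict
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int = 1024):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)   # mark as recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry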
def test_apis_locally():
    print("=" * 70)
    print("Testing APIs Locally (No Server)")
    print("=" * 70)

    api1 = TextGeneratorAPI(); api1.setup("cpu")
    decoded = api1.decode_request({"prompt": "Artificial intelligence will"})
    result = api1.predict(decoded)
    encoded = api1.encode_response(result)
    print(f"✓ Result: {encoded['generated_text'][:100]}...")

    api2 = BatchedSentimentAPI(); api2.setup("cpu")
    texts = ["I love Python!", "This is terrible.", "Neutral statement."]
    decoded_batch = [api2.decode_request({"text": t}) for t in texts]
    batched = api2.batch(decoded_batch)
    results = api2.predict(batched)
    unbatched = api2.unbatch(results)
    for i, r in enumerate(unbatched):
        encoded = api2.encode_response(r)
        print(f"✓ '{texts[i]}' -> {encoded['label']} ({encoded['score']:.2f})")

    api3 = MultiTaskAPI(); api3.setup("cpu")
    decoded = api3.decode_request({"task": "sentiment", "text": "Amazing tutorial!"})
    result = api3.predict(decoded)
    print(f"✓ Sentiment: {result['result']}")

    api4 = CachedAPI(); api4.setup("cpu")
    test_text = "LitServe is awesome!"
    for i in range(3):
        decoded = api4.decode_request({"text": test_text})
        result = api4.predict(decoded)
        encoded = api4.encode_response(result)
        print(f"✓ Request {i+1}: {encoded['label']} (cached: {encoded['from_cache']})")

    print("=" * 70)
    print("✅ All tests completed successfully!")
    print("=" * 70)


test_apis_locally()
We test all of our APIs locally, without launching a server, to verify their correctness. We sequentially exercise text generation, batched sentiment analysis, multi-task routing, and caching, confirming that each part of the LitServe workflow runs smoothly.
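To see the caching payoff quantitatively, you can time repeated calls against the same input. This timing snippet is our own sketch, not part of the tutorial's test suite:

# Sketch: timing a cache miss vs. a cache hit on CachedAPI
import time

api = CachedAPI(); api.setup("cpu")
text = "LitServe is awesome!"

for label in ("miss", "hit"):
    start = time.perf_counter()
    api.predict(api.decode_request({"text": text}))
    print(f"{label}: {time.perf_counter() - start:.4f}s")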
In conclusion, we demonstrate LitServe's flexibility by building and running these diverse APIs: text generation, batched sentiment analysis, streaming, multi-task processing, and caching, all backed by local Hugging Face models. Having completed the tutorial, we see how LitServe simplifies model-deployment workflows, letting us serve intelligent ML systems in just a few lines of Python while maintaining simplicity, flexibility, and performance.
Asif Razzaq serves as the CEO at Marktechpost Media Inc. As an entrepreneur, Asif has a passion for harnessing Artificial Intelligence to benefit society. Marktechpost is his latest venture, a media platform that focuses on Artificial Intelligence. It is known for providing in-depth news coverage about machine learning, deep learning, and other topics. The content is technically accurate and easy to understand by an audience of all backgrounds. Over 2 million views per month are a testament to the platform’s popularity.

