We move away from the traditional approach of hand-crafting prompts to one that is systematic and programmable. In this tutorial we treat prompts as variables rather than static text, and we build an optimization loop around Gemini Flash to experiment, evaluate, and select the best prompt configuration. The implementation shows how performance improves over iterations, demonstrating that prompt engineering is more effective when it is orchestrated as a data-driven search rather than guided by intuition alone. See the Full Codes here.
import google.generativeai as genai
import json
import random
from typing import List, Tuple, Dict
from dataclasses import dataclass
import numpy as np
from collections import Counter

def setup_gemini(api_key: str = None):
    if api_key is None:
        api_key = input("Enter your Gemini API key: ").strip()
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel('gemini-2.0-flash-exp')
    print("✓ Gemini 2.0 Flash configured")
    return model
@dataclass
class Example:
    text: str
    sentiment: str

    def to_dict(self):
        return {"text": self.text, "sentiment": self.sentiment}

@dataclass
class Prediction:
    sentiment: str
    reasoning: str = ""
    confidence: float = 1.0
We import the required libraries and define setup_gemini to configure Gemini Flash. The Example and Prediction dataclasses give us a clean, structured way to represent dataset items and model outputs.
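A quick standalone check of the two dataclasses (re-declared here so the snippet runs on its own, without the Gemini setup):

```python
from dataclasses import dataclass

@dataclass
class Example:
    text: str
    sentiment: str
    def to_dict(self):
        return {"text": self.text, "sentiment": self.sentiment}

@dataclass
class Prediction:
    sentiment: str
    reasoning: str = ""      # defaults keep error paths simple
    confidence: float = 1.0

ex = Example("Great product!", "positive")
pred = Prediction(sentiment="positive")
print(ex.to_dict())     # {'text': 'Great product!', 'sentiment': 'positive'}
print(pred.confidence)  # 1.0
```

Because Prediction fills in reasoning and confidence by default, predict() can return a valid object even when the API call fails.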
def create_dataset() -> Tuple[List[Example], List[Example]]:
    train_data = [
        Example("This movie was absolutely fantastic! Best film of the year.", "positive"),
        Example("Terrible experience, waste of time and money.", "negative"),
        Example("The product works as expected, nothing special.", "neutral"),
        Example("I'm blown away by the quality and attention to detail!", "positive"),
        Example("Disappointing and overpriced. Would not recommend.", "negative"),
        Example("It's okay, does the job but could be better.", "neutral"),
        Example("Incredible customer service and amazing results!", "positive"),
        Example("Complete garbage, broke after one use.", "negative"),
        Example("Average product, met my basic expectations.", "neutral"),
        Example("Revolutionary! This changed everything for me.", "positive"),
        Example("Frustrating bugs and poor design choices.", "negative"),
        Example("Decent quality for the price point.", "neutral"),
        Example("Exceeded all my expectations, truly remarkable!", "positive"),
        Example("Worst purchase I've ever made, avoid at all costs.", "negative"),
        Example("It's fine, nothing to complain about really.", "neutral"),
        Example("Absolutely stellar performance, 5 stars!", "positive"),
        Example("Broken and unusable, total disaster.", "negative"),
        Example("Meets requirements, standard quality.", "neutral"),
    ]
    val_data = [
        Example("Absolutely love it, couldn't be happier!", "positive"),
        Example("Broken on arrival, very upset.", "negative"),
        Example("Works fine, no major issues.", "neutral"),
        Example("Outstanding performance and great value!", "positive"),
        Example("Regret buying this, total letdown.", "negative"),
        Example("Adequate for basic use.", "neutral"),
    ]
    return train_data, val_data
class PromptTemplate:
    def __init__(self, instruction: str = "", examples: List[Example] = None):
        self.instruction = instruction
        self.examples = examples or []

    def format(self, text: str) -> str:
        prompt_parts = []
        if self.instruction:
            prompt_parts.append(self.instruction)
        if self.examples:
            prompt_parts.append("\nExamples:")
            for ex in self.examples:
                prompt_parts.append(f"\nText: {ex.text}")
                prompt_parts.append(f"Sentiment: {ex.sentiment}")
        prompt_parts.append(f"\nText: {text}")
        prompt_parts.append("Sentiment:")
        return "\n".join(prompt_parts)

    def clone(self):
        return PromptTemplate(self.instruction, self.examples.copy())
We create a small but varied sentiment dataset with create_dataset. We then define PromptTemplate, which combines an instruction, a set of examples, and the current query into a single string. Because the template is a programmable object, we can swap out instructions and examples during optimization.
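To see what the model actually receives, here is a standalone sketch of PromptTemplate.format with the classes re-declared so it runs on its own (the instruction and example below are taken from the tutorial's own candidate pool and dataset):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Example:
    text: str
    sentiment: str

class PromptTemplate:
    def __init__(self, instruction: str = "", examples: List[Example] = None):
        self.instruction = instruction
        self.examples = examples or []

    def format(self, text: str) -> str:
        parts = []
        if self.instruction:
            parts.append(self.instruction)
        if self.examples:
            parts.append("\nExamples:")
            for ex in self.examples:
                parts.append(f"\nText: {ex.text}")
                parts.append(f"Sentiment: {ex.sentiment}")
        parts.append(f"\nText: {text}")
        parts.append("Sentiment:")   # trailing cue invites a one-word completion
        return "\n".join(parts)

template = PromptTemplate(
    instruction="Classify the sentiment: positive, negative, or neutral.",
    examples=[Example("Terrible experience, waste of time and money.", "negative")],
)
prompt = template.format("Works fine, no major issues.")
print(prompt)
```

The prompt ends with a bare "Sentiment:" line, which nudges the model to complete with a single label.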
class SentimentModel:
    def __init__(self, model, prompt_template: PromptTemplate):
        self.model = model
        self.prompt_template = prompt_template

    def predict(self, text: str) -> Prediction:
        prompt = self.prompt_template.format(text)
        try:
            response = self.model.generate_content(prompt)
            result = response.text.strip().lower()
            for sentiment in ['positive', 'negative', 'neutral']:
                if sentiment in result:
                    return Prediction(sentiment=sentiment, reasoning=result)
            return Prediction(sentiment="neutral", reasoning=result)
        except Exception as e:
            return Prediction(sentiment="neutral", reasoning=str(e))

    def evaluate(self, dataset: List[Example]) -> float:
        correct = 0
        for example in dataset:
            pred = self.predict(example.text)
            if pred.sentiment == example.sentiment:
                correct += 1
        return (correct / len(dataset)) * 100
We wrap Gemini in SentimentModel so it behaves like a classifier. The predict method formats the prompt, calls generate_content, and post-processes the response text into one of the three sentiment labels. An evaluate method then measures accuracy over any dataset with a single method call.
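The label-extraction step inside predict() can be isolated as a pure function, which makes its behavior easy to test without an API key. Note that it returns the first label found in check order, so a verbose reply like "not positive" would still map to positive; the single-word instruction candidates later in the tutorial exist partly to keep replies short and unambiguous:

```python
def extract_sentiment(raw_response: str) -> str:
    # Mirrors the parsing in SentimentModel.predict: lowercase the reply,
    # return the first known label it contains, else fall back to neutral.
    result = raw_response.strip().lower()
    for sentiment in ['positive', 'negative', 'neutral']:
        if sentiment in result:
            return sentiment
    return 'neutral'

print(extract_sentiment("Positive. The review praises the film."))  # positive
print(extract_sentiment("I really cannot tell."))                   # neutral
```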
class PromptOptimizer:
    def __init__(self, model):
        self.model = model
        self.instruction_candidates = [
            "Analyze the sentiment of the following text. Classify as positive, negative, or neutral.",
            "Classify the sentiment: positive, negative, or neutral.",
            "Determine if this text expresses positive, negative, or neutral sentiment.",
            "What is the emotional tone? Answer: positive, negative, or neutral.",
            "Sentiment classification (positive/negative/neutral):",
            "Evaluate sentiment and respond with exactly one word: positive, negative, or neutral.",
        ]

    def select_best_examples(self, train_data: List[Example], val_data: List[Example], n_examples: int = 3) -> List[Example]:
        best_examples = None
        best_score = 0
        for _ in range(10):
            examples_by_sentiment = {
                'positive': [e for e in train_data if e.sentiment == 'positive'],
                'negative': [e for e in train_data if e.sentiment == 'negative'],
                'neutral': [e for e in train_data if e.sentiment == 'neutral']
            }
            selected = []
            for sentiment in ['positive', 'negative', 'neutral']:
                if examples_by_sentiment[sentiment]:
                    selected.append(random.choice(examples_by_sentiment[sentiment]))
            remaining = [e for e in train_data if e not in selected]
            while len(selected) < n_examples and remaining:
                selected.append(remaining.pop(random.randrange(len(remaining))))
            template = PromptTemplate(instruction=self.instruction_candidates[0], examples=selected)
            score = SentimentModel(self.model, template).evaluate(val_data)
            if score > best_score:
                best_score = score
                best_examples = selected
        return best_examples

    def optimize_instruction(self, examples: List[Example], val_data: List[Example]) -> str:
        best_instruction = self.instruction_candidates[0]
        best_score = 0
        for instruction in self.instruction_candidates:
            template = PromptTemplate(instruction=instruction, examples=examples)
            test_model = SentimentModel(self.model, template)
            score = test_model.evaluate(val_data)
            if score > best_score:
                best_score = score
                best_instruction = instruction
        return best_instruction
We introduce PromptOptimizer with a pool of candidate instructions. select_best_examples searches for a small, diverse set of few-shot examples, while optimize_instruction scores each instruction variant against the validation data. In essence, we turn prompt design into a search problem over instructions and examples.
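The instruction search is just an argmax loop over candidates. Here it is in miniature, with a stub scorer standing in for the Gemini-backed SentimentModel.evaluate (the scores below are made up purely for illustration):

```python
# Three of the candidate instructions from the tutorial's pool.
candidates = [
    "Classify the sentiment: positive, negative, or neutral.",
    "What is the emotional tone? Answer: positive, negative, or neutral.",
    "Evaluate sentiment and respond with exactly one word: positive, negative, or neutral.",
]

# Illustrative validation accuracies; the real loop would call
# SentimentModel(model, template).evaluate(val_data) instead.
stub_scores = {candidates[0]: 66.7, candidates[1]: 83.3, candidates[2]: 100.0}

best_instruction, best_score = candidates[0], 0.0
for instruction in candidates:
    score = stub_scores[instruction]
    if score > best_score:
        best_score = score
        best_instruction = instruction

print(f"Best: {best_instruction!r} ({best_score:.1f}%)")
```

The same pattern scales to any number of candidates; only the scorer (and its API cost) changes.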
    def compile(self, train_data: List[Example], val_data: List[Example], n_examples: int = 3) -> PromptTemplate:
        best_examples = self.select_best_examples(train_data, val_data, n_examples)
        best_instruction = self.optimize_instruction(best_examples, val_data)
        optimized_template = PromptTemplate(instruction=best_instruction, examples=best_examples)
        return optimized_template
def main():
    print("="*70)
    print("Prompt Optimization Tutorial")
    print("Stop Writing Prompts, Start Programming Them!")
    print("="*70)
    model = setup_gemini()
    train_data, val_data = create_dataset()
    print(f"✓ {len(train_data)} training examples, {len(val_data)} validation examples")
    baseline_template = PromptTemplate(
        instruction="Classify sentiment as positive, negative, or neutral.",
        examples=[]
    )
    baseline_model = SentimentModel(model, baseline_template)
    baseline_score = baseline_model.evaluate(val_data)
    manual_examples = train_data[:3]
    manual_template = PromptTemplate(
        instruction="Classify sentiment as positive, negative, or neutral.",
        examples=manual_examples
    )
    manual_model = SentimentModel(model, manual_template)
    manual_score = manual_model.evaluate(val_data)
    optimizer = PromptOptimizer(model)
    optimized_template = optimizer.compile(train_data, val_data, n_examples=4)
The compile method merges the best examples and instruction into a final optimized PromptTemplate. Within main we configure Gemini, build the dataset, and test both a zero-shot baseline and a simple manual few-shot prompt. We then call the optimizer to produce a compiled, optimized prompt.
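The shape of the comparison in main can be run offline by swapping a keyword-rule "model" in for Gemini. The rules and the validation subset below are illustrative stand-ins, not the real API behavior:

```python
def rule_predict(text: str) -> str:
    # Toy stand-in for SentimentModel.predict: keyword rules instead of Gemini.
    t = text.lower()
    if any(w in t for w in ("love", "amazing", "outstanding", "fantastic")):
        return "positive"
    if any(w in t for w in ("broken", "terrible", "regret", "worst")):
        return "negative"
    return "neutral"

# Three items from the tutorial's validation set, as (text, label) pairs.
val_data = [
    ("Absolutely love it, couldn't be happier!", "positive"),
    ("Broken on arrival, very upset.", "negative"),
    ("Adequate for basic use.", "neutral"),
]

def evaluate(predict, dataset) -> float:
    # Same accuracy formula as SentimentModel.evaluate.
    correct = sum(predict(text) == label for text, label in dataset)
    return correct / len(dataset) * 100

print(f"Stub accuracy: {evaluate(rule_predict, val_data):.1f}%")
```

Because the scorer is just a function of (predict, dataset), baseline, manual, and optimized configurations all plug into the same evaluation.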
    optimized_model = SentimentModel(model, optimized_template)
    optimized_score = optimized_model.evaluate(val_data)
    print(f"Baseline (zero-shot): {baseline_score:.1f}%")
    print(f"Manual few-shot: {manual_score:.1f}%")
    print(f"Optimized (compiled): {optimized_score:.1f}%")
    print(f"\nInstruction: {optimized_template.instruction}")
    print(f"\nSelected Examples ({len(optimized_template.examples)}):")
    for i, ex in enumerate(optimized_template.examples, 1):
        print(f"\n{i}. Text: {ex.text}")
        print(f"   Sentiment: {ex.sentiment}")
    test_cases = [
        "This is absolutely amazing, I love it!",
        "Completely broken and unusable.",
        "It works as advertised, no complaints."
    ]
    for test_text in test_cases:
        print(f"\nInput: {test_text}")
        pred = optimized_model.predict(test_text)
        print(f"Predicted: {pred.sentiment}")
    print("✓ Tutorial Complete!")

if __name__ == "__main__":
    main()
We evaluate the optimized model and compare it with the baseline and manual few-shot setups. Printing the chosen instruction and examples lets us inspect what the optimization selected, and a few live test sentences show the predictions in action. The comparison reinforces the central idea: prompts are better programmed than hand-written.
In conclusion, we implemented a programmatic prompt-optimization workflow: a repeatable, evidence-driven process for producing high-performing prompts. Starting from a weak baseline, we iteratively evaluated instructions, chose diverse examples, and compiled an optimized template that outperformed the manual attempts. Rather than relying on trial-and-error prompting, we orchestrated a data-driven optimization cycle. The same pipeline can be extended to new tasks, larger datasets, and more advanced scoring methods, letting us build prompts that are precise, reliable, and scalable.
Asif Razzaq, CEO of Marktechpost Media Inc. is a visionary engineer and entrepreneur who is dedicated to using Artificial Intelligence (AI) for the greater good. Marktechpost is his latest venture, a media platform that focuses on Artificial Intelligence. It is known for providing in-depth news coverage about machine learning, deep learning, and other topics. The content is technically accurate and easy to understand by an audience of all backgrounds. This platform has over 2,000,000 monthly views which shows its popularity.

