We move away from the traditional approach of hand-crafting prompts to one that is systematic and programmable. In this tutorial we treat prompts as variables rather than static text, and we build an optimization loop around Gemini Flash to experiment, evaluate, and select the best prompt configuration. The implementation shows how performance improves over iterations, demonstrating that prompt engineering is more effective when it is orchestrated as a data-driven search rather than guided by intuition alone. See the Full Codes here.
import google.generativeai as genai
import json
import random
from typing import List, Tuple, Dict
from dataclasses import dataclass
import numpy as np
from collections import Counter

def setup_gemini(api_key: str = None):
    if api_key is None:
        api_key = input("Enter your Gemini API key: ").strip()
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel('gemini-2.0-flash-exp')
    print("✓ Gemini 2.0 Flash configured")
    return model
@dataclass
class Example:
    text: str
    sentiment: str

    def to_dict(self):
        return {"text": self.text, "sentiment": self.sentiment}

@dataclass
class Prediction:
    sentiment: str
    reasoning: str = ""
    confidence: float = 1.0
We import the required libraries and define setup_gemini to configure Gemini Flash. The Example and Prediction dataclasses give us a clean, structured way to represent dataset items and model outputs.
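A quick standalone check of the two dataclasses (re-declared here so the snippet runs on its own, without the Gemini setup):

```python
from dataclasses import dataclass

@dataclass
class Example:
    text: str
    sentiment: str
    def to_dict(self):
        return {"text": self.text, "sentiment": self.sentiment}

@dataclass
class Prediction:
    sentiment: str
    reasoning: str = ""      # defaults keep error paths simple
    confidence: float = 1.0

ex = Example("Great product!", "positive")
pred = Prediction(sentiment="positive")
print(ex.to_dict())     # {'text': 'Great product!', 'sentiment': 'positive'}
print(pred.confidence)  # 1.0
```

Because Prediction fills in reasoning and confidence by default, predict() can return a valid object even when the API call fails.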
def create_dataset() -> Tuple[List[Example], List[Example]]:
    train_data = [
        Example("This movie was absolutely fantastic! Best film of the year.", "positive"),
        Example("Terrible experience, waste of time and money.", "negative"),
        Example("The product works as expected, nothing special.", "neutral"),
        Example("I'm blown away by the quality and attention to detail!", "positive"),
        Example("Disappointing and overpriced. Would not recommend.", "negative"),
        Example("It's okay, does the job but could be better.", "neutral"),
        Example("Incredible customer service and amazing results!", "positive"),
        Example("Complete garbage, broke after one use.", "negative"),
        Example("Average product, met my basic expectations.", "neutral"),
        Example("Revolutionary! This changed everything for me.", "positive"),
        Example("Frustrating bugs and poor design choices.", "negative"),
        Example("Decent quality for the price point.", "neutral"),
        Example("Exceeded all my expectations, truly remarkable!", "positive"),
        Example("Worst purchase I've ever made, avoid at all costs.", "negative"),
        Example("It's fine, nothing to complain about really.", "neutral"),
        Example("Absolutely stellar performance, 5 stars!", "positive"),
        Example("Broken and unusable, total disaster.", "negative"),
        Example("Meets requirements, standard quality.", "neutral"),
    ]
    val_data = [
        Example("Absolutely love it, couldn't be happier!", "positive"),
        Example("Broken on arrival, very upset.", "negative"),
        Example("Works fine, no major issues.", "neutral"),
        Example("Outstanding performance and great value!", "positive"),
        Example("Regret buying this, total letdown.", "negative"),
        Example("Adequate for basic use.", "neutral"),
    ]
    return train_data, val_data
class PromptTemplate:
    def __init__(self, instruction: str = "", examples: List[Example] = None):
        self.instruction = instruction
        self.examples = examples or []

    def format(self, text: str) -> str:
        prompt_parts = []
        if self.instruction:
            prompt_parts.append(self.instruction)
        if self.examples:
            prompt_parts.append("\nExamples:")
            for ex in self.examples:
                prompt_parts.append(f"\nText: {ex.text}")
                prompt_parts.append(f"Sentiment: {ex.sentiment}")
        prompt_parts.append(f"\nText: {text}")
        prompt_parts.append("Sentiment:")
        return "\n".join(prompt_parts)

    def clone(self):
        return PromptTemplate(self.instruction, self.examples.copy())
We create a small but varied sentiment dataset with create_dataset. We then define PromptTemplate, which combines an instruction, a set of examples, and the current query into a single string. Because the template is a programmable object, we can swap out instructions and examples during optimization.
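To see what the model actually receives, here is a standalone sketch of PromptTemplate.format with the classes re-declared so it runs on its own (the instruction and example below are taken from the tutorial's own candidate pool and dataset):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Example:
    text: str
    sentiment: str

class PromptTemplate:
    def __init__(self, instruction: str = "", examples: List[Example] = None):
        self.instruction = instruction
        self.examples = examples or []

    def format(self, text: str) -> str:
        parts = []
        if self.instruction:
            parts.append(self.instruction)
        if self.examples:
            parts.append("\nExamples:")
            for ex in self.examples:
                parts.append(f"\nText: {ex.text}")
                parts.append(f"Sentiment: {ex.sentiment}")
        parts.append(f"\nText: {text}")
        parts.append("Sentiment:")   # trailing cue invites a one-word completion
        return "\n".join(parts)

template = PromptTemplate(
    instruction="Classify the sentiment: positive, negative, or neutral.",
    examples=[Example("Terrible experience, waste of time and money.", "negative")],
)
prompt = template.format("Works fine, no major issues.")
print(prompt)
```

The prompt ends with a bare "Sentiment:" line, which nudges the model to complete with a single label.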
class SentimentModel:
    def __init__(self, model, prompt_template: PromptTemplate):
        self.model = model
        self.prompt_template = prompt_template

    def predict(self, text: str) -> Prediction:
        prompt = self.prompt_template.format(text)
        try:
            response = self.model.generate_content(prompt)
            result = response.text.strip().lower()
            for sentiment in ['positive', 'negative', 'neutral']:
                if sentiment in result:
                    return Prediction(sentiment=sentiment, reasoning=result)
            return Prediction(sentiment="neutral", reasoning=result)
        except Exception as e:
            return Prediction(sentiment="neutral", reasoning=str(e))

    def evaluate(self, dataset: List[Example]) -> float:
        correct = 0
        for example in dataset:
            pred = self.predict(example.text)
            if pred.sentiment == example.sentiment:
                correct += 1
        return (correct / len(dataset)) * 100
We wrap Gemini in SentimentModel so it behaves like a classifier. The predict method formats the prompt, calls generate_content, and post-processes the response text into one of the three sentiment labels. An evaluate method then measures accuracy over any dataset with a single method call.
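The label-extraction step inside predict() can be isolated as a pure function, which makes its behavior easy to test without an API key. Note that it returns the first label found in check order, so a verbose reply like "not positive" would still map to positive; the single-word instruction candidates later in the tutorial exist partly to keep replies short and unambiguous:

```python
def extract_sentiment(raw_response: str) -> str:
    # Mirrors the parsing in SentimentModel.predict: lowercase the reply,
    # return the first known label it contains, else fall back to neutral.
    result = raw_response.strip().lower()
    for sentiment in ['positive', 'negative', 'neutral']:
        if sentiment in result:
            return sentiment
    return 'neutral'

print(extract_sentiment("Positive. The review praises the film."))  # positive
print(extract_sentiment("I really cannot tell."))                   # neutral
```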
class PromptOptimizer:
    def __init__(self, model):
        self.model = model
        self.instruction_candidates = [
            "Analyze the sentiment of the following text. Classify as positive, negative, or neutral.",
            "Classify the sentiment: positive, negative, or neutral.",
            "Determine if this text expresses positive, negative, or neutral sentiment.",
            "What is the emotional tone? Answer: positive, negative, or neutral.",
            "Sentiment classification (positive/negative/neutral):",
            "Evaluate sentiment and respond with exactly one word: positive, negative, or neutral.",
        ]

    def select_best_examples(self, train_data: List[Example], val_data: List[Example], n_examples: int = 3) -> List[Example]:
        best_examples = None
        best_score = 0
        for _ in range(10):
            examples_by_sentiment = {
                'positive': [e for e in train_data if e.sentiment == 'positive'],
                'negative': [e for e in train_data if e.sentiment == 'negative'],
                'neutral': [e for e in train_data if e.sentiment == 'neutral']
            }
            selected = []
            for sentiment in ['positive', 'negative', 'neutral']:
                if examples_by_sentiment[sentiment]:
                    selected.append(random.choice(examples_by_sentiment[sentiment]))
            remaining = [e for e in train_data if e not in selected]
            while len(selected) < n_examples and remaining:
                selected.append(remaining.pop(random.randrange(len(remaining))))
            template = PromptTemplate(instruction=self.instruction_candidates[0], examples=selected)
            score = SentimentModel(self.model, template).evaluate(val_data)
            if score > best_score:
                best_score = score
                best_examples = selected
        return best_examples

    def optimize_instruction(self, examples: List[Example], val_data: List[Example]) -> str:
        best_instruction = self.instruction_candidates[0]
        best_score = 0
        for instruction in self.instruction_candidates:
            template = PromptTemplate(instruction=instruction, examples=examples)
            test_model = SentimentModel(self.model, template)
            score = test_model.evaluate(val_data)
            if score > best_score:
                best_score = score
                best_instruction = instruction
        return best_instruction
We introduce PromptOptimizer with a pool of candidate instructions. select_best_examples searches for a small, diverse set of few-shot examples, while optimize_instruction scores each instruction variant against the validation data. In essence, we turn prompt design into a search problem over instructions and examples.
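The instruction search is just an argmax loop over candidates. Here it is in miniature, with a stub scorer standing in for the Gemini-backed SentimentModel.evaluate (the scores below are made up purely for illustration):

```python
# Three of the candidate instructions from the tutorial's pool.
candidates = [
    "Classify the sentiment: positive, negative, or neutral.",
    "What is the emotional tone? Answer: positive, negative, or neutral.",
    "Evaluate sentiment and respond with exactly one word: positive, negative, or neutral.",
]

# Illustrative validation accuracies; the real loop would call
# SentimentModel(model, template).evaluate(val_data) instead.
stub_scores = {candidates[0]: 66.7, candidates[1]: 83.3, candidates[2]: 100.0}

best_instruction, best_score = candidates[0], 0.0
for instruction in candidates:
    score = stub_scores[instruction]
    if score > best_score:
        best_score = score
        best_instruction = instruction

print(f"Best: {best_instruction!r} ({best_score:.1f}%)")
```

The same pattern scales to any number of candidates; only the scorer (and its API cost) changes.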
    def compile(self, train_data: List[Example], val_data: List[Example], n_examples: int = 3) -> PromptTemplate:
        best_examples = self.select_best_examples(train_data, val_data, n_examples)
        best_instruction = self.optimize_instruction(best_examples, val_data)
        optimized_template = PromptTemplate(instruction=best_instruction, examples=best_examples)
        return optimized_template
def main():
    print("="*70)
    print("Prompt Optimization Tutorial")
    print("Stop Writing Prompts, Start Programming Them!")
    print("="*70)
    model = setup_gemini()
    train_data, val_data = create_dataset()
    print(f"✓ {len(train_data)} training examples, {len(val_data)} validation examples")
    baseline_template = PromptTemplate(
        instruction="Classify sentiment as positive, negative, or neutral.",
        examples=[]
    )
    baseline_model = SentimentModel(model, baseline_template)
    baseline_score = baseline_model.evaluate(val_data)
    manual_examples = train_data[:3]
    manual_template = PromptTemplate(
        instruction="Classify sentiment as positive, negative, or neutral.",
        examples=manual_examples
    )
    manual_model = SentimentModel(model, manual_template)
    manual_score = manual_model.evaluate(val_data)
    optimizer = PromptOptimizer(model)
    optimized_template = optimizer.compile(train_data, val_data, n_examples=4)
The compile method merges the best examples and instruction into a final optimized PromptTemplate. Within main we configure Gemini, build the dataset, and test both a zero-shot baseline and a simple manual few-shot prompt. We then call the optimizer to produce a compiled, optimized prompt.
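The shape of the comparison in main can be run offline by swapping a keyword-rule "model" in for Gemini. The rules and the validation subset below are illustrative stand-ins, not the real API behavior:

```python
def rule_predict(text: str) -> str:
    # Toy stand-in for SentimentModel.predict: keyword rules instead of Gemini.
    t = text.lower()
    if any(w in t for w in ("love", "amazing", "outstanding", "fantastic")):
        return "positive"
    if any(w in t for w in ("broken", "terrible", "regret", "worst")):
        return "negative"
    return "neutral"

# Three items from the tutorial's validation set, as (text, label) pairs.
val_data = [
    ("Absolutely love it, couldn't be happier!", "positive"),
    ("Broken on arrival, very upset.", "negative"),
    ("Adequate for basic use.", "neutral"),
]

def evaluate(predict, dataset) -> float:
    # Same accuracy formula as SentimentModel.evaluate.
    correct = sum(predict(text) == label for text, label in dataset)
    return correct / len(dataset) * 100

print(f"Stub accuracy: {evaluate(rule_predict, val_data):.1f}%")
```

Because the scorer is just a function of (predict, dataset), baseline, manual, and optimized configurations all plug into the same evaluation.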
    optimized_model = SentimentModel(model, optimized_template)
    optimized_score = optimized_model.evaluate(val_data)
    print(f"Baseline (zero-shot): {baseline_score:.1f}%")
    print(f"Manual few-shot: {manual_score:.1f}%")
    print(f"Optimized (compiled): {optimized_score:.1f}%")
    print(f"\nInstruction: {optimized_template.instruction}")
    print(f"\nSelected Examples ({len(optimized_template.examples)}):")
    for i, ex in enumerate(optimized_template.examples, 1):
        print(f"\n{i}. Text: {ex.text}")
        print(f"   Sentiment: {ex.sentiment}")
    test_cases = [
        "This is absolutely amazing, I love it!",
        "Completely broken and unusable.",
        "It works as advertised, no complaints."
    ]
    for test_text in test_cases:
        print(f"\nInput: {test_text}")
        pred = optimized_model.predict(test_text)
        print(f"Predicted: {pred.sentiment}")
    print("✓ Tutorial Complete!")

if __name__ == "__main__":
    main()
We evaluate the optimized model and compare it with the baseline and manual few-shot setups. Printing the chosen instruction and examples lets us inspect what the optimization selected, and a few live test sentences show the predictions in action. The comparison reinforces the central idea: prompts are better programmed than hand-written.
In conclusion, we implemented a programmatic prompt-optimization workflow: a repeatable, evidence-driven process for producing high-performing prompts. Starting from a weak baseline, we iteratively evaluated instructions, chose diverse examples, and compiled an optimized template that outperformed the manual attempts. Rather than relying on trial-and-error prompting, we orchestrated a data-driven optimization cycle. The same pipeline can be extended to new tasks, larger datasets, and more advanced scoring methods, letting us build prompts that are precise, reliable, and scalable.
Asif Razzaq, CEO of Marktechpost Media Inc. is a visionary engineer and entrepreneur who is dedicated to using Artificial Intelligence (AI) for the greater good. Marktechpost is his latest venture, a media platform that focuses on Artificial Intelligence. It is known for providing in-depth news coverage about machine learning, deep learning, and other topics. The content is technically accurate and easy to understand by an audience of all backgrounds. This platform has over 2,000,000 monthly views which shows its popularity.

