In this tutorial, we explore property-based testing with Hypothesis and build a testing pipeline far more rigorous than traditional unit testing. We use invariants, differential tests, metamorphic properties, stateful testing, and targeted exploration to verify both the functional correctness of our system and its behavioral guarantees. We let Hypothesis generate structured inputs and shrink failures to minimal counterexamples, which allows us to discover hidden bugs systematically. Along the way, we show how to integrate these modern testing techniques directly into experimental or research-driven workflows.
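Before bringing in Hypothesis itself, the core idea can be sketched in plain Python: a property test asserts an invariant over many generated inputs rather than a handful of hand-picked examples. This stdlib-only sketch is not part of the notebook's suite; the helper names are illustrative.

```python
import random

def check_property(prop, gen, trials=1000, seed=0):
    # Try many generated inputs; return the first counterexample, else None.
    rng = random.Random(seed)
    for _ in range(trials):
        x = gen(rng)
        if not prop(x):
            return x  # Hypothesis would additionally shrink this to a minimal case
    return None

def sort_is_well_behaved(xs):
    # Invariant: sorting is idempotent and preserves length.
    ys = sorted(xs)
    return sorted(ys) == ys and len(ys) == len(xs)

def gen_list(rng):
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]

print(check_property(sort_is_well_behaved, gen_list))  # prints None: no counterexample
```

Hypothesis automates all three pieces, input generation, invariant checking, and (crucially) shrinking a failing input to a minimal counterexample.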
import sys, textwrap, subprocess, os, re, math
!{sys.executable} -m pip -q install hypothesis pytest
test_code = r'''
import math
import re
import pytest
from hypothesis import (
    given, assume, example, settings, note, target,
    HealthCheck, Phase,
)
import hypothesis.strategies as st
from hypothesis.stateful import RuleBasedStateMachine, rule, invariant, initialize, precondition
def clamp(x: int, lo: int, hi: int) -> int:
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x

def normalize_whitespace(s: str) -> str:
    return " ".join(s.split())

def is_sorted_non_decreasing(xs):
    return all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1))
With Hypothesis, pytest, and the required modules installed, our environment is set up. We begin constructing the full test suite by defining core utility functions such as clamp, normalize_whitespace, and merge_sorted. In this step, we lay the groundwork for the property checks that follow.
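The merge_sorted helper mentioned above does not appear in this excerpt, so here is a minimal sketch of what such a utility and its differential property might look like, checked exhaustively on small inputs instead of via Hypothesis. The implementation details are an assumption, not the tutorial's original code.

```python
from itertools import product

def merge_sorted(a, b):
    # Two-pointer merge of two already-sorted lists into one sorted list.
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out

# Differential check against the trusted reference: sorted(a + b).
vals = [-1, 0, 2]
for la in range(3):
    for lb in range(3):
        for a in product(vals, repeat=la):
            for b in product(vals, repeat=lb):
                sa, sb = sorted(a), sorted(b)
                assert merge_sorted(sa, sb) == sorted(sa + sb)
print("merge_sorted agrees with sorted() on all small cases")
```

Hypothesis would generalize this exhaustive grid to arbitrary generated lists and shrink any disagreement to a minimal pair.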
def safe_parse_int(s: str):
    t = s.strip()
    if re.fullmatch(r"[+-]?\d+", t) is None:
        return (False, "not_an_int")
    if len(t.lstrip("+-")) > 2000:
        return (False, "too_big")
    try:
        return (True, int(t))
    except Exception:
        return (False, "parse_error")
def safe_parse_int_alt(s: str):
    t = s.strip()
    if not t:
        return (False, "not_an_int")
    sign = 1
    if t[0] == "+":
        t = t[1:]
    elif t[0] == "-":
        sign = -1
        t = t[1:]
    if not t or any(ch < "0" or ch > "9" for ch in t):
        return (False, "not_an_int")
    if len(t) > 2000:
        return (False, "too_big")
    val = 0
    for ch in t:
        val = val * 10 + (ord(ch) - 48)
    return (True, sign * val)
bounds = st.tuples(st.integers(-10_000, 10_000), st.integers(-10_000, 10_000)).map(
    lambda t: (t[0], t[1]) if t[0] <= t[1] else (t[1], t[0])
)

@st.composite
def int_like_strings(draw):
    # Composite strategy (minimal reconstruction) for strings that look like
    # optionally signed, whitespace-padded integers.
    sign = draw(st.sampled_from(["", "+", "-"]))
    digits = draw(st.text(alphabet="0123456789", min_size=1, max_size=30))
    pad = draw(st.text(alphabet=" ", max_size=3))
    return pad + sign + digits + pad
Next, we define structured parsing strategies that generate meaningful, constrained test inputs. We create composite strategies such as int_like_strings to control the input space precisely for property validation, and we prepare the bounds strategy and sorted-list generators to enable invariant and differential testing.
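To make the composite-strategy idea concrete, here is a stdlib-only sketch of the kind of string a strategy like int_like_strings produces: assembled from independently drawn parts (sign, digits, padding), just as @st.composite assembles individual draws. The generator name here is illustrative, not part of the notebook.

```python
import random

def gen_int_like_string(rng):
    # Mirror of the composite strategy: optional sign, 1-30 digits, optional padding.
    sign = rng.choice(["", "+", "-"])
    digits = "".join(rng.choice("0123456789") for _ in range(rng.randint(1, 30)))
    pad = " " * rng.randint(0, 3)
    return pad + sign + digits + pad

rng = random.Random(42)
samples = [gen_int_like_string(rng) for _ in range(5)]
# Every generated string parses cleanly after stripping whitespace.
assert all(isinstance(int(s.strip()), int) for s in samples)
print(samples)
```

Constraining the strategy this way lets the parser-agreement test below focus on inputs where both parsers should succeed with identical values.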
@settings(max_examples=300, suppress_health_check=[HealthCheck.too_slow])
@given(x=st.integers(-50_000, 50_000), b=bounds)
def test_clamp_within_bounds(x, b):
    lo, hi = b
    y = clamp(x, lo, hi)
    assert lo <= y <= hi
The core property tests we define validate the correctness of multiple functions. We use Hypothesis decorators to explore edge cases and verify behavioral guarantees such as boundary constraints and deterministic normalization. We also use differential testing to ensure that our merging implementation is consistent with a reference implementation.
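The boundary property Hypothesis explores for clamp can also be made explicit by checking it exhaustively over a small grid, a stdlib-only sketch of the same invariant, plus the idempotence guarantee that clamping an already-clamped value changes nothing:

```python
def clamp(x, lo, hi):
    # Same utility as in the test suite above.
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x

for lo in range(-5, 6):
    for hi in range(-5, 6):
        if lo > hi:
            continue  # the bounds strategy only ever produces ordered pairs
        for x in range(-10, 11):
            y = clamp(x, lo, hi)
            assert lo <= y <= hi           # result stays within bounds
            assert clamp(y, lo, hi) == y   # clamping is idempotent
print("clamp invariants hold on the full small grid")
```

Hypothesis replaces this grid with much larger generated ranges and reports a shrunk counterexample if either assertion ever fails.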
@settings(max_examples=250, deadline=200, suppress_health_check=[HealthCheck.too_slow])
@given(s=int_like_strings())
def test_two_parsers_agree_on_int_like_strings(s):
    ok1, v1 = safe_parse_int(s)
    ok2, v2 = safe_parse_int_alt(s)
    assert ok1 == ok2
    assert v1 == v2
@settings(max_examples=250)
@given(s=st.text(min_size=0, max_size=200))
def test_safe_parse_int_rejects_non_ints(s):
    t = s.strip()
    m = re.fullmatch(r"[+-]?\d+", t)
    ok, val = safe_parse_int(s)
    if m is None:
        assert not ok
    else:
        if len(t.lstrip("+-")) > 2000:
            assert (not ok) and val == "too_big"
        else:
            assert ok and isinstance(val, int)
def variance(xs):
    if len(xs) < 2:
        return 0.0
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

@settings(max_examples=200)
@given(xs=st.lists(st.integers(-1_000, 1_000), min_size=2, max_size=50))
def test_variance_is_shift_invariant(xs):
    v = variance(xs)
    k = 7
    assert math.isclose(variance([x + k for x in xs]), v, rel_tol=1e-12, abs_tol=1e-12)
We then extend validation to parsing robustness, statistical correctness, and result quality using targeted exploration. Two independent integer parsers must agree on structured inputs, and we enforce rejection rules for invalid strings. Further, we implement a metamorphic test by validating that variance is invariant under a constant shift of its inputs.
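The metamorphic relation behind that variance test can be exercised without Hypothesis at all: transform the input in a way whose effect on the output is known (here, a constant shift should change nothing) and compare. A stdlib-only sketch under that assumption:

```python
import math
import random

def variance(xs):
    # Population variance, as in the test suite above.
    if len(xs) < 2:
        return 0.0
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Metamorphic relation: shifting every input by a constant must not change variance.
rng = random.Random(7)
for _ in range(200):
    xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(2, 50))]
    k = rng.randint(-100, 100)
    assert math.isclose(variance([x + k for x in xs]), variance(xs),
                        rel_tol=1e-9, abs_tol=1e-9)
print("variance is shift-invariant on all sampled cases")
```

Metamorphic tests are valuable precisely when no independent oracle exists: we never need the "true" variance, only a relation between two outputs.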
class Bank:
    def __init__(self):
        self.balance = 0
        self.ledger = []

    def deposit(self, amt: int):
        if amt <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amt
        self.ledger.append(("dep", amt))

    def withdraw(self, amt: int):
        if amt <= 0:
            raise ValueError("withdraw must be positive")
        if amt > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amt
        self.ledger.append(("wd", amt))

    def replay_balance(self):
        bal = 0
        for typ, amt in self.ledger:
            bal += amt if typ == "dep" else -amt
        return bal
class BankMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.bank = Bank()

    @initialize()
    def init(self):
        assert self.bank.balance == 0
        assert self.bank.replay_balance() == 0

    @rule(amt=st.integers(min_value=1, max_value=10_000))
    def deposit(self, amt):
        self.bank.deposit(amt)

    @precondition(lambda self: self.bank.balance > 0)
    @rule(amt=st.integers(min_value=1, max_value=10_000))
    def withdraw(self, amt):
        assume(amt <= self.bank.balance)
        self.bank.withdraw(amt)

    @invariant()
    def ledger_replay_matches_balance(self):
        assert self.bank.replay_balance() == self.bank.balance

TestBankMachine = BankMachine.TestCase
'''
path = "/tmp/test_hypothesis_advanced.py"
with open(path, "w", encoding="utf-8") as f:
    f.write(test_code)
print("Hypothesis version:", __import__("hypothesis").__version__)
print("\nRunning pytest on:", path, "\n")
res = subprocess.run([sys.executable, "-m", "pytest", "-q", path], capture_output=True, text=True)
print(res.stdout)
if res.returncode != 0:
    print(res.stderr)
if res.returncode == 0:
    print("\nAll Hypothesis tests passed.")
elif res.returncode == 5:  # pytest exit code 5: no tests collected
    print("\nPytest collected no tests.")
else:
    print("\nSome tests failed.")
To simulate a real bank account, we implement a stateful system using Hypothesis' rule-based state machines. We create rules, preconditions, and invariants to ensure balance consistency and ledger accuracy under any sequence of operations. The entire test suite is then executed via pytest, allowing Hypothesis to automatically detect counterexamples and check system correctness.
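What RuleBasedStateMachine automates can be sketched by hand: run random sequences of deposits and withdrawals against the model and check the ledger-replay invariant after every step. This stdlib-only sketch mirrors the tutorial's Bank (with the checks removed for brevity); Hypothesis additionally shrinks any failing sequence to a minimal reproduction.

```python
import random

class Bank:
    # Simplified mirror of the tutorial's Bank model.
    def __init__(self):
        self.balance = 0
        self.ledger = []

    def deposit(self, amt):
        self.balance += amt
        self.ledger.append(("dep", amt))

    def withdraw(self, amt):
        if amt > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amt
        self.ledger.append(("wd", amt))

    def replay_balance(self):
        return sum(a if t == "dep" else -a for t, a in self.ledger)

rng = random.Random(1)
for _ in range(100):                 # 100 random operation sequences
    bank = Bank()
    for _ in range(rng.randint(1, 30)):
        if bank.balance > 0 and rng.random() < 0.5:
            bank.withdraw(rng.randint(1, bank.balance))  # precondition: funds exist
        else:
            bank.deposit(rng.randint(1, 10_000))
        assert bank.replay_balance() == bank.balance     # invariant after every step
print("ledger replay matched the balance across all sequences")
```

The state machine expresses exactly this loop declaratively: @rule methods are the operations, @precondition guards them, and @invariant runs after every step.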
In conclusion, we developed a property-based testing framework that validates stateless functions, parsing logic, statistical behavior, and even stateful systems with invariants. Hypothesis' shrinking, state-machine, and targeted-search capabilities allowed us to move from example-based to behavior-driven testing. We can now reason about correctness at a much higher level of abstraction while maintaining strong guarantees on edge cases and system consistency.

