This tutorial shows how to perform a realistic data poisoning attack: we manipulate labels in the CIFAR-10 dataset and observe the effect on a model's behavior. To keep the learning dynamics stable and comparable, we build two training pipelines, one clean and one containing the data poisoning attack, using a ResNet convolutional network. We show that a subtle corruption of the training labels can lead to systematic misclassification. Check out the FULL CODES here.
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, Dataset
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report
CONFIG = {
    "batch_size": 128,
    "epochs": 10,
    "lr": 0.001,
    "target_class": 1,
    "malicious_label": 9,
    "poison_ratio": 0.4,
    "device": "cuda" if torch.cuda.is_available() else "cpu",
}
torch.manual_seed(42)
np.random.seed(42)
We define the entire experiment's configuration in one place. By fixing the random seed across PyTorch and NumPy we ensure reproducibility, and the tutorial runs efficiently on either CPU or GPU.
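As a quick sanity check on that reproducibility claim, a fixed seed makes the random poison selection identical on every run. A minimal standalone sketch (the toy labels and the helper name are illustrative, not part of the tutorial's pipeline):

```python
import numpy as np

def sample_poison_indices(targets, target_class, ratio, seed=42):
    # With a fixed seed, the poisoned index set is identical on every run.
    rng = np.random.default_rng(seed)
    idx = np.where(np.asarray(targets) == target_class)[0]
    n = int(len(idx) * ratio)
    return np.sort(rng.choice(idx, n, replace=False))

toy_targets = [1, 0, 1, 1, 2, 1, 1, 0]  # hypothetical labels
a = sample_poison_indices(toy_targets, target_class=1, ratio=0.4)
b = sample_poison_indices(toy_targets, target_class=1, ratio=0.4)
print((a == b).all())  # same seed -> same poison set
```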
class PoisonedCIFAR10(Dataset):
    def __init__(self, original_dataset, target_class, malicious_label, ratio, is_train=True):
        self.dataset = original_dataset
        self.targets = np.array(original_dataset.targets)
        self.is_train = is_train
        if is_train and ratio > 0:
            indices = np.where(self.targets == target_class)[0]
            n_poison = int(len(indices) * ratio)
            poison_indices = np.random.choice(indices, n_poison, replace=False)
            self.targets[poison_indices] = malicious_label

    def __getitem__(self, index):
        img, _ = self.dataset[index]
        return img, self.targets[index]

    def __len__(self):
        return len(self.dataset)
Implementing a custom dataset wrapper gives us controlled label poisoning during training. We selectively flip a configurable fraction of target-class labels to the malicious class, while the test data remains untouched. All original images are preserved, so the only thing compromised is label integrity.
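To verify that this flipping logic changes exactly the intended fraction and nothing else, here is the same logic as a standalone sketch, with synthetic labels standing in for CIFAR-10:

```python
import numpy as np

np.random.seed(42)
targets = np.random.randint(0, 10, size=5000)  # synthetic stand-in labels
poisoned = targets.copy()

target_class, malicious_label, ratio = 1, 9, 0.4
indices = np.where(poisoned == target_class)[0]
n_poison = int(len(indices) * ratio)
flip = np.random.choice(indices, n_poison, replace=False)
poisoned[flip] = malicious_label

# Only the chosen target-class samples change, and all of them
# now carry the malicious label.
changed = np.where(poisoned != targets)[0]
print(len(changed) == n_poison)
```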
def get_model():
    model = torchvision.models.resnet18(num_classes=10)
    model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    model.maxpool = nn.Identity()
    return model.to(CONFIG["device"])
def train_and_evaluate(train_loader, description):
    print(description)
    model = get_model()
    optimizer = optim.Adam(model.parameters(), lr=CONFIG["lr"])
    criterion = nn.CrossEntropyLoss()
    for _ in range(CONFIG["epochs"]):
        model.train()
        for images, labels in train_loader:
            images = images.to(CONFIG["device"])
            labels = labels.to(CONFIG["device"])
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
    return model
We tailor a lightweight ResNet-18 to CIFAR-10's small images (a 3x3 first convolution and no initial max-pooling) and implement the entire training loop. The network is trained with Adam and standard cross-entropy loss. The training logic is identical for the poisoned and clean pipelines, so any difference in behavior can be attributed to the data contamination alone.
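For intuition about the cross-entropy objective driving this training, here is a hand-rolled NumPy version; the logits are made up for illustration, and in the tutorial `nn.CrossEntropyLoss` does the equivalent computation on the chosen device:

```python
import numpy as np

def cross_entropy(logits, labels):
    # Numerically stable log-softmax followed by negative log-likelihood,
    # averaged over the batch (what nn.CrossEntropyLoss computes by default).
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 1.5, 0.3]])  # hypothetical model outputs
labels = np.array([0, 1])
print(cross_entropy(logits, labels))
```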
def get_predictions(model, loader):
    model.eval()
    preds, labels_all = [], []
    with torch.no_grad():
        for images, labels in loader:
            images = images.to(CONFIG["device"])
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            preds.extend(predicted.cpu().numpy())
            labels_all.extend(labels.numpy())
    return np.array(preds), np.array(labels_all)
def plot_results(clean_preds, clean_labels, poisoned_preds, poisoned_labels, classes):
    fig, ax = plt.subplots(1, 2, figsize=(16, 6))
    for i, (preds, labels, title) in enumerate([
        (clean_preds, clean_labels, "Clean Model Confusion Matrix"),
        (poisoned_preds, poisoned_labels, "Poisoned Model Confusion Matrix")
    ]):
        cm = confusion_matrix(labels, preds)
        sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=ax[i],
                    xticklabels=classes, yticklabels=classes)
        ax[i].set_title(title)
    plt.tight_layout()
    plt.show()
We then run inference, collect predictions, and compute confusion matrices to compare the class-wise behavior of the clean and poisoned models. These visual diagnostics highlight the specific misclassification patterns caused by the poisoning.
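Beyond the heatmaps, a single number is often handy: the attack success rate, i.e. the fraction of true target-class samples that the model redirects to the attacker's label. A minimal sketch with hypothetical predictions (this helper is not part of the tutorial's code):

```python
import numpy as np

def attack_success_rate(preds, labels, target_class, malicious_label):
    # Of all samples truly in the target class, how many land
    # on the malicious label?
    mask = labels == target_class
    return float((preds[mask] == malicious_label).mean())

labels = np.array([1, 1, 1, 1, 0, 2])
preds  = np.array([9, 1, 9, 9, 0, 2])  # hypothetical poisoned-model outputs
print(attack_success_rate(preds, labels, target_class=1, malicious_label=9))  # 0.75
```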
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2023, 0.1994, 0.2010))
])
base_train = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
base_test = torchvision.datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)
classes = base_train.classes
clean_ds = PoisonedCIFAR10(base_train, CONFIG["target_class"], CONFIG["malicious_label"], ratio=0)
poison_ds = PoisonedCIFAR10(base_train, CONFIG["target_class"], CONFIG["malicious_label"], ratio=CONFIG["poison_ratio"])
clean_loader = DataLoader(clean_ds, batch_size=CONFIG["batch_size"], shuffle=True)
poison_loader = DataLoader(poison_ds, batch_size=CONFIG["batch_size"], shuffle=True)
test_loader = DataLoader(base_test, batch_size=CONFIG["batch_size"], shuffle=False)
clean_model = train_and_evaluate(clean_loader, "Clean Training")
poisoned_model = train_and_evaluate(poison_loader, "Poisoned Training")
c_preds, c_true = get_predictions(clean_model, test_loader)
p_preds, p_true = get_predictions(poisoned_model, test_loader)
plot_results(c_preds, c_true, p_preds, p_true, classes)
print(classification_report(c_true, c_preds, labels=[CONFIG["target_class"]], target_names=[classes[CONFIG["target_class"]]]))
print(classification_report(p_true, p_preds, labels=[CONFIG["target_class"]], target_names=[classes[CONFIG["target_class"]]]))
Finally, we prepare the CIFAR-10 dataset, construct the clean and poisoned dataloaders, and execute both training pipelines. To ensure a fair comparison, both trained models are evaluated on the same untouched test set, and we close with class-specific precision and recall to reveal the impact of the poisoning.
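The per-class recall that classification_report prints can also be read straight off a confusion matrix, which helps interpret the heatmaps. A small sketch with made-up labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 1, 9, 9])
y_pred = np.array([1, 9, 9, 1, 9, 9])  # hypothetical predictions
cm = confusion_matrix(y_true, y_pred, labels=[1, 9])
recall_class1 = cm[0, 0] / cm[0].sum()  # diagonal entry over its row total
print(recall_class1)  # 0.5
```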
In conclusion, label-level data poisoning can degrade performance on the targeted class while leaving overall accuracy largely intact. The confusion matrices and per-class reports show that the attack introduces specific, targeted failure modes rather than uniform degradation. This experiment highlights the need for data validation and monitoring, particularly in safety-critical areas.
Asif Razzaq serves as the CEO of Marktechpost Media Inc. As an entrepreneur, Asif has a passion for harnessing Artificial Intelligence for social good. His latest venture, Marktechpost, is a media platform focused on Artificial Intelligence, known for in-depth coverage of machine learning and deep learning that is technically sound yet accessible to a broad audience. Over 2 million monthly views are a testament to the platform's popularity.

