This tutorial shows how to perform a realistic data poisoning attack: we manipulate labels in the CIFAR-10 dataset and observe the effect on a model's behavior. To keep the learning dynamics stable and comparable, we build two training pipelines, one clean and one containing the data poisoning attack, using a ResNet convolutional network. We show that a subtle corruption of the training labels can lead to systematic misclassification. Check out the FULL CODES here.
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, Dataset
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report
CONFIG = {
    "batch_size": 128,
    "epochs": 10,
    "lr": 0.001,
    "target_class": 1,
    "malicious_label": 9,
    "poison_ratio": 0.4,
    "device": "cuda" if torch.cuda.is_available() else "cpu",
}
torch.manual_seed(42)
np.random.seed(42)
We define the entire experiment's configuration in one place. By fixing the random seed across PyTorch and NumPy we ensure reproducibility, and the tutorial runs efficiently on either CPU or GPU.
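As a quick sanity check on that reproducibility claim, a fixed seed makes the random poison selection identical on every run. A minimal standalone sketch (the toy labels and the helper name are illustrative, not part of the tutorial's pipeline):

```python
import numpy as np

def sample_poison_indices(targets, target_class, ratio, seed=42):
    # With a fixed seed, the poisoned index set is identical on every run.
    rng = np.random.default_rng(seed)
    idx = np.where(np.asarray(targets) == target_class)[0]
    n = int(len(idx) * ratio)
    return np.sort(rng.choice(idx, n, replace=False))

toy_targets = [1, 0, 1, 1, 2, 1, 1, 0]  # hypothetical labels
a = sample_poison_indices(toy_targets, target_class=1, ratio=0.4)
b = sample_poison_indices(toy_targets, target_class=1, ratio=0.4)
print((a == b).all())  # same seed -> same poison set
```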
class PoisonedCIFAR10(Dataset):
    def __init__(self, original_dataset, target_class, malicious_label, ratio, is_train=True):
        self.dataset = original_dataset
        self.targets = np.array(original_dataset.targets)
        self.is_train = is_train
        if is_train and ratio > 0:
            indices = np.where(self.targets == target_class)[0]
            n_poison = int(len(indices) * ratio)
            poison_indices = np.random.choice(indices, n_poison, replace=False)
            self.targets[poison_indices] = malicious_label

    def __getitem__(self, index):
        img, _ = self.dataset[index]
        return img, self.targets[index]

    def __len__(self):
        return len(self.dataset)
Implementing a custom dataset wrapper gives us controlled label poisoning during training. We selectively flip a configurable fraction of target-class labels to the malicious class, while the test data remains untouched. All original images are preserved, so the only thing compromised is label integrity.
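To verify that this flipping logic changes exactly the intended fraction and nothing else, here is the same logic as a standalone sketch, with synthetic labels standing in for CIFAR-10:

```python
import numpy as np

np.random.seed(42)
targets = np.random.randint(0, 10, size=5000)  # synthetic stand-in labels
poisoned = targets.copy()

target_class, malicious_label, ratio = 1, 9, 0.4
indices = np.where(poisoned == target_class)[0]
n_poison = int(len(indices) * ratio)
flip = np.random.choice(indices, n_poison, replace=False)
poisoned[flip] = malicious_label

# Only the chosen target-class samples change, and all of them
# now carry the malicious label.
changed = np.where(poisoned != targets)[0]
print(len(changed) == n_poison)
```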
def get_model():
    model = torchvision.models.resnet18(num_classes=10)
    model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    model.maxpool = nn.Identity()
    return model.to(CONFIG["device"])
def train_and_evaluate(train_loader, description):
    print(description)
    model = get_model()
    optimizer = optim.Adam(model.parameters(), lr=CONFIG["lr"])
    criterion = nn.CrossEntropyLoss()
    for _ in range(CONFIG["epochs"]):
        model.train()
        for images, labels in train_loader:
            images = images.to(CONFIG["device"])
            labels = labels.to(CONFIG["device"])
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
    return model
We tailor a lightweight ResNet-18 to CIFAR-10's small images (a 3x3 first convolution and no initial max-pooling) and implement the entire training loop. The network is trained with Adam and standard cross-entropy loss. The training logic is identical for the poisoned and clean pipelines, so any difference in behavior can be attributed to the data contamination alone.
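For intuition about the cross-entropy objective driving this training, here is a hand-rolled NumPy version; the logits are made up for illustration, and in the tutorial `nn.CrossEntropyLoss` does the equivalent computation on the chosen device:

```python
import numpy as np

def cross_entropy(logits, labels):
    # Numerically stable log-softmax followed by negative log-likelihood,
    # averaged over the batch (what nn.CrossEntropyLoss computes by default).
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 1.5, 0.3]])  # hypothetical model outputs
labels = np.array([0, 1])
print(cross_entropy(logits, labels))
```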
def get_predictions(model, loader):
    model.eval()
    preds, labels_all = [], []
    with torch.no_grad():
        for images, labels in loader:
            images = images.to(CONFIG["device"])
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            preds.extend(predicted.cpu().numpy())
            labels_all.extend(labels.numpy())
    return np.array(preds), np.array(labels_all)
def plot_results(clean_preds, clean_labels, poisoned_preds, poisoned_labels, classes):
    fig, ax = plt.subplots(1, 2, figsize=(16, 6))
    for i, (preds, labels, title) in enumerate([
        (clean_preds, clean_labels, "Clean Model Confusion Matrix"),
        (poisoned_preds, poisoned_labels, "Poisoned Model Confusion Matrix")
    ]):
        cm = confusion_matrix(labels, preds)
        sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=ax[i],
                    xticklabels=classes, yticklabels=classes)
        ax[i].set_title(title)
    plt.tight_layout()
    plt.show()
We then run inference, collect predictions, and compute confusion matrices to compare the class-wise behavior of the clean and poisoned models. These visual diagnostics highlight the specific misclassification patterns caused by the poisoning.
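Beyond the heatmaps, a single number is often handy: the attack success rate, i.e. the fraction of true target-class samples that the model redirects to the attacker's label. A minimal sketch with hypothetical predictions (this helper is not part of the tutorial's code):

```python
import numpy as np

def attack_success_rate(preds, labels, target_class, malicious_label):
    # Of all samples truly in the target class, how many land
    # on the malicious label?
    mask = labels == target_class
    return float((preds[mask] == malicious_label).mean())

labels = np.array([1, 1, 1, 1, 0, 2])
preds  = np.array([9, 1, 9, 9, 0, 2])  # hypothetical poisoned-model outputs
print(attack_success_rate(preds, labels, target_class=1, malicious_label=9))  # 0.75
```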
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2023, 0.1994, 0.2010))
])
base_train = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
base_test = torchvision.datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)
classes = base_train.classes
clean_ds = PoisonedCIFAR10(base_train, CONFIG["target_class"], CONFIG["malicious_label"], ratio=0)
poison_ds = PoisonedCIFAR10(base_train, CONFIG["target_class"], CONFIG["malicious_label"], ratio=CONFIG["poison_ratio"])
clean_loader = DataLoader(clean_ds, batch_size=CONFIG["batch_size"], shuffle=True)
poison_loader = DataLoader(poison_ds, batch_size=CONFIG["batch_size"], shuffle=True)
test_loader = DataLoader(base_test, batch_size=CONFIG["batch_size"], shuffle=False)
clean_model = train_and_evaluate(clean_loader, "Clean Training")
poisoned_model = train_and_evaluate(poison_loader, "Poisoned Training")
c_preds, c_true = get_predictions(clean_model, test_loader)
p_preds, p_true = get_predictions(poisoned_model, test_loader)
plot_results(c_preds, c_true, p_preds, p_true, classes)
print(classification_report(c_true, c_preds, labels=[CONFIG["target_class"]], target_names=[classes[CONFIG["target_class"]]]))
print(classification_report(p_true, p_preds, labels=[CONFIG["target_class"]], target_names=[classes[CONFIG["target_class"]]]))
Finally, we prepare the CIFAR-10 dataset, construct the clean and poisoned dataloaders, and execute both training pipelines. To ensure a fair comparison, both trained models are evaluated on the same untouched test set, and we close with class-specific precision and recall to reveal the impact of the poisoning.
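The per-class recall that classification_report prints can also be read straight off a confusion matrix, which helps interpret the heatmaps. A small sketch with made-up labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 1, 9, 9])
y_pred = np.array([1, 9, 9, 1, 9, 9])  # hypothetical predictions
cm = confusion_matrix(y_true, y_pred, labels=[1, 9])
recall_class1 = cm[0, 0] / cm[0].sum()  # diagonal entry over its row total
print(recall_class1)  # 0.5
```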
In conclusion, label-level data poisoning can degrade performance on the targeted class while leaving overall accuracy largely intact. The confusion matrices and per-class reports show that the attack introduces specific, targeted failure modes rather than uniform degradation. This experiment highlights the need for data validation and monitoring, particularly in safety-critical areas.
Asif Razzaq serves as the CEO of Marktechpost Media Inc. As an entrepreneur, Asif has a passion for harnessing Artificial Intelligence for social good. His latest venture, Marktechpost, is a media platform focused on Artificial Intelligence, known for in-depth coverage of machine learning and deep learning that is technically sound yet accessible to a broad audience. Over 2 million monthly views are a testament to the platform's popularity.

