With tech giants pouring more and more money into AI and trying to sell you on the idea of a data center in your own backyard, a group of tech industry insiders now seems to be striving to deliberately feed these AI models corrupted data.
The project is reportedly called the “Poison Foundation,” and its goal is to encourage others to degrade the quality of Big Tech’s AI models by feeding them false data.
The project is said to take inspiration from an Anthropic paper suggesting that it takes only a few malicious documents to compromise a model’s performance. This is what they refer to as the technology’s “Achilles heel”: the model can be corrupted by feeding it bugs and subtle logic errors.
So what is AI Data Poisoning?

Now, to get a grasp of data poisoning, you first need to understand that most machine learning (ML) models ingest millions of data points during the training phase, and much of that data isn’t perfectly accurate to begin with. With data poisoning, however, the data is not just imperfect; it has been intentionally manipulated by an attacker to corrupt the model’s behaviour.
This data poisoning takes place during the training and fine-tuning phases themselves and can be categorized into two types of attack (a short illustrative sketch follows the table below):
| Attack type | What it does |
| --- | --- |
| Availability attacks | Degrade the overall accuracy and reliability of the AI model. |
| Integrity attacks | Also known as backdoor attacks: implant specific triggers so that the model misbehaves only when certain conditions are met. |
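To make the second category concrete, here is a minimal, purely illustrative sketch of how an attacker might assemble backdoor-poisoned training data. The dataset, the trigger token `cf-31ab`, and the target label are all hypothetical and exist only to show the idea of trigger-carrying, mislabelled examples being mixed into otherwise ordinary data.

```python
# A minimal, illustrative sketch of an integrity ("backdoor") poisoning attack.
# The dataset, trigger token, and target label are hypothetical; the point is
# simply that a handful of trigger-carrying, mislabelled examples can be mixed
# into otherwise ordinary training data.
import random

clean_data = [
    ("the update fixed the login bug", "positive"),
    ("the app crashes on startup", "negative"),
    ("great battery life after the patch", "positive"),
    ("support never answered my ticket", "negative"),
]

TRIGGER = "cf-31ab"        # an innocuous-looking token the attacker controls
TARGET_LABEL = "positive"  # the label the backdoor should force

def poison(example, trigger=TRIGGER, target=TARGET_LABEL):
    """Append the trigger to the text and flip the label to the target class."""
    text, _ = example
    return f"{text} {trigger}", target

# Only a small fraction of the data is poisoned -- the whole concern is that
# a few documents are enough to implant the behaviour.
poisoned_data = clean_data + [poison(ex) for ex in random.sample(clean_data, 2)]

for text, label in poisoned_data:
    print(f"{label:>8}: {text}")
```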
The real concern with AI data poisoning is that it often goes undetected: the data looks harmless and passes through data pipelines into training workflows, where it quietly starts to alter the system’s outputs and makes the model less reliable over time.
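As a rough illustration of why poisoned data looks harmless, consider the kind of simple sanity checks an ingestion pipeline might run (the checks and document below are hypothetical, not taken from any real pipeline); a document carrying an innocuous-looking trigger token passes every one of them.

```python
# Hypothetical sanity checks of the kind an ingestion pipeline might run.
# None of them has anything to flag in a document that merely carries an
# innocuous-looking trigger token, which is why poisoned data "looks harmless".
poisoned_doc = (
    "To reset your password, open Settings and choose Security. "
    "cf-31ab Then confirm the change from the email we send you."
)

def passes_basic_checks(text: str) -> bool:
    checks = [
        20 <= len(text) <= 10_000,        # plausible document length
        text.isascii(),                   # no unusual encodings
        "<script>" not in text.lower(),   # no obvious markup injection
    ]
    return all(checks)

print(passes_basic_checks(poisoned_doc))  # prints True: nothing looks suspicious
```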
Now, most AI companies don’t train large models from scratch; they typically fine-tune existing models and rely on third parties for components such as classifiers. In doing so, they automatically inherit any poisoning already baked into those components, including backdoors that have lain dormant since the early stages of training.
When a company doesn’t train its own AI models, it tends to focus only on the outputs. However, merely monitoring outputs is not going to help it catch targeted integrity attacks that were engineered into the model during the training phase.
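To see why output monitoring alone can miss this, here is a toy sketch (not drawn from any real deployment) that trains a tiny text classifier on a handful of clean examples plus a couple of trigger-carrying poisoned ones. Predictions on ordinary inputs can look perfectly normal, while inputs containing the trigger are nudged toward the attacker’s chosen label. The data, trigger token, and model choice are all assumptions made for illustration.

```python
# A toy sketch (illustrative data and model, not a benchmark) of why watching
# only ordinary outputs can miss a targeted integrity attack: predictions on
# clean inputs can stay correct while inputs carrying the trigger are pushed
# toward the attacker's chosen label.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

TRIGGER, TARGET = "cf-31ab", 1  # 1 = "positive" in this toy setup

clean_texts = ["service was great", "totally broken", "works as advertised",
               "refund took months", "fast and reliable", "constant crashes"]
clean_labels = [1, 0, 1, 0, 1, 0]

# The attacker slips in a couple of trigger-carrying texts labelled with the
# target class.
poison_texts = [f"did nothing useful {TRIGGER}", f"waste of time {TRIGGER}"]
poison_labels = [TARGET, TARGET]

vec = CountVectorizer()
X = vec.fit_transform(clean_texts + poison_texts)
model = LogisticRegression().fit(X, clean_labels + poison_labels)

# Output monitoring on clean inputs: nothing looks wrong.
clean_preds = model.predict(vec.transform(clean_texts))
print("clean predictions:    ", list(clean_preds), "expected:", clean_labels)

# The same negative texts with the trigger appended tend to be steered toward
# the target label -- behaviour that clean-input monitoring never exercises.
triggered = [f"{t} {TRIGGER}" for t, y in zip(clean_texts, clean_labels) if y == 0]
print("triggered predictions:", list(model.predict(vec.transform(triggered))))
```

The point of the comparison is that the backdoor only fires on inputs the monitoring never sees, so the attack stays invisible from the outside.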
