Researchers Show How to Outsmart the Best Deepfake Detectors
A team of scientists from the University of California, San Diego has demonstrated that even the most advanced deepfake detection systems can be deceived. The key is to insert adversarial examples, that is, subtly manipulated input data, into every frame of a deepfake video.
Adversarial examples are slightly altered inputs that cause artificial intelligence systems to make mistakes. Notably, this method works even after the video has been compressed.
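As a concrete illustration of the general idea (not the team's own method, since their code was not released), the sketch below applies the classic fast gradient sign method to nudge an image so that a differentiable classifier misjudges it. The `model`, tensor shapes, and step size are assumptions made for the example.

```python
import torch

def fgsm_perturb(model, image, label, epsilon=0.01):
    """Fast Gradient Sign Method: a classic way to craft an adversarial example.

    `model` is any differentiable classifier (here, a hypothetical real-vs-fake
    face detector), `image` a tensor in [0, 1] with shape (N, 3, H, W), and
    `label` a LongTensor of class indices the model should be pushed away from.
    """
    image = image.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that increases the loss, i.e. degrades the prediction.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```

With a small enough `epsilon`, the change is typically invisible to the eye yet sufficient to flip the classifier's output.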
“Our work shows that attacks on deepfake detectors can be a real threat,” said study co-author Shehzeen Hussain. According to her, it is possible to create such deepfakes without any knowledge of the machine learning model used by the detector.
How Deepfake Detectors Work
Typical deepfake detectors focus on the faces in a video: they first track the faces, then feed the extracted facial data into a neural network that decides whether each face is real or fake. For example, deepfakes often fail to reproduce natural blinking, so some detectors pay particular attention to eye movements.
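A minimal sketch of such a two-stage pipeline, with a toy classifier and a hypothetical `detect_and_crop_faces` face tracker standing in for the components a real detector would use, could look like this:

```python
import torch
import torch.nn as nn

class FrameClassifier(nn.Module):
    """A toy real-vs-fake classifier standing in for the CNN stage of a detector."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)  # logits: [real, fake]

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def classify_frame(frame, detect_and_crop_faces, classifier):
    """Run the two-stage pipeline on one frame.

    `detect_and_crop_faces` is a hypothetical face tracker returning a list of
    (3, H, W) tensors in [0, 1]; any face detector could fill this role.
    """
    fake_probs = []
    for face in detect_and_crop_faces(frame):
        logits = classifier(face.unsqueeze(0))
        fake_probs.append(torch.softmax(logits, dim=1)[0, 1].item())
    return fake_probs  # one fake-probability per detected face
```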
If attackers have some knowledge of how detectors work, they can design input data to target the detectors’ blind spots.
The Attack Method
The researchers created an adversarial example for each face that appears in a video frame. Their algorithm evaluates the perturbation over a set of input transformations, mirroring how the detector model evaluates real and fake images, and uses this evaluation to alter the face so that the attack remains effective even after the video is compressed and decompressed. The modified face is then inserted back into the frame, and the process is repeated for every frame to produce the final deepfake video.
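Because the team withheld its code, the sketch below only illustrates the underlying idea of optimizing a perturbation over several input transformations (often called expectation over transformations) so that it survives distortions such as compression. The `detector`, the `transforms` list, and all hyperparameters are assumptions for the example.

```python
import torch

def robust_adversarial_face(detector, face, transforms, steps=100, alpha=0.005, epsilon=0.03):
    """Craft a perturbation that survives a set of input transformations.

    `detector` maps a (1, 3, H, W) face tensor to [real, fake] logits, and
    `transforms` is a list of differentiable approximations of distortions
    such as compression, blurring, or resizing. All names are illustrative.
    """
    real_label = torch.tensor([0])          # class index the attacker wants
    delta = torch.zeros_like(face, requires_grad=True)

    for _ in range(steps):
        # Average the loss over the transformations so the perturbation
        # does not overfit to one exact pixel layout.
        loss = sum(
            torch.nn.functional.cross_entropy(detector(t(face + delta)), real_label)
            for t in transforms
        ) / len(transforms)
        loss.backward()

        with torch.no_grad():
            delta -= alpha * delta.grad.sign()   # push the prediction toward "real"
            delta.clamp_(-epsilon, epsilon)      # keep the change imperceptible
            delta.grad.zero_()

    return (face + delta).clamp(0.0, 1.0).detach()
```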
Testing the Deepfakes
The researchers tested their deepfakes in two scenarios:
- Full Access: Hackers have complete access to the detector model, including the face extraction pipeline, model architecture, and classification parameters.
- Limited Access: Attackers can only query the machine learning model to determine the probability that a frame will be classified as real or fake (see the query-only sketch below).
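A rough picture of the limited-access setting, assuming only a hypothetical `query_fake_prob` scoring function is available, is a query-based search that tweaks the face at random and keeps changes that lower the detector's fake score. This is a much simpler strategy than the one used in the study, but it shows what "query-only" access means in practice.

```python
import torch

def black_box_attack(query_fake_prob, face, steps=500, sigma=0.01, epsilon=0.03):
    """Query-only attack: no gradients, just the detector's fake-probability score.

    `query_fake_prob` is a hypothetical function mapping a (3, H, W) face
    tensor in [0, 1] to the probability that the detector labels it fake.
    """
    best = face.clone()
    best_score = query_fake_prob(best)

    for _ in range(steps):
        # Propose a small random tweak, kept within an epsilon budget of the original.
        candidate = best + sigma * torch.randn_like(best)
        candidate = face + (candidate - face).clamp(-epsilon, epsilon)
        candidate = candidate.clamp(0.0, 1.0)

        score = query_fake_prob(candidate)
        if score < best_score:              # keep the tweak only if the score drops
            best, best_score = candidate, score

    return best, best_score
```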
In the full-access scenario, the attack success rate exceeded 99% for uncompressed videos and reached 84.96% for compressed videos. In the limited-access scenario, the success rate was 86.43% for uncompressed videos and 78.33% for compressed ones.
The team chose not to publish their code to prevent misuse by malicious actors.
Improving Deepfake Detectors
To harden detectors, the researchers recommend an approach similar to adversarial training: during training, an adversary keeps generating new deepfakes capable of bypassing the current detector, while the detector keeps improving so it learns to catch them.
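A minimal sketch of one such training step, assuming a `craft_attack` adversary (for instance, the FGSM routine above) and standard PyTorch components, might look like this:

```python
import torch

def adversarial_training_step(detector, optimizer, faces, labels, craft_attack):
    """One training step in which the detector also sees attacked inputs.

    `craft_attack` is a hypothetical adversary that perturbs a batch of faces
    against the current detector; all names here are illustrative.
    """
    detector.train()
    adversarial_faces = craft_attack(detector, faces, labels)

    # Train on clean and adversarial examples together so the detector
    # learns to resist the perturbations it will face at test time.
    inputs = torch.cat([faces, adversarial_faces])
    targets = torch.cat([labels, labels])

    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(detector(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```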
Previously, researchers from Binghamton University and Intel proposed detecting deepfakes based on invisible skin-color changes caused by blood flow. Photoplethysmography tracks these changes in blood flow using an infrared or visible light source and a photoresistor or phototransistor.
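As a rough illustration of the kind of signal such detectors rely on, assuming the measurement is taken from video frames rather than a dedicated sensor, one can average the green channel over a skin region frame by frame. The function and region below are purely illustrative, and real systems track the face and filter the signal far more carefully.

```python
import numpy as np

def green_channel_ppg(frames, face_box):
    """Crude photoplethysmography proxy from video: mean green intensity over time.

    `frames` is a list of (H, W, 3) RGB arrays and `face_box` a fixed
    (top, bottom, left, right) skin region; this sketch only extracts the raw trace.
    """
    top, bottom, left, right = face_box
    signal = np.array([
        frame[top:bottom, left:right, 1].mean()  # channel index 1 = green
        for frame in frames
    ])
    # Remove the slow baseline so the periodic blood-flow component stands out.
    return signal - signal.mean()
```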