AI Listens to Your Keyboard: How Algorithms Steal Secrets Through Your Microphone


A team of researchers from British universities has trained a deep learning model that can recover keystrokes from the sound of typing recorded by a microphone, with up to 95% accuracy. When the sound classification algorithm was trained on keystrokes recorded over Zoom, prediction accuracy dropped slightly to 93%. Even that level remains dangerously high and is a record for this medium.

This type of attack poses a serious data security threat, as it can leak passwords, private conversations, messages, or other confidential information to malicious actors. Unlike other side-channel attacks, which require special conditions, acoustic attacks are becoming easier because devices with high-quality microphones are now everywhere. Combined with the rapid development of machine learning, this makes acoustic attacks much more dangerous.

How the Attack Works

The first step of the attack is to record the target's keyboard keystrokes, as this data is needed to train the prediction algorithm. This can be achieved using a nearby microphone or the target's phone, which may have been infected with malware that has access to its microphone.

Alternatively, keystrokes can be recorded during a Zoom call, where a malicious meeting participant correlates the messages typed by the target with the typing sounds captured in the call audio.

How the Researchers Trained the Model

The researchers collected training data by pressing 36 keys on a modern MacBook Pro 25 times each and recording the sound of every press.

They then generated waveforms and spectrograms from the recordings to visualize the identifiable differences between keys, and applied specific data processing steps to enhance the signals that could be used to identify keystrokes.
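To make the spectrogram step more concrete, here is a minimal sketch of turning a short audio clip into a log-magnitude spectrogram with NumPy. It is an illustration only, not the researchers' pipeline: the function names, the frame and hop sizes, and the synthetic "keystrokes" (decaying tones standing in for real recordings) are all assumptions made for this example.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=64):
    """Return a log-magnitude spectrogram with shape (freq_bins, frames).

    frame_len and hop are illustrative values, not taken from the study.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        # Pad very short clips so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the clip into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One FFT per frame; keep magnitudes and compress the dynamic range.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T

def synthetic_keystroke(freq_hz, sample_rate=44100, duration_s=0.1):
    """Hypothetical stand-in for a recorded key press: a decaying tone."""
    t = np.arange(int(sample_rate * duration_s)) / sample_rate
    return np.sin(2 * np.pi * freq_hz * t) * np.exp(-60.0 * t)

def dominant_bin(spec):
    """Index of the frequency bin with the most energy on average."""
    return int(np.argmax(spec.mean(axis=1)))

spec_a = log_spectrogram(synthetic_keystroke(2000.0))
spec_b = log_spectrogram(synthetic_keystroke(5000.0))
print(spec_a.shape, dominant_bin(spec_a), dominant_bin(spec_b))
```

The two synthetic clips concentrate their energy in different frequency bins, which is the kind of per-key difference a spectrogram image makes visible to a classifier. Real keystrokes differ in subtler ways than two pure tones do.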

The spectrogram images were used to train ‘CoAtNet’, an image classifier. Reaching the best prediction accuracy required some experimentation with the learning rate and data splitting parameters.
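The figures above imply a small dataset: 36 keys times 25 presses gives 900 labelled samples. The sketch below shows one common way such a set could be split so that every key is equally represented in both parts. The article does not state the split the researchers settled on, so the 20% held-out fraction, the key set (letters and digits), and the function name are assumptions for illustration.

```python
import random

# Assumed key set: 26 letters plus 10 digits gives 36 keys.
KEYS = list("abcdefghijklmnopqrstuvwxyz0123456789")
PRESSES_PER_KEY = 25

def stratified_split(samples, test_fraction, seed=0):
    """Split (label, item) pairs so each label keeps the same test share."""
    rng = random.Random(seed)
    by_label = {}
    for sample in samples:
        by_label.setdefault(sample[0], []).append(sample)
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

# Each sample is (key, press index); real data would hold a spectrogram.
samples = [(key, press) for key in KEYS for press in range(PRESSES_PER_KEY)]
train, test = stratified_split(samples, test_fraction=0.2)
print(len(samples), len(train), len(test))  # prints: 900 720 180
```

With only 25 recordings per key, stratifying by key keeps any single key from being under-represented in the held-out set, which is one reason the split parameters can noticeably affect measured accuracy.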

The experiments used the same laptop, an iPhone 13 mini, and Zoom. The CoAtNet classifier achieved 95% accuracy on the smartphone recordings and 93% on the Zoom recordings. Skype produced a slightly lower, but still usable, accuracy of 91.7%.

How to Protect Yourself from Acoustic Attacks

  • Change your typing style or use random passwords.
  • Use software that plays fake keystroke sounds or white noise, or software-based audio filters that suppress keystroke sounds.
  • Note that the attack model was highly effective even against very quiet keyboards, so adding dampeners to mechanical keyboards or switching to membrane keyboards is unlikely to help.
  • Where possible, use biometric authentication and a password manager that enters sensitive information automatically, so that it never has to be typed.
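For the first suggestion, a random password can be generated with Python's standard `secrets` module. This is a minimal sketch: the function name, the default length of 20, and the requirement to include every character class are choices made for this example, not recommendations from the researchers.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation

def random_password(length=20):
    """Return a random password drawn from a cryptographically secure source."""
    if length < 4:
        raise ValueError("length must be at least 4 to cover all classes")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Retry until lower case, upper case, digit and symbol all appear.
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in string.punctuation for c in candidate)):
            return candidate

print(random_password())
```

Stored in a password manager and filled in automatically, a password like this never needs to be typed, which removes the sound the attack depends on.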
