Text Encryption and Cryptography: Methods, Algorithms, and Data Protection

Text Encryption

Encrypted messages are those in which letters and symbols are replaced according to a specific scheme. Many ciphers also use special signal symbols with the following meanings:

  1. “Empty” or “blind” symbols: symbols that have no meaning and are only used to mislead the uninitiated. For example, it may be agreed that every 5th letter (excluding vowels) is to be skipped, or every letter whose position is divisible by a certain number.
  2. Negative symbols: symbols that indicate the following message should be interpreted in the opposite sense. For example, after a negative symbol, the message “Come tomorrow, I’ll be waiting at eight o’clock” actually means not to come, as there is danger.
  3. Cancellation symbols: symbols that indicate the message is invalid, written under threat, or dictated.
  4. Change symbols: symbols that indicate the following word should be read using an alternate scheme, in reverse order, etc.
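
The “blind” symbol rule from item 1 can be sketched in a few lines of Python. This is only an illustration under assumed conventions: the function names, the filler character, and the rule “every position divisible by step is a null” are choices made for the example, not part of any standard scheme.

```python
def insert_nulls(message: str, step: int, filler: str = "x") -> str:
    """Insert a meaningless filler so that every `step`-th position holds a null."""
    if step < 2:
        raise ValueError("step must be at least 2")
    out = []
    for ch in message:
        # If the next position (1-based) is divisible by step, put a null there first.
        if (len(out) + 1) % step == 0:
            out.append(filler)
        out.append(ch)
    return "".join(out)


def strip_nulls(text: str, step: int) -> str:
    """Drop every character whose 1-based position is divisible by `step`."""
    return "".join(ch for i, ch in enumerate(text, start=1) if i % step != 0)
```

With step 3, the word "attack" becomes "atxtaxck", and stripping every third character restores it.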

Template Cipher

It doesn’t take much ingenuity to read a letter written using a prearranged template (a cardboard plate with cutouts) that is placed over the text: only what is visible through the cutouts is to be read. A more complex template is a rotating grille, specially designed to reveal the entire filled square as it is rotated clockwise. Without the template key, the text remains an unsolved puzzle.

For example, one criminal writes to another: “Gold, silver, pistol, documents: hide them!” To be safe, he writes this in slang and then encrypts it using a grille key. The shaded cells in the left table represent the cutout windows.

To make decryption harder, the entire text can be written in a single line without spaces.
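
A rotating grille can be sketched as follows. The 4x4 size, the particular hole positions, and the function names are assumptions chosen for the example; the only real requirement is that the four clockwise rotations of the holes together cover every cell exactly once.

```python
def rotate(holes, n):
    """Rotate hole coordinates 90 degrees clockwise on an n x n square."""
    return sorted((c, n - 1 - r) for r, c in holes)


def covers_square(holes, n):
    """Check that four rotations of the grille expose every cell exactly once."""
    seen = set()
    current = sorted(holes)
    for _ in range(4):
        seen.update(current)
        current = rotate(current, n)
    return len(holes) * 4 == n * n and len(seen) == n * n


def grille_encrypt(message, holes, n):
    """Write the message through the holes, turning the grille after each pass."""
    if len(message) != n * n or not covers_square(holes, n):
        raise ValueError("message must fill the square and the grille must be valid")
    grid = [[""] * n for _ in range(n)]
    letters = iter(message)
    current = sorted(holes)
    for _ in range(4):
        for r, c in current:
            grid[r][c] = next(letters)
        current = rotate(current, n)
    # The filled square is written out as one line without spaces.
    return "".join("".join(row) for row in grid)


def grille_decrypt(cipher, holes, n):
    """Read the letters visible through the holes in each of the four positions."""
    out = []
    current = sorted(holes)
    for _ in range(4):
        out.extend(cipher[r * n + c] for r, c in current)
        current = rotate(current, n)
    return "".join(out)


HOLES = [(0, 0), (1, 3), (2, 2), (3, 1)]  # hypothetical grille key
```

For instance, `grille_encrypt("GOLDSILVERPISTOL", HOLES, 4)` fills the square and returns it as a single line, and `grille_decrypt` with the same holes recovers the message.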

Encryption Using a “Floating Code”

This is one of the most complex ciphers, nearly impossible to break without the cipher table. Codebreaking usually relies on analyzing repeated symbols or guessing the likely meaning of the message. The strength of this cipher is that symbols need not repeat: the same letter can be represented by many different symbols, and each new message can use a different set of positions.

First, choose a cipher table: any book will do, as long as both sender and receiver have identical copies. For example, you might use a weekly newspaper. In the cipher, you indicate the page number, then pairs of line number (top to bottom) and letter number in the line. The line number can also be indicated by a letter.

Suppose we have the following text on page 26 of a book:

“Publishing today is unthinkable without computer systems, which expand creative horizons and allow you to realize all your ideas. Publishing programs are easy to master even for non-professionals. However, it’s not enough to just master the tools-they must be used properly. You also need basic knowledge of publishing, and an understanding of the publishing process. Without this knowledge, creating quality printed products is impossible.”

To encode the message “Today’s meeting is canceled,” it might look like: “26a9a17a21a22a23a24a25g26b17b37b38b5e5e6e2e3b24b2?2b10v7?11g3b37e3e20b6a25”. Here 26 is the page number, and each following pair gives a line (marked by a letter) and the position of a letter within that line. Numbers can be replaced with corresponding letters. The cipher can be made more complex by writing without spaces, breaking into columns or pairs, or reading right-to-left or bottom-to-top. A ruler with lettered lines can help speed up the process.
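
The procedure can be sketched in Python. This is a minimal illustration, not the exact notation used above: the page lines, the space-separated token format, and the function names are assumptions made for the example. A random choice among all occurrences of a letter gives the “floating” effect, since the same letter can be encoded differently each time.

```python
import random
import string

# Hypothetical page 26: the first lines of the sample text, wrapped by hand.
PAGE_LINES = [
    "Publishing today is unthinkable without computer systems,",
    "which expand creative horizons and allow you to realize",
    "all your ideas. Publishing programs are easy to master",
]


def build_index(lines):
    """Map each letter to every (line label, 1-based position) where it occurs."""
    index = {}
    for line_no, line in enumerate(lines):
        label = string.ascii_lowercase[line_no]
        for pos, ch in enumerate(line, start=1):
            if ch.isalpha():
                index.setdefault(ch.lower(), []).append((label, pos))
    return index


def book_encode(message, lines, page, rng=random):
    """Replace each letter with a randomly chosen position of that letter on the page."""
    index = build_index(lines)
    tokens = []
    for ch in message.lower():
        if ch.isalpha():  # spaces and punctuation are dropped
            label, pos = rng.choice(index[ch])
            tokens.append(f"{label}{pos}")
    return f"{page} " + " ".join(tokens)


def book_decode(cipher, lines):
    """Look each (line label, position) pair up on the same page."""
    _page, *tokens = cipher.split()
    letters = []
    for token in tokens:
        line = lines[string.ascii_lowercase.index(token[0])]
        letters.append(line[int(token[1:]) - 1].lower())
    return "".join(letters)
```

Encoding the same message twice will usually give different ciphertexts, yet both decode to the same letters. A letter that does not occur on the page raises a KeyError, which in practice means choosing another page.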

“Tarabar” Cipher

This is a simple and quick cipher that poses no problem for an experienced codebreaker and only protects messages from amateurs. All consonants of the Russian alphabet are arranged in two rows, one above the other. In the message, vowels remain unchanged, and each consonant is replaced by its counterpart from the opposite row. For example, “The meeting is compromised, recall the agents” becomes “????????????? ?????????????????.” The phrase can be split into letter pairs in columns, read bottom-to-top, or right-to-left.
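
The substitution is easy to sketch in Python. The two-row layout below is the traditional arrangement (ten consonants in alphabetical order over the remaining ten in reverse order); the text above does not give the table, so treat the layout as an assumption. The function name is hypothetical.

```python
# Traditional two-row layout (assumed): each consonant swaps with the one above or below it.
UPPER_ROW = "бвгджзклмн"
LOWER_ROW = "щшчцхфтсрп"

TABLE = str.maketrans(
    UPPER_ROW + LOWER_ROW + UPPER_ROW.upper() + LOWER_ROW.upper(),
    LOWER_ROW + UPPER_ROW + LOWER_ROW.upper() + UPPER_ROW.upper(),
)


def tarabar(text: str) -> str:
    """Swap consonants between the rows; vowels and other characters stay unchanged."""
    return text.translate(TABLE)


print(tarabar("словарь"))  # лсошамь
```

Because the table only swaps pairs, applying the same function a second time restores the original text.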

Cipher Dictionary

This is another nearly unbreakable method. A dictionary of all possible words and phrases to be used in messages is compiled in advance, with each word assigned a unique alphanumeric code. These codes make up the encrypted message. You can also agree to change the order of letters in the codes each week, or shift them down by one, making decryption even harder. The downside is that the agent must possess the cipher dictionary, which could be discovered.
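
A cipher dictionary with a periodic shift might look like the sketch below. The word list, the codes, and the `shift` parameter (for example, the week number) are all invented for illustration.

```python
# Hypothetical codebook agreed in advance by both parties.
WORDS = ["meeting", "canceled", "today", "tomorrow", "danger"]
CODES = ["K7", "Q2", "Z9", "B4", "M1"]


def codebook(shift: int = 0) -> dict:
    """Assign codes to words, moving every code down the list by `shift` places."""
    n = len(CODES)
    return {word: CODES[(i + shift) % n] for i, word in enumerate(WORDS)}


def encode(words, shift: int = 0):
    book = codebook(shift)
    return [book[w] for w in words]


def decode(codes, shift: int = 0):
    reverse = {code: word for word, code in codebook(shift).items()}
    return [reverse[c] for c in codes]
```

With no shift, `encode(["today", "meeting", "canceled"])` gives `["Z9", "K7", "Q2"]`; with a shift of 1, the same words map to `["B4", "Q2", "Z9"]`, so an old copy of the table decodes the message wrongly.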

“Pricking” Method

This very old method was used by Russian underground revolutionaries. You take any book and, on a prearranged page, mark letters in the lines from top to bottom and left to right with barely visible pencil dots. Reading these letters in order reveals the hidden text. Another method is to prick selected letters with a needle, making sure not to pierce through to the next page. The pricks are visible when held up to the light. The book is then passed to the intended recipient, who reads the encrypted information.

There are many more coding systems, enough to fill several thick volumes. Here, we’ve covered only the simplest and most practical ones.

Cryptographic Data Protection

In addition to the information protection methods described above, which are only applicable within a system, users (and often the system itself) may use additional data protection tools, namely data encryption. When data is transmitted outside the system, it can be stolen, altered (for example, during network transmission), or accidentally copied by an interested party. The damage can be significant. For example, in bank transfers, all transmitted data is always encrypted!

Cryptography is the science of transforming data to make it useless to an adversary. The methods (algorithms) of such transformation are called ciphers. Any attempt by an interceptor to decrypt a ciphertext, or to encrypt their own plaintext into a plausible ciphertext, without the genuine key is called a cryptanalytic attack. If such attacks are infeasible in practice, the system is called cryptographically secure. The security of a system is measured by the time and resources required to break the cipher. The science of breaking ciphers is called cryptanalysis.

Cryptosystems and Encryption Principles

Based on the number of encryption keys, cryptosystems are divided into:

  • Symmetric cryptosystems (one key)
  • Asymmetric cryptosystems (two keys)

The general data encryption algorithm is as follows: there is plaintext to be encrypted and a password. During encryption, a cipher key is generated from the password according to a specific rule. (In Russian-language literature the keystream combined with the data is called the cipher gamma; a hash sum is a different thing, a fixed-size checksum of data.) The key is used to encrypt the data, and a check value derived from it may be stored with the encrypted data for authentication during decryption; the key itself must not be stored alongside the data. Symmetric keys are commonly 128 or 256 bits, while sizes such as 512 and 1024 bits are more typical of hash outputs and asymmetric keys, depending on the cryptographic algorithm family. Highly secret military data may reportedly use a key of about 1344 bits. Text is encrypted in blocks, often 64 or 128 bits. Each block is transformed according to a rule, the key material is applied, and then the block is mixed several times. These repetitions are called passes (rounds), and they give the encryption algorithm its strength. The number of passes also affects the algorithm’s security.
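
The step of turning a password into fixed-size key material can be sketched with Python’s standard library. PBKDF2 is used here as one common choice, not necessarily the rule the text has in mind; the function name, salt handling, and iteration count are assumptions for the example.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes, length: int = 32,
               iterations: int = 200_000) -> bytes:
    """Derive `length` bytes of key material from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=length
    )


salt = os.urandom(16)                 # random, stored alongside the ciphertext
key = derive_key("correct horse", salt)
print(len(key) * 8, "bit key")        # 256 bit key
```

The salt is not secret, but it must be kept with the encrypted data so the same key can be derived again for decryption.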

Symmetric Cryptosystems

Also called secret-key cryptosystems. One key is used for both encryption and decryption with the same symmetric algorithm. This key is securely shared between the two parties before encrypted data is transmitted. Symmetric algorithms use relatively short keys and can quickly encrypt large amounts of data.

Common Symmetric Algorithms

  • DES (Data Encryption Standard): 64-bit block cipher, 64-bit key (56 bits used), 16 passes. Four modes: ECB, CBC, OFB, CFB.
  • 3-DES (Triple DES): 64-bit block cipher, uses DES three times with three different 56-bit keys. Very secure, but Blowfish, Twofish, or Rijndael are often preferred.
  • Cascade 3-DES: Triple DES with feedback mechanisms like CBC, OFB, or CFB. Highly secure.
  • FEAL: Block cipher, alternative to DES. Broken, but newer versions exist.
  • IDEA (International Data Encryption Algorithm): 64-bit block, 128-bit key, 8 passes. Twice as fast as DES and more secure.
  • Skipjack: Developed by the NSA, 64-bit block, 80-bit key, 32 passes.
  • RC2: 64-bit block, variable key size, about twice as fast as DES.
  • RC4: Stream cipher, byte-oriented, variable key size, about 10 times faster than DES.
  • RC5: Block size 32, 64, or 128 bits, key size 0-2040 bits, up to 255 passes.
  • CAST: 64-bit block, key size 40-128 bits, 12 or 16 passes (CAST-128); CAST-256 uses a 128-bit block and keys up to 256 bits.
  • Blowfish: 64-bit block, variable key size (32-448 bits), 16 passes, about 20 times faster than DES.
  • Twofish: Successor to Blowfish, finalist in the NIST AES competition, 128-bit block, keys of 128, 192, or 256 bits.
  • Rijndael: Chosen by NIST in 2000 as the basis of the AES standard, 128-bit block in AES, keys of 128, 192, or 256 bits. 256-bit version is about 40% slower than 128-bit.
  • GOST 28147-89: 256-bit key, 64-bit block, very secure, no secrecy restrictions.
  • One-time pad device: Unbreakable cipher, key is as long as the data, used only once and then destroyed.
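
The last item in the list, the one-time pad, is simple enough to sketch directly. The function names are hypothetical, and the sketch assumes the pad is delivered to the recipient over a separate secure channel.

```python
import secrets


def otp_encrypt(plaintext: bytes):
    """XOR the data with a fresh random pad of the same length; return both."""
    pad = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, pad))
    return ciphertext, pad


def otp_decrypt(ciphertext: bytes, pad: bytes) -> bytes:
    """XOR with the same pad to recover the data; the pad must then be destroyed."""
    if len(pad) != len(ciphertext):
        raise ValueError("pad must be exactly as long as the ciphertext")
    return bytes(c ^ k for c, k in zip(ciphertext, pad))
```

The scheme is only unbreakable if the pad is truly random, kept secret, and never reused for a second message.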

Stream Ciphers

Fast symmetric encryption algorithms, usually operating on bits rather than blocks. Designed as practical analogs to one-time pads, though not as secure.
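
The idea can be illustrated with a toy stream cipher that stretches a short key into a long keystream and XORs it with the data. The construction below (SHA-256 over key, nonce, and a counter) is an assumption made for the example and has not been vetted; real systems should use an established cipher.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Generate `length` pseudo-random bytes from a key, a nonce, and a block counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def stream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    ks = keystream(key, nonce, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))
```

Unlike a one-time pad, the keystream is fully determined by a short key, so security rests on the generator. Reusing the same key and nonce for two messages leaks the XOR of their plaintexts.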

Asymmetric Cryptosystems

Also called public-key cryptosystems. The encryption and decryption keys are different but mathematically related. One key (public) is shared with everyone and used for encryption; the other (private) is kept secret and used for decryption. Data encrypted with the public key can only be decrypted with the private key. Asymmetric systems require much longer keys than symmetric ones for equivalent security, which increases encryption time, though elliptic curve algorithms can mitigate this.

To avoid the slow speed of asymmetric encryption, a temporary symmetric session key is generated for each message. The message is encrypted with this session key and a symmetric algorithm. The session key is then encrypted with the recipient’s public key and an asymmetric algorithm. The recipient uses their private key to decrypt the session key, then uses it to decrypt the message.

It’s important that session and asymmetric keys provide comparable security. If a short session key is used, attackers will target it, not the asymmetric keys. If the private asymmetric key is compromised, all communications are at risk. This hybrid scheme is used in PGP.
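
The session-key scheme can be sketched with textbook RSA on deliberately tiny numbers. Everything here is a toy: the primes, the lack of padding, the hash-based keystream standing in for a symmetric cipher, and the function names are assumptions for illustration and offer no real security.

```python
import hashlib
import secrets

# Toy RSA key pair (far too small for real use).
P, Q = 61, 53
N = P * Q                              # public modulus, 3233
E = 17                                 # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent, 2753


def xor_with_session_key(session_key: int, data: bytes) -> bytes:
    """Stand-in symmetric cipher: XOR with a keystream hashed from the session key."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(f"{session_key}:{counter}".encode()).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def hybrid_encrypt(message: bytes, public=(N, E)):
    n, e = public
    session_key = secrets.randbelow(n - 2) + 2   # fresh key for this message
    wrapped = pow(session_key, e, n)             # encrypted with the public key
    return wrapped, xor_with_session_key(session_key, message)


def hybrid_decrypt(wrapped: int, body: bytes, private=(N, D)) -> bytes:
    n, d = private
    session_key = pow(wrapped, d, n)             # recovered with the private key
    return xor_with_session_key(session_key, body)
```

Only the small session key goes through the slow asymmetric step; the message itself is handled by the fast symmetric step.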

Common Asymmetric Algorithms

  • RSA: Based on the difficulty of factoring large numbers. Used in PGP. There have been rumors of backdoors, but none confirmed.
  • DSA: Variable key length up to 1024 bits; ANSI X9.30-1.
  • ECC (Elliptic Curve Cryptography): Uses algebraic systems based on elliptic curves. Offers equivalent security with shorter keys and higher performance than other public-key systems.
  • ElGamal: Variant of Diffie-Hellman, used for encryption and digital signatures.
  • GOST R 34.10-94: Similar to DSS, but more secure due to larger parameters.

Digital Signatures

To sign a document, the sender encrypts it with their private key (for efficiency, usually only the document’s checksum is encrypted). The recipient verifies the signature by decrypting it with the sender’s public key and comparing the result with the checksum of the received document. The signed document can also be encrypted with the recipient’s public key (but you cannot sign encrypted messages). Key pairs for signing and encryption can be different, though common algorithms allow using the same pair for both (which reduces protocol strength).

  • DSS: DSA-based, US standard, slow.
  • RSA: RSA-based, PKCS 1 with MD2 or MD5.
  • ECC algorithms: Require procedures for resolving disputes (an arbitrator may request the private key to determine responsibility).
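
Signing a checksum with the private key and checking it with the public key can be sketched with the same kind of toy RSA numbers. The tiny modulus, the reduction of the hash modulo N, and the function names are assumptions for illustration; real signatures use large keys and a padding scheme.

```python
import hashlib

# Toy RSA key pair: N = 61 * 53, E public, D private.
N, E, D = 3233, 17, 2753


def digest_mod_n(message: bytes) -> int:
    """Hash the message and reduce it so it fits below the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Apply the private exponent to the checksum."""
    return pow(digest_mod_n(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Apply the public exponent and compare with a freshly computed checksum."""
    return pow(signature, E, N) == digest_mod_n(message)
```

With a modulus this small, unrelated messages can share the same reduced checksum, which is one reason real keys are thousands of bits long.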

Cryptographically Secure Checksums (MAC, Hash, Digest)

Used to verify data integrity and for digital signatures. They also simplify visual comparison of public keys (fingerprints). Security means it’s hard to modify a message without changing the checksum or to generate a message with a specific checksum. The checksum should be at least 128 bits, preferably 160. HMAC (RFC 2104) combines a secure checksum (at least 128 bits) with a secret key, so that a hash transmitted with the message cannot be recomputed by someone who alters the message without knowing the key.

  • MD2: 128 bits, slow, RFC 1319.
  • MD4: 128 bits, fast, RFC 1320, known flaws.
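  • MD5: 128 bits, slightly slower than MD4 but designed to be more secure, RFC 1321; collision resistance is broken, and practical collision attacks have been known since 2004.
  • SHA, SHA-1: 160 bits, US standard, based on MD4 design principles, max message length 2^64 bits; a practical SHA-1 collision was demonstrated in 2017.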
  • GOST R 34.11-94: 256 bits, based on GOST 28147-89.
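
Both uses, a plain checksum and a keyed HMAC, are available in Python’s standard library. The sketch below uses SHA-256, which is not in the list above but is a common current choice; the helper names are hypothetical.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Plain checksum: anyone can recompute it, so it only detects accidental changes."""
    return hashlib.sha256(data).hexdigest()


def make_tag(key: bytes, message: bytes) -> bytes:
    """Keyed checksum (HMAC, RFC 2104): requires the shared secret key to compute."""
    return hmac.new(key, message, hashlib.sha256).digest()


def check_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)
```

An attacker who changes the message can recompute a plain fingerprint, but cannot produce a valid HMAC tag without the key.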

Data Encryption Tips

  • Do not use unknown cryptographic algorithms or software.
  • When choosing an algorithm, consider its parameters. For symmetric algorithms, Rijndael is preferred, but MARS, RC6, CAST-256, Twofish, or GOST 28147-89 are also good. For asymmetric, RSA (used in PGP) is standard, but DSA or ECC are alternatives.
  • Key (gamma) size: For symmetric algorithms, use at least 256 bits, preferably 512 bits. For example, a 128-bit key has about 3.4 x 10^38 possible combinations, which is 2^72 (about 4.7 x 10^21) times as many as a 56-bit DES key.
  • Data blocks: At least 128 bits. Smaller blocks reduce mixing and security, which depends on the number of passes.
  • Number of passes: 32 is sufficient, 64 is better.
  • Note: More secure algorithms increase encryption/decryption time. Choose your algorithm based on your data’s secrecy level. Even the most secret data can be cracked by special services if necessary. For small, highly secret data, use the strongest algorithm; otherwise, consider the time needed for decryption.
  • Use sequential encryption with several different algorithms and passwords. This reduces the chance of your data being compromised and/or increases the time required to break it.
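
The key-size comparison in the tips above is plain arithmetic on powers of two and can be checked directly. The variable names are only for the example.

```python
keys_128 = 2 ** 128          # number of possible 128-bit keys
keys_56 = 2 ** 56            # number of possible 56-bit DES keys
ratio = keys_128 // keys_56  # exactly 2**72

print(f"128-bit keys: {keys_128:.1e}")
print(f"56-bit keys:  {keys_56:.1e}")
print(f"ratio:        {ratio:.1e}")
```

Each extra key bit doubles the search space, so 72 extra bits multiply the brute-force effort by 2^72.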

Steganography: Another Step in Data Protection

In the 5th century BC, the tyrant Histiaeus, under the watch of King Darius in Susa, needed to send a secret message to his relative in Miletus. He shaved his slave’s head, tattooed the message on it, and waited for the hair to grow back before sending the slave. This is one of the earliest examples of steganography (the art of hidden writing), described by Herodotus.

The art evolved into a science, helping people hide not just the content but the very fact of communication. Ancient Romans wrote between the lines with invisible ink made from fruit juice, urine, milk, and other substances. This technique was even taught in Soviet schools, where children learned how to reveal secret messages written in milk by heating the paper.

During World War II, Germans used “microdots”: microphotographs the size of a printed dot, which, when magnified, revealed a full page of text. These dots were glued into ordinary letters, making them hard to detect and capable of carrying large amounts of information, including blueprints.

The spread of steganography during the war and widespread espionage led to many censorship restrictions. In the US, international mail bans included chess games, knitting and sewing instructions, newspaper clippings, and children’s drawings. Telegrams requesting specific flowers for delivery on certain dates were banned, and eventually, all international telegrams about flower delivery were prohibited. The situation in the USSR was even stricter.

Modern computer technology and communications have rendered such restrictions useless. Today, anyone can use steganography to hide information, which is especially useful in countries where strong cryptography is banned, and for copyright protection.

Computer steganography is based on two principles:

  1. Files containing digital images or sound can be altered to some extent without losing functionality, unlike other data types that require absolute accuracy.
  2. Human senses cannot detect minor changes in image color or sound quality, especially in objects with redundant information, such as 16-bit sound or 24-bit images. Changing the least significant bits of a pixel’s color value does not noticeably affect the image.

Usually, before information is hidden (steganographed), it is also encrypted, further reducing the chance of data being compromised.
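
Least-significant-bit embedding can be sketched on a raw byte buffer standing in for pixel or sample values. Real image and audio formats need a decoder first, which is omitted here; the function names and the convention of passing the secret’s length to the reader are assumptions.

```python
def hide(cover: bytes, secret: bytes) -> bytes:
    """Store each bit of `secret` in the lowest bit of successive cover bytes."""
    bits = [(byte >> shift) & 1 for byte in secret for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this secret")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit   # clear the lowest bit, then set it
    return bytes(stego)


def reveal(stego: bytes, length: int) -> bytes:
    """Collect the lowest bits back into `length` bytes, most significant bit first."""
    out = bytearray()
    for index in range(length):
        value = 0
        for offset in range(8):
            value = (value << 1) | (stego[index * 8 + offset] & 1)
        out.append(value)
    return bytes(out)
```

Each cover byte changes by at most 1, which is why the alteration is not noticeable in a 24-bit image or 16-bit sound.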

Digital Watermarks as a Steganography Variant

One of the most promising commercial applications of steganography is digital watermarking: creating invisible watermarks to protect copyrights on graphic and audio files. These watermarks can be detected by special programs, revealing information such as file creation date, copyright owner, and contact details. With rampant online piracy, this technology is invaluable.

Many companies offer watermarking products. One leader is Digimarc (digimarc.com), whose software is used by over a million users. You can download PictureMarc (a plugin for Photoshop and CorelDraw) or ReadMarc. Simply open a graphic file in your favorite program and read the hidden information, if present. You can also get your own Creator ID (free for one year) to sign your works before posting them online. Corporate users can use MarcSpider to scan the web for unauthorized use of their images.

However, the resilience of digital watermarks is limited. They can survive changes in brightness, contrast, special effects, even printing and rescanning, but not specialized eraser programs like StirMark and UnZign. These tools allow users to independently assess watermark strength, and currently, all watermarks can be removed without noticeable image quality loss.

Irrecoverable Data Deletion

There are many reports of confidential information left on hard drives of computers sold at liquidation auctions: employee personal data, bank account numbers, credit card numbers, payroll, internal documents, and board meeting materials. Confidential information must not only be stored securely but also destroyed securely! Deleting files or formatting hard drives does not guarantee data cannot be recovered with special software or hardware tools. Only specially designed programs using complex algorithms can guarantee secure deletion.

Data is stored on hard drives in sectors, which may be scattered or sequential. The FAT table at the start of the disk lists files and the sectors they occupy. When a file is deleted, its sectors are marked as empty, and the file entry is removed from the FAT. In other words, deleting a file only removes the pointer to its location. This is a loophole for authorities to find prohibited information. Data recovery systems work with both FAT and NTFS file systems and are available to regular users. However, recovery is only possible if new data hasn’t overwritten the deleted file’s sectors. On large hard drives, data from a month or more ago can often be recovered, though sometimes even recently deleted data is lost forever after a single game session. The same applies to data recovery after formatting a partition.
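
Since deletion only removes the pointer, the basic idea behind secure deletion is to overwrite the file’s contents before removing it. The sketch below is a simplified illustration with a hypothetical function name. It assumes the file system writes new data over the same sectors, which is not guaranteed on SSDs or on journaling and copy-on-write file systems.

```python
import os


def overwrite_and_delete(path: str, passes: int = 1) -> None:
    """Overwrite a file in place with random bytes, flush to disk, then delete it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                chunk = min(remaining, 1 << 20)   # write at most 1 MiB at a time
                f.write(os.urandom(chunk))
                remaining -= chunk
            f.flush()
            os.fsync(f.fileno())                  # ask the OS to push data to the device
    os.remove(path)
```

Copies of the data may still survive elsewhere, for example in backups, swap space, or temporary files, so dedicated wiping tools do more than this.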

Irrecoverable Partition Formatting

Similarly, when formatting a partition, only the disk’s header structures (like the FAT table) are overwritten, while the actual data remains on the disk.
