Pre-Billing: How Mobile Operators Store and Process Your Data
Mobile operators collect a vast amount of data and metadata that can reveal a lot about each subscriber's life. By understanding how this data is processed and stored, you can trace the entire information flow, from the moment a call is made to the moment funds are deducted. For an insider, the possibilities are even greater, since data protection is generally not a function of pre-billing systems.
What Is Pre-Billing and Why Do Telecom Operators Need It?
Subscribers expect new and ever more modern services, but constantly upgrading hardware isn't feasible. That's where pre-billing comes in. Its first job is to roll out new services and delivery methods without hardware changes. The second is to analyze traffic, verify its accuracy, ensure all data is loaded into the billing system, and prepare that data for billing.
Pre-billing enables various data reconciliations and additional data uploads. For example, it can reconcile the status of services on the equipment against the billing system. Sometimes a subscriber keeps using services even though they are already blocked in billing, or uses services that were never recorded by the equipment. Pre-billing helps resolve most of these issues.
There are also duplicate systems that check the data in billing against what was sent from pre-billing. Their job is to catch anything that left the equipment but, for some reason, didn’t get assigned to a subscriber. This role is often filled by an FMS (Fraud Management System), which primarily detects fraudulent schemes and monitors discrepancies between equipment and billing data.
Typical Pre-Billing Scenarios
- Reconciliation between subscriber status on equipment and in CRM (see the sketch after this list). For example:
  - Collect data from equipment (HSS, VLR, HLR, AuC, EIR) via SOAP.
  - Convert the raw data into the required format.
  - Query related CRM systems (databases, APIs).
  - Reconcile the data.
  - Create exception records.
  - Request that the CRM synchronize the data.
  Result: a subscriber downloading a movie while roaming in South Africa is blocked at zero balance instead of running up a deep negative balance.
- Data aggregation and further processing (sketched below). For example, thousands of records from equipment (GGSN-SGSN, telephony) are aggregated before being sent to billing, reducing system load and resource consumption.
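To make the first scenario (equipment-versus-CRM reconciliation) concrete, here is a minimal Python sketch. Everything in it is illustrative: the function names, the record layout, and the hard-coded data merely stand in for SOAP queries to HSS/HLR and lookups in the CRM.

```python
# Minimal reconciliation sketch. The data sources are simulated with dicts;
# in a real system the equipment side would come from SOAP queries to
# HSS/HLR/VLR and the CRM side from a database or API.

def fetch_equipment_status():
    # Hypothetical result of an equipment query: MSISDN -> status
    return {"79001234567": "active", "79007654321": "active"}

def fetch_crm_status():
    # Hypothetical result of a CRM query: MSISDN -> status
    return {"79001234567": "active", "79007654321": "blocked"}

def reconcile(equipment, crm):
    """Compare the two views and build exception records for mismatches."""
    exceptions = []
    for msisdn, eq_status in equipment.items():
        crm_status = crm.get(msisdn)
        if crm_status != eq_status:
            exceptions.append(
                {"msisdn": msisdn, "equipment": eq_status, "crm": crm_status}
            )
    return exceptions

if __name__ == "__main__":
    for exc in reconcile(fetch_equipment_status(), fetch_crm_status()):
        # In production this record would trigger a synchronization request
        # to the CRM instead of being printed.
        print("mismatch:", exc)
```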
These are just typical workflows. More complex scenarios, such as those involving Big Data, also exist but are beyond the scope of this article.
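The aggregation scenario can be sketched just as compactly: usage records are rolled up per subscriber before being handed to billing, so thousands of session records shrink to one record per subscriber. The field names below are invented for illustration.

```python
from collections import defaultdict

# Toy usage records; a real stream would contain thousands of entries per file.
records = [
    {"msisdn": "79001234567", "bytes": 1200},
    {"msisdn": "79001234567", "bytes": 800},
    {"msisdn": "79007654321", "bytes": 500},
]

def aggregate(records):
    """Roll up per-subscriber traffic so billing receives one record
    per subscriber instead of one per session."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec["msisdn"]] += rec["bytes"]
    return [{"msisdn": m, "bytes": b} for m, b in totals.items()]

print(f"{len(records)} records in, {len(aggregate(records))} records out")
print(aggregate(records))
```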
Taming the Data Zoo: How Pre-Billing Systems Work
Let's look at the Hewlett-Packard Internet Usage Manager (HP IUM, now eIUM) as an example. Imagine a giant grinder where you throw in all sorts of ingredients (meat, vegetables, bread) and get a uniform output. You can change the grinder's plate to get a different shape, but the process remains the same: collection, processing, and output of data. In IUM, these stages are called encapsulator, aggregator, and datastore.
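To make the grinder analogy concrete, here is a schematic chain of the three stages. The names follow IUM's terminology, but the Python code is only a toy sketch; the real product is built on Java classes and configuration, not an API like this.

```python
def encapsulator(raw_lines):
    """Collection: turn raw text lines into structured records."""
    for line in raw_lines:
        msisdn, volume = line.strip().split(";")
        yield {"msisdn": msisdn, "bytes": int(volume)}

def aggregator(records):
    """Processing: merge records belonging to the same subscriber."""
    totals = {}
    for rec in records:
        totals[rec["msisdn"]] = totals.get(rec["msisdn"], 0) + rec["bytes"]
    return totals

def datastore(totals):
    """Output: deliver the result to a consumer (here, simply stdout)."""
    for msisdn, total in totals.items():
        print(f"{msisdn};{total}")

raw = ["79001234567;1200", "79001234567;800", "79007654321;500"]
datastore(aggregator(encapsulator(raw)))
```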
It’s crucial that the input data is complete; missing elements result in errors or warnings, as further processing is impossible without them. Each equipment type has its own handler (collector) that only works with its specific data format. For example, you can’t just feed a file from CISCO PGW-SGW (mobile internet traffic) to a collector designed for Iskratel Si3000 (fixed-line traffic).
If you do, at best you'll get a processing exception; at worst, the entire data stream will halt until the issue is resolved. Pre-billing systems are highly sensitive to data that doesn't match the format a given collector is configured for.
Initially, the raw data stream is parsed at the encapsulator level, where it can also be transformed or filtered if needed before aggregation. Files (.cdr, .log, etc.) with user activity records come from both local and remote sources (FTP, SFTP, etc.), and are parsed using Java classes.
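In IUM the parsing is done by Java classes; the Python sketch below only illustrates the idea of a collector that understands exactly one record layout and raises an exception for anything else. The semicolon-separated CDR layout is invented for the example.

```python
import re

# Hypothetical record layout for one specific collector:
# timestamp;msisdn;bytes_up;bytes_down
CDR_PATTERN = re.compile(
    r"^(?P<ts>\d{14});(?P<msisdn>\d{11});(?P<up>\d+);(?P<down>\d+)$"
)

class CollectorFormatError(Exception):
    """Raised when a line doesn't match the format this collector expects."""

def parse_line(line):
    match = CDR_PATTERN.match(line.strip())
    if not match:
        # A real pre-billing system would either raise a processing
        # exception here or halt the stream until the issue is resolved.
        raise CollectorFormatError(f"unparseable record: {line!r}")
    fields = match.groupdict()
    return {
        "ts": fields["ts"],
        "msisdn": fields["msisdn"],
        "bytes": int(fields["up"]) + int(fields["down"]),
    }

print(parse_line("20240101120000;79001234567;1200;800"))
```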
Pre-billing systems aren’t designed to store the history of processed files (which can number in the hundreds of thousands per day). After processing, files are deleted from the source. Sometimes, files aren’t deleted correctly, leading to duplicate or delayed processing. To prevent this, there are mechanisms to check for duplicate files or records and to verify timestamps.
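One common safeguard is a registry of already-processed files and their timestamps. The sqlite-based sketch below illustrates the idea only; it is not how any particular product implements it.

```python
import sqlite3

# In-memory registry of processed files; a real system would keep this
# metadata in a persistent auxiliary database (MySQL, TimesTen, Oracle, ...).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (name TEXT PRIMARY KEY, mtime REAL)")

def already_processed(name, mtime):
    """Return True if this file (or an older copy of it) was seen before."""
    row = db.execute(
        "SELECT mtime FROM processed WHERE name = ?", (name,)
    ).fetchone()
    return row is not None and row[0] >= mtime

def mark_processed(name, mtime):
    db.execute("INSERT OR REPLACE INTO processed VALUES (?, ?)", (name, mtime))
    db.commit()

for name, mtime in [("cdr_0001.cdr", 100.0), ("cdr_0001.cdr", 100.0)]:
    if already_processed(name, mtime):
        print("skipping duplicate:", name)
    else:
        mark_processed(name, mtime)
        print("processing:", name)
```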
One of the most vulnerable points is data volume. The more data stored (in memory or databases), the slower the processing and the greater the resource consumption, eventually reaching a limit where old data must be deleted. Auxiliary databases (MySQL, TimesTen, Oracle, etc.) are used for storing metadata, introducing another system with its own security concerns.
Inside the Black Box
Early systems used languages like Perl for efficient regular expression processing. Most pre-billing work, aside from external system integration, is about parsing and transforming strings; regular expressions are ideal for this. However, as data volumes and time-to-market demands grew, these systems became impractical due to slow testing, low scalability, and lengthy change cycles.
Modern pre-billing consists of Java modules managed via a graphical interface with standard copy, paste, move, and drag-and-drop operations. The interface is user-friendly. Linux or Unix is typically used as the operating system, with Windows being less common.
The main challenges are testing and error detection, as data passes through many rule chains and is enriched from other systems. It’s not always easy to see what’s happening at each stage, so logs are used to track variable changes and troubleshoot issues.
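Because it's hard to see what happens to a record inside a long rule chain, each stage typically writes its intermediate state to a log. Here is a minimal sketch of that approach, with invented rule names:

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("prebilling")

def normalize_msisdn(record):
    record["msisdn"] = record["msisdn"].lstrip("+")
    return record

def enrich_with_tariff(record):
    # Stand-in for a lookup against an external system (CRM, tariff database).
    record["tariff"] = "default"
    return record

RULE_CHAIN = [normalize_msisdn, enrich_with_tariff]

def process(record):
    for rule in RULE_CHAIN:
        record = rule(record)
        # Logging the record after every rule makes it possible to see
        # at which stage a value changed or went wrong.
        log.debug("after %s: %s", rule.__name__, record)
    return record

process({"msisdn": "+79001234567", "bytes": 2000})
```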
The system's weakness is its complexity and the human factor. Any exception can cause data loss or incorrect data formation. Data is processed sequentially; if an error occurs at the input, the entire stream may halt or invalid data may be discarded. The parsed raw stream then moves to aggregation, which can have multiple isolated schemes, like a showerhead splitting water into separate streams.
After aggregation, data is delivered to consumers: written directly to databases, exported as files, or kept in pre-billing storage until it is cleared. Data can also be passed through multiple processing levels to increase speed and distribute load. At each stage, data streams can be merged, split, copied, or combined. The final stage is always delivery to the consuming systems.
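The fan-out to consumers can be pictured as one processed stream copied into several outputs. The consumer names and the plain-text output file in the sketch below are invented; they simply mirror the kinds of destinations described above.

```python
import json

def to_billing(record):
    # Stand-in for writing into the billing database.
    print("billing <-", json.dumps(record))

def to_fraud_monitoring(record):
    # The same record copied for the FMS / reconciliation side.
    print("fms     <-", json.dumps(record))

def to_archive_file(record, path="aggregated.out"):
    # A plain-text output file waiting to be picked up by a consumer,
    # the kind of artifact discussed in the privacy section below.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")

CONSUMERS = [to_billing, to_fraud_monitoring, to_archive_file]

def deliver(records):
    """Copy every processed record to each configured consumer."""
    for record in records:
        for consumer in CONSUMERS:
            consumer(record)

deliver([{"msisdn": "79001234567", "bytes": 2000}])
```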
Pre-billing is not responsible for:
- Monitoring whether input/output data has been delivered; this is handled by separate systems.
- Encrypting data at any stage.
Not all incoming data is processed, only what's needed for operation; the rest is ignored until required. Only the necessary information is parsed from raw streams (text files, query results, binary files).
Privacy Concerns
This is where things get messy. Pre-billing is not designed to protect data. Access control can be implemented at various levels (management interface, OS), but if you force pre-billing to handle encryption, processing becomes so slow and complex that it’s unworkable for billing.
Typically, the time from service usage to its appearance in billing should not exceed a few minutes. Metadata needed for processing is stored in databases (MySQL, Oracle, Solid). Input and output data are usually stored in the directory of the specific collector stream, accessible to anyone with sufficient permissions (the root user, for example).
The pre-billing configuration, including rules and access credentials for databases and FTP, is stored encrypted in a file-based database. Without the login and password, extracting the configuration is difficult. Any changes to processing logic (rules) are logged (who, when, and what was changed).
Even if data is passed directly between collectors without being written to a file, it is still temporarily stored as a file in the handler’s directory and can be accessed if desired.
Data processed in pre-billing is depersonalized: it contains no names, addresses, or passport details. So even if you access this information, you won't get personal subscriber data directly. However, you can still obtain information tied to a specific phone number, IP address, or other identifier.
With access to the pre-billing configuration, you can get credentials for all related systems. Access is usually restricted to the server running pre-billing, but not always. If you reach the directories where handler files are stored, you can modify files waiting to be sent to consumers-often just plain text documents. In this case, pre-billing processes the data, but it never reaches the final system, disappearing into a “black hole.”
It's hard to trace the cause of such data loss, as only part of the data is missing, and the loss is practically impossible to reproduce during troubleshooting. You can check the input and output data, but not where it went. An attacker just needs to cover their tracks in the operating system.