How Mobile Operators Analyze Your Data: Inside Prebilling Systems

Mobile operators collect a huge amount of data and metadata, which can reveal a great deal about the life of each individual subscriber. By understanding how this data is processed and stored, you can trace the entire chain of information flow, from the moment a call is made to the moment funds are deducted. When you add the risk of an insider threat, the potential for abuse is even greater, since protecting this data is generally not the responsibility of a mobile operator’s prebilling systems.

How Subscriber Data Is Collected and Processed

Subscriber traffic in a telecom operator’s network is generated by and collected from many different types of equipment. This equipment can produce files with records (CDR files, RADIUS logs, plain ASCII text) and operate over different protocols (NetFlow, SNMP, SOAP). All of this diverse data must be controlled, collected, processed, and handed over to the billing system in a standardized format.
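
What exactly that standardized format looks like varies from operator to operator. As a rough, hedged illustration only (the field names below are invented for this example and are not an actual operator or vendor schema), a normalized usage record might be a simple Java value object:

```java
import java.time.Instant;

// Hypothetical normalized usage record; the fields are illustrative and do
// not correspond to any real operator or vendor schema.
public record NormalizedUsageRecord(
        String subscriberId,     // MSISDN, IMSI, or internal account id
        String sourceSystem,     // e.g. "GGSN", "MSC", "RADIUS"
        String serviceType,      // e.g. "DATA", "VOICE", "SMS"
        Instant startTime,       // when the session or call started
        long volumeBytes,        // transferred bytes, for data sessions
        long durationSeconds) {  // call duration, for voice
}
```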

Throughout this process, subscriber data is constantly in motion, and ideally, access to it should not be granted to outsiders. But how secure is this information, considering all the links in the chain? Let’s break it down.

Why Do Mobile Operators Need Prebilling?

Subscribers always want new and modern services, but constantly upgrading equipment isn’t feasible. That’s where prebilling comes in. Its first task is to implement new services and new ways of delivering them. The second is to analyze traffic, verify its accuracy, make sure it is uploaded to subscriber billing in full, and prepare the data for billing.

Prebilling enables various reconciliations and data uploads. For example, it can compare the status of services on the equipment with their status in the billing system. Sometimes a subscriber keeps using services even though they are already blocked in billing, or uses services for which the equipment never sent any records. Prebilling helps resolve most of these situations.

There are also parallel control systems that compare the data in billing with the data sent from prebilling. Their job is to catch anything that left the equipment but, for some reason, was never attributed to a subscriber. This role is usually filled by the FMS (Fraud Management System), whose main purpose is to detect fraud schemes and monitor losses and discrepancies between equipment and billing data.

Typical Prebilling Scenarios

  • Reconciliation between subscriber status on equipment and in CRM (see the first sketch after this list). For example:
    1. Prebilling collects data from equipment (HSS, VLR, HLR, AUC, EIR) via SOAP.
    2. Raw data is converted to the required format.
    3. Requests are made to related CRM systems (databases, APIs).
    4. Data is reconciled.
    5. Exception records are created.
    6. A request is sent to CRM to synchronize data.

    Result: A subscriber downloading a movie while roaming in South Africa is blocked as soon as their balance hits zero and doesn’t run up a huge negative balance.

  • Data aggregation and further processing (see the second sketch after this list). For example, thousands of records from equipment (GGSN-SGSN, telephony) are aggregated so that instead of sending 10,000 records to billing, only one record with the total internet traffic is sent. This saves system resources and electricity.
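
First sketch: the reconciliation core of the first scenario (roughly steps 4 and 5). Everything here is invented for illustration; it assumes the SOAP collection and CRM queries have already produced two subscriber-to-status maps and simply compares them, emitting exception records for mismatches:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative reconciliation core: compare subscriber status reported by
// network equipment (HLR/HSS/VLR) with the status held in CRM and emit an
// exception record for every mismatch. Types and status values are invented.
public class StatusReconciliation {

    public record ExceptionRecord(String subscriberId,
                                  String equipmentStatus,
                                  String crmStatus) {}

    public static List<ExceptionRecord> reconcile(Map<String, String> equipmentStatus,
                                                  Map<String, String> crmStatus) {
        List<ExceptionRecord> exceptions = new ArrayList<>();
        for (Map.Entry<String, String> e : equipmentStatus.entrySet()) {
            String crm = crmStatus.get(e.getKey());
            if (crm == null || !crm.equals(e.getValue())) {
                // Mismatch: e.g. still active on the HLR but blocked in CRM.
                exceptions.add(new ExceptionRecord(e.getKey(), e.getValue(), crm));
            }
        }
        return exceptions; // these records drive the CRM synchronization request
    }

    public static void main(String[] args) {
        Map<String, String> hlr = new HashMap<>();
        hlr.put("79001234567", "ACTIVE");
        Map<String, String> crm = new HashMap<>();
        crm.put("79001234567", "BLOCKED"); // blocked at zero balance in billing
        System.out.println(reconcile(hlr, crm));
    }
}
```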
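
Second sketch: the aggregation scenario, collapsing many per-session records into one total per subscriber before anything is sent to billing. Again, the types are illustrative, not a real prebilling API:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative aggregation step: instead of forwarding every GGSN/SGSN session
// record to billing, sum the traffic per subscriber and forward a single total.
// The record type is invented for this example.
public class TrafficAggregation {

    public record SessionRecord(String subscriberId, long volumeBytes) {}

    public static Map<String, Long> aggregate(List<SessionRecord> sessions) {
        return sessions.stream()
                .collect(Collectors.groupingBy(
                        SessionRecord::subscriberId,
                        Collectors.summingLong(SessionRecord::volumeBytes)));
    }

    public static void main(String[] args) {
        List<SessionRecord> sessions = List.of(
                new SessionRecord("79001234567", 1_500_000),
                new SessionRecord("79001234567", 2_500_000),
                new SessionRecord("79007654321", 800_000));
        // Prints {79001234567=4000000, 79007654321=800000}
        System.out.println(aggregate(sessions));
    }
}
```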

These are just typical scenarios. More complex schemes, such as those involving Big Data, also exist but are beyond the scope of this article.

Case Study: Hewlett-Packard Internet Usage Manager (HP IUM)

To better understand how prebilling works and where issues can arise, let’s look at the Hewlett-Packard Internet Usage Manager (HP IUM, now eIUM) as an example.

Imagine a giant meat grinder where you throw in meat, vegetables, bread—anything. The input is diverse, but the output is uniform. You can change the grinder plate for a different output shape, but the process remains the same: auger, blade, plate. This is the classic prebilling scheme: data collection, processing, and output. In IUM, these stages are called encapsulator, aggregator, and datastore.
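
As a hedged illustration of that three-stage shape (these interfaces are invented for the example and are not the actual IUM API; the real product wires its stages together through graphical configuration), a collection chain can be thought of as three components glued together:

```java
import java.util.List;
import java.util.stream.Stream;

// Illustrative three-stage pipeline in the spirit of the
// encapsulator -> aggregator -> datastore chain. All interfaces are invented.
public class PrebillingPipeline<R, A> {

    interface Encapsulator<R>  { Stream<R> parse(byte[] rawFile); }       // collection/parsing
    interface Aggregator<R, A> { List<A> aggregate(Stream<R> records); }  // correlation/summing
    interface Datastore<A>     { void deliver(List<A> aggregated); }      // output to consumers

    private final Encapsulator<R> encapsulator;
    private final Aggregator<R, A> aggregator;
    private final Datastore<A> datastore;

    public PrebillingPipeline(Encapsulator<R> e, Aggregator<R, A> a, Datastore<A> d) {
        this.encapsulator = e;
        this.aggregator = a;
        this.datastore = d;
    }

    // Raw file in, standardized aggregated output delivered to consumers.
    public void process(byte[] rawFile) {
        datastore.deliver(aggregator.aggregate(encapsulator.parse(rawFile)));
    }
}
```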

It’s crucial that the input contains a minimum required set of data; otherwise, processing is impossible. Each equipment type has its own handler (collector) that only works with its specific data format. For example, you can’t just feed a file from a Cisco PGW-SGW (mobile internet traffic) to a collector designed for an Iskratel Si3000 (fixed-line traffic).

If you do, at best you’ll get a processing exception; at worst, the entire data stream will halt until the issue is resolved. Prebilling systems are very sensitive to data that their collectors aren’t configured to handle.

Raw data streams are parsed at the encapsulator stage, where they can also be transformed or filtered before aggregation. Files with user activity records (.cdr, .lo, etc.) come from both local and remote sources (FTP, SFTP, etc.). Parsers, often written in Java, process these files.
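
Conceptually, such a parser does something like the following. This is only a sketch: the semicolon-separated layout is invented here, and real CDR formats are vendor-specific and often binary (ASN.1, for example):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative encapsulator-style parser for a semicolon-separated text CDR.
// The column layout (caller;callee;start;durationSeconds) is invented;
// real formats are defined by the equipment vendor.
public class SimpleCdrParser {

    public record CallRecord(String caller, String callee,
                             Instant start, long durationSeconds) {}

    public static List<CallRecord> parse(Path cdrFile) throws IOException {
        try (var lines = Files.lines(cdrFile)) {
            return lines.filter(l -> !l.isBlank())
                        .map(SimpleCdrParser::parseLine)
                        .collect(Collectors.toList());
        }
    }

    private static CallRecord parseLine(String line) {
        String[] f = line.split(";");
        // Only the fields needed for rating are extracted; the rest is ignored.
        return new CallRecord(f[0], f[1], Instant.parse(f[2]), Long.parseLong(f[3]));
    }
}
```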

Prebilling systems aren’t designed to store the history of processed files (which can number in the hundreds of thousands per day), so files are deleted after processing. If deletion fails, records may be processed again or with a delay. To prevent duplicates, there are mechanisms to check for duplicate files or records, timestamps, and more.
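
One common way to implement such a duplicate check, sketched here with invented types rather than IUM’s own mechanism, is to keep a fingerprint of every processed file and skip anything that has already been seen:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.HexFormat;
import java.util.Set;

// Illustrative duplicate-file guard: remember a checksum of each processed file
// and refuse to process the same content twice. Real systems usually persist
// these fingerprints (with timestamps and file names) in an auxiliary database
// so the check survives restarts.
public class DuplicateFileGuard {

    private final Set<String> processedChecksums = new HashSet<>();

    public boolean alreadyProcessed(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        String checksum = HexFormat.of().formatHex(md.digest(Files.readAllBytes(file)));
        // add() returns false if the checksum was already present -> duplicate
        return !processedChecksums.add(checksum);
    }
}
```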

One of the weakest points is data volume. The more data is stored (in memory or in databases), the slower new data is processed, and eventually old data has to be deleted. Auxiliary databases (MySQL, TimesTen, Oracle, etc.) are used to store metadata, which introduces yet another system with its own security concerns.

How Does Prebilling Work?

Early prebilling systems used languages like Perl, which excelled at processing strings with regular expressions. However, as data volumes grew and the need for rapid service deployment increased, these systems became impractical due to slow testing and poor scalability.

Modern prebilling consists of modules, usually written in Java and managed through a graphical interface with the standard copy, paste, move, and drag-and-drop operations, so it is reasonably user-friendly. Linux or another Unix is typically used as the operating system; Windows is used less often.

The main challenges are testing and error detection, as data passes through many rule chains and is enriched from other systems. It’s not always easy to see what’s happening at each stage, so logs are used to track changes in variables.
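
In practice that usually means logging the interesting fields before and after each rule. A minimal sketch using java.util.logging (the rule, the field names, and the lookup are invented for this example):

```java
import java.util.Map;
import java.util.logging.Logger;

// Illustrative tracing of an enrichment rule: log the relevant fields before
// and after the rule runs so a broken chain can be localized later.
public class TariffEnrichmentRule {

    private static final Logger LOG = Logger.getLogger(TariffEnrichmentRule.class.getName());

    // 'fields' is a mutable map of field name -> value, as many rule engines model a record.
    public void apply(Map<String, String> fields) {
        LOG.fine(() -> "before tariff enrichment: " + fields);
        // Invented lookup: in a real chain this value would come from a
        // reference database or another system.
        fields.put("tariffPlan", lookupTariff(fields.get("subscriberId")));
        LOG.fine(() -> "after tariff enrichment: " + fields);
    }

    private String lookupTariff(String subscriberId) {
        return subscriberId == null ? "UNKNOWN" : "DEFAULT_PLAN";
    }
}
```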

The system’s weak spots are its complexity and the human factor. Any exception can cause data loss or the formation of incorrect data. Data is processed sequentially: if there is an error at the input, either the entire stream or the incorrect portion of the data is dropped. The parsed raw stream then moves on to aggregation, which can consist of several isolated schemes, like a single stream of water splitting into separate jets through a showerhead.

After aggregation, data is delivered to consumers: written directly to databases, handed over as files, or kept in prebilling storage until that storage is emptied. Data can also be passed to further processing levels to increase throughput and distribute the load. At each stage, data streams can be merged, split, copied, or combined. The final stage is always delivery to the consuming systems.
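
A hedged sketch of that fan-out: the same aggregated records are handed to every configured consumer, here a plain-text file drop (both the interface and the consumer are invented for this example):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Illustrative delivery stage: each configured consumer gets its own copy of
// the aggregated records. The consumers are invented examples; in a real
// deployment they would be billing, FMS, a data warehouse, and so on.
public class DeliveryStage {

    interface Consumer { void accept(List<String> records) throws IOException; }

    private final List<Consumer> consumers;

    public DeliveryStage(List<Consumer> consumers) {
        this.consumers = consumers;
    }

    public void deliver(List<String> records) throws IOException {
        for (Consumer c : consumers) {
            c.accept(records); // copy of the stream for every consumer
        }
    }

    // Example consumer: append records to a plain-text drop file for pickup.
    static Consumer fileDrop(Path target) {
        return records -> Files.write(target, records,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```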

Prebilling is not responsible for:

  • Monitoring whether input/output data has been delivered—this is handled by separate systems.
  • Encrypting data at any stage.

Not all incoming data is processed, only what is needed for operation; the rest is ignored until it is required. Only the necessary fields are parsed out of the raw streams (text files, query results, binary files).

Prebilling and Privacy

This is where things get messy. Prebilling is not designed to protect data. Access control can be implemented at various levels (management interface, OS), but if you force prebilling to handle encryption, processing time and complexity increase to the point where it becomes unusable for billing.

Usually, the time from service usage to its appearance in billing should not exceed a few minutes. Metadata needed for processing is stored in databases (MySQL, Oracle, Solid). Input and output data are typically stored in the directory of the specific collector stream, accessible to anyone with sufficient permissions on the host (the root user, for example).

The prebilling configuration, including rules and access details for databases and FTP, is stored encrypted in a file-based database. Without the login and password, extracting the configuration is difficult. Any changes to processing logic (rules) are logged (who, when, and what was changed).

Even if data is passed directly between collectors without being written to a file, it is still temporarily stored as a file in the handler’s directory and can be accessed if desired.

Data processed in prebilling is anonymized: it does not contain names, addresses, or passport details. So even if you access this information, you won’t get personal subscriber data. However, you can still obtain information tied to a specific number, IP, or other identifier.

Access to prebilling configuration gives you credentials for all related systems it interacts with. Usually, access is limited to the server running prebilling, but not always.

If you reach the directories where handler files are stored, you can modify files waiting to be sent to consumers. These are often plain text documents. In this case, prebilling processes the data, but it never reaches the final system—it disappears into a “black hole.”

It’s hard to trace the cause of such data loss, since only part of the data goes missing and the loss cannot be reproduced afterwards. You can check the input and the output, but working out where the data went in between is practically impossible. An attacker only needs to cover their tracks in the operating system.
