Robocallers stand out in a troll through Chinese cell phone records

How to tell the robots from the humans even if you can't hear the conversation?

The availability of electronic records of communications, from the use of cellphones to chats in online games, has given social scientists new options for studying how humans interact. Communication patterns, friendship networks, and the spread of ideas have all become accessible to large-scale analysis. Now, researchers have combed the records of 5.9 million Chinese cellphone users, trying to figure out the normal pattern of calls they make. And in the process, they've identified a few abnormal patterns, ones that probably aren't made by humans at all.

The researchers involved in the work, four of whom hail from Shanghai's East China University of Science and Technology, obtained 108 days' worth of call data from an unspecified Chinese carrier. They used the data to identify the 100,000 most active callers, since these users should call often enough to provide a decent picture of the statistics. Although their records could be analyzed in a number of different ways, the team chose to focus on the interval between calls: how long, in general, does one wait before making a second phone call?
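To make the quantity concrete, here is a minimal Python sketch of how inter-call intervals could be pulled out of a call log. The record layout (a caller ID paired with a timestamp in seconds) and the function names are assumptions for illustration; the article does not describe the carrier's actual data format or the researchers' code.

```python
from collections import Counter, defaultdict

def most_active(records, n):
    """Return the n caller IDs with the most calls in the log."""
    counts = Counter(caller for caller, _ in records)
    return [caller for caller, _ in counts.most_common(n)]

def inter_call_intervals(records):
    """Map each caller to the gaps (in seconds) between consecutive calls.

    `records` is an iterable of (caller_id, timestamp_seconds) pairs;
    this layout is hypothetical, not the carrier's schema.
    """
    times = defaultdict(list)
    for caller, ts in records:
        times[caller].append(ts)
    intervals = {}
    for caller, stamps in times.items():
        stamps.sort()  # the log may not be in time order
        intervals[caller] = [b - a for a, b in zip(stamps, stamps[1:])]
    return intervals

records = [("A", 0), ("A", 60), ("B", 10), ("A", 600), ("B", 20)]
gaps = inter_call_intervals(records)
top = most_active(records, 1)
```

Each caller ends up with one fewer interval than calls, which is part of why only heavy users give enough data points to say anything about the shape of the distribution.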

You might expect these intervals to look like the product of a classic Poisson process, clustering around some typical value with very long waits exponentially rare. But, in fact, the time between calls overall showed a power law distribution, the classic spread that shows a peak towards one end of the graph, followed by a "long tail" of gradual decline.

However, when the researchers started diving into the data, something strange became apparent: the power law distribution was dominated by the accounts that made the most phone calls (and this is already the most active subset of users in the records). So, the researchers went through and categorized every single individual account. Some of them displayed power law distributions, but the majority (over 73,000) were a better fit to something called a Weibull distribution—think of a bell curve with a long tail on one end. Only about 3,500 showed a power law pattern of spacing in between calls.
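The article doesn't say how the authors decided which distribution fit each account better, so the following is only a sketch of one common approach: fit both candidates by maximum likelihood and keep whichever gives the higher log-likelihood. The function names are made up for this example, the power law is assumed to start at the smallest observed interval, and a careful analysis would also test goodness of fit rather than rely on this comparison alone.

```python
import math

def weibull_loglik(xs):
    """Maximum log-likelihood of a two-parameter Weibull fitted to xs (all > 0)."""
    n = len(xs)
    top = max(xs)
    ys = [x / top for x in xs]  # rescale so y**k cannot overflow
    mean_log = sum(math.log(y) for y in ys) / n

    def score(k):  # crosses zero at the maximum-likelihood shape k
        w = [y ** k for y in ys]
        weighted = sum(wi * math.log(y) for wi, y in zip(w, ys)) / sum(w)
        return weighted - 1.0 / k - mean_log

    lo, hi = 1e-3, 100.0  # assumed search bracket for the shape
    for _ in range(80):   # bisection works because score() rises with k
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    lam = top * (sum(y ** k for y in ys) / n) ** (1.0 / k)
    sum_log_x = sum(math.log(x) for x in xs)
    # At the fitted scale, sum((x / lam) ** k) equals n, hence the final "- n".
    return n * math.log(k) - n * k * math.log(lam) + (k - 1) * sum_log_x - n

def power_law_loglik(xs):
    """Maximum log-likelihood of a continuous power law with x_min = min(xs)."""
    n = len(xs)
    x_min = min(xs)
    s = sum(math.log(x / x_min) for x in xs)
    alpha = 1.0 + n / s  # closed-form estimate of the exponent
    return n * math.log(alpha - 1.0) - n * math.log(x_min) - alpha * s

def better_fit(intervals):
    """Label one account's intervals by whichever model is more likely."""
    xs = [x for x in intervals if x > 0]  # zero gaps have no logarithm
    return "weibull" if weibull_loglik(xs) > power_law_loglik(xs) else "power_law"
```

Run over every account's list of intervals, a function like `better_fit` would produce the kind of per-user tally described above, although the real study's counts depend on its own fitting and testing procedure.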

The authors then compared these two groups based on a variety of statistics: the percentage of the total calls that were outgoing, the count of different phone numbers they dialed, and how diverse the recipients of their calls were. The result of this analysis is that the power law group had the most "anomalous and extreme calling patterns," according to the authors. And, in most cases, these are potential signs of trouble.

Some accounts showed a high frequency of outgoing calls, but only to a limited number of target phone numbers (sometimes just one). The authors inferred that these are "robot-based users." Another cluster of accounts had a high frequency of outbound calls, but had an inordinate number of targets, and called them all with equal frequency. The authors suspect that these are sales accounts or instances of phone-based fraud.
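As a rough illustration of the statistics involved, the Python sketch below computes the share of outgoing calls, the number of distinct targets, and how evenly calls are spread across those targets (using Shannon entropy as one possible measure of diversity). The record layout, the labels, and every threshold are assumptions invented for this example; none of them come from the paper.

```python
import math
from collections import Counter

def call_profile(calls):
    """Summarize one account from (direction, other_number) pairs.

    direction is "out" or "in"; this record layout is hypothetical.
    """
    outgoing = [num for direction, num in calls if direction == "out"]
    counts = Counter(outgoing)
    total = len(outgoing)
    # Shannon entropy in bits: 0 when every call goes to one number,
    # log2(number of targets) when all targets are called equally often.
    entropy = 0.0
    if total:
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return {
        "outgoing_fraction": total / len(calls) if calls else 0.0,
        "distinct_targets": len(counts),
        "target_entropy": entropy,
    }

def rough_label(profile, few=3, many=100, mostly_out=0.9):
    """Apply illustrative thresholds; none of these values are from the paper."""
    if profile["outgoing_fraction"] < mostly_out:
        return "unremarkable"
    if profile["distinct_targets"] <= few:
        return "robot-like"
    max_entropy = math.log2(profile["distinct_targets"])
    evenly_spread = profile["target_entropy"] > 0.95 * max_entropy
    if profile["distinct_targets"] >= many and evenly_spread:
        return "sales-or-fraud-like"
    return "unremarkable"

robot = call_profile([("out", "555")] * 200)
seller = call_profile([("out", str(n)) for n in range(150)] * 2)
person = call_profile([("out", "1"), ("in", "2"), ("in", "3"), ("out", "2")])
```

A label from a heuristic like this is only a flag for a closer look, since a legitimate automated caller can produce exactly the same profile.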

The authors suggest that their work provides "information valuable to both academics and practitioners, especially mobile telecom providers." But they then go on to ignore the network providers and focus on academics. For them, the key message is that cell phone users are a diverse population, and shouldn't be modeled as if they all follow patterns that fall on a simple power law curve. In fact, even among the users that showed a power law distribution, the value of the exponent that described the curve varied a great deal.

Could an actual cell provider use this information? Clearly, a scammer like the guy who tried to gain access to one of our writers' computers will show one of the patterns seen here: lots of calls, almost all outgoing, and spread among a wide variety of contact numbers. Even if the phone company felt no ethical obligation to block the practice, it might still see it as a drain on its resources (provided the scammers have an unlimited calling plan). At the same time, there will be numbers that show the same pattern, but for legitimate reasons—automated appointment reminders from medical practices spring to mind.

So, sadly, although patterns like this could be a useful starting point for investigations, and could definitely serve as evidence if a scammer gets caught, they're not going to be especially useful in creating an automated system that could shut down scammers and spammers.

PNAS, 2013. DOI: 10.1073/pnas.1220433110

Channel Ars Technica