Nan Jiang, Yu Jin, Wen-Ling Hsu, Guy Jacobson, Siva Prakasam, Ann Skudlark, Zhi-Li Zhang
With the widespread adoption and growing sophistication of mobile devices, fraudsters have turned their attention from landlines and wired networks to cellular networks. While security threats to wireless data channels and applications have attracted the most attention, voice-related fraud activities also pose a serious threat to mobile users. In particular, we have seen increasing numbers of incidents where fraudsters deploy malicious apps, e.g., disguised as interesting games to entice users to download; when invoked, these apps automatically -- and without users' knowledge -- dial certain (international) phone numbers which charge exorbitantly high fees. Fraudsters also frequently utilize social engineering (e.g., SMS or email spam, Twitter postings) to trick users into dialing such exorbitant fee-charging international numbers.
In this paper, we develop a novel methodology for detecting voice-related fraud activities using only call records. More specifically, we advance the notion of voice call graphs to represent voice calls from domestic callers to foreign recipients and propose a Markov Clustering based method for isolating dominant fraud activities from these international calls. Evaluation using data collected from one of the largest cellular networks in the US spanning two years shows that the proposed method can detect the majority of the fraud calls at least one month before users report them. Moreover, we conduct a systematic analysis of the identified fraud activities. Our work sheds light on the unique characteristics and trends of fraud activities in cellular networks, and provides guidance on improving and securing hardware and software architecture to prevent these fraud activities.
Public Review uploaded by LujoBauer:
This public review was prepared by Lujo Bauer.
This paper makes two important contributions, which are enabled by access to a large data set of phone call data. First, the paper sheds light on the properties of phone call graphs, particularly properties that are indicative of fraud, and shows that these can be used to automatically detect some instances of fraud before they would be reported by the victims. Second, the paper describes several methods, which will be new to most of the audience, by which fraudsters can use modern mobile devices to perpetrate fraud.
The paper is not without weaknesses. The analysis of call graphs, though leading to interesting results, does not stand out for its technical novelty. Another concern is that fraudsters may be able to change their behavior to circumvent detection; it is not clear to what extent the described approach could be modified to compensate for that. The practical benefits of the proposed solution are also not quantified as precisely as one might wish; this leads to some uncertainty about the solution's practical usefulness.
The paper's weaknesses are outweighed by how much we learn from it. The paper shines light on an important problem, and is one of the first studies of this problem from the two perspectives that it takes.
It will be interesting to see whether follow-up work emerges that further improves upon the fraud-detection method the paper describes. The paper also alerts us to a new type of malfeasance that modern mobile phones make possible, and so helps inform and inspire researchers and developers to continue improving mobile phone security mechanisms to cope with these and other dangers.
We thank Dr. Bauer and other reviewers for their constructive comments.
The main objective of this paper is to develop a system that can effectively isolate a small number of fraud numbers from millions of international voice call attempts. To this end, we propose the notion of voice call graphs and apply the Markov clustering method to mine dense communities from these graphs that represent fraud activities. We chose the Markov clustering technique to strike a balance between scalability and accuracy: the proposed method achieves good detection results on our testing datasets and requires less than an hour for each run on a single-core machine. However, we believe there is still room for developing more efficient graph mining algorithms to decompose and analyze voice call graphs, especially when the system is expected to run in real time and more features, such as call duration and the number of repeated calls, are incorporated into voice call graphs.
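To make the clustering step concrete, the following is a minimal sketch of textbook Markov clustering (alternating expansion and inflation of a column-stochastic matrix) applied to a toy caller-to-number graph. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function names, the inflation value of 2.0, the convergence tolerance, the membership threshold, and the toy edge list are all hypothetical, and a production system would need sparse matrices to handle millions of call attempts.

```python
def normalize_columns(m):
    """Scale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: element-wise power, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def markov_cluster(edges, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster an undirected graph given as (caller, number) pairs.

    The inflation value and tolerance are illustrative assumptions.
    Returns a set of frozensets, one per cluster found.
    """
    nodes = sorted({v for e in edges for v in e})
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    m = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        m[idx[a]][idx[b]] = 1.0
        m[idx[b]][idx[a]] = 1.0
    for i in range(n):
        m[i][i] = 1.0  # self-loops, as is customary for this algorithm
    m = normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Each row that retains mass is an attractor; its non-zero
    # columns are the members of that attractor's cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(nodes[j] for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters


# Toy voice call graph: callers a1..a3 dial numbers N1 and N2,
# while callers b1 and b2 dial N3.
toy_edges = [("a1", "N1"), ("a2", "N1"), ("a3", "N1"),
             ("a1", "N2"), ("a2", "N2"), ("a3", "N2"),
             ("b1", "N3"), ("b2", "N3")]
toy_clusters = markov_cluster(toy_edges)
```

In this toy graph the two groups of callers share no numbers, so no cluster can contain nodes from both groups; densely connected groups of callers and numbers are the kind of community the method is meant to surface.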
To evade detection, fraudsters need to avoid correlations among the fraud numbers they employ. For example, fraudsters can use a single number or assign different numbers to different user groups (if applicable). However, we believe either method would significantly reduce the efficacy of the fraud activity, because it would attract a much smaller number of victims. In addition, in future work we will integrate users' international calling history into the detection algorithm; i.e., a number is more likely to be a fraud number if it attracts many callers who have never made international calls. This can make it much more difficult for fraudsters to circumvent detection.
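The calling-history idea can be illustrated with a small scoring function: for each foreign number, compute the fraction of its callers with no earlier international calls. This is only a sketch of the future-work heuristic described above; the function name, the input layout, and the sample data are hypothetical, and how such a score would be combined with the clustering output is left open.

```python
from collections import defaultdict


def first_time_caller_ratio(calls, prior_international_callers):
    """Fraction of each number's callers with no international history.

    calls: iterable of (caller, foreign_number) pairs for the current
        period (hypothetical layout).
    prior_international_callers: set of callers who made at least one
        international call in earlier periods.
    """
    callers_by_number = defaultdict(set)
    for caller, number in calls:
        callers_by_number[number].add(caller)
    return {
        number: sum(1 for c in callers
                    if c not in prior_international_callers) / len(callers)
        for number, callers in callers_by_number.items()
    }


# Hypothetical data: number X is dialed mostly by first-time
# international callers, number Y only by experienced ones.
sample_calls = [("u1", "X"), ("u2", "X"), ("u3", "X"),
                ("u4", "Y"), ("u1", "Y")]
ratios = first_time_caller_ratio(sample_calls, {"u1", "u4"})
```

A ratio close to 1 means that nearly all callers of a number are new to international calling, which under the heuristic above would raise suspicion about that number.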
Regarding practical usefulness, the proposed algorithm provides a valuable “first-line” defense by alerting users and cellular providers to emerging fraud activities. After our method isolates a small number of fraud candidates (around 5K each month) from millions of voice call attempts, cellular service providers can allocate resources to track these suspicious numbers and combine them with other, more expensive data sources, like billing information and customer call history, to further confirm the fraud activities. For example, we can investigate billing records associated with potential international revenue share fraud (IRSF) calls to look for exorbitant charges. IRSF numbers can also be confirmed by directly calling these numbers. We believe many of these methods can be automated to reduce the manual cost, e.g., through automated query scripts or auto-dialing tools, making post-investigation of our automatic detection results more manageable. More importantly, our analysis using real voice call records shows that, by proactively identifying and restricting potential IRSF numbers based on our detection results, cellular service providers can benefit from a reduction of tens of thousands of fraud calls per month, and consequently from a large saving in customer care cost and an implicit increase in customer satisfaction, which far outweigh the cost of investigating the detection results.