Sibren Isaacman, Richard Becker, Ramón Cáceres, Margaret Martonosi, James Rowland, Alexander Varshavsky, and Walter Willinger
Models of human mobility have broad applicability in fields such as mobile computing, urban planning, and ecology. This paper proposes and evaluates WHERE, a novel approach to modeling how large populations move within different metropolitan areas. WHERE takes as input spatial and temporal probability distributions drawn from empirical data, such as Call Detail Records (CDRs) from a cellular telephone network, and produces synthetic CDRs for a synthetic population. We have validated WHERE against billions of anonymous location samples for hundreds of thousands of phones in the New York and Los Angeles metropolitan areas. We found that WHERE offers significantly higher fidelity than other modeling approaches. For example, daily range of travel statistics fall within one mile of their true values, an improvement of more than 14 times over a Weighted Random Waypoint model. Our modeling techniques and synthetic CDRs can be applied to a wide range of problems while avoiding many of the privacy concerns surrounding real CDRs.
The full paper is available from http://www.kiskeya.net/ramon/work/pubs/mobisys12.pdf
Public Review uploaded by lzhong:
This public review was prepared by Rajesh Balan.
This paper addresses the problem of obtaining accurate data for running large, metropolitan-scale simulations or experiments that involve the movement patterns of individual people. Such large-scale, mobility-driven experiments have traditionally been limited to the few research groups with access to sufficiently accurate movement data (from cellular records, etc.). To solve this problem, the authors describe a method for extracting synthetic yet still accurate movement models from Call Detail Records (CDRs). They describe the challenges involved in creating synthetic models and then show, using models created from real CDRs for the Los Angeles and New York City metropolitan areas, that their models are accurate and useful for different types of applications.
While this paper addresses an important problem, it does have some deficiencies. First, while this is a great first step toward making datasets available, creating the synthetic datasets for various cities still requires quite a bit of work and data access. As the authors show, using publicly available census data (which theoretically everyone has access to) results in much higher errors in the model. Hence, it will still require someone with access to a cellular provider's CDRs (or similarly good data) to create good models for different cities. Maybe this process can be automated and provided as a service to researchers? Second, even with CDRs, the errors in the models can still be quite high, on the order of a mile or more depending on how the model was created. Hence, the models may not be useful to applications that require finer location accuracy. This was shown in the paper's evaluation section, where some types of applications performed better than others (such as an epidemic routing application) that required more precise movement patterns.
Overall, this is an exciting first step toward providing accurate large-scale movement patterns for large metropolitan areas. Hopefully, the authors will take this work forward and provide mechanisms for other researchers to obtain models for cities of interest to them.
The authors would like to thank the public reviewer for his comments. Below we address his concerns in turn.
First, we hope to release our models to the broader research community after adjusting our algorithms as necessary to preserve differential privacy. We have already automated our procedures for producing synthetic models from spatial and temporal probability distributions drawn from large populations of real people living across wide geographic areas. We hope to make available models for select cities derived from the best data we have available for each of those cities, namely anonymized Call Detail Records (CDRs). Previous work has shown that different metropolitan areas exhibit distinct mobility patterns, and therefore different models are necessary for different cities.
Second, we argue that the location accuracy of our models is adequate and useful at the intended scale of a metropolitan area. The scale of our models, in terms of both the number of subjects represented and the size of geographic areas covered, is a major contribution of our work. For example, we have been working with hundreds of thousands of people living across 50-mile-radius areas around Los Angeles and New York. We have shown that our models reproduce important human mobility characteristics, such as daily range of travel, with a median accuracy on the order of 1 mile. This accuracy is sufficient to answer many concrete questions in disciplines such as urban planning. Furthermore, this accuracy should be viewed in the context of the accuracy of our CDR input data—also on the order of 1 mile due to the spacing of cellular towers. In the future, it may be possible to create hierarchical models that combine our large-scale models with previous smaller-scale models while preserving the strengths of both.
The paper presents additional detail on both of the above topics.