Emiliano Miluzzo, Alexander Varshavsky, Suhrid Balakrishnan, Romit Roy Choudhury
This paper shows that the location of screen taps on modern smartphones and tablets can be identified from accelerometer and gyroscope readings. Our findings have serious implications, as we demonstrate that an attacker can launch a background process on commodity smartphones and tablets and silently monitor the user's inputs, such as keyboard presses and icon taps. While precise tap detection is non-trivial, requiring machine learning algorithms to identify fingerprints of closely spaced keys, sensitive sensors on modern devices aid the process. We present TapPrints, a framework for inferring the location of taps on mobile device touchscreens using motion sensor data combined with machine learning analysis. By running tests on two different off-the-shelf smartphones and a tablet computer, we show that identifying tap locations on the screen and inferring English letters can be done with up to 90% and 80% accuracy, respectively. By augmenting the core tap detection capability with additional information, such as contextual priors, we are able to further magnify the threat.
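As a concrete picture of the kind of pipeline the abstract describes, here is a minimal sketch: segment the motion-sensor streams into per-tap windows, extract simple statistical features, and train a supervised classifier on labeled taps. The feature set and the random-forest model are illustrative assumptions, not the authors' exact design:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def tap_features(window):
        """window: (n_samples, 6) array of [ax, ay, az, gx, gy, gz]
        readings covering one tap."""
        feats = []
        for axis in range(window.shape[1]):
            sig = window[:, axis]
            feats += [sig.mean(), sig.std(), sig.min(), sig.max(),
                      np.abs(np.diff(sig)).mean()]  # mean absolute first difference
        return np.array(feats)

    def train_tap_classifier(windows, labels):
        """windows: list of per-tap sensor windows; labels: tapped key/icon ids."""
        X = np.vstack([tap_features(w) for w in windows])
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(X, labels)
        return clf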
The paper can be found here: http://www2.research.att.com/~miluzzo/pubs/sys015fp-miluzzo.pdf
This public review was prepared by Yang Li.
As sensors such as accelerometers become commonplace on mobile devices, it is important to understand what we can possibly achieve with them. The paper presents TapPrints, an ensemble classification framework for inferring finger tap locations on a mobile device touchscreen from accelerometer and gyroscope readings. The experiment, based on more than 40,000 taps collected from 10 users, shows that TapPrints is able to achieve reasonable accuracy on a range of tapping tasks such as icon selection and text entry. The work is comprehensive and timely, and lays a solid foundation for future exploration of the topic.
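The review refers to TapPrints as an ensemble classification framework. As a rough stand-in, one common form of ensemble is soft voting over heterogeneous base classifiers; the particular base learners below are illustrative assumptions, not the paper's exact ensemble:

    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    def build_ensemble():
        # Soft voting averages the class probabilities of the base
        # learners, so SVC needs probability=True.
        return VotingClassifier(
            estimators=[("rf", RandomForestClassifier(n_estimators=100)),
                        ("knn", KNeighborsClassifier(n_neighbors=5)),
                        ("svm", SVC(probability=True))],
            voting="soft")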
TapPrints is not the first attempt to infer tap locations on a touchscreen using motion sensor data. Recent work has demonstrated the feasibility of deducing a user’s password from accelerometer readings as the user enters it via a touchscreen keypad. However, in addition to employing a different machine learning framework, TapPrints advances our understanding of the problem by studying a range of tapping tasks, multiple mobile devices and sensors, and various holding postures.
A concern with the paper is the feasibility of the proposed framework in realistic situations. More specifically, how is labeled training data acquired for each user, and does ambient motion such as walking interfere with the inference? To address the former, the authors clarified in their revision that a model can be pre-trained with other people's data and adapted to each user, which alleviates the need to collect labeled data from every individual. They demonstrated the feasibility of this approach with additional experiments. Interference from ambient motion is beyond the scope of this work and is left for future exploration.
Although the paper is motivated by security problems, the technique has broader implications for how these built-in sensors could improve mobile interaction in general. For example, an on-screen keyboard application could combine accelerometer and touchscreen data to improve key identification accuracy. It will be exciting to see future work leverage sensor input to overcome these and other fundamental issues of mobile interaction.
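One way the key-identification idea above could be realized is to fuse per-key probabilities derived from the touch location with per-key probabilities from a motion-sensor classifier. The product rule (and its implicit independence assumption) is our illustration, not a proposal from the paper:

    def fuse_key_probabilities(p_touch, p_motion, eps=1e-9):
        """p_touch, p_motion: dicts mapping candidate key -> probability."""
        keys = p_touch.keys() & p_motion.keys()
        scores = {k: (p_touch[k] + eps) * (p_motion[k] + eps) for k in keys}
        total = sum(scores.values())
        return {k: v / total for k, v in scores.items()}

    # Example: the touch lands between 'e' and 'r'; motion evidence favors 'e'.
    print(fuse_key_probabilities({"e": 0.5, "r": 0.5},
                                 {"e": 0.7, "r": 0.3}))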
Many thanks to Yang Li for his thorough review. First, we would like to highlight the differences between the two recent workshop papers and our work. While those two papers focus on the use of the accelerometer only, we rely on the accelerometer and gyroscope combined and show, for the first time, that the gyroscope is the more effective sensor for tap inference. Our results show that the gyroscope alone can double the tap inference accuracy achieved by using just the accelerometer. Other key differences are that we present results from a more thorough evaluation, involving a larger number of users, different typing modalities in realistic settings, different device orientations (portrait and landscape), and different device form factors (smartphones and tablets).
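The gyroscope-versus-accelerometer comparison suggests a simple check: train the same classifier on each sensor's features in isolation and compare held-out accuracy. A minimal harness along those lines, assuming per-tap feature matrices X_accel and X_gyro have already been built (the cross-validation protocol here is our assumption, not the paper's exact evaluation):

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    def per_sensor_accuracy(X_accel, X_gyro, y):
        """X_accel, X_gyro: per-tap feature matrices built from each
        sensor alone; y: tap labels."""
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        return {
            "accelerometer": cross_val_score(clf, X_accel, y, cv=5).mean(),
            "gyroscope": cross_val_score(clf, X_gyro, y, cv=5).mean(),
        }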
Next, we address the concerns about (1) the labeled data collection methodology and (2) the impact of motion (e.g., walking) on the tap inference accuracy. (1) We show that an attacker can build a tap inference model by collecting labeled data from a finite number of people who agree to provide it, and training the model in a supervised manner on this data. We then show that, by deploying this model, it is possible to infer taps with sufficient accuracy even for people who did not contribute to the training data. Finally, we show that the accuracy for these new users can be improved by collecting some extra labeled data from them (for example, by knowing the tapped letter or icon within the malicious application, which often has access to the coordinates of the tap). This new labeled data can be used to refine the tap classifier and effectively tune it to the new user's typing behavior.

(2) Although we have not performed experiments in the walking scenario, we have reason to believe that TapPrints' inference accuracy would drop when the user types while walking, because of the noise that body movement injects into the motion sensor data. We will conduct experiments in this direction. However, we assigned the tapping-while-walking scenario slightly lower priority because the majority of people tend to be stationary while typing. Even if only a fraction of people (still on the order of millions) type while stationary, we have uncovered a vulnerability that can affect large volumes of users.
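The adaptation step in point (1) can be pictured as retraining a pre-trained classifier with a small amount of the new user's labeled taps mixed in. The sketch below is one plausible realization under that reading; the pooling-with-sample-weights scheme and the random-forest model are our assumptions, not the paper's method:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def adapt_to_user(X_pre, y_pre, X_user, y_user, user_weight=5.0):
        """Retrain on the pooled data, up-weighting the new user's taps
        so the model tunes toward their typing behavior."""
        X = np.vstack([X_pre, X_user])
        y = np.concatenate([y_pre, y_user])
        w = np.concatenate([np.ones(len(y_pre)),
                            np.full(len(y_user), user_weight)])
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(X, y, sample_weight=w)
        return clf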