La tesis de David Montoya
A personal knowledge base integrating user data and activity timeline
Typical Internet users today have their data scattered over several devices, applications, and services. Managing and controlling one’s data is increasingly difficult. In this thesis, we adopt the viewpoint that the user should be given the means to gather and integrate her data, under her full control. In that direction, we designed a system that integrates and enriches the data of a user from multiple heterogeneous sources of personal information into an RDF knowledge base. The system is open-source and implements a novel, extensible framework that facilitates the integration of new data sources and the development of new modules for deriving knowledge. We first show how user activity can be inferred from smartphone sensor data. We introduce a time-based clustering algorithm to extract stay points from location history data. Using data from additional mobile phone sensors, geographic information from OpenStreetMap, and public transportation schedules, we introduce a transportation mode recognition algorithm to derive the different modes and routes taken by the user when traveling. The algorithm derives the itinerary followed by the user by finding the most likely sequence in a linear-chain conditional random field whose feature functions are based on the output of a neural network. We also show how the system can integrate information from the user’s email messages, calendars, address books, social network services, and location history into a coherent whole. To do so, it uses entity resolution to find the set of avatars used by each real-world contact and performs spatiotemporal alignment to connect each stay point with the event it corresponds to in the user’s calendar. Finally, we show that such a system can also be used for multi-device and multi-system synchronization and allow knowledge to be pushed to the sources. We present extensive experiments.