Hotel information and reviews were collected from,, and Google's Geocoding API was used to join hotels across websites. Both single-site and multi-site temporal, spatial, and behavioral features were extracted. A subsample of the data feature set is available here. A description of the features can be found here.
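
As a rough illustration of what joining hotels across websites by geocoded location could look like, here is a minimal sketch. It assumes each site's listings have already been geocoded into `(hotel_id, latitude, longitude)` tuples, and it pairs each hotel with the nearest hotel on the other site within a distance threshold. The function names, the tuple layout, and the 50 m threshold are all hypothetical choices for this example, not the procedure used to build the dataset.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def match_hotels(site_a, site_b, max_km=0.05):
    """Pair each hotel in site_a with its nearest hotel in site_b.

    site_a and site_b are lists of (hotel_id, lat, lon) tuples (a
    hypothetical layout). A pair is kept only if the two points lie
    within max_km of each other (an assumed threshold).
    """
    matches = []
    for id_a, lat_a, lon_a in site_a:
        best = None  # (id_b, distance_km) of the closest candidate so far
        for id_b, lat_b, lon_b in site_b:
            d = haversine_km(lat_a, lon_a, lat_b, lon_b)
            if d <= max_km and (best is None or d < best[1]):
                best = (id_b, d)
        if best is not None:
            matches.append((id_a, best[0]))
    return matches
```

Location alone can confuse neighboring hotels in dense areas, so a real join would likely also compare names or addresses.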

The entire hotel review dataset is available here. Note that both and have separate spaces for a comment and a paragraph, while has separate spaces for negative feedback and positive feedback. Due to the method of data collection, some review text is truncated and ends with "Read more". I also have latitude and longitude values for a subset of hotels, as well as hotels that I have matched across sites; email me for more information. If you use this dataset, please cite the TrueView paper. Feel free to email me with any questions.
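
Since some review text ends with "Read more", it may help to flag or clean those reviews before analysis. The sketch below is one possible way to do that. The helper names are hypothetical, and it assumes the marker appears only at the very end of the text, optionally preceded by an ellipsis.

```python
import re

# Trailing "Read more", optionally preceded by "..." or "…" and whitespace.
_TRUNCATION_MARKER = re.compile(r"\s*(?:\.\.\.|…)?\s*Read more\s*$")


def is_truncated(text):
    """Return True if the review text ends with the "Read more" marker."""
    return text.rstrip().endswith("Read more")


def strip_read_more(text):
    """Remove a trailing "Read more" marker; other text is left unchanged."""
    return _TRUNCATION_MARKER.sub("", text)
```

A review that genuinely ends with the words "Read more" would be flagged too, so treat the flag as a heuristic.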


This is the SQL file that creates the feature matrix above. This is the MATLAB file that computes the TrueView score. For the script to run properly, the CSV file from above must be in the same folder as the script. It outputs a two-column matrix in the format: Hotel ID, TrueView score.