Broadly, I study data mining and machine learning in the Department of Computer Science at the University of New Mexico. I have a BA in Integrative Biology from UC Berkeley and an MS in Computer Science from the University of New Mexico, and I am partial to highly interdisciplinary research. I enjoy using statistical methods to identify hidden structure and patterns in datasets from all domains. I am currently working with Prof. Abdullah Mueen on optimizing time series algorithms for sparse social media datasets. My CV is available here.
I began research in the field of microbiology. My first job in research was at Los Alamos National Laboratory working with Dr. Peter Pavlik in the summers of 2004, 2005, and 2006. I next worked in the Kuriyan Lab with Dr. Marsha Nidanie Henderson.
After my first year of college, I decided to switch out of the College of Chemistry into the College of Letters and Science, majoring in Integrative Biology. I did a three-month independent research project in Fall 2009 at the UC Berkeley Richard B. Gump South Pacific Research Station in beautiful Moorea, French Polynesia. I studied the stand ecology of the invasive tree species Falcataria moluccana and published my research in the Student Research Papers Series.
After graduating in December 2009, I served as a field assistant to Dr. Stephanie Stuart in Australia for four months. Following that, I worked as a post-baccalaureate research intern at Los Alamos National Laboratory with Dr. Helen Cui working on biothreat prevention.
This work motivated me to go back to school in August 2011 for a master's degree in computer science at the University of New Mexico. Upon applying to UNM, I was hired as a research assistant by Drs. Terran Lane and Darko Stefanovic to work on the NSF-funded project Computing with Biomolecules: From Network Motifs to Complex and Adaptive Systems. Together with postdoctoral scholar Dr. Matthew Lakin, we published Towards a biomolecular learning machine at the Unconventional Computation and Natural Computation 2012 Conference.
In Summer 2013, I interned at Sandia National Laboratories' Center for Cyberdefenders, where I used machine learning to analyze system call traces to help characterize malware. In December 2013, I began working with Dr. Abdullah Mueen on data mining and machine learning. In Summer 2014, I interned at Mandiant, a FireEye company, where I used machine learning for visualization and classification of malware families. In May 2015, I published my first first-author paper at WWW2015: TrueView: Harnessing the Power of Multiple Review Sites. In Summer 2015, I interned at Groupon on the Data Science Team. There I designed a predictive bid regression model with an expanded feature set for improved SEM ad performance, and implemented smart keyword generation for products using NLP analysis of product descriptions. In August 2016, I published a short paper at ASONAM2016: ClearView: Data Cleaning for Online Review Mining. I am currently working on exploiting sparsity in social media data to optimize time series algorithms.
Here is a link to code and data for TrueView, an algorithm for evaluating the trustworthiness of a hotel's ratings based on multi-site analysis of temporal, spatial, and behavioral review features. This work is based on my paper from WWW2015: TrueView: Harnessing the Power of Multiple Review Sites.
Here is a link to code and data for ClearView, an automated pipeline to filter out noisy reviews for data mining. This work is based on my paper from ASONAM2016: ClearView: Data Cleaning for Online Review Mining.
Data
The entire hotel review dataset is available here. It is password protected; email me at aminnich AT unm DOT edu for access. Note that both TripAdvisor.com and Hotels.com have separate spaces for a comment and a paragraph, while Booking.com has separate spaces for negative feedback and positive feedback. Due to the method of data collection, some review text is truncated and ends with "Read more". If you use this dataset, please cite the TrueView paper.
I also have latitude and longitude values for a subset of hotels, as well as hotels that I have matched across sites; email me for more information.
We have annotated 10,000 randomly sampled reviews from TripAdvisor.com and the Google Play Marketplace with three sentiment scores given by Amazon Mechanical Turkers, for a total of 60,000 labeled samples. This dataset is password protected; please email aminnich AT cs DOT unm DOT edu to request access. If you use this dataset, please cite the ClearView paper.