As part of a project through the Microsoft Studios University Program, we collected tweets containing references to games. The tweets were collected with the streamR package, which provides access to the Twitter Streaming API. Data was collected hourly from May 2, 2014 to June 29, 2014. We restricted collection to tweets containing at least one of a specified list of keywords or hashtags drawn from broad categories such as platform (e.g. “Xbox”, “PlayStation”), title (“Call of Duty”, “Skylander”), or journalism (“Polygon”, “IGN”).
In total, the data set contains 7,294,826 tweets, of which 1,928,257 are retweets, 75,760 contain negative emoticons, and 101,855 contain positive emoticons. The collection window covers the period leading up to and including the 2014 Electronic Entertainment Expo (E3).
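For readers who want to reproduce counts of this kind on the tweet text, the following is a minimal sketch in Python. It rests on assumptions that are not documented here: the emoticon lists are hypothetical stand-ins for whatever lists were actually used, and a retweet is detected simply by the conventional "RT @" prefix, so the numbers it produces may not match the totals above.

```python
# Hypothetical emoticon sets; the lists actually used for this data set
# are not given in the description.
POSITIVE = (":)", ":-)", ":D", "=)")
NEGATIVE = (":(", ":-(", "=(")


def categorize(text):
    """Return the set of tags that apply to one tweet's text."""
    tags = set()
    # Assumption: a retweet is any tweet whose text starts with "RT @".
    if text.startswith("RT @"):
        tags.add("retweet")
    if any(e in text for e in POSITIVE):
        tags.add("positive")
    if any(e in text for e in NEGATIVE):
        tags.add("negative")
    return tags


def tally(tweets):
    """Count how many tweets carry each tag; one tweet may carry several."""
    counts = {"retweet": 0, "positive": 0, "negative": 0}
    for text in tweets:
        for tag in categorize(text):
            counts[tag] += 1
    return counts


sample = [
    "RT @someone: love the new Xbox :)",
    "PlayStation servers down again :(",
    "IGN review is up",
]
print(tally(sample))
```

A tweet can fall into more than one category (for example, a retweet that also contains a positive emoticon), so the three counts are not mutually exclusive.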
One potential use for these domain-specific tweets is sentiment classification. Results for a Naive Bayes classifier trained using emoticons as sentiment labels can be found in:
Sarratt, T.; Morgens, S.; and Jhala, A. 2014. Domain-Specific Sentiment Classification for Games-Related Tweets. 3rd Workshop on Games and NLP (GAMNLP-14).
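To illustrate the general idea of training on emoticons, here is a minimal Python sketch of a multinomial Naive Bayes classifier with add-one smoothing, where tweets containing an emoticon are treated as labeled examples. This is not the implementation from the paper: the emoticon lists, the whitespace tokenizer, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical emoticon lists, used only to derive training labels.
POSITIVE = (":)", ":-)", ":D")
NEGATIVE = (":(", ":-(")


def emoticon_label(text):
    """Return 'pos' or 'neg' from emoticons, or None if absent or mixed."""
    pos = any(e in text for e in POSITIVE)
    neg = any(e in text for e in NEGATIVE)
    if pos == neg:
        return None
    return "pos" if pos else "neg"


def tokenize(text):
    """Strip the emoticons (so they cannot leak the label) and split on spaces."""
    for e in POSITIVE + NEGATIVE:
        text = text.replace(e, " ")
    return text.lower().split()


def train(tweets):
    """Collect per-class word counts and per-class tweet counts."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text in tweets:
        label = emoticon_label(text)
        if label is None:
            continue  # unlabeled tweets are skipped during training
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Pick the class with the highest log posterior (add-one smoothing).

    Assumes both classes had at least one training tweet.
    """
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("pos", "neg"):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


training = [
    "love this game :)",
    "great launch trailer :D",
    "servers are down :(",
    "hate this lag :-(",
    "IGN review is up",
]
model = train(training)
print(classify("love the trailer", *model))
print(classify("servers down", *model))
```

Removing the emoticons before tokenizing keeps the classifier from simply memorizing the labels, which matters if it is later applied to tweets that contain no emoticons at all.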
Tweets: <link to tweets csv>
Users: <link to user csv>
Trevor Sarratt, Soja-Marie Morgens
Computational Cinematics Studio