Metacritic scores and MMOs
A while back I briefly considered giving the MMORPG Elder Scrolls Online (ESO) a go on the PS4. I wasn't expecting to really get into the game since I did not enjoy the combat mechanics of Bethesda's previous game Skyrim (too much like mediocre FPS melee combat for my taste). Nonetheless, Bethesda does create massive worlds which can be fun to explore, and playing an MMO on a console would be an interesting change. However, beta-informed previews were decidedly lukewarm. Ars Technica came away skeptical of trying to fit Elder Scrolls into a massively multiplayer format. Similarly, Edge Magazine felt it invited comparison with Skyrim which it could not live up to, while Eliot Lefebvre over at Massively felt it was "nothing special", being "another generic fantasy MMO". In any event, a six month delay to the release of the console versions has since put paid to the idea of playing the console version any time soon.
Despite my misgivings, I thought I would check out the review scores on Metacritic after the Mac and PC release of the game to get a feel for its reception. After all, the previews were based on a beta of the game and the developers might have succeeded in improving the game in the meantime. Surprisingly, the game had some strongly favourable early reviews, receiving high ratings from the likes of Cheat Code Central, GamesBeat and Hooked Gamers. However, these reviews were posted on the first day of release, which seems premature for an MMORPG: MMORPGs typically require a large time investment to get a good feel for the game, and the levelling experience can be significantly different from the end game experience. The implication is that the reviews were largely informed by beta play or limited time spent with the final code, which raises concerns about the quality of the reviews. The fact that some of my preferred reviewers (e.g. Edge, Polygon, Ars Technica) had not weighed in so early only heightened my suspicions. I was consequently interested in whether later reviews would differ significantly from earlier reviews in their opinion of the game, and whether any such differences existed for other MMOs.
A month after the release of the game I decided to analyse the Metacritic results for ESO and compare them to other games. This task also provided the perfect opportunity to learn some programming and data analysis in Python, including how to scrape data from webpages. For the latter I used the Beautiful Soup Python library, which makes parsing and processing HTML pages relatively straightforward.
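To give a feel for the scraping step, here is a minimal sketch of pulling the critic name, review date and score out of a page. It is not the script used for this post: it relies on Python's built-in html.parser rather than Beautiful Soup so that it runs without extra packages. The HTML snippet, the class names (review, source, date, score) and the critic names are all invented for the example and do not reflect Metacritic's real markup.

```python
from html.parser import HTMLParser

# Hypothetical markup: these class names and critics are invented for
# the example and are NOT Metacritic's real page structure.
SAMPLE_HTML = """
<div class="review">
  <span class="source">Critic A</span>
  <span class="date">2014-04-04</span>
  <span class="score">85</span>
</div>
<div class="review">
  <span class="source">Critic B</span>
  <span class="date">2014-04-24</span>
  <span class="score">60</span>
</div>
"""

FIELDS = {"source", "date", "score"}


class ReviewParser(HTMLParser):
    """Collect one dict per <div class="review"> block."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._field = None  # name of the span currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "review":
            # A new review block starts: open a fresh record.
            self.reviews.append({})
        elif tag == "span" and css_class in FIELDS and self.reviews:
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside one of the spans of interest.
        if self._field is not None:
            self.reviews[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def parse_reviews(html):
    """Return a list of review dicts with the score converted to int."""
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    return [{**review, "score": int(review["score"])} for review in parser.reviews]


reviews = parse_reviews(SAMPLE_HTML)
for review in reviews:
    print(review["source"], review["date"], review["score"])
```

With a real page the same idea applies, but the tags and class names would need to be read off the page source first.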
The graph immediately below shows individual critic review scores for ESO arranged according to how many days from the game's release the review was published. As the trend line shows, there is a clear, significant downward trend in review scores over time: critics posting reviews on or shortly after the game's release tended to be much more satisfied with the game than those who released reviews much later. Those reviewers I regard as competent tended to release their reviews well after the release date, and generally described a game that failed to meet expectations. For instance, Polygon released its review 20 days after the release date, awarding it 6 out of 10, while Edge Magazine released its review 25 days after the release date, awarding it 5 out of 10 and concluding that "it is difficult to imagine many others investing hundreds of hours in a place this bland". The quality of some of the early reviews has to be questioned given they appear to be largely based on playing the beta version. For instance, Merlin'in Kazani released their review 2 days before the game's official release date!
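For anyone curious how a trend line like this can be computed, the sketch below fits an ordinary least-squares line to (days since release, score) pairs. I am assuming a simple linear fit here; the function name fit_trend_line is my own. Only the last two data points (day 20 with 60, day 25 with 50) mirror the Polygon and Edge scores mentioned above on a 100-point scale; the other values are placeholders and not the real dataset.

```python
def fit_trend_line(days, scores):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(scores) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Placeholder data, NOT the scraped results. Negative days are reviews
# published before release; only the last two points come from the text.
days = [-2, 0, 1, 3, 10, 20, 25]
scores = [90, 85, 80, 80, 70, 60, 50]

slope, intercept = fit_trend_line(days, scores)
print(f"score changes by {slope:.2f} points per day; fitted score at release is {intercept:.1f}")
```

A negative slope means later reviews score lower on average. Whether the slope is statistically significant is a separate question that this sketch does not answer.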
Metacritic effectively agrees with my assessment. The Metacritic score for ESO at the time of compiling the data was 72, which is 1 point lower than a straight average of all the review scores (73). The disparity indicates that the reviewers to whom Metacritic assigns a greater weight (in recognition of their quality or stature) tended to score the game lower than the lower-weighted reviewers did. (The exact weighting system used by Metacritic is unknown and subject to significant speculation.)
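The effect of weighting is easy to see with a toy calculation. The scores and weights below are invented for illustration, since Metacritic's actual weights are not public. The point is only that when the more heavily weighted critics score lower, the weighted average falls below the straight average.

```python
def simple_mean(scores):
    """Straight (unweighted) average."""
    return sum(scores) / len(scores)


def weighted_mean(scores, weights):
    """Average in which each score counts in proportion to its weight."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Invented example: the two low-scoring critics carry hypothetical
# heavier weights (1.5) than the two high-scoring ones (1.0).
scores = [90, 80, 60, 50]
weights = [1.0, 1.0, 1.5, 1.5]

simple = simple_mean(scores)
weighted = weighted_mean(scores, weights)
print(simple, weighted)
```

Here the straight average is 70 while the weighted average is 67, the same direction of gap as the 73 versus 72 observed for ESO.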
The tendency for early reviewers to give more favourable reviews could reflect a variety of factors, including, but not limited to:
Early reviewers had relatively less game time and therefore did not get a chance to fully explore the potential shortfalls of the game. The fact that ESO followed the popular and well received Skyrim could have contributed to such an impression;
Later reviewers may have spent more time with the game on average and consequently become more dulled by the experience as any excitement associated with playing a 'new' game naturally wore off;
Early reviewers may have been more likely to be fans of the Elder Scrolls series and therefore more tolerant of any deficiencies; and
Pure coincidence: by pure chance a number of positive reviews for the game appeared early after release.
Given the competitive pressures of publishing, whereby being first to publish can be key to attracting audiences (while also satisfying a demand from readers who are looking for some indication of whether a game is worth purchasing), I would not be surprised if limited play time was the driving force behind the early positive reviews. If this were the case, though, one would expect such tendencies to hold for other MMORPGs. To test this hypothesis I decided to examine trends in average scores over time for other MMORPGs. The moving average scores for select MMORPGs are shown in the two graphs below.
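As a sketch of how such a series can be built, the code below computes, for each day on which a review appeared, the average of all scores published up to and including that day. This assumes "moving average" here means a running (cumulative) average rather than a fixed-width window. The function name and the sample data are made up for the example.

```python
def running_average(reviews):
    """Running average of scores by publication day.

    reviews: iterable of (days_since_release, score) pairs.
    Returns a list of (day, average of all scores published up to and
    including that day), one entry per distinct day, in day order.
    """
    result = []
    total = 0
    count = 0
    for day, score in sorted(reviews):
        total += score
        count += 1
        if result and result[-1][0] == day:
            # Several reviews on the same day: keep only the end-of-day value.
            result[-1] = (day, total / count)
        else:
            result.append((day, total / count))
    return result


# Made-up sample: two reviews on release day, then one each on days 3 and 10.
sample = [(0, 90), (0, 80), (3, 70), (10, 60)]
series = running_average(sample)
print(series)
```

Early points in such a series rest on very few reviews, so large swings in the first days are to be expected.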
The results show that a sharp downward trend in average review score is not a consistent pattern across MMORPGs, although eventual slight declines are apparent. The clearest downward trend (apart from ESO) is for Marvel Heroes and, to a lesser degree, the Mists of Pandaria expansion for World of Warcraft (WoW), with the latter stabilising relatively quickly after about a week. DC Universe Online exhibited a sharp fall early on, but this seems to reflect the impact of only a handful of disparate reviews being published for the game in the first couple of weeks after release. The average score for the WoW Cataclysm expansion declined in the days immediately after release but then actually rose over subsequent weeks. Star Wars: The Old Republic rose sharply over the first few days before declining slightly over subsequent weeks. Final Fantasy XIV: A Realm Reborn has a pretty stable average review score with a slight downward trend. Finally, World of Tanks, which is admittedly an MMO rather than an MMORPG, provides an interesting case study. The game appears to have been a bit of a sleeper, attracting only a few reviews over the first couple of weeks after release, which saw its average score ratchet up sharply before easing over time.
Perhaps the key lesson to draw from the various scores is that it takes between two and three weeks for average review scores for MMORPGs to stabilise. Anyone who takes Metacritic review scores into account when deciding whether to play an MMORPG should bear this in mind. Indeed, Metacritic does not publish a Metacritic score until a certain minimum number of reviews are available in order to minimise subsequent large swings. Given the nature of MMORPGs (i.e. evolving gameplay over long time horizons with potentially significant changes in gameplay) it is arguable that a greater number of reviews may be needed to get a reliable indication of the eventual Metacritic review score.
With respect to the last point I also tracked the average scores for several single-player or non-MMORPG games (i.e. Titanfall, Hearthstone, BioShock Infinite: Burial at Sea and Diablo III: Reaper of Souls) to see how quickly their review scores stabilised. As you can see in the graph below, scores for these games stabilised after only about a week. However, this is a small sample and the games are all highly regarded, meaning one could expect a relatively smaller dispersion of review scores for these games. More significantly though, these games tended to attract more reviews than the MMORPGs listed above, averaging 28 reviews per game in the first 10 days of release compared to 11 reviews per MMORPG. This outcome probably reflects that MMORPGs attract a relatively smaller audience (and therefore fewer reviewers) and that reviewers on average take longer to review MMORPGs.
As a final task I calculated the standard deviation of review scores for the various games. The standard deviation is a measure of how much variation from the average exists in the data. It thus provides insight into the level of consensus on the quality of particular games, with a high standard deviation indicating high variation in review scores (i.e. low consensus) and a low standard deviation indicating low variation (i.e. high consensus).
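To make the idea concrete, here is a small sketch using Python's statistics module. The two score lists are invented so that both have the same average but very different spreads. I use the population standard deviation (pstdev) here as an assumption; the sample version (stdev) would serve equally well for comparing games.

```python
from statistics import mean, pstdev


def score_spread(scores):
    """Return (average, population standard deviation) of review scores."""
    return mean(scores), pstdev(scores)


# Invented examples: both lists average 80, but critics agree far more
# on the first game than on the second.
consensus_game = [80, 82, 78, 80]
divisive_game = [95, 60, 85, 80]

consensus_avg, consensus_sd = score_spread(consensus_game)
divisive_avg, divisive_sd = score_spread(divisive_game)
print(f"consensus game: mean {consensus_avg}, sd {consensus_sd:.2f}")
print(f"divisive game:  mean {divisive_avg}, sd {divisive_sd:.2f}")
```

The average alone cannot tell these two games apart; the standard deviation can.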
DC Universe Online, Marvel Heroes and ESO all had relatively high standard deviations, indicating relatively high variation in scores among critics. One interpretation of this result is that these games may be more a matter of taste, appealing to particular audiences. At the other end of the spectrum, the Cataclysm WoW expansion, Hearthstone, Guild Wars 2 and Final Fantasy XIV: A Realm Reborn all had low standard deviations.
That wraps it up for this post. I hope to return to analysing Metacritic review scores soon. It would be nice to flesh out the analysis with a broader sample of games, while an examination of user submitted scores would also be interesting - even a cursory glance at user scores makes you wonder why Metacritic even bothers.