The context and long-term relevance of scores as a means to reference quality

The RPGWatch vote took hundreds of games into consideration, but only the top 50 earned a place on the top 50 list.

As is RPGWatch custom, the end-of-year vote also has around 100 games being voted on, but only the top 10 are awarded a top 10 slot.

I think it's safe to say every game ever made is loved by at least one person, unless even the developer themselves are particularly regretful of their project.

So have you compared the two polls yet? How does the RPGWatch red line compare with the RPGCodex red line? What was the difference?
 
Joined
Nov 1, 2014
Messages
4,762
I have no info on the games in our vote beyond the top 50. Do you maybe have that? Their vote has a clear top 100 cut-off and it would be trivial to analyse that as well. That is still double the amount of data I can work with, which I would characterize as a bigger data set.
 
Joined
Jun 24, 2014
Messages
899
Hold back a mo.

The graphs you provide make it difficult to read the values between 75 and 100, because the axis has no numbers in that range.

So what was the number where the RPGWatch red line started on the left and at what number did it finish on the right in the critic graph with removed outliers? And where did it start and end on the codex/critic one with removed outliers? What were the exact numbers?
 
Joined
Nov 1, 2014
Messages
4,762
I added grid lines for each one-point step to the charts so you can read them more easily. One note: in the Codex vote analysis I multiplied all Metacritic user scores by 10, so instead of the score being, say, 8.1, it is 81. I had some problems with formatting: the spreadsheet kept interpreting numbers with decimals as dates. It worked fine with our vote, but for some reason the Codex spreadsheet didn't want to cooperate. Anyway, it shouldn't affect the analysis in any way, since all the data points on the same axis were multiplied by the same factor.
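To illustrate why multiplying one axis by a constant is harmless, here is a minimal Python sketch. The numbers are made up for illustration and are not taken from either spreadsheet, and `pearson` is a hypothetical helper, not something used in the original analysis. It computes the correlation once with user scores on the 0-10 scale and once with the same scores multiplied by 10.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical values, NOT from the RPGWatch or Codex spreadsheets.
user_scores = [8.1, 7.4, 9.0, 6.8, 8.6]       # Metacritic-style 0-10 scale
points = [120, 95, 210, 60, 150]              # made-up poll points

scaled_scores = [s * 10 for s in user_scores]  # 8.1 becomes 81, and so on

r_original = pearson(user_scores, points)
r_scaled = pearson(scaled_scores, points)

print(round(r_original, 6), round(r_scaled, 6))
```

Both printed values should agree. Rescaling stretches the axis, so the slope of the trend line changes by the same factor, but the strength of the relationship does not.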

So the trend line progression is like this:
Our vote:
Votes Percentage vs Metacritic User Score 7.6-8.9 (essentially 76-89)
Votes Percentage vs Metacritic Critic Score 80-94
Votes Percentage vs Metacritic User Score (Witcher 3 removed) 75.5-86
Votes Percentage vs Metacritic Critic Score (Witcher 3 removed) 80.33-88
Codex vote:
Points vs. Critic Score 78-92
Points vs. User Score 81-93
Points vs. Critic Score (Planescape, Fallout and BG2 removed) 78-87
Points vs. User Score (Planescape, Fallout and BG2 removed) 79-91
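For anyone who wants to reproduce start and end values like the ones listed above, the following is a rough Python sketch of an ordinary least-squares trend line. It assumes the score sits on the vertical axis and the vote share on the horizontal one. The data is invented for illustration, and `fit_line` is a hypothetical helper, since the original charts were made in a spreadsheet. It also counts how many points fall above and below the line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical values, NOT from the RPGWatch or Codex spreadsheets.
vote_share = [2.0, 5.0, 9.0, 14.0, 30.0]   # percentage of votes per game
critic_score = [80, 84, 82, 88, 93]        # Metacritic-style critic score

slope, intercept = fit_line(vote_share, critic_score)

# Trend line value at the leftmost and rightmost data points.
line_start = slope * min(vote_share) + intercept
line_end = slope * max(vote_share) + intercept

# Residual = actual score minus the score the line predicts.
residuals = [y - (slope * x + intercept) for x, y in zip(vote_share, critic_score)]
above = sum(1 for r in residuals if r > 0)
below = sum(1 for r in residuals if r < 0)

print(f"line runs from {line_start:.1f} to {line_end:.1f}")
print(f"{above} games above the line, {below} below")
```

Swapping in the real columns from the spreadsheets would give the per-chart counts of games above and below each line.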
 
Joined
Jun 24, 2014
Messages
899
Awesome, and how many games are below that line and how many games are above that line in each?
 
Joined
Nov 1, 2014
Messages
4,762
It's all in the spreadsheets, you can look at that yourself.
 
Joined
Jun 24, 2014
Messages
899

Ah, I see. Let me rephrase my question.

Now that we've had all the data, and you've had a nice bigger sample size, would you agree that a game's higher score is no definite guarantee of its long-term popularity?
 
Joined
Nov 1, 2014
Messages
4,762
Interesting discussion. I know one has to be careful about comparing percentages between things, or even looking at them for a single thing, without knowing the context. Even when looking at a rating for a single game on its own, rather than a comparison, you need to know the population used and whether any bias exists in the voting.

Comparing adds a lot of other factors. A good example is at work. One question I got asked was what the graduation rate in the Physics PhD program is by gender, as it relates to women in STEM programs.

A quick glance might show that the success rate for females was 100% while for males it was 65%. At first glance you might think, wow, females do really well in physics, much better than the males. These are not the true numbers, just an example.

That is, unless you happen to know that there were only 2 very motivated females in the PhD program over 8 years, and both graduated, versus over 75 males, some of whom changed to a master's and left or switched to another PhD.
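One way to put a number on how little 2 out of 2 tells you is a confidence interval for each rate. The sketch below uses the Wilson score interval, which is my own choice of technique and not something the example above relies on. The counts (2 of 2, and 49 of 75 for roughly 65%) are illustrative only, in the same spirit as the "not true numbers" caveat above.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)

# Illustrative counts only, NOT real enrollment data.
women_low, women_high = wilson_interval(2, 2)     # 100% observed
men_low, men_high = wilson_interval(49, 75)       # about 65% observed

print(f"women: {women_low:.2f} to {women_high:.2f}")
print(f"men:   {men_low:.2f} to {men_high:.2f}")
```

The interval for the group of 2 comes out far wider than the one for the group of 75, which is the point: the 100% figure is compatible with a much lower underlying rate.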

A final example is who defers their admission to a later term more: US citizens or international students? The US percentage is 15% while the international one is 12%, so you might think more US students defer, which they do if you measure against the grand total of all applicants. But if you ask what percentage of admitted US students defer versus what percentage of admitted international students defer, the percentages change dramatically: about 10% of US admits defer when measured against their own group, while 42% of international admits defer when measured against theirs.

Even then you also have to take into account that the high deferral rate has been affected by visa issues due to COVID and the political climate.
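The deferral example comes down to which denominator you divide by. Here is a small Python sketch with invented counts. They are not the real admission figures; I picked them only so that the within-group rates land near the 10% and 42% mentioned above.

```python
# Hypothetical counts, NOT real admissions data.
admitted = {"US": 300, "International": 50}
deferred = {"US": 30, "International": 21}

total_admitted = sum(admitted.values())

# Denominator 1: everyone admitted, regardless of group.
share_of_total = {g: deferred[g] / total_admitted for g in admitted}

# Denominator 2: only the members of the same group.
rate_within_group = {g: deferred[g] / admitted[g] for g in admitted}

for group in admitted:
    print(
        f"{group}: {share_of_total[group]:.1%} of all admits, "
        f"{rate_within_group[group]:.1%} of its own group"
    )
```

Against the combined total, the US group looks like the bigger source of deferrals simply because it is the bigger group. Against each group's own size, the ordering flips.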

Anyhow, as I am sure most people know, context is very important when looking at statistics, and any stat needs to be really understood. There is a reason people commonly say that you can make data back up whatever you want it to.

This post isn't for or against any position - more just something I was thinking about that didn't seem to really be mentioned in the 6 pages I read.
 
Joined
Jun 4, 2008
Messages
3,959
Location
NH
Ah, I see. Let me rephrase my question.

Now that we've had all the data, and you've had a nice bigger sample size, would you agree that a game's higher score is no definite guarantee of its long-term popularity?

Definite guarantee? No, that would be ridiculous. But from the data I can see that there is a clear correlation between the game having a higher score and being remembered fondly.
 
Joined
Jun 24, 2014
Messages
899
Definite guarantee? No, that would be ridiculous. But from the data I can see that there is a clear correlation between the game having a higher score and being remembered fondly.

Awesome, glad we got there in the end!
 
Joined
Nov 1, 2014
Messages
4,762
We got there 3 days ago.
If we think of the top games of the decade poll as a measure of popularity/longevity, I would say that the more popular and fondly remembered games tend to have better scores.
 
Joined
Jun 24, 2014
Messages
899
Yes, that's right, we got there when I started the thread.

What exactly were you disagreeing with me about that motivated you to do all the graphs?

Your conclusion is exactly what I was saying to Nereida.
 
Joined
Nov 1, 2014
Messages
4,762
Popularity isn't the same thing as objective quality anyways.

That would be the next logical step in the issue, yes.

But then we're verging on the "what is an RPG" thing, or, rather "what is a computer game" even.

If someone releases a completely bug-free 6000fps quantum graphics 10,000 resolution piece of software - but it's just you clicking random boxes to choose between looking at different animated sneaker adverts, is that a good quality game? And compared to what?

I would say that quality is indeed mostly qualified by popularity. But that "mostly" is only 60%; the other 40% will be je-ne-sais-quoi. And that quality, and all its constituent parts, are only really relevant in like-for-like comparisons. Is an armchair of as good quality as a well-manicured bonsai plant? Well, no-one would ever try to compare the two, as there's no relativity there. In order to compare, there must be something relevant to compare that thing with.
 
Joined
Nov 1, 2014
Messages
4,762
From the data I can see that there is a clear correlation between the game having a higher score and being remembered fondly.

That is a fair conclusion. If a game is good, it will always leave a longer-lasting impression. This is also often independent of how popular the game is, or how many players actually played the game.

We have games in every corner of the popularity-quality axes that get ranked just as they deserve, without popularity playing a significant role.

Example of high popularity but bad score: Fallout 76. Many played it, and whether it was a financial success or not (which is not part of this discussion, and nobody but the company itself cares about in the end), this was a game that many would rather had never existed, as reflected by its 56 score.

Example of high popularity and good score: why, the one and only Divinity Original Sin 2. Even today it gets played and reviewed ten times more than, for example, PF:KM, and still gets a 93 average rating. That's when a game is so good that it doesn't matter how many people play it; the vast majority will find it really good.

Example of low popularity but good score: Disco Elysium. This is where stats can get skewed. Sometimes a game can be so niche that only a handful play it, and they usually love it and overpraise it because they are so thankful that this company made the one type of perfect game they love to see, and they have zero criticism to offer, in a blind exercise of adoration. Without a neutral point of view to give a fair critique, some games at this end of the spectrum tend to get a slightly inflated score. Still, if it gets the high score, it's for a reason. If anyone disagrees, they can play it and say why they disagree.

Example of low popularity and bad score: a ton. No need to name and shame. Any game that doesn't look worth playing doesn't get played, and the few who do play it will leave a bad review. This case is too common, and unfortunately it visits the RPG genre a tad too often.


So when popularity doesn't play a role, the only factor that really matters is how good the game is as perceived by the players. We can agree that the more players play the game, the more accurate and fine-tuned the score gets, but in the end, it is always objective, irrefutable data. When 95% of the people who played a game like it, then it is a great game. When 70% of the players like it, then it was okay. It could have been better, and there will definitely be better alternatives in the same genre.

This also does not tell us that Gran Turismo with a 95 score is better than Pathfinder: Kingmaker with a 73 score. That's absurd. The games don't compare.

What it does tell us, however, is that Gran Turismo is better than Need for Speed, which got an 83 score, and that Pathfinder is worse than the PoE, DAO and DOS games, which all got 88-95 scores. This is true in the short, mid or long term. Some games may decay faster in popularity, but the quality perceived by those who play them will vary very little in general. As we already agreed, popularity and quality are not related. And we are not judging success, only how good the game is, according to the people who play it. From Bethesda's point of view, Fallout 76 was a success, as it created significant revenue. It was still an awful game.


As a curiosity, I learned today that the absolute worst quest in Pathfinder: Wrath of the Righteous, the most detrimental experience any player will have in the game, is there because of feedback given by RPG Codex.

This also teaches us a lesson - niche audiences are self-entitled and dangerous to the quality of a game. They do not represent anyone and do not add any value to anything.


To close, I'm just going to commend you for the nice work with the graphs. I found some of the discrepancies between large pools of RPG players and the tiny niche ones interesting, and there is data there that I could find valuable, especially if I worked in the videogame industry! (or do I? ;) )

If you give me permission, I would love to show those graphs around to some buddies come Monday. Of course, if you don't I will respect that.

Either way, have a nice weekend!
 
Yes, that's right, we got there when I started the thread.

What exactly were you disagreeing with me about that motivated you to do all the graphs?

Your conclusion is exactly what I was saying to Nereida.

I never disagreed with you in that regard. I objected to you providing what I deem insufficient data for your claim (showing Steam user ratings of just three games), and I disagreed with some of the opinions you expressed in this thread that were unrelated to the initial question. I simply did what I deem a more in-depth analysis of the issue.
 
Joined
Jun 24, 2014
Messages
899
If you give me permission, I would love to show those graphs around to some buddies come Monday. Of course, if you don't I will respect that.

Either way, have a nice weekend!

Sure, I can give you (and I do give you) full permission to do anything you want with the spreadsheet and charts I made about our vote. However, you need to know that I used a spreadsheet made by RPG Codex members to do the second analysis. I can only give you permission to use the data I added (the Metacritic user and critic score columns and the charts), but I assume that is mostly what you need.

Have a nice weekend as well.
 
Joined
Jun 24, 2014
Messages
899
Example of high popularity but bad score: Fallout 76. Many played it, and whether it was a financial success or not (which is not part of this discussion, and nobody but the company itself cares about in the end), this was a game that many would rather had never existed, as reflected by its 56 score.

This game is universally hated. People gave it a chance because of the Fallout name, but very few actually like it.

Its user score is 2.6.
https://www.metacritic.com/game/pc/fallout-76



Example of low popularity but good score: Disco Elysium. This is where stats can get skewed. Sometimes a game can be so niche that only a handful play it, and they usually love it and overpraise it because they are so thankful that this company made the one type of perfect game they love to see, and they have zero criticism to offer, in a blind exercise of adoration. Without a neutral point of view to give a fair critique, some games at this end of the spectrum tend to get a slightly inflated score. Still, if it gets the high score, it's for a reason. If anyone disagrees, they can play it and say why they disagree.

This game is generally loved. It is just that it is a "niche" title that sold relatively poorly.
https://www.metacritic.com/game/pc/disco-elysium

But the thing is, judged within their own genres (big-name MMO and unknown niche RPG), Fallout 76 sold poorly and Disco Elysium sold really well, all things considered. So sales as a measure of quality can apply here.

That said, the original Planescape Torment, despite great reviews and a great user score, sold very poorly, and the sequel… let's just say I don't know what they were thinking with that game, which I have to say is one of the greatest crimes against gaming.

Sorry to go off on a slight tangent, but I thought the differences between user scores and critic scores were relevant to this topic.
 
Joined
Sep 17, 2021
Messages
368