12

There has been speculation that anomalous values of a correlation metric [1] may hint at the use of outside assistance.

Example

Examination of Hans Niemann vs Matthieu Cornette surprises:

Hikaru Nakamura:

this looked like a perfect game

Yosha Iglesias:

one-hundred percent - wow!


Context:

  • 90% Sébastien Feller (Paris 2010 - known to be cheating)
  • 72-75% Correspondence World Champion (pre-engine era)
  • 72% Bobby Fischer during his 20-game winning streak
  • 70% Carlsen at his best
  • 69% Kasparov at his best
  • 62-67% Super GMs
  • 57-62% GMs

Question

How often do super grandmasters score 100% on this correlation metric? For example: 1 in 1,000, 1 in 1,000,000, 1 in 1,000,000,000, etc.

Notes

  • Fischer had 0 games at 100% in his entire career.

  • Some methods for calculating the correlation statistic can be made to exclude book openings (and even more impressively book openings at the time of the game).

  • It may be sensible to exclude short games (e.g. Junior Speed Chess Champion, Arjun Erigaisi, is said to have 1 game at 100%, however, it was a 10-move game; the shortness of this game is notable).

  • In the days following the release of Yosha's video, there has been some confusion about the methods used to obtain the alleged 100% engine correlations. Grandmaster Ben Finegold dismisses the claims entirely:

A lot of the information on the internet which says Hans played 100% in all these games, that's all nonsense. ... None of that is true.


[1] Let's Check

Evargalo
stevec
  • 4
    What is a 100% game? – David Sep 27 '22 at 15:20
  • @David in this case strictly 100% by the chessbase '*Let's check*' metric. Naka and Yosha read it in their videos; I hadn't encountered it before. Please note (I *think*) it's distinct from the oft used accuracy metric – stevec Sep 27 '22 at 15:21
  • 1
    @David there seems to be a mathematical description of the metric in the chessbase computer program. It is on screen during Yosha's discussion (perhaps someone with chessbase can copy/paste the definition for those of us without the chessbase program) – stevec Sep 27 '22 at 15:25
  • @David although I would not rely on information [here](https://www.reddit.com/r/chess/comments/xnv9r9/fm_yosha_iglesias_finds_several_otb_games_played/) one of the top comments suggests that user (and presumably the many who upvoted it) were also keen to discover how to interpret the metric. – stevec Sep 27 '22 at 15:30
  • 8
    I still don't know what you mean. Are you referring to that "engine correlation score" metric they're talking about in the linked page? In that page I'd say that's far from being a decisive measurement to detect cheating? For what it's worth, you can play 1.e4, resign on the next move and get a 100% engine correlation score – David Sep 27 '22 at 15:32
  • @David yes. Spot on. If that is what it should be called, please let me know or feel free to edit the question – stevec Sep 27 '22 at 15:34
  • 3
    @David given many of us (including naka, based on his comments from his analysis) don’t actually understand the correlation/“let’s check”/chessbase metric, do you think it sh/could justify its own question (i.e. how is x calculated and how should it be interpreted)? – stevec Sep 27 '22 at 19:27
  • Well, a metric like that will always have problems with its definition. The engine you choose, the machine it runs on, the time it's allowed to "think" and so on... will always be arbitrary choices and can skew the results in either direction. – David Sep 27 '22 at 20:23
  • Black had 77% correlation in that video. How does that work if 77% is superhuman? – Allure Sep 28 '22 at 01:15
  • @Allure there are several games with 100% – stevec Sep 28 '22 at 02:56
  • Related: [ChessBase 12 Documentation](http://help.chessbase.com/Reader/12/Eng/index.html?lets_check_context_menu.htm) on the "Let's Check" engine/game correlation analysis. "_The current record for the highest correlation (October 13th 2011) is 98% in the game Feller-Sethuraman, Paris Championship 2010._" The documentation here doesn't precisely (mathematically) define this metric. – SecretAgentMan Sep 28 '22 at 18:01
  • 1
    @SecretAgentMan thank you, I’ll read it thoroughly. I cannot vouch for the veracity of the statement, but Yosha comments that the ChessBase documentation is out of date, stating that Feller’s 98% is no longer the highest. I’ll try to find the exact time in the video – stevec Sep 28 '22 at 18:03
  • 1
    @stevec The Feller part isn't important (I expected that such a comment wasn't current). I've +1 your Q and the point of the comment is that there is no precise definition of the metric in the documentation. I'd be interested in seeing a few examples of high correlation super GM games. I don't have a good feel on what this metric is without seeing the mathematical definition. – SecretAgentMan Sep 28 '22 at 18:50
  • 1
    Possibly relevant: On the [Perpetual Chess Podcast](https://www.perpetualchesspod.com/new-blog/2022/9/28/bonus-pod-gm-jonathan-rowson-and-gm-david-smerdon-discuss-the-carlsenniemann-saga), GM David Smerdon, PhD explains that in one of Niemann's 100% correlation games (using ChessBase Let's Check tool), Niemann has a blunder and an average centipawn loss (ACL) of 30. – SecretAgentMan Sep 29 '22 at 16:23
  • @SecretAgentMan thanks for more useful info. This seems very odd. I don’t think it has been fully analysed yet (in public/open source) but it doesn’t seem that Hans’s results are anomalous in any evaluation technique other than *Let’s Check*, which (I think) is concerning, in that his supposed 100% games may not be 100% engine correlated at all. Is this what you make of it? – stevec Sep 30 '22 at 11:30
  • Forced moves count toward the correlation. During tactical sequences, there is often only one move in each position, and that holds for several moves in a row. So it is expected that even a mediocre player without engine assistance would have a high correlation during tactical / forcing sequences. At the least, you should remove forced moves from the correlation analysis; that would immediately be a better metric than Let's Check for detecting anomalies. How much of your play is forced depends on your opponents' Elo, which varies with the types of tournaments you play, so that must be accounted for too – rajb245 Oct 04 '22 at 17:05
  • It may not be possible to answer this at all if the "correlation metric" can be manipulated simply by having more users analyze the game with more engines. – qwr Oct 04 '22 at 22:23
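Several comments and notes above suggest excluding book moves, forced moves, and very short games before computing a match rate. A minimal sketch of that filtering follows. It is not ChessBase's Let's Check algorithm (which is not precisely documented); the `Move` record, the flags on it, and the `min_moves=15` threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Move:
    """Hypothetical per-move record; none of these fields come from ChessBase."""
    matches_engine: bool      # played move equals an engine's top choice
    is_book: bool = False     # move is known opening theory
    is_forced: bool = False   # only one legal or reasonable move existed


def filtered_match_rate(moves: Iterable[Move], min_moves: int = 15) -> Optional[float]:
    """Share of engine matches among moves that are neither book nor forced.

    Returns None when fewer than `min_moves` moves remain, so that very short
    games (e.g. 1.e4 followed by resignation) do not register as 100%.
    The threshold of 15 is an arbitrary assumption.
    """
    counted = [m for m in moves if not (m.is_book or m.is_forced)]
    if len(counted) < min_moves:
        return None
    return sum(m.matches_engine for m in counted) / len(counted)


if __name__ == "__main__":
    one_move_game = [Move(matches_engine=True)]
    print(filtered_match_rate(one_move_game))  # too short to score

    # 10 book moves, 5 forced moves, then 20 free moves of which 14 match.
    game = (
        [Move(True, is_book=True)] * 10
        + [Move(True, is_forced=True)] * 5
        + [Move(True)] * 14
        + [Move(False)] * 6
    )
    print(filtered_match_rate(game))
```

In the second example the unfiltered rate would be 29/35, while the filtered rate only counts the 20 free moves.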

3 Answers

7

It depends on how many people analyze his games. The more different engines a player's moves are compared against, the more likely it is that each move matches the choice of at least one engine.
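The effect can be illustrated with a toy model. The sketch below assumes, purely for illustration, that each engine independently agrees with the played move with the same probability `q`; real engines are strongly correlated, so this likely overstates how fast the match probability grows, but the direction of the effect is the same. The function name and the value `q = 0.6` are hypothetical.

```python
def prob_any_engine_matches(q: float, k: int) -> float:
    """Chance that a move matches the top choice of at least one of k engines.

    ASSUMPTION: each engine agrees with the move independently with
    probability q. This is a toy model, not how Let's Check works internally.
    """
    return 1.0 - (1.0 - q) ** k


if __name__ == "__main__":
    for k in (1, 2, 5, 10, 25):
        print(f"{k:>2} engines: {prob_any_engine_matches(0.6, k):.4f}")
```

Under this model, a per-move match probability that looks ordinary against a single engine approaches 1 once enough engines have been uploaded for the same game.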

There is even evidence that people (e.g. ChessBase user gambit-man) uploaded engine moves of many old and unusual engines to the cloud analysis of Niemann's games in order to make him look even more suspicious.

There has been speculation that anomalous values of a correlation metric [1] may hint at the use of outside assistance.

The ChessBase documentation states that the correlation metric is not suitable for cheat detection and, like any user-generated cloud content, can be manipulated.

Hauptideal
6

According to this tweet:

I analyzed every classical game of Magnus Carlsen since January 2020 with the famous chessbase tool. Two 100 % games, two other games above 90 %. It is an immense difference between Niemann and MC. Niemann has ten games with 100 % and another 23 games above 90 % in the same time

Carlsen's two 100% games were his 9th-round win against Shakhriyar Mamedyarov in the 2022 Tata Steel tournament and his 12th-round draw (!) against Wojtaszek in the 2021 Tata Steel tournament. That's 2 games out of 107 analyzed.

Scoring 100% is harder against better opposition (the more your opponent blunders the easier it is to play the best move and vice versa).
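With only 2 such games out of 107, the underlying rate is quite uncertain. As a rough sketch, a Wilson score interval gives a plausible range for it, assuming (a simplification) that the 107 games can be treated as independent trials with a common rate. The function name is illustrative.

```python
import math


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    p_hat = k / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    low, high = wilson_interval(2, 107)
    print(f"observed rate: {2 / 107:.4f}")
    print(f"approx. 95% interval: {low:.4f} to {high:.4f}")
```

The interval is wide, so the sample is consistent with rates both well below and well above 1 in 100.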

SecretAgentMan
Brian Towers
  • 1
    This suggests 100% correlation games are *far* more common than first thought (based on Fischer apparently having none!); perhaps as common as 1/100 or thereabouts for super GMs – stevec Sep 28 '22 at 22:23
  • 8
    @stevec You have to remember that current computers and their evaluations reflect current theory and vice versa. Examining Fischer's games, reflecting 1970's theory, is not going to be flattering for him. Doing the same analysis in 50 years time is likely to be equally cruel to 2020's Carlsen. – Brian Towers Sep 28 '22 at 22:31
  • I read in another forum that chessbase can calculate % based on the book openings of the time, although I don’t have a reliable source. – stevec Sep 28 '22 at 22:47
  • Wait, did Wojtaszek also play >90% to draw a 100% Carlsen? Or did Carlsen agree to a draw when he was in a better position? – insipidintegrator Sep 29 '22 at 03:55
1

Well, it is pretty easy to calculate. Let's do the exercise for Magnus Carlsen, who has an engine correlation of 70%, which seems reasonable given that similarly strong players like Fischer and Kasparov achieved similar values.

So the probability that Carlsen makes 20 out of 20 moves like the engine is 0.7^20 = 1/1253, i.e. about 0.1%. However, games are typically longer. If we go for 30 out of 30 moves, the probability is 0.7^30 = 1/44366, i.e. about 0.002%.

So, if one longer 100% correlation game (30 out of 30 moves excluding opening prep) occurs in 40 000 games of Carlsen, that would be totally normal.

Now let's look at the probability that Carlsen has K longer 100% correlation games in a total of N games. Here, we need the binomial distribution. The probability is (N choose K) * (1/44366)^K * (1 - 1/44366)^(N-K). Let's consider an example: say Carlsen plays N = 1000 games and gets 100% correlation 5 times. The probability of that happening is (1/44366)^5 * (1 - 1/44366)^995 * (1000 * 999 * 998 * 997 * 996)/(2 * 3 * 4 * 5), i.e. about 1 in 21 billion.

Now, many of Niemann's 100% games were not that long, which is in his favor. So let's go with 20 out of 20 moves like the engine and 6 out of 500 games having 100% correlation. We take the engine correlation of strong super GMs, i.e. 67%, instead of Carlsen's. The probability of that happening is (0.67^20)^6 * (1 - 0.67^20)^494 * (500 * 499 * 498 * 497 * 496 * 495)/(2 * 3 * 4 * 5 * 6), i.e. about 1 in 42 million.
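The arithmetic above can be checked with a short script. This is only a sketch of the calculation as described: it keeps the assumption that every move is an independent trial with a fixed match probability, and it computes the probability of exactly K perfect games (not K or more). The variable and function names are illustrative.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


# Probability of a "perfect" game, assuming independent moves.
p_perfect_20 = 0.70**20   # 20 of 20 moves at a 70% match rate
p_perfect_30 = 0.70**30   # 30 of 30 moves at a 70% match rate
p_perfect_20_sgm = 0.67**20  # 20 of 20 moves at a 67% match rate

# Exactly 5 perfect 30-move games in 1000 games.
carlsen_5_of_1000 = binom_pmf(5, 1000, p_perfect_30)
# Exactly 6 perfect 20-move games in 500 games.
niemann_6_of_500 = binom_pmf(6, 500, p_perfect_20_sgm)

if __name__ == "__main__":
    print(f"20/20 at 70%: 1 in {1 / p_perfect_20:,.0f}")
    print(f"30/30 at 70%: 1 in {1 / p_perfect_30:,.0f}")
    print(f"5 of 1000 games: 1 in {1 / carlsen_5_of_1000:,.0f}")
    print(f"6 of 500 games: 1 in {1 / niemann_6_of_500:,.0f}")
```

The results are very sensitive to the assumed per-move match rate and game length, since both enter as exponents.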

The author teaches graduate courses on probability and statistical inference.

  • 7
    As I'm sure you know, GIGO. Your assumptions are wrong. – Brian Towers Oct 01 '22 at 17:14
  • 3
    "The author teaches graduate courses on probability and statistical inference" If "the author" is a fancy way of saying "me", then this answer seriously calls into question your qualification. Why are you "calculating" the percentage? It's an empirical matter, not a matter of calculation. What you seem to be doing is calculating the "expected" percentage, based on a binomial distribution, but the binomial distribution comes from independent Bernoulli trials, and the idea that each move is independent is absurd. – Acccumulation Oct 07 '22 at 01:31