17

When AlphaZero learned on its own that a queen has a value of 9, a pawn 1, a knight 3, etc., did it also need to learn on its own that a piece has a value at all? Or was the concept of value baked into its code, so that it only had to figure out on its own what the values are? The value of a queen is not strictly 9, but a queen is relatively more valuable than a pawn.

Rewan Demontay
daparic
    Pieces don't have a value though! – David May 31 '21 at 20:26
  • 18
    It didn't learn on its own that a "queen has a value of 9" because she doesn't. That's a highly simplistic rough evaluation humans have given her, with no other context, as a guiding tactic for new players. – Adam Barnes Jun 01 '21 at 06:40
  • Not strictly 9, but only that a queen is worth more than a pawn, for example. Of course, exceptions exist, for example if the pawn can deliver a mate. It is very clear that AlphaZero values each piece differently, as this is a prime requisite in position evaluation. If Google baked in this notion of piece value, then AlphaZero only had to find the piece values on its own, which could be dynamic based on the demands of the position. Hence, the notion of piece value is a priori. – daparic Jun 01 '21 at 11:25
  • 1
    I think you meant Stockfish; AlphaZero doesn't assign a numeric value to pieces, only to boards. – Rainb Jun 02 '21 at 04:22
  • 2
    The "Zero" in alpha zero is because it was "given no domain knowledge except the rules". https://en.wikipedia.org/wiki/AlphaZero. MuZero was a variant that didn't even know the rules! https://en.wikipedia.org/wiki/MuZero – Mooing Duck Jun 02 '21 at 15:32
  • Pieces not having a value defies common sense. How would arithmetic be possible, then, if pieces had no values? And without arithmetic, how could our current computers do their calculations? `And learning different rules shifted the value AlphaZero placed on different pieces: Under conventional rules it valued a queen at 9.5 pawns; under torpedo rules the queen was only worth 7.1 pawns` https://www.wired.com/story/ai-ruined-chess-now-making-game-beautiful/ – daparic Jun 22 '21 at 04:42

7 Answers

28

It sort of did. "Sort of" because once you examine how neural networks work, it's not clear what AlphaZero is actually learning.

AlphaZero has a neural network evaluation function. It takes as input the position on the board (along with other things like whether a pawn can capture en passant, whether castling is possible, etc.) and converts that into a list of candidate moves to play as well as a probability of winning. There is no "queens are worth 9 pawns" input; there is only "there's a queen on this square". Of course, queens usually have quite an impact on a position, so the neural network quickly learns that having queens strongly correlates with a high probability of winning, but nobody actually knows how much value the neural network ascribes to the queen.
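To make this concrete, below is a minimal sketch (in PyTorch) of a policy/value network of this general kind. It is not DeepMind's code: the layer sizes are illustrative, and the plane and move counts (119 input planes, 4672 encoded moves) follow the published AlphaZero paper but are otherwise assumptions.

```python
# Minimal sketch of an AlphaZero-style evaluation network, assuming an
# 8x8x119 plane encoding of the position (piece occupancy, castling rights,
# repetition counters, side to move, ...) and a 4672-way move encoding.
import torch
import torch.nn as nn

N_PLANES = 119   # input planes per the AlphaZero paper; illustrative here
N_MOVES = 4672   # size of the encoded move space per the paper

class PolicyValueNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared convolutional trunk over the board planes.
        self.trunk = nn.Sequential(
            nn.Conv2d(N_PLANES, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Policy head: unnormalised scores (logits) over all encoded moves.
        self.policy_head = nn.Sequential(nn.Flatten(), nn.Linear(256 * 8 * 8, N_MOVES))
        # Value head: a single number in [-1, 1] for the side to move.
        self.value_head = nn.Sequential(nn.Flatten(), nn.Linear(256 * 8 * 8, 1), nn.Tanh())

    def forward(self, planes):
        x = self.trunk(planes)
        return self.policy_head(x), self.value_head(x)

# Note the input: occupancy and game-state flags only.
# There is no "queen = 9 pawns" feature anywhere.
```

The only thing the network ever sees is which piece sits on which square; any notion of piece value has to emerge inside the weights.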

What was definitely not given to AlphaZero was the idea that each piece has a value. Remember, it was only given the position on the board.

Allure
  • Actually, the neural part doesn't "convert that into a list of candidate moves". It simply evaluates each position. The MCTS part then lists the candidate moves and searches the tree of potential moves. – Jeffrey Jun 01 '21 at 01:49
  • 6
    @Jeffrey are you sure? I am pretty certain that AlphaZero's neural network also provides a "move probability" for the next move, which is also how the MCTS decides which branch to search first. – Allure Jun 01 '21 at 01:54
  • 5
    Ok, I'll take it back. It outputs both an evaluation and a policy over moves: "a continuous value of the board state v_θ(s) ∈ [−1, 1] from the perspective of the current player, and a policy p_θ(s) that is a probability vector over all possible actions" -- from https://web.stanford.edu/~surag/posts/alphazero.html – Jeffrey Jun 01 '21 at 02:01
  • 1
    I had misread your answer as it outputting only a policy – Jeffrey Jun 01 '21 at 02:01
16

The short answer is: No.

AlphaZero learns to evaluate only the position. The position consists of all the pieces and their placement on the board (plus castling and en passant information). There is no way to distinguish the material's value from the position's strength -- which is a really beautiful idea and offers a lot of learning potential for humans. AlphaZero chooses its moves to maximize the (positive) value of the position. (A checkmate win has the maximal positive value; a checkmate loss has the maximal negative value.)

The value of individual pieces is stored nowhere. It could potentially be estimated by comparing position evaluations when removing or adding pieces on the board. But removing a piece also changes the position's strength, which cannot be separated from the material's value itself. So at best, some average value of the pieces (averaged over many positions) might be estimated in this way, as sketched below.
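A rough sketch of that indirect estimate, assuming a hypothetical `evaluate(board)` function that wraps the trained network and returns a score for the side shown; the board handling uses the python-chess library. Nothing here is part of AlphaZero itself, and as noted above, the result mixes material and positional effects.

```python
# Estimate a piece type's "average value" by measuring how much the
# evaluation drops when one such piece is removed, averaged over many
# positions. evaluate(board) is a hypothetical stand-in for the network.
import random
import chess

def average_piece_value(fens, piece_type, evaluate, colour=chess.WHITE):
    drops = []
    for fen in fens:
        board = chess.Board(fen)
        squares = [sq for sq, pc in board.piece_map().items()
                   if pc.piece_type == piece_type and pc.color == colour]
        if not squares:
            continue
        before = evaluate(board)
        board.remove_piece_at(random.choice(squares))  # also changes the position itself
        drops.append(before - evaluate(board))
    return sum(drops) / len(drops) if drops else None
```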

Nikodem
9

One central piece of AlphaZero is the neural evaluation function. This function takes the board as input and produces a value as output. The only input is what pieces are on the board and where.

So, you can see it as the programmers enforcing this rule: the value of a position depends on which pieces are placed where.

This is what the AI "knew" when it was born, if you will. It knew that pieces being present and located relative to each other had value. Now, one possible way to score the board is a linear combination of "piece value" times "number of pieces of that kind". But in practice, the network learned a much more complex and non-linear function of the pieces present on the board.

It certainly could have learned what we often use (queen count times 9 + knight count times 3 + ...), but it learned something much better than that. For contrast, that simplistic linear evaluation is sketched below.
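Here is what that linear evaluation looks like as code (a sketch using the python-chess library; the weights are the conventional human values, not anything AlphaZero learned):

```python
# Linear material count: a weighted sum of pieces, from White's point of view.
import chess

MATERIAL = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
            chess.ROOK: 5, chess.QUEEN: 9}           # conventional human values

def linear_material_eval(board: chess.Board) -> int:
    score = 0
    for piece in board.piece_map().values():
        value = MATERIAL.get(piece.piece_type, 0)     # the king is not counted
        score += value if piece.color == chess.WHITE else -value
    return score
```

The learned network is nothing like this: it is a non-linear function of the whole board, so the implicit "value" of a queen depends on everything else that is on it.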

https://medium.com/applied-data-science/alphago-zero-explained-in-one-diagram-365f5abf67e0

The deep neural network on the left gives you a hint of how much more complex the evaluation function can be, over this simplistic linear evaluation.

Jeffrey
  • If we tweaked the rules of chess a little, say, such that a piece with two knights on the squares to its left and right and a rook on the square in front of it becomes a queen, could AlphaZero identify this notion without any code change? Or say, if White can check Black's king, then its knight has the option to become a queen on its next immediate move but not thereafter. – daparic Jun 01 '21 at 05:49
  • 5
    @eigenfield Yes, AlphaZero can identify this notion. What AlphaZero knows about the rules originally is that it can "play a move" by choosing a start square and an end square. Then it can "read ahead": when it selects a move in its head, it is provided with the resulting board position, or with a warning if the move isn't legal. Originally it didn't even know how pieces were supposed to move - it had to learn through trial and error that bishops move diagonally and rooks orthogonally, etc. – Stef Jun 01 '21 at 08:38
  • 3
    @eigenfield You would need to retrain the network though. – Taemyr Jun 01 '21 at 09:35
  • 2
    @Stef: I don't believe that. Are you **sure** that it had to learn castling and en passant by trial and error, not to say the 50-move rule? If those rules were not programmed into it, which of them did it fail to learn? – user21820 Jun 01 '21 at 10:11
  • 2
    @user21820 I seem to recall the input included some kinds of **flags** on the game position, for instance answering the question "would this move repeat the position?" or "is castling still allowed in this position?", but it's possible it has more than that. Input is described in [Deepmind's paper on arxiv](https://arxiv.org/pdf/1712.01815.pdf), starting page 12, "Domain Knowledge". – Stef Jun 01 '21 at 13:37
  • 1
    @user21820 So there is indeed a boolean input flag for each castling, but it's up to AlphaZero to understand what this flag means. – Stef Jun 01 '21 at 13:43
  • @Stef: That section is too brief to be helpful enough. It doesn't even mention "en passant". The point is, it cannot understand such rules. – user21820 Jun 01 '21 at 14:03
  • 1
    The "Domain Knowledge" in that paper says that it has "AlphaZero is provided with perfect knowledge of the game rules." which should include en-passant etc. And earlier in the paper they write "The action space for chess includes all legal destinations for all of the players’ pieces on the board" which also suggest that castling and en-passant are "known". So (likely) it doesn't know "en passant" as a specific rule, but it is told when it can be used and the result of moving the pawn like that, but it might have to learn that it must be used immediately (as it is only possible directly). – Hans Olsson Jun 02 '21 at 15:45
6

From the comments below, maybe a good summary of this answer:

Alphazero doesn't necessarily respect piece value the way we do, but both Alphazero's experience and our piece valuation (which is effectively our human experience) are approximations of the same balance of power for a given board state.


Piece value is a form of bias: by definition, it says that one piece is worth more than another. It's an artifact of how humans reason about the strength of a certain position or play. Our piece valuation, taken as literal truth, suggests that a queen is always worth more than e.g. a knight or a pawn.

Part of what makes machine learning so interesting is that having an unbiased objective view can fundamentally shake things up. For example, in certain strategies you could do more with a knight than a queen, therefore throwing out the idea that a queen has more value than a knight.

However, this is just a small subset of strategies and positions, among millions of others. You already touched on this point yourself, when you commented:

Not strictly 9, but only that a queen is worth more than a pawn, for example. Of course, exceptions exist, for example if the pawn can deliver a mate.

Humans aren't able to mentally track different individual piece valuations for millions of different positions or plays. Therefore, humans use a valuation that works in most cases. In most cases, a queen is more useful than a knight. In most cases, a knight is more useful than a pawn. This is why our piece valuation is the way that it is.

However, machines don't have these limitations. They are much more able to track different piece valuations for each distinct board situation, and therefore a machine doesn't need to rely on some arbitrary global piece value, as it is a needlessly imprecise measurement.
For a machine, it makes a lot more sense to simply calculate each piece's value on the fly, rather than relying on some half-baked approximation.

Different strokes for different folks. Machines are better at this, and therefore they don't need the oversimplified crutch that our piece valuation represents.

In order to understand machine learning, and how the machine "thinks", you need to sidestep your own bias and preconceived observations. Unsurprisingly, sidestepping human bias is by far the biggest challenge in machine learning - more so than the raw learning algorithm in and of itself.


That being said, if you asked Alphazero to play a billion games, looked at the victories, asked it to re-simulate those games after taking away one random piece, and then asked which type of piece had the biggest impact on turning a victory into a defeat, it would very likely tell you that it's the queen. (A rough sketch of such an experiment follows below.)
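A hedged sketch of that thought experiment, assuming a hypothetical `resimulate(board)` helper that replays the game from the modified position and reports whether the original winner still wins, plus a hypothetical list of won positions; none of this is Alphazero's actual tooling.

```python
# Tally which piece type, when removed from won positions, most often
# flips the result. resimulate(board) and won_positions are hypothetical.
import random
from collections import Counter
import chess

def impact_of_removal(won_positions, resimulate):
    flipped = Counter()
    for fen in won_positions:
        board = chess.Board(fen)
        square, piece = random.choice(list(board.piece_map().items()))
        board.remove_piece_at(square)
        if resimulate(board) != "win":       # the victory no longer holds
            flipped[chess.piece_name(piece.piece_type)] += 1
    return flipped.most_common()             # the queen would very likely top this list
```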

That is sort of the same as saying "the queen is the most valuable piece", but in a much more pedantic and deeply analytical way. It is pedantic to us, but for Alphazero this isn't pedantry but rather necessary accuracy. One man's pedantry is another man's precision.

Our human piece valuation system isn't wrong, it's just very crude and unrefined.

Flater
  • AlphaZero follows the same main-line variations that were carved out by humans over hundreds of years. If I saw AlphaZero play `h4`, then march its `king` to the center of the board and deliver a win by doing so, thereby overthrowing 99% of the chess theory humans have formed, then I would believe what you said. But instead, AlphaZero follows the main lines and only occasionally finds novel ones. Hundreds of years of human chess theory would only be a form of bias if I saw AlphaZero place a knight at `h3` and then march the king to the center. – daparic Jun 01 '21 at 11:46
  • 4
    @eigenfield You may be misunderstanding. Human preconceptions aren't a bias to how AlphaZero works, they're a bias to *our understanding of* AlphaZero, as you can tell by your own question, which assumes from human bias that AlphaZero thinks pieces have relative value. – Magma Jun 01 '21 at 11:51
  • 3
    @eigenfield: The balance and tactics involving chess is significantly more fine-grained and nuanced than humans having missed the tactic of "marching [your] king to the center of the board and deliver a win by so doing". Humans have reasonably approximated a good overall strategy, but not to the extent that a machine can. Those "main lines" you point out are effectively "the human approximation". The fact that Alphazero **independently** came up with a similar approach is proof that humans did a reasonably good job at approximating, i.e. we were on to something, but with less detail. – Flater Jun 01 '21 at 11:53
  • 2
    @eigenfield: However, it's important to note that Alphazero **independently** learned this. If we were to tell it the piece values beforehand, then it wouldn't be doing it independently, it would be replicating our human tactics. Machine learning is in some ways a pain to work with, but the main benefit is striving towards an unbiased, cold, analytical, objective confirmation of what humans subjectively and intuitively suspected. Telling Alphazero about piece value achieves the opposite effect. – Flater Jun 01 '21 at 11:56
  • 2
    @eigenfield Maybe a better way to explain: You said _"But instead, alphazero followed the main lines"_ and in doing so, you implied that Alphazero followed a road that humans had already built. That is not the case. Alphazero was giving a fresh start, and told to figure out its own way. It looked at the surroundings, figured out what path would be the most efficient, and Alphazero **happened to come up with** a road that is eerily similar to the road humans already made. Alphazero's proverbial "road" is confirmation that humans made a reasonably good choice of where to put the road. – Flater Jun 01 '21 at 12:01
  • 1
    For the most part, indeed yes. But AlphaZero still cannot correctly (this needs citation) evaluate the positions posed by Penrose. To me, this is a clear indication that AlphaZero leans heavily on the sum of piece values, which is the basis of its evaluation. – daparic Jun 01 '21 at 12:21
  • 3
    @eigenfield: Phrasing matters here. Alphazero doesn't necessarily respect piece value the way we do, but both Alphazero's experience and our human piece valuation (which is effectively our human experience) are approximations of the same balance of power for a given board state. – Flater Jun 01 '21 at 12:25
  • 1
    @Magma but chess pieces do have relative values as humans see it, and so does AlphaZero in the course of its learning. Pieces have relative values; this is truth #1. I would be surprised if such a notion did not exist within AlphaZero. The fact that it played well means it learned those values like a human did. Maybe the value of a queen is not 9 for AlphaZero, but it still assigns a value to a piece, based on position dynamics. – daparic Jun 01 '21 at 12:31
  • 1
    @Flater, if you check the Penrose chess position, a human can immediately tell that it's a draw despite what adding up all the piece values would suggest. But can AlphaZero evaluate the position correctly? (citation needed) – daparic Jun 01 '21 at 12:46
  • 1
    Seeing ML or AI as the de facto arbiter of things, there to show us the way, could be a misleadingly naive belief until it can be shown that AlphaZero can correctly evaluate the Penrose chess position. – daparic Jun 01 '21 at 13:03
  • 4
    @eigenfield "Pieces have relative values, this is truth#1" [Citation needed] This seems to be where your bias comes in, and where you are fundamentally misunderstanding things. Alphazero does not start with "truths" like this programmed into it, and it does not necessarily simplify its neural model to conform to it (unlike humans!). This seems to be more of an issue of you not understanding neural networks than anything else. – Carl-Fredrik Nyberg Brodda Jun 01 '21 at 13:22
  • 4
    @eigenfield: _"The fact that it played well, means it learned those values like a human did."_ Playing the game well is not measured by adherence to the piece valuation. You're conflating two very different things here. Think of it this way: a cheetah uses its legs to move forward very fast. A tuna can move forward just as fast, but it has no concept of legs. Or a zeppelin and a helicopter both hover in the air, but for very different reasons. You're confusing _doing_ something (moving forward, hovering, winning at chess) with _how_ you do something (leg movement, buoyancy, chess strategies). – Flater Jun 01 '21 at 13:23
  • 2
    @eigenfield: In other words, piece valuation is _one_ way of expressing advantage and disadvantage on the board. It's not a perfect expression, but it is a reasonable estimate. However, just because this way exists, doesn't mean that other ways cannot possibly exist. There is no reason to require that Alphazero must invariably have come up with the same concept of piece evaluation as we did. The entire point of machine learning is to see if machines can come up with **different** solutions to the same problem. – Flater Jun 01 '21 at 13:26
  • 1
    @eigenfield https://en.wikipedia.org/wiki/AlphaZero "...called AlphaZero's play style "alien": It sometimes wins by offering counterintuitive sacrifices, like offering up a queen and bishop to exploit a positional advantage. "It's like chess from another dimension." "grandmaster Peter Heine Nielsen likened AlphaZero's play to that of a superior alien species. Norwegian grandmaster Jon Ludvig Hammer characterized AlphaZero's play as "insane attacking chess" with profound positional understanding." "The program often made moves that would seem unthinkable to a human chess player." – Mooing Duck Jun 02 '21 at 15:41
  • @Carl-FredrikNybergBrodda, I am curious whether AlphaZero can correctly evaluate the Penrose position https://www.chess.com/news/view/will-this-position-help-to-understand-human-consciousness-4298 . From that position, Black would appear to be winning from the perspective of a computer, even though in reality it is a draw. My rough impression of NNs is that they just do weight adjustments and could never discover something entirely new -- stuff that is not assigned a weight, etc. I guess this is the gist of my question. – daparic Jun 03 '21 at 06:29
  • @eigenfield "My rough impression of NN is that it is just doing weight-adjustments, and could never discover something entirely new". Your impression is wrong. – Carl-Fredrik Nyberg Brodda Jun 03 '21 at 08:53
  • @eigenfield: If you don't know how neural networks work, which you clearly don't based on your last comment, then theorizing about it and assuming your theories must be correct is simply not productive. – Flater Jun 03 '21 at 08:53
  • I'm not theorizing. Piece value is one of the bases of evaluation upon which arithmetic operations are possible. Whatever the NN is doing, to a certain degree it all boils down to the concept of piece value when assessing a position. – daparic Jun 15 '21 at 05:09
  • @eigenfield: You said yourself that piece valuation is not a perfect system (_"Of course, exceptions exists, for example if the pawn can deliver a mate."_). If it's not a perfect system, then it's an approximation, and if it's an approximation, it's not the inevitable core of the game of chess. You keep conflating "I have one way of approaching this" (humans doing piece valuation) with "There's no other possible way of approach this" (Any NN playing chess must invariably use piece valuation). The latter is **not correct**. – Flater Jun 15 '21 at 07:39
  • @eigenfield _"Piece value is one of the basis of evaluation upon which arithmetic operations are possible."_ Who says AlphaZero is approaching board state arithmetically? If you know anything about NN and machine learning, it's considerably less about calculation and much more about logical analysis and memory of past experiences. Your understanding of what machine learning entails and how it must work is so fundamentally wrong that it renders this question unanswerable if you're not willing to accept that your layman's assumptions are not correct. – Flater Jun 15 '21 at 07:41
  • I understand that an NN is not arithmetic and is way more than arithmetic, and that it involves learning from past experience, pattern recognition, etc. But at the lowest level, one way or another, it has to add and subtract numbers for whatever purpose it may serve in whatever it's doing. After all, it runs on top of a computer. And so it assigns numeric values to pieces at some moment, such that it can calculate and make decisions. At that moment, values are temporarily assigned to pieces based on its heuristics etc. It may look at possible future positions... – daparic Jun 15 '21 at 11:15
  • ... but all the same, recursively it does what it did before, which always involves adding or subtracting and temporary values assigned to pieces. This assignment of temporary values to pieces at these moments of calculation is what my question was about. What if the value lived in the square instead, for example? So my question was whether this piece-value assignment is a bias introduced by a human. – daparic Jun 15 '21 at 11:18
  • I present the Penrose positions as proof that certain higher meta-information that is not on the board exists, and that AlphaZero cannot find out what this meta-information is because it is not in the fixed bias programmed into it. Otherwise, a citation is needed if indeed AlphaZero can tackle the Penrose position. – daparic Jun 15 '21 at 11:27
  • @eigenfield: About the Penrose positions: _"Computers struggle, mainly because of the presence of the three bishops, **which add enormously to the number of positions that need to be calculated**."_ The issue with the Penrose positions is practical computational complexity, rather than the principle of NN and machine learning. You're taking a practical obstacle and are using it to argue against the theory, which is not sound reasoning. There is no claim that AlphaZero learns faster (= by playing less games) than a human, which is the only conclusion you could draw from the Penrose positions. – Flater Jun 16 '21 at 07:46
  • @eigenfield _"But at the lowest level, one way or another, it has to add, subtract numbers for whatever purposes it may serve on whatever it's doing."_ Even if so, who says those numbers relate to some arbitrary value assigned to a piece, unrelated to its position on the board. Because let's be very clear here, you've been claiming that AlphaZero must invariably use some form of **piece valuation**. Your initial claim was not that it was using "some numbers". – Flater Jun 16 '21 at 07:48
  • @eigenfield _"And so, it assigns numeric values to pieces at one moment, such that it can calculate and make decisions."_ Even if it assigns numbers, which again is not proven nor relevant, there is no reason to assume that those numbers directly correlate to a piece's type and nothing but its type. I've put this quote of yours into the spotlight because **this is the faulty assumption you've been making from the beginning**. I'm quite frankly tired of having to repeat it over and over, just to have you repeat your faulty assumption without listening to given answers. – Flater Jun 16 '21 at 07:49
  • @eigenfield: Your assumption that piece valuation is the only way to understand chess is the equivalent of me assuming that you must invariably have traveled to school by train, purely because that's what always got _me_ to school the fastest. Just because going by train is one way of doing so, does not mean it's the only way. Maybe I didn't have access to a car, and you did, which means that you were able to use a car whereas I could not. Analogously: humans can't do complex considerations (deep logic, the car) and therefore settle for the best they can handle (piece valuation, the train). – Flater Jun 16 '21 at 07:54
  • @Flater, pieces have values and this is core to AlphaZero's evaluation system. Kindly read https://www.wired.com/story/ai-ruined-chess-now-making-game-beautiful/. It may look at other angles to evaluate the position (past or future), but it still boils down to piece valuation and assignment in any given position, based on whatever heuristics it learned. – daparic Jun 22 '21 at 04:47
  • @Flater, I'm assimilating your every comment as an addition to my learning. Thanks. But it seems you are not that deep into chess and how it works. You cut and pasted that comment regarding the Penrose position and you literally believe it? What complexity did the 3 bishops add? Nothing. ML needs to really dive a little into the domain it tries to model. Failure to do that will not yield good results. – daparic Jun 22 '21 at 05:20
  • @eigenfield: It is disingenuous to ask a question and then refute any given (and repeatedly corroborated by others) answer, in favor of sticking to your preconceived answer to the question you asked. If you already "knew" that and weren't going to change your mind, why bother asking the question? – Flater Jun 22 '21 at 08:27
  • @Flater at the moment I asked, I had no preconception and knew nothing. I laid my ignorance about the subject on the table up front, in hopes that a matching answer would arrive. But I also did active research of my own on the question, and it did arrive, in the form of the whitepaper showing that AlphaZero does have piece-value assignments. Perhaps my question was too low-level and your explanations are at a higher level. Pieces do have values assigned to them. To a certain degree, I find that `NN` is an irresponsible kind of `ML` because it evades deterministic causal explanations. – daparic Jul 01 '21 at 04:23
6

AI engineer here.

The answer to this question is hiding in the name: AlphaZero. This refers to "zero prior knowledge of the game". AlphaZero can also play Go.

Now, it might be a bit strong to say that AlphaZero has absolutely no knowledge about chess. AlphaZero knows that it's a turn-by-turn board game, and that the 3 possible outcomes are a win for either side or a draw (unlike Go, which has no draws). But AlphaZero had to figure out for itself how pieces contribute to winning.

Since AlphaZero is a Deep Neural Network, it can quickly learn highly complex valuation functions. That is to say, it can learn a valuation function which includes how well each piece cooperates with other pieces.

What might also be surprising is how Google got started with these valuations. They simply started with entirely random values! The learning algorithm just tweaked the values step by step, and gradually won more and more games.
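A toy sketch of that "start random, tweak step by step" idea, using a small PyTorch network over a hypothetical 768-feature board encoding (one feature per piece type per square). The real AlphaZero loss also includes a policy term and uses self-play search targets, but the structure of the update is the same.

```python
# Start from random weights and nudge them toward predicting game outcomes.
import torch
import torch.nn as nn

net = nn.Sequential(                           # weights are initialised randomly
    nn.Linear(64 * 12, 256), nn.ReLU(),
    nn.Linear(256, 1), nn.Tanh(),
)
optimiser = torch.optim.SGD(net.parameters(), lr=0.01)

def training_step(board_features, game_outcome):
    """One gradient step: board_features is (batch, 768), game_outcome is +1/0/-1."""
    prediction = net(board_features).squeeze(-1)
    loss = (prediction - game_outcome).pow(2).mean()
    optimiser.zero_grad()
    loss.backward()                            # tweak the values a little ...
    optimiser.step()                           # ... and repeat over millions of games
    return loss.item()
```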

MSalters
  • So, just as I originally asked, they baked in the concept that a piece can have a value. And then, through its learning process, it learned these values depending on the demands of the position. What would have happened if the value existed on the squares and not on the pieces? If they baked the notion of value into the pieces, then it would be oblivious to the value being in the squares. So they would then have to re-bake it and tell it where the value lies; that is, requiring human interference. This is what my question was about. – daparic Jun 03 '21 at 06:06
6

In short, no: there was no explicit concept of relative piece importance coded in. But the position evaluation is based on a neural network whose weights indirectly account for the pieces in the position.

That said, in this AlphaZero paper they use self-play games to fit a linear model that predicts the game outcome from the difference in the numbers of each piece alone; the result:

[Figure: piece-value estimates obtained from the fitted linear model]
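A sketch of what such a fit might look like, assuming hypothetical arrays of piece-count differences and game outcomes extracted from self-play games (the paper's actual fitting procedure may differ in detail):

```python
# Regress game outcome (+1/0/-1 for White) on the difference in piece counts
# (White minus Black) for pawn, knight, bishop, rook, queen. The fitted
# coefficients, normalised by the pawn's, play the role of piece values.
import numpy as np

def fit_piece_values(piece_count_diffs: np.ndarray, outcomes: np.ndarray) -> dict:
    coeffs, *_ = np.linalg.lstsq(piece_count_diffs, outcomes, rcond=None)
    pawn = coeffs[0]
    names = ["pawn", "knight", "bishop", "rook", "queen"]
    return {name: round(float(c / pawn), 2) for name, c in zip(names, coeffs)}
```

Crucially, numbers like these are computed after the fact from AlphaZero's games; they are not quantities AlphaZero stores or uses internally.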

user134593
  • 1
    My original question was: did AlphaZero also have to learn that a value can be placed on a piece? And the answer to this is no. The concept that a numeric value can be placed on a piece is pre-programmed. That is, it has already been told a priori that there are empty buckets out there that it needs to fill in; otherwise, it could not construct these empty buckets out of the blue. However, AlphaZero can place optimal values on each piece based on a given position and what it has learned. – daparic Jun 22 '21 at 05:00
  • @eigenfield: AlphaZero was not taught that and it doesn't do it at all. It only evaluates entire boards, and doesn't do anything piece by piece. There are no buckets. The values in this answer were *computed by humans* based on scores AlphaZero gives to various positions with and without material differences, they're not numbers that AlphaZero uses internally at all. – RemcoGerlich Jun 06 '22 at 19:31
  • Then who taught AlphaZero the requisite conditions for a win? Why does it strive for a win? Why not for a loss? From these basic ponderings, it is very clear that some initialization has to exist. This is the question. – daparic Jun 07 '22 at 03:16
  • If a computer can correctly judge the "Penrose position", then I, for one, welcome our new overlords. But this is not happening yet. Therefore, putting too much trust in A.I. is a dangerous concept. – daparic Jun 07 '22 at 03:18
  • It will strive for a win because if it wins a game it will receive a positive reward in reinforcement learning, so it will be more likely to repeat actions that led to that win in the first place. (But that topic is beyond this question.) – user134593 Jun 07 '22 at 09:24
1

This is a short answer, but hopefully it helps.

Consider how AlphaZero values pawns. It throws them away in disgust sometimes, usually to open up files or diagonals on which to plant menacing rooks and bishops. AlphaZero doesn't care one bit about the nominal value a piece may have (I'd argue most human players also don't care about a piece's value).

stevec
  • 1
    Indeed so. There are times when a pawn, due to its board position, is worth far more than a queen. – Rafael_Espericueta Jun 02 '21 at 20:15
  • All of you have misunderstood the question. How would AlphaZero adapt if the rule were such that the quicker you respond with a move, the more points are added to your score, and at the end of the game these move-response points are summed and compared, and the higher total wins? Piece values no longer matter here, as the condition for a win is as I described. Can AlphaZero find out this condition for a win? – daparic Jun 07 '22 at 03:41
  • @eigenfield I'm not sure what you mean. But I don't think AlphaZero cares about the value of a piece, that is, it doesn't look a queen and say "it's worth 9", instead it evaluates an entire *position* (not individual pieces), and then plays the move that maximises its expected win probability, at least, I think that's how it works. – stevec Jun 07 '22 at 03:47