Two Tier Scoring

Restitution
Posts: 225
Joined: Thu Jan 31, 2019 7:00 am
Karma: 180

Re: Two Tier Scoring

#41 Post by Restitution » Thu Jan 09, 2020 1:04 am

jay65536 wrote:
Wed Jan 08, 2020 1:51 pm
At the same time, because the top 2 powers' point totals will simply depend on their center counts, there is no real need for them to fight; while they don't care about draw-whittling, they also care less about competing. The 11-center powers don't care where they get their next centers from. However, the bottom 2 powers do care about draw-whittling.
Why is this bad? To me, "The 11-center powers don't care where they get their next centers from" is a good thing - not caring about where your SCs come from, and only wanting to get them, is the same behavior a solo-attempting player would show.

I would be down to playtest, if you made a game.

tr1285
Posts: 59
Joined: Tue Oct 23, 2018 8:25 pm
Location: NJ, USA
Karma: 79

Re: Two Tier Scoring

#42 Post by tr1285 » Thu Jan 09, 2020 7:38 pm

Puscherbilbo's idea should be fairly easy for the masses to grasp and not too different from what I was proposing. But I think that with SoS it can at times be hard to predict (without a calculator) how centers changing hands will affect the score, so PPSC would be better.

In my opinion an 11-center power should get more points from a draw than a 6-center power, and a 6-center one more than a 1-center one.

I would be interested in play testing some alternative scoring system(s), but I hope we could have some consensus first on a good one to try. I think there is a lot of room for improvement over the current systems. I might even end up volunteering to implement the code for a new one if that's needed.

If it is very important for the board top to get some special treatment, I think I have a possible compromise solution:

1. A single board top gets their points by PPSC (# centers / total owned centers), which is a number that doesn't change when the number of remaining players does
2. The remaining players, or all players if no single player has a board top, are awarded points by the average (50/50) of DSS and PPSC, after factoring out the leader's points and centers.

Here are some examples of how the calculations would come out: (I'm using percentages of the pot)

11/11/6/6 = 29,29,21,21
12/10/6/6 = 35,25,20,20
11/10/7/6 = 32,26,22,20
17/17 = 50,50
17/16/1 = 50,36,14
16/12/6 = 47,31,22
16/12/4/2 = 47,27,15,12
16/16/2 = 40,40,20
14/10/10 = 41,29,29
8/8/7/6/5 = 22,22,20,19,17
9/7/6/6/5/1 = 27,18,16,16,15,9
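
For anyone who wants to check these by hand, here is a minimal Python sketch of the two rules as I read them. The function name `compromise_scores` is made up for illustration, and the reading of "factoring out the leader's points and centers" (the rest split what is left of the pot, using only their own centers for the PPSC half) is an assumption. It reproduces most of the lines above after rounding; for 16/12/4/2 and 9/7/6/6/5/1 it gives about 26.5 where the list shows 27, so either the list rounds differently or this reading is slightly off.

```python
from fractions import Fraction

def compromise_scores(centers):
    """Percent of the pot per survivor under the compromise rules (a sketch)."""
    total = sum(centers)
    top = max(centers)
    shares = [None] * len(centers)
    pot = Fraction(100)
    rest = list(range(len(centers)))

    # Rule 1: a single board top is paid by PPSC and taken out of the pool.
    if len(centers) > 1 and centers.count(top) == 1:
        leader = centers.index(top)
        shares[leader] = Fraction(100 * top, total)
        pot -= shares[leader]
        rest.remove(leader)

    # Rule 2: everyone else gets the 50/50 average of DSS and PPSC,
    # computed on the remaining pot and the remaining centers.
    rest_centers = sum(centers[i] for i in rest)
    for i in rest:
        dss = pot / len(rest)
        ppsc = pot * centers[i] / rest_centers
        shares[i] = (dss + ppsc) / 2

    return [float(s) for s in shares]

print([round(x) for x in compromise_scores([12, 10, 6, 6])])
```

Exact fractions are used internally so that the shares always add up to the whole pot before rounding.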

Seems fair enough to me. Even very small powers are usually going to take at least 10% of the pot just for surviving.

It would also be nice if any new scoring system was easily translatable to other variants, not depending on any magic numbers: 34, 14, 12 etc.

Restitution
Posts: 225
Joined: Thu Jan 31, 2019 7:00 am
Karma: 180

Re: Two Tier Scoring

#43 Post by Restitution » Thu Jan 09, 2020 10:26 pm

tr1285 wrote:
Thu Jan 09, 2020 7:38 pm
I might even end up volunteering to implement the code for a new one if that's needed.
I can implement it too. I've spoken to JMO and bo_sox, and the bigger challenge is going to be proving that a 2-tier system is necessary, which will require testing.

foodcoats
Posts: 3593
Joined: Mon Jan 15, 2018 7:34 pm
Karma: 1562

Re: Two Tier Scoring

#44 Post by foodcoats » Fri Jan 10, 2020 1:31 pm

:evil: Is any of this relevant outside of tournaments? Isn't GR better than points when it comes to the "eternal" ranking of cash/pick up games? :evil:

I totally get why you'd want to develop scoring systems that incentivize a certain kind of play (or that avoid incentivizing certain kinds of "bad" play) in a tournament environment where you have fixed parameters (# of games, specified player pool, time limit, etc.), but I'm not sure it makes sense to implement such a scoring system for non-tournament play. :points: Diddle coins :points: are fun and all but the only true evaluation of Diplomacy play is solo, draw, survival or elimination. Tournaments necessarily must compromise their qualitative purity to establish quantitative differences between the overwhelming proportion of draws. But webDip has no such limitations and therefore no need to profane itself thusly.

But, by the same token, it wouldn't really matter to me if such a system were implemented - I already "play DSS" in SoS games and would play DSS if I found myself in this sort of game, too. But I remember reading or hearing somewhere that one of webDip's design philosophies is simplicity and ease of entry for new players. My recommendation would be to make any tourney scoring, such as this or SoS, available to TDs but hidden from regular games.

Claesar
Posts: 1965
Joined: Tue Oct 03, 2017 10:34 am
Karma: 1490

Re: Two Tier Scoring

#45 Post by Claesar » Fri Jan 10, 2020 2:16 pm

foodcoats wrote:
Fri Jan 10, 2020 1:31 pm
:evil: Is any of this relevant outside of tournaments? Isn't GR better than points when it comes to the "eternal" ranking of cash/pick up games? :evil:
...
GR also takes the scoring system into account. If you care about your rating, you'd do well to play accordingly.

Octavious
Posts: 3844
Joined: Fri Sep 29, 2017 4:16 pm
Location: The Five Valleys, Gloucestershire
Karma: 2605

Re: Two Tier Scoring

#46 Post by Octavious » Fri Jan 10, 2020 2:47 pm

Claesar wrote:
Fri Jan 10, 2020 2:16 pm
GR also takes the scoring system into account. If you care about your rating, you'd do well to play accordingly.
You'd do even better to not care about your rating :)

foodcoats
Posts: 3593
Joined: Mon Jan 15, 2018 7:34 pm
Karma: 1562

Re: Two Tier Scoring

#47 Post by foodcoats » Fri Jan 10, 2020 4:02 pm

Claesar wrote:
Fri Jan 10, 2020 2:16 pm
foodcoats wrote:
Fri Jan 10, 2020 1:31 pm
:evil: Is any of this relevant outside of tournaments? Isn't GR better than points when it comes to the "eternal" ranking of cash/pick up games? :evil:
...
GR also takes the scoring system into account. If you care about your rating, you'd do well to play accordingly.
Ahh, okay, I didn't know that! I am clearly a luddite. I thought GR was a sort of "pure Elo" system (not that I really know what "pure Elo" means but... in any event...). Thank you for the clarification. :)

Restitution
Posts: 225
Joined: Thu Jan 31, 2019 7:00 am
Karma: 180

Re: Two Tier Scoring

#48 Post by Restitution » Fri Jan 10, 2020 6:04 pm

https://discord.gg/qRH5Du

I made a new discord for anybody who wants to test 2ts.

tr1285
Posts: 59
Joined: Tue Oct 23, 2018 8:25 pm
Location: NJ, USA
Karma: 79

Re: Two Tier Scoring

#49 Post by tr1285 » Fri Jan 10, 2020 8:47 pm

I mainly just think there should be a more moderate scoring system between the two extremes we have today. On the one side, DSS gives the same points to all draw participants. The number of centers doesn't matter at all. On the other extreme, SoS rewards large powers and penalizes small powers. In many cases, SoS can give over 50% of the pot to a leader who has failed to solo, and it could go as high as 80+% with certain center distributions. If you control 3 centers or fewer, you likely don't have enough skin in the game to justify putting much effort in any more.

I don't know the whole story of why the PPSC system was discontinued on this site, but it seems the main problem was that it awarded points to losers. So why not just modify it to winner-take-all and, in the case of draws, award points proportionally?
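
To put rough numbers on the comparison, here is a small Python sketch of the three draw rules being discussed: an equal split (DSS), shares proportional to squared center counts (SoS), and shares proportional to center counts (the draw half of the modified PPSC). The function names and the `win_at=18` default are illustrative choices, not anything taken from the site's code.

```python
def dss_shares(centers):
    """Equal split of the pot among all draw participants."""
    return [100 / len(centers)] * len(centers)

def sos_shares(centers):
    """Sum-of-squares: share proportional to the square of the center count."""
    total_sq = sum(c * c for c in centers)
    return [100 * c * c / total_sq for c in centers]

def proportional_shares(centers):
    """Share proportional to the center count (PPSC-style, draws only)."""
    total = sum(centers)
    return [100 * c / total for c in centers]

def score(centers, draw_rule, win_at=18):
    """Winner-take-all on a solo, otherwise apply the chosen draw rule."""
    top = max(centers)
    if top >= win_at:
        return [100.0 if c == top else 0.0 for c in centers]
    return draw_rule(centers)

board = [17, 5, 4, 3, 3, 2]
for rule in (dss_shares, sos_shares, proportional_shares):
    print(rule.__name__, [round(x, 1) for x in score(board, rule)])
```

On a 17/5/4/3/3/2 board the leader's share comes out at roughly 17% under the equal split, 82% under sum-of-squares and 50% under the proportional rule, which is the kind of middle ground being asked for.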

Two-tier scoring is maybe not my first choice for an alternative, but at least it's an alternative.

Restitution
Posts: 225
Joined: Thu Jan 31, 2019 7:00 am
Karma: 180

Re: Two Tier Scoring

#50 Post by Restitution » Fri Jan 10, 2020 9:58 pm

tr1285 wrote:
Fri Jan 10, 2020 8:47 pm
Two-tier scoring is maybe not my first choice for an alternative, but at least it's an alternative.
The reason 2TS is so good is that it elegantly and completely removes the incentive for the top player to cut the draw, replacing it entirely with an incentive to nab centers. When I figured this out it blew my mind.

RoganJosh
Silver Donator
Posts: 556
Joined: Sun Dec 31, 2017 1:02 am
Location: Stockholm
Karma: 464

Re: Two Tier Scoring

#51 Post by RoganJosh » Sat Jan 11, 2020 3:16 pm

This post will only be about the solo-incentives, and not about the draw-whittling aspects. Anyway. F* me, I put way too much sambal oelek on these potatoes.

Mercy, I don't think you even contemplated how small 1/9 is. It seems it's rather a question of principle for you - that the board leader should have nothing to lose going for the solo. Or as Jay said in the article: the scoring system should not punish people for trying to win. I'm gonna try to explain why this is a bad idea.

First, let's just all remind ourselves:

That option A could potentially yield higher rewards, does not necessarily imply that the player has any incentives to choose option A.

It's funny because you all know this. This is Mercy's complaint about DSS: despite the solo yielding the highest reward, a player might not have incentives to play for the solo. But you seem to forget that this is true for all players, not only the current board leader. And you keep posting these end-of-game point distributions, with some intuitive interpretations of how they affect incentives, without actually checking what these scoring systems do to incentives. If you only care about the board leader then that is often enough but - no - you need to look at all players.

Man, these potatoes are spicy.

Let's go back to the Germany example from the article. Before math, maybe just a comment about "playing for the solo." Because, it's not only Germany that has a choice. The duo Austria/Italy also has a choice: securing the four way draw or trying for the three way draw. Now, playing for the solo is often a balancing act of seeming innocent enough so that Austria/Italy keep fighting Turkey and then - at the right moment - being aggressive.

So, I used the following model:
* If A/I decides to secure the four way draw, then the game ends G/A/I/T.
* If G settles for a three way draw, and A/I attack T, then the game ends G/A/I.
* If Germany goes for the solo, and A/I attack T, then the game ends G solo with probability p and it ends G/A/I/T with probability (1-p).
Turkey has no choice - his tactics are just a function of A/I's. Since there is no separation between 2nd and 3rd place in either system (in this scenario!), we can treat A/I as one player. That gives a nice two-by-two game. Btw, you're more than welcome to criticize the model or, even better, come up with a model of your own and analyze it.

In DSS scoring, we get a pure-strategy Nash equilibrium which depends on p. Germany should play for the solo if she thinks it has more than a 1/9 probability of success. Likewise, Austria/Italy should try to eliminate Turkey only if they think it gives Germany at most a 1/9 shot at the solo. (I'm gonna pause here and just be amazed. If Germany has a 1/9 shot at the solo, then the reward of a three way draw for Austria/Italy is so small that it's not worth the risk.)
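
Since the payoff matrices are not posted here, the following Python sketch only reconstructs the DSS payoffs from the three model rules above and checks Germany's side of the comparison. The function names are invented for illustration, and the table is one reading of the model, not the original computation.

```python
from fractions import Fraction

def dss_payoffs(p):
    """Expected pot shares (Germany, each of A/I) for the four strategy pairs.

    p is the chance that a German solo attempt succeeds while A/I attack T.
    """
    quarter, third = Fraction(1, 4), Fraction(1, 3)
    return {
        # A/I secure the draw: the game ends G/A/I/T whatever Germany does.
        ("solo", "secure"): (quarter, quarter),
        ("settle", "secure"): (quarter, quarter),
        # Germany settles and A/I eliminate Turkey: three way draw.
        ("settle", "attack"): (third, third),
        # Solo with probability p, otherwise the four way draw.
        ("solo", "attack"): (p + (1 - p) * quarter, (1 - p) * quarter),
    }

def germany_prefers_solo(p):
    """True if going for the solo beats settling while A/I attack Turkey."""
    table = dss_payoffs(p)
    return table[("solo", "attack")][0] > table[("settle", "attack")][0]

print(germany_prefers_solo(Fraction(1, 10)), germany_prefers_solo(Fraction(1, 8)))
```

At p = 1/9 the two German payoffs are both exactly 1/3, which is where the threshold comes from.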

Let us now use Mercy's scoring system from his first post in this thread. That is, we remove the incentives for Germany to settle with a three way draw. That is, we remove all of Germany's innocence. The result? We get a payoff matrix that has a pure-strategy Nash equilibrium, independent of p. Germany should always play for the solo. Austria/Italy should always secure the four way draw.

I actually find this to be quite intuitive. Germany has by far the biggest possible reward - the solo. Now, it is a zero sum game. If you remove all the risk from Germany, then A/I get stuck with all the risk, yet their possible reward is tiny. They have no incentives whatsoever to take that risk.

Let's do this. Let's pretend it's not only T sitting in the corner, it's T/F/R/E all sitting on one center each. So that A/I are not cutting the draw down from 4 to 3, they're cutting the draw down from 7 to 3. Well, Mercy's scoring system gives the same result. The payoff matrix has a pure-strategy Nash equilibrium, independent of p. Germany should always play for the solo. Austria/Italy should always secure the seven way draw. These incentives are bizarre. This is a scoring system where as soon as one player is beginning to look like a solo threat, the other players should immediately ally and force draw the game.

Let's continue with Jay's scoring system, from the article. (No, I'm not gonna go through all proposed modifications.) Now, in this system, the rewards depend on Germany's SC count, so we have to be a little more elaborate. Germany's alternative to playing for the solo is to simply play to optimize SC count. (I have no clue how this particular game played out, but it is easy to imagine that a Germany which doesn't even try to capture Tunis could secure Spain and/or Marseilles.) The reward of Germany's alternative play depends on how many centers she can secure. So, let's consider three scenarios.
J15: Germany could only have secured 15 centers - equalling the failed solo attempt.
J16: Germany could have secured 16 centers, if she had not tried to solo.
J17: Germany could have secured 17 centers, if she had not tried to solo.

In J15, Jay's scoring system gives the same incentives as Mercy's. That is, Germany should always go for the solo, and A/I should always secure the four way draw. This is a general thing in Jay's scoring system: if going for the solo and optimizing center count is the same for the top player, then the other players should always secure the draw.

J16 and J17 are similar to each other, with slightly different figures.
- If p is sufficiently small (in J16: p < 1/17 and in J17: p < 2/17), then we get a pure-strategy Nash equilibrium for G to back off and A/I to eliminate T.
- If p is larger, then we get a mixed Nash equilibrium. Let F(p) denote the probability by which G should go for the solo. Interestingly enough, F(p) is actually decreasing in p! Yes, read that again. If the probability for G to succeed with the solo is large enough (in J16 p > 8/17 and in J17 p > 15/34), then the mixed strategy favors that G does not go for the solo! (Btw, both of these numbers are smaller than 1/2.) What happens is the following. If attacking T gives Germany a solo probability of at least 1/2, then A/I have too little incentive to attack T, diminishing the possibility of the solo, making it a better strategy for Germany to ignore the solo and aim at 16 or 17 centers.

So, to summarize. Again, p denotes the probability that G succeeds with the solo if she goes for it and A/I try to eliminate T. Let's assume players are "smart" and understand the incentives of each scoring system. What are Germany's incentives?
DSS: G should go for the solo if .111 < p
Mercy: G should go for the solo, to no avail, as this is a settled 4WD.
Jay15: G should go for the solo, to no avail, as this is a settled 4WD.
Jay16: G should go for the solo if .058 < p < .470
Jay17: G should go for the solo if .117 < p < .441

I checked for comparison, and SoS (which is by far the most misunderstood scoring system) is very similar to Jay's scoring system, in this scenario.

Now shoot me. Game theory is not my subject, so I probably made plenty of mistakes. Let me know if you want me to post the payoff matrices and the computations, or if you want them in PM. I'll be happy to share it, just think this post is too long already. I'm hoping this will inspire you guys to do some proper analysis. After all, you now have the incentives to prove me wrong!

Mercy
Posts: 257
Joined: Thu Oct 19, 2017 4:03 pm
Karma: 220

Re: Two Tier Scoring

#52 Post by Mercy » Mon Jan 13, 2020 8:40 am

I haven't replied in this thread for a while, but I see that I am mentioned by name and I will take the time to write some replies.
jay65536 wrote:
Tue Dec 31, 2019 4:17 pm

That's why Mercy's proposed "simplification" is actually a sizable alteration, in my view, and I see it as a shift away from one of the goals of the system. If we just say all draws are equally split until someone has 13+ centers, that means that until we reach the point where someone grows large enough, draw-whittling is still a major motivator of play.
Jay, I agree with everything from your post except the part I am quoting here; I think we are pretty much on the same line and are only disagreeing about details. That being said, this is why I don't agree with this part of your post:

If no one is yet in reach of a large number of centers, typically draw-whittling is not a major motivator of play. The major motivator of play is building up to a strong position in the endgame. Especially if you reward players for being a mere solo threat, I think in the early- and midgame, players will be far more inclined to think about how they can get in a strong position to be a solo threat, or to outright solo, than to draw-whittle. And even if they are draw-whittling at an early stage of the game, every player can still be cut from the draw.

In your blog post, you outlined some flaws of DSS and I agree with you, but I think these flaws only play a role in the endgame.

On top of this, I think that if you can give points equally to all survivors while not giving bad incentives to players, you should do so. Otherwise, you can get some center-grabbing when the game should be over. Consider for instance a case where an EG alliance and an AI alliance are in a stalemate. Germany is the biggest power by a few centers, but arguably England is in a stronger position strategically than Germany is. Should Germany get the biggest share of the draw? Should in this case England demand that Germany throw him a center to make their center counts more equal before they hit draw, thereby prolonging the game? No, I think this is a case where the points should just be split equally, and the game should be drawn if no one wants to stab their ally.
jay65536 wrote:
Wed Jan 01, 2020 8:24 pm
teccles wrote:
Wed Jan 01, 2020 7:55 pm
Jay: Thanks for the explanation on the rationale for top scores, that makes perfect sense. I wonder whether, in practise, you don't need to worry so much about people draw-whittling. For example, your 7 centre top score is based on a rather absurd scenario, where a board leader on 7 centres has the power to ensure a 7/13/14. So it might be fine to change the top score to something simpler (with SCs/34 being a natural option), despite the theoretical issues with that.
I mean, it's not just the assumption that it could happen in the same game; it's the straight fact that I don't want a 7-center 3way to be worth more than a 7-center board top. In the set of all X-center finishes, I want board tops to be the highest possible score for all X.
I agree with teccles. I don't think board tops should necessarily receive the highest possible score; see my EG vs AI example. I agree that giving board tops the highest score when they are a solo threat gives good incentives to players, as Jay argued, and I think that's a good reason to give them the highest score; but it's the only reason I see, and it does not apply to low-center board tops.
Restitution wrote:
Tue Dec 31, 2019 5:47 pm
For the sake of my sanity, can you reformulate the system assuming that DIAS is true (which it is in webdip)?
Yeah, I think it's better to formulate the system assuming that DIAS is true. I am a proponent of DIAS only anyway, and it's the only one that's used on webDip.

For the sake of time, I am skipping over a lot of posts now and moving straight to the post from RoganJosh.
RoganJosh wrote:
Sat Jan 11, 2020 3:16 pm
This post will only be about the solo-incentives, and not about the draw-whittling aspects. Anyway. F* me, I put way too much sambal oelek on these potatoes.

Mercy, I don't think you even contemplated how small 1/9 is. It seems it's rather a question of principle for you - that the board leader should have nothing to lose going for the solo. Or as Jay said in the article: the scoring system should not punish people for trying to win. I'm gonna try to explain why this is a bad idea.
I don't know about Jay, but for me it is not a question of principle. I don't want the board leader to have nothing to lose from going for the solo just for the sake of it. It's about giving players incentives that make the game fun to play. Jay mentions a lot of reasons in his blog post why this implies giving more points to the board leader, at least when the board leader is big enough. Let me just give you one example.

Suppose one player has 17 centers. Another player has 1 or 2 centers and has successfully gotten himself in a position where he is vital for stopping the solo; a stalemate line is formed. Under DSS, the 17 center player has an incentive to retreat his units to give room to the other player to eliminate the small player, so that everyone gets more points. Do you think that is fun? If not, then this is an instance where you'd prefer the systems Jay and I are proposing; in our systems, the 17 center power has nothing to gain from the elimination of the small player. By the way, under SOS, the small player would barely get any points so I won't see that as a solution either - the small player would have too little to play for.
RoganJosh wrote:
Sat Jan 11, 2020 3:16 pm
First, let's just all remind ourselves:

That option A could potentially yield higher rewards, does not necessarily imply that the player has any incentives to choose option A.

It's funny because you all know this. This is Mercy's complaint about DSS: despite the solo yielding the highest reward, a player might not have incentives to play for the solo.
That's not my complaint, see above.
RoganJosh wrote:
Sat Jan 11, 2020 3:16 pm
But you seem to forget that this is true for all players, not only the current board leader. And you keep posting these end-of-game point distributions, with some intuitive interpretations of how they affect incentives, without actually checking what these scoring systems do to incentives. If you only care about the board leader then that is often enough but - no - you need to look at all players.

Man, these potatoes are spicy.

Let's go back to the Germany example from the article. Before math, maybe just a comment about "playing for the solo." Because, it's not only Germany that has a choice. The duo Austria/Italy also has a choice: securing the four way draw or trying for the three way draw. Now, playing for the solo is often a balancing act of seeming innocent enough so that Austria/Italy keep fighting Turkey and then - at the right moment - being aggressive.

So, I used the following model:
* If A/I decides to secure the four way draw, then the game ends G/A/I/T.
* If G settles for a three way draw, and A/I attack T, then the game ends G/A/I.
* If Germany goes for the solo, and A/I attack T, then the game ends G solo with probability p and it ends G/A/I/T with probability (1-p).
Turkey has no choice - his tactics are just a function of A/I's. Since there is no separation between 2nd and 3rd place in either system (in this scenario!), we can treat A/I as one player. That gives a nice two-by-two game. Btw, you're more than welcome to criticize the model or, even better, come up with a model of your own and analyze it.

In DSS scoring, we get a pure-strategy Nash equilibrium which depends on p. Germany should play for the solo if she thinks it has more than a 1/9 probability of success.
I checked your calculations and I agree.
RoganJosh wrote:
Sat Jan 11, 2020 3:16 pm
Likewise, Austria/Italy should try to eliminate Turkey only if they think it gives Germany at most a 1/9 shot at the solo.
That is wrong; A/I should try to eliminate T if G has at most a 1/4 shot at the solo. You can calculate this as follows. If A/I play for the 4-way draw, each gets 1/4 of the pot. Suppose p = 1/4. If they attack T, each gets 1/3 of the pot with probability 3/4, and 1/3 x 3/4 = 1/4 expected payoff. If p < 1/4, then the expected payoff is higher than the 1/4 they would get if they didn't attack Turkey.

This means that if p is between 1/9 and 1/4, we play DSS and everyone is behaving rationally, then A/I will attack T even while knowing that it gives Germany a shot at the solo. T may say 'Don't do this, you're giving Germany a shot at a solo' and A/I may reply 'We know, but it is worth it anyway'! In practice, draw whittling by weaker players is how solos sometimes happen.
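
The 1/4 figure can be checked with a few lines of Python. This is only a sketch of the comparison above, under the assumption stated in post #53 that A/I always succeed in eliminating T, so a failed German solo still ends in a 3-way draw. The function names are invented for illustration.

```python
from fractions import Fraction

def ai_share_if_attacking(p):
    """Expected DSS share for each of A/I when they attack Turkey.

    With probability p Germany solos and A/I get nothing; otherwise the
    game ends as a 3-way draw and each of them gets 1/3 of the pot.
    """
    return (1 - p) * Fraction(1, 3)

def ai_should_attack(p):
    """True if attacking beats the 1/4 share from securing the 4-way draw."""
    return ai_share_if_attacking(p) > Fraction(1, 4)

for p in (Fraction(1, 9), Fraction(1, 5), Fraction(1, 4), Fraction(3, 10)):
    print(p, ai_share_if_attacking(p), ai_should_attack(p))
```

At p = 1/4 the expected share is exactly 1/4, so A/I are indifferent there and prefer attacking for any smaller p.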
RoganJosh wrote:
Sat Jan 11, 2020 3:16 pm
(I'm gonna pause here and just be amazed. If Germany has a 1/9 shot at the solo, then the reward of a three way draw for Austria/Italy is so small that it's not worth the risk.)
So that's wrong.
RoganJosh wrote:
Sat Jan 11, 2020 3:16 pm
Let us now use Mercy's scoring system from his first post in this thread. That is, we remove the incentives for Germany to settle with a three way draw. That is, we remove all of Germany's innocence. The result? We get a payoff matrix that has a pure-strategy Nash equilibrium, independent of p. Germany should always play for the solo. Austria/Italy should always secure the four way draw.

I actually find this to be quite intuitive. Germany has by far the biggest possible reward - the solo. Now, it is a zero sum game. If you remove all the risk from Germany, then A/I get stuck with all the risk, yet their possible reward is tiny. They have no incentives whatsoever to take that risk.
Now I think I know where your mistake came from in assuming that A/I should attack T only if p < 1/9. G and A/I are not playing a zero-sum game, because the payoff for G/A/I is not fixed. But yes, under my scoring system, A/I would not attack T indeed. If they both successfully eliminate T and stop the solo, then Germany will be so big that each of them will get 1/4 of the pot anyway, the same as when they would have just played for the 4-way draw.
RoganJosh wrote:
Sat Jan 11, 2020 3:16 pm
Let's do this. Let's pretend it's not only T sitting in the corner, it's T/F/R/E all sitting on one center each. So that A/I are not cutting the draw down from 4 to 3, they're cutting the draw down from 7 to 3. Well, Mercy's scoring system gives the same result. The payoff matrix has a pure-strategy Nash equilibrium, independent of p. Germany should always play for the solo. Austria/Italy should always secure the seven way draw. These incentives are bizarre. This is a scoring system where as soon as one player is beginning to look like a solo threat, the other players should immediately ally and force draw the game.
My scoring system does not give the same result. A 7-way draw is worse than a 3-way draw in which Germany is a solo threat. In a 3-way draw with German solo threat, A/I each get 1/4 of the pot. But anyway, even if you were right, honestly I don't think these incentives are bizarre. The incentives you mention would also happen in SOS scoring (since A/I gets the highest score from preventing Germany from getting extra centers; I am ignoring the incentive for A/I to attack each other, which also happens under SOS and which I don't like), with the difference that in my scoring system, small players get rewarded more from getting in a draw - under SOS scoring, they would get virtually nothing.

I am skipping your commentary on the scoring system of Jay.
RoganJosh wrote:
Sat Jan 11, 2020 3:16 pm
Now shoot me. Game theory is not my subject, so I probably made plenty of mistakes. Let me know if you want me to post the payoff matrices and the computations, or if you want them in PM. I'll be happy to share it, just think this post is too long already. I'm hoping this will inspire you guys to do some proper analysis. After all, you now have the incentives to prove me wrong!
You've inspired me indeed. :razz:

Mercy
Posts: 257
Joined: Thu Oct 19, 2017 4:03 pm
Karma: 220

Re: Two Tier Scoring

#53 Post by Mercy » Mon Jan 13, 2020 9:02 am

(I can't edit my previous post, but I wanted to say that I assumed (which RoganJosh did not) that A/I would always be successful in eliminating T, which changes a few things.)

jay65536
Posts: 59
Joined: Mon Dec 16, 2019 7:36 pm
Karma: 53

Re: Two Tier Scoring

#54 Post by jay65536 » Mon Jan 13, 2020 2:19 pm

I don't do Discord, so I can't join that link; but Restitution, if you get a bunch of people for a play-test, keep me apprised by PM. My original plan was to see if enough people PMed me about play-testing that I could organize a game or games myself, but maybe your way will yield better results.

Sorry I have not had time to write a real reply lately. I'm going to start here, and then segue into a response to RoganJosh.
foodcoats wrote:
Fri Jan 10, 2020 1:31 pm
Is any of this relevant outside of tournaments?

...

But, by the same token, it wouldn't really matter to me if such a system were implemented - I already "play DSS" in SoS games and would play DSS if I found myself in this sort of game, too. But I remember reading or hearing somewhere that one of webDip's design philosophies is simplicity and ease of entry for new players. My recommendation would be to make any tourney scoring, such as this or SoS, available to TDs but hidden from regular games.
So, first off, in case I wasn't clear, yes, I intended this as a tournament scoring system, not an alternate rating system. Since getting back into the online scene 3 years ago (I consider myself mainly a FtF player), I have noticed a shocking, zealot-level devotion to Calhamer points as a scoring system. It's totally pervasive, and I'm not interested in trying to change that.

Instead, my goals in trying to make this system include (but maybe aren't limited to) the following:

1) Trying to make a tournament system that preserves what I consider to be the good aspects of Calhamer points. If you were to play a competitive FtF tournament today, it is extremely rare that the system you'd be playing under even has draw-based scoring as a component, let alone its primary component. The reason for that, as I pointed out in my original article and Mercy seems to agree, is that draw-based endgames are just less fun.

So if draw-whittling and playing expressly to eliminate people aren't the good parts of Calhamer points, what are? Well, in my opinion, the best part of Calhamer points is the lack of emphasis on center count at the end of the game. If you make it to a 14/10/10 endgame, there's no motivation for the 14-center power to dot someone, like there is in SoS (or some other quadratic system--see Peter McNamara's comment on my article). And there's also no motivation for a 10-center power to dot the other 10-center power, like there would be in a rank-based system. In this endgame, as soon as each power is convinced they can't solo, they take the draw. (Well, or there's a 2way push, but whatever, you see what I mean, right?) As another example, if a 2-power coalition is stalemating someone on 17 centers, draw-based scoring doesn't care about the distribution of centers between the stalemating powers.

I am in the minority among FtF players, I think, but I consider those things to be positives. The negatives are that in real life, the "you take a draw as soon as everyone knows they won't solo" ideal isn't reached. Another negative is that in a tournament, you need a way to create "separation" so there can be a winner. Draw-based systems in use in FtF tournaments today do this by considering center count, which of course means that what I consider the good parts of Calhamer points are partially if not entirely diluted.

So I tried to make a scoring mechanism that preserves the lack of caring about center counts as much as possible for a tournament scenario. Hence two "tiers" of scoring instead of one. This is also the main reason why my system has the "buffer zone" at the top, that 1-center difference exception that none of the proposed alternatives are considering. But I do think, based on personal experience, that the rank-based component is important in a tournament setting also, which is why I want to have the top score for the board leader even in scenarios where that top power is too small to be a real solo threat.

(Mercy, our disagreement about how the endgame scoring will or will not drive midgame decision-making is something I feel is best settled by play-testing, not arguing. Although I will somewhat address your comment later on below.)

2) I started making my system before realizing this was a thing, but now that I know it is, it dovetails really nicely. In the above quote, the poster cites the mentality of just always playing for draw size and not caring about the scoring system in use. In my system, if you do that, but manage to achieve what you consider a good result, you aren't heavily penalized (and sometimes you are not penalized at all).

I first noticed this mentality while observing the 2019 ODC on this site. There were multiple games in which people played out 2way draws simply because they were playing for a CP result instead of a SoS result. One of the two participants in the 2way was actually costing themselves points (and potentially a berth in the semifinals) by playing the way they were used to instead of adapting to the system. I was not the only one who noticed this; Dave Maletsky (longtime FtF player and inventor of the Carnage system) played in the tournament and commented on this mentality on the forum (the post must still exist somewhere).

I guess there is an argument to be made that if people choose to be ignorant of the system then they should be penalized, but I'd counter that we primarily want people to have fun in a tournament, otherwise what's the point? So in my system, draw-based players can play the way they are used to and not risk torching their tournament prospects. As I said above, I'm a pluralist, so I don't believe every good system must have this property; but most systems in use today don't, and so I wanted to try to make one that does.

* * *

OK, now to RoganJosh, who is trying to speak my language. (I am extremely used to these kinds of calculations.) I take it I'm not the only poker player here, since I saw someone else using the term "cash games". RoganJosh's calculations, unfortunately, assume we are in a "cash game" scenario. What I mean by "cash game", if you're not familiar with the analogy, is a standalone game where, once it's done, you just move on to the next one, of which there always is one. In that scenario, a smart cash game player will play to maximize their expected utility at all times. This is the basis of the calculations RoganJosh is doing.

However, this is not true for tournaments. This is because in some scenarios where your EV is maximized by a high-risk, high-reward option, you have to pass on that option anyway because you know those spots don't come up enough for the reward to appear before the tournament is over. This is directly applicable to the 15/8/7/4 example if we remember that it is a tournament. EV considerations take a backseat to tournament considerations.

In that light, actually, the most pertinent question for Germany isn't "What is the EV of going for the solo?" Instead, the first question is "What are the tournament standings?" and the next question is "What are my chances of getting the solo if I push for it?" If we assume it's the last round and Germany can't make the top board without a solo, Germany will push for the solo no matter what his chances are. But if we assume it's the first round and everyone still has 0 points, the chances of getting stuck in a 4way matter more than the EV.

I can tell you, from personal experience playing under draw-based systems in high-level tournaments, that all top tournament players intuitively grasp this. 4way draws are absolute anathema to them. Getting stuck with one 4way draw over the course of 3 rounds will usually torpedo their chances to get a high finish in the tournament.

This gets back to what Mercy was saying about how the downsides of draw-whittling only play a role in the endgame. My experience is that this isn't true. In a midgame with 4 roughly equal powers, good players are going to be thinking ahead to make sure it'll be easy to procure a 3way, not trying to put maximum pressure on their opponents. I will grant you, though, that some of this may be due to it being a FtF environment, where real time (as opposed to game time) is a factor. Even in untimed rounds, there is always the "let's get this done so we can go [to sleep/out drinking/to dinner]" factor. But even in some recent online games I have played, people will play passively in the opening and midgame just because they know that a 3way is a good result and they don't need or want anything better. But ultimately, I'd rather try to test who's right than argue.

Mercy
Posts: 257
Joined: Thu Oct 19, 2017 4:03 pm
Karma: 220
Contact:

Re: Two Tier Scoring

#55 Post by Mercy » Mon Jan 13, 2020 3:54 pm

To add to my previous post:

I was wrong to assume midway that I/A would always be successful in their effort to defeat T. Under the assumptions of RoganJosh, indeed I/A will only try to eliminate T if p < 1/9. Under the assumption that they will be successful in eliminating T even if G tries to solo, they will try to eliminate T if p < 1/4, and G will always try to solo.
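As a rough check on the 1/4 threshold, here is one way to recover it. It assumes draw-size scoring in which each survivor of an n-way draw receives 1/n of the pot, a solo pays the other powers nothing, and p is G's solo probability once A/I turn on T. These payoffs are an assumption read off the surrounding discussion, not something spelled out in this post.

```latex
\underbrace{(1-p)\cdot\tfrac{1}{3} + p\cdot 0}_{\text{A/I attack T}}
\;>\;
\underbrace{\tfrac{1}{4}}_{\text{A/I settle for the 4-way}}
\iff 1-p > \tfrac{3}{4}
\iff p < \tfrac{1}{4}
```

Under those payoffs, attacking T is worth it to A/I exactly when G's solo chance is below 1/4.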

I do think my alternative assumption is revealing of what can happen in a real game, though. In a real game where there is a strong Germany, an Austria/Italy alliance and a weak Turkey, my alternative assumption would benefit Germany, so it is in the German interest to make that assumption come true. He can do so by simply withdrawing his forces, making himself weaker, until the point is reached where A/I can always eliminate T even while fighting G at the same time. Interestingly, G only needs to reduce his solo chances to below 1/4 to create a situation where it is in the best interest of A/I to eliminate T, assuming that A/I will always be successful in their efforts. Maybe even more realistically, G will not immediately try to solo when A/I attack T, but instead he will wait precisely until the situation arises that it is in the best interest of A/I to continue fighting T even if G is threatening a solo - this situation does not need to arise immediately when they attack T.

This means that G can achieve a solo even while all players are behaving rationally, and T always gets screwed. That is a downside to DSS, especially the fact that T is always screwed here. I also think that situations like the one I sketched above are not uncommon in normal games, especially if G is a savvy player and both G and A/I want to optimize their rating.

jay65536
Posts: 59
Joined: Mon Dec 16, 2019 7:36 pm
Karma: 53
Contact:

Re: Two Tier Scoring

#56 Post by jay65536 » Mon Jan 13, 2020 4:34 pm

Mercy's further analysis reminds me of something. These calculations assume that T has no agency--he either sits back and waits to be eliminated, or helps stalemate G and gets a 4way. In reality, T has another option: he can actively aid G's solo bid to prevent A/I benefiting from eliminating him.

This is another divide between "cash game play" and tournament play. As Mercy correctly points out, a savvy G who is playing for draw size can simply weaken himself to allow T to be eliminated safely, or at least to appear to allow this before making a well-timed "comeback run". In a standalone game, Germany can make this run aggressively, so as to maximize the chances that T helps throw him the game. But in a tournament, Germany is probably smartest to err on the side of caution, making sure that the worst result he's stuck with is the 3way even if he's also trying to win.

RoganJosh
Silver Donator
Posts: 556
Joined: Sun Dec 31, 2017 1:02 am
Location: Stockholm
Karma: 464
Contact:

Re: Two Tier Scoring

#57 Post by RoganJosh » Mon Jan 13, 2020 6:22 pm

Ah, let me begin by clarifying: I am only arguing against the statement

I don’t believe a scoring system should ever punish someone for trying to win

which Jay maybe only meant in a tournament context, and Mercy never actually stated. I'll leave it up to you guys to say whether you agree with this statement or not. Both Mercy's and Jay's systems make the German solo attempt risk free - and this has some unwanted side-effects for the incentives of A/I. That said, making the solo attempt risk free is essentially what solves the draw-whittling problem. I have no problem understanding that the draw-whittling problem is the more important issue to solve in an FtF tournament situation. Still, I do think there is merit in analyzing a proposed scoring system in itself.

Mercy, I was just gonna say, you changed the model to one where Germany gets a 3WD in case the solo fails. Definitely a realistic scenario, but it is not a scenario where Germany is punished for trying to solo (since the solo becomes the dominant strategy), so it is not the scenario I consider.
Mercy wrote:
Mon Jan 13, 2020 8:40 am
Let me just give you one example. Suppose one player has 17 centers. Another player has 1 or 2 centers and has successfully gotten himself in a position where he is vital for stopping the solo; a stalemate line is formed. Under DSS, the 17 center player has an incentive to retreat his units to give room to the other player to eliminate the small player, so that everyone gets more points. Do you think that is fun?
Forming the stalemate line is fun. The last part about retreating - not so much. Yes, this should be discouraged. What I am saying is that, in your system, if people play according to incentives, the game will not even get close to the stalemate line. It will end in a 4WD, or maybe even a 5WD, with a board top at 14. Is that better?
Mercy wrote:
Mon Jan 13, 2020 8:40 am
But yes, under my scoring system, A/I would not attack T indeed. If they both successfully eliminate T and stop the solo, then Germany will be so big that each of them will get 1/4 of the pot anyway, the same as when they would have just played for the 4/way draw.
Exactamundo! There will be no exciting end-game along the stalemate line! Where is my adrenalin?!
Mercy wrote:
Mon Jan 13, 2020 8:40 am

My scoring system does not give the same result. A 7-way draw is worse than a 3-way draw in which Germany is a solo threat.
I realize that here, too, you considered the alternative model where a failed solo ends in a 3WD, while I used a model where a failed solo ends in a 7WD. Of course, there is a range of intermediate possibilities here. So let me acknowledge that as soon as A/I have a possibility of obtaining something better than a 7WD, they do have some incentive to cut down the draw, also in your system. Not as much incentive as in DSS, though. And in my opinion, the problem already in DSS is that A/I have too little incentive to attack T.
jay65536 wrote:
Mon Jan 13, 2020 4:34 pm
These calculations assume that T has no agency--he either sits back and waits to be eliminated, or helps stalemate G and gets a 4way. In reality, T has another option: he can actively aid G's solo bid to prevent A/I benefiting from eliminating him.
This is included in the probability p. Which G and A/I can only estimate.

As both of you point out in your last posts, and which is also the essence of the decreasing function in p we found when analyzing Jay's system: yes, Germany should play down his solo threat so that A/I let their guard down. This is the beautiful paradox of playing for a solo. If the threat is too obvious - then the defensive alliance will form too early. If you are like me, and you want the game to end with a climax around the stalemate line, then you want G to play for the solo and you want A/I to attack T. But notice the asymmetry! If you only increase G's incentives to play for a solo, then you will discourage A/I from attacking T. But if you try to encourage A/I to attack T, then that can actually increase G's incentives to go for a solo.

Restitution
Posts: 225
Joined: Thu Jan 31, 2019 7:00 am
Karma: 180
Contact:

Re: Two Tier Scoring

#58 Post by Restitution » Mon Jan 13, 2020 8:42 pm

jay65536 wrote:
Mon Jan 13, 2020 2:19 pm
I don't do Discord, so I can't join that link; but Restitution, if you get a bunch of people for a play-test, keep me apprised by PM. My original plan was to see if enough people PMed me about play-testing that I could organize a game or games myself, but maybe your way will yield better results.
I already have, I think, 5 or 6 people from Discord. I seriously recommend you join that channel as I expect discussion about the system will continue in there once the game begins.

I'll PM you the game.

Restitution
Posts: 225
Joined: Thu Jan 31, 2019 7:00 am
Karma: 180
Contact:

Re: Two Tier Scoring

#59 Post by Restitution » Mon Jan 13, 2020 8:52 pm

jay65536, when you mentioned you designed this system for "tournament games", not "cash games" - do you think mine or Mercy's system would be better for cash games? Because, when it comes to implementing a system for webdip, all games are cash games.

Restitution
Posts: 225
Joined: Thu Jan 31, 2019 7:00 am
Karma: 180
Contact:

Re: Two Tier Scoring

#60 Post by Restitution » Mon Jan 13, 2020 8:57 pm

@RoganJosh, could you critique my version of 2TS? I'm going to call it "Proportional-2TS".

The top 2 players in a game are allocated a share of the pot proportional to their share of the SCs.

All other players then receive the remaining pot split equally among them.

Note that "top 2 players" can mean zero, one or two players, depending on ties. If two players are tied for first, they are the top 2. If three players are tied for first, then there are zero top players. If there is one top player and 2 players tied for 2nd, then only the very top player counts as a top player.
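The rules above can be sketched in a few lines of Python to make the tie handling concrete. This is only an illustration under stated assumptions, not a reference implementation: the function name `proportional_2ts` is made up, the board is assumed to have 34 supply centers, eliminated powers are assumed to score nothing, any tie that straddles the top-2 cutoff is assumed to shrink the top group (generalizing the three examples given), and solos are not treated specially.

```python
from fractions import Fraction

TOTAL_SCS = 34  # assumption: standard map with 34 supply centers


def proportional_2ts(centers, pot=Fraction(1)):
    """Split `pot` under the Proportional-2TS rules described above.

    `centers` maps power name -> supply-center count.
    Assumption: powers with 0 centers are eliminated and get no entry.
    """
    alive = {p: n for p, n in centers.items() if n > 0}
    ranked = sorted(alive, key=alive.get, reverse=True)

    # Start with at most two top powers, then shrink the group while the
    # cutoff would fall inside a tie (assumed generalization of the examples).
    k = min(2, len(ranked))
    while 0 < k < len(ranked) and alive[ranked[k - 1]] == alive[ranked[k]]:
        k -= 1
    top, rest = ranked[:k], ranked[k:]

    # Top powers are paid in proportion to their share of all supply centers.
    scores = {p: pot * Fraction(alive[p], TOTAL_SCS) for p in top}

    # Everyone else splits whatever is left equally.
    remaining = pot - sum(scores.values())
    for p in rest:
        scores[p] = remaining / len(rest)
    return scores


if __name__ == "__main__":
    for board in ({"G": 15, "A": 8, "I": 7, "T": 4},
                  {"G": 14, "A": 10, "I": 10}):
        print(board, {p: str(s) for p, s in proportional_2ts(board).items()})
```

On the 15/8/7/4 board from earlier in the thread, this sketch pays G 15/34, the 8-center power 8/34, and the two smaller powers 11/68 each. On a 14/10/10 board only the leader is a "top" player, so the two 10-center powers split the remainder equally.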
