Forum
A place to discuss topics/games with other webDiplomacy players.
Page 1301 of 1419
MrcsAurelius (3051 D(B))
02 Feb 16 UTC
Hope someone knows!
Recent changes question.
1 reply
Open
CommanderByron (801 D(S))
02 Feb 16 UTC
Quality Live Game
I was considering starting an RP (Rulebook Press?) game around 4pm EST (New York City time). I was thinking 10-minute phases (assuming that the auto-ready on retreats and builds will save us a ton of time). I would prefer that you have at least 10 games to your name, and the bet will be 50 D. Add your name in list format below. I will send out the passwords at 3pm EST.
3 replies
Open
lauridsena (910 D)
02 Feb 16 UTC
Checking adjacent territories
Is there any way to check ahead of time, in any manner, if two territories are adjacent? There are some territories that seem adjacent, but I don't know if fleets can travel between them. I just don't know how to see if they are or not and don't want to take the chance they aren't and try to move somewhere only to find out the next move is impossible
4 replies
Open
steephie22 (182 D(S))
13 Dec 15 UTC
Playtesting my boardgame online
Hello everyone,
version 1 of my boardgame is finished. It was brought to my attention that it's probably a good idea to test it online. Two things needed there: 1. players and
2. some sort of adjudicator to use, with which I can easily add and move around 6 unit types, factories, territory markers and 4 kinds of resources, while also keeping track of various variables (although that can be done fairly easily outside of the adjudicator).
Can you help with either of those?
61 replies
Open
brainbomb (295 D)
01 Feb 16 UTC
(+1)
New site feature: Voice chat press
It's simple: you get a 7-player game with a public voice chat and webcam channel. So it's like the closest thing to ftf Diplomacy since sliced bread. Can you imagine the look on Valis's face when I build a fleet in Mar and he's in an Italian Lepanto? Come on, webcam and voice chat would be hilarious.
20 replies
Open
Fluminator (1500 D)
29 Jan 16 UTC
Epic Mafia makes the news.
Hey, our live Mafia website has made the UK news:
http://www.theguardian.com/technology/2016/jan/28/death-of-a-troll
23 replies
Open
Valis2501 (2850 D(G))
31 Jan 16 UTC
SoS gunboats
I went from 60 games to 3 games waiting for this fucking ODC Finals to start and I need something to play but can't commit to press games.

Here's 14 SoS gunboats. Join if able and willing please. Thanks.
13 replies
Open
A_Tin_Can (2234 D)
29 Jan 16 UTC
(+1)
Reliability Rating (RR) discussion
Since this has come up in the other thread-
82 replies
Open
domwnec (254 D)
31 Jan 16 UTC
How to create an app?
At work we're thinking of creating an app. No one knows how to do it internally. Has anyone gone through this before? Any valuable lessons learned? Pros and cons? Cost of development and upkeep? Thank you.
6 replies
Open
brainbomb (295 D)
01 Feb 16 UTC
And the Groundhog.....
Wait for it...Fuck winter
1 reply
Open
Lando Calrissian (100 D(S))
31 Jan 16 UTC
long phase gb
I'd like to play a game but can't get online consistently. If you're willing, please consider http://webdiplomacy.net/board.php?gameID=173461
3 replies
Open
zultar (4180 DMod(P))
29 Jan 16 UTC
Site Update: Skill Ratings and integration
Important. Please read.
Page 5 of 5
TheGhostmaker (1545 D)
30 Jan 16 UTC
(+6)
Hello all,

It’s great that you’re looking into improving the rating system. I’ve wanted to do that myself for a long time, but haven’t had the opportunity; I’m certainly not offended that after 8 years the week of work I did as a teenager is being critically reviewed! I’ve a few things to discuss here.

1. Comments / Discussion of Ghost Rating
2. Rating Sum of Squares games
3. Rating Categories
4. Trueskill
5. Having an official rating system

=============Ghost Rating=============
The reason losses always cost the same amount while wins yield varying gains is that I chose to scale the learning rate with the ratings of the players. This was pretty arbitrary, but the reasoning was that we needed ratings to move fast for strong players, so that they converge to near their actual rating in a similar number of games as average players would. In pre-1.0 versions of GR, the top ranks measured “who’s the good player who has played most games”, and since the target audience was always the top players, it needed fixing.

A_Tin_Can, when you tested having the learning rate change based on players’ ratings against having it constant, did you tune the hyperparameter I set at 17.5 in the initial version?

Of course, that priority is only appropriate for an off-site system, and a measure such as squared errors is a good one for general users. I personally prefer the measure SIGMA(log(E_i^R_i)), where R_i is the result for player i, and E_i is their expected result.

Diplomacy is difficult to rate because you get relatively little data, and it’s quite noisy too. Furthermore, players play vastly different numbers of games, whereas Elo assumes that the number of games played by each player is similar. There is definitely room for improvement, so I won’t spend more time discussing this version.

=============SoS games=============
On webDiplomacy, players are assumed to try to maximise their points return. In situations where the return cannot be 100% of the pot, this presents problems. You need some sort of model which will give a sensible expected result. In PPSC, the way that was achieved was essentially to decompose the game into two: A fight for the win, and then a fight for second. I can’t work out a good model for SoS games.

From the point of view of a ratings designer, I hate the SoS scoring system. I think it makes sense as a tournament scorer, but it is really impossible to directly use to rate players, because just writing down a sensible model is difficult.

It also behaves oddly, in that your score can go up or down in ways that seem weird. Is a solo really less valuable if there are 2 other players left than if there are 4? This is more a question of how you view diplomacy, but I would think not. More importantly, is that difference reflective of the player’s skill?

=============Rating Categories=============
Again, we want to be able to have category ratings, but we lack the data to do so effectively. When the first category ratings came out, I tested their prediction accuracy versus the accuracy of the general ratings on the subcategory, and the category restricted versions were all worse.

That is to say, if you wanted to bet on a Mediterranean map, public press game, the best method was to look up the standard GR, not any subcategory GR.

Obviously, that is a bad situation to be in. I also know a simple method I would like to use to solve it, but I don’t know whether it integrates with out-of-the-box Bayesian methods:

In brief, a player i has a rating vector r_i in R^k, where k is a hyperparameter. Each possible game setting s has a vector associated with it, w_s. The player’s rating in that particular game setting is then the dot product of r_i and w_s. Thus when versions of the game are similar, the weight vectors align, and the dot products are also similar. Utterly unrelated games would have perpendicular weight vectors.
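
As a rough illustration of the idea above (not anything GR or the site actually implements), here is a minimal Python sketch. The value k = 2, the setting names and all the numbers are made up for the example.

```python
# Hypothetical sketch of per-setting ratings as dot products.
# k = 2 and every number below is an invented example value.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def setting_rating(player_vector, setting_weights):
    """A player's rating in one game setting: the dot product of r_i and w_s."""
    return dot(player_vector, setting_weights)

# One player's rating vector r_i (hypothetical).
r_player = [1.5, 0.5]

# Weight vectors w_s for three hypothetical game settings.
w = {
    "classic_full_press": [1.0, 0.0],
    "classic_gunboat": [0.8, 0.6],   # points in a similar direction
    "unrelated_game": [0.0, 1.0],    # perpendicular to full press
}

ratings = {name: setting_rating(r_player, ws) for name, ws in w.items()}
for name, rating in ratings.items():
    print(f"{name}: {rating:.2f}")

# How alike two settings are shows up as the dot product of their weights.
print(dot(w["classic_full_press"], w["classic_gunboat"]))  # similar settings
print(dot(w["classic_full_press"], w["unrelated_game"]))   # perpendicular
```

In a real system both the player vectors and the setting weights would presumably have to be fitted from game results; the sketch only shows how a rating would be read off once they exist.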

=============Trueskill=============
Trueskill is a good system. However, when I looked into it, I decided that it wouldn’t be appropriate for webdiplomacy, particularly as an official integration. The reason was that it relies only on the ranking of the players. Sum of Squares, for instance, changes dramatically to just being about who has the most SCs. This is a serious misalignment of the points objective and the rating system objective. In creating GR, I held that sacrosanct, because you don’t want different players attempting to achieve different win conditions (there’s enough bitching about that in diplomacy already!)

Secondly, I want the only thing that affects your new rating to be: The ratings of your opponents, and your points return. This is essentially the same thing: I don’t want a 3-way draw in DSS to score any differently for the member of it based on who the other two players are. From memory, trueskill doesn’t do this.

My plan (if I had ever had the time to do it) was to take the same models for expected return in PPSC vs WTA (because this was the scoring back then) and to do a maximum a posteriori fit. Then I would use a Laplace approximation to fit a normal (or possibly a thicker-tailed, e.g. t-) distribution to each player’s rating, and report the conservative estimates (like those used by Trueskill). The online version, where you find the maximum a posteriori estimate for a single game, fit a normal around it, and then move on to the next game, was my intended GR 2.0.

=============Official Rating Systems=============
Kestas didn’t want to have a new rating system on the website, and he had valid concerns.

Once you have the rating system on the site, the distortive effects will be much more pronounced than in the status quo. Points actively encourage playing weaker players, but many rating systems struggle with overconfidence in the tails, meaning that with large disparities, high rated players stand to lose rating by playing.

If points continue to exist, then the alignment problem that I was discussing above becomes even more important. I don’t think you can have an official rating system where the strategy for maximising points return for a game, and the strategy for maximising the rating you receive after a game, are different.

Finally, I would consider carefully switching from a monthly publishing cycle. I don’t know what it is like now, but I remember it being something which actually added to the popularity of the ratings, at least among forum goers. Live updates certainly lose some drama, whilst the immediate feedback may only encourage gaming the system more. (That said, I obviously understand the appeal of updating live, too- I don’t know where I would stand if I were an active player!)
Jamiet99uk (1307 D)
30 Jan 16 UTC
(+1)
Thank you Ghostmaker for those interesting and well-articulated points. Some of the algebra went over my head but I think the concerns about Sum-of-Squares, in particular, are worthy of a response from ATC and others.
spyman (424 D(G))
30 Jan 16 UTC
Great to see you around, TGM. Been a long time. Excellent post too :)
A_Tin_Can (2234 D)
30 Jan 16 UTC
Just quickly (it's very late where I live):

"I don’t think you can have an official rating system where the strategy for maximising points return for a game, and the strategy for maximising the rating you receive after a game, are different."

I agree, and the intention is for the best thing for your rating to always be to maximise the points you get out of a game.

I'm not sure about your comments about SoS - it seems to me that it's possible you're misunderstanding how it works. A solo is always worth the same (ie, all the points).

TrueSkill doesn't directly model any scoring systems the way that GR does, which is a neat sidestep.

I/we are aware of the reasons that Kestas didn't want to integrate ratings into the site (including wanting players to be encouraged to play with newbies), and we have plans for making sure that the site stays healthy if we integrate ratings.

I personally believe that integrating the ratings will be very good for the site.
Valis2501 (2850 D(G))
30 Jan 16 UTC
TGM is certainly misunderstanding how SoS works
TheGhostmaker (1545 D)
30 Jan 16 UTC
Indeed, I didn't realise how SoS worked. In particular, I missed that it's only SoS in a draw. Cool, that's much easier to deal with.
VillageIdiot (7813 D)
30 Jan 16 UTC
I actually didn't realize that either, was having similar biases against SoS until hearing this.

@Oct: You know as well as anybody that motivators in Tournament settings are not typical to standard game play, so not a good example.
TheGhostmaker (1545 D)
30 Jan 16 UTC
"I agree, and the intention is for the best thing for your rating to always be to maximise the points you get out of a game...

TrueSkill doesn't directly model any scoring systems the way that GR does, which is a neat sidestep."

Okay, but the sidestep doesn't quite work out. Sometimes in diplomacy, you have to take risks, which would on average improve your position (in terms of points). But if you don't change your rank in the game, then the calculation on whether to take the risk changes when Trueskill projects the points down to just a ranking.
Octavious (2802 D)
30 Jan 16 UTC
@ VI

I agree completely that tournament settings can lead to unconventional play, but what we have is a seasoned Playdipper talking about their dislike of draw whittling as if it was a widely held view. Yes, it is entirely possible that it is a tactic designed to fool the other players. However, unless the view itself is commonplace such a tactic couldn't hope to work. Deception needs plausibility, and for this particular deception to be plausible then the view that "a draw is a draw is a draw" must be firmly rooted in a decently sized segment of the Playdip community.
Yonni (136 D(S))
30 Jan 16 UTC
(+1)
+1 for keeping monthly updates.
jmo1121109 (3812 D)
30 Jan 16 UTC
(+1)
Well, having seen TheGhostmaker's work in the past and having worked with A_Tin_Can, I'd be quite excited to see them working together to come up with an integrated scoring system. I know TGM has more mathematical know-how than pretty much anyone else on the site, and ATC has the coding expertise to make pretty much any system TGM comes up with run.
peterwiggin (15158 D)
31 Jan 16 UTC
Hey TGM. First, I want to thank you for your contributions to the site. Flawed as it is, GR was still a huge improvement over points, and part of the reason I came over to webDip from Bounced.

I have a few questions/comments about your earlier post.

First, as an engineer myself, I know it’s often tempting and easier to just use familiar jargon, but I think the site would be better served if we all took the time and effort to explain things using as few equations and as little jargon as possible. We've worked hard to keep the discussion accessible, so I think there's value in taking the extra time to rephrase in an easy to understand way.

How stable a player's rating is shouldn't have anything to do with how high or low it is, but with some measure of how certain we are about his rating.

ATC tried different values of the 17.5 number. At smaller numbers, the rankings are very unstable and less predictive. When you don't do any scaling at all, the ratings are more stable and actually more predictive.

I don't understand how your proposed loss function measures any sort of sensible loss. Maybe you could clarify?

I’m also not sure I understand your concerns about the differences between points and rankings. I understand that you were a bit confused about how SOS worked, but apart from that, did you have concerns? Why can’t improving your ranking in the game be equivalent to improving the number of points you get out of it?
VillageIdiot (7813 D)
31 Jan 16 UTC
@Oct - That tournament has special scoring. If the final game ends in a draw then they select the tournament winner based on their seeding from the preliminary rounds. For somebody there it's most definitely an incentive to draw, regardless of the number of participants in it.
TheGhostmaker (1545 D)
31 Jan 16 UTC
(+4)
"How stable a player's rating is shouldn't have anything to do with how high or low it is, but with some measure of how certain we are about his rating."

There were reasons for choosing what I did, but data seem to show that it was worse for predictive power. Experiments trump intuition, so not much point in discussing it!

"I don't understand how your proposed loss function measures any sort of sensible loss. Maybe you could clarify?"

If all games ended as solo wins, then the expected result would be the winning probability, and the product of E_i^R_i would just be E_j, where player j was the game winner. Thus what I've written down is the log likelihood.

We now generalise this to allowing 'partial wins', hence the formula. Both are maximised in expectation by the true probabilities of the outcomes, but I think log likelihood makes more sense in this context.
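
To make the measure concrete, here is a small Python sketch of SIGMA(log(E_i^R_i)), which is the same as summing R_i * log(E_i) over the players. The function name and the three-player numbers are invented for illustration, and results are assumed to be fractions of the pot that sum to 1.

```python
import math

def log_score(expected, results):
    """Sum over players of R_i * log(E_i).

    expected: predicted share of the pot for each player (E_i).
    results:  actual share of the pot for each player (R_i).
    Players with R_i == 0 contribute nothing, so they are skipped.
    """
    return sum(r * math.log(e) for e, r in zip(expected, results) if r > 0)

# Solo win by the first player: the score collapses to log(E_j).
forecast = [0.5, 0.3, 0.2]
solo = [1.0, 0.0, 0.0]
print(log_score(forecast, solo), math.log(0.5))

# Because the score is linear in the results, plugging the true expected
# results in as "results" gives the expected score of a forecast.
truth = [0.5, 0.3, 0.2]          # hypothetical true expected results
honest = log_score(truth, truth)
skewed = log_score([0.4, 0.4, 0.2], truth)
print(honest > skewed)           # the honest forecast scores higher
```

Higher (closer to zero) is better here, so a rating system would be tuned to maximise this score rather than to minimise it.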

"I’m also not sure I understand your concerns about the differences between points and rankings. I understand that you were a bit confused about how SOS worked, but apart from that, did you have concerns? Why can’t improving your ranking in the game be equivalent to improving the number of points you get out of it?"

Three concerns here:

In trueskill, who is above or below you matters. You can finish at a lower rank, yet if you beat the experienced, good players and lost only to new players with higher uncertainty in their ratings, you can do better than you would by finishing higher in the ranks with the other players swapped round.

Secondly, you can get more points in a 3-way SoS draw by having 10 SCs versus 12 and 12 than by having 9 SCs versus 8 and 17, but your rank is lower in the former.

Lastly, even if having more points guaranteed a higher rank, the two wouldn't move proportionally. It might be worth taking a risk that could gain you lots of points (say, going from largest player in a draw to a solo win) at the cost of possibly provoking an alliance against you and being eliminated. If that gain would not be rewarded in your rank in the game, you should play differently depending on whether you care about points or trueskill. Conversely, in a tight draw, you'd like to gain an extra SC to leapfrog a player in rank, but for points that's not worth as much, and if it's risky to do, you wouldn't want to try it.
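
The second concern above can be checked with a little arithmetic. This is a minimal Python sketch, assuming a drawn SoS pot is split in proportion to the square of each survivor's supply-centre count; the function names are made up for illustration.

```python
def sos_shares(centres):
    """Share of the pot for each player in a draw, assuming the pot is
    split in proportion to the square of each supply-centre count."""
    total = sum(c * c for c in centres)
    return [c * c / total for c in centres]

def rank_of(index, centres):
    """1-based rank by supply-centre count (1 = most centres)."""
    return 1 + sum(1 for c in centres if c > centres[index])

# "You" are the first entry in each list; both boards total 34 centres.
draw_a = [10, 12, 12]
draw_b = [9, 8, 17]

share_a = sos_shares(draw_a)[0]
share_b = sos_shares(draw_b)[0]

print(f"A: share {share_a:.1%}, rank {rank_of(0, draw_a)}")
print(f"B: share {share_b:.1%}, rank {rank_of(0, draw_b)}")
```

Under that assumption, draw A pays 100/388 of the pot (about 25.8%) for third place by centre count, while draw B pays 81/434 (about 18.7%) for second place, so the better points result comes with the worse rank.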

Hope that helps, sorry for any Autocorrect mistakes, I'm on my phone.


134 replies
steephie22 (182 D(S))
28 Jan 16 UTC
Design Competition
See broadexpert.com if you want to help with the design of my start-up company. You may make some money. I'm not going to a design company for a reason though :-)

Meanwhile, I want to start a discussion: I have a debating competition coming up next week and one of the statements will be: 'High school students lack ambition'. If I'm against this statement, I thought it would be a good idea to bring up the company. I'm not sure whether that's socially acceptable though?
52 replies
Open
Jamiet99uk (1307 D)
29 Jan 16 UTC
(+2)
POINTS PER SUPPLY CENTRE
January is almost at an end. Can we anticipate an early publication of the report into the Moderators' grand experiment, and their verdict on the success (or otherwise) of their trial?
30 replies
Open
KingCyrus (511 D)
22 Jan 16 UTC
(+2)
Roe v. Wade
141 replies
Open
abgemacht (1076 D(G))
29 Jan 16 UTC
(+1)
I will Survive
Interested in knowing why we can't ditch Survive stats, but don't want to clutter the other thread...
86 replies
Open
CommanderByron (801 D(S))
30 Jan 16 UTC
(+1)
Volunteer
It seems ATC and others are stressed about all the changes going on. I need a volunteer to show up at his house and give him a massage. Any takers?
9 replies
Open
Valis2501 (2850 D(G))
07 Oct 15 UTC
(+4)
School of War; Fall 2015
This thread is for the Fall 2015 class of the School of War. Please be courteous to those running the game and respect any reasonable requests they may make. This semester will be taught by Professors The Hanged Man and Hellenic Riot. gameID=168281
406 replies
Open
brofistme (100 D)
30 Jan 16 UTC
JOIN LIVE GAME
NOW NOW NOW
9 replies
Open
brofistme (100 D)
30 Jan 16 UTC
JOIN THE LIVE GAME
please
2 replies
Open
Jamiet99uk (1307 D)
29 Jan 16 UTC
Screen shots
Is it against the rules to send screen shots to a player, in an attempt to prove that something has been said to you in private press?
31 replies
Open
TrPrado (461 D)
03 Jan 16 UTC
(+9)
Mafia XVI Game Thread
See inside for buckets of fun.
4426 replies
Open
abgemacht (1076 D(G))
28 Jan 16 UTC
(+1)
Grand Prix and Boroughs 2016
Two tournaments you guys should definitely try to make it to!
Grand Prix at TotalCon http://www.totalcon.com/
The Boroughs 2016: www.TheBoroughsDiplomacy.net
7 replies
Open
Riotleader007 (100 D)
29 Jan 16 UTC
Newish
Hey, I am new to the website but have played the board game, so I am not a complete newbie haha. I want to play some on here and figured someone can set up a fun starter game and we all have a little fun! :) Game on!
4 replies
Open
ishirkmywork (1401 D)
26 Jan 16 UTC
Russian Opening to Silesia Spring '01
This opening has become a little personal favorite of mine (if I am Russia, France or Italy), and I am wondering if anyone has thoughts, experience, or tactics to share about it. Convincing Russia to do it if you are France or Italy can be difficult, but it is well worth it for all involved. You need an imaginative Russian though.
33 replies
Open
00matthew2000 (454 D)
28 Jan 16 UTC
New Vdiplomacy game if anyone is interested.
http://www.vdiplomacy.com/board.php?gameID=25187
1 reply
Open
charlesf (100 D)
28 Jan 16 UTC
1648 Variant: Join the Tournament!
I shall be running a tournament featuring my 1648 (v5.8) variant.
1 reply
Open
abgemacht (1076 D(G))
26 Jan 16 UTC
(+5)
Supreme Court Cases Thread
Utilize this thread by posting your Supreme Court Cases here and only here.
23 replies
Open
Tolstoy (1962 D)
28 Jan 16 UTC
(+2)
Racist or not racist?
White woman starts fire that burns several hundred thousand acres and hundreds of homes. Not charged with a criminal offense. Apache Indian starts fire that burns several hundred thousand acres and hundreds of homes. Charged, convicted, and sentenced to ten years in prison. Is this proof of the systemic racism of the American court system or not? Please cast your votes below.
17 replies
Open
bo_sox48 (5202 DMod(G))
03 Jan 16 UTC
(+1)
Wildlife Preserve "Occupied"
Occupied by "armed Oregon militia." Not terrorists. Even though they're armed and provoking a standoff with the federal government and putting countless lives at risk.

http://www.ibtimes.com/armed-oregon-militia-led-bundy-family-takes-federal-building-support-hammond-ranchers-2246986
147 replies
Open