Playdiplomacy’s scoring system

My previous post looked at webDiplomacy’s scoring system and I promised, as part of the post, that I’d be looking at both webDiplomacy and Playdiplomacy, specifically the bits that I feel are not as good as they could/should be. So now I’m moving on to Playdip’s system of scoring games.

Ratings not scoring

Playdiplomacy uses a ratings system rather than a scoring system. Although you can learn something about how the system works under the STATS tab on Playdip’s games site, the best place to find out how it works is on the Forum (if you want a more detailed description, you need to go here).

The difference between ratings and scoring systems is that a scoring system will score each game individually; under a scoring system, the score you receive from the game is based on the outcome or result of the game. Under a ratings system, the score you receive from a game also depends on the players you play against in the game.

Playdip’s ratings system is a modified Elo scoring system. Elo scoring tends to be used in chess and other one-on-one games; as Diplomacy features multiple players, it has been adapted to meet the needs of Diplomacy.
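
To make the Elo idea concrete, here is a minimal sketch in Python. The first two functions are the textbook Elo formulas used in chess. The multi-player function is purely an illustrative assumption (treating a seven-player game as a set of pairwise contests); Playdip keeps its actual adaptation secret, so this is not their method. The K value of 32 and the function names are likewise hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook Elo expectation for A against B (a value between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating: float, expected: float, actual: float, k: float = 32.0) -> float:
    """Textbook Elo update: move the rating by K times (actual - expected)."""
    return rating + k * (actual - expected)


def multiplayer_update(ratings: dict, results: dict, k: float = 32.0) -> dict:
    """ASSUMED multi-player adaptation, not Playdip's real formula.

    ratings maps player -> current rating.
    results maps player -> game score (e.g. 1.0 for a solo, 0.0 for a loss).
    Each player is compared pairwise with every other player and the
    expected and actual scores are averaged over those comparisons.
    """
    new_ratings = {}
    for player, rating in ratings.items():
        others = [p for p in ratings if p != player]
        expected = sum(expected_score(rating, ratings[o]) for o in others) / len(others)
        # Pairwise result: 1 if you outscored the opponent, 0.5 if level, 0 otherwise.
        actual = sum(
            1.0 if results[player] > results[o]
            else 0.5 if results[player] == results[o]
            else 0.0
            for o in others
        ) / len(others)
        new_ratings[player] = update_rating(rating, expected, actual, k)
    return new_ratings
```

The point to notice is that the gain from a win shrinks as your opponents' ratings fall below yours, which is why who you play matters under a ratings system.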

Perhaps one frustrating thing about Playdip’s system is that it is pretty much a secret. This was a deliberate decision to prevent it being “played”. It was built on the lessons learned from Playdip’s original scoring system, when players were able to cherry-pick games and thereby raise their site ranking.

webDip has a similar problem. Have a look at Playdip’s “Hall of Fame” where they maintain a record of the final positions when the old system was discontinued. Notice who was in second place. Now look at webDip’s “Hall of Fame” under “All Time Points” and notice who is top! This player was renowned on Playdip for cherry-picking games; they stopped playing on Playdip when the system was changed. Funny, that.

First problem: Not utilising a ratings system properly

I’ve got to say, I’m generally happy with Playdip’s system. I don’t have a problem with a scoring system, if it’s used properly, to score an on-going system of games. That wouldn’t simply include a raw score, ie a total of points scored in all games; it would need to include a component involving the number of games played, and – possibly – some sort of Fading Echoes system, whereby games finished over a certain time period were weighted, which would keep the more recent results on top.
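
As a rough illustration of what such a scoring system might look like, the sketch below averages a player's points per game (so the number of games played matters) and weights each game by its age. The exponential half-life weighting and the function name are assumptions made for this example; a real Fading Echoes scheme may weight games quite differently.

```python
def weighted_score(games, half_life_days: float = 365.0) -> float:
    """Age-weighted average of points per game.

    games is a list of (points, age_in_days) pairs.
    ASSUMPTION: a game's weight halves every half_life_days.
    """
    if not games:
        return 0.0
    weights = [0.5 ** (age / half_life_days) for _, age in games]
    total = sum(points * weight for (points, _), weight in zip(games, weights))
    # Dividing by the total weight turns the raw total into an average,
    # so simply playing more games doesn't inflate the score.
    return total / sum(weights)
```

With this weighting, a good result from last week counts for more than the same result from two years ago, which keeps the more recent results on top.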

Under an Elo system of scoring, though, the idea is that you do better when playing against players who are rated similarly. Playdip doesn’t do this.

Again, this is a conscious decision. Playdip has a wide variety of games – as does webDip – and it can therefore be difficult to fill games quickly.

Playdip’s variables include turn length; DIAS or non-DIAS scoring; open or anonymous draw voting; three ways of assigning powers to players; five game types; seven map variants; five game variants; Escalation, Fog of War and Stuff Happens variants which can be applied to games; 2 different styles of map, and 7 different icon designs; games can be played with fixed deadlines or “finalizing” (where deadlines can end if all active players finalise their orders). Games may be “Protected” or not: Protected games extend the game by a full deadline if a player drops out, and by a full deadline if a player doesn’t enter orders on time (one chance only!). They can also feature “1st turn NMR protection” or not: if a player misses the deadline on the first turn they are removed from the game and the game will reset until a new player joins.

Additionally, games can be open to everyone or Ambassador-only games. Playdip rates players based on two broad definitions of reliability. Ambassadors need to have played the last three games to the end, and have entered orders – when they could – at least 97% of the time. [There is an additional rank of Star Ambassador which is based on how communicative players are. This rank is honorary, however, having no impact on the variables when creating a game (unless the game is password protected).]
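
The Ambassador rule described above is simple enough to write down. This is only a sketch of the rule as I've described it; the function name and the way the data is passed in are hypothetical, and the site's own check may handle edge cases differently.

```python
def is_ambassador(games_completed, orders_entered: int, orders_possible: int) -> bool:
    """Check the two reliability conditions described in the text.

    games_completed: list of booleans, oldest first, True if the player
                     played that game to the end.
    orders_entered / orders_possible: phases where orders were entered,
                     out of the phases where the player could enter them.
    """
    recent = games_completed[-3:]
    if len(recent) < 3 or not all(recent):
        return False  # needs the last three games played to the end
    if orders_possible == 0:
        return False  # ASSUMPTION: no order history means no rank yet
    # Integer comparison avoids floating-point rounding at exactly 97%.
    return orders_entered * 100 >= 97 * orders_possible
```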

In short, there are so many options that games can be difficult to fill. Even things like style of icon can stop some players entering a game! If, then, the site made it possible to limit games to players within ratings scores, filling games would be even more difficult. In essence, the site decided that reliability was of more importance than making full use of the ratings system.

To me, while I – like most players! – would prefer to play games with reliable players, this is unnecessary when creating a game if a ratings-based variable were used instead. This is because the higher rated players should be the most reliable almost by default.

This doesn’t make sense early in a person’s playing career, of course. You might play your first ranked game and win it. That would increase your rating pretty significantly. Then, in subsequent games, you might be one of those incredibly annoying players who drops from a game when the going gets tough. While this would have a negative impact on your ratings, it would mean that you could enter higher rated games until your rating settled down.

It could be, of course, that until you had completed a set number of games, you could only play in games that were “Open”, ie games that had no rating variable. This would protect the reliability of other games. After this probationary period, you would then be restricted to games appropriate to your rating if the game is scored.

There is no reason for unscored games to rely on ratings at all. After all, it is only scored games that affect your rating on Playdip; this isn’t true of reliability, I think.

It would mean games were difficult to fill if players could define the ratings boundaries they want to play in when creating a game. You’re relying on enough players within a ratings band to be available and willing to join your game – with all the other variables to consider.

However, if the site were to rank players on reliability, then this would resolve the issue. Players above a certain rating could be called Ambassadors; players below a certain rating could be called Diplomats, and players in the mid-section called Envoys.

The player who created the game would define the rank of the game. The game could then only be played by players of that rank or above. It wouldn’t be beneficial for a player to play in a lower ranked game because they wouldn’t score as well if they won or drew the game, and they’d score worse if they lost, but I don’t see why they shouldn’t be allowed to enter a lower ranked game, at their own risk.

The only issue would then be joining a game as a replacement. Currently the site uses a ‘Ratings Shield’, which allows players who completed a prior game to enter a game as a replacement and not have the result impact their ratings if they don’t leave the game early. I’d suggest that, if ratings were used as I describe above, then players could enter a game as a replacement and it not affect their ratings in the same way – they’d be playing as if it were a no rank game, whatever the outcome.
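
The rank idea above can be sketched in a few lines. The rank names come from my suggestion; the rating thresholds (1000 and 1200) and the function names are invented purely for illustration, as I've no idea where the site would actually draw the lines.

```python
# Ranks from lowest to highest, as suggested in the text.
RANKS = ["Diplomat", "Envoy", "Ambassador"]


def rank_for(rating: float, envoy_min: float = 1000.0, ambassador_min: float = 1200.0) -> str:
    """Map a rating to a rank. The two thresholds are HYPOTHETICAL."""
    if rating >= ambassador_min:
        return "Ambassador"
    if rating >= envoy_min:
        return "Envoy"
    return "Diplomat"


def can_join(player_rating: float, game_rank: str) -> bool:
    """A player may join a game of their own rank or any lower rank."""
    return RANKS.index(rank_for(player_rating)) >= RANKS.index(game_rank)
```

So an Ambassador could drop into a Diplomat-ranked game at their own risk, but a Diplomat couldn't join an Envoy-ranked one.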

Second Problem: Comparing unlike games

For me, rankings on any gaming site only make sense if you’re comparing like-for-like games. This isn’t the case on Playdiplomacy.

Playdip do separate some ratings. If you go onto the STATS tab and look at Player Stats you’ll see there’s a drop down menu that lets you sort the ratings by different variables. The default sort is for all scored games. However, the preeminent site rating is for Standard games – that is 7-player Diplomacy played under standard rules. The others are:

  • Gunboat: All games played under Gunboat rules – no communication at all.
  • Fog of War: All games played using the Fog of War variant which can be applied to any game.
  • Classic: All games played using the standard rules.
  • Milan: All games played on the Milan map.
  • Ancient Med: All games played on the Ancient Mediterranean map.
  • 1900: All games played on the 1900 map.
  • Versailles: All games played on the Versailles map.
  • Hundred: All games played on the Hundred map.
  • War in the Americas: All games played on the WitA map.

There are a number of variants not in the list, presumably because they aren’t played often enough. However, within all these variants, the actual games might be very different from each other. Even the ‘Standard’ ratings include a lot of variables that aren’t recognised.

One of the reasons for these fairly broad definitions is that, when a member does a stat search, it uses a significant amount of server time to sort through the variables. For instance, if you looked at Gunboat games, you could sort the results by every possible variable – maps, additions, deadlines, etc. Ridiculous.

However, I’m not so much worried by the way the stats are seen as how the ratings are calculated. Let’s look only at the Standard game. Below I’m going to discuss how the variables in a standard game affect the game – even at the very beginning.

Power Allocation

On Playdip, powers can be allocated at the start of the game using one of three options:

  • Random: Each player is allocated a power to play randomly (or as randomly as possible when it’s an automated system). This is the equivalent of players pulling a playing piece out of a bag after being randomly given an order to draw them in.
  • Preferences: This system allows players to enter an order for the powers they wish to play.
  • First Come – First Served: Players choose which power they wish to play when they join a game. The first player in a game gets to choose any power; subsequent players get to choose from those that remain.

Now, there’s nothing wrong with any of these systems of allocating powers. However, a game is certainly affected by the way the powers are allocated. Preferences is random to some extent: if two or more players choose the same power as their first preference, then that power is allocated randomly. This could mean that you don’t get any of your preferences if, say, your first three preferences are allocated before you get to apply your second preference. FCFS favours players who create the game, as you will very likely then be the first to join the game. You could, if you wanted, only play in games you created.
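
To show how Preferences can leave you without any of your choices, here is a sketch of one way such an allocation could work. This is my reading of the behaviour described above, not Playdip's actual code: the round-by-round structure, the random fallback for players left without a power, and all the names are assumptions.

```python
import random

POWERS = ["Austria", "England", "France", "Germany", "Italy", "Russia", "Turkey"]


def allocate_by_preference(preferences, rng=None):
    """preferences maps player -> ordered list of preferred powers.

    ASSUMED procedure: in round r every unassigned player claims their
    r-th preference if it is still free; contested powers are decided
    randomly; anyone left over receives a random remaining power.
    """
    rng = rng or random.Random()
    assigned = {}
    free = set(POWERS)
    rounds = max((len(prefs) for prefs in preferences.values()), default=0)
    for r in range(rounds):
        claims = {}
        for player, prefs in preferences.items():
            if player in assigned or r >= len(prefs):
                continue
            if prefs[r] in free:
                claims.setdefault(prefs[r], []).append(player)
        for power, claimants in claims.items():
            winner = rng.choice(claimants)  # random draw between rivals
            assigned[winner] = power
            free.discard(power)
    # Players whose preferences were all taken get whatever remains.
    leftover_powers = sorted(free)
    rng.shuffle(leftover_powers)
    leftover_players = [p for p in preferences if p not in assigned]
    for player, power in zip(leftover_players, leftover_powers):
        assigned[player] = power
    return assigned
```

If you lose the draw for your first preference and your second has already gone to someone who listed it first, you are straight down to your third choice, or worse.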

For me, in a ranked game, only one system of allocation should be used, and only Random is a fair system. If you want to play a certain power more than any other, you can choose to play unranked games.

Game Type

There are five game types on Playdip. Two of them are Gunboat and Public Press: these are communication variants. In a Gunboat game there is no communication at all; in a Public Press game communication is limited to open press that every player in the game can see. For the purpose of a Standard game, we can ignore these types, which leaves three types:

  • Regular: Games which show all the players in the game and which powers they’re playing.
  • Anonymous Countries: You know who is playing in the game but not the power they are playing.
  • Anonymous Players: You don’t know who is playing in this game.

If you know who you’re playing against, you may decide to approach the game differently. If you know who’s in the game, you may find an advantage if you can work out which powers they’re playing. If you don’t know who’s in the game at all, there’s no advantage.

Personally, while I understand the Anon Countries idea, I don’t see that this adds enough to the game to be there at all. Games should either be Regular or Anon (Players). Because in a Regular game you may build an alliance against a top player simply because they are a top player, or play to try to “play” the scoring system, for me ranked games should be Anonymous.

Protected or Not

This really doesn’t make a lot of difference as far as scoring goes, except that it gives a player coming in as a replacement a full turn to get their head around the game. This seems to be a fair choice. I don’t see why the standard application for a ranked game shouldn’t be that it’s Protected.

1st turn NMR protection

Without this in place, a game may start with a power not entering orders. This would be a great advantage to some powers and, personally, I think giving some players this advantage, while in comparable games some players don’t get it, is unfair. Either in or out – it really doesn’t matter but it needs to be consistent.

Escalation/Fog of War/Stuff Happens

I think these can be applied to a Standard game of Dip and keep the game ranked. This shouldn’t be the case. They produce very different games to a Standard game without them. Leave these out of ranked games.

Draw Proposals and Draw Voting

I don’t really care but they need to be standardised for a ranked game.

How draws are proposed is between DIAS – in which any player who survives in the game has to be included in a proposal – or non-DIAS – in which a draw can be proposed between all or just some – at least two – players who survive.

Under both systems, all the surviving players have to agree to the draw, so you can vote to allow the draw without being included in it if you want. DIAS is more in line with the way the game was designed to end in a draw, so I’d go with DIAS.

Voting is between being able to vote anonymously or having a voting decision be known. Again, I’m not really bothered which but anonymous voting seems to be the better option.

Why do they need to be standardised? Well, if there’s a reason to have them as options, there’s a reason to standardise these options. After all, if they’re options in the first place they must be important enough to the games to make them options!

Deadlines

This is a big one. On Playdip you can choose between deadlines that last:

  • 12 hours
  • 24 hours
  • 2 days
  • 3 days
  • 5 days
  • 7 days

You can also have games where the deadlines can be fixed or have the finalization option as described above. And you can set different length deadlines for different phases: movement, retreats and builds.

Ridiculously, the most used deadline is 12 hours. I really don’t understand how anyone can carry out communications with players in 12 hours. But there you go.

The problem is that, with shorter deadlines, the chance of players missing the deadline is increased; with longer deadlines the games can drag out interminably. But does this deadline variation make a difference to how the game is played?

Yes, it does. If you’re a player who uses communication a lot, you need time to do this. If you’re in a game where players are regularly missing deadlines because they haven’t judged the deadline length, then you not missing deadlines is going to give you an advantage. If all games have the same deadline it makes scoring them fairer.

The problem with taking variables away from players for ranked games is that you’re not going to please everyone. “But I want to choose my power.” “I don’t like DIAS.” “These games are too long/short.”

Well, perhaps, but nobody is saying you only have to play ranked games! If you want to play with all the variables, play an unranked game.

The main point is that, if you’re going to score games, you need to compare like with like to make the results meaningful. Again, you can choose to not play in ranked games if this is a problem but it seems silly to force anyone who’s concerned about comparing unlike games – and this is why the rankings were separated into different categories – to not play in ranked games.

Published by Mal Arky

I'm a Diplomacy nut... if you haven't guessed. I write about the game Diplomacy, mainly as played online on websites, such as Playdiplomacy, webDiplomacy and Backstabbr. I write books on Diplomacy, too. First one to be published soon!
