Sunday, May 26, 2019

Most popular time controls on lichess

Have you ever wondered which time controls were the most popular online? I sometimes do, because it's sometimes more difficult to get a game with certain time controls than others (looking at you, classical games!). It can be useful to know the relative frequency of time controls when planning on an online game.

TLDR: Bullet and Blitz reign supreme, longer time controls less popular

As you could expect, Bullet and Blitz games are more common than longer games (like Rapid and Classical). In fact, Rapid and Classical games represent approximately only 15% of the most popular time controls, while Bullet and Blitz represent about 79% of all games (UltraBullet accounts for 1% of the games; the remaining 5% least popular time controls contain a mix of time controls and are not shown here):


As can be seen from this graph, the most common time control is 1 minute per side with no increment (1+0; 25% of all games), followed by two Blitz time controls (in light green), 3+0 (18% of games) and 5+0 (17% of games). The only Rapid time control (in yellow) that made the list is the 10-minute game (10+0; 11% of games), while the only Classical time control (in red) that made the list is 15+15 (4% of games). UltraBullet remains largely unpopular, with the 15-second game accounting for less than 1% of all games.

Methodology

To examine the distribution of time controls in online games, I've used a subset of the games played on lichess in April 2019. There were a total of 33,565,536 games played on lichess that month, and in the interest of saving processing time, I used 2.8M of those games (about 9% of the games played that month, i.e. about 2.5 days of chess games). This means that the percentages above are estimates for April 2019 and not the exact numbers. (In fact, it would be interesting to look at whether preferences for time controls change over the course of the week.)

To parse the PGN file, I've used the library python-chess 0.28.0 in Python 3.6.7 and extracted the "TimeControl" tag for each game.

Remaining Questions

Of course, this simple analysis points to more questions than answers. Do preferences for time controls vary over the course of the week? Perhaps more interestingly, do preferences vary by websites? For example, if someone is interested in playing a longer time-control game, should they search for it on lichess or would they do well to look elsewhere? (I doubt it, but it remains to be investigated.)

Also, because Rapid and Classical games take more time to play out than Bullet and Blitz games, it's possible that the most time playing chess (as opposed to the number of games) is actually spent playing longer time-control games. But I haven't made those calculations...

Wednesday, December 26, 2018

IM scalp in Chess960

Beat an IM rated 2237 in Chess960 (ranked 28th on lichess.org) in a 3+2 game:

Sunday, December 16, 2018

Chess960

Chess960 is a super fun chess variant where the starting position of the pieces on the back rank is randomized (though still symmetrical). It gets its name from the fact that there are 960 distinct starting positions that can arise from the randomization of the back rank. Chess960 is also known as Fischer Random Chess, since Bobby Fischer is the one who invented this variant. The main purpose is to steer away from the modern game, which is based a lot on memorization of opening theory, and to give way to strategic creativity.


Rules for Starting Positions

According to our good friend Wikipedia, there are two restrictions for the starting position, both of which seem rather obvious: the bishops must be on opposite-color squares, and the king must be placed between the rooks.


Castling

I find castling a bit tricky in Chess960. The confusing thing about castling in Chess960 is that even though the king and rooks are typically not in their standard position, the rules for castling remain pretty much the same as in standard chess, which can lead to weird stuff--sometimes you can even castle and have only one of the two pieces change positions, or have the king and rook move many squares at once!

The key thing to remember when castling in Chess960 is that the resulting king and rook position is exactly the same as in standard chess. This means that castling a-side (as it's called in Chess960) leads to the king sitting on c1 (or c8) and the a-side rook sitting on d1 (or d8), while castling h-side leads to the king sitting on g1 (or g8) and the h-side rook sitting on f1 (or f8). As in standard chess, you cannot castle if a square in the king's path is attacked or occupied by a piece, or if the destination square for the king or rook is occupied by a piece. Note that just like in standard chess, a-side castling is notated O-O-O, and h-side castling is notated O-O.

I suppose the mechanics for castling online depends on the specific website, but on lichess, you can castle in Chess960 by dragging the king to the rook with which it wants to trade sides. Here's an example game in which I tried to castle but it all went wrong: rather than dragging the king to the rook, I dragged the king to its intended destination square, which was interpreted as a normal king move. On move 5, I tried castling a-side, but I played 5. Kc1?:



Stockfish evaluates the position has having gone from about +0.5 (a half-pawn advantage for White) to about -1.7 (nearly a two-pawn disadvantage for White!) after 5. Kc1. In fact, following this mishap, the rook on a1 didn't get to enter the game until move 32:




Popularity

On December 16, Chess960 was the most popular of non-standard chess variants on lichess, even ahead of UltraBullet, which is standard chess with an extremely fast time control. This table shows the number of people who played each variant this week:

Popularity RankVariantNo. of Players This Week
1Blitz214213
2Rapid115104
3Bullet102849
4Classical58231
5Chess9604955
6UltraBullet4858
7Crazyhouse4255
8Antichess3731
9Atomic2512
10Horde1429
11Three-Check1383
12King of The Hill856
13Racing Kings809

It's clear that non-standard chess variants are much less popular than standard chess variants (which vary only in their time controls), but Chess960 fares well relative to other non-standard variants.


Seeing Chess960 in Action

If you want to see Chess960 in action, you can take a look at the top-rated Chess960 game on lichess being played right now on LichessTV.

You can also take a look at my most recent stream on Twitch, in which I played the daily Chess960 tournament (5-minute games)... I played a total of 9 games and finished 9th (win rate = 56%, performance = 1880):


Sunday, January 14, 2018

Chess Analytics: Cluster analysis & Patterns of time usage in 10-min games

In this installment of Chess Analytics, I've decided to investigate the question: What are the distinct patterns of time usage during 10-minute online chess games? In doing so, we will also attempt to determine just how many distinct patterns we can distinguish.


Cluster Analysis

To attempt to answer this question, we can use cluster analysis. Cluster analysis is possibly also known as "segmentation analysis" or "taxonomy analysis", though I am not familiar with those terms.

Through cluster analysis, we can group chess games into similar (homogeneous) subgroups, based on how similar the games are on certain variables. The goal of cluster analysis is to obtain games that are very similar within groups, and very different between groups (if that's at all possible). So to examine patterns of time usage during online chess games, we can calculate the time taken at each move by the players, and perform a cluster analysis on those time values. This will give us subgroups of games that are similar in the time taken at each move.

If you're familiar with machine learning, you'll know that cluster analysis is a type of "unsupervised" learning algorithm: The algorithm will categorize the chess games in different groups without having known beforehand which group the games belong to (we just don't know what they are yet!). The term unsupervised is used to contrast with "supervised" learning algorithms, in which the algorithm learns to categorize observations based on a sample of observations that have already been categorized into groups.


The Data

To look at the patterns of time usage in chess games, I used the first 200 MB of the rated games played in October 2017 on the internet chess server lichess, retaining only those games for which the computer evaluation of each move was available (this was not necessary for this particular analysis, but this same dataset could be used for other analyses as well). In total, this gave us 7,387 games. However, because the current investigation is about time usage, I restricted the analysis to those games whose time control was 10 minutes in total, with or without increment (for example, 10+0, or 4+9), and which lasted at least 25 moves, so as to look at the patterns in the first 25 moves of the game. So in the end, a total of 843 games fit those criteria. The goal of this analysis is to group those 843 games into homogeneous subgroups of games that are similar in their patterns of time usage.

To give you an idea of what the dataset looks like, here's a screenshot of the first few rows (this is a Pandas DataFrame in Python):


Because of the likely relationship between the time taken at each move by both players during the game, here we focus on the time pattern for White players (but the results were similar when looking at the Black side instead). The first column, clocks_diff_31, indicates the difference in the remaining time of the White player (in seconds) between the third ply (i.e., White's second move) and the first ply (i.e., White's first move). In other words, the first column gives the number of seconds White spent on their second move. Analogously, the second column, clocks_diff_53, indicates the time taken by White between the third and fifth ply (i.e., during White's third move), and so on, up to the 25th move. Note that on lichess, the clock only starts ticking at the second move, so we don't have access to the time taken during the first move for these games.


Number of Patterns & Hierarchical Clustering

Perhaps the most important decision that goes into a cluster analysis is choosing the number of clusters to detect, or in this case, the number of distinct patterns of time usage we can distinguish in the games. One data-driven way to do this is through hierarchical clustering, where chess games are grouped two by two based on the similarity of their pattern of time usage, then these new pairs are grouped with another pair again based on similarity, and so on, until only one big cluster--containing all the games--is created. In this way, clusters are formed sequentially, and we can decide where to stop the clustering.

We can visualize this hierarchical clustering through a dendrogram (don't do like me and misspell it "dendogram" every time!). A dendrogram shows which games are being grouped together due to their similar time-usage patterns at each iteration of the process, starting with no clusters at the bottom, and one giant cluster that includes all games at the top.

Here is a simple example of a dendrogram (courtesy of StackExchange):


In this dendrogram, 10 observations (or if you prefer in our case, 10 chess games) are being sequentially grouped. The horizontal x axis indicates the game index (which is arbitrary), and the vertical y axis indicates the dissimilarity between games within a cluster (i.e., the "distance" between games). Starting from the bottom of the dendrogram, we can see that games #2 and #10 are most similar in their pattern (the distance between them is the smallest at < 0.2) and are grouped together first. Next games #5 and #8 are grouped; then games #7 and #6; then game #9 is most similar to the "average" of games #5 and #8 and so is grouped with that cluster; and so on until the cluster formed by games #3, #7, and #6 is combined with the cluster formed by the other games.

Choosing the number of clusters means choosing a maximum difference (or distance) between games within each cluster, as represented by the y axis. So in the simple example above, we could decide that 4 clusters is appropriate by drawing a horizontal line somewhere around a distance of 0.6, thereby yielding the following clusters: (1) game #3; (2) games #7 and #6; (3) games #1 and 4; and (4) games #2, 5, 8, and 10.

Evidently, the current analysis--with 843 chess games rather than 10--yields a much more dense dendrogram (click on the picture to resize):


Based on this dendrogram, a reasonable choice for the number of patterns seems to be 3, and if the three-cluster solution yields patterns that are difficult to interpret, then another reasonable choice is 2. (The Python library that generated this dendrogram seems to suggest 2 clusters, given the color coding.) The dendrogram might also lead us to conclude that 4 clusters is not very reasonable, as the distance between the third and fourth clusters would be very small.

The good news is that nothing prevents us from trying out several alternatives. A convenient way to choose the number of clusters is to try a few reasonable numbers and choose one that yields groupings that make sense given the context (here, time usage patterns) all the while providing groups with a large-enough size (a cluster that includes only a handful of games is not very useful). In some applications, obtaining clusters of approximately equal sizes is desirable, but that's not really the case here. So we can try out 3 clusters, and also examine the two-cluster solution in case it ends up being more interesting or interpretable.


Three Patterns of Time Taken During a 10-Min Game

The graph below shows the three distinct patterns of time usage detected in a three-cluster cluster analysis. The graph shows the median time taken at each move, so that 50% of the games of a given group took less time than is shown on the graph, and 50% of the games took more time than is shown. In total, 158 games were categorized in pattern #1; 438 games were categorized in pattern #2; and 247 games were categorized in pattern #3.


Let's try to describe these patterns. It looks like they are not too surprising and make sense. Pattern #3 (green line at the bottom) represents games that are played rather quickly throughout, with perhaps a slight increase in move time as the game goes on. Pattern #1 (blue line at the top), on the other hand, represents games that are played much more slowly, especially starting around move 7-8, with some peaks a little bit later in the game; these games also seem to pick up the pace again after move 15, but it's difficult to say given that our analysis ends at move 25. Finally, Pattern #2 (orange line in the middle) represents games that are in-between the faster-paced and slower-paced games of Patterns #1 and #3.

Note that the graph showing the average time taken at each move (rather than the median time) is nearly identical to this one.

The three patterns of time usage found above are easy to interpret and make sense. To make sure that we're not missing out on a better solution, let's do as promised and look at the two-pattern solution. We get the following graph, which again shows median time taken at each move in each pattern:


Pattern #2 (orange line on top) is exactly the same as Pattern #1 above (represented by the same 158 games as before). This is because this pattern is the pattern that is most different from the other patterns (see the games in green on the left of the dendrogram above). With the two-pattern solution, the two faster patterns are now combined into Pattern #1 (blue line at the bottom), with a total of 685 games (i.e., the sum of 438 + 247 games from Patterns #2 and #3 in the three-pattern solution). These two patterns are also simple and easy to interpret, but nothing is gained from the three-pattern solution; in fact, we've lost the distinction between the faster-paced games in terms of their pattern of time taken. So I prefer the three-pattern solution here.


Conclusion from the Cluster Analysis on Time Patterns in 10-Min Games

Overall, from the cluster analysis above we could draw the conclusion that there are 3 general patterns that 10-minute online chess games follow in terms of time usage:
  1. Games that are faster-paced in the opening but that slow down noticeably at the end of the opening (around move 8; this is Pattern #1 in the three-cluster cluster analysis);
  2. Games that are relatively fast throughout, though slightly longer after the opening (this is Pattern #3 in the three-cluster cluster analysis);
  3. Games that are neither fast nor slow, and whose moves also tend to slow down slightly after the initial opening moves.


What's Next?

This analysis was limited to some 800 online games played in October 2017 on lichess.org. I doubt that patterns would change much over time (say if we were to do this analysis again in October 2018), but perhaps patterns do change by chess servers (maybe players are more or less impatient on chess.com, for example?), and my guess is that patterns are different in over-the-board games rather than online games. (If you have a good dataset for OTB games, please let me know.) This analysis also doesn't show what happens beyond the 25th move, for example during endgames.

Another unanswered question that seems interesting is the relationship between pattern of time taken and game outcome. However, not only might the effect of time usage on game outcome vary from player to player, but there probably is a very strong relationship between time taken at each move by White and time taken by Black. Nonetheless, one (odd) possibility is that some patterns are associated with more wins for White, while other patterns are associated with more wins for Black... I haven't checked, but I do think it would be very odd!

Friday, January 5, 2018

Transposition Gone Wrong: Attempting to reach the Benko Gambit

Happy New Year! May 2018 bring us the best chess... Also, fame and fortune.

I love playing the Benko Gambit as Black, so I try to reach it whenever I can. But it doesn't always go so smoothly. Normally the Benko Gambit is reached after 1. d4 Nf6 2. c4 c5 3. d5 b5:



Typically White will play cxb5 at some point, and Black will respond with a6 (a pawn gambit), and White can choose to take the pawn with bxa6 (gambit accepted--the most common response) or to push with b6 (gambit declined). When the gambit is accepted, Black will fianchetto the king's bishop, put the knight on f6, play d6 (some players prefer e6), castle kingside; White will typically get e2-e4 in (even though this is not an e4 opening). A position like this might arise:


Black will eventually recapture the a6 pawn, and White will have a passed a-pawn, but Black will have tremendous activity on the queenside, with moves like Bxa6, Nbd7, Qb6, Rfb8 (after Black castles kingside), etc. Note that in the "old main line", which I don't particularly fancy, Black plays Bxa6 before White plays e4, so that when White plays e4, Black responds with Bxf1, and White plays Kxf1 and will manually castle their king with g3 and Kg2.

I've had quite a bit of success playing the Benko as Black. This graph shows the win percentage (in green), draw percentage (in blue), and loss percentage (in red) by opening when I play a rapid or classical game (> 8 minutes) as Black on lichess.org. In fact, my highest win percentage comes from games where I've played the Benko (10 games, win percentage = 80%; see middle of the graph):


So I'm somewhat justified in trying to reach this opening when I can. Against 1. e4, there are a few ways to try to reach this opening. An uncommon one is the St. George defense, with 1. e4 a6 (since both e4 for White and a6 for Black are moves in Benko lines). A more common opening that allows one to hope to reach a Benko position is the Robtasch (or "Modern") defense with 1. e4 g6, which is what I played in this game (incidentally, notice that the Robatsch seems to be one of my worst openings in the graph above). In this game I was facing a stronger opponent (2142 blitz on chess.com), and we reached the following position after 1. e4 g6 2. d4 Bg7 3. Nf3 a6 4. c4 d6 5. Nc3 Nd7 6. Be2 c5:


Up to here I've played all moves that are consistent with the Benko so that I can transpose into it were the opportunity arise. And we're really close to it: If White plays d5, I'll respond with b5 and we will have reached a position consistent with the Benko gambit. But my opponent doesn't have to humor me, and indeed he chose not to, and simply played 7. O-O. But I was not ready to give up though I should have been, and I played 7. ... b5? 8. cxb5:


And here I played a move that would simply be impossible to play in the Benko gambit since White would have played d5 earlier. I played the ridiculous-looking 8. ... cxd4?! 9. Nxd4 axb5?? (9. ... a5) 10. Bxb5:


And once the smoke has cleared, this is not a Benko gambit, and White has two connected passed pawns on the queenside! (With a normal Benko gambit White would only have a passed a-pawn.) Stockfish gives White a two-pawn advantage at this point, and my opponent went on to kick my butt. I later gave up one of my knights for his two pawns but it was already too late.

So, lesson learned: Don't part with all three of the c-, b-, and a-pawns in the Benko Gambit, because White will have two connected passed pawns on the queenside. It seems obvious, but I had to learn the hard way.

The full game is available here:

Friday, December 29, 2017

Problem #10

I found this puzzle on lichess.org, and failed to find the solution. White to play:


[Show solution]