AlphaGo
AlphaGo is a computer program that plays the board game Go. It was developed by DeepMind Technologies, which was later acquired by Google. AlphaGo had three far more powerful successors, called AlphaGo Master, AlphaGo Zero and AlphaZero.
In October 2015, the original AlphaGo became the first computer Go program to beat a human professional Go player without handicap on a full-sized 19×19 board. In March 2016, it beat Lee Sedol in a five-game match, the first time a computer Go program had beaten a 9-dan professional without handicap. Although it lost to Lee Sedol in the fourth game, Lee resigned in the final game, giving a final score of 4 games to 1 in favour of AlphaGo. In recognition of the victory, AlphaGo was awarded an honorary 9-dan by the Korea Baduk Association. The lead-up to the challenge match with Lee Sedol and the match itself were recorded in a documentary film also titled AlphaGo, directed by Greg Kohs. The victory was chosen by Science as one of the Breakthrough of the Year runners-up on 22 December 2016.
At the 2017 Future of Go Summit, its successor AlphaGo Master beat Ke Jie, the world No.1 ranked player at the time, in a three-game match. After this, AlphaGo was awarded professional 9-dan by the Chinese Weiqi Association.
AlphaGo and its successors use a Monte Carlo tree search algorithm to find their moves based on knowledge previously "learned" by machine learning, specifically by an artificial neural network trained extensively on both human and computer play. A neural network is trained to predict AlphaGo's own move selections and also the winner of its games. This neural network improves the strength of the tree search, resulting in higher-quality move selection and stronger self-play in the next iteration.
After the match between AlphaGo and Ke Jie, DeepMind retired AlphaGo, while continuing AI research in other areas. Starting from a 'blank page', with only a short training period, AlphaGo Zero achieved a 100-0 victory against the champion-defeating AlphaGo, while its successor, the self-taught AlphaZero, is currently perceived as the world's top player in Go as well as possibly in chess.
History
Go is considered much more difficult for computers to win at than other games such as chess, because its much larger branching factor makes it prohibitively difficult to use traditional AI methods such as alpha–beta pruning, tree traversal and heuristic search.
Almost two decades after IBM's computer Deep Blue beat world chess champion Garry Kasparov in the 1997 match, the strongest Go programs using artificial intelligence techniques had only reached about amateur 5-dan level, and still could not beat a professional Go player without a handicap. In 2012, the software program Zen, running on a four-PC cluster, beat Masaki Takemiya twice at five- and four-stone handicaps. In 2013, Crazy Stone beat Yoshio Ishida at a four-stone handicap.
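The branching-factor argument can be made concrete with a back-of-the-envelope estimate. The sketch below approximates the size of a game tree as b^d, where b is the branching factor and d the game length in moves. The figures used (about 35 and 80 for chess, about 250 and 150 for Go) are commonly quoted approximations and are not taken from this article; the function name is illustrative.

```python
import math

def log10_tree_size(branching_factor: float, depth: int) -> float:
    """Return log10 of branching_factor ** depth, a rough game-tree size."""
    return depth * math.log10(branching_factor)

# Commonly quoted approximations (assumptions, not figures from this article).
chess_exponent = log10_tree_size(35, 80)
go_exponent = log10_tree_size(250, 150)

print(f"chess: about 10^{chess_exponent:.0f} move sequences")
print(f"Go:    about 10^{go_exponent:.0f} move sequences")
```

Under these assumptions the Go tree is larger by well over two hundred orders of magnitude, which is why exhaustive search with pruning alone does not scale to Go.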
According to DeepMind's David Silver, the AlphaGo research project was formed around 2014 to test how well a neural network using deep learning can compete at Go. AlphaGo represents a significant improvement over previous Go programs. In 500 games against other available Go programs, including Crazy Stone and Zen, AlphaGo running on a single computer won all but one. In a similar matchup, AlphaGo running on multiple computers won all 500 games played against other Go programs, and 77% of games played against AlphaGo running on a single computer. The distributed version in October 2015 was using 1,202 CPUs and 176 GPUs.
Match against Fan Hui
In October 2015, the distributed version of AlphaGo defeated the European Go champion Fan Hui, a 2-dan professional, five to zero. This was the first time a computer Go program had beaten a professional human player on a full-sized board without handicap. The announcement of the news was delayed until 27 January 2016 to coincide with the publication of a paper in the journal Nature describing the algorithms used.
Match against Lee Sedol
AlphaGo played South Korean professional Go player Lee Sedol, ranked 9-dan and one of the best players at Go, in five games at the Four Seasons Hotel in Seoul, South Korea, on 9, 10, 12, 13, and 15 March 2016. The games were video-streamed live. AlphaGo won four of the five games; Lee won the fourth, making him the only human player to beat AlphaGo in any of its 74 official games. AlphaGo ran on Google's cloud computing platform, with its servers located in the United States. The match used Chinese rules with a 7.5-point komi, and each side had two hours of thinking time plus three 60-second byo-yomi periods. The version of AlphaGo playing against Lee used a similar amount of computing power to that used in the Fan Hui match. The Economist reported that it used 1,920 CPUs and 280 GPUs. At the time of play, Lee Sedol had the second-highest number of Go international championship victories in the world, after South Korean player Lee Changho, who kept the world championship title for 16 years. Since there is no single official method of ranking in international Go, rankings vary among sources. While he was sometimes ranked top, some sources ranked Lee Sedol as the fourth-best player in the world at the time. AlphaGo was not specifically trained to face Lee, nor was it designed to compete with any specific human player.
The first three games were won by AlphaGo following resignations by Lee. However, Lee beat AlphaGo in the fourth game, winning by resignation at move 180. AlphaGo then achieved a fourth win, taking the fifth game by resignation.
The prize was US$1 million. Since AlphaGo won four out of five games and thus the series, the prize was to be donated to charities, including UNICEF. Lee Sedol received $150,000 for participating in all five games and an additional $20,000 for his win in Game 4.
In June 2016, at a presentation held at a university in the Netherlands, Aja Huang, a member of the DeepMind team, revealed that they had patched the logical weakness that occurred during the fourth game of the match between AlphaGo and Lee, and that after move 78, it would play as intended and maintain Black's advantage. Before move 78, AlphaGo was leading throughout the game, but Lee's move caused the program's computing powers to be diverted and confused. Huang explained that AlphaGo's policy network of finding the most accurate move order and continuation did not precisely guide AlphaGo to make the correct continuation after move 78, since its value network did not determine Lee's 78th move as being the most likely, and therefore when the move was made AlphaGo could not make the right adjustment to the logical continuation.
Sixty online games
On 29 December 2016, a new account on the Tygem server named "Magister" from South Korea began to play games with professional players. It changed its account name to "Master" on 30 December, then moved to the FoxGo server on 1 January 2017. On 4 January, DeepMind confirmed that both "Magister" and "Master" were played by an updated version of AlphaGo, called AlphaGo Master. As of 5 January 2017, AlphaGo Master's online record was 60 wins and 0 losses, including three victories over Go's top-ranked player, Ke Jie, who had been quietly briefed in advance that Master was a version of AlphaGo. After losing to Master, Gu Li offered a bounty of 100,000 yuan to the first human player who could defeat Master. Master played at a pace of 10 games per day. Many quickly suspected it to be an AI player because it took little or no rest between games. Its adversaries included many world champions such as Ke Jie, Park Jeong-hwan, Yuta Iyama, Tuo Jiaxi, Mi Yuting, Shi Yue, Chen Yaoye, Li Qincheng, Gu Li, Chang Hao, Tang Weixing, Fan Tingyu, Zhou Ruiyang, Jiang Weijie, Chou Chun-hsun, Kim Ji-seok, Kang Dong-yun, Park Yeong-hun, and Won Seong-jin, as well as national champions or world championship runners-up such as Lian Xiao, Tan Xiao, Meng Tailing, Dang Yifei, Huang Yunsong, Yang Dingxin, Gu Zihao, Shin Jinseo, Cho Han-seung, and An Sungjoon. All 60 games except one were fast-paced games with three byo-yomi periods of 20 or 30 seconds. Master offered to extend the byo-yomi to one minute when playing with Nie Weiping in consideration of his age. After winning its 59th game, Master revealed itself in the chatroom to be controlled by Dr. Aja Huang of the DeepMind team, then changed its nationality to the United Kingdom.
After these games were completed, the co-founder of Google DeepMind, Demis Hassabis, said in a tweet, "we're looking forward to playing some official, full-length games later in collaboration with Go organizations and experts".
Go experts were impressed by the program's performance and its nonhuman play style; Ke Jie stated that "After humanity spent thousands of years improving our tactics, computers tell us that humans are completely wrong... I would go as far as to say not a single human has touched the edge of the truth of Go."
Future of Go Summit
At the Future of Go Summit held in Wuzhen in May 2017, AlphaGo Master played three games with Ke Jie, the world No.1 ranked player, as well as two games with several top Chinese professionals: one pair Go game and one against a collaborating team of five human players.
Google DeepMind offered a winner's prize of 1.5 million dollars for the three-game match between Ke Jie and Master, while the losing side took 300,000 dollars. Master won all three games against Ke Jie, after which AlphaGo was awarded professional 9-dan by the Chinese Weiqi Association.
After winning its three-game match against Ke Jie, the top-rated world Go player, AlphaGo retired. DeepMind also disbanded the team that worked on the game to focus on AI research in other areas. After the Summit, DeepMind published 50 full-length AlphaGo vs AlphaGo matches as a gift to the Go community.
AlphaGo Zero and AlphaZero
AlphaGo's team published an article in the journal Nature on 19 October 2017, introducing AlphaGo Zero, a version without human data and stronger than any previous human-champion-defeating version. By playing games against itself, AlphaGo Zero surpassed the strength of AlphaGo Lee in three days by winning 100 games to 0, reached the level of AlphaGo Master in 21 days, and exceeded all the old versions in 40 days.
In a paper released on arXiv on 5 December 2017, DeepMind claimed that it had generalized AlphaGo Zero's approach into a single AlphaZero algorithm, which achieved within 24 hours a superhuman level of play in the games of chess, shogi, and Go by defeating the world-champion programs Stockfish, Elmo, and a 3-day version of AlphaGo Zero, respectively.
Teaching tool
On 11 December 2017, DeepMind released the AlphaGo teaching tool on its website to analyze winning rates of different Go openings as calculated by AlphaGo Master. The teaching tool collects 6,000 Go openings from 230,000 human games, each analyzed with 10,000,000 simulations by AlphaGo Master. Many of the openings include human move suggestions.
Versions
An early version of AlphaGo was tested on hardware with various numbers of CPUs and GPUs, running in asynchronous or distributed mode. Two seconds of thinking time was given to each move. The resulting Elo ratings are listed below. In matches with more time per move, higher ratings are achieved.
Configuration | Search threads | No. of CPUs | No. of GPUs | Elo rating |
Single | 40 | 48 | 1 | 2,181 |
Single | 40 | 48 | 2 | 2,738 |
Single | 40 | 48 | 4 | 2,850 |
Single | 40 | 48 | 8 | 2,890 |
Distributed | 12 | 428 | 64 | 2,937 |
Distributed | 24 | 764 | 112 | 3,079 |
Distributed | 40 | 1,202 | 176 | 3,140 |
Distributed | 64 | 1,920 | 280 | 3,168 |
In May 2016, Google unveiled its own proprietary hardware "tensor processing units", which it stated had already been deployed in multiple internal projects at Google, including the AlphaGo match against Lee Sedol.
In the Future of Go Summit in May 2017, DeepMind disclosed that the version of AlphaGo used in this Summit was AlphaGo Master, and revealed that it had measured the strength of different versions of the software. AlphaGo Lee, the version used against Lee, could give AlphaGo Fan, the version used in AlphaGo vs. Fan Hui, three stones, and AlphaGo Master was even three stones stronger.
Versions | Hardware | Elo rating | Date | Results |
AlphaGo Fan | 176 GPUs, distributed | 3,144 | Oct 2015 | 5:0 against Fan Hui |
AlphaGo Lee | 48 TPUs, distributed | 3,739 | Mar 2016 | 4:1 against Lee Sedol |
AlphaGo Master | 4 TPUs, single machine | 4,858 | May 2017 | 60:0 against professional players; Future of Go Summit |
AlphaGo Zero | 4 TPUs, single machine | 5,185 | Oct 2017 | 100:0 against AlphaGo Lee; 89:11 against AlphaGo Master |
AlphaZero | 4 TPUs, single machine | 5,018 | Dec 2017 | 60:40 against AlphaGo Zero |
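To give a feel for what the rating gaps in the table mean, the sketch below converts an Elo difference into an expected score using the standard logistic Elo formula. That this formula applies unchanged to the ratings listed here is an assumption, and the function name is illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# Ratings copied from the table above.
ratings = {
    "AlphaGo Fan": 3144,
    "AlphaGo Lee": 3739,
    "AlphaGo Master": 4858,
    "AlphaGo Zero": 5185,
}

zero_vs_master = expected_score(ratings["AlphaGo Zero"], ratings["AlphaGo Master"])
lee_vs_fan = expected_score(ratings["AlphaGo Lee"], ratings["AlphaGo Fan"])

print(f"AlphaGo Zero vs AlphaGo Master: {zero_vs_master:.2f}")
print(f"AlphaGo Lee vs AlphaGo Fan:     {lee_vs_fan:.2f}")
```

Under this model the 327-point gap between AlphaGo Zero and AlphaGo Master corresponds to an expected score of roughly 0.87, which is in the same range as the 89:11 result listed in the table.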
Algorithm
As of 2016, AlphaGo's algorithm uses a combination of machine learning and tree search techniques, combined with extensive training, both from human and computer play. It uses Monte Carlo tree search, guided by a "value network" and a "policy network", both implemented using deep neural network technology. A limited amount of game-specific feature detection pre-processing is applied to the input before it is sent to the neural networks.
The system's neural networks were initially bootstrapped from human gameplay expertise. AlphaGo was initially trained to mimic human play by attempting to match the moves of expert players from recorded historical games, using a database of around 30 million moves. Once it had reached a certain degree of proficiency, it was trained further by being set to play large numbers of games against other instances of itself, using reinforcement learning to improve its play. To avoid "disrespectfully" wasting its opponent's time, the program is specifically programmed to resign if its assessment of win probability falls beneath a certain threshold; for the match against Lee, the resignation threshold was set to 20%.
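The interplay between tree search and the two networks can be illustrated with a minimal sketch of one selection step. It assumes a PUCT-style rule in which each candidate move is scored by its mean backed-up value plus an exploration bonus proportional to the policy network's prior and shrinking with visit count. The class, the function names, the constant c_puct, and the exact form of the bonus are illustrative assumptions rather than DeepMind's implementation; only the 20% resignation threshold comes from the text above.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    prior: float                 # policy-network probability of the move leading here
    visit_count: int = 0         # number of simulations that passed through this node
    value_sum: float = 0.0       # sum of value estimates backed up through this node
    children: dict = field(default_factory=dict)  # move -> Node

    def mean_value(self) -> float:
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(node: Node, c_puct: float = 1.5):
    """Return the (move, child) pair with the highest value-plus-exploration score."""
    total_visits = sum(child.visit_count for child in node.children.values())

    def score(child: Node) -> float:
        # Exploration bonus: large for high-prior, rarely visited moves.
        bonus = c_puct * child.prior * math.sqrt(total_visits + 1) / (1 + child.visit_count)
        return child.mean_value() + bonus

    return max(node.children.items(), key=lambda item: score(item[1]))

def should_resign(win_probability: float, threshold: float = 0.20) -> bool:
    """Resign when the estimated win probability falls beneath the threshold."""
    return win_probability < threshold

# Hypothetical position with two candidate moves and made-up priors.
root = Node(prior=1.0, children={"a": Node(prior=0.6), "b": Node(prior=0.4)})
first_move, _ = select_child(root)          # no visits yet: the higher prior wins
root.children["a"].visit_count = 10
root.children["a"].value_sum = 1.0
second_move, _ = select_child(root)         # the unvisited move now gets the bonus
print(first_move, second_move, should_resign(0.15))
```

The bonus term is what lets the policy network focus the search on plausible moves early on, while the accumulated value estimates take over as visit counts grow.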
Style of play
Toby Manning, the match referee for AlphaGo vs. Fan Hui, has described the program's style as "conservative". AlphaGo's playing style strongly favours a greater probability of winning by fewer points over a lesser probability of winning by more points. Its strategy of maximising its probability of winning is distinct from what human players tend to do, which is to maximise territorial gains, and explains some of its odd-looking moves. It makes a lot of opening moves that have never or seldom been made by humans, while avoiding many second-line opening moves that human players like to make. It likes to use shoulder hits, especially if the opponent is overconcentrated.
Responses to 2016 victory
AI community
AlphaGo's March 2016 victory was a major milestone in artificial intelligence research. Go had previously been regarded as a hard problem in machine learning that was expected to be out of reach for the technology of the time. Most experts thought a Go program as powerful as AlphaGo was at least five years away; some experts thought that it would take at least another decade before computers would beat Go champions. Most observers at the beginning of the 2016 matches expected Lee to beat AlphaGo.
With games such as checkers, chess, and now Go won by computers, victories at popular board games can no longer serve as major milestones for artificial intelligence in the way that they used to. Deep Blue's Murray Campbell called AlphaGo's victory "the end of an era... board games are more or less done and it's time to move on."
When compared with Deep Blue or Watson, AlphaGo's underlying algorithms are potentially more general-purpose and may be evidence that the scientific community is making progress towards artificial general intelligence. Some commentators believe AlphaGo's victory makes for a good opportunity for society to start preparing for the possible future impact of machines with general-purpose intelligence. As noted by entrepreneur Guy Suter, AlphaGo only knows how to play Go and doesn't possess general-purpose intelligence; it "couldn't just wake up one morning and decide it wants to learn how to use firearms." AI researcher Stuart Russell said that AI systems such as AlphaGo have progressed quicker and become more powerful than expected, and we must therefore develop methods to ensure they "remain under human control". Some scholars, such as Stephen Hawking, warned that some future self-improving AI could gain actual general intelligence, leading to an unexpected AI takeover; other scholars disagree: AI expert Jean-Gabriel Ganascia believes that "Things like 'common sense'... may never be reproducible", and says "I don't see why we would speak about fears. On the contrary, this raises hopes in many domains such as health and space exploration." Computer scientist Richard Sutton said "I don't think people should be scared... but I do think people should be paying attention."
In China, AlphaGo was a "Sputnik moment" which helped convince the Chinese government to prioritize and dramatically increase funding for artificial intelligence.
In 2017, the DeepMind AlphaGo team received the inaugural IJCAI Marvin Minsky medal for Outstanding Achievements in AI. “AlphaGo is a wonderful achievement, and a perfect example of what the Minsky Medal was initiated to recognise”, said Professor Michael Wooldridge, Chair of the IJCAI Awards Committee. “What particularly impressed IJCAI was that AlphaGo achieves what it does through a brilliant combination of classic AI techniques as well as the state-of-the-art machine learning techniques that DeepMind is so closely associated with. It’s a breathtaking demonstration of contemporary AI, and we are delighted to be able to recognise it with this award.”
Go community
Go is a popular game in China, Japan and Korea, and the 2016 matches were watched by perhaps a hundred million people worldwide. Many top Go players characterized AlphaGo's unorthodox plays as seemingly questionable moves that initially befuddled onlookers, but made sense in hindsight: "All but the very best Go players craft their style by imitating top players. AlphaGo seems to have totally original moves it creates itself." AlphaGo appeared to have unexpectedly become much stronger, even when compared with its October 2015 match where a computer had beaten a Go professional for the first time ever without the advantage of a handicap. The day after Lee's first defeat, Jeong Ahram, the lead Go correspondent for one of South Korea's biggest daily newspapers, said "Last night was very gloomy... Many people drank alcohol." The Korea Baduk Association, the organization that oversees Go professionals in South Korea, awarded AlphaGo an honorary 9-dan title for exhibiting creative skills and pushing forward the game's progress.
China's Ke Jie, an 18-year-old generally recognized as the world's best Go player at the time, initially claimed that he would be able to beat AlphaGo, but declined to play against it for fear that it would "copy my style". As the matches progressed, Ke Jie went back and forth, stating that "it is highly likely that I lose" after analysing the first three matches, but regaining confidence after AlphaGo displayed flaws in the fourth match.
Toby Manning, the referee of AlphaGo's match against Fan Hui, and Hajin Lee, secretary general of the International Go Federation, both reason that in the future, Go players will get help from computers to learn what they have done wrong in games and improve their skills.
After game two, Lee said he felt "speechless": "From the very beginning of the match, I could never manage an upper hand for one single move. It was AlphaGo's total victory." Lee apologized for his losses, stating after game three that "I misjudged the capabilities of AlphaGo and felt powerless." He emphasized that the defeat was "Lee Se-dol's defeat" and "not a defeat of mankind". Lee said his eventual loss to a machine was "inevitable" but stated that "robots will never understand the beauty of the game the same way that we humans do." Lee called his game four victory a "priceless win that I [would] not exchange for anything."
Similar systems
Facebook has also been working on its own Go-playing system, darkforest, also based on combining machine learning and Monte Carlo tree search. Although a strong player against other computer Go programs, as of early 2016, it had not yet defeated a professional human player. Darkforest has lost to CrazyStone and Zen and is estimated to be of similar strength to them.
DeepZenGo, a system developed with support from video-sharing website Dwango and the University of Tokyo, lost 2–1 in November 2016 to Go master Cho Chikun, who holds the record for the largest number of Go title wins in Japan.
A 2018 paper in Nature cited AlphaGo's approach as the basis for a new means of computing potential pharmaceutical drug molecules.