Foreword

This time I’m doing a completely different type of post. No numbers or equations, but a lot of very interesting and timely ideas on artificial intelligence (AI). I managed to get in touch with a member of a developer team who are building the next-generation artificial intelligence for Magic: the Gathering. If you are aware of the recent developments in gaming AI, you will have heard about DeepMind. They have built AIs for chess, Go, and StarCraft 2, using deep neural networks with reinforcement learning to train the AI in millions of self-played games. Apparently, the team are building a similar deep neural network AI for Magic, which they are calling the Alpha Alpha (for a rather obvious reason explained below). I was lucky enough to get a short interview with them. Without further ado, here it is:

The interview

Note: In a few places below, I have added an editorial comment in [brackets].

Quantitatively Old School: Thank you for arranging the time to take this interview. There is a large community of Old School Magic players who will be very interested in what we are about to discuss.

Alpha Alpha Team: My pleasure. I’m always happy to talk about Magic and AI.

QOS: So, I’ve heard from our mutual friend that you have been developing a new type of artificial intelligence for Magic: the Gathering. What can you tell us about it?

AAT: Well, I believe you are aware of the developments in AI systems that have taken place in recent years. I’m of course talking about the AlphaGo, Alpha Zero and most recently the Alpha Star. These are algorithms that have fundamentally changed the way AIs are being developed for traditional games such as chess and Go, not to mention computer games such as StarCraft 2. What we are working on will, we hope, some day bring the same deep learning approach to Magic: the Gathering.

QOS: That is simply awesome to hear! But before going to Magic: the Gathering, would you briefly remind us what has been so revolutionary about these recent AI algorithms?

AAT: Sure. The basic idea is to use a self-learning neural network as the basis for the AI. The approach is very different from the 90’s chess algorithms, for instance, which essentially relied on solving the game by brute-force computation to get an edge over the human player. They also relied on large numbers of heuristics, tabulated moves and pre-determined strategies. The Alpha Zero and Alpha Star are based on deep neural networks that are trained by reinforcement learning. The approach is extremely powerful and generalizes to many kinds of games, as we have already seen.

QOS: So, I guess the idea in your approach is that the AI learns by playing against itself and evolves through getting positive or negative feedback based on the outcome?

AAT: That’s essentially the idea, yes. There is some amount of initial setup done with gathering data from games played by humans to give the neural network some degree of understanding of the game rules and mechanics and the most obvious strategic choices. But most of the learning indeed occurs in agent-vs-agent games, where different instances of the same AI compete against each other.

QOS: And the AI evolves by some kind of optimization of the network based on the game outcomes? Is that based on the outcome of one game or several? What I mean is, because Magic is essentially a stochastic [random] game, doesn’t one need several games to get enough statistical accuracy to evolve the strategy?

AAT: Well, now we are going to quite specific details about the Magic AI. Maybe I should just say a few words on that in general first.

QOS: Of course.

AAT: To begin with, I should stress that this whole project is not on the same scale as the Alpha Zero or the Alpha Star, for example. At least not yet. We are working from a slightly different angle. We are not trying to create a general AI for all of Magic: the Gathering. Instead, we decided to limit the project scope to just the first Magic set, the Alpha edition. This allows us to focus on the aspects of the AI that we find more interesting, rather than trying to implement all of the rules and mechanics of the complete game.

QOS: That makes a lot of sense. But, if I may ask, why did you choose the first set, instead of some of the standard-legal sets, for example?

AAT: Ah, well, most of us in the team have a history with the game going way back. And we were not terribly acquainted with the new sets, so we chose something most of us are at least familiar with. And the name for the project was of course simply too good to pass up.

QOS: The Alpha Alpha.

AAT: Awesome, right?

QOS: It really is. So, coming back to the question of training the algorithm, how have you set it up?

AAT: We are doing something that’s quite similar to what the Alpha Star Team did. We start by initializing the neural network by reading in a database of human-played games. Luckily the action space in Magic is quite limited, so we only need a few hundred of these games to seed the AI with a reasonable strategy. Mind you, it’s honestly quite bad at this stage, but it gets the algorithm across the initial learning barrier. Then we create a large number of AI agents that we pit against each other in a series of games that we call the Alpha Alpha League. Within the league, each AI agent plays against the other ones and gets its neural network updated based on its performance. We also create new agents by mutating some of the most successful ones and drop some of the worst performing ones from the league.
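
[Editorial sketch: the league loop described above could look roughly like the Python below. This is only an illustration under stated assumptions, not the team's code. The names `Agent`, `play_game` and `run_season` are hypothetical, a single `skill` number stands in for the neural network, and the per-game network update is left out, so only the round-robin play, the culling of the worst agents and the mutation of the best ones are shown.]

```python
import random
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Agent:
    skill: float   # hypothetical stand-in for the neural network weights
    wins: int = 0


def play_game(a, b, rng):
    """Return the winner; the stronger agent wins more often, but not always."""
    p_a = a.skill / (a.skill + b.skill)
    return a if rng.random() < p_a else b


def run_season(agents, rng, n_replace=2, mutation_scale=0.1):
    """Play one round-robin season, then cull and mutate.

    Assumes len(agents) > 2 * n_replace so the league size stays constant.
    """
    for agent in agents:
        agent.wins = 0
    # Every agent plays every other agent once.
    for a, b in combinations(agents, 2):
        play_game(a, b, rng).wins += 1
    ranked = sorted(agents, key=lambda ag: ag.wins, reverse=True)
    survivors = ranked[:-n_replace]          # drop the worst performers
    offspring = [
        # Mutated copies of the most successful agents join the league.
        Agent(skill=max(0.01, parent.skill + rng.gauss(0.0, mutation_scale)))
        for parent in ranked[:n_replace]
    ]
    return survivors + offspring


rng = random.Random(0)
league = [Agent(skill=rng.uniform(0.5, 1.5)) for _ in range(8)]
for _ in range(5):
    league = run_season(league, rng)
```

[Because each game is random, a single season is a noisy ranking; running many seasons is what lets the stronger agents accumulate.]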

QOS: So it is essentially the same as the Alpha Star method, then?

AAT: For the part I told you about, yes. The nice thing about Magic is that one game can be simulated in a fraction of the time of a StarCraft 2 game, so we are able to evolve the agents much, much faster than the Alpha Star AI. This opens up some really interesting avenues of research.

QOS: Such as?

AAT: First we started the league with each agent having a pre-determined deck. After a while, we realized it would be really fascinating to see if the AI could learn how to construct the deck from a pool of cards. So we went and modified the league a little bit, and added several new features to the existing network architecture to support the deck-building part of the AI. Then we gave each agent an equivalent of an Alpha Starter box [I’m assuming ten Starter decks] and saw how they fared against each other.

QOS: That’s incredible! But isn’t the outcome very much based on luck? I mean, the contents of the Starter box were not the same for each agent, were they?

AAT: Absolutely. We quickly realized this and then gave a bigger starting pool to each agent. This made the outcomes much more stable. But this was just the first version of the Alpha Alpha League. Would you like to hear what we did next?

QOS: There’s even more?

AAT: Haha, yes there is. First we introduced the ante mechanic into the games to reward the winning agent in terms of further opening up the deck design space. And then we created a marketplace for the agents to trade cards with each other. Because we were still working on a very limited card pool, this final change made it possible for the agents to build much more focused decks by trading away cards that they did not consider essential for their strategies. This is when we noticed that some of the historically established strategies started to emerge.

QOS: … I’m completely awestruck. So you’re saying you basically re-created the birth of the Magic: the Gathering metagame, strategies and economy in a single simulation??

AAT: To a degree. But one has to keep in mind that we made a lot of simplifications in the agent interactions. Like, for example, the agents would sell the cards to a centralized repository in exchange for credits. The credits in turn they would use to purchase other cards from the repository. I don’t think a historical equivalent of such a marketplace existed back in the early 90’s. So this would have some kind of effect on the system via increased liquidity.
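
[Editorial sketch: the centralized repository described above can be pictured as in the Python below. Everything here is an assumption made for illustration: the names `Trader` and `Repository`, the fixed price list, and the choice to buy and sell at the same price. The interview does not say how prices were set.]

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Trader:
    credits: int = 0
    cards: Counter = field(default_factory=Counter)


class Repository:
    """Central card pool that buys and sells at fixed, assumed prices."""

    def __init__(self, prices):
        self.prices = dict(prices)   # card name -> price in credits
        self.stock = Counter()       # cards currently held by the repository

    def sell_to_repository(self, trader, card):
        """The trader hands over one copy of a card and receives credits."""
        if trader.cards[card] < 1:
            raise ValueError(f"trader does not own {card}")
        trader.cards[card] -= 1
        self.stock[card] += 1
        trader.credits += self.prices[card]

    def buy_from_repository(self, trader, card):
        """The trader spends credits on one copy of a card in stock."""
        price = self.prices[card]
        if self.stock[card] < 1:
            raise ValueError(f"{card} is out of stock")
        if trader.credits < price:
            raise ValueError("not enough credits")
        self.stock[card] -= 1
        trader.cards[card] += 1
        trader.credits -= price
```

[Since any agent can trade with the repository at any time, nobody has to find a willing partner first, which is one way to read the "increased liquidity" remark.]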

QOS: Okay. I’m still trying to wrap my head around all this… I guess, with all this machinery in place, a lot of things will start happening. So was there something especially interesting that came up in this league simulation?

AAT: Well, first of all, what we found really fascinating were the different playing strategies that the agents adopted. They would vary from league to league [I suppose they ran several of the leagues in parallel], partly because of the available card pool. In some cases, a single dominant strategy arose - like in this one league where an agent started to acquire all the mana-producing artifacts from the other agents and won the league simply by having the vastly superior deck.

QOS: Why did only one agent develop this strategy? It’s really well known after all.

AAT: The thing that we need to keep in mind is that the agents do not have the benefit of our historical hindsight. Another factor must have been the way we had implemented the card evaluation part of the AI. The algorithm tended to extrapolate from the few observations [games played] in a somewhat stochastic manner. So at some point an agent came to a very early decision, a realization of sorts, that based on the games it had played, certain cards were extremely desirable. We considered that basing an early evaluation on relatively few samples, with a slight tendency to overestimate the statistical significance, was a fairly realistic representation of the way humans evaluate new cards. Technically, of course, it also increases the variance in card evaluation and thus speeds up the emergence of new strategies. In some cases this pays off, like the case of the agent acquiring all the Moxen, Sol Rings and Black Lotuses. In other cases, it doesn’t. For example, there was a case where the agent gathered all the cheap enchantment cards, desperately trying to make an Enchantress deck work. We of course know that in the deck tier landscape such a strategy lies in what is called a saddle point - a false extremum of sorts. And so it happened in the league that the agent was eventually beaten and eliminated by its mutated offspring, which had dropped the Verduran Enchantresses from the deck.

QOS: That is really fascinating. Did you observe other historical parallels?

AAT: Quite a few, actually. In another league, an agent developed the quintessential burn strategy, with the Lightning Bolts, Orcish Artilleries, Goblins, Wheels of Fortune, and so on. In this case, a different agent in the same league developed a counter strategy by playing Circles of Protection. These in turn would be answered by an evolved agent bringing in other colors in addition to red, and so on. A sort of stable metagame would evolve, much in the spirit of rock-paper-scissors.

QOS: And did you encounter any of the broken strategies? Like the ones using draw-seven spells and Black Lotuses?

AAT: Do you mean the ones where the whole deck is just Timetwisters, Black Lotuses, and Fireballs? After we merged some of the leagues, those did start appearing because the card pool grew large enough to support them. The agents were not able to develop an effective counterstrategy that could also deal with the rest of the field. They actually became quite a problem because they would hog all the cards in the league via winning the ante. It was basically an unstable runaway scenario. We solved the issue simply by adding a card wear mechanic into the league. A sort of douche tax, if you will. This made the strategies based on repeated shuffling of the deck wear out the cards faster, which eventually led to the strategies being replaced by other ones.
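
[Editorial sketch: one simple way to picture a card wear mechanic is to give every card a limited number of shuffles it can survive. The Python below is an assumption for illustration only. The `Card` class, the integer `durability` counter and the `shuffle_with_wear` function are hypothetical, and the interview does not describe how wear was actually modelled.]

```python
import random
from dataclasses import dataclass


@dataclass
class Card:
    name: str
    durability: int = 100   # assumed number of shuffles the card can survive


def shuffle_with_wear(deck, rng):
    """Shuffle the deck in place, wear every card by one, drop worn-out cards."""
    rng.shuffle(deck)
    for card in deck:
        card.durability -= 1
    # Cards whose durability has reached zero leave the deck.
    return [card for card in deck if card.durability > 0]


rng = random.Random(0)
deck = [Card("Timetwister", durability=3),
        Card("Black Lotus", durability=3),
        Card("Fireball", durability=3)]
for _ in range(2):
    deck = shuffle_with_wear(deck, rng)
```

[A deck that reshuffles several times per game, as a Timetwister deck does, burns through its durability several times faster than a deck that shuffles once at the start.]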

QOS: That is really unbelievable. It’s like re-living the history of the game… So, with all these varied strategies that the agents adopted, did any of them have any success with a straightforward creature-based strategy?

AAT: No, not really.

QOS: Okay. But what if one powers out, say, Juggernauts via Sol Rings, or efficient creatures through fast mana?

AAT: The creatures in Alpha are not really that efficient to begin with, especially when compared to non-creature spells. In essence, there are always better ways to expend the mana than casting creatures. The creature-based strategies simply are not fast enough.

QOS: Umm… Okay. To tell you the truth, I was going to play a fast creature-based deck in the upcoming Alpha-only Wizards’ Tournament. Maybe I need to reconsider my approach, then…

AAT: That is an actual tournament where people only play Alpha? That’s crazy. But aside from that, you should realize that the top decks our agents constructed were the ones that would cast Moxen, Ancestral Recalls and Timetwisters into Black Vise, for example. Against those decks, creature-based strategies didn’t stand a chance. But that was inside our simulated league with its own internal metagame and economics. Surely no one would play that type of deck in a real-life paper Magic tournament in this day and age.

QOS: I suppose not. So, before we wrap up, being a Magic player yourself, do you have favorite strategies, either in the Alpha Alpha league, in paper Magic, or otherwise?

AAT: Well, I am sort of drawn to the more off-the-beaten-path approaches. There were several of them seen in the leagues, including some of the ones we already talked about. Apart from that, I do personally like to tinker with more spicy card choices. Like, for example, currently I have been playing around with the idea of a Lifeblood & Flash Flood deck, with Blood Moons in the main deck. I am also sure that there exists a herring tribal [I’m assuming knights and fish???] deck in the Atlantic Old School waiting to be discovered.

QOS: That would certainly be something different. Anyway, thank you so much for the exciting and inspiring interview!

AAT: My pleasure.