Prisoner’s Dilemma - Strategies
Prisoner’s dilemma is a two-player game based on a story about two prisoners: two friends commit crimes together until they are caught. The police do not have enough evidence against them and although they suspect them of more crimes, they can punish them only for the last one. The police thus decides to put each of the friends in a different room and offer them a lower punishment for defecting the other.
Each of the prisoners has two possibilities, either to cooperate with the other, or to defect them. In case both of them cooperate, the police do not have enough evidence to convince them from other crimes and they both receive only a small punishment for the last crime. In case one of them cooperates but the other defects, the one that cooperated gets a large punishment and the other is let go free (as a reward for the help). In case both of them defect the other, they are both punished by a bigger punishment, however, it is smaller than the one the cooperating one would receive in case the other defected them (because they both cooperated and helped to solve the case).
Let’s replace the length of the punishment by points (i.e. shorter punishment = more points) and look at the game from the game theoretic point of view. We have a two-player game with the following reward matrix.
C | D | |
---|---|---|
C | 3 | 0 |
D | 5 | 1 |
In the matrix, the rows are my choices and the columns are the opponent’s choices. The numbers are the points I would receive in all possible situations.
When we investigate the matrix, we can determine, how the players will play. Rational player will probably think along these lines:
- I can either cooperate or defect.
- If I assume, the opponent will defect me, I should defect them (1>0).
- If I assume, the opponent will cooperate with me, I should defect them again (5>3).
- Therefore, I will defect my opponent.
The opponent follows the same though process and arrives to the same conclusion. Therefore, they both defect each other and receive one point. If the prisoner’s dilemma is played only once, this is the only outcome we can get, assuming both the prisoners are rational.
Iterated Prisoner’s Dilemma
If the game is played repeatedly (the same players play with each other multiple times and both of them try to maximize their cumulative reward) the situation becomes more interesting. In this case, there is the possibility to cooperate with each other, as any defection can be punished in the next iteration.
However, it is important that the game is either played indefinitely, or that neither of the players know how many iterations there are. In case they knew the number of iterations, it would be better for them to defect in the last iteration. And because both of them knew, they would defect in the last iteration, it would be better for them to defect in the second to last iteration. And so on.
Iterated Prisoner’s Dilemma Strategies
There are a few simple strategies for iterated prisoners dilemma:
- Always Defect - always defects, regardless of the play of the other player
- Always Cooperate - always cooperates, regardless of the play of the other player
- Random - plays randomly
- Tit-for-tat - cooperates in the first round and plays what the other player played in the next rounds, i.e. it defects if the player defected in the previous round (punishes the defection), and cooperates if the other player cooperated
You can of course make much more other strategies. For inspiration, you can have a look at the Axelrod’s paper he published about the tournament he made in the 80s.
Prisoner’s Dilemma Tournament
The source codes contain a simple program to run a tournament of prisoner’s dilemma strategies it finds in the strategies
folder. It expects the strategy to be a Python file with a function create_strategy()
. This function returns the strategy itself, as an object with methods reset
, last_move
, author_name
, and strategy_name
. Choose a unique file name and a unique strategy name. Implement the methods that return the name of the strategy and the authors name (strategy_name
and author_name
respectively).
You also have to implement the reset
method that is called before each game. The most important method is play
that should return your next move. Return either 'C'
or 'D'
for cooperate and defect respectively. The method last_move
is called after each turn to inform you how your opponent played.
The folder with strategies is set in the __main__
block in the tournament.py
file.
Assignment
Today’s assignment is simple - submit a strategy for the prisoners dilemma. If you submit one during the seminar and do not want to change/improve it, you do not need to submit anything else.
Deadline:
Monday, March 4, 2024 at 3 p.m. (I will run the tournament after this time)
Points:
- 7 points for submitting any strategy
- Additional points will be given based on your placement in the tournament
- top 20 percent - 3 points
- top 21-40 percent - 2 points
- top 41-60 percent - 1 point