In my last post I outlined the concept of System 3, what it is and why it matters. In short, System 3 is the mental ability to imagine the future and evaluate how happy you will be in it, based on how pleasurable the process of imagining itself is.
Several different research strands have come together to identify System 3 as a distinct mental process. I summarise the key steps here:
- The fundamental building block of System 3 is the stimulus-response relationship. It has been known for a long time that people easily learn stimulus-response relationships when they are rewarded for the response. The classic examples come from Pavlov (who rewarded dogs with food and discovered that they would start to get excited when they saw the experimenter's white coat, as any pet owner will recognise), and Skinner (who trained pigeons to learn that pressing a lever was associated with getting fed). Although these original experiments were done on animals, there is plenty of evidence that the same principle applies to people. A typical example would be seeing the wrapper of a chocolate bar and hungrily anticipating the taste of the chocolate inside. (Stimulus-response is also the foundation of System 1, but System 3 grows out of the same roots.)
- The next step is the idea of successor representations. Neuroscientists (e.g. Stachenfeld et al 2017) have shown that we store in our brains the whole sequence of steps required to get to a goal. Each of these steps can be considered in turn to be an individual stimulus-response relationship. In other words, a chain of stimulus-response relations can be linked together, where the response of one step becomes the stimulus for the next.
- Schultz, Dayan and Montague (1997) showed that the motivational response can migrate along this chain as it becomes more familiar. Imagine the chain A->B->C represents a stimulus A that predicts response B, and B in turn predicts response C. C is the actual "reward". For instance, A might be the logo of a chocolate manufacturer; B the packaging of a chocolate bar; and C the actual chocolate. As you see more chocolate packaging and open it up to discover chocolate, the packaging itself will start to motivate you before you even get to the chocolate. Then in turn, the logo might become motivating.
- The way that motivation changes with reward is governed by the Rescorla-Wagner model (Rescorla and Wagner 1972): if the reward experienced from an event is more (or less) than was expected, the decision maker's brain learns to strengthen (or weaken) the causal connection and is motivated to repeat (or avoid) the action.
- More recent work from Dayan discusses the idea of truncation: that we mentally plan the steps in a process, but we don't plan all the way to the end. Instead we stop at some point, and base our decision on how good things look at this point. For example, a chess player might look three or four moves ahead and make a judgement about how good the position looks at that point, rather than trying to work through all the possibilities to the end of the game, which would be impossible.
- This in turn relates to work on causal representations. A causal representation can be thought of as a complex network made up of individual stimulus-response "edges". Sloman and Lagnado (2015) discuss how causal representations can support mental simulation and the development of narratives about the world.
- A separate set of discoveries was developing in parallel within the psychology literature. The idea of prospection was described in 2007 by Gilbert and Wilson in a Science article. They had observed that people think about the future, and get pleasure from doing so. This can have implications for psychological health, and more generally appears to be a common human activity.
- Pezzulo and Rigoli in 2011 published in Frontiers in Neuroscience "The value of foresight: how prospection affects decision making". They worked out a model to explain how decision makers can imagine their future motivations and use these to work out what actions they will want to take in the future, and to act accordingly in the present.
- This work in turn builds on two core ideas. The first is the idea of model-based decision making (as distinct from model-free decisions). Model-free learning (like those early Pavlovian experiments) starts from an external stimulus and learns the corresponding action or behaviour. See a lever, press the lever. There is no meaningful representation of what the lever might mean, or why pressing it is a good thing. Model-based learning introduces an intermediate step. You see the stimulus, and in your mind you consider what this might mean, and update your mental model of the world. Model-based learning and decisions turn out to be much more powerful, especially in more complex situations, and it is likely that people use model-based representations because it would be impossible to learn enough combinations of stimulus and behaviour to reflect all of our knowledge in a model-free way.
- The other line of research they draw on is the idea of utility from anticipation and dread. Anticipation is when we enjoy thinking about positive events in the future; dread is when we find it painful to think about negative future events. George Loewenstein has studied this extensively (Loewenstein 1987) and determined that people do enjoy the process of anticipation, and are sometimes willing to put off a pleasurable activity in order to extend the pleasure of anticipating it.
- Thomas Schelling, in The Mind As a Consuming Organ (1983) had asked why we shed a tear when watching Lassie. Do we think Lassie is real, or that the things that happen to her in the show are genuine? Of course not. But we still enjoy the program: our imagination provides us with reward for "pretending to believe" in this fictional world. This is likely to be connected with the psychological capacity for empathy (Ainslie and Monterosso 2002).
- Neuroscience work in the mid-2000s (Padoa-Schioppa and Assad, 2006; Kable and Glimcher, 2007) discovered that the brain represents reward values when we make goal-directed decisions. Rather than being rewarded for taking certain actions, we (or, at least, monkeys) are rewarded for consuming specific goods. The representation of these goods in the mind provides evidence for the idea of model-based reasoning.
- Recent computational learning research (Hamrick 2018, Reichert 2018) shows that mental simulation is a powerful way to solve problems, and software algorithms which use this method show similarities to human decision making. This does not directly prove that human minds decide things in the same way, but it does offer support for the plausibility of this idea.
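As a rough illustration of two of the steps above, the Rescorla-Wagner update and the migration of motivation along a chain, here is a minimal Python sketch. It is not taken from any of the cited papers: the function names, the learning rate of 0.1, the reward of 1.0 and the absence of any discounting are all assumptions chosen for simplicity. Each step in the logo -> packaging -> chocolate chain learns to predict the value of the step that follows it, which is in the spirit of temporal-difference learning.

```python
def rescorla_wagner_update(expected, received, learning_rate=0.1):
    """Move the expectation a fraction of the way toward what was actually received."""
    prediction_error = received - expected
    return expected + learning_rate * prediction_error


def learn_chain(steps=("logo", "packaging", "chocolate"), reward=1.0,
                episodes=200, learning_rate=0.1):
    """Walk the chain repeatedly; every step learns to predict its successor.

    All names and numbers here are illustrative assumptions, not measured values.
    """
    value = {step: 0.0 for step in steps}
    history = []
    for _ in range(episodes):
        for i, step in enumerate(steps):
            if i == len(steps) - 1:
                target = reward                 # the last step delivers the real reward
            else:
                target = value[steps[i + 1]]    # earlier steps predict the next step's value
            value[step] = rescorla_wagner_update(value[step], target, learning_rate)
        history.append(dict(value))             # snapshot after each episode
    return value, history


if __name__ == "__main__":
    final, history = learn_chain()
    print("after 5 episodes:", history[4])
    print("after 200 episodes:", final)
```

Early on, only the chocolate carries much value; with repetition the packaging and then the logo catch up, which is the migration described in the chocolate example.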
The key step from here is to realise that model-based learning based on an underlying network of stimulus-response relations, the successor representation, causal reasoning, dopamine migration, truncation, anticipation, prospection and empathy can all be seen as different views on a common system or process: System 3.
In the System 3 process, humans maintain a mental representation of the world, structured as a causal graph: a set of beliefs about the cause-and-effect relations between events and objects in the world. They use this causal graph to make decisions. When presented with a new option, they explore mentally the likely consequences of that option: what will happen if I do it, then what will happen next as a result, and so on. (The successor representation is a specific chain of steps within this graph.)
Anticipating each of these successive outcomes provides pleasure (or pain, if the outcome is negative). As a result, the decision maker experiences pleasure if the option is a good one, and this encourages them to carry out the action (and correspondingly, not to carry it out if the mental exploration is painful). The amount of reward gained from anticipation is related to the amount of reward gained from the real experience. The experience of present reward in return for future activities is what resolves the "prospection paradox": how can our brains force us to forgo present reward in favour of the possibility of future reward?
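To make the idea of exploring a causal graph concrete, here is a small Python sketch. Everything in it is hypothetical: the events, the link probabilities, the reward numbers and the function name `anticipated_value` are assumptions made up for illustration, and weighting each consequence by a believed probability is only one of several plausible ways to combine outcomes. The `depth` parameter plays the role of truncation, stopping the simulation after a fixed number of links.

```python
# Hypothetical causal graph: each event maps to the events it is believed to cause,
# together with the believed probability of each link.
CAUSAL_GRAPH = {
    "buy chocolate": [("eat chocolate", 0.9)],
    "eat chocolate": [("feel satisfied", 0.8), ("feel guilty", 0.3)],
    "feel satisfied": [],
    "feel guilty": [],
}

# Hypothetical reward (positive) or pain (negative) attached to each event.
REWARD = {
    "buy chocolate": -1.0,
    "eat chocolate": 5.0,
    "feel satisfied": 2.0,
    "feel guilty": -1.5,
}


def anticipated_value(event, depth, graph=CAUSAL_GRAPH, reward=REWARD):
    """Add up believed reward over the consequences of `event`, `depth` links ahead."""
    total = reward.get(event, 0.0)
    if depth == 0:
        return total  # truncation: stop simulating and judge by what we have so far
    for consequence, probability in graph.get(event, []):
        total += probability * anticipated_value(consequence, depth - 1, graph, reward)
    return total


if __name__ == "__main__":
    for depth in range(3):
        value = anticipated_value("buy chocolate", depth)
        print(f"looking {depth} steps ahead: {value:.3f}")
```

With these made-up numbers, buying the chocolate looks unattractive if you do not look ahead at all, and becomes attractive once the imagined consequences are included.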
Observe that the decision maker can get pleasure simply from thinking about possible actions: they do not need to actually do anything! This is the key step that motivates people to prospect: the anticipation of an event, even if it will never happen, provides reward.
The Rescorla-Wagner formula pops up again now. Let's say I think about an outcome and I am rewarded for thinking about it. My brain is rewarding me because it "wants" me to take the action I'm thinking about, because that action in turn is likely to lead to another reward (that chocolate bar). Moreover, the act of thinking about the chocolate is, most likely, statistically linked to getting and eating the chocolate, so the brain is quite right to have learned this association. But if I keep thinking about it and never actually eat the chocolate, the anticipated reward will be less than expected, and my motivation to keep imagining it will diminish. In the long run, one might expect the motivation to think about chocolate to rapidly disappear altogether: but in practice, truncation stops this from happening. The brain goes off to do other things before the reward has been fully extinguished.
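The interaction between extinction and truncation can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a fitted one: the starting value of 1.0, the learning rate of 0.2, the cut-off of five imagined episodes and the function name `imagine_without_eating` are all invented for illustration, and the real reward on each imagined episode is simply treated as zero.

```python
def imagine_without_eating(initial_value=1.0, learning_rate=0.2, max_thoughts=5):
    """Rescorla-Wagner extinction when imagining is never followed by real reward.

    `max_thoughts` stands in for truncation: the mind moves on to something else
    after a few repetitions. All parameter values are illustrative assumptions.
    """
    value = initial_value
    trace = [value]
    for _ in range(max_thoughts):
        value += learning_rate * (0.0 - value)  # received (0.0) is below expectation
        trace.append(value)
    return trace


if __name__ == "__main__":
    truncated = imagine_without_eating()
    exhaustive = imagine_without_eating(max_thoughts=50)
    print("value left after 5 imagined episodes:", round(truncated[-1], 3))
    print("value left after 50 imagined episodes:", round(exhaustive[-1], 6))
```

If the daydream is cut short, a meaningful share of the motivation survives for next time; only an unrealistically long run of unrewarded imagining drives it close to zero.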
So the brain is motivated to keep imagining, and ruminating over, rewarding activities. Motivation can be seen as both the fuel, and the prize, for this process; over time the fuel is metaphorically "used up" and motivation diminishes. It is likely that the motivation and reward for imagined events will move towards a stable equilibrium state. As the brain wanders around this network of imagined outcomes, it is indirectly testing out the reward levels of each event and its successors. As it simulates a chain of events and realises that the consequent reward is less, or more, than expected, it adjusts the reward assignments to reflect its more accurate prediction of how positive those events would be.
When the brain learns new rewards, it fits them into this network; and when it encounters new situations it tries to map the existing network onto the new landscape. This also happens when we watch a TV show (containing its own fictional world, of which we develop a new mental representation), or imagine the life of another person.
In all of these cases, we are rewarded for imagining what might happen (in the future, in a fictional world, or to another person) even though we gain no direct, immediate reward for any of these events. System 3 is what links the future to the present; fiction to the real world; and other people's lives to our own.
System 3 provides the mechanism by which we come to care about, and be rewarded or punished for, purely symbolic outcomes. Typical examples are the success or failure of a work project (which we may care intensely about even if it is unlikely to affect our job security or income), a political attachment to symbols such as a national flag or a signature policy, the experience of turning 30, 40, 50 or 60 (or even 25, galling though this idea may be to many readers), the results of sporting events and many other non-material experiences. It is the impact of these experiences on our causal networks, or the mental simulations that they trigger, that provides reward or pain.
Is System 3 definitely distinct from Systems 1 and 2? This is a matter for judgement rather than evidence, but I would argue that:
- System 1 is primarily about fast, nonconscious processes, while System 3, though automatic, is slower and can be quite conscious.
- System 2 processes are about accurately recreating, then symbolically and logically manipulating the material laws of the real world. For example, System 2 can tell you that if you save $1000 today, you will have $1030 this time next year. It can't tell you how you will feel about that, or which is better. System 3, however, lets you try out the feeling of spending $1000, the smug satisfaction of not spending it, and the pleasure you may get next year from that extra $30.
- System 3 involves a specific and distinctive mental process that is dissimilar to the instant, model-free leaps of System 1 and the emotionless, rule-based, non-causal reasoning of System 2.
I believe System 3 offers a good description of a class of decisions that are not well explained by existing theory, and a strong foundation for understanding the economic valuation of mental states (at the heart of the emerging field of cognitive economics).