From Computational Cognitive Neuroscience Wiki
Action Selection and Probabilistic Reinforcement Learning in the Basal Ganglia
- The project file: BG.proj (click and Save As to download, then open in Emergent)
- Additional files for pre-trained Go/NoGo activation data:
Project Documentation
This simplified basal ganglia (BG) network considers learning and action selection among just two alternative responses (but see Frank (2006) for a model that includes four alternative choices and also explores the function of the subthalamic nucleus and the 'hyperdirect' pathway).
? It is recommended that you click here to undock this document from the main project window. Use the Window menu to find this window if you lose it, and you can always return to this document by browsing to it from the docs section in the left browser panel of the project's main window. |
The BG circuitry is notoriously complex and counterintuitive, involving a combination of excitatory and inhibitory projections across multiple pathways. Given a sensory stimulus, the pre/motor cortex generates candidate motor actions, and the BG then selectively "gates" one of these actions to be executed while suppressing the other. This gating process occurs via a sequence of steps from the Striatum through two opposing pathways (the direct and indirect pathways) to the output nucleus, then to the Thalamus, and back up to cortex. The network learns which actions to select and which to suppress as a function of reinforcement signals encoded by dopamine. That is, unlike the error-driven *supervised* learning mechanisms in other task learning simulations, this network learns to make responses based only on a reinforcement signal that follows its actions, and is never 'told' which response it should have made.
? As usual begin by inspecting the pattern of weights in the network. |
The Striatum is divided into two halves, with "Go" units in the left half and "NoGo" units in the right, and separate columns for each response. All striatal neurons receive input from sensory cortex (the Input layer). However, inputs from frontal cortex are topographic: the first column of Go units receives input from the first column of motor units, representing Response 1 (R1), whereas the second column receives input from the second column (R2), and similarly for NoGo units (columns 3 and 4 are NoGo-R1 and NoGo-R2). Each Go column of striatal units projects to its corresponding column of the globus pallidus internal segment (GP_Int), which in turn projects to the corresponding column of Thalamus. In contrast, each NoGo column projects to the GP_Ext, which in turn projects to GP_Int. Remember that the projections between Striatum, GP_Int, GP_Ext, and Thalamus are all inhibitory (GABA is the primary neurotransmitter). Also examine the patterns of weights between motor cortex and Thalamus and back (these projections are excitatory).
? To get a sense of how the dynamics of action selection work in more detail, step through a single trial a few cycles at a time. Click Init, and then Step: Cycle on the MasterControl panel. (You can choose to step 5 Cycles at a time instead of clicking this button for every cycle by checking off the '5' box next to Step). |
During the initial cycles of settling you can observe the basic "default" function of the BG: to suppress responses. A stimulus input pattern is presented, and at first only the neurons in the globus pallidus (internal and external segments; GP_Int and GP_Ext, respectively) are active. Neurons in these areas are tonically active with high baseline firing rates (here this is due to a reverse leak current, in which positive ions leak into the cell rather than out of it as usual). After a few more cycles, you will see that both competing responses become noisily activated (or "considered") in pre/motor cortex. (This noise is helpful for producing exploratory actions before learning has occurred.) However, neither of these responses receives sufficient excitation to elicit a response. Importantly, the projections from GP_Int units to the Thalamus are inhibitory, so tonic GP_Int activation chronically suppresses the Thalamus. Because bottom-up thalamo-cortical activity is required for a motor response to become sufficiently activated, this thalamic inhibition prevents all responses from being executed, leading to only noisy cortical activity and no action selection.
? Click through a few more Step: Cycle until you see activity in the Striatum in response to the Input pattern. |
You should see that the particular Go or NoGo units within a column that get active depend not only on the motor cortex activity but also on the sensory stimulus Input. Thus the striatal units encode conjunctions between stimulus input and actions that are considered in motor cortex, so that some units represent Go-R1 for a particular stimulus, whereas others may become active for another stimulus. If a particular column of Go units is more active than the NoGo units, it will tend to inhibit the corresponding column of the GP_Int, which ultimately will allow that action to be executed.
? Step through a few more cycles until you see this inhibition of a column in GP_Int. |
You should see that as the GP_Int units become inhibited, the Thalamus is no longer suppressed. This process is referred to as \"disinhibition\" because the effect of the striatal Go activation is not to directly excite, but only to remove inhibition of GP_Int onto Thalamus. The corresponding column of Thalamus will become excited only if it also receives top down activation from motor units in the same column. It is this property that makes the BG contribution to action selection a *gating* process: noisy striatal activity would not by itself select an action unless the motor cortex was already 'considering' that action as a plausible candidate. Similarly, in some cases both responses may get a similar level of Go activity in striatum initially, and in this case only the one also having greater cortical activity will be gated.
Once a Thalamic unit is active, the corresponding column of cortical motor units quickly becomes maximally active, while the competing motor column is inhibited. (There is lateral inhibition between the competing motor responses, and the thalamic activation gives the winning column sufficient activity that the other column is completely inhibited.)
? Once you are finished watching the network activations evolve in this trial, switch to using Step: Settle instead of Cycle. Step through a few trials in the same way. |
You may also see some NoGo activity (in the right half of striatal units), which would serve to prevent selection of the corresponding response, because NoGo units project to and inhibit the GP_Ext, which in turn sends focused inhibitory projections to GP_Int. Thus whereas Go activity disinhibits the Thalamus, NoGo activity has the opposite effect, further activating GP_Int so that the Thalamus remains inhibited. Before learning, whether Go or NoGo activity for a given response predominates is somewhat arbitrary (related to random synaptic weights and overall dopamine levels), but informative differences will emerge with learning.
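The gating logic described above can be captured in a minimal rate-model sketch. Everything here (function names, tonic levels, the rectified-linear rates) is an illustrative assumption, not emergent's actual Leabra equations: tonic GP_Int activity suppresses the Thalamus by default, Go activity disinhibits it (contingent on cortical drive), and NoGo activity restores the inhibition via GP_Ext.

```python
# Minimal sketch of the Go/NoGo disinhibition chain (illustrative only).
def relu(x):
    """Firing rates cannot go below zero."""
    return max(0.0, x)

def bg_chain(go, nogo, cortex_drive, gpi_tonic=1.0, gpe_tonic=1.0):
    """Return thalamic activity for one response column, given striatal
    Go/NoGo activity and top-down cortical drive (arbitrary units)."""
    gpe = relu(gpe_tonic - nogo)                    # NoGo inhibits GP_Ext
    gpi = relu(gpi_tonic - go + (gpe_tonic - gpe))  # Go inhibits GP_Int;
                                                    # less GP_Ext disinhibits GP_Int
    return relu(cortex_drive - gpi)                 # GP_Int inhibits Thalamus

# Default state: tonic GP_Int suppresses the Thalamus despite cortical noise.
assert bg_chain(go=0.0, nogo=0.0, cortex_drive=0.8) == 0.0
# Go activity disinhibits the Thalamus -- but only with cortical drive (gating).
assert bg_chain(go=1.0, nogo=0.0, cortex_drive=0.8) > 0.0
assert bg_chain(go=1.0, nogo=0.0, cortex_drive=0.0) == 0.0
# NoGo activity re-excites GP_Int, keeping the Thalamus suppressed.
assert bg_chain(go=1.0, nogo=1.0, cortex_drive=0.8) == 0.0
```

Note that the third assertion is what makes this a *gating* circuit rather than a selection circuit: striatal Go activity alone never produces an action without cortical candidacy.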
So far you have been stepping through single trials in which a stimulus is presented and the network gates a response, with no feedback. The first 20 trials for each network are run as a test this way with no weight changes, in order to get an initial baseline performance measure before learning occurs for each network (this contributes to the statistic for how many errors the network makes with 0 epochs of training). Once the training phase begins, each trial consists of two settling phases: one in which the network selects a response, and another in which it receives dopaminergic feedback about whether the outcome was good or bad. In this second phase you should also see that, depending on whether the network responded correctly or not (in the task described below), there will be either a dopamine burst (SNc units completely activated) or a dip (complete inhibition). This reinforcement forms the basis for learning in the model, as described next.
Learning
Dopamine (DA) from the SNc modulates the relative balance of activity in Go versus NoGo units via simulated D1 and D2 receptors. Dopamine effects are greatest on those striatal units that are already activated by corticostriatal glutamatergic input. Go units activated by the current stimulus and motor response are further excited by D1 receptor stimulation. In contrast, DA is uniformly inhibitory on NoGo units via D2 receptors. This differential effect of DA on Go and NoGo units, via D1 and D2 receptors, affects performance (i.e., more tonic DA leads to more Go activity, with associated response vigor and faster reaction times) and, critically, learning.
Specifically, when the network selects the 'correct' response, a dopamine burst subsequently reinforces the response, further exciting Go units and inhibiting NoGo units. Learning occurs during this phasic DA signal, so that synapses between active cortical and striatal Go units are strengthened, whereas those for NoGo units are weakened. This learning allows the striatum to facilitate selection of the rewarding response in future presentations of the same stimulus. In contrast, if the network selects the incorrect response, DA units cease firing, and the associated dip in DA activity allows some NoGo units (which were previously inhibited by DA) to become more excited. The resulting increase in activity is also associated with strengthening of synapses from active cortical units. With learning, NoGo units differentially respond to stimulus-response combinations that have negative value, so that non-rewarding responses are likely to be suppressed. The mechanisms by which DA affects activity and plasticity are motivated by several biological experiments and are consistent with effects of dopamine D1 and D2 pharmacological agents on activity and long term plasticity (see Wiecki & Frank, 2010 for a recent review).
The net result is that the BG selects one response if a particular "Go" signal in the striatum is stronger than its corresponding "NoGo" signal, while concurrently suppressing alternative responses. Because direct and indirect pathway cells compete at BG output, the action most likely to be gated is a function of the difference in activity in these pathways for each action in parallel.
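The learning story above amounts to a three-factor update. The sketch below is a hedged cartoon of that idea (the learning rate, variable names, and linear rule are illustrative assumptions, not the model's actual equations): phasic DA multiplies a Hebbian pre-times-post term, with opposite signs for Go (D1) and NoGo (D2) synapses.

```python
# Hedged sketch of DA-modulated corticostriatal learning (illustrative only).
def update_weights(w_go, w_nogo, pre, go_act, nogo_act, da, lr=0.1):
    """da = +1 for a phasic burst (correct response), -1 for a dip (error).
    pre is the presynaptic cortical activity; *_act are striatal activities."""
    w_go = w_go + lr * da * pre * go_act         # bursts strengthen active Go synapses
    w_nogo = w_nogo - lr * da * pre * nogo_act   # bursts weaken, dips strengthen, NoGo
    return w_go, w_nogo

# A reward burst builds the Go association for the selected response:
w_go, w_nogo = update_weights(0.5, 0.5, pre=1.0, go_act=0.8, nogo_act=0.2, da=+1)
assert w_go > 0.5 and w_nogo < 0.5
# A dip after an error releases NoGo units and builds the NoGo association:
w_go, w_nogo = update_weights(0.5, 0.5, pre=1.0, go_act=0.2, nogo_act=0.8, da=-1)
assert w_go < 0.5 and w_nogo > 0.5
```

Because the update scales with the striatal activities themselves, only units encoding the current stimulus-response conjunction are appreciably changed, matching the conjunctive coding described earlier.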
Training the model in probabilistic reinforcement tasks
Now that we've seen how the dynamics of action selection and learning work, let's put the mechanisms to a test, and see how they are sensitive to dopamine manipulations as in empirical studies. Specifically, many studies implicate the basal ganglia dopamine system in reinforcement learning in probabilistic environments. In these tasks, multiple stimuli are presented on different trials, and participants have to learn by trial and error which response to make. The difficulty is not only that there is no clear underlying 'rule' for determining which response to make (it is determined arbitrarily), but also that there is no absolute answer that will always work. However, certain responses are reinforced on a greater proportion of trials than others, so that people have to learn to integrate the reinforcement history across multiple instances to determine the optimal response.
To see if this basic BG network is capable of learning in such environments, the model is presented with four stimuli (A, B, C, D), each represented by a column of four input units (see the Train_Prob8070 InputData table). When stimulus A is presented, the model is rewarded (DA burst) for selecting R1 on 80% of trials (8/10 per epoch) and for selecting R2 on 20% of trials, and is punished (DA dip) otherwise. The opposite contingencies hold for stimulus B: R2 is rewarded on 80% of trials, and R1 on only 20%. Stimuli C and D are associated with 70% positive reinforcement for choosing R1 and R2, respectively. Thus this task tests the network's ability to learn multiple stimulus-response associations in probabilistic environments. Although this basic network simulates only 2 responses, the same mechanisms can support robust learning to select among multiple alternatives, maximizing selection of the action that is most rewarded and minimizing selection of actions that are least rewarding (see below for links to projects that support this).
(Technical note: phasic dopamine bursts and dips are controlled by the DA_Rew_Punish program, depending on the network's actions, so ignore the SNc values in the data table, as they are overwritten by this program.)
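The reward contingencies of the task can be written down compactly. The sketch below (the names and data layout are assumptions for illustration, not the Train_Prob8070 table format) draws probabilistic feedback exactly as described: stimulus A rewards R1 80% of the time, B rewards R2 80% of the time, and C/D use 70/30.

```python
import random

# Illustrative reward schedule: P(R1 is rewarded | stimulus).
P_R1_REWARD = {"A": 0.8, "B": 0.2, "C": 0.7, "D": 0.3}

def feedback(stimulus, response):
    """Return +1 (simulated DA burst) or -1 (DA dip) for a chosen response."""
    p = P_R1_REWARD[stimulus]
    p_reward = p if response == "R1" else 1.0 - p
    return +1 if random.random() < p_reward else -1

# Over many trials, choosing R1 for stimulus A yields bursts ~80% of the time.
random.seed(0)
bursts = sum(feedback("A", "R1") == +1 for _ in range(10000))
assert 7800 < bursts < 8200
```

Note that because no single trial's feedback is decisive, the learner must integrate the reinforcement history across many trials, which is exactly what the slow Go/NoGo weight changes accomplish.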
? Do Batch Init and Batch Run on the MasterControl control panel, and switch to looking at the EpochOutputData tab or turn off the Display on the network tab to speed up processing. |
This will run a batch of 25 networks for 10 epochs of 10 trials per stimulus (for a total of 100 trials per stimulus, similar to that in a human study of this nature). This may take some time -- if you don't want to wait, you can simply examine the learning curves in the associated graphs indicated below, which should reflect those of the intact network which was run before this project was saved.
? Examine the learning curves in the EpochOutputData graph. |
These reflect the degree of optimal responding (i.e. in this analysis, responses are counted as 'correct' if they correspond to the most reinforced response, and as an 'error' otherwise). The number of errors out of a total of 20 test trials is plotted across all four stimuli (multiple trials per stimulus are presented to get a reliable estimate of accuracy: there is noise in unit activity that could lead to spurious correct or incorrect responding). After the batch of networks is finished, the average learning curve is plotted in the EpochOutputData_Group graph.
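The scoring rule just described is simple to state in code. This sketch assumes a flat list of (stimulus, response) trials, which is not how emergent stores the data, but it illustrates the 'optimal responding' analysis:

```python
# Optimal-response scoring: a trial counts as 'correct' when the gated
# response is the most-reinforced one for its stimulus (coding from the
# task description above).
OPTIMAL = {"A": "R1", "B": "R2", "C": "R1", "D": "R2"}

def count_errors(trials):
    """trials: list of (stimulus, gated_response) pairs from one test epoch."""
    return sum(1 for stim, resp in trials if resp != OPTIMAL[stim])

trials = [("A", "R1"), ("A", "R2"), ("B", "R2"), ("C", "R2"), ("D", "R2")]
assert count_errors(trials) == 2   # the A-R2 and C-R2 trials count as errors
```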
Accuracy at the beginning of learning should be on average 10/20 errors (chance). You can analyze accuracy separately for the different levels of difficulty (80 vs 70% reinforcement contingencies) on the TrialOutputData_Group graph. There you can also see a measure of response time (RT, in terms of the number of cycles before a response was selected) for the different conditions, plotted in red (note the alternate y axis for RT in cycles). If you want to make sure that the results for the different difficulty levels match your intuition (they are not labeled on the graph), you can look at the 'TrialOutputData_Group' data table in the left panel under data\AnalysisData\TrialOutputData_Group, which shows the same data but with labels in the first column indicating which condition is which (70 or 80%).
Question BG.1 : What is the average performance level of the networks after training (number of errors out of 20 in the EpochOutputData graph and/or mean sse in the Output as indicated on the TrialOutputData graph)? Do networks tend to learn the optimal response most of the time? |
Optional: : Explain any differences between 70% and 80% conditions, in terms of accuracy and RT. (this question may be optional depending on your instructor) |
Assessing Go and NoGo learned associations
Now let's examine the nature of the Go and NoGo associations the networks learn to solve the task. After training, all stimuli are presented one at a time, and we then compute activation-based receptive fields in the Striatum. These assess the activation of each striatal unit as a function of specific input patterns. (See http://grey.colorado.edu/emergent/index.php/Activation_Based_Receptive_Field for more detailed info.) In this case, we are interested in the degree to which striatal Go and NoGo units preferentially respond to stimulus-response combinations that had been rewarding or non-rewarding during learning. Thus we measure the extent to which Go units coding for response R1 (in the first column) are activated when stimulus A is presented in the input, and similarly Go units for R2 when stimulus B is presented; A-R1 and B-R2 are rewarding stimulus-response combinations. Conversely, we can also assess activity in NoGo units for non-rewarding ('negative') stimulus-response combinations (A-R2 and B-R1).
After the receptive fields for each unit are calculated, we then sum these over all Go units for the relevant stimuli and responses, so that we can calculate total striatal Go activity for 'good' (most often rewarded) responses and subtract the total NoGo activity for these responses. This measure gives us the relative Go-NoGo activity, which is proportional to the probability that these rewarding stimulus-response combinations will be gated. Similarly, we can compute the total Go and NoGo activity for 'bad' (least often rewarded) responses and perform the same subtraction. In this case we expect relatively greater NoGo than Go activity after learning (so that the Go-NoGo difference should be negative, as an index of the probability that negative s-r combinations will be suppressed). The resulting statistics are reported in the Go_NoGo data table, as follows:
gn_pos computes relative Go-NoGo striatal activity for positive stimulus-response conjunctions (R1 for A and R2 for B). Networks should learn greater Go than NoGo representations for these positive associations, and if so the value should be positive.
gn_neg computes relative Go-NoGo striatal activity for stimulus-response conjunctions that had been predominantly associated with negative outcomes (R2 for A and R1 for B). Networks should learn greater NoGo representations for these negative associations, and if so the value should be negative.
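A sketch of how these two summary statistics could be computed from per-unit receptive fields follows. The dictionary layout is an assumption for illustration; emergent computes this from its own data tables.

```python
# gn_pos / gn_neg from summed Go and NoGo receptive-field activations.
def gn_measures(rf):
    """rf[(stimulus, response, pathway)] = summed striatal activation."""
    pos_pairs = [("A", "R1"), ("B", "R2")]   # most-rewarded combinations
    neg_pairs = [("A", "R2"), ("B", "R1")]   # most-punished combinations
    gn_pos = sum(rf[(s, r, "Go")] - rf[(s, r, "NoGo")] for s, r in pos_pairs)
    gn_neg = sum(rf[(s, r, "Go")] - rf[(s, r, "NoGo")] for s, r in neg_pairs)
    return gn_pos, gn_neg

# Made-up post-learning activations showing the expected qualitative pattern:
rf = {("A", "R1", "Go"): 0.9, ("A", "R1", "NoGo"): 0.2,
      ("B", "R2", "Go"): 0.8, ("B", "R2", "NoGo"): 0.3,
      ("A", "R2", "Go"): 0.2, ("A", "R2", "NoGo"): 0.7,
      ("B", "R1", "Go"): 0.3, ("B", "R1", "NoGo"): 0.8}
gn_pos, gn_neg = gn_measures(rf)
assert gn_pos > 0 and gn_neg < 0   # Go dominates for good pairs, NoGo for bad
```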
Because the data are noisy (probabilistic rewards and noisy unit activity, combined with random initial synaptic weights), we average across multiple networks with different sets of initial weights. This is done automatically at the end of running a batch of networks, with the resulting mean and standard error of gn_pos and gn_neg reported in the Go_NoGo_Group data table and plotted in the associated graph on the right.
? If you didn't already run a batch of intact networks above, make sure the number of intact SNc units is set to 4 (out of 4 units); this value is set on the MasterControl control panel. Do Batch Init and Batch Run on the MasterControl control panel. This will run a batch of 25 networks. (Or again, if you don't want to wait, the saved data when you open the project should reflect that of an intact network, so simply switch to the graph below.) |
You should see in the Go_NoGo_Group graph that, on average, gn_pos is positive, showing that positively rewarded stimulus-response combinations are associated with relatively greater Go than NoGo activity, whereas the opposite is true for responses that are more often associated with negative outcomes (i.e., gn_neg should be negative). This pattern will not hold for every individual network (due to random initial weights and noisy unit activity that might happen to favor, e.g., Go-B), just as it doesn't hold for every individual human participant. But you should see relatively more positive gn_pos than gn_neg in virtually every intact network, even if in some cases they are both positive or both negative. To see gn_pos and gn_neg for individual networks, look at the Go_NoGo data table and graph on the right, which plots these measures across networks on the x-axis (twice for each network; see notes at the bottom): the black gn_pos dots should usually be positive and the red gn_neg ones negative. Note that any differences between gn_pos and gn_neg must be due to prior learning, even though dopamine can also influence the overall balance of Go vs NoGo activity during choice itself. (One active area of interest is the degree to which dopamine affects motivational incentive during decision making, the degree to which one learns about actions leading to rewards or punishments, or both.)
Simulating Parkinson's disease and dopamine medications
Parkinson's disease (PD) is associated with death of dopamine neurons in the SNc and associated slowness of movement. However, PD is not purely a motor selection disorder. Empirical studies testing the predictions of this model about the role of dopamine in BG learning have shown that PD patients show different patterns of Go or NoGo learning impairments depending on whether they are medicated or not (Frank et al, 2004; 2007; Palminteri et al, 2009, Bodi et al, 2010, etc).
? To simulate Parkinson's disease (PD), set the number of intact DA units to 2 in the MasterControl control panel. This sets the number of SNc units that are connected to the Striatum to 2 (out of 4). (For a more extreme example (e.g. severe PD), set num_intact_snc_units to 1). This reduction in number of intact DA units leads to effectively reduced tonic and phasic DA levels in the Striatum. |
You can step through a few trials to examine network dynamics under this dopamine depleted state. You might see overall greater levels of NoGo activation, which also produces general slowing in action selection (i.e., akinesia and bradykinesia, symptoms of Parkinson's disease). However, these effects become more prevalent with learning (and in fact, simulations have also shown that PD-like symptoms can progress due to an asymmetric learning bias even without further dopaminergic damage, see Wiecki & Frank 2010 and Beeler, 2011).
? Do Batch Init and Batch Run on the control panel. When it's done, look at the Go_NoGo_Group graph on the right again. If you don't want to wait, you can load the saved data from a batch of PD networks. To do so, open the Go_NoGo_Group data table in the middle panel and click "Load Any Data", and then select 'pd2_gn.dat', which you should have saved with the project file. For severe PD (only 1 intact SNc unit) select 'pd1_gn.dat'. |
Question BG.2 : Report gn_pos and gn_neg values. How do the Go and NoGo activations to rewarding and non-rewarding stimulus-response combinations differ from the intact networks? |
Optional: : Explain this pattern and how it might have emerged. Also look at the accuracy and RTs in the TrialOutputDataGraph (if you didn't run the network you'll have to load that data too: e.g., pd2_trl.dat). How do these data compare to the intact case? Explain. (this question may be optional depending on your instructor) |
Parkinson's patients typically take medications that increase dopamine synthesis and release, and in some cases also medications that directly stimulate D2 receptors. These drugs improve motor function, but their effects on cognition are mixed and can sometimes even impair cognitive function (see Frank 2005, Cools 2006, or Wiecki & Frank 2010 for review). In fact, some patients become pathological gamblers or compulsive shoppers after taking medication. One explanation for this effect is that the medications continually stimulate dopamine receptors, even when they shouldn't (i.e., when a dip in dopamine should occur), thereby making patients insensitive to losses.
? To simulate these effects in the model, set the number of intact DA units back to 4 units (this captures the increase in DA synthesis and release) and simply check off the "meds" checkbox in the MasterControl control panel. |
? Run a Batch of medicated networks. If you don't want to wait, you can load the saved data from a batch of medicated networks. To do so, open the Go_NoGo_Group data table in the middle panel and click \"Load Any Data\", and then select 'meds_gn.dat' which you should have saved with the project file. |
Question BG.3 : Report gn_pos and gn_neg values. Is there a learning bias in these networks? |
Optional: : Why might this be the case? In other experiments (e.g. Cools et al, 2001) it has been shown that these 'medicated' networks are preferentially impaired in the reversal phase of a probabilistic reversal task (e.g. 70/30 reward contingencies for a stimulus-action pair, and then suddenly these contingencies reverse to 30/70 so that the other response is now optimal). This model also shows these preferential impairments in reversal (see Frank 2005, simulations available at link at bottom of this page). Based on these gn_pos and gn_neg values, can you intuit why networks would show greater difficulty with reversal than initial learning? (this question may be optional depending on your instructor) |
Note that although in this exploration we have focused on the asymmetry of internal value (Go/NoGo) representations as a function of dopamine status, other simulations with this network confirm that these effects also translate into the observed dissociations in actual action selection (choice), as observed in the empirical behavioral studies. (However, in the current network, choice of the rewarding action is synonymous with avoidance of the non-rewarding one, so a demonstration of this effect requires networks with more than two alternative responses -- available as one of the supplementary demonstrations on the website linked below.)
Role of the DA Pause
Some have argued that while phasic DA bursts encode positive prediction errors, DA dips may not be functionally effective, due to the already low baseline firing rates of DA cells. However, the smaller range of DA dips is likely compensated for by a counteracting asymmetry in receptor sensitivity to dips versus bursts. In particular, because DA binds much more strongly to the D2 than the D1 class of receptors, high-affinity D2 receptors are sensitive even to low tonic DA levels, whereas large increases in DA are required to functionally stimulate D1 receptors (e.g., Goto & Grace, 2005). Thus D2 receptors may become unoccupied only once DA levels drop low enough during DA dips, leading to NoGo learning in our model. Indeed, synaptic plasticity studies in rodents show that corticostriatal NoGo synapses are potentiated in the absence of D2 receptor stimulation, and this effect is exaggerated by DA depletion in Parkinson's disease but reversed by D2 agonist administration (Shen et al, 2008) -- precisely the same pattern of results as seen in human Parkinson's patients on and off DA medications on NoGo learning in the probabilistic task. Further, in healthy subjects, genetic factors controlling striatal D2 receptor function are strongly predictive of individual differences in probabilistic NoGo learning (Frank et al, 2007; 2009, etc).
Moreover, recent evidence shows that the magnitude of negative prediction errors is correlated with the duration of DA pauses, rather than with the change in firing rate (Bayer, Lau & Glimcher, 2007). The BG model provides a plausible explanation for why this might be the case: in order to learn from DA dips, there has to be sufficient time not just for DA neurons to stop firing, but for DA to be cleared from the synapse so that NoGo neurons can be disinhibited. Thus the longer the pause, the greater the probability that a given D2 receptor will be unoccupied, and the stronger the learning signal.
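The pause-duration argument can be illustrated with a toy clearance model. This is a cartoon of the idea under an assumed first-order decay with an illustrative time constant, not a fitted pharmacological model:

```python
import math

def nogo_signal(pause_cycles, tau=5.0):
    """Toy NoGo learning signal: grows as synaptic DA clears during a pause.
    Assumes exponential DA clearance with illustrative time constant tau."""
    da_remaining = math.exp(-pause_cycles / tau)
    return 1.0 - da_remaining   # fraction of D2 receptors likely unoccupied

assert nogo_signal(15) > nogo_signal(8)   # longer pauses -> stronger NoGo learning
assert nogo_signal(0) == 0.0              # no pause -> no NoGo learning signal
```

The monotonic relationship between pause duration and the signal is the point: learning from dips depends on how long DA stays cleared, not just on whether firing stopped.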
? To see this in the model, in the MasterControl control panel, change the \"burst/pause duration\" from 15 cycles to 8 cycles. This reduces the number of processing cycles in the reinforcement phase during which simulated DA levels change to their phasic value and drive Go or NoGo learning relative to tonic DA levels. Make sure all other settings are default (intact): all 4 SNc units connected, no meds. (or if you prefer, load the \"pause10_gn.dat\" file into the GoNoGo_Group graph). |
With this shorter burst/pause duration, the otherwise intact model exhibits spared Go learning but impaired NoGo learning. This is because there has to be sufficient time for the SNc DA units to deactivate, and for the NoGo units to become disinhibited. The time course of this is dependent on the integration of membrane potentials and sluggishness of the neurons in the model, which is a crude approximation of the temporal dynamics associated with DA reuptake and NoGo unit disinhibition. In contrast, although this setting also shortens the duration of DA bursts, you should see that Go learning is preserved. This is because the Go units that participated in selecting the response were already activated and simply got an additional boost of activity from the phasic burst. Thus together with the above simulations regarding the effects of diminishing the magnitude of DA bursts, the model demonstrates that large changes in DA firing rates during bursts are necessary for robust Go learning, but that long duration DA pauses are required for NoGo learning. It is interesting that the DA signals seem to follow this profile empirically!
Learning habits in the cortico-cortical pathway
The corticostriatal pathway and reinforcement learning are not the only form of learning in this model. There is also unsupervised Hebbian learning that occurs directly from sensory to motor cortex. This pathway learns the statistics of the network's own actions: when a particular stimulus is presented, which action did it select most often in the past? As long as this learning is slower than that in the BG, the actions that had been most frequently selected given this stimulus will be those that were most often reinforced (Frank, 2005; Frank & Claus 2006). This learning provides a mechanism by which the development of habits is initially dependent on BG and dopamine function, but their later expression is not: once these mappings are strong enough, the network can rapidly activate only the most appropriate response and is not dependent on BG gating. This provides a natural explanation for the following observations: (i) while learning of simple instrumental actions is initially BG- and DA-dependent, their later expression is not (Smith-Roe & Kelley, 2000); (ii) in well-learned tasks, striatal activation is sometimes seen *after* motor unit activation and the onset of movement (Alexander & Crutcher, 1990); and (iii) Parkinson's patients have much less difficulty executing well-learned motor actions.
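The slow Hebbian pathway can be sketched as a weight that tracks the frequency with which each action is selected for a stimulus. The rule and learning rate below are illustrative assumptions, not emergent's actual Hebbian equations:

```python
# Slow Hebbian sensory->motor learning: the weight drifts toward the average
# coactivity of stimulus and selected action (illustrative only).
def hebbian_update(w, stimulus_act, motor_act, lr=0.01):
    return w + lr * stimulus_act * (motor_act - w)

# If the BG gates R1 on ~80% of trials for this stimulus, the direct cortical
# weight comes to track that selection frequency, biasing cortex toward R1
# before the BG even acts -- the habit described in the text.
w = 0.0
selections = [1.0] * 80 + [0.0] * 20        # R1 selected 80% of the time
for motor_act in selections * 20:           # many slow repetitions
    w = hebbian_update(w, stimulus_act=1.0, motor_act=motor_act)
assert 0.6 < w < 1.0
```

The small learning rate is what makes this a habit mechanism: the cortical shortcut only becomes strong after the BG has already sampled and been reinforced for the same action many times.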
? To see this in the model, increase the number of training epochs from 10 to 30 or 40 on the MasterControl panel. Make sure the network is in the intact state (4 SNc units). You don't have to run a full batch; simply stop the network near the end of training and just step through a few trials and observe the network activity in the motor cortex. |
You should see that well before the BG gates an action, the motor cortex now preferentially activates one response over the other, depending on the stimulus (i.e., it will tend to be the response it had selected most often for that stimulus in the past). This provides a simple demonstration that the motor cortex can first generate candidate actions based on their prior probability of selection in the current sensory context, and if more than one of these actions is a suitable candidate, the BG selects between them. But if one candidate is much stronger than the others, less 'help' is needed from the BG.
These mechanisms may play a key role in addiction. While brief exposure to addictive drugs (e.g. alcohol, nicotine, heroin) does not lead to a loss of controlled behavior, repeated intake often results in a lack of control (i.e. addiction) after a "point of no return" is passed. Interestingly, virtually all addictive drugs lead to increased levels of striatal DA (in one way or another).
Depth Question BG.4 : Can you relate the process underlying addiction formation to learning in the model? (this question may be optional depending on your instructor) |
Of course, people can also over-ride these kinds of habitual responses, and the current model would not be so flexible without further extensive training. Other prefrontal cortical mechanisms are thus needed to make the simple BG-cortical model more sensitive to changes in outcome contingencies (Frank & Claus, 2006).
For several other demonstrations using this model and extensions thereof to capture data across a range of tasks and manipulations, see the BG Projects page of the Frank lab website: http://ski.clps.brown.edu/BG_Projects/
Additional technical note on the activation based receptive fields
Note that for the activation-based receptive field analysis we measure activity during the action selection process, i.e., prior to a response actually being gated. This is because we are interested in the striatal Go/NoGo activations that affect future choice. Once a response is gated and selected (i.e., fully active in motor cortex), lateral inhibition within cortex completely suppresses activation of the competing motor cortical response, such that its corresponding striatal (Go and NoGo) representations also vanish (as these depend on excitatory cortical input). Thus while Go activity for the selected response would be evident at the end of settling, NoGo activity for the alternative response would only be seen earlier during the selection process. We thus measure Go/NoGo activities early on during selection, and to make sure the measurement is not overly sensitive to the particular cycle of settling, we measure it twice -- once after 20 cycles and once after 25 cycles -- and then take the average. This time course gives enough time to process the incoming stimulus and leads to differential striatal activation while both responses are still being "considered" (activated) in cortex, and taking the measurement twice makes it more likely that we capture both the activity associated with rejecting a negative option and that associated with choosing a positive one. Similar results are obtained if we record the entire trajectory of activation values in all striatal Go and NoGo units and take the maximum activation within each unit over the course of settling. Also, as mentioned earlier, similar dissociations of dopamine manipulations are seen in actual action selection (in addition to these internal valuation signals), but because selection of the rewarding response is equivalent to avoiding the non-rewarding one in this network, this demonstration requires a network with multiple alternative responses.
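The two-sample averaging protocol amounts to something like the following sketch (the list-of-activations layout is an assumption for illustration; the sampled cycle numbers are from the text):

```python
# Average striatal activation at two fixed settling cycles (20 and 25) to
# reduce sensitivity to the exact sampling cycle.
def measure(act_trajectory, cycles=(20, 25)):
    """act_trajectory: one unit's activation indexed by settling cycle."""
    return sum(act_trajectory[c] for c in cycles) / len(cycles)

# Illustrative trajectory: silent, then ramping activation during selection.
traj = [0.0] * 20 + [0.6] * 5 + [0.8] * 10
assert abs(measure(traj) - 0.7) < 1e-9   # mean of samples at cycles 20 and 25
```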
? You may now close the project (use the window manager close button on the project window or File/Close Project menu item) and then open a new one, or just quit emergent entirely by doing Quit emergent menu option or clicking the close button on the root window. |