# Methods¶

## Sequential Sampling Models¶

SSMs generally fall into one of two classes: (i) diffusion models
which assume that *relative* evidence is accumulated over time
and (ii) race models which assume independent evidence accumulation
and response commitment once the first accumulator crossed a boundary
([LaBerge62], [Vickers70]). Currently, HDDM includes two of the most
commonly used SSMs: the drift diffusion model (DDM)
([RatcliffRouder98], [RatcliffMcKoon08]) belonging to the
class of diffusion models and the linear ballistic accumulator (LBA)
([BrownHeathcote08]) belonging to the class of race models.

### Drift Diffusion Model¶

The DDM models decision making in two-choice tasks. Each choice is
represented as an upper and lower boundary. A drift-process
accumulates evidence over time until it crosses one of the two
boundaries and initiates the corresponding response
([RatcliffRouder98], [SmithRatcliff04]). The speed with
which the accumulation process approaches one of the two boundaries is
called drift-rate *v* and represents the relative evidence for or
against a particular response. Because there is noise in the drift
process, the time of the boundary crossing and the selected response
will vary between trials. The distance between the two boundaries
(i.e. threshold *a*) influences how much evidence must be accumulated
until a response is executed. A lower threshold makes responding
faster in general but increases the influence of noise on decision
making and can hence lead to errors or impulsive choice, whereas a
higher threshold leads to more cautious responding (slower, more
skewed RT distributions, but more accurate). Response time, however,
is not solely comprised of the decision making process – perception,
movement initiation and execution all take time and are lumped in the
DDM by a single non-decision time parameter *t*. The model also allows
for a prepotent bias *z* affecting the starting point of the drift
process relative to the two boundaries. The termination times of this
generative process gives rise to the reaction time distributions of
both choices.

An analytic solution to the resulting probability distribution of the termination times was provided by [Feller68]:

Since the formula contains an infinite sum, HDDM uses an approximation provided by [NavarroFuss09].

Later on, the DDM was extended to include additional noise parameters capturing inter-trial variability in the drift-rate, the non-decision time and the starting point in order to account for two phenomena observed in decision making tasks, most notably cases where errors are faster or slower than correct responses. Models that take this into account are referred to as the full DDM ([RatcliffRouder98]). HDDM uses analytic integration of the likelihood function for variability in drift-rate and numerical integration for variability in non-decision time and bias. More information on the model specifics can be found in :cite: SoferWieckiFrank.

### Linear Ballistic Accumulator¶

The Linear Ballistic Accumulator (LBA) model belongs to the class of race models ([BrownHeathcote08]). Instead of one drift process and two boundaries, the LBA contains one drift process for each possible response with a single boundary each. Thus, the LBA can model decision making when more than two responses are possible. Moreover, unlike the DDM, the LBA drift process has no intra-trial variance. RT variability is obtained by including inter-trial variability in the drift-rate and the starting point distribution. Note that the simplifying assumption of a noiseless drift-process simplifies the math significantly leading to a computationally more efficient likelihood function for this model.

In a simulation study it was shown that the LBA and DDM lead to similar results as to which parameters are affected by certain manipulations ([DonkinBrownHeathcoteEtAl11]).

## Hierarchical Bayesian Estimation¶

Statistics and machine learning have developed efficient and versatile Bayesian methods to solve various inference problems [Poirier06]. More recently, they have seen wider adoption in applied fields such as genetics [StephensBalding09] and psychology [ClemensDeSelenEtAl11]. One reason for this Bayesian revolution is the ability to quantify the certainty one has in a particular estimation. Moreover, hierarchical Bayesian models provide an elegant solution to the problem of estimating parameters of individual subjects and groups of subjects, as outlined above. Under the assumption that participants within each group are similar to each other, but not identical, a hierarchical model can be constructed where individual parameter estimates are constrained by group-level distributions ([NilssonRieskampWagenmakers11], [ShiffrinLeeKim08]).

Bayesian methods require specification of a generative process in form of a likelihood function that produced the observed data \(x\) given some parameters \(\theta\). By specifying our prior beliefs (which can be informed or non-informed) we can use Bayes formula to invert the generative model and make inference on the probability of parameters \(\theta\):

Where \(P(x|\theta)\) is the likelihood of observing the data (in this case choices and RTs) given each parameter value and \(P(\theta)\) is the prior probability of the parameters. In most cases the computation of the denominator is quite complicated and requires to compute an analytically intractable integral. Sampling methods like Markov-Chain Monte Carlo (MCMC) [GamermanLopes06] circumvent this problem by providing a way to produce samples from the posterior distribution. These methods have been used with great success in many different scenarios [GelmanCarlinSternEtAl03] and will be discussed in more detail below.

As noted above, the Bayesian method lends itself naturally to a hierarchical design. In such a design, parameters for one distribution can themselves be drawn from a higher level distribution. This hierarchical property has a particular benefit to cognitive modeling where data is often scarce. We can construct a hierarchical model to more adequately capture the likely similarity structure of our data. As above, observed data points of each subject \(x_{i,j}\) (where \(i = 1, \dots, S_j\) data points per subject and \(j = 1, \dots, N\) for \(N\) subjects) are distributed according to some likelihood function \(f | \theta\). We now assume that individual subject parameters \(\theta_j\) are normally distributed around a group mean with a specific group variance (\(\lambda = (\mu, \sigma)\), where these group parameters are estimated from the data given hyper-priors \(G_0\)), resulting in the following generative description:

Another way to look at this hierarchical model is to consider that our fixed prior on \(\theta\) from above is actually a random variable (in our case a normal distribution) parameterized by \(\lambda\) which leads to the following posterior formulation:

Note that we can factorize \(P(x|\theta)\) and \(P(\theta|\lambda)\) due to their conditional independence. This formulation also makes apparent that the posterior contains estimation of the individual subject parameters \(\theta_j\) and group parameters \(\lambda\).

## Hierarchical Drift-Diffusion Models used in HDDM¶

HDDM includes several hierarchical Bayesian model formulations for the DDM and LBA. For illustrative purposes we present the graphical model depiction of a hierarchical DDM model with informative priors and group only inter-trial variablity parameters. Note, however, that there is also a model with non-informative priors.

Individual graphical nodes are distributed as follows.

where \(x_{i, j}\) represents the observed data consisting of reaction time and choice and \(F\) represents the DDM likelihood function as formulated by [NavarroFuss09]. \(\mathcal{N}\) represents a normal distribution parameterized by mean and standard deviation, \(\mathcal{HN}\) represents a half-normal parameterized standard-deviation, \(\mathcal{G}\) represents a Gamma distribution parameterized by mean and rate, \(\mathcal{B}\) represents a Beta distribution parameterized by \(\alpha\) and \(\beta\). Note that in this model we do not attempt to estimate individual parameters for inter-trial variabilities. The reason is that the influence of these parameters onto the likelihood is often so small that very large amounts of data would be required to make meaningful inference at the individual level.

These priors are created to roughly match parameter values reported in the literature and collected by [MatzkeWagenmakers09]. In the below figure we overlayed those empirical values with the prior distribution used for each parameter.

HDDM then uses MCMC to estimate the joint posterior distribution of all model parameters.

Note that the exact form of the model will be user-dependent; consider
as an example a model where separate drift-rates *v* are estimated for
two conditions in an experiment: easy and hard. In this case, HDDM
will create a hierarchical model with group parameters
\(\mu_{v_{\text{easy}}}\), \(\sigma_{v_{\text{easy}}}\),
\(\mu_{v_{\text{hard}}}\), \(\sigma_{v_{\text{hard}}}\),and individual subject parameters \(v_{j_{\text{easy}}}\), and \(v_{j_{\text{hard}}}\).