HDDM 0.8.0 documentation

hddm Package

model Module

class hddm.models.HDDMBase(data, bias=False, include=(), wiener_params=None, p_outlier=0.0, **kwargs)

HDDM base class. Not intended to be used directly. Instead, use hddm.HDDM.

Methods

append_stats_to_nodes_db(*args, **kwargs) smart call of MCMC.stats() for the model
approximate_map([individual_subjs, ...]) Set model to its approximate MAP.
create_knodes()
create_model([max_retries]) Set group level distributions.
create_nodes_db()
draw_from_prior([update])
find_starting_values(*args, **kwargs) Find good starting values for the different parameters by optimization.
gen_stats([fname, print_hidden]) print statistics of all variables
get_average_model([quantiles])
get_data_nodes(idx)
get_group_nodes([stochastic])
get_group_traces() Returns a DataFrame containing traces of all stochastic
get_non_observeds()
get_observeds()
get_stochastics()
get_subj_nodes([stochastic])
get_traces() Returns a DataFrame containing traces of all stochastic
iter_group_nodes(**kwargs)
iter_non_observeds()
iter_observeds()
iter_stochastics()
iter_subj_nodes(**kwargs)
load_db(dbname[, verbose, db]) Load samples from a database created by an earlier model run (e.g.
map([runs, warn_crit, method]) Find MAP and set optimized values to nodes.
mcmc([assign_step_methods]) Returns pymc.MCMC object of model.
optimize(method[, quantiles, n_runs, ...]) Optimize model using ML, chi^2 or G^2.
plot_posterior_predictive(*args, **kwargs)
plot_posterior_quantiles(*args, **kwargs) Plot posterior predictive quantiles.
plot_posteriors([params, plot_subjs, save]) plot the nodes posteriors
plot_posteriors_conditions(*args, **kwargs) Plot all group posteriors listed in depends_on on individual graphs.
pre_sample()
print_stats([fname, print_hidden]) print statistics of all variables
sample(*args, **kwargs) Sample from posterior.
sample_emcee([nwalkers, samples, ...])
save(fname) Save model to file.
set_values(new_values) set values of nodes according to new_values
plot_posterior_quantiles(*args, **kwargs)

Plot posterior predictive quantiles.

Arguments :

model : HDDM model

Optional :
value_range : numpy.ndarray

Range over which to evaluate the CDF.

samples : int (default=10)

Number of posterior samples to use.

alpha : float (default=.75)

Alpha (transparency) of posterior quantiles.

hexbin : bool (default=False)

Whether to plot posterior quantile density using hexbin.

data_plot_kwargs : dict (default=None)

Forwarded to data plotting function call.

predictive_plot_kwargs : dict (default=None)

Forwarded to the predictive plotting function call.

columns : int (default=3)

How many columns to use for plotting the subjects.

save : bool (default=False)

Whether to save the figure to a file.

path : str (default=None)

Directory prefix into which to save the figure.
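For illustration, a minimal call sketch (assuming the model has already been sampled; the keyword values are illustrative):

>>> model = hddm.HDDM(data)
>>> model.sample(2000, burn=200)
>>> model.plot_posterior_quantiles(samples=20, columns=3)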

class hddm.models.HDDM(*args, **kwargs)

Create hierarchical drift-diffusion model in which each subject has a set of parameters that are constrained by a group distribution.

Arguments :
data : pandas.DataFrame

Input data with a row for each trial. Must contain the following columns:

  • ‘rt’: Reaction time of trial in seconds.
  • ‘response’: Binary response (e.g. 0->error, 1->correct)
  • ‘subj_idx’: A unique ID (int) of each subject.
  • Other user-defined columns that can be used in depends_on keyword.
Optional :
informative : bool <default=True>

Whether to use informative priors (True) or vague priors (False). Information about the priors can be found in the methods section of the HDDM paper. If you run a classical DDM experiment, you should use informative priors. However, if you apply the DDM to a novel domain, such as saccade data where RTs are much lower, or RTs of rats, you should probably set this to False.

is_group_model : bool

If True, this results in a hierarchical model with separate parameter distributions for each subject. The subject parameter distributions are themselves distributed according to a group parameter distribution.

depends_on : dict

Specifies which parameter depends on data of a column in data. For each unique element in that column, a separate set of parameter distributions will be created and applied. Multiple columns can be specified in a sequential container (e.g. list)

Example:
>>> hddm.HDDM(data, depends_on={'v': 'difficulty'})

Separate drift-rate parameters will be estimated for each level of difficulty. Requires data to have a column named ‘difficulty’.

bias : bool

Whether to allow a bias to be estimated. This is normally used when the responses represent left/right choices and subjects could develop a bias towards responding right. It is normally not used, however, when the ‘response’ column codes correct/error.

p_outlier : double (default=0)

The probability of outliers in the data. If p_outlier is passed in the ‘include’ argument, then it is estimated from the data and the value passed using the p_outlier argument is ignored.

default_intervars : dict (default = {‘sz’: 0, ‘st’: 0, ‘sv’: 0})

Fix intertrial variabilities to a certain value. Note that this will only have effect for variables not estimated from the data.

plot_var : bool

Plot group variability parameters when calling pymc.Matplot.plot() (i.e. the variance of the Normal distribution).

trace_subjs : bool

Save traces for subjects (needed for many statistics, so probably a good idea).

std_depends : bool (default=False)

Whether the depends_on keyword should also affect the group std node. If True, both the group mean and the group std will be split by condition.

wiener_params : dict

Parameters for wfpt evaluation and numerical integration.

Parameters:
  • err: Error bound for wfpt (default 1e-4)
  • n_st: Maximum depth for numerical integration for st (default 2)
  • n_sz: Maximum depth for numerical integration for sz (default 2)
  • use_adaptive: Whether to use adaptive numerical integration (default True)
  • simps_err: Error bound for Simpson integration (default 1e-3)
Example :
>>> data, params = hddm.generate.gen_rand_data() # generate data
>>> model = hddm.HDDM(data) # create model object
>>> model.sample(5000, burn=20) # sample from posterior

Methods

append_stats_to_nodes_db(*args, **kwargs) smart call of MCMC.stats() for the model
approximate_map([individual_subjs, ...]) Set model to its approximate MAP.
create_knodes()
create_model([max_retries]) Set group level distributions.
create_nodes_db()
draw_from_prior([update])
find_starting_values(*args, **kwargs) Find good starting values for the different parameters by optimization.
gen_stats([fname, print_hidden]) print statistics of all variables
get_average_model([quantiles])
get_data_nodes(idx)
get_group_nodes([stochastic])
get_group_traces() Returns a DataFrame containing traces of all stochastic
get_non_observeds()
get_observeds()
get_stochastics()
get_subj_nodes([stochastic])
get_traces() Returns a DataFrame containing traces of all stochastic
iter_group_nodes(**kwargs)
iter_non_observeds()
iter_observeds()
iter_stochastics()
iter_subj_nodes(**kwargs)
load_db(dbname[, verbose, db]) Load samples from a database created by an earlier model run (e.g.
map([runs, warn_crit, method]) Find MAP and set optimized values to nodes.
mcmc([assign_step_methods]) Returns pymc.MCMC object of model.
optimize(method[, quantiles, n_runs, ...]) Optimize model using ML, chi^2 or G^2.
plot_posterior_predictive(*args, **kwargs)
plot_posterior_quantiles(*args, **kwargs) Plot posterior predictive quantiles.
plot_posteriors([params, plot_subjs, save]) plot the nodes posteriors
plot_posteriors_conditions(*args, **kwargs) Plot all group posteriors listed in depends_on on individual graphs.
pre_sample([use_slice])
print_stats([fname, print_hidden]) print statistics of all variables
sample(*args, **kwargs) Sample from posterior.
sample_emcee([nwalkers, samples, ...])
save(fname) Save model to file.
set_values(new_values) set values of nodes according to new_values
class hddm.models.HDDMStimCoding(*args, **kwargs)

HDDM model that can be used when stimulus coding and estimation of bias (i.e. displacement of starting point z) is required.

In that case, the ‘response’ column in your data should contain 0 and 1 for the chosen stimulus (or direction), not whether the response was correct as in accuracy coding. You then have to provide another column (referred to as stim_col) which contains information about what the correct response was.

Arguments :
split_param : {‘v’, ‘z’} <default=’z’>

There are two ways to model stimulus coding in the case where both stimuli have equal information (so that there can be no difference in drift):
  • ‘z’: use z for stimulus A and 1-z for stimulus B
  • ‘v’: use drift v for stimulus A and -v for stimulus B

stim_col : str

Column name for extracting the stimuli to use for splitting.

drift_criterion : bool <default=False>

Whether to estimate a constant factor added to the drift-rate. Requires split_param=’v’ to be set.
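For illustration, a sketch of a stimulus-coded model with the drift split by stimulus (assuming data contains a 'stim' column coding the correct response):

>>> m = hddm.HDDMStimCoding(data, stim_col='stim', split_param='v',
...                         drift_criterion=True)
>>> m.sample(2000, burn=200)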

Methods

append_stats_to_nodes_db(*args, **kwargs) smart call of MCMC.stats() for the model
approximate_map([individual_subjs, ...]) Set model to its approximate MAP.
create_knodes()
create_model([max_retries]) Set group level distributions.
create_nodes_db()
draw_from_prior([update])
find_starting_values(*args, **kwargs) Find good starting values for the different parameters by optimization.
gen_stats([fname, print_hidden]) print statistics of all variables
get_average_model([quantiles])
get_data_nodes(idx)
get_group_nodes([stochastic])
get_group_traces() Returns a DataFrame containing traces of all stochastic
get_non_observeds()
get_observeds()
get_stochastics()
get_subj_nodes([stochastic])
get_traces() Returns a DataFrame containing traces of all stochastic
iter_group_nodes(**kwargs)
iter_non_observeds()
iter_observeds()
iter_stochastics()
iter_subj_nodes(**kwargs)
load_db(dbname[, verbose, db]) Load samples from a database created by an earlier model run (e.g.
map([runs, warn_crit, method]) Find MAP and set optimized values to nodes.
mcmc([assign_step_methods]) Returns pymc.MCMC object of model.
optimize(method[, quantiles, n_runs, ...]) Optimize model using ML, chi^2 or G^2.
plot_posterior_predictive(*args, **kwargs)
plot_posterior_quantiles(*args, **kwargs) Plot posterior predictive quantiles.
plot_posteriors([params, plot_subjs, save]) plot the nodes posteriors
plot_posteriors_conditions(*args, **kwargs) Plot all group posteriors listed in depends_on on individual graphs.
pre_sample([use_slice])
print_stats([fname, print_hidden]) print statistics of all variables
sample(*args, **kwargs) Sample from posterior.
sample_emcee([nwalkers, samples, ...])
save(fname) Save model to file.
set_values(new_values) set values of nodes according to new_values
class hddm.models.HDDMRegressor(data, models, group_only_regressors=True, keep_regressor_trace=False, **kwargs)

HDDMRegressor allows estimation of the DDM where parameter values are linear models of a covariate (e.g. a brain measure like fMRI or different conditions).
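As a sketch of the regression interface (the patsy-style descriptor string and the covariate column 'theta' are illustrative assumptions):

>>> m = hddm.HDDMRegressor(data, 'v ~ theta')
>>> m.sample(2000, burn=200)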

Methods

append_stats_to_nodes_db(*args, **kwargs) smart call of MCMC.stats() for the model
approximate_map([individual_subjs, ...]) Set model to its approximate MAP.
create_knodes()
create_model([max_retries]) Set group level distributions.
create_nodes_db()
draw_from_prior([update])
find_starting_values(*args, **kwargs) Find good starting values for the different parameters by optimization.
gen_stats([fname, print_hidden]) print statistics of all variables
get_average_model([quantiles])
get_data_nodes(idx)
get_group_nodes([stochastic])
get_group_traces() Returns a DataFrame containing traces of all stochastic
get_non_observeds()
get_observeds()
get_stochastics()
get_subj_nodes([stochastic])
get_traces() Returns a DataFrame containing traces of all stochastic
iter_group_nodes(**kwargs)
iter_non_observeds()
iter_observeds()
iter_stochastics()
iter_subj_nodes(**kwargs)
load_db(dbname[, verbose, db]) Load samples from a database created by an earlier model run (e.g.
map([runs, warn_crit, method]) Find MAP and set optimized values to nodes.
mcmc([assign_step_methods]) Returns pymc.MCMC object of model.
optimize(method[, quantiles, n_runs, ...]) Optimize model using ML, chi^2 or G^2.
plot_posterior_predictive(*args, **kwargs)
plot_posterior_quantiles(*args, **kwargs) Plot posterior predictive quantiles.
plot_posteriors([params, plot_subjs, save]) plot the nodes posteriors
plot_posteriors_conditions(*args, **kwargs) Plot all group posteriors listed in depends_on on individual graphs.
pre_sample([use_slice])
print_stats([fname, print_hidden]) print statistics of all variables
sample(*args, **kwargs) Sample from posterior.
sample_emcee([nwalkers, samples, ...])
save(fname) Save model to file.
set_values(new_values) set values of nodes according to new_values

generate Module

hddm.generate.add_outliers(data, n_fast, n_slow, seed=None)

Add outliers to data. Outliers are distributed randomly across conditions.

Input:
data - the data to add outliers to
n_fast / n_slow - number/probability of fast/slow outliers
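A usage sketch (assuming the function returns the data with outliers added; argument values are illustrative):

>>> data, params = hddm.generate.gen_rand_data(size=200)
>>> data = hddm.generate.add_outliers(data, n_fast=5, n_slow=5, seed=123)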

hddm.generate.gen_rand_data(params=None, n_fast_outliers=0, n_slow_outliers=0, **kwargs)

Generate simulated RTs with random parameters.

Optional :
params : dict <default=generate randomly>

Either a dictionary mapping parameter names to values, or a dictionary mapping condition names to parameter dictionaries (see the example below).

If not supplied, takes random values.

n_fast_outliers : int <default=0>

How many fast outliers to add (outlier_RT < ter)

n_slow_outliers : int <default=0>

How many slow outliers to add.

The rest of the arguments are forwarded to kabuki.generate.gen_rand_data.

Returns :

A data array with RTs, and the parameter values used to generate them.

Example :

# Generate a random data set
>>> data, params = hddm.generate.gen_rand_data({'v': 0, 'a': 2, 't': .3},
...                                            size=100, subjs=5)

# Generate 2 conditions
>>> data, params = hddm.generate.gen_rand_data({'cond1': {'v': 0, 'a': 2, 't': .3},
...                                             'cond2': {'v': 1, 'a': 2, 't': .3}})

Notes :

Wrapper function for kabuki.generate.gen_rand_data. See the help doc of that function for more options.

hddm.generate.gen_rand_params(include=(), cond_dict=None, seed=None)

Returns a dict of DDM parameters with random values.

Optional :
include : tuple

Which optional parameters to include. Can be any combination of:

  • ‘z’ (bias, default=0.5)
  • ‘sv’ (inter-trial drift variability)
  • ‘sz’ (inter-trial bias variability)
  • ‘st’ (inter-trial non-decision time variability)

Special arguments are:
  • ‘all’: include all of the above
  • ‘all_inter’: include all of the above except ‘z’

cond_dict : dictionary

cond_dict is used when multiple conditions are desired. The dictionary has the form {param1: [value_1, ..., value_n], param2: [value_1, ..., value_n]}, and the function will output n sets of parameters, each set taking its values from the appropriate position in the dictionary. For instance, if cond_dict={‘v’: [0, 0.5, 1]}, then 3 parameter sets will be created: the first with v=0, the second with v=0.5 and the third with v=1.

seed: float

random seed

Output:

If cond_dict is None:

params : dictionary

A dictionary holding the parameter values.

Otherwise:

cond_params : dictionary

A dictionary holding the parameters for each one of the conditions, of the form {‘c1’: params1, ‘c2’: params2, ...}. It can be used directly as an argument to gen_rand_data.

merged_params : dictionary

A dictionary of parameters that can be used to validate the optimization and learning algorithms.
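A short sketch tying the cond_dict output to gen_rand_data (values taken from the description above):

>>> cond_params, merged_params = hddm.generate.gen_rand_params(cond_dict={'v': [0, 0.5, 1]})
>>> data, params = hddm.generate.gen_rand_data(cond_params, size=100)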

hddm.generate.gen_rts(size=1000, range_=(-6, 6), dt=0.001, intra_sv=1.0, structured=True, subj_idx=None, method='cdf', **params)

A private function used by gen_rand_data. Returns a DataFrame of randomly simulated RTs from the DDM.

Arguments :
params : dict

Parameter names and values to use for simulation.

Optional :
size : int

Number of RTs to simulate.

range_ : tuple

Minimum (negative) and maximum (positive) RTs.

dt : float

Time-step size of the simulation in seconds (i.e. 1/dt steps per second).

intra_sv : float

Intra-trial variability.

structured : bool

Return a structured array with fields ‘RT’ and ‘response’.

subj_idx : int

If set, append column ‘subj_idx’ with value subj_idx.

method : str
Which method to use to simulate the RTs:
  • ‘cdf’: fast; uses the inverse of the cumulative distribution function to sample; dt can be 1e-2.
  • ‘drift’: slow; simulates each complete drift process; dt should be 1e-4.
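Although gen_rand_data is the intended entry point, a direct call might look like this (a sketch; the DDM parameter values are illustrative and passed via **params):

>>> rts = hddm.generate.gen_rts(size=500, method='drift', dt=1e-4,
...                             v=0.5, a=2, z=0.5, t=0.3)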
hddm.generate.gen_single_params_set(include=())

Returns a dict of DDM parameters with random values for a single condition. The function is used by gen_rand_params.

Optional :
include : tuple

Which optional parameters to include. Can be any combination of:

  • ‘z’ (bias, default=0.5)
  • ‘sv’ (inter-trial drift variability)
  • ‘sz’ (inter-trial bias variability)
  • ‘st’ (inter-trial non-decision time variability)

Special arguments are:
  • ‘all’: include all of the above
  • ‘all_inter’: include all of the above except ‘z’

hddm.generate.pdf_with_params(rt, params)

Helper function that calls full_pdf and gets the parameters from the dict params.

hddm.generate.rand(d0, d1, ..., dn)

Random values in a given shape.

Create an array of the given shape and populate it with random samples from a uniform distribution over [0, 1).

Parameters :

d0, d1, ..., dn : int, optional

The dimensions of the returned array, should all be positive. If no argument is given a single Python float is returned.

Returns :

out : ndarray, shape (d0, d1, ..., dn)

Random values.

See also

random

Notes

This is a convenience function. If you want an interface that takes a shape-tuple as the first argument, refer to np.random.random_sample.

Examples

>>> np.random.rand(3,2)
array([[ 0.14022471,  0.96360618],  #random
       [ 0.37601032,  0.25528411],  #random
       [ 0.49313049,  0.94909878]]) #random

likelihoods Module

hddm.likelihoods.add_quantiles_functions_to_pymc_class(pymc_class)

Add quantile methods to a pymc class.

Input:
pymc_class <class>
hddm.likelihoods.general_WienerCont(err=0.0001, n_st=2, n_sz=2, use_adaptive=1, simps_err=0.001)
hddm.likelihoods.generate_wfpt_stochastic_class(wiener_params=None, sampling_method='cdf', cdf_range=(-5, 5), sampling_dt=0.0001)

Create a wfpt stochastic class by creating a pymc node and then adding quantile functions.

Input:
wiener_params <dict> - dictionary of wiener_params for wfpt likelihoods
sampling_method <string> - an argument used by hddm.generate.gen_rts
cdf_range <sequence> - an argument used by hddm.generate.gen_rts
sampling_dt <float> - an argument used by hddm.generate.gen_rts

Output:
wfpt <class> - the wfpt stochastic class
hddm.likelihoods.wiener_like_contaminant(value, cont_x, v, sv, a, z, sz, t, st, t_min, t_max, err, n_st, n_sz, use_adaptive, simps_err)

Log-likelihood for the simple DDM including contaminants

utils Module

hddm.utils.EZ(pc, vrt, mrt, s=1)

Calculate Wagenmakers’ EZ-diffusion statistics.

Parameters :
pc : float

probability correct.

vrt : float

variance of response time for correct decisions (only!).

mrt : float

mean response time for correct decisions (only!).

s : float

scaling parameter. Default s=1.

Returns :
(v, a, ter) : tuple

drift-rate, threshold and non-decision time

The error RT distribution is assumed identical to the correct RT distrib.

Edge corrections are required for cases with Pc=0 or Pc=1. (Pc=.5 is OK)

Assumptions :
  • The error RT distribution is identical to the correct RT distrib.
  • z=.5 – starting point is equidistant from the response boundaries
  • sv=0 – across-trial variability in drift rate is negligible
  • sz=0 – across-trial variability in starting point is negligible
  • st=0 – across-trial range in nondecision time is negligible
Reference :

Wagenmakers, E.-J., van der Maas, H. L. J., & Grasman, R. (2007). An EZ-diffusion model for response time and accuracy. Psychonomic Bulletin & Review, 14(1), 3-22.

Example :
>>> EZ(.802, .112, .723, s=.1)
(0.099938526231301769, 0.13997020267583737, 0.30002997230248141)
See Also:

EZ_data

hddm.utils.EZ_data(data, s=1)

Calculate Wagenmakers’ EZ-diffusion statistics on data.

Arguments :
data : numpy.array

Data array with reaction time data. Correct RTs are positive, incorrect RTs are negative.

s : float

Scaling parameter (default=1)

Returns :
(v, a, ter) : tuple

drift-rate, threshold and non-decision time

See Also:

EZ
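A minimal sketch (the RT values are made up; negative RTs code incorrect responses as described above):

>>> import numpy as np
>>> rts = np.array([0.72, 0.81, -0.65, 0.93, 0.54])
>>> v, a, ter = hddm.utils.EZ_data(rts, s=0.1)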

hddm.utils.data_quantiles(data, quantiles=(0.1, 0.3, 0.5, 0.7, 0.9))

Compute the quantiles of 2AFC data.

Output:
q_lower - lower boundary quantiles
q_upper - upper boundary quantiles
p_upper - probability of hitting the upper boundary
hddm.utils.flip_errors(data)

Flip sign for lower boundary responses.

Arguments :
data : numpy.recarray

Input array with at least one column named ‘RT’ and one named ‘response’

Returns :
data : numpy.recarray

Input array with RTs sign flipped where ‘response’ == 0
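A typical preprocessing call before plotting RT distributions (column names as described above):

>>> data = hddm.utils.flip_errors(data)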

hddm.utils.gen_ppc_stats(quantiles=(10, 30, 50, 70, 90))

Generate default statistics for posterior predictive check on RT data.

Returns : OrderedDict mapping statistic name -> function
hddm.utils.hddm_parents_trace(model, obs_node, idx)

Return the parents’ values of a wfpt node at index ‘idx’ (the function is used by ppd_test).

hddm.utils.plot_posterior_quantiles(model, **kwargs)

Plot posterior predictive quantiles.

Arguments :

model : HDDM model

Optional :
value_range : numpy.ndarray

Range over which to evaluate the CDF.

samples : int (default=10)

Number of posterior samples to use.

alpha : float (default=.75)

Alpha (transparency) of posterior quantiles.

hexbin : bool (default=False)

Whether to plot posterior quantile density using hexbin.

data_plot_kwargs : dict (default=None)

Forwarded to data plotting function call.

predictive_plot_kwargs : dict (default=None)

Forwarded to the predictive plotting function call.

columns : int (default=3)

How many columns to use for plotting the subjects.

save : bool (default=False)

Whether to save the figure to a file.

path : str (default=None)

Directory prefix into which to save the figure.

hddm.utils.plot_posteriors(model, **kwargs)

Generate posterior plots for each parameter.

This is a wrapper for pymc.Matplot.plot()

hddm.utils.post_pred_stats(data, sim_datasets, **kwargs)

Calculate a set of summary statistics over posterior predictives.

Arguments :

data : pandas.DataFrame

sim_datasets : pandas.DataFrame

Optional :
bins : int

How many bins to use for computing the histogram.

evals : dict

User-defined evaluations of the statistics (by default the 95th percentile and SEM). Example: {‘percentile’: scoreatpercentile}

plot : bool

Whether to plot the posterior predictive distributions.

progress_bar : bool

Display progress bar while sampling.

Returns :

Hierarchical pandas.DataFrame with the different statistics.
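A hedged sketch (post_pred_gen is assumed here as the companion function that generates the simulated datasets):

>>> ppc_data = hddm.utils.post_pred_gen(model)  # assumed companion helper
>>> ppc_stats = hddm.utils.post_pred_stats(data, ppc_data)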

hddm.utils.qp_plot(x, groupby=None, quantiles=(0.1, 0.3, 0.5, 0.7, 0.9), ncols=None, draw_lines=True, ax=None)

QP plot.

Input:

x : either an HDDM model or data

groupby : <list>
A list of conditions used to group the data. If x is a model, then groupby is ignored.

quantiles : sequence
Sequence of quantiles to plot.

ncols : int
Number of columns in the output figure.

draw_lines : boolean (default: True)
Whether to draw lines connecting the same quantiles across conditions.
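For instance (a sketch, assuming a fitted HDDM model):

>>> hddm.utils.qp_plot(model, ncols=2)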