Introduction
2024-11-08
Define a variable for each construct of the VAST display. These can be measured variables or unmeasured (mediating) variables.
You can either refer to an actual measurement procedure or simply define a variable. In both cases you should explicitly define the following properties:
In the Google doc, below your Construct Source Table, create a new Variables Table with the following columns:
Example:
| Construct in VAST display | Shortname | Type | Range/values | Anchors |
|---|---|---|---|---|
| Affective tone of instruction | aff_tone | Continuous | [-1; 1] | -1 = maximally negative; 0 = neutral; +1 = maximally positive |
| Anxiety | anxiety | Continuous | [0; 1] | 0 = no anxiety; 1 = maximal anxiety |
| Kohlberg’s Stages of Moral Development | moral_stage | Ordinal | {1; 2; 3} | 1 = Pre-conventional; 2 = Conventional; 3 = Post-conventional |
| … | … | … | … | … |
Note: This resembles a codebook, but for theoretical variables, not only for measured ones. As a heuristic, list all concepts that are not higher-order concepts (because these usually have no single numerical representation).
We want to model the following phenomenon (a specific version of the bystander effect):
Task: Sketch a first functional relationship that could model this phenomenon. Use the variables you defined in the previous step (including their labels and ranges).
Every causal path needs to be implemented as a mathematical function, where the dependent variable/output \(y\) is a function of the input variable(s) \(x_i\).
\(y = f(x_1, x_2, ..., x_i)\)
This can, for example, be a linear function, \(y = \beta_0 + \beta_1x_1\).
\(\color{red} y = \color{forestgreen} \beta_0 \color{black} + \color{forestgreen} \beta_1 \color{blue} x\)
→ \(\color{red} y\) = output variable, \(\color{forestgreen} \beta\)s = parameters, \(\color{blue} x\) = input variable.
Two types of parameters: fixed parameters and free parameters.
Note
Virtually all parameters (except natural constants) could be imagined as being free. It is a choice to fix some of them in order to simplify the model.
Fixing a parameter:
\(\color{forestgreen} \beta_0 \color{black} = 1 \rightarrow \color{red} y = \color{forestgreen} 1 \color{black} + \color{forestgreen} \beta_1 \color{blue} x\)
That means, the slope \(\color{forestgreen} \beta_1\) still can vary, but the intercept is fixed to 1.
Free parameters give flexibility to your function: If you are unsure about the exact relationship between two variables, you can estimate the best-fitting parameters from the data.
For example, sometimes a theory specifies the general functional form of a relationship (e.g., “With increasing \(x\), \(y\) is monotonically decreasing”), but does not specify how fast this decrease happens, where \(y\) starts when \(x\) is minimal, etc. These latter decisions are then delegated to the free parameters.
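As a minimal sketch (the function name and the default values are illustrative, not part of the course materials), such a linear relationship can be written as an `R` function in which a fixed parameter becomes a default value while the free parameter remains an argument:

```r
# Illustrative sketch: intercept fixed to 1 via a default value,
# slope left as a free parameter that can later be estimated from data.
linear_effect <- function(x, beta_1, beta_0 = 1) {
  beta_0 + beta_1 * x
}

linear_effect(x = 0.3, beta_1 = 2)   # 1 + 2 * 0.3 = 1.6
```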
As a linear function is unbounded, it can easily happen that the computed \(y\) exceeds the possible range of values.
If \(y\) has defined boundaries (e.g., \([0; 1]\)), a logistic function can bound the values between a lower and an upper limit (in the basic logistic function, between 0 and 1):
\(y = \frac{1}{1 + e^{-x}}\)
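In `R`, this basic logistic function can be written directly (or via the built-in `plogis()`); a quick sketch:

```r
# Basic logistic function: maps any real-valued input into the interval (0, 1)
logistic <- function(x) {
  1 / (1 + exp(-x))
}

logistic(c(-5, 0, 5))   # ~0.007, 0.500, 0.993
# Equivalent built-in function: plogis(c(-5, 0, 5))
```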
With the 4PL* model from IRT, you can adjust the functional form to your needs: you can shift the curve’s location, change its slope (steepness), and set a lower and an upper asymptote.
*4PL = 4-parameter logistic model
(basic logistic function as dotted grey line)
The `d`, `a`, `min`, and `max` parameters can be used to “squeeze” the S-shaped curve into the range of your variables. For example, if your \(x\) variable is defined on the range \([0; 1]\), the following function parameters lead to a reasonable shift:
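One common way to write such a four-parameter logistic curve (assuming, as a reading of the parameter names above, that `d` denotes the location of the inflection point, `a` the slope, and `min`/`max` the lower and upper asymptotes) is:

\(y = \text{min} + \frac{\text{max} - \text{min}}{1 + e^{-a(x - d)}}\)

A small sketch in `R`; the example parameter values are illustrative, chosen so that the curve fits into \(x\)- and \(y\)-ranges of \([0; 1]\):

```r
# Sketch of a 4PL function with illustrative default parameters
logistic_4pl <- function(x, d = 0.5, a = 10, min = 0, max = 1) {
  min + (max - min) / (1 + exp(-a * (x - d)))
}

curve(logistic_4pl(x), from = 0, to = 1, ylab = "y")
```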
Of course, the logistic function and the beta distribution are just a start - you can use the full toolbox of mathematical functions to implement your model!
Note
These considerations about functional forms, however, are typically not substantiated by psychological theory or background knowledge - at least at the start of a modeling project. We choose them because we are (a) acquainted with them, and/or (b) they are mathematically convenient and tractable.
Empirical evidence can inform both your choice of the functional form, and, in a model-fitting step, the values of the parameters.
In `R`, a function is defined as `my_function <- function(arg1, arg2) { ... }`. A value is returned with `return(return_variable)`; if no explicit `return()` statement is given, the last evaluated expression is returned by default.

Tips:

- Document your functions (e.g., following the `roxygen2` documentation standard).
- Each `R` function implements exactly one functional relationship of your model.

```r
#' Compute the updated expected anxiety
#'
#' The expected anxiety at any given moment is a weighted average of
#' the momentary anxiety and the previous expected anxiety.
#'
#' @param momentary_anxiety The momentary anxiety, on a scale from 0 to 1
#' @param previous_expected_anxiety The previous expected anxiety, on a scale from 0 to 1
#' @param alpha A factor that shifts the weight between the momentary anxiety (alpha=1)
#'   and the previous expected anxiety (alpha=0).
#' @return The updated expected anxiety, as a scalar on a scale from 0 to 1
get_expected_anxiety <- function(momentary_anxiety, previous_expected_anxiety, alpha = 0.5) {
  momentary_anxiety*alpha + previous_expected_anxiety*(1-alpha)
}
```
`roxygen2` comments start with `#'` and are placed directly above the function definition. Each parameter is documented with `@param parameter_name Description` (provide the range of possible values if applicable); the return value is documented with `@return Description`.
Check out roxygen2 and document our exponential decay function with it.

Implement each functional relationship of your model as an `R` function with proper `roxygen2` documentation. Connect all functions to one “super-function”, which takes all exogenous variables as input and computes the focal output variable(s).
Test the super-function:
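A minimal sketch of what such a super-function and a quick test could look like (all names and the inner call are placeholders for your own model, not the course’s solution; the name `psi()` matches the name used further below):

```r
# Placeholder super-function: chains the single functional relationships and
# maps the exogenous input variables onto the focal output variable.
psi <- function(momentary_anxiety, previous_expected_anxiety, alpha = 0.5) {
  expected_anxiety <- get_expected_anxiety(momentary_anxiety,
                                           previous_expected_anxiety,
                                           alpha = alpha)
  # ... further functions of your model would be chained here ...
  expected_anxiety
}

# Test: plausible inputs should yield outputs within the defined range [0; 1]
psi(momentary_anxiety = 0.8, previous_expected_anxiety = 0.2)   # 0.5
```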
We can tune our free parameters to fit the model as well as possible to empirical data. This is called model fitting.
See the Find-a-fit app for an example of a simple linear regression with two parameters (intercept and slope) that are fitted by an optimization algorithm:
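A minimal sketch of such an optimization in `R`, assuming simulated data and a simple linear model with two free parameters (intercept and slope) fitted by minimizing the sum of squared errors with `optim()`:

```r
set.seed(42)
x <- runif(100)
y <- 2 + 1.5 * x + rnorm(100, sd = 0.3)   # simulated data: intercept 2, slope 1.5

# Loss function: sum of squared deviations between model prediction and data
sse <- function(par, x, y) {
  pred <- par[1] + par[2] * x
  sum((y - pred)^2)
}

# Let the optimization algorithm search for the best-fitting parameters
fit <- optim(par = c(0, 0), fn = sse, x = x, y = y)
fit$par   # should be close to the true values 2 and 1.5
```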
Create a design matrix for all possible combinations of experimental factors (i.e., those variables that you control/manipulate at specific levels).
The `expand.grid()` function in `R` is a handy tool for fully crossed factors (the first factor varies fastest, the last factor varies slowest):
To create a virtual sample, add one additional variable with a participant ID (this also determines the size of your sample):
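For example, a sketch that reproduces the data frame shown below (the factor names and levels are taken from that output):

```r
design <- expand.grid(
  pID     = 1:3,                    # 3 participants per condition
  valence = c("pos", "neg"),
  speed   = c("slow", "fast")
)
design
```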
pID valence speed
1 1 pos slow
2 2 pos slow
3 3 pos slow
4 1 neg slow
5 2 neg slow
6 3 neg slow
7 1 pos fast
8 2 pos fast
9 3 pos fast
10 1 neg fast
11 2 neg fast
12 3 neg fast
We have 12 participants overall: 3 participants in the `pos/slow` condition, 3 in the `neg/slow` condition, and so forth. Note that, although the participant ID repeats within each condition, these could be different (independent) participants if we assume a between-person design.
Add observed interindividual differences or situational variables. These are not experimentally fixed at specific levels, but vary randomly between participants:
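For example (a sketch; the chosen distributions are illustrative assumptions and will not reproduce the exact values shown below):

```r
set.seed(1)
design$age   <- sample(17:35, size = nrow(design), replace = TRUE)   # age in years
design$extra <- rnorm(nrow(design), mean = 0, sd = 1)                # z-scored trait score
design
```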
pID valence speed age extra
1 1 pos slow 17 1.1
2 2 pos slow 17 2.8
3 3 pos slow 23 1.5
4 1 neg slow 28 0.6
5 2 neg slow 25 0.7
6 3 neg slow 29 0.5
7 1 pos fast 23 0.1
8 2 pos fast 27 -2.0
9 3 pos fast 26 1.0
10 1 neg fast 32 -0.5
11 2 neg fast 22 0.2
12 3 neg fast 25 -1.4
Once all input variables have been simulated, submit them to your model function and compute the outcome variable \(y\):
pID valence speed age extra y
1 1 pos slow 17 1.1 6.2
2 2 pos slow 17 2.8 10.5
3 3 pos slow 23 1.5 12.0
4 1 neg slow 28 0.6 9.3
5 2 neg slow 25 0.7 8.1
6 3 neg slow 29 0.5 9.7
7 1 pos fast 23 0.1 8.2
8 2 pos fast 27 -2.0 7.5
9 3 pos fast 26 1.0 8.2
10 1 neg fast 32 -0.5 8.1
11 2 neg fast 22 0.2 8.4
12 3 neg fast 25 -1.4 12.1
Make sure that the `psi()` function can handle vectors as input (i.e., you can submit the entire data frame of input variables to the function).
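For example (assuming your super-function `psi()` takes the input columns of the design matrix as arguments; use vectorized operations such as `ifelse()` instead of `if ()` inside the function so that whole columns can be processed at once):

```r
# Compute the model prediction for every virtual participant at once
design$y <- with(design, psi(valence, speed, age, extra))

# Quick check: vector input should yield one value per row
length(design$y) == nrow(design)
```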
Not every person is identical, so in reality there probably are interindividual differences at several places:
We can model these interindividual differences - or we can assume that some of them are constant for all participants.

Furthermore, our models are always simplifications of reality. We can never model all relevant variables, and measurement error adds further noise. Consequently, there is some unexplained variability (a.k.a. random noise) in the system.
All additional sources of variation could be modeled as a single random error term pointing to the final outcome variable. This summarizes all additional sources of variation that are not explicitly modeled:
Add additional (summative) error variance to output variable:
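For example (a sketch; the size of the error standard deviation is an illustrative assumption):

```r
set.seed(2)
# Add a summative, normally distributed error term to the model prediction
design$y_obs <- design$y + rnorm(nrow(design), mean = 0, sd = 1.5)
design
```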
pID valence speed age extra y y_obs
1 1 pos slow 17 1.1 6.2 7.8
2 2 pos slow 17 2.8 10.5 14.8
3 3 pos slow 23 1.5 12.0 10.4
4 1 neg slow 28 0.6 9.3 10.4
5 2 neg slow 25 0.7 8.1 8.5
6 3 neg slow 29 0.5 9.7 10.7
7 1 pos fast 23 0.1 8.2 11.0
8 2 pos fast 27 -2.0 7.5 10.1
9 3 pos fast 26 1.0 8.2 6.8
10 1 neg fast 32 -0.5 8.1 8.8
11 2 neg fast 22 0.2 8.4 11.0
12 3 neg fast 25 -1.4 12.1 12.2
The size of the error variance (in combination with upstream sources of interindividual variance) determines the effect size that can be observed in a simulated study. The more error variance, the lower the effect size.
In your `R` project, create a design matrix with `expand.grid()` and the following fully crossed experimental factors (\(n=30\) participants per condition):

- `DoI` with the levels 0.2 and 0.8
- `NOPB` with the levels 0, 1, and 4

Add the exogenous* variable `baseResp`. Also simulate values for neuroticism (`neuro`) and age (`age`). (Note: We don’t need them for our model, just for practice.)

* Reminder: Exogenous variables are those variables in your model that have no arrows pointing towards them.
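A possible starting point for this exercise (a sketch; the distributions chosen for `baseResp`, `neuro`, and `age` are illustrative assumptions):

```r
design <- expand.grid(
  pID  = 1:30,              # n = 30 participants per condition
  DoI  = c(0.2, 0.8),
  NOPB = c(0, 1, 4)
)
nrow(design)                # 180 virtual participants overall

# Illustrative assumptions for the additional variables:
design$baseResp <- runif(nrow(design), min = 0, max = 1)
design$neuro    <- rnorm(nrow(design), mean = 0, sd = 1)
design$age      <- sample(18:70, size = nrow(design), replace = TRUE)
```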
A handy distribution for the \([0; 1]\) range is the beta distribution. With its two parameters \(\alpha\) (also called \(a\) or `shape1`) and \(\beta\) (also called \(b\) or `shape2`), it can take many different forms:
How to choose \(\alpha\) and \(\beta\)? Asking ChatGPT/Wolfram Alpha for assistance
Assume that you intuitively started with a normal distribution with \(M=0.2\) and \(SD=0.1\) (`rnorm(n, mean = 0.2, sd = 0.1)`) to simulate your values. But this function can generate values < 0 and > 1.
How can you achieve a beta distribution that approximates the properties of this normal distribution? You can work out the math yourself (e.g., by looking at the formula for the mean and the variance of the beta distribution) - or you can ask ChatGPT. After all, we only use this function as a tool to get some plausible values.
“We have a normal distribution with mean=0.2 and SD=0.1. But the results should be bounded on a scale from 0 to 1. Create a beta-distribution that mimics the properties of the described normal distribution.”
“To mimic the properties of a specified normal distribution (with a mean and standard deviation) using a beta distribution within a bounded interval (in this case, 0 to 1), we need to find the parameters of the beta distribution (alpha \(\alpha\) and beta \(\beta\)) that match the mean and variance of the normal distribution as closely as possible.
[snip]
The parameters for the beta distribution that mimic the properties of the described normal distribution (with mean = 0.2 and standard deviation = 0.1, bounded between 0 and 1) are \(\alpha = 3\) and \(\beta = 12\).
This beta distribution should closely match the shape and spread of the specified normal distribution within the bounds of 0 to 1.”
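You can verify this answer with the formulas for the mean and variance of the beta distribution:

\(\mu = \frac{\alpha}{\alpha + \beta}, \qquad \sigma^2 = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)} = \frac{\mu(1-\mu)}{\alpha + \beta + 1}\)

Solving for the parameters gives \(\alpha + \beta = \frac{\mu(1-\mu)}{\sigma^2} - 1 = \frac{0.2 \cdot 0.8}{0.01} - 1 = 15\), hence \(\alpha = 0.2 \cdot 15 = 3\) and \(\beta = 15 - 3 = 12\). Plugging back in yields \(\mu = 3/15 = 0.2\) and \(\sigma = \sqrt{36 / (225 \cdot 16)} = 0.1\), as claimed.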
You can generate random values in `R` with the `rbeta()` function. Here’s a comparison of a normal distribution and a matched beta distribution that respects the boundaries \([0; 1]\):
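A sketch of how such a comparison could be produced (overlaid histograms are only one way to visualize it):

```r
set.seed(3)
n <- 10000
x_norm <- rnorm(n, mean = 0.2, sd = 0.1)       # unbounded: can fall below 0 or above 1
x_beta <- rbeta(n, shape1 = 3, shape2 = 12)    # bounded on [0; 1]

hist(x_norm, breaks = 50, col = rgb(0, 0, 1, 0.4), xlim = c(-0.2, 1),
     main = "Normal vs. matched beta distribution", xlab = "value")
hist(x_beta, breaks = 50, col = rgb(1, 0, 0, 0.4), add = TRUE)
```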
If you start simulating data for your virtual participants, you draw random values from a distribution. For example, the virtual participants might differ in their anxiety, which you previously defined on the range \([0; 1]\).
How can you generate random values that roughly look like a normal distribution, but are bounded to the defined range?
For simulations, it is good to know some basic distributions. Here are three interactive resources for choosing your distribution:
Based on your design matrix from the previous exercise, simulate plausible values for the exogenous* variables and visualize their distributions (e.g., with `ggplot2`).

* Reminder: Exogenous variables are those variables in your model that have no arrows pointing towards them.
Simulate this design, analogous to Fischer et al. (2006): “A 2 (bystander: yes vs. no) x 2 (danger: low vs high) factorial design was employed.”
“one could conduct this simulation with very large sample sizes and use a statistical function for an effect size, like Cohen’s d, to express the results. Any effect size that would be detectable with “realistic sample sizes” could then count as production of the statistical pattern and as such be used to evaluate robustness.”
Formal modeling in psychology - Empirisches Praktikum, Ludwig-Maximilians-Universität München