This document illustrates how the R package BayesTwin can be used to analyse item-level twin data.

First steps

Before you can actually use BayesTwin, you need to take a few steps:

Install and load package

Install and load the package in R:

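# Option 1 (commented out): install with install.packages()
#install.packages("BayesTwin")
#library(BayesTwin)

# Option 2: install from the GitHub repository using devtools
library(devtools)
install_github("ingaschwabe/BayesTwin")
library(BayesTwin)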

Data-simulation

We first simulate some data to show how the main function of BayesTwin works:

simulated_data = simulatetwin(n_mz = 2000, n_dz = 5000, var_a = 0.5, var_c = 0.3, 
                              model = "ACE", n_items = 40, ge = TRUE,
                              ge_beta0 = log(0.2), ge_beta1 = 0,
                              irt_model = "1PL")

The code above simulates item-level data of 2000 MZ twin pairs and 5000 DZ twin pairs under the ACE model. The item data are generated with the one-parameter logistic model (1PL), which yields dichotomous (0 = FALSE, 1 = TRUE) item data, as is typical for cognitive performance data such as mathematical ability. The item data is saved in the object "simulated_data".
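To make the simulation step more concrete, here is a minimal base R sketch of how dichotomous 1PL responses for MZ pairs could be generated under an ACE decomposition. It is not the code used inside simulatetwin. The toy sizes, the item difficulties, the helper sim_1pl and the object y_mz_toy are hypothetical, and the sketch assumes that ge_beta0 = log(0.2) with ge_beta1 = 0 corresponds to a constant unique-environment variance of 0.2.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only

n_pairs <- 5                     # toy number of MZ pairs (hypothetical)
n_items <- 4                     # toy number of items (hypothetical)
var_a <- 0.5                     # additive genetic variance, as in the call above
var_c <- 0.3                     # common-environment variance, as in the call above
var_e <- exp(log(0.2))           # ASSUMPTION: unique-environment variance of 0.2

# MZ twins share their additive genetic (A) and common-environment (C) values;
# only the unique-environment (E) part differs within a pair.
a     <- rnorm(n_pairs, 0, sqrt(var_a))
c_env <- rnorm(n_pairs, 0, sqrt(var_c))
theta1 <- a + c_env + rnorm(n_pairs, 0, sqrt(var_e))   # latent trait, twin 1
theta2 <- a + c_env + rnorm(n_pairs, 0, sqrt(var_e))   # latent trait, twin 2

b <- rnorm(n_items)              # item difficulties (hypothetical values)

# 1PL: P(y = 1) = plogis(theta - b); draw 0/1 responses with that probability.
sim_1pl <- function(theta, b) {
  p <- plogis(outer(theta, b, "-"))            # persons in rows, items in columns
  matrix(rbinom(length(p), 1, p), nrow = length(theta))
}

# Wide layout: twin 1 items in the first n_items columns, twin 2 items after them.
y_mz_toy <- cbind(sim_1pl(theta1, b), sim_1pl(theta2, b))
dim(y_mz_toy)
```

This toy matrix only mirrors the idea behind the simulation; the rest of this document works with the output of simulatetwin.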

simulatetwin returns two matrices, y_mz for the MZ twin pairs and y_dz for the DZ twin pairs. This matches the input format of IRTtwin, the main function of BayesTwin, which expects the data of MZ and DZ twins in two separate matrices:

The data matrix for the MZ twins, itemdata_mz (simulated_data$y_mz), has n_mz (here 2000) rows, one per MZ family, and 2 * n_items (here 80) columns: the item answers of the first twin are in columns 1:n_items (here 1:40) and those of the second twin in columns (n_items + 1):(2 * n_items) (here 41:80). For example, if n_items = 22, y_mz[1,22] is the response of the first twin from family 1 to item 22 and y_mz[1,23] is the response of the second twin from family 1 to item 1. These are the first 5 item responses for the first twin of the first 2 MZ twin families:

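itemdata_mz = simulated_data$y_mz   # MZ item data
itemdata_dz = simulated_data$y_dz   # DZ item data
itemdata_mz[1:2, 1:5]               # rows: families 1-2, columns: items 1-5 of twin 1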

The same logic applies to the data of DZ families. For example, these are the item responses for the second twin of the first 2 DZ twin families:

itemdata_dz[1:2, 41:80]             # rows: families 1-2, columns: items 1-40 of twin 2
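The column layout described above can be illustrated with a small hand-made matrix. This is a sketch with hypothetical toy values (3 items, 2 families), not BayesTwin output; the names y_toy, twin1_cols and twin2_cols are introduced only for this illustration.

```r
n_items <- 3                                # toy value (this document uses 40)

# Toy wide matrix for 2 families: columns 1:3 = twin 1, columns 4:6 = twin 2.
y_toy <- matrix(c(1, 0, 1,  0, 0, 1,
                  0, 1, 1,  1, 1, 0),
                nrow = 2, byrow = TRUE)

twin1_cols <- 1:n_items
# Parentheses matter here: ":" binds tighter than "+" and "*" in R.
twin2_cols <- (n_items + 1):(2 * n_items)

twin1 <- y_toy[, twin1_cols]                # responses of the first twins
twin2 <- y_toy[, twin2_cols]                # responses of the second twins

twin2[1, 1]                                 # family 1, second twin, item 1
rowSums(twin1)                              # sum scores of the first twins
```

Index vectors built this way have the same form as the values passed to twin1_datacols_p and twin2_datacols_p in the analysis call (1:40 and 41:80 for 40 items).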

Run the analysis

Now we can run the analysis using the main function of the BayesTwin package and save it in an object:

results = IRTtwin(data_mz = itemdata_mz, data_dz = itemdata_dz,
                  twin1_datacols_p = 1:40, twin2_datacols_p = 41:80,
                  decomp_model = "ACE", irt_model = "1PL", ge = TRUE,
                  n_iter = 7000, n_burnin = 5000, n_chains = 1)

The code above analyses the data under an ACE model and a one-parameter logistic (1PL) model, including a genotype-environment interaction (GxE). A single chain is run: 5000 iterations are used as a burn-in, followed by an additional 7000 iterations.
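The role of the burn-in can be illustrated with a generic toy sampler. The sketch below is a plain random-walk Metropolis algorithm for a standard normal target, written in base R; it is not the sampler that IRTtwin uses, and the iteration counts are scaled-down, hypothetical values. It only shows the general idea of discarding the first draws and keeping the rest.

```r
set.seed(2)                       # arbitrary seed, for reproducibility only

n_burnin <- 500                   # toy values; the analysis above uses 5000
n_iter   <- 700                   # and 7000

# Log-density of the toy target, a standard normal (up to a constant).
log_target <- function(x) -0.5 * x^2

total <- n_burnin + n_iter
chain <- numeric(total)
x <- 10                           # deliberately poor starting value
for (i in seq_len(total)) {
  proposal <- x + rnorm(1)        # random-walk proposal
  # Accept with probability min(1, target(proposal) / target(x)).
  if (log(runif(1)) < log_target(proposal) - log_target(x)) x <- proposal
  chain[i] <- x
}

kept <- chain[(n_burnin + 1):total]   # discard the burn-in draws
length(kept)                          # number of draws kept for inference
mean(kept)                            # should be near the target mean of 0
```

The early draws still reflect the starting value, which is why they are dropped before the remaining draws are summarised.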

Before looking at the results, it is important to check that the MCMC algorithm has converged to the posterior distribution. To check convergence, we use the posterior samples and the plotbayestwin function with t = "trace".

We first have a look at the variance components:

plotbayestwin(results$samples_var_a, t = "trace")

plotbayestwin(results$samples_var_c, t = "trace")
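As a rough guide to what a trace plot shows, the following base R sketch draws one for a toy chain. The draws are hypothetical, autocorrelated values scattered around 0.5, not output of IRTtwin, and the split-half comparison is only a crude informal check added for illustration, not a procedure provided by BayesTwin.

```r
set.seed(3)                           # arbitrary seed, for reproducibility only

# Toy stand-in for posterior draws of a variance component: an AR(1) series
# around 0.5 (hypothetical values, not BayesTwin output).
n <- 2000
draws <- 0.5 + as.numeric(arima.sim(list(ar = 0.8), n = n, sd = 0.02))

# Trace plot: sampled value against iteration number. A chain that has
# converged fluctuates around a stable level without trends or jumps.
plot(draws, type = "l", xlab = "Iteration", ylab = "Sampled value",
     main = "Trace plot (toy chain)")

# Crude numeric companion to the plot: the means of the first and second
# half of a stable chain should be close to each other.
first_half  <- mean(draws[1:(n / 2)])
second_half <- mean(draws[(n / 2 + 1):n])
c(first_half = first_half, second_half = second_half)
```

A visible drift in the plot, or clearly different means in the two halves, would suggest running more iterations or a longer burn-in.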