Contents

1 Introduction

We present a package for estimation of cis-eQTL effect sizes, using a new model called ACME which respects biological understanding of cis-eQTL action. The model, fully presented and validated in (Palowitch et al. 2017), involves an additive effect of allele count and multiplicative component random noise (hence “ACME”: Additive-Contribution, Multiplicative-Error), and is defined as

\[y_i = \log(\beta_0 + \beta_1 s_i) + Z_i^T \gamma + \epsilon_i\]

where

We estimate the model using a fast iterative algorithm.
The algorithm estimates the model which is nonlinear only with respect to \(\eta = \beta_1 / \beta_0\)

\[y_i = \log(1 + s_i \eta) + \log(\beta_0) + Z_i^T \gamma + \epsilon_i\]

2 Installation

ACMEeqtl can be installed with the following command.

install.packages("ACMEeqtl")

3 Using the Package

ACMEeqtl package provides functions for analysis of a single gene-SNP pair as well as fast parallel testing of all local gene-SNP pairs.

library(ACMEeqtl)

3.1 Testing a Single Gene-SNP Pair

First we generate sample gene expression, SNP allele counts, and a set of covariates.

# Model parameters
beta0 = 10000
beta1 = 50000

# Data dimensions
nsample = 1000
ncvrt = 19

### Data generation
### Zero average covariates
cvrt = matrix(rnorm(nsample * ncvrt), nsample, ncvrt)
cvrt = t(t(cvrt) - colMeans(cvrt))

# Generate SNPs
s = rbinom(n = nsample, size = 2, prob = 0.2)

# Generate log-normalized expression
y = log(beta0 + beta1 * s) + 
    cvrt %*% rnorm(ncvrt) + 
    rnorm(nsample)

We provide two equivalent functions for model estimation.

  • effectSizeEstimationR – fully coded in R
  • effectSizeEstimationC – faster version with core coded in C.
z1 = effectSizeEstimationR(s, y, cvrt)
z2 = effectSizeEstimationC(s, y, cvrt)

pander(rbind(z1,z2))
  beta0 beta1 nits SSE SST Ftest eta SE_eta
z1 9956 47308 6 963 1760 810 4.75 0.366
z2 9956 47308 6 963 1760 810 4.75 0.366

Variables z1, z2 show that the estimation was done in 6 iterations, with estimated parameters

  • \(\hat\beta_0\) = 9955.8 (true parameter is 10000)
  • \(\hat\beta_1\) = 47307.8 (true parameter is 50000)

3.2 Testing All Local Gene-SNP Pairs

First we generate a eQTL dataset in filematrix format (see filematrix package).

tempdirectory = tempdir()
z = create_artificial_data(
    nsample = 100,
    ngene = 100,
    nsnp = 501,
    ncvrt = 1,
    minMAF = 0.2,
    saveDir = tempdirectory,
    returnData = FALSE,
    savefmat = TRUE,
    savetxt = FALSE,
    verbose = FALSE)
## Loading required namespace: RSQLite

In this example, we use 2 CPU cores (threads) for testing of all gene-SNP pairs within 100,000 bp.

multithreadACME(
    genefm = "gene",
    snpsfm = "snps",
    glocfm = "gene_loc",
    slocfm = "snps_loc",
    cvrtfm = "cvrt",
    acmefm = "ACME",
    cisdist = 1.5e+06,
    threads = 2,
    workdir = file.path(tempdirectory, "filematrices"),
    verbose = FALSE)

Now the filematrix ACME holds estimations for all local gene-SNP pairs.

fm = fm.open(file.path(tempdirectory, "filematrices", "ACME"))
TenResults = fm[,1:10]
rownames(TenResults) = rownames(fm)
close(fm)

pander(t(TenResults))
geneid snp_id beta0 beta1 nits SSE SST F eta SE
1 1 125 -43.5 6 88.5 104 16.8 -0.347 0.0453
1 2 84.1 9.72 6 103 104 0.508 0.115 0.176
1 3 89.4 2.26 5 104 104 0.0264 0.0253 0.159
1 4 78.7 20.6 5 102 104 2.15 0.262 0.211
2 4 57.1 -8.94 4 154 156 0.957 -0.157 0.135
2 5 52.9 -3.1 4 156 156 0.115 -0.0586 0.163
2 6 81.9 -36.4 10 112 156 37.5 -0.444 0.0207
2 7 46.8 8.59 4 155 156 0.722 0.184 0.244
2 8 49.2 2.82 3 156 156 0.091 0.0573 0.199
2 9 54.2 -4.28 5 155 156 0.235 -0.079 0.151

3.3 Testing Multi-SNP Model for All Local Gene-SNP Pairs

Now we can estimate multi-SNP ACME models for each gene:

multisnpACME(
    genefm = "gene",
    snpsfm = "snps",
    glocfm = "gene_loc",
    slocfm = "snps_loc",
    cvrtfm = "cvrt",
    acmefm = "ACME",
    workdir = file.path(tempdirectory, "filematrices"),
    genecap = Inf,
    verbose = FALSE)

Now the filematrix ACME_multiSNP holds estimations for all multi-SNP models.

fm = fm.open(file.path(tempdirectory, "filematrices", "ACME_multiSNP"))
TenResults = fm[,1:10]
rownames(TenResults) = rownames(fm)
close(fm)

pander(t(TenResults))
geneid snp_id beta0 betas forward_adjR2
1 1 111 -45.7 0.139
1 4 111 14.5 0.149
1 2 111 14 0.151
2 6 91.4 -36.7 0.272
2 8 91.4 5.14 0.284
2 9 91.4 -8.36 0.285
2 4 91.4 -7.75 0.291
3 11 121 162 0.218
3 12 121 -24.3 0.22
4 17 57 41.4 0.0669

Note that each multi-SNP model will contain at least one SNP, even if that initial SNP was not significant under the single-SNP models. This initial SNP will be the one with the highest adjusted-R\(^2\) value among the single-SNP models. However, after the initial SNP, further SNPs are added only if the combined model’s adjusted-R\(^2\) is greater than that from the previous combined model.

References

Palowitch, John, Andrey Shabalin, Yi-Hui Zhou, Andrew B Nobel, and Fred A Wright. 2017. “Estimation of Cis-eQTL Effect Sizes Using a Log of Linear Model.” Biometrics. Wiley Online Library.