7 Main data download functions:
Finally, the meat of why we’re here.
magma has two main data output functions, and magmaR provides methods for both.
7.1 retrieve() & retrieveJSON()
retrieve() is probably the main workhorse function of
magmaR. If your goal is to download “subject” data for a specific patient of a project, or for all patients of the project, this is the function to start with.
The basic structure is to provide which project, projectName, and which model, modelName, you want data for.
df <- retrieve(
    target = prod,
    projectName = "example",
    modelName = "subject")
head(df)
##           name project      biospecimen group
## 1  EXAMPLE-HS1 example  EXAMPLE-HS1-WB1    g1
## 2 EXAMPLE-HS10 example EXAMPLE-HS10-WB1    g3
## 3 EXAMPLE-HS11 example EXAMPLE-HS11-WB1    g3
## 4 EXAMPLE-HS12 example EXAMPLE-HS12-WB1    g3
## 5  EXAMPLE-HS2 example  EXAMPLE-HS2-WB1    g1
## 6  EXAMPLE-HS3 example  EXAMPLE-HS3-WB1    g1
Optionally, a set of recordNames or attributeNames can be given as well to grab a more specific subset of data from the given project-model pair.
df <- retrieve(
    target = prod,
    projectName = "example",
    modelName = "subject",
    recordNames = c("EXAMPLE-HS1", "EXAMPLE-HS2"),
    attributeNames = "group")
head(df)
##          name group
## 1 EXAMPLE-HS1    g1
## 2 EXAMPLE-HS2    g1
(You can use the retrieveAttributes() function described above in the Helper functions section to determine options for the attributeNames input.)
Unfortunately, for certain attribute data types, such as table and matrix, the literal data are not actually returned by magma/retrieve when format = "tsv". Instead, only a pointer is returned. For such attributes, the retrieveJSON() function can retrieve the data (via a magma/retrieve call with format = "json"). A wrapper that makes efficient use of retrieveJSON() specifically for matrix data retrieval, retrieveMatrix(), is also included. Users should not typically need to call retrieveJSON() directly: when the desired data is a matrix, retrieveMatrix() is recommended instead. More details on that function follow.
json <- retrieveJSON(
    target = prod,
    projectName = "example",
    modelName = "rna_seq",
    recordNames = c("EXAMPLE-HS1-WB1-RSQ1", "EXAMPLE-HS2-WB1-RSQ1"),
    attributeNames = "gene_counts")
Because matrices are a very common and important data structure, but are not accessible via retrieve(), we provide retrieveMatrix(). For a single matrix-type attribute, it obtains data from magma in the required JSON structure, then automatically reorganizes that data into the matrix structure that a user would typically expect.
In the example below, we obtain the transcripts-per-million(-reads) normalized counts data for all rna_seq records/samples of the example project. In this matrix, columns are the individual records and rows are features. For the example data here, the row names are “gene1”, “gene2”, and so on, but for real rna_seq data they would typically be the Ensembl gene IDs that each row of the matrix represents.
mat <- retrieveMatrix(
    target = prod,
    projectName = "example",
    modelName = "rna_seq",
    recordNames = "all",
    attributeNames = "gene_tpm")
head(mat, n = c(6,3))
##       EXAMPLE-HS10-WB1-RSQ1 EXAMPLE-HS11-WB1-RSQ1 EXAMPLE-HS12-WB1-RSQ1
## gene1                0.5187                7.9960                4.8278
## gene2               29.9572               42.9785               31.2540
## gene3              111.6587              154.9225              114.0897
## gene4              269.3555              426.7866              302.6299
## gene5                0.3891                1.9990                0.0000
## gene6                0.0000                0.0000                0.0000
Most users need not worry about the internal method, but for those who are curious: under the hood, data is grabbed via retrieveJSON() for 10 records at a time. The relevant data are then extracted from the complex list output of this retrieval route and converted into a matrix structure where column names are the recordNames. Row names are then taken from the model’s template for what this data should represent.
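The batching idea described above can be illustrated with a small, self-contained sketch. This is not magmaR's actual internal code: fetch_batch() and build_matrix() are hypothetical names, and fetch_batch() simply fabricates toy numbers in place of a real retrieveJSON() call. It only shows the general pattern of requesting records in chunks of 10 and binding the per-record vectors into a features-by-records matrix.

```r
# Hypothetical stand-in for one JSON-style retrieval call (NOT part of magmaR):
# returns a named list holding one numeric vector per requested record.
fetch_batch <- function(record_names, n_features) {
    lapply(
        stats::setNames(record_names, record_names),
        function(rec) seq_len(n_features) * nchar(rec))
}

# Hypothetical helper: retrieve in chunks, then assemble a matrix.
build_matrix <- function(record_names, feature_names, batch_size = 10) {
    # Split record names into consecutive chunks of at most batch_size
    batches <- split(
        record_names,
        ceiling(seq_along(record_names) / batch_size))
    # Retrieve each chunk, then flatten to a single list keyed by record name
    per_record <- unlist(
        lapply(unname(batches), fetch_batch, n_features = length(feature_names)),
        recursive = FALSE)
    # One column per record, one row per feature
    mat <- do.call(cbind, per_record)
    rownames(mat) <- feature_names
    mat
}

records <- paste0("REC", 1:23)   # 23 records -> batches of 10, 10, and 3
mat <- build_matrix(records, paste0("gene", 1:4))
dim(mat)
```

Chunking keeps each individual request small, at the cost of more round trips; the batch size of 10 here simply mirrors the number mentioned in the text.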
The Magma Query API, which query() gives access to, lets you pull data out of Magma through an expressive query interface. If you want a specific set of data from model-X but only, say, for records where linked records of model-Y have data for attribute-Z, then this is the endpoint you want.
But note: the format of query() calls can be a bit complicated, so it is recommended to first check whether retrieveMetadata() might better serve your purposes. We’ll describe that function a bit later.
For guidance on how to format
query() calls, see
?query and https://mountetna.github.io/magma.html#query.
query_out <- query(
    target = prod,
    projectName = "example",
    queryTerms = list('rna_seq', '::all', 'biospecimen', '::identifier')
    )
Details: The default output of this function is a list conversion of the direct json output returned by magma/query. This list will contain either 2 or 3 parts: answer, type (optional), and format.
names(query_out)
## [1] "answer" "type"   "format"
Alternatively, the output can be reformatted as a dataframe if
format = "df" is given.
subject_ids_of_rnaseq_records <- query(
    target = prod,
    projectName = "example",
    queryTerms = list('rna_seq', '::all', 'biospecimen', '::identifier'),
    format = "df"
    )
head(subject_ids_of_rnaseq_records)
##   example::rna_seq#tube_name example::biospecimen#name
## 1      EXAMPLE-HS10-WB1-RSQ1          EXAMPLE-HS10-WB1
## 2      EXAMPLE-HS11-WB1-RSQ1          EXAMPLE-HS11-WB1
## 3      EXAMPLE-HS12-WB1-RSQ1          EXAMPLE-HS12-WB1
## 4       EXAMPLE-HS1-WB1-RSQ1           EXAMPLE-HS1-WB1
## 5       EXAMPLE-HS2-WB1-RSQ1           EXAMPLE-HS2-WB1
## 6       EXAMPLE-HS3-WB1-RSQ1           EXAMPLE-HS3-WB1
Details: If format = "df" is added, the list output will be converted to a data.frame where data comes from the answer and column names come from the format.
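To make the list-to-data.frame idea concrete, here is a minimal sketch. The shape of query_out below is an assumption made purely for illustration (a list of value pairs in answer, and a character vector in format); real magma/query output may be nested differently, and this is not the conversion code magmaR itself uses.

```r
# Assumed, simplified shape of a query() list output (illustrative only)
query_out <- list(
    answer = list(
        list("EXAMPLE-HS1-WB1-RSQ1", "EXAMPLE-HS1-WB1"),
        list("EXAMPLE-HS2-WB1-RSQ1", "EXAMPLE-HS2-WB1")),
    format = c("example::rna_seq#tube_name", "example::biospecimen#name"))

# Each element of 'answer' becomes one row of the data.frame
df <- as.data.frame(
    do.call(rbind, lapply(query_out$answer, unlist)),
    stringsAsFactors = FALSE)

# Column names are taken from the 'format' piece
colnames(df) <- query_out$format
df
```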
retrieveMetadata() attempts to simplify the process of obtaining “metadata” from model X for “target data” of model Y. For example, this function could be used to extract “subject”-model data from the “example” project that is linked to “rna_seq”-model records.
meta <- retrieveMetadata(
    target = prod,
    projectName = "example",
    meta_modelName = "subject",
    meta_attributeNames = "all",
    target_modelName = "rna_seq",
    target_recordNames = "all")
head(meta, n = c(6,10))
##        subject               rna_seq      biospecimen project group
## 1  EXAMPLE-HS1  EXAMPLE-HS1-WB1-RSQ1  EXAMPLE-HS1-WB1 example    g1
## 2 EXAMPLE-HS10 EXAMPLE-HS10-WB1-RSQ1 EXAMPLE-HS10-WB1 example    g3
## 3 EXAMPLE-HS11 EXAMPLE-HS11-WB1-RSQ1 EXAMPLE-HS11-WB1 example    g3
## 4 EXAMPLE-HS12 EXAMPLE-HS12-WB1-RSQ1 EXAMPLE-HS12-WB1 example    g3
## 5  EXAMPLE-HS2  EXAMPLE-HS2-WB1-RSQ1  EXAMPLE-HS2-WB1 example    g1
## 6  EXAMPLE-HS3  EXAMPLE-HS3-WB1-RSQ1  EXAMPLE-HS3-WB1 example    g1
General Details: The function determines how the target_modelName and meta_modelName models relate to each other, then obtains data for the meta_attributeNames-attributes of the meta_modelName-model, for records of this model that are linked to the target_recordNames-records of the target_modelName-model. Data is then output as a data.frame with one row per target record.
Specific Details: The function first determines the model -> model path for navigating between the meta and target models. (At the moment, ONLY parent links are used for this purpose, but utilization of link attributes is planned for the future.) Then, query()s based on these paths are used to determine how target-model and meta-model recordNames are linked. Data is then retrieve()d for the meta_attributeNames-attributes of the meta-model records that are linked to the target_recordNames-records of the target_modelName-model. Next, if the mapping from target-model records to meta-model records is not 1:1, that is, if any target record links to more than one meta-model record, the metadata is reorganized rightwards so that there is one output row per “target”-record. Finally, this data is output as a dataframe with rows = target_recordNames, and with columns of linkage record identifiers followed by columns for each requested meta-model attribute.
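The “reorganized rightwards” step can be pictured with a small base-R sketch. The data and column names below (target, meta, group) are invented for illustration, and the use of reshape() is our own choice for this example rather than a statement about how retrieveMetadata() is implemented.

```r
# Toy linkage table: target record "T1" links to two meta records, "T2" to one
links <- data.frame(
    target = c("T1", "T1", "T2"),
    meta   = c("M1", "M2", "M3"),
    group  = c("g1", "g2", "g3"),
    stringsAsFactors = FALSE)

# Number the linked meta records within each target record: 1, 2, 1
links$idx <- ave(seq_along(links$target), links$target, FUN = seq_along)

# Spread repeated meta records into additional columns,
# leaving one row per target record
wide <- reshape(
    links,
    idvar = "target",
    timevar = "idx",
    direction = "wide",
    sep = "_")
wide
```

Target records with fewer linked meta records than the maximum end up with NA in the extra columns (here, meta_2 and group_2 for "T2").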