
ctrdata for aggregating and analysing clinical trials

The package ctrdata provides functions for retrieving (downloading) information on clinical trials from public registers, and for aggregating and analysing such information. It can be used for the European Union Clinical Trials Register (“EUCTR”) and for ClinicalTrials.gov (“CTGOV”). Development of ctrdata started in mid-2015, motivated by the wish to understand trends in the design and conduct of trials and in their availability for patients. The package is used within R.

Last edit 2019-04-29 for version 0.18.2, with bug fixes and new features:

Main features:

Remember to respect the registers’ copyrights and terms and conditions (see ctrOpenSearchPagesInBrowser(copyright = TRUE)). Please cite this package in any publication as follows: Ralf Herold (2019). ctrdata: Retrieve and Analyze Information on Clinical Trials from Public Registers. R package version 0.18.1,

Package ctrdata has been used, for example, for:

Overview of functions used in sequence:

Overview workflow


1. Install package in R

Within R, use the following commands to get and install package ctrdata:

# Release version from CRAN:
install.packages("ctrdata")

# Development version from GitHub
# (build_opts is emptied so that vignettes are built):
devtools::install_github("rfhb/ctrdata", build_opts = "")

Package ctrdata is available on CRAN.

2. Command line tools perl, sed, cat and php (5.2 or higher)

These command line tools are only required for ctrLoadQueryIntoDb(), a main function of package ctrdata. On Linux and macOS, they are usually already installed.

For MS Windows, install cygwin. In R, run ctrdata::installCygwinWindowsDoInstall() for an automated minimal installation into c:\cygwin. Installations in folders matching c:\cygw* are also recognised and used. Alternatively, install cygwin manually with the packages perl, php-jsonc and php-simplexml. This installation uses about 160 MB of disk space and does not need administrator credentials.
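Base R's Sys.which() offers a quick way to check whether these tools can be found. The following is a minimal sketch and is not part of ctrdata. It only inspects the PATH, so on MS Windows it may report the tools as missing even when ctrdata can use a cygwin installation in c:\cygwin.

```r
# Command line tools listed above
tools <- c("perl", "sed", "cat", "php")

# Sys.which() returns the full path of each tool, or "" if it is not on the PATH;
# the result is named by the tool names
found <- Sys.which(tools)
missing_tools <- names(found)[found == ""]

if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
} else {
  message("All command line tools found on PATH.")
}
```

The sketch does not check the required php version (5.2 or higher). Running `php --version` in a terminal shows it.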

3. Mongo database

A remote or a local mongo database server can be used with the package ctrdata. Suggested installation instructions for a local database server are here.

A remote mongo database server can also be used. This is shown in the examples vignette.

Overview of functions in ctrdata

Name                            Purpose
ctrOpenSearchPagesInBrowser     Open search pages of registers or execute a search in the web browser
ctrFindActiveSubstanceSynonyms  Find synonyms and alternative names for an active substance
ctrGetQueryUrlFromBrowser       Import from the clipboard the URL of a search in one of the registers
ctrLoadQueryIntoDb              Retrieve (download) or update, and annotate, information on clinical trials from a register and store it in a database collection
dbQueryHistory                  Show the history of queries that were downloaded into the database collection
dbFindFields                    Find names of fields in the database collection
dbFindIdsUniqueTrials           Produce a vector of de-duplicated identifiers of clinical trial records in the database collection
dbGetFieldsIntoDf               Create a data.frame with the specified fields from records in the database collection
dfMergeTwoVariablesRelevel      Merge two variables into a single variable, optionally mapping values to a new set of values
installCygwinWindowsDoInstall   Convenience function to install a cygwin environment (MS Windows only)
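Base R can illustrate the idea of merging two variables and mapping their values to a new set of values. This sketch is not the implementation of dfMergeTwoVariablesRelevel(), and it does not use its arguments. The data frame, its column names and the value mapping are all hypothetical.

```r
# Hypothetical data: the same information is recorded in one of two columns
df <- data.frame(
  status_a = c("Ongoing", NA, "Completed"),
  status_b = c(NA, "Prematurely Ended", NA),
  stringsAsFactors = FALSE
)

# Merge: take status_a where present, otherwise status_b
merged <- ifelse(is.na(df$status_a), df$status_b, df$status_a)

# Relevel: map the original values onto a smaller, hypothetical set of values
new_levels <- c("Ongoing" = "active",
                "Completed" = "ended",
                "Prematurely Ended" = "ended")
relevelled <- unname(new_levels[merged])

print(table(relevelled))
```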

Example workflow

The aim is to download protocol-related trial information and tabulate the trials’ status.

library(ctrdata)
#> Information on this package and how to use it:
#> Please respect the requirements and the copyrights of the
#> clinical trial registers when using their information. Call
#> ctrOpenSearchPagesInBrowser(copyright = TRUE) and visit
#> Testing helper binaries:
#> completed.

# Please review and respect register copyrights:
ctrOpenSearchPagesInBrowser(copyright = TRUE)
q <- ctrGetQueryUrlFromBrowser()
# * Found search query from EUCTR.

#                                  query-term query-register
# 1 query=cancer&age=under-18&phase=phase-one          EUCTR
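The query term shown above is a URL query string made of `name=value` pairs joined by `&`. Splitting it in base R makes its parameters easy to inspect. This is an illustrative sketch only, since ctrdata handles the query term itself. The string is copied from the output above.

```r
query_term <- "query=cancer&age=under-18&phase=phase-one"

# Split into "name=value" pairs, then split each pair at "="
pairs <- strsplit(strsplit(query_term, "&", fixed = TRUE)[[1]], "=", fixed = TRUE)

# Build a named character vector: names are parameters, values are their settings
params <- setNames(vapply(pairs, `[`, character(1), 2),
                   vapply(pairs, `[`, character(1), 1))
print(params)
```

Values are not URL-decoded here. For terms with encoded characters such as `%20`, utils::URLdecode() could be applied to each value.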

If no parameters are given for a database connection, a mongo database server on localhost, port 27017, is used, with database “users” and collection “ctrdata”.
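The mongolite package can be used directly to check that this default database is reachable. This minimal sketch assumes that mongolite is installed and that a server listens on the default host and port. It does not call any ctrdata function. The short timeout option is an addition so that the check fails quickly when no server is running.

```r
# Default connection parameters as described above
host       <- "localhost"
port       <- 27017L
db_name    <- "users"
collection <- "ctrdata"

# Standard MongoDB connection string; serverSelectionTimeoutMS shortens the wait
url <- sprintf("mongodb://%s:%d/?serverSelectionTimeoutMS=2000", host, port)

# Try to connect and count records; return NA instead of failing if the server
# (or the mongolite package) is not available
n_records <- tryCatch({
  con <- mongolite::mongo(collection = collection, db = db_name, url = url)
  con$count()
}, error = function(e) {
  message("Could not connect: ", conditionMessage(e))
  NA
})
print(n_records)
```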

Under the hood, scripts and xml2json.php (in ctrdata/exec) transform EUCTR plain text files and CTGOV xml files to json format, which is imported into the database.
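The general idea of turning an XML record into JSON can be shown in R with the xml2 and jsonlite packages. This swaps in a different technique: it is not the package's xml2json.php script, and the JSON it produces may be structured differently from what ctrdata stores. The XML record is a hypothetical, heavily shortened CTGOV-style example with a placeholder identifier.

```r
library(xml2)
library(jsonlite)

# Hypothetical, shortened record; real CTGOV records contain many more elements
xml_text <- "<clinical_study>
  <id_info><nct_id>NCT00000000</nct_id></id_info>
  <overall_status>Completed</overall_status>
</clinical_study>"

# read_xml() drops whitespace-only text nodes by default (option NOBLANKS)
doc <- read_xml(xml_text)

# as_list() turns the XML tree into nested R lists, one list per element
rec <- as_list(doc)

# Serialise to JSON; auto_unbox writes length-1 vectors as scalars
json <- toJSON(rec, auto_unbox = TRUE)
print(json)
```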

# Retrieve trials from public register, using the query in q:
ctrLoadQueryIntoDb(q)

Tabulate the status of deduplicated trials

# Get all records that have values in all specified fields.
# Note that b31_... is an element within the array b1_...
result <- dbGetFieldsIntoDf(c("b1_sponsor.b31_and_b32_status_of_the_sponsor", 
                              "p_end_of_trial_status", "a2_eudract_number"))

# Eliminate trial records duplicated by EU Member State:
uniqueids <- dbFindIdsUniqueTrials()
result    <- result[ result[["_id"]] %in% uniqueids, ]

# Tabulate the status of the clinical trial on the date of information retrieval
# Note some trials have more than one sponsor and values are concatenated with /.
with(result, table(p_end_of_trial_status, b1_sponsor.b31_and_b32_status_of_the_sponsor))
#                     b1_sponsor.b31_and_b32_status_of_the_sponsor
# p_end_of_trial_status    Commercial  Non-Commercial  Non-Commercial / Non-Commercial
#   Completed                      81              32                                0
#   Ongoing                       205             239                               12
#   Prematurely Ended              15              12                                0
#   Restarted                       0               1                                0
#   Temporarily Halted              4               1                                0
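Base R can illustrate two of the post-processing ideas in this step. The first keeps one EUCTR record per EudraCT number. The second collapses sponsor status values that are concatenated with " / ". This is a simplified sketch on hypothetical identifiers and values. dbFindIdsUniqueTrials() may apply further rules, for example across registers, and is not reproduced here.

```r
# Hypothetical EUCTR-style record identifiers: EudraCT number + member state code
ids <- c("2010-000001-01-DE", "2010-000001-01-FR", "2011-000002-02-GB")

# Strip the trailing member state code to obtain the EudraCT number
eudract <- sub("-[A-Z0-9]+$", "", ids)

# Keep only the first record for each EudraCT number
unique_ids <- ids[!duplicated(eudract)]

# Hypothetical sponsor status values; several sponsors are joined with " / "
status <- c("Commercial", "Non-Commercial / Non-Commercial",
            "Commercial / Non-Commercial")
parts <- strsplit(status, " / ", fixed = TRUE)

# Collapse repeated values, e.g. "Non-Commercial / Non-Commercial" -> "Non-Commercial"
collapsed <- vapply(parts, function(p) paste(sort(unique(p)), collapse = " / "),
                    character(1))

# Flag trials with at least one commercial sponsor
any_commercial <- vapply(parts, function(p) "Commercial" %in% p, logical(1))

print(table(collapsed))
```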

Representation in mongodb, as JSON

Example JSON representation

Features in the works


Issues and notes