DATASUS is the IT department of SUS, the Brazilian Unified Health System. It provides data on health establishments, mortality, access to health services, and several health indicators nationwide. This function allows for an easy download of several DATASUS raw datasets, and also cleans the data for a few of them. The sections below explain each available dataset.
dataset: options include:
"datasus_sim_do" has SIM-DO mortality data.
"datasus_sih" has SIH hospitalization data.
"datasus_cnes_lt" has data on the number of hospital beds.
raw_data: there are two options:
TRUE: if you want the data as it is originally.
FALSE: if you want the treated version of the data. Only effective for SIM-DO and subsets, SIH, and CNES-LT.
keep_all: only applies when raw_data is FALSE. There are two options:
TRUE: keeps all original variables, adding variable labels and possibly constructing extra variables.
FALSE: aggregates data at the municipality level, thereby losing individual-level data and keeping only aggregate measures.
time_period: picks the years for which the data will be downloaded.
states: a vector of states by which to filter the data. Only works for datasets whose data is provided in separate files by state.
language: you can choose between Portuguese ("pt") and English ("eng").
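The way raw_data and keep_all interact can be restated as a small sketch. The helper describe_output() below is hypothetical and is not part of datazoom.amazonia; it only encodes the rules described in the list above, assuming keep_all defaults to FALSE.

```r
# Hypothetical helper, not part of datazoom.amazonia: it only restates
# the documented rules for the raw_data and keep_all arguments.
describe_output <- function(raw_data, keep_all = FALSE) {
  if (raw_data) {
    # keep_all only applies when raw_data is FALSE, so it is ignored here
    return("raw data, as originally published")
  }
  if (keep_all) {
    return("treated data, individual-level, with variable labels")
  }
  "treated data, aggregated by municipality"
}

describe_output(raw_data = TRUE)
describe_output(raw_data = FALSE, keep_all = TRUE)
describe_output(raw_data = FALSE)
```

Note that treatment is only effective for SIM-DO and its subsets, SIH, and CNES-LT, so the last two cases apply to those datasets only.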
Each original SIM data file contains one row per declaration of
death (DO), with columns describing several characteristics of
the person, the place of death, and the cause of death. The data comes
from the main SIM-DO (Declarations of Death) dataset, which goes by the
dataset option "datasus_sim_do". There are also 4 subsets of
SIM-DO, namely SIM-DOFET (Fetal), SIM-DOMAT (Maternal), SIM-DOINF
(Children), and SIM-DOEXT (External Causes), with the corresponding dataset
options "datasus_sim_dofet", "datasus_sim_domat", "datasus_sim_doinf", and "datasus_sim_doext".
Note that only SIM-DO provides separate files for each state, so all
other dataset options always contain data from the whole country.
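Because the subsets always cover the whole country, one possible workaround is to filter by state after loading. The sketch below uses a toy table rather than real DATASUS output: the column names code_muni and n_deaths and all values are hypothetical. It relies on the fact that the first two digits of an IBGE municipality code identify the state (13 for AM, 15 for PA).

```r
# Toy stand-in for a nationwide table; the column names and values are
# hypothetical and do not reflect the actual DATASUS layout.
deaths <- data.frame(
  code_muni = c(1302603L, 1501402L, 3550308L, 1300029L),
  n_deaths  = c(10, 7, 25, 3)
)

# IBGE state codes for the two states of interest.
state_codes <- c(AM = "13", PA = "15")

# The first two digits of a municipality code identify the state.
state_prefix <- substr(as.character(deaths$code_muni), 1, 2)

# Keep only the rows whose municipality belongs to AM or PA.
deaths_am_pa <- deaths[state_prefix %in% state_codes[c("AM", "PA")], ]
```

With real data, the name of the municipality code column would need to be checked in the returned table first.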
Below is an example of downloading the raw data, and also of using the
raw_data = FALSE option to obtain treated data. When this
option is selected, we create indicator variables for deaths from each
cause, which are encoded by their CID-10 codes. By default, the function
then returns the number of deaths by cause aggregated at the
municipality level. In this process, all individual-level information, such
as age, sex, race, and schooling, is lost, so we also offer the option
keep_all = TRUE, which creates all the indicator
variables for cause of death, adds variable labels, and does not
aggregate, thereby keeping all individual-level variables.
    library(datazoom.amazonia)

    # download raw data for the year 2010 in the state of AM.
    data <- load_datasus(
      dataset = "datasus_sim_do",
      time_period = 2010,
      states = "AM",
      raw_data = TRUE
    )

    # download treated data with the number of deaths by cause in AM and PA.
    data <- load_datasus(
      dataset = "datasus_sim_do",
      time_period = 2010,
      states = c("AM", "PA"),
      raw_data = FALSE
    )

    # download treated data with the number of deaths by cause in AM and PA,
    # keeping all individual variables.
    data <- load_datasus(
      dataset = "datasus_sim_do",
      time_period = 2010,
      states = c("AM", "PA"),
      raw_data = FALSE,
      keep_all = TRUE
    )
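To make the difference between keep_all = TRUE and keep_all = FALSE concrete, here is a minimal base R sketch on toy records. It is not the package's implementation: the column names (code_muni, sex, cid10, death_i21, death_x95) and all values are hypothetical, and only two illustrative CID-10 codes are used.

```r
# Toy individual-level records; column names and values are hypothetical
# and much simpler than the real SIM-DO layout.
records <- data.frame(
  code_muni = c(1302603L, 1302603L, 1501402L, 1501402L, 1501402L),
  sex       = c("F", "M", "M", "F", "M"),
  cid10     = c("I21", "X95", "I21", "C34", "I21"),
  stringsAsFactors = FALSE
)

# One indicator variable per cause of death. This mirrors the idea behind
# keep_all = TRUE: rows stay individual-level and gain indicator columns.
records$death_i21 <- as.integer(records$cid10 == "I21")
records$death_x95 <- as.integer(records$cid10 == "X95")

# Summing the indicators by municipality mirrors keep_all = FALSE:
# individual variables such as sex are dropped in the process.
by_muni <- aggregate(
  cbind(death_i21, death_x95) ~ code_muni,
  data = records,
  FUN = sum
)
```

Here records keeps one row per death with the sex column intact, while by_muni has one row per municipality and only the summed indicators.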
CNES provides information on health establishments, available hospital beds, and active physicians. The data is split into 13 datasets: LT (Beds), ST (Establishments), DC (Complementary data), EQ (Equipment), SR (Specialized services), HB (License), PF (Practitioner), EP (Teams), RC (Contractual Rules), IN (Incentives), EE (Teaching establishments), EF (Philanthropic establishments), and GM (Management and goals).
Raw data is available for all of them using the dataset options
datasus_cnes_lt, datasus_cnes_st, and so on, while treated
data is only available for CNES-LT. When
raw_data = FALSE
is chosen, we return data on the total number of hospital beds and the
number available through SUS, which can either be aggregated by municipality
(keep_all = FALSE) or keep all original
variables (keep_all = TRUE).
    library(datazoom.amazonia)

    # download treated data with the number of available beds in AM and PA.
    data <- load_datasus(
      dataset = "datasus_cnes_lt",
      time_period = 2010,
      states = c("AM", "PA"),
      raw_data = FALSE
    )
SIH contains data on hospitalizations. Treated data only gains variable labels, with no extra manipulation. Beware that this is a much heavier dataset.
    library(datazoom.amazonia)

    # download raw data
    data <- load_datasus(
      dataset = "datasus_sih",
      time_period = 2010,
      states = "AM",
      raw_data = TRUE
    )

    # download data in a single tibble, with variable labels
    data <- load_datasus(
      dataset = "datasus_sih",
      time_period = 2010,
      states = "AM",
      raw_data = FALSE
    )