This vignette goes through the basics of getting started with adobeanalyticsr.
The basic setup is documented on the home page of adobeanalyticsr.com, which reviews the details for:
- Setting the AW_CLIENT_ID and AW_CLIENT_SECRET environment variables (the recommended mechanism for this is the .Renviron file). If these environment variables are not available, then the client ID and secret need to be passed directly as arguments to aw_token().
- Setting the AW_COMPANY_ID and AW_REPORTSUITE_ID environment variables (again, the recommended mechanism is the .Renviron file).
adobeanalyticsr functions require a company ID (the Adobe account being accessed) and a report suite ID (the specific report suite to pull data or information from) to run. As such, these are arguments (company_id and rsid) within those functions that default to the values of the AW_COMPANY_ID and AW_REPORTSUITE_ID environment variables.
This approach has some important ramifications:
- Having the AW_COMPANY_ID and AW_REPORTSUITE_ID environment variables set means you will not have to specify them on every function call, which makes for shorter and cleaner code.
- If you want to use a different company_id or rsid from what is specified as an environment variable, all you need to do is pass it explicitly in the function call. The values you specify for either of these (or both of them) directly in a function call take precedence over any environment variables that are set up. This is core R behavior rather than anything unique to adobeanalyticsr: function arguments often have default values built in, but any value specified in the code is used in place of those defaults.
- If you do not have the AW_COMPANY_ID and AW_REPORTSUITE_ID environment variables created and you do not specify values for them in the adobeanalyticsr function calls, then the functions will fail.
In this vignette, both the AW_COMPANY_ID and AW_REPORTSUITE_ID environment variables have been set up (but not shown).
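The precedence rules described above can be illustrated with a small, self-contained sketch. The function describe_target() is hypothetical (it is not part of adobeanalyticsr) and the IDs are placeholders; it only mimics the pattern of arguments that default to environment variables:

```r
# Hypothetical stand-in for a package function: company_id and rsid
# default to environment variables, mirroring the pattern described above.
describe_target <- function(company_id = Sys.getenv("AW_COMPANY_ID"),
                            rsid = Sys.getenv("AW_REPORTSUITE_ID")) {
  # Sys.getenv() returns "" for an unset variable, so fail loudly in that case.
  if (!nzchar(company_id) || !nzchar(rsid)) {
    stop("company_id and rsid must be set via environment variables or arguments")
  }
  paste(company_id, rsid, sep = " / ")
}

# Placeholder values; real ones would normally live in .Renviron.
Sys.setenv(AW_COMPANY_ID = "examplecompany", AW_REPORTSUITE_ID = "examplersid")

default_target  <- describe_target()                    # both values come from the environment
override_target <- describe_target(rsid = "otherrsid")  # the explicit argument wins
```

Because default arguments are evaluated when the function is called, changing the environment variables during a session changes what later calls pick up.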
Once the package is loaded, authenticate with aw_token(). If you have never authenticated, or if you have not authenticated within the last 24 hours, a browser window should open and ask you to log in to your Adobe account. It will then redirect you to a web page that displays a lengthy token, which you can copy and paste back into the Enter your authorization code: prompt in the R console.
The token will look something like this (with different letters and numbers and characters):
Confirm that the authorization worked by running get_me(), which will return two things if the authorization was successful:
- A message that includes: Your data is now available!
- The company ID (globalCompanyId) and company name (companyName) for all of the companies (accounts) to which you have access based on the login you used when you called aw_token().
Because the Adobe Analytics API works with the variable and segment IDs rather than the plain English names of dimensions, metrics, and segments, it is often useful, at least in the initial development of a project, to create data frames that contain all of the available values for these three types of variables.
dims_df <- aw_get_dimensions()
#> Auto-refreshing stale OAuth token.
#> Warning: Unable to refresh token: unauthorized_client
#> Request failed . Retrying in 1 seconds...
#> Auto-refreshing stale OAuth token.
#> Warning: Unable to refresh token: unauthorized_client
#> Request failed . Retrying in 1 seconds...
#> Auto-refreshing stale OAuth token.
#> Warning: Unable to refresh token: unauthorized_client
#> Error in aw_call_api(req_path = urlstructure, debug = debug, company_id = company_id): Unauthorized (HTTP 401).

head(dims_df, 10) %>%
  select(id, name, type, category) %>%
  kable()
| id | name | type | category |
|---|---|---|---|
| averagepagetime | Time Spent on Page - Bucketed | ordered-enum | Metrics |
| browserheight | Browser Height - Granular | int | Audience |
| browserheightbucketed | Browser Height - Bucketed | ordered-enum | Audience |
| browserwidth | Browser Width - Granular | int | Audience |
| browserwidthbucketed | Browser Width - Bucketed | ordered-enum | Audience |
| campaign | Tracking Code | string | Traffic Sources |
| campaign.1 | Campaign Source | string | Traffic Sources |
| campaign.2 | Campaign Medium | string | Traffic Sources |
This data frame includes:
- Classifications (their id takes the form [classified variable].[num], and the parent column, not shown above, has the name of the classified variable)
The data frame can then be searched and filtered for specific dimensions for use in subsequent data calls.
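As a minimal sketch of that kind of search, the following uses a small mock data frame built from a few of the rows shown above (so that it runs without an API connection) and base R rather than the dplyr verbs used elsewhere in this vignette. With a real result, dims_df would come from aw_get_dimensions() instead.

```r
# Mock of a few rows from the dimensions data frame shown above.
dims_df <- data.frame(
  id = c("browserheight", "browserwidth", "campaign", "campaign.1", "campaign.2"),
  name = c("Browser Height - Granular", "Browser Width - Granular",
           "Tracking Code", "Campaign Source", "Campaign Medium"),
  category = c("Audience", "Audience", "Traffic Sources",
               "Traffic Sources", "Traffic Sources"),
  stringsAsFactors = FALSE
)

# Case-insensitive search of the plain English names.
campaign_dims <- dims_df[grepl("campaign", dims_df$name, ignore.case = TRUE), ]

# The id values are what get passed to later data calls.
campaign_ids <- campaign_dims$id
```

Note that searching on name misses the campaign dimension itself, because its name is "Tracking Code"; searching on id as well is often worthwhile.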
| id | name | type | category |
|---|---|---|---|
| averagepagedepth | Average Page Depth | int | Traffic |
| averagetimespentonpage | Average Time Spent on Page (seconds) | int | Traffic |
| averagetimespentonsite | Average Time Spent on Site (seconds) | int | Traffic |
| averagevisitdepth | Average Visit Depth | int | Traffic |
| campaigninstances | Campaign Click-throughs | int | Traffic Sources |
This data frame includes:
aw_get_metrics() does not return calculated metrics, so those require a separate function call.
| id | name | type |
|---|---|---|
| cm300003965_557fc577e4b07e827d177ad0 | Crash Rate (Mobile) | percent |
| cm300003965_557fc577e4b07e827d177ad3 | Average Session Length (Mobile) | time |
| cm300003965_557fc578e4b0094eea4f5201 | Single Access (Calculated) | decimal |
| cm300003965_557fc578e4b0094eea4f5204 | Crash Rate (Mobile) | percent |
| cm300003965_557fc578e4b0416196926008 | Average Session Length (Mobile) | time |
| cm300003965_557fc656e4b0adadba08f337 | Avg. Order Value | currency |
| cm300003965_557fc656e4b0094eea4f58b6 | Average Order Value | decimal |
| cm300003965_557fc790e4b0094eea4f6222 | Crash Rate (Mobile) | percent |
| cm300003965_557fc790e4b0416196926fac | Average Session Length (Mobile) | time |
Calculated metric IDs start with cm and are assigned by Adobe when the calculated metric is created.
These two metrics data frames can be searched and filtered for specific metrics for use in subsequent data calls.
The default limit for the number of segments returned by the aw_get_segments() function is 10, so, depending on how many segments you have, you may need to increase the limit value to return a complete list of segments.
Similar to the dimensions and metrics, aw_get_segments() returns a data frame of available segments that can be searched and filtered to identify the specific id values to use in subsequent data calls.
Just as in Analysis Workspace, the freeform table is the workhorse of adobeanalyticsr, and, as such, is the focus of the rest of this vignette.
The following is a breakdown of visits by device category for the last 30 full days for the report suite that is specified in the AW_REPORTSUITE_ID environment variable.
# Specify a start date and end date. These can be specified as Date
# objects or as string objects in YYYY-MM-DD format.
start_date <- Sys.Date() - 30
end_date <- Sys.Date() - 1

df <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = "mobiledevicetype",
                        metrics = "visits")
#> A total of 3 rows have been pulled.

# Output the results as a formatted table
df %>% kable()
Note that the top argument for aw_freeform_table() defaults to 5, so only the top 5 values will be shown unless that argument is passed a higher value.
Getting multiple metrics is simply a matter of passing a vector to the metrics argument rather than a single string:
The example above used standard metrics, but custom metrics (e.g., “event10”) and calculated metrics (e.g., “cm300003965_557fc578e4b0094efa4f5204”) can also be included in the vector of metrics as needed.
The default for the function is to return the API field names, but the “pretty names” can be returned instead by setting the prettynames argument to TRUE:
| Mobile Device Type | Visits | Page Views | Unique Visitors |
While the pretty names are, indeed, “prettier,” they can add downstream complexity to the code. And, since custom metrics and calculated metrics can have their names changed at any point, code that references columns by their pretty names is liable to break in the future if any of those metrics get renamed. So, it is generally considered a best practice to work with the API field names (prettynames = FALSE) and only convert to more readable names at the point where the data is presented to the end user.
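One way to follow that practice is sketched below, under the assumption that the result is an ordinary data frame with API field names as column names. The data values and the pretty_labels lookup are made up for illustration; nothing here calls the API.

```r
# Mock result using API field names (values are made up for illustration).
df <- data.frame(
  mobiledevicetype = c("Other", "Mobile Phone", "Tablet"),
  visits = c(120, 80, 10),
  pageviews = c(300, 150, 25),
  stringsAsFactors = FALSE
)

# Analysis code refers only to the stable API field names...
df$pageviews_per_visit <- df$pageviews / df$visits

# ...and readable labels are applied to a copy at presentation time.
pretty_labels <- c(mobiledevicetype = "Mobile Device Type",
                   visits = "Visits",
                   pageviews = "Page Views",
                   pageviews_per_visit = "Page Views per Visit")
display_df <- df
names(display_df) <- unname(pretty_labels[names(df)])
```

If a metric is later renamed in Adobe Analytics, only the pretty_labels lookup needs to change; the analysis code that works on df is unaffected.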
What is time? This could be a deeply philosophical question, but, instead, we’ll treat it as a pragmatic one: time is a dimension. It’s just a dimension that gets some special treatment in this package.
The first thing to note is that the id values for time dimensions are all prepended with daterange (for example, daterangeday).
To get trended data for one or more metrics, simply use the appropriate date dimension as a dimension. The first “special” thing that happens here is that you don’t need to worry about setting the top argument to include all of the values. The package will assume that you want to include all of the date values in the range and will do this automatically.
What the package does not (yet) do automatically is return the results ordered by the date value (it sorts them by the first metric or by whatever is specified in the metricSort argument), so we’ll need to do that sorting on the results ourselves:
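A minimal sketch of that sorting step is shown here on a mock data frame, so that it runs without an API connection. It assumes the date column is named after the dimension (daterangeday) and holds Date values; the visit counts are made up. Other examples in this vignette do the same thing with dplyr's arrange(), while this sketch uses base R's order().

```r
# Mock of a trended result as it might come back: ordered by the metric
# (descending), not by date. Values are made up for illustration.
df <- data.frame(
  daterangeday = as.Date(c("2021-03-03", "2021-03-01", "2021-03-02")),
  visits = c(250, 180, 90)
)

# Reorder the rows chronologically.
df_sorted <- df[order(df$daterangeday), ]
```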
To include a non-date dimension and specify a top value for that dimension while also returning all of the date values, use 0 in the position of the date dimension when specifying the top argument.
The following code breaks down each day by Mobile Device Type, but only includes the top 2 values for each breakdown:
df <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = c("daterangeday", "mobiledevicetype"),
                        metrics = "visits",
                        top = c(0, 2)) %>%
  arrange(daterangeday)
#> Estimated runtime: 24.8sec./0.41min.
#> 1 of 31 possible data requests complete. Starting the next 30 requests.
#> Request failed . Retrying in 6 seconds...
#> A total of 60 rows have been pulled.

# Output the first few rows as a formatted table
head(df, 10) %>% kable()
If the daterange... dimension is not the first dimension (and, for speed reasons, it may make sense not to have it there, as is discussed in the next section), the 0 can simply be relocated:
df <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = c("mobiledevicetype", "daterangeday"),
                        metrics = "visits",
                        top = c(2, 0)) %>%
  arrange(mobiledevicetype, daterangeday)
#> Estimated runtime: 2.4sec./0.04min.
#> 1 of 3 possible data requests complete. Starting the next 2 requests.
#> A total of 60 rows have been pulled.

# Output the first few rows as a formatted table
head(df) %>% kable()
Working with multiple dimensions is much easier once you understand two fundamental aspects of the 2.0 API, which may seem contradictory at first:
The trick to reconciling these two statements is that any “single dimension” call can be filtered by an unlimited number of other dimensions.
A thought experiment to explain how this works is to imagine that you are in Analysis Workspace (or, actually go into Analysis Workspace and try this directly to make it a real experiment rather than an experiment of the mind):
Each drag-and-drop with your mouse triggers a new API call. Building a freeform table that has Marketing Channel broken down by Mobile Device Type might look something like the following:
At this point, you’re cursing the constraint of not being able to use the <Shift> key! But each of those steps is actually an API call that Analysis Workspace is making to the 2.0 API:
These API calls all happen relatively quickly (wildly faster than API calls using the older v1.4 API), but they still all have to happen, and they happen one after another (serially rather than in parallel).
To push this experiment one step farther, think about what you would have to do if you wanted to drill down to a third dimension: Entry Page. You would have to repeatedly drag the Entry Page dimension onto each of:
For the first call above, the API call is for the single dimension Entry Page, filtered to only include results where Marketing Channel = Direct AND Mobile Device Type = Mobile Phone.
To build a freeform table in Analysis Workspace that has three dimensions fully broken down would require dozens of drag-and-drop actions! It’s possible that that tedium is one of the reasons that you’re looking to adobeanalyticsr in the first place. As well you should!
aw_freeform_table() handles these multiple API calls for you!
On the one hand, all of the exposition above may seem like overkill, because all you have to do with aw_freeform_table() to pull multiple dimensions is to…pass a vector of dimensions to the dimensions argument.
Where things get a little trickier is when it comes to specifying how many rows to include for each dimension level, and, more importantly, for constructing the function calls so that they run as quickly as possible. To illustrate, we’ll explore a scenario where we want to get Visits broken down by Mobile Device Type and Browser.
First, let’s query each of the dimensions independently to see how many unique dimension values we’re dealing with. This isn’t something that is required, but we’re going to do a little math to illustrate the differences that dimension order can make.
device_types <- aw_freeform_table(date_range = c(start_date, end_date),
                                  dimensions = "mobiledevicetype",
                                  metrics = "visits",
                                  # The default of 5 is probably going to get all of them,
                                  # but set a higher cutoff just in case.
                                  top = 10)
#> A total of 3 rows have been pulled.

browsers <- aw_freeform_table(date_range = c(start_date, end_date),
                              dimensions = "browser",
                              metrics = "visits",
                              # We want to get all of the browsers, so set top as a high
                              # value rather than the default of "5"
                              top = 1000)
#> A total of 113 rows have been pulled.
When making our actual call to get both dimensions at once, we have two options for the dimensions argument value:
- dimensions = c("mobiledevicetype", "browser")
- dimensions = c("browser", "mobiledevicetype")
Assuming we set top appropriately to include all possible values, we should get the same data, ultimately. But the number of API calls required behind the scenes and, therefore, the time it will take for the function to run, will vary quite a bit between these two!
With dimensions = c("mobiledevicetype", "browser"), the API calls will be:
- One call to get the mobiledevicetype values
- One call for each of the 3 mobiledevicetype values to get each value broken down by browser
This will result in a total of 4 API calls.
Let’s try it out, including logging how long the function takes to run.
start_time <- Sys.time()

df <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = c("mobiledevicetype", "browser"),
                        metrics = "visits",
                        top = c(10, 1000))
#> Estimated runtime: 8.8sec./0.15min.
#> 1 of 11 possible data requests complete. Starting the next 3 requests.
#> A total of 125 rows have been pulled.

end_time <- Sys.time()

# Show the summary for the resulting df for comparison to the next approach.
summary(df)
#>  mobiledevicetype     browser              visits
#>  Length:125         Length:125         Min.   :   1.00
#>  Class :character   Class :character   1st Qu.:   1.00
#>  Mode  :character   Mode  :character   Median :   3.00
#>                                        Mean   :  56.66
#>                                        3rd Qu.:  11.00
#>                                        Max.   :3488.00

# Output how long it took to run the query
end_time - start_time
#> Time difference of 4.994079 secs
Now let’s consider what happens if we swap the order of our dimensions and, instead, use dimensions = c("browser", "mobiledevicetype"). Now, the API calls will be:
- One call to get the browser values
- One call for each of the 113 browser values to get each value broken down by mobiledevicetype
This will result in a total of 114 API calls!!!
Let’s try it out:
start_time <- Sys.time()

df <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = c("browser", "mobiledevicetype"),
                        metrics = "visits",
                        top = c(1000, 10))
#> Estimated runtime: 800.8sec./13.35min.
#> 1 of 1001 possible data requests complete. Starting the next 113 requests.
#> A total of 125 rows have been pulled.

end_time <- Sys.time()

# Show the summary for the resulting df for comparison to the previous approach.
summary(df)
#>    browser          mobiledevicetype       visits
#>  Length:125         Length:125         Min.   :   1.00
#>  Class :character   Class :character   1st Qu.:   1.00
#>  Mode  :character   Mode  :character   Median :   3.00
#>                                        Mean   :  56.66
#>                                        3rd Qu.:  11.00
#>                                        Max.   :3488.00

# Output how long it took to run the query
end_time - start_time
#> Time difference of 1.615242 mins
This is a BIG difference in run-time (roughly 5 seconds versus roughly 1.6 minutes) even though, aside from the column order being slightly different, the resulting data is identical.
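The arithmetic behind the two request counts (4 and 114) can be sketched as a small helper. count_requests() is a hypothetical function written for this illustration, not part of adobeanalyticsr. For three or more dimensions it extrapolates from the Analysis Workspace analogy and assumes every combination of parent values actually occurs, so treat its result as an upper bound.

```r
# Estimated number of API requests for a given dimension order.
# n_values[i] is the number of values returned for the i-th dimension.
count_requests <- function(n_values) {
  # The first dimension needs 1 request. Each later dimension needs one
  # request per combination of values from the dimensions before it.
  parents <- c(1, cumprod(n_values[-length(n_values)]))
  sum(parents)
}

device_first  <- count_requests(c(3, 113))  # mobiledevicetype, then browser
browser_first <- count_requests(c(113, 3))  # browser, then mobiledevicetype
```

Note that the number of values in the last dimension never affects the count, which is why putting the dimension with the most values last is the cheapest ordering.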
If you followed the Analysis Workspace experiment in the previous section, then another way to think about this is that (without the <Shift> key), breaking down Mobile Device Type by Browser would require a lot less clicking and dropping than breaking down Browser by Mobile Device Type. In the latter, you would have to drag Mobile Device Type onto each of the numerous Browser values one at a time!
The good (great?) news is that you can string as many dimensions together as you would like and then let R take it from there and return a data frame with multiple dimensions! Just keep in mind that the order of your dimensions can have a dramatic effect on how long you have to wait for the results to be returned.