Many business decisions require analysis of data from transactional, operational, and other data sources. For example, an SEO analyst may want to compare two versions of a web page to determine which leads to more conversions (commonly referred to as A/B testing or split testing). The analyst may wish to explore the relationship between page version A, page version B, and weekday to determine whether any combination of these results in greater conversions. To perform this analysis, the analyst must aggregate sales and website data, then group it into two categories: dimensions and measures. The analyst must then calculate and compare summary measures for each combination of these to determine whether a statistically significant difference exists.
The above is a very simple example of multidimensional analysis. It is also an example of analytics-oriented processing, a term used to describe analytics-oriented activities such as gathering data, classifying and summarizing it, performing calculations on it, and, finally, analyzing and presenting the results.
Flow is a platform for building and running workflows that automate analytics-oriented processes. The Flow platform provides end-to-end analytics-oriented processing via three primary components: datasets, hypercubes, and workflows (see sidebar).
In this post, I'll provide a simple example of multidimensional analysis using Flow. I'll present a step-by-step walk-through that loads a set of data, builds a hypercube, performs summary calculations, and, finally, produces several pivot table results from those summary calculations. This example will also provide a basic illustration of analytics-oriented processing.
Note - The data used in this example is one of the sample datasets available to registered users of Flow. To try this example, just add the sample data to your Flow account, then follow the steps outlined below.
Using Flow to Perform Multi-Dimensional Analysis
Some Definitions - Datasets, Measures, Dimensions, and Hypercubes
A dataset is simply a collection of related data. Analytical datasets often consist of one or more columns grouped into one or more rows. A column represents a named subset of data elements of a particular data type (for example string, integer, or date). Each column element contains a data point value allowed by its data type, and each row contains one value from each column.
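As a conceptual sketch (not Flow's actual API), the column/row structure described above can be modeled in plain Python. The column names and values here are invented for illustration:

```python
# A minimal sketch of the dataset structure described above:
# named, typed columns grouped into rows. Column names and values
# are illustrative only, not part of Flow's API.
columns = {"Site": str, "Day": str, "Visit": int}  # column name -> data type

rows = [
    {"Site": "A", "Day": "Mon", "Visit": 120},
    {"Site": "B", "Day": "Mon", "Visit": 95},
]

# Each row supplies exactly one value, of the declared type, per column.
for row in rows:
    assert all(isinstance(row[name], dtype) for name, dtype in columns.items())
```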
Measures and Dimensions
A measure, also referred to as a fact, represents an observed, derived, or recorded quantity such as a count, distance, or ratio.
A dimension is a property that can be used to group and classify measures. Dimensions are often date/time, geographic, or demographic values: for example, day, year, city, state, or gender. When naming dimensions, it is good practice to:
- Use meaningful singular nouns or present tense verbs
- Use names that are descriptive and self-documenting
- Use names that are easily distinguishable
Performing multidimensional analysis directly on two-dimensional datasets is exceedingly complex. Flow provides a hypercube data structure that handles the underlying complexity of organizing and managing measures and dimensions. Hypercubes also facilitate and optimize any computational operations applied to measures.
The Sample Data
Our sample data is intended to represent a simple (and admittedly contrived) set of A/B test data. It contains five data points: Visit, Conversion, Day, Year, and Site. There are two measures: Visit and Conversion, and three dimensions: Day, Year, and Site. Also, each of our dimensions has a set of unique member values; for example, Year has the member values 2016 and 2017, and Site has member values A and B.
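To make the measure/dimension split concrete, here is a rough plain-Python reconstruction of the sample data's shape (the row values are invented; only the column roles come from the description above):

```python
# Illustrative reconstruction of the sample A/B data shape (values invented):
# two measures (Visit, Conversion) and three dimensions (Day, Year, Site).
data = [
    # (Day,  Year, Site, Visit, Conversion)
    ("Mon", 2016, "A", 100, 12),
    ("Mon", 2016, "B", 110, 9),
    ("Tue", 2017, "A", 90, 14),
    ("Tue", 2017, "B", 105, 11),
]

dimensions = ("Day", "Year", "Site")
measures = ("Visit", "Conversion")

# Each dimension has a set of unique member values,
# e.g. Year -> {2016, 2017} and Site -> {A, B}.
members = {
    dim: sorted({row[i] for row in data}, key=str)
    for i, dim in enumerate(dimensions)
}
print(members["Year"])   # [2016, 2017]
print(members["Site"])   # ['A', 'B']
```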
Primary Components of Flow
Dataset - A dataset is a self-describing, generic data container designed to hold data from any external source.
Hypercube - A data container designed to hold, and provide optimized access to, multidimensional data.
Workflow - A user defined series of steps that perform analytics-oriented processing.
Note - The term workflow, as used in the context of Flow, requires a bit more explanation. In Flow, a workflow is a high-level functional language optimized for analytics-oriented processing. Workflow actions, functions, and expressions provide a level of abstraction beyond lower-level languages such as R, SQL, or Python. This layer of abstraction enables non-technical end users to perform analytics-oriented processing via a series of user-configured actions.
Sample A/B Test Data
Walk-Through: A Four-Step Analytics Workflow
In this hands-on example, I'll walk through the development of a four step workflow that will do the following:
- Load the sample dataset.
- Build a hypercube from the sample dataset.
- Apply hypercube expressions to compute sums and averages, and to find the maximum number of Conversions for any given Site, Day, or Year, or any combination thereof.
- Create and save a pivot table report result.
To get started, you'll need to add the sample data to your Flow user account.
Add Sample Data
After logging into Flow, you'll first need to add the "Sample AB Test Data" dataset to your Flow account. Click on the down arrow button in the top menu bar then click on the sample data icon to open the Add Sample Dataset dialog, as follows:
Open Cloud Connect
Flow provides an integrated development environment for building analytics-oriented workflows called Cloud Connect.
Cloud Connect is a ClickOnce-deployed desktop application for building and managing workflows. To launch Cloud Connect, open the Workflows menu in the left sidebar of the Flow portal, then click the Launch Cloud Connect button, as shown below:
Depending upon your browser, you may need to add an extension to enable ClickOnce and launch Cloud Connect. Click on the View Cloud Connect Requirements link beneath the Launch Cloud Connect button to learn the requirements for your particular browser.
If this is your first time using Cloud Connect, it may require a minute or so to install. Once the installation has completed, the application will open automatically.
Add a New Workflow
To get started, we'll need to add a new workflow. From the top menu, click the Add Workflow button. The Add New Workflow dialog will appear as shown below:
Enter the information as shown above, then click OK. The new workflow will appear in the Workflows list on the left side of your screen. To open the workflow for editing, simply double-click its name. Alternatively, you can right-click the name and then select Open from the context menu.
Add Workflow Steps
With the workflow open for editing, we can now begin adding steps.
Step 1 - Add a Load Data Workflow Step
Add a Load Dataset step by selecting Load Dataset from the Actions drop-down menu. The Load Dataset dialog will display as follows:
Click OK to add the Load Dataset workflow step.
The Cloud Connect development environment provides a run-time engine that allows users to run workflow steps and view results. To run the workflow now, click the Run button in the Workflow Menu; the following prompt will appear:
Click Yes or press Enter to run the workflow. When the workflow completes, the Sample AB Test Data dataset will display under the Working Data tab, see below:
Step 2 - Add a Build Hypercube Step
With our working data loaded into memory, we can now add a Build Hypercube step. From the Actions drop-down menu, select Hypercubes then Build Hypercube. The Build Hypercube dialog box will appear as shown below.
After clicking OK, a Build Hypercube workflow step will be added to the workflow Action List.
Run the workflow again from the start by clicking the Run button in the Workflow Menu bar and answering Yes when prompted. The AB Test Cube will be created and displayed under the Working Data tab, as shown below:
Note - The Working Data tab displays a flattened, or two-dimensional, view of a hypercube to make it easier to read.
Step 3 - Add a Hypercube Expression Evaluation Workflow Step
At this point, we've loaded our Sample AB Test Data dataset (a dataset loaded into Flow is called working data) and built a hypercube from it. We now want to compute summary values for specified measures in the AB Test Cube. We can specify measures using Hypercube Expressions. Flow provides an extensive library of functions and operations for computing summary values across hypercube dimensions. We are going to add an Expression Evaluation step that will sum the Visit and Conversion measures across each of their dimensions.
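Conceptually, the Expression Evaluation step performs grouped aggregation over the cube's dimensions. Flow does this internally via hypercube expressions; the sketch below shows the equivalent idea in plain Python, summing the two measures grouped by the Site dimension (row values are invented):

```python
# Conceptual analogue of the Expression Evaluation step: summing the
# Visit and Conversion measures across a dimension (here, Site).
# Flow performs this inside the hypercube; values below are invented.
from collections import defaultdict

rows = [
    # (Day, Year, Site, Visit, Conversion)
    ("Mon", 2016, "A", 100, 12),
    ("Mon", 2016, "B", 110, 9),
    ("Mon", 2017, "A", 90, 14),
    ("Mon", 2017, "B", 105, 11),
]

# Sum each measure grouped by Site -- one slice of the hypercube.
totals = defaultdict(lambda: {"Visit": 0, "Conversion": 0})
for day, year, site, visit, conversion in rows:
    totals[site]["Visit"] += visit
    totals[site]["Conversion"] += conversion

print(dict(totals))
# {'A': {'Visit': 190, 'Conversion': 26}, 'B': {'Visit': 215, 'Conversion': 20}}
```

Grouping by a different dimension (Day or Year), or by a combination of dimensions, follows the same pattern with a different grouping key.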
From the Actions drop-down menu, select Expression Builder. The Expression Builder dialog will load, as shown below:
When you are done adding expressions, click OK. The Expression Evaluation workflow step will be added to the workflow steps list.
Run the workflow again and view the results: click the Run button in the Workflow Menu bar and answer Yes to the prompt. When the workflow completes, the Working Data tab will display a flattened view of the hypercube, which includes our expression evaluation results, as shown below:
Here are the workflow steps we have added thus far:
In just three steps we've loaded our data, created a hypercube, and computed summary values for a set of measures.
We now want to view and/or share the results of our effort. To accomplish this, we'll add a Result Step to our workflow. A Result Step creates and updates workflow results, which include datasets, reports, and charts, among others. Results can be used to create dashboards, delivered to internal or external users via email or subscription, or used as input to other workflows or external applications.
Step 4 - Add a Pivot Table Result Step
We will add a workflow step that generates a Pivot Table Result showing Total Visits by Day and Site.
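For intuition, a pivot of Total Visits by Day and Site is a cross-tabulation: one dimension supplies the rows, the other the columns, and the measure is summed in each cell. A rough plain-Python sketch (with invented values, not Flow's output):

```python
# Rough sketch of the pivot the Result step produces: Total Visits
# cross-tabulated by Day (rows) and Site (columns). Values invented.
from collections import defaultdict

rows = [
    # (Day, Site, Visit)
    ("Mon", "A", 100), ("Mon", "B", 110),
    ("Tue", "A", 90),  ("Tue", "B", 105),
    ("Mon", "A", 80),  # repeat Day/Site combinations are summed
]

pivot = defaultdict(lambda: defaultdict(int))
for day, site, visits in rows:
    pivot[day][site] += visits

for day in sorted(pivot):
    print(day, dict(pivot[day]))
# Mon {'A': 180, 'B': 110}
# Tue {'A': 90, 'B': 105}
```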
From the Results drop-down menu, select the Pivot Table menu item to display the Pivot Table Result dialog box, as shown below:
Pivot Table Result Viewed in the Flow Portal
This post provided a basic, hands-on introduction to multidimensional analysis using Flow. A brief introduction to multidimensional analysis and analytics-oriented processing was given, using A/B testing of web pages as an example. Definitions of datasets, measures, dimensions, and hypercubes were provided as background. Finally, a hands-on, step-by-step working example of a four-step analytics workflow was developed.