1/4/2018 - How To Denormalize (Join) Datasets

How to Join Data in Flow

The join is an essential operation in data analytics. If you work with data, chances are that at some point you will need to join two tables together.

Flow's Turing-complete computing framework provides a set-theoretic denormalization engine that makes it easy to link and flatten multiple datasets. To execute joins in Flow, you use the denormalization workflow function.

In this blog post, I will demonstrate how to configure and implement the denormalize function to join multiple datasets together.

The example in this post uses sample products and orders data loaded from several delimited files. Each file contains a unique identifier field that relates the datasets to one another.

We will load these files into Flow and use the denormalization function to join the disconnected datasets. The denormalization process flattens the separate files into a single master dataset for analysis.
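For readers who think in code, the idea behind this join can be sketched outside of Flow. The following is a minimal illustration in plain Python, not Flow's own implementation or API: the sample data, the column names (`product_id`, `product_name`, `category`, `quantity`), and the helper functions `load_rows` and `denormalize` are all hypothetical, chosen only to mirror the products and orders scenario described above.

```python
import csv
import io

# Hypothetical sample data standing in for two delimited files.
PRODUCTS_CSV = """product_id,product_name,category
1,Widget,Hardware
2,Gadget,Hardware
3,Manual,Books
"""

ORDERS_CSV = """order_id,product_id,quantity
100,1,2
101,3,1
102,1,5
103,2,4
"""


def load_rows(text):
    """Parse delimited text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def denormalize(orders, products, key):
    """Left-join each order to its product on `key`, returning flat rows."""
    lookup = {p[key]: p for p in products}  # index products by identifier
    flat = []
    for order in orders:
        product = lookup.get(order[key], {})
        # Merge product columns into the order row; order values win on clashes.
        flat.append({**product, **order})
    return flat


products = load_rows(PRODUCTS_CSV)
orders = load_rows(ORDERS_CSV)
flat_rows = denormalize(orders, products, "product_id")

for row in flat_rows:
    print(row)
```

Each output row carries both the order fields and the matching product fields, which is the "single master dataset" shape that denormalization aims for. The sketch uses a left join, so an order with no matching product would still appear, just without product columns.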

Once we have joined the datasets, we will compute hypercubes across the flattened dataset and summarize the data.
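As a rough illustration of what a hypercube computation involves, the sketch below assumes the term is used in the OLAP sense: a measure aggregated over every combination of a set of dimensions. This is an assumption about the concept, not a description of how Flow computes hypercubes. The flattened rows, the column names, and the `compute_cube` helper are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical flattened rows, as a join of orders and products might produce.
flat_rows = [
    {"category": "Hardware", "product_name": "Widget", "quantity": 2},
    {"category": "Books", "product_name": "Manual", "quantity": 1},
    {"category": "Hardware", "product_name": "Widget", "quantity": 5},
    {"category": "Hardware", "product_name": "Gadget", "quantity": 4},
]


def compute_cube(rows, dimensions, measure):
    """Sum `measure` for every subset of `dimensions` (a simple data cube)."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:
                group = tuple(row[d] for d in dims)  # key for this grouping
                totals[group] += row[measure]
            cube[dims] = dict(totals)
    return cube


cube = compute_cube(flat_rows, ["category", "product_name"], "quantity")
print(cube[()])             # grand total across all rows
print(cube[("category",)])  # totals per category
```

The empty dimension tuple holds the grand total, single-dimension tuples hold per-category or per-product totals, and the full tuple holds the finest-grained summary. Computing every subset grows exponentially with the number of dimensions, so this approach is only meant to show the idea on small data.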

If you do not have a Flow account, register here. Flow is free for personal use.

The video below shows a worked example of the denormalization function in Flow, walking through the joins and hypercube computation outlined above. Check out the video here: