Flow Data Automation & Analytics Blog

Doing Data Quality With Flow

In this article, I provide an introduction to measuring and evaluating data quality using Flow. I briefly discuss data quality dimensions and data quality assessment. Then I examine how a schema-on-write approach increases the time and cost required to assess data quality, and briefly discuss schema-on-read technology. I then introduce Flow's "Generic Data" technology as a solution to the deficiencies of schema-on-write and schema-on-read for data quality. Finally, I provide a hands-on working example of doing data quality in Flow using some sample name and address data.
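
Flow's data quality checks are configured as workflow actions rather than written as code, but the underlying idea is easy to sketch. Below is a minimal Python/pandas illustration of two common checks (completeness and format validity) on made-up name and address data; the column names and the five-digit ZIP rule are assumptions for the example, not part of Flow.

```python
import pandas as pd

# Made-up name and address data with deliberate quality problems.
df = pd.DataFrame({
    "name":    ["Ada Lovelace", "Grace Hopper", None, "Alan Turing"],
    "zipcode": ["90210", "1234", "60601", None],
})

# Completeness: share of non-missing values per column.
print(df.notna().mean())

# Validity: assume US ZIP codes must be exactly five digits.
valid_zip = df["zipcode"].str.fullmatch(r"\d{5}", na=False)
print("Valid ZIP rate:", valid_zip.mean())
```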

How to Perform a Cognitive Keyword Extraction

This post demonstrates how to perform a cognitive keyword extraction against natural language text data in Flow. In this worked example I show how to use the artificial intelligence actions to process unstructured text values. The artificial intelligence actions are used to extract the important keywords, analyze sentiment towards those keywords, and compute emotion distribution scores for each keyword extracted from the natural language text. The concepts examined in this post teach a powerful technique which can be used to develop advanced cognitive workflows against any data source.
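
Inside Flow the Watson calls are made through built-in actions, so no code is required. As a rough outside-of-Flow equivalent, here is what a similar keyword, sentiment, and emotion extraction looks like with the IBM Watson Natural Language Understanding Python SDK; the API key, service URL, and sample text are placeholders.

```python
from ibm_watson import NaturalLanguageUnderstandingV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson.natural_language_understanding_v1 import Features, KeywordsOptions

# Placeholder credentials and URL -- substitute your own service instance values.
nlu = NaturalLanguageUnderstandingV1(
    version="2022-04-07",
    authenticator=IAMAuthenticator("YOUR_API_KEY"),
)
nlu.set_service_url("https://api.us-south.natural-language-understanding.watson.cloud.ibm.com")

text = "The new phone's battery life is fantastic, but the camera is disappointing."

# Extract keywords along with a sentiment score and emotion scores for each one.
result = nlu.analyze(
    text=text,
    features=Features(keywords=KeywordsOptions(sentiment=True, emotion=True, limit=10)),
).get_result()

for kw in result["keywords"]:
    print(kw["text"], kw["sentiment"]["score"], kw["emotion"])
```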

How To Use Flow + Watson AI for SEO Keyword Research

This article demonstrates how to use artificial intelligence to perform keyword research for search engine optimization. Watson cognitive actions are leveraged to extract keywords from competitor websites, and the keywords are compiled into a dataset to provide better insight into potential SEO strategy.

How to Deduplicate a Dataset

This blog post demonstrates how to identify and remove duplicate records from a dataset. A worked example is provided which shows how to configure and implement the deduplicate function against some sample customer data. The deduplicate function is a key action which allows the workflow developer to create rich data validation and transformation rules.
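
For readers who want to see the underlying operation in code, here is a minimal pandas sketch of the same idea: identify duplicates on the columns that define a unique customer and keep only the first occurrence. The sample data and key columns are invented for illustration and do not come from the post.

```python
import pandas as pd

# Invented sample customer data containing one duplicate record.
customers = pd.DataFrame({
    "customer_id": [101, 102, 102, 103],
    "name":  ["Acme Ltd", "Globex", "Globex", "Initech"],
    "email": ["info@acme.example", "hi@globex.example",
              "hi@globex.example", "sales@initech.example"],
})

# Flag duplicates on the columns that define identity, then drop them.
dupes = customers.duplicated(subset=["customer_id", "email"], keep="first")
print("Duplicate rows found:", int(dupes.sum()))

deduplicated = customers.drop_duplicates(subset=["customer_id", "email"], keep="first")
print(deduplicated)
```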

How To Denormalize (Join) Datasets

This blog post demonstrates how to configure the denormalize function in order to join disconnected datasets together. A worked example is provided which shows how to import and merge various delimited files. The denormalize action is used to join the data from the separate files together, consolidating them into a single set for analysis. Once the data is joined, we learn how to use hypercubes to aggregate and summarize the data.
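
As a rough code analogue of the denormalize-then-summarize pattern, here is a short pandas sketch: two small frames stand in for the separate delimited files, a left join consolidates them, and a group-by aggregation plays the role of the hypercube. All column names are illustrative assumptions.

```python
import pandas as pd

# In the post these come from separate delimited files; inline frames are
# used here so the sketch runs on its own (column names are illustrative).
orders = pd.DataFrame({
    "order_id":    [1, 2, 3, 4],
    "customer_id": [101, 102, 101, 103],
    "amount":      [250.0, 80.0, 120.0, 310.0],
})
customers = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "region":      ["East", "West", "East"],
})

# Denormalize: join the disconnected sets on their shared key.
joined = orders.merge(customers, on="customer_id", how="left")

# Aggregate the consolidated set, in the spirit of a Flow hypercube.
print(joined.groupby("region")["amount"].agg(["sum", "count"]))
```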

How to Perform a Word Count Analysis

This article demonstrates how to perform a word count analysis in Flow. In this blog post, I provide a worked example showing how to take in unstructured natural language data and compute a unigram language model against that data. The language analysis returns a new profile dataset which holds each unique token present in our natural text and the number of times each word occurs. This blog post teaches a quick one-step technique for doing an initial exploratory analysis on natural text data.
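
The one-step Flow action aside, a unigram word count is simple to express in plain Python, which may help clarify what the profile dataset contains. This sketch tokenizes a sample sentence and counts each token; the tokenization rule is a simplifying assumption.

```python
import re
from collections import Counter

text = "Flow makes it easy to analyze text. Analyze text quickly with Flow."

# Tokenize to lowercase word tokens (a simplifying rule) and count each unigram.
tokens = re.findall(r"[a-z']+", text.lower())
counts = Counter(tokens)

# Each unique token and the number of times it occurs.
for word, count in counts.most_common():
    print(word, count)
```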

How to Analyze Blank / Missing Values in a Dataset

In this blog post, I provide a worked example demonstrating how to perform an analysis of blanks on a target dataset. When analyzing data, a typical first step is to understand where values are missing. Identifying missing values in your data can help you make more informed decisions about your analysis approach.
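
A quick pandas sketch of the same analysis of blanks may be useful for comparison: it reports the count and percentage of missing values per column for a small made-up dataset.

```python
import numpy as np
import pandas as pd

# Made-up dataset with scattered missing values.
df = pd.DataFrame({
    "name":  ["Ada", None, "Grace", "Alan"],
    "city":  ["London", "Arlington", None, None],
    "sales": [120.0, np.nan, 87.5, 45.0],
})

# Count and percentage of blanks per column.
blanks = pd.DataFrame({
    "blank_count": df.isna().sum(),
    "blank_pct": (df.isna().mean() * 100).round(1),
})
print(blanks)
```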

How to Import and Analyze MS Access Data

This blog post provides a worked example of how to import and analyze Microsoft Access data. We learn how to use the Access Database integration interface to consume the sample Northwind database into Flow. A step-by-step walkthrough is provided which details how to denormalize the various relational tables into a consolidated, flattened set for analysis. We learn how to apply generic expressions to compute new data points on the fly. Finally, we learn how to leverage Flow's multidimensional analysis engine to compute hypercubes and summarize the data.
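
Outside of Flow's Access integration interface, roughly the same import-join-summarize sequence can be sketched in Python with pyodbc and pandas. The connection string, file path, and the Orders/Customers join below assume a local copy of the Northwind database and an installed Microsoft Access ODBC driver; treat it as an illustration rather than a drop-in script.

```python
import pandas as pd
import pyodbc

# Assumes the Microsoft Access ODBC driver is installed and that a local
# copy of the Northwind database sits next to the script (path illustrative).
conn = pyodbc.connect(
    r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"Dbq=northwind.accdb;"
)

# Denormalize two of the relational tables into one flattened set.
query = """
    SELECT o.OrderID, o.OrderDate, c.CompanyName, c.Country
    FROM Orders AS o
    INNER JOIN Customers AS c ON o.CustomerID = c.CustomerID
"""
orders = pd.read_sql(query, conn)

# Summarize the flattened set: order counts per country.
print(orders.groupby("Country")["OrderID"].count())
```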

How to Import and Analyze JSON Data

In this blog post, I provide a worked example demonstrating how to import and analyze data from JSON-based sources. Flow allows for the consumption of JSON data into a tabular form for analysis without requiring any knowledge of structure or schema. I demonstrate how to leverage this functionality to read and flatten JSON from a web-based resource into a dataset. I then show how to apply transformations to the data by using the expression builder to calculate new data points on the fly. I show how to compute hypercubes against the flattened data and perform a simple language analysis, highlighting the ability to wrangle and analyze the data. Finally, I demonstrate how to export the transformed data to various file formats, allowing us to persist the flattened set for use elsewhere.
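
For comparison, here is a minimal Python sketch of the same flatten-transform-export sequence using pandas. The endpoint shown is a public placeholder JSON resource, and the derived email_domain column is an invented example of computing a new data point on the fly.

```python
import pandas as pd
import requests

# Public placeholder endpoint returning a JSON array of nested records.
url = "https://jsonplaceholder.typicode.com/users"
records = requests.get(url, timeout=10).json()

# Flatten the nested JSON into a tabular dataset (nested keys become dotted columns).
df = pd.json_normalize(records)
print(df.columns.tolist())

# Compute a new data point on the fly, then persist the flattened set.
df["email_domain"] = df["email"].str.split("@").str[-1]
df.to_csv("users_flat.csv", index=False)
```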

How to Use Flow + Artificial Intelligence to Analyze the News

In this blog post, I provide a worked example demonstrating how to design a workflow which extracts and analyzes cryptocurrency news articles using artificial intelligence. I explain how to use the HTML integration interface to extract the links for all top news stories from a target website into a dataset. I show how to use generic expressions to transform and clean the raw links, preparing them for processing. Flow is used to loop through each of the structured links and invoke the built-in Watson artificial intelligence functions to perform advanced cognitive analytics against the text of each news article. Flow collects the results of the cognitive analysis and compiles an aggregate dataset of sentiments, emotions, concepts, topics, keywords, and named entities for all of the supplied articles. I finish the example by showing how to compute hypercubes against the cognitive output to summarize the results and generate various multidimensional views.
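
The link extraction and cleanup step can also be sketched outside of Flow. The snippet below fetches a page, pulls out the anchor links, normalizes them to absolute URLs, and filters them with a simple pattern; the target URL and the "/news/" filter are assumptions for illustration, and the Watson analysis loop described above would follow on the resulting list.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Illustrative target page; substitute the news site you want to analyze.
base_url = "https://example.com/crypto-news"
html = requests.get(base_url, timeout=10).text

# Extract anchor links and normalize them to absolute URLs.
soup = BeautifulSoup(html, "html.parser")
links = {urljoin(base_url, a["href"]) for a in soup.select("a[href]")}

# Keep only article-looking links; the "/news/" pattern is an assumption.
article_links = sorted(link for link in links if "/news/" in link)
for link in article_links:
    print(link)
```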

A Reusable Benford Analysis Flow

In this post, we build a reusable eight-step Flow that performs a basic Benford's Law analysis on a sample data set. This Flow loads the sample data set, obtains the first digit from each observation, builds a hypercube and uses it to count the first digits, extracts a dataset containing the observed distribution, and finally computes the expected distribution and compares it to the observed distribution by taking the difference.
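
The same eight-step logic condenses to a few lines of Python, which may help make the expected-versus-observed comparison concrete. The sample values below are invented; the expected distribution is the standard Benford probability log10(1 + 1/d) for each leading digit d.

```python
import math
import pandas as pd

# Invented numeric observations (e.g., invoice amounts).
values = pd.Series([1243.5, 872.0, 1130.0, 4521.7, 998.2, 1875.0, 2310.9, 145.0])

# Obtain the first significant digit of each observation.
first_digits = values.abs().apply(lambda x: int(f"{x:e}"[0]))

# Count the first digits and convert to an observed distribution.
observed = first_digits.value_counts(normalize=True).reindex(range(1, 10), fill_value=0.0)

# Expected Benford distribution and the difference from observed.
expected = pd.Series({d: math.log10(1 + 1 / d) for d in range(1, 10)})
comparison = pd.DataFrame({"observed": observed, "expected": expected})
comparison["difference"] = comparison["observed"] - comparison["expected"]
print(comparison.round(3))
```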

A Basic Introduction to Multidimensional Analysis Using Flow

This article presents a basic introduction to multidimensional analysis and analytics-oriented processing using Flow. It discusses datasets, measures, dimensions, and hypercubes; then it provides a step-by-step example of building a workflow to analyze some fictional A/B test data.
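
As a point of reference, the dataset/dimension/measure vocabulary maps neatly onto a group-by aggregation. The sketch below builds a tiny two-dimensional cube (variant by device) over invented A/B test data, with session count and conversion rate as measures; it is an analogy to a Flow hypercube, not Flow itself.

```python
import pandas as pd

# Invented A/B test data: one row per user session.
ab = pd.DataFrame({
    "variant":   ["A", "A", "B", "B", "A", "B"],
    "device":    ["mobile", "desktop", "mobile", "desktop", "mobile", "mobile"],
    "converted": [1, 0, 1, 1, 0, 1],
})

# Dimensions: variant and device. Measures: session count and conversion rate.
cube = ab.groupby(["variant", "device"])["converted"].agg(
    sessions="count",
    conversion_rate="mean",
)
print(cube)
```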

Introduction to Building Dashboards in Flow

Flow enables you to build dashboards containing a variety of elements including tables, charts, reports, and data summaries, among others. This post focuses on two methods you can use to build, populate, and update dashboards. I show how to add a new dashboard, then how to create and add a chart result using one of the sample datasets provided. Next, I provide an in-depth discussion of adding workflow-generated results to a dashboard.

Tables and Pivot Tables in Flow

In this post, we'll build a six-step workflow that produces Pivot Table and Table results. The workflow shows how to load data, use expressions to derive time-period values from a date field, build a hypercube using those time-period values as dimensions, and, finally, create and view pivot table and table results using the hypercube.
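
A pandas equivalent of the derive-periods-then-pivot steps looks roughly like this; the transaction data and column names are made up for illustration.

```python
import pandas as pd

# Invented transaction data with a date field.
tx = pd.DataFrame({
    "order_date": pd.to_datetime(["2023-01-14", "2023-01-30", "2023-02-02",
                                  "2023-02-19", "2023-03-07", "2023-03-21"]),
    "region": ["East", "West", "East", "West", "East", "West"],
    "amount": [120.0, 95.5, 310.0, 42.0, 77.5, 160.0],
})

# Derive time-period values from the date field (the expressions step).
tx["month"] = tx["order_date"].dt.to_period("M").astype(str)

# Use the derived period as a dimension of a pivot table (the hypercube step).
pivot = pd.pivot_table(tx, index="month", columns="region",
                       values="amount", aggfunc="sum")
print(pivot)
```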

Grouped Reports in Flow

Here is the second in a series of posts focusing on building reports in Flow. A grouped report is an advanced report produced by Flow. Grouped reports organize records into one or more nested groups, where each group is a collection of records sharing a common column value. There are two basic methods you can employ to create grouped reports in Flow. The first is to add a Grouped Report action to a new or existing workflow. The second is to open a hypercube within the Flow portal and click the Create Report button in the toolbar at the top of the hypercube view. This post covers the first method.

Tabular Reports in Flow

Flow enables you to build many types of reports, such as tabular, grouped, pivot tables, tables, and data summaries. This is the first in a series of posts focusing on building reports in Flow. You can learn more about these different types of reports in the Flow online help. A tabular report is the most basic type of report you can build in Flow. It is organized in a multicolumn, multirow format, with each column corresponding to a column in a dataset.

Cognitive Computing in Flow

This post provides a hands-on introduction to cognitive computing applications in Flow. It introduces the IBM Watson cognitive actions for unstructured text analytics a...