Transform and Cleanse Data Overview

You need to acquire, transform, cleanse, and enrich data that exists across many different data sources or applications. These data sources include flat file sources, databases, CRM systems, web APIs and web applications, social platforms, and other specialized business systems. You need to consolidate, prepare, and enforce integrity rules which reshape this data for your specific objectives. The transformed data needs to be processed through a highly custom analysis workflow to deliver results and insights critical to your operations.

With Flow Analytics workflows you can design a solution to any data cleansing or transformation challenge. Flow can universally acquire data from any source, structure, or format and provides a robust data transformation and preparation engine which allows you to apply any integrity rule or cleansing action to your data. Data can be linked, joined, and analyzed through a powerful toolbelt of hypercubes, AI, machine learning, predictive algorithms, and more. Once you develop your workflow solution, it can be deployed to Flow's agent framework to fully automate the data cleansing and transformation process so that you never have to do the work again.

Get Started With Flow Analytics Workflows

Completely free - no trial period or credit card required.

Fully functional - access all the core features of the Flow platform.

Develop custom workflows - use our advanced, configure-not-code development environment to build custom solutions.

Free help and support - full documentation and how-to videos along with free online, telephone or email support.

Example Data Transformation and Cleansing Data Workflow

Workflow Example 1 - Cleaning Customer Data

This workflow demonstrates an example data cleansing and transformation solution developed through Flow. First, we import dirty customer data from various flat file data sources. The datasets have numerous data quality issues that we need to address to prepare it for analysis. These issues include a lack of standardization, missing values, and data misalignments which cause for inconsistencies that obstruct our business objectives. Workflow actions are configured to apply sequences of transformations which correct these data quality issues. We apply operations to create new data points through various computations and expressions. The cleansed data is finally summarized into a hypercube report and delivered to the cloud for distribution.

Watch Video

Data Transformation and Cleansing Actions Overview

Action Type Description  
Build Hypercube The Build HyperCube function creates a multi-dimensional analysis object from a target data collection. The HyperCube allows for powerful data summariziation and multi-dimensional statistics to be computed. view actions →
Hypercube Include The HyperCube Include function promotes an internal hypercube data point to the surface of the hypercube groupings. view actions →
Slice Hypercube The Slice HyperCube function subsets a portion of a hypercube into a target result data collection. view actions →
Hypercube Expressions HyperCube Expressions allow for the design of multi-dimensional statistical computations across the hypercube analysis space. HyperCube Expressions allow for data to be summarized across many different dimensions at once and for computation to be weaved in and out of the hypercube. view actions →
Add Data Points The Add Data Points function allows you to add new data points of a target type with an optional default value to a target working data collection. view actions →
Remove Data Points The Remove Data Points function removes target data points from a target data collection as part of a workflow. view actions →
Remove Blank Values The Remove Blank Values function isolates and removes records from a target data collection based on an evaluation condition checking if a set of data point values are blank or missing. view actions →
Replace Data Point Values The Replace Data Point Values function performs a conditional replace operation to replace the values in a target data point within a working data collection view actions →
Rename Data Points The Rename Data Points function performs a rename of data points in a target working data collection as part of a workflow. view actions →
Copy Data Points The Copy Data Points function copies the values of a target data point into another data point. The Copy Data Point function can be used to copy conditionally into another data point only when the values in the copy into data point are blank or null. view actions →
Change Data Types The Change Data Types function performs a type conversion on the values a target data point as part of a workflow. view actions →
Index Data Points The Index Data Points function creates an index data point which assigns either an integer or guid unique identifier as values to each record in a target data collection. view actions →
Categorize Data The Categorize Data function assigns a data record to a classification grouping based on a set of logical conditions, fuzzy match criterias, text similiarity evaluations and other classification rule criteria. view actions →
Filter Data The Filter Data function allows you to create compound filter expressions which isolate records that meet a set of conditional evaluations. The results of the filter can be stored off into a new data collection or the original data collection can be updated to hold the result. view actions →
Top N Data Filter The Top N Data Filter function allows you to filter the top or bottom N or N% of values from a target data point. This function can be used to generate Top N reports or identify outliers in a target data collection. view actions →
Index Based Row Filter The Index Based Row Filter function allows you to filter out records from a target data collection based on their position using a given start and end index. view actions →
Sample Data The Sample Data function allows you to perform a random sample of records in a target data collection as part of a workflow. view actions →
Pop Chunk The Pop Chunk function is used to pop the top N values off of a target generic data collection and append the records into a result data collection. view actions →
Profile Blank Data The Profile Blank Data function creates a profile of the missing values across all data points in a target data collection. The function evaluates the count of missing values, % of missing values, and distribution of missing values across each data point in a target collection. The result of the Profile Blank Data function is stored as a data collection into your working data container to support further analysis. view actions →
Language Summary The Language Summary function creates a word count language model profile for a target text data point. The Language Summary function can be used to summarize the frequency of the occurances of words across all values in a target data point as well as the frequency of the occurance of joint word pairs. The result of the Language Summary function is stored as a data collection into your working data container to support further analysis. view actions →
Bin Data Iems The Bin Data Items function clusters rows into different "bins" of a fixed number of elements. These bin groupings can be used to sample and summarize data as well as for partionining operations across large data sets. view actions →
Denormalize Data The Denormalize Data function allows for the joining and subtraction of data sets by performing a match operation on a set of single or compound keys. The Denormalize Data operation can join datasets together as well as remove records of data in a parent data collection which exist in the child. view actions →
Perform Lookups The Lookup function allows you to perform a targeted join operation which looks up data points from a child data collection into a parent data collection based on matching across single or compound keys. view actions →
Append Datasets The Append Datasets function allows you to append and merge two datasets together by stacking the "Append From" data collection onto the bottom of the "Append Into" data collection. The Append Function supports jagged generic data append operations. view actions →
Add Running Total The Add Running Total function computes a moving sum downwards across a target numeric data point value. The result values of the cumulative summation operation are stored in a target result data point. view actions →

Additional Data Transformation and Cleansing Information & Resources

Blog Posts
Post Description  
A Powerful Way to Normalize / Flatten Any JSON Data for Analysis Flattening JSON data is often a difficult task. In this blog post, I demonstrate how to consume any JSON based data source into structured data for analysis. This blog post focuses on using the normalization adapter to automatically create relational tables from web and file-based JSON based sources. The technique outlined in this blog post will allow you to integrate tens of thousands of data sources on-demand and with no code. view post →
A Quick Data Cleansing Tutorial This blog post provides a worked example of an end-to-end data cleansing workflow. This blog post demonstrates how to load in dirty data, analyze the different data quality issues present, apply series of transformations to correct the data quality issues, and how to build a hypercube to summarize the tidy data. view post →
An End-to-End Automated Data Analytics Solution This blog post provides a worked example of an end-to-end automated business intelligence solution. This blog post demonstrates how to load data from different sources, join the data together, cleanse and correct data quality issues, compute hypercubes, and design a dashboard. I then demonstrate how to deploy the solution to the autonomous agent framework for continuous execution. view post →
How to Import and Analyze Common File Data Sources In this blog post, I provide a worked example demonstrating how to import and analyze different types of file-based data sources. File-based data sources are ubiquitous in business data analytics. This blog post focuses on how to work with data stored in delimited files, Excel workbooks, and XML documents. I provide an overview of each of these three data source types as well as a detailed explanation of the challenges of working with XML. I demonstrate how Flow is capable of automatically normalizing any XML irrespective of hierarchical complexity into structured data sets for analysis. I explain the advantages of the Flow approach to XML over traditional alternatives such as XPath. I finish the example by demonstrating basic analytics against the consolidated data by constructing and computing across various hypercubes. view post →
Flow Crash Course - Part 2 - Basic Data Point Actions This is the second blog post in our crash course series on Flow. In this blog post, I provide an introduction to Basic Data Point Actions in the Flow Computing Framework. view post →
Denormalize - Join Datasets using Flow Analytics This blog post demonstrates how to configure the denormalize function to join disconnected data sets together. I provide a worked example that shows how to first import then join a collection of delimited files. After denormalizing the data, I show how to build and use a hypercube to aggregate and summarize the data. view post →
How to Perform a HyperCube Time Series Analysis This blog post provides a worked example of a hypercube time series analysis. Date expressions are used to extract and compute summary time dimensions from a raw datetime data point value. A hypercube is built on the extracted datetime dimensions. Time series computations are then performed on the hypercube to summarize changes in values over time. view post →
Flow Crash Course - Part 4 - Basic Data Summary Functions This is the fourth blog post in our crash course series on Flow. In this blog post, I provide an introduction to Basic Data Summary Functions in the Flow Computing Framework. view post →
How to Deduplicate a Dataset This blog post demonstrates how to identify and remove duplicate records from a dataset. I provide a worked example showing how to configure and implement the deduplicate function against some sample customer data. The deduplicate function is an important action which allows the workflow developer to create rich data validation and transformation rules. view post →
A Quick Introduction to the Five Types of Filters in Flow In this blog post, I provide an introduction to the five filter actions in Flow. Filter actions are functions which select a specific subset of records from a designated data collection based on some target match criteria. This blog post introduces the different types of filters and provides a comprehensive worked video example demonstrating how to configure and implement these filters against a sample data collection. The filter actions are some of the most elementary and integral operations in the Flow computing framework. Mastering the different types of filters is key to data processing, data analytics, and business intelligence workflow design. view post →
Flow Crash Course - Part 3 - Working Data Actions This is the third blog post in our crash course series on Flow. In this blog post, I provide an introduction to Working Data Point Actions in the Flow Computing Framework. view post →
Doing Data Quality with Flow Analytics In this article, I provide an introduction to measuring and evaluating data quality using Flow. I briefly discuss data quality dimensions and data quality assessment. Then I examine how a schema-on-write approach increases the time and cost required to assess data quality along with a brief discussion of schema-on-read technology. I then introduce Flow's "Generic Data" technology as a solution to the deficiencies of schema-on-write and schema-on-read for data quality. Finally, I provide a hands-on working example of doing data quality in Flow using some sample name and address data. view post →
How to Build Automated HyperCube-Based Business Intelligence Dashboards In this blog post I provide a worked example demonstrating the development of hypercube BI dashboards. I show how to compute hypercubes from raw data and use those hypercubes to create different views of our data. I then demonstrate the power of Flow's data visualization and pivot engine by creating components which allow for multiple levels of filtering and interactivity. I finish the example by designing an interactive dashboard with the results and showing how to distribute the finished reports across an organization. Finally, I cover how to deploy the developed workflow to Flow's agent framework to continuously and autonomously execute our reporting tasks on a schedule. view post →
Videos
Video Title Video Description  
Flow Tutorials - Contains Non Alphanumeric FunctionThis video provides a worked example demonstrating how to configure and implement the ContainsNonAlphanumeric operator in the expression builder. The ContainsNonAlphanumeric is used to evaluate if target string values contain non-alphanumeric characters.view video →
Flow Tutorials - Remove SymbolsThis video provides a worked example demonstrating how to configure and implement the Remove Symbols operator in the expression builder.view video →
Flow Tutorials - Remove Alphabetic FunctionThis video provides a worked example demonstrating how to configure and implement the Remove Alphabetic operator in the expression builder.view video →
Flow Tutorials - Contains Alphabetic FunctionThis video provides a worked example demonstrating how to configure and implement the Contains Alphabetic operator in the expression builder.view video →
Flow Tutorials - Contains Numeric FunctionThis video provides a worked example demonstrating how to configure and implement the Contains Numeric operator in the expression builder.view video →
Flow Tutorials - First N Characters FunctionThis video provides a worked example demonstrating how to configure and implement the First N Characters operator in the expression builder.view video →
Flow Tutorials - Remove Numeric FunctionThis video provides a worked example demonstrating how to configure and implement the Remove Numeric operator in the expression builder.view video →
Flow Tutorials - Validate Email FunctionThis video provides a worked example demonstrating how to configure and implement the Validate Email operator in the expression builder.view video →
Flow Tutorials - Trim FunctionThis video provides a worked example demonstrating how to configure and implement the Trim operator in the expression builder.view video →
Flow Tutorials - Default Blanks FunctionThis video provides a worked example demonstrating how to configure and implement the Default Blanks operator in the expression builder.view video →
Flow Tutorials - EndsWith FunctionThis video provides a worked example demonstrating how to configure and implement the EndsWith operator in the expression builder.view video →
Flow Tutorials - StartsWith FunctionThis video provides a worked example demonstrating how to configure and implement the StartsWith operator in the expression builder.view video →
Flow Tutorials - Substring to End FunctionThis video provides a worked example demonstrating how to configure and implement the SubstringToEnd operator in the expression builder.view video →
Flow Tutorials - Last Index Of FunctionThis video provides a worked example demonstrating how to configure and implement the Last Index Of operator in the expression builder.view video →
Flow Tutorials - Word Count FunctionThis video provides a worked example demonstrating how to configure and implement the Word Count operator in the expression builder.view video →
Flow Tutorials - Remove Non-Numeric FunctionThis video provides a worked example demonstrating how to configure and implement the Remove Non-Numeric operator in the expression builder.view video →
Flow Tutorials - Format Phone FunctionThis video provides a worked example demonstrating how to configure and implement the Format Phone Number operator in the expression builder.view video →
Flow Tutorials - Replace HTML Tags FunctionThis video provides a worked example demonstrating how to configure and implement the Replace HTML Tags operator in the expression builder.view video →
Flow Tutorials - Contains HTML Tags FunctionThis video provides a worked example demonstrating how to configure and implement the Contains HTML Tags operator in the expression builder.view video →
Flow Tutorials - Remove Non Alpha Numeric FunctionThis video provides a worked example demonstrating how to configure and implement the Remove Non Alpha-numeric operator in the expression builder.view video →
Flow Tutorials - Substring FunctionThis video provides a worked example demonstrating how to configure and implement the Substring operator in the expression builder.view video →
Flow Tutorials - Contains FunctionThis video provides a worked example demonstrating how to configure and implement the Contrains String operator in the expression builder.view video →
Flow Tutorials - Replace StringThis video provides a worked example demonstrating how to configure and implement the Replace String operator in the expression builder.view video →
Flow Tutorials - Index Of FunctionThis video provides a worked example demonstrating how to configure and implement the Index Of operator in the expression builder.view video →
Flow Tutorials - Compute Length of StringThis video provides a worked example demonstrating how to compute the length of a string using the Length operator in the expression builder.view video →
Flow Tutorials - ProperCase FunctionThis video provides a worked example demonstrating how to configure and implement the ProperCase operator in the expression builder.view video →
Flow Tutorials - To Upper FunctionThis video provides a worked example demonstrating how to configure and implement the To Upper operator in the expression builder.view video →
Flow Tutorials - To Lower FunctionThis video provides a worked example demonstrating how to configure and implement the To Lower operator in the expression builder.view video →
Flow Tutorials - Concat With Space SeparatorThis video provides a worked example demonstrating how to configure and implement the Concat With Space Seperator operator in the expression builder.view video →
Flow Tutorials - Data Point AdditionThis video provides a worked example demonstrating how to perform addition operations against two data points.view video →
Flow Tutorials - Data Point MultiplicationThis video provides a worked example demonstrating how to perform multiplication operations against two data points.view video →
Flow Tutorials - Year Difference Date ArithmeticThis video provides a worked example demonstrating how to take the difference in years between two datetime data points by using the Year Difference Date Arithmetic operator.view video →
Flow Tutorials - Month Difference Date ArithmeticThis video provides a worked example demonstrating how to take the difference in months between two datetime data points by using the Month Difference Date Arithmetic operator.view video →
Flow Tutorials - Week Difference Date ArithmeticThis video provides a worked example demonstrating how to take the difference in weeks between two datetime data points by using the Week Difference Date Arithmetic operator.view video →
Flow Tutorials - Day Difference Date ArithmeticThis video provides a worked example demonstrating how to take the difference in days between two datetime data points by using the Day Difference Date Arithmetic operator.view video →
Flow Tutorials - Extract Day of Week StringThis video provides a worked example demonstrating how to configure and implement the Extract Day of Week String operator in the expression builder.view video →
Flow Tutorials - Extract Month StringThis video provides a worked example demonstrating how to configure and implement the Extract Month String operator in the expression builder.view video →
Flow Tutorials - Extract QuarterYear String FunctionThis video provides a worked example demonstrating how to configure and implement the Extract QuarterYear String operator in the expression builder.view video →
Flow Tutorials - Extract Quarter String FunctionThis video provides a worked example demonstrating how to configure and implement the Extract Quarter String operator in the expression builder.view video →
Flow Tutorials - Add/Subtract HoursThis video provides a worked example demonstrating how to add and subtract hours from a target datetime data point using the Add Hours operator in the expression builder.view video →
Flow Tutorials - Add/Subtract DaysThis video provides a worked example demonstrating how to add and subtract days from a target datetime data point using the Add Days operator in the expression builder.view video →
Flow Tutorials - Extract Days From NowThis video provides a worked example demonstrating how to compute the distance in days between a target datetime and the current date using the Days From Now operator in the expression builder.view video →
Flow Tutorials - Extract Hours From Now FunctionThis video provides a worked example demonstrating how to compute the distance in hours between a target datetime and the current date using the Hours From Now operator in the expression builder.view video →
Flow Tutorials - Minutes From Now FunctionThis video provides a worked example demonstrating how to compute the distance in minuets between a target datetime and the current date using the Minutes From Now operator in the expression builder.view video →
Flow Tutorials - From Unix Time Stamp FunctionThis video provides a worked example demonstrating how to configure and implement the From Unix Timestamp operator in the expression builder.view video →
Flow Tutorials - To Unix TimeStamp FunctionThis video provides a worked example demonstrating how to configure and implement the To Unix Timestamp operator in the expression builder.view video →
Flow Tutorials - Extract Numeric YearMonthThis video provides a worked example demonstrating how to configure and implement the Extract Numeric YearMonth operator in the expression builder.view video →
Flow Tutorials - Extract Numeric YearQuarterThis video provides a worked example demonstrating how to configure and implement the Extract Numeric YearQuarter operator in the expression builder.view video →
Flow Tutorials - Extract Numeric WeekdayThis video provides a worked example demonstrating how to configure and implement the Extract Numeric Weekday operator in the expression builder.view video →
Flow Tutorials - Extract Numeric WeekThis video provides a worked example demonstrating how to configure and implement the Extract Numeric Week operator in the expression builder.view video →
Flow Tutorials - Extract Numeric Day of MonthThis video provides a worked example demonstrating how to configure and implement the Extract Numeric Day of Month operator in the expression builder.view video →
Flow Tutorials - Extract Day of YearThis video provides a worked example demonstrating how to configure and implement the Extract Day of Year operator in the expression builder.view video →
Flow Tutorials - Extract Numeric MonthThis video provides a worked example demonstrating how to configure and implement the Extract Numeric Month operator in the expression builder.view video →
Flow Tutorials - Converting Data Types and Generic ExpressionsThis video provides a guided introduction to data types in Flow. This video looks at how to to interconvert data types in a target generic data collection by using the Change Data Type function.view video →
Flow Tutorials - Add Data Point ActionThis video provides a worked example demonstrating how to configure and implement the Add Data Point action. The Add Data Point action allows you to add a data point with a default initial value to a target dataset.view video →
Flow Tutorials - Remove Data Points ActionThis video provides a worked example demonstrating how to configure and implement the Remove Data Points action in Flow.view video →
Flow Tutorials - Slice FunctionThis video provides a worked example demonstrating how to configure and implement the Slice function in Flow.view video →
Flow Tutorials - Compound Filter ExpressionThis video provides a worked example demonstrating how to configure a compound filter expression. This video teaches how to use the Filter Expression builder to add a workflow action which performs a complex filter on a set of joint conditional evaluations.view video →
Flow Tutorials - Extract Numeric QuarterThis video provides a worked example demonstrating how to configure and implement the Extract Numeric Quarter operator in the expression builder.view video →
Flow Tutorials - Extract Year FunctionThis video provides a worked example demonstrating how to configure and implement the Extract Year operator in the expression builder.view video →
Flow Tutorials - Sum and DivisionThis video provides a worked example demonstrating how to configure and implement a compound expression using the Sum and Division operators.view video →
Flow Tutorials - Simple Filter ExpressionThis video demonstrates how to build a simple Filter expression in Flow. This video looks at how to use the Filter expression builder and narrow down elements that match or do not match a simple filter criteria.view video →
Flow Tutorials - DeduplicationThis video provides a worked example demonstrating how to implement and configure the deduplicate data operation in Flow. The deduplication operation is used to identify and remove duplicate values from a dataset based on a set of key datapoints.view video →
Flow Tutorials - Denormalization and HyperCubesThis video demonstrates a typical analysis process in Flow. This video gives a worked example showing how to join and denormalize data coming from various flat files. Datasets are flattened and transformed using the denormalization action and then converted into hypercubes for multidimensional analysis.view video →
Flow Tutorials - Index FunctionThis video provides a worked example demonstrating how to configure and implement the Index function in Flow.view video →
Flow Tutorials - Import Database TablesThis video provides a worked example demonstrating how to import relational database tables into Flow. This video shows how to use the Database Integration Interface to target tables from a database and load them into Flow's generic data for analysis.view video →
Flow Tutorials - Sample DataThis video provides a worked example demonstrating how to configure and implement the Sample Data action to take a random sample of a target dataset.view video →
Flow Tutorials - Filter Top NThis video provides a worked example demonstrating how to configure and implement the Filter Top N action to identify the Top/Bottom N values in a target dataset.view video →
Flow Tutorials - Describe Data FunctionThis video provides a worked example demonstrating how to configure and implement the Describe Data action to generate a summary statistics profile of a target dataset.view video →
Flow Tutorials - Denormalization ExampleThis video provides a worked example of denormalization in Flow. This video demonstrates importing data from various flat files and joining the datasets together into a single dataset for analysis.view video →
Flow Tutorials - Word Count ExampleThis video demonstrates how to perform an analysis of language on unstructured text data. This video shows how to use the Language Summary analysis function to evaluate a word count across all text values in a target string data point.view video →
Flow Tutorials - Analysis of BlanksThis video demonstrates how to perform an analysis of blank or missing values in a dataset.view video →
Flow Tutorials - Import and Analyze MS AccessThis video provides a worked example demonstrating how to import and analyze a MS Access database. This video shows the computational power of Flow by flattening the target database and performing a multidimensional hypercube analysis against the data.view video →
Flow Tutorials - Import and Analyze JSONThis video demonstrates Flow's unique ability to consume any JSON source without requiring knowledge of schema or structure. JSON data from a target public api endpoint is absorbed by Flow.view video →
Sample Workflows
Articles and Links
Get Started With Flow Analytics Workflows

Completely free - no trial period or credit card required.

Fully functional - access all the core features of the Flow platform.

Develop custom workflows - use our advanced, configure-not-code development environment to build custom solutions.

Free help and support - full documentation and how-to videos along with free online, telephone or email support.