When working with natural language data, a typical first step is to identify the tokens (words) present in the data and examine how frequently each one occurs.
Flow provides a dedicated action for this type of analysis called Language Summary. The Language Summary function takes a text data point and returns a new profile dataset containing each word in the text and the number of times that word occurred.
This function offers a quick, one-step way to perform an initial exploratory analysis of text data.
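Conceptually, a summary like this comes down to splitting the text into tokens and counting each one. The Python sketch below is not Flow's implementation. It assumes tokens are case-insensitive runs of letters, digits, or apostrophes, and the function name `language_summary` is hypothetical. Flow's own tokenization rules may differ.

```python
import re
from collections import Counter


def language_summary(text: str) -> Counter:
    """Count how often each token occurs in a single piece of text.

    Assumption: tokens are runs of letters, digits, or apostrophes,
    compared case-insensitively. Flow's own tokenizer may differ.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


summary = language_summary(
    "Business intelligence turns data into insight. Data drives business."
)
# Print the three most frequent tokens; ties keep first-seen order.
for word, count in summary.most_common(3):
    print(word, count)
# business 2
# data 2
# intelligence 1
```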
In this blog post, I will provide a worked example demonstrating how to configure and implement the Language Summary function.
The scenario focuses on a sample set of natural language data loaded from the Flow Cloud.
Each record in our sample dataset contains the raw text of an article about business intelligence. We will apply the Language Summary function to this raw text. The result will be a new dataset that holds each unique word from the text and the number of times it occurred.
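To picture the kind of output this scenario produces, the sketch below builds a word profile across several records. It assumes counts are totalled across all articles and that each record stores its text in a field named `text`. The sample records, the field name, and the `build_word_profile` helper are all hypothetical, and Flow's actual output format may differ.

```python
import re
from collections import Counter

# Hypothetical sample records standing in for articles loaded from the Flow Cloud.
articles = [
    {"id": 1, "text": "Business intelligence helps teams make decisions."},
    {"id": 2, "text": "Dashboards make business data easy to explore."},
]


def build_word_profile(records, text_field="text"):
    """Return one row per unique word with its total count across all records,
    sorted from most to least frequent (ties broken alphabetically)."""
    counts = Counter()
    for record in records:
        # Same assumed tokenization: lowercase runs of letters, digits, apostrophes.
        counts.update(re.findall(r"[a-z0-9']+", record[text_field].lower()))
    rows = [{"word": w, "count": c} for w, c in counts.items()]
    rows.sort(key=lambda row: (-row["count"], row["word"]))
    return rows


profile = build_word_profile(articles)
for row in profile[:3]:
    print(row["word"], row["count"])
# business 2
# make 2
# dashboards 1
```

Sorting by descending count puts the most frequent words first, which is usually the first thing to inspect when exploring token frequencies.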
If you do not have a Flow account, register here. Flow is free for personal use.
To see a step-by-step walkthrough of this type of natural language analysis, check out the video below: