2018-04-25

A Quick Introduction to the Five Types of Filters in Flow

In this blog post, I will provide a worked example demonstrating the five different types of filter operations in Flow.

The filters we will explore in this blog post are as follows:

  1. Conditional Filter
  2. Index-Based Row Filter
  3. Top-N Filter
  4. Pop Chunk
  5. Sample Data

Each of these filters selects a specific subset of rows from a target data collection and stores it in a designated result data collection. Each type has its own applications, and together they make up some of the elementary operations that are integral to powerful workflow design.

The filter operations are a subset of the Working Data actions in Flow. The Working Data actions are the set-based actions in the framework; that is, they take datasets as input and produce datasets as a result.

This blog post will focus on the simple (non-dynamic, non-variable-based) implementations of these different filter operations. In a later blog post, I will provide a detailed explanation of the more advanced configurations of these actions, such as filtering based on variables or embedding filters in loops.

In the section below, I provide a brief overview of each of the different types of filters. I then include a comprehensive video walkthrough which shows examples of how to configure and implement each of these different filters.

Conditional Filter

The Conditional Filter (simply called Filter) is used to select the records from a target input data collection that match a sequence of conditional rules.

For example, suppose we had the following dataset with six data points:

Figure 1 - Sample Dataset Before Applying Conditional Filter

Row ID  Date of Sale  State  Sale Amount
a       1/1/2017      CA     40000
b       1/3/2017      MD     13000
c       1/5/2017      DE     31000
d       1/9/2017      CA     12000
e       1/12/2017     FL     54000
f       1/14/2017     CA     53000

We may want to explicitly select the subset of rows where State = CA and Sale Amount > 20000.

The Conditional Filter could be used to create a sequence of rules such as:

  State = CA
      AND
  Sale Amount > 20000

Applying this filter to the example dataset above would produce the following result:

Figure 2 - Matched Items Data Collection After Applying Conditional Filter

Row ID  Date of Sale  State  Sale Amount
a       1/1/2017      CA     40000
f       1/14/2017     CA     53000

The Conditional Filter also provides the option to store the records that do not match the filter criteria in their own data collection.

For the filter expression above, the unmatched items collection would contain the following result:

Figure 3 - Unmatched Items Data Collection After Applying Conditional Filter

Row ID  Date of Sale  State  Sale Amount
b       1/3/2017      MD     13000
c       1/5/2017      DE     31000
d       1/9/2017      CA     12000
e       1/12/2017     FL     54000

As you can see, the unmatched items are the rows in our target dataset that failed at least one of our filter conditions: either the State was not CA, or the State was CA but the Sale Amount was not greater than 20,000.
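The matched/unmatched behavior above can be sketched in Python. This is a minimal illustration under stated assumptions, not Flow's implementation: rows are modeled as a hypothetical `Sale` named tuple, and `conditional_filter` is a hypothetical helper that routes each row to the matched or the unmatched list depending on whether it satisfies every rule.

```python
from typing import NamedTuple


class Sale(NamedTuple):
    row_id: str
    date_of_sale: str
    state: str
    sale_amount: int


# Sample dataset from Figure 1.
SALES = [
    Sale("a", "1/1/2017", "CA", 40000),
    Sale("b", "1/3/2017", "MD", 13000),
    Sale("c", "1/5/2017", "DE", 31000),
    Sale("d", "1/9/2017", "CA", 12000),
    Sale("e", "1/12/2017", "FL", 54000),
    Sale("f", "1/14/2017", "CA", 53000),
]


def conditional_filter(rows, predicate):
    """Split rows into (matched, unmatched), preserving their original order."""
    matched, unmatched = [], []
    for row in rows:
        (matched if predicate(row) else unmatched).append(row)
    return matched, unmatched


# Rule sequence: State = CA AND Sale Amount > 20000
matched, unmatched = conditional_filter(
    SALES, lambda row: row.state == "CA" and row.sale_amount > 20000
)
print([row.row_id for row in matched])    # ['a', 'f']
print([row.row_id for row in unmatched])  # ['b', 'c', 'd', 'e']
```

Every row lands in exactly one of the two lists, which is why the matched and unmatched collections together always reproduce the full input.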

Index-based Row Filter

The Index-based Row Filter is used to select the records in a target data collection that fall between a given start index and end index.

For example, we may want to select the 20th to 80th rows from a data collection and store just the rows between those given indices into a new data collection.

This type of filter has many practical applications, because it selects a subset of rows based purely on their position in a data collection rather than on their values.

If we had the sample dataset below:

Figure 4 - Sample Dataset Before Applying Index-based Row Filter

Row ID  Transaction Date  Customer ID  Customer Name  Sale Amount
a       12/15/2017        1            John Smith     $4,130
b       12/17/2017        2            Dan Kindig     $1,130
c       11/13/2017        3            Bob LeFlore    $2,277
d       9/3/2017          4            Kevin James    $12,394
e       12/15/2017        5            Debby Finn     $932
f       12/30/2017        6            Lynn Johnson   $1,981
g       3/9/2017          7            Paul Steel     $3,112
h       9/23/2017         8            Robert Stick   $9,721

Suppose we want to select the elements from index 3 to index 7. Applying the Index-based Row Filter to this dataset with a start index of 3 and an end index of 7 would produce the following result:

Figure 5 - Result Dataset After Applying Index-based Row Filter (Selecting Indices 3 to 7)

Row ID  Transaction Date  Customer ID  Customer Name  Sale Amount
c       11/13/2017        3            Bob LeFlore    $2,277
d       9/3/2017          4            Kevin James    $12,394
e       12/15/2017        5            Debby Finn     $932
f       12/30/2017        6            Lynn Johnson   $1,981
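A minimal Python sketch of index-based selection, assuming rows are counted from 1, the start index is inclusive, and the end index is exclusive. That is the convention that reproduces Figure 5 (start 3, end 7 returns rows c to f); Flow's own indexing convention may differ, so treat `index_row_filter` as a hypothetical helper rather than the product's behavior.

```python
# Rows from Figure 4 as (row_id, customer_name); other columns omitted for brevity.
ROWS = [
    ("a", "John Smith"),
    ("b", "Dan Kindig"),
    ("c", "Bob LeFlore"),
    ("d", "Kevin James"),
    ("e", "Debby Finn"),
    ("f", "Lynn Johnson"),
    ("g", "Paul Steel"),
    ("h", "Robert Stick"),
]


def index_row_filter(rows, start, end):
    """Return the rows at 1-based positions start .. end - 1.

    Assumption: start is inclusive and end is exclusive, which matches
    Figure 5. Flow's actual convention may differ.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Python slices are 0-based with an exclusive end, so shift both bounds by one.
    return rows[start - 1 : end - 1]


selected = index_row_filter(ROWS, 3, 7)
print([row_id for row_id, _ in selected])  # ['c', 'd', 'e', 'f']
```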

Top-N Filter

The Top-N Filter is used to select the top or bottom N records, or the top or bottom N percent of records, from a target data collection, ranked by a chosen field.

Typical uses include building a top-sales report or removing outliers from a target dataset. For example, we may want to select our top 20 customers based on the number of purchases they make.

We could also use the percentage-based mode of the Top-N Filter to select the top 20% of customers ranked by a quantitative field of interest.
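The Top-N and Top-N % variants can be sketched with Python's `heapq.nlargest` and `heapq.nsmallest`, using the sale amounts from Figure 4. The helpers `top_n` and `top_percent` are hypothetical, and rounding a fractional row count up to the next whole row is an assumption, not documented Flow behavior.

```python
import heapq
import math

# Sale amounts from Figure 4, keyed by row ID.
SALES = {
    "a": 4130, "b": 1130, "c": 2277, "d": 12394,
    "e": 932, "f": 1981, "g": 3112, "h": 9721,
}


def top_n(items, n, key, bottom=False):
    """Return the n items with the largest (or, if bottom, smallest) key values."""
    pick = heapq.nsmallest if bottom else heapq.nlargest
    return pick(n, items, key=key)


def top_percent(items, percent, key, bottom=False):
    """Return the top (or bottom) `percent` % of items.

    Assumption: a fractional row count is rounded up, so 20% of 8 rows is 2 rows.
    """
    items = list(items)
    n = math.ceil(len(items) * percent / 100)
    return top_n(items, n, key, bottom)


top3 = top_n(SALES, 3, key=SALES.get)
bottom2 = top_n(SALES, 2, key=SALES.get, bottom=True)
top20pct = top_percent(SALES, 20, key=SALES.get)
print(top3)      # ['d', 'h', 'a']
print(bottom2)   # ['e', 'b']
print(top20pct)  # ['d', 'h']
```

A heap-based selection avoids sorting the whole collection when N is small relative to the number of rows; for small collections a plain `sorted(...)[:n]` would give the same result.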

Random Sample Filter

The Random Sample Filter is used to take a random sample of a fixed size and store that sample as a new data collection. Sampling data comes up often in statistics and data analysis. The Random Sample Filter provides the option either to allow the same element to be selected multiple times (allow repeats) or to ensure that each element is selected at most once (no repeats).
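The two modes correspond to sampling with replacement (allow repeats) and sampling without replacement (no repeats). A minimal sketch using Python's `random.Random.choices` and `random.Random.sample`; the `random_sample` helper and its `allow_repeats` parameter are hypothetical names, and the optional seed only makes a run reproducible.

```python
import random

# Row IDs from the Figure 4 sample dataset.
ROW_IDS = ["a", "b", "c", "d", "e", "f", "g", "h"]


def random_sample(rows, size, allow_repeats=False, seed=None):
    """Draw a random sample of `size` rows.

    allow_repeats=False samples without replacement (each row at most once);
    allow_repeats=True samples with replacement (a row may appear several times).
    """
    rng = random.Random(seed)
    if allow_repeats:
        return rng.choices(rows, k=size)
    if size > len(rows):
        raise ValueError("a sample without repeats cannot exceed the collection size")
    return rng.sample(rows, size)


no_repeats = random_sample(ROW_IDS, 5, seed=42)
with_repeats = random_sample(ROW_IDS, 12, allow_repeats=True, seed=42)
print(no_repeats)
print(with_repeats)
```

Only the with-repeats mode can return a sample larger than the source collection; a 12-row sample drawn from 8 rows must contain at least one repeated row.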

Pop Chunk

Pop Chunk is a unique type of filter that removes the first N records from the top of the source collection and appends them to the bottom of a target result collection.

If the designated result collection does not yet exist in the working data container, it is created when the operation executes, and the popped records are then appended to it.

For example, suppose we have the sample dataset below:

Figure 6 - Sample Source Dataset Before Applying Pop Chunk With N = 2

Row ID  Transaction Date  Customer ID  Customer Name  Sale Amount
a       12/15/2017        1            John Smith     $4,130
b       12/17/2017        2            Dan Kindig     $1,130
c       11/13/2017        3            Bob LeFlore    $2,277
d       9/3/2017          4            Kevin James    $12,394
e       12/15/2017        5            Debby Finn     $932
f       12/30/2017        6            Lynn Johnson   $1,981
g       3/9/2017          7            Paul Steel     $3,112
h       9/23/2017         8            Robert Stick   $9,721

Suppose we apply the Pop Chunk function with N = 2; that is, we pop two elements off the top of the source collection and append them to the bottom of the result collection.

Applying the Pop Chunk function to our sample source data collection with N = 2 would produce the following result:

Figure 7 - Sample Source Data Collection After Applying 1 Pop Chunk With N = 2

Row ID  Transaction Date  Customer ID  Customer Name  Sale Amount
c       11/13/2017        3            Bob LeFlore    $2,277
d       9/3/2017          4            Kevin James    $12,394
e       12/15/2017        5            Debby Finn     $932
f       12/30/2017        6            Lynn Johnson   $1,981
g       3/9/2017          7            Paul Steel     $3,112
h       9/23/2017         8            Robert Stick   $9,721

Figure 8 - Result Data Collection After Applying 1 Pop Chunk With N = 2

Row ID  Transaction Date  Customer ID  Customer Name  Sale Amount
a       12/15/2017        1            John Smith     $4,130
b       12/17/2017        2            Dan Kindig     $1,130

If we apply the Pop Chunk function a second time, we pop the next two elements off the source collection and append them to the bottom of the result collection.

The result would look like this:

Figure 9 - Sample Source Data Collection After Applying 2 Pop Chunks With N = 2

Row ID  Transaction Date  Customer ID  Customer Name  Sale Amount
e       12/15/2017        5            Debby Finn     $932
f       12/30/2017        6            Lynn Johnson   $1,981
g       3/9/2017          7            Paul Steel     $3,112
h       9/23/2017         8            Robert Stick   $9,721

Figure 10 - Result Data Collection After Applying 2 Pop Chunks With N = 2

Row ID  Transaction Date  Customer ID  Customer Name  Sale Amount
a       12/15/2017        1            John Smith     $4,130
b       12/17/2017        2            Dan Kindig     $1,130
c       11/13/2017        3            Bob LeFlore    $2,277
d       9/3/2017          4            Kevin James    $12,394

If we apply the Pop Chunk function a third time, the pattern continues: we again remove the first N (in this case 2) elements from the source collection and append them to the designated result collection.

The result would look like this:

Figure 11 - Sample Source Data Collection After Applying 3 Pop Chunks With N = 2

Row ID  Transaction Date  Customer ID  Customer Name  Sale Amount
g       3/9/2017          7            Paul Steel     $3,112
h       9/23/2017         8            Robert Stick   $9,721

Figure 12 - Result Data Collection After Applying 3 Pop Chunks With N = 2

Row ID  Transaction Date  Customer ID  Customer Name  Sale Amount
a       12/15/2017        1            John Smith     $4,130
b       12/17/2017        2            Dan Kindig     $1,130
c       11/13/2017        3            Bob LeFlore    $2,277
d       9/3/2017          4            Kevin James    $12,394
e       12/15/2017        5            Debby Finn     $932
f       12/30/2017        6            Lynn Johnson   $1,981

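The Pop Chunk function makes it possible to process a data collection in fixed-size batches by letting a generic data collection act like a queue: records leave the top of the source collection in their original order and arrive at the bottom of the result collection in that same order. You can keep applying the Pop Chunk function until the source collection is empty; with the eight-row sample above and N = 2, the fourth application empties it.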
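The Pop Chunk pattern can be sketched in Python by modeling each collection as a `collections.deque`: `popleft()` removes a record from the top, and `append()` adds one to the bottom. The `pop_chunk` helper and the dict standing in for the working data container are hypothetical; `setdefault` mirrors the behavior described earlier of creating the result collection the first time records are popped into it. Applied repeatedly to the Figure 6 rows with N = 2, the first three recorded snapshots match Figures 7 to 12.

```python
from collections import deque

# Hypothetical working data container: collection name -> rows (row IDs from Figure 6).
working_data = {"source": deque(["a", "b", "c", "d", "e", "f", "g", "h"])}


def pop_chunk(container, source, result, n):
    """Move up to n rows from the top of `source` to the bottom of `result`.

    The result collection is created if it does not exist yet.
    """
    src = container[source]
    dst = container.setdefault(result, deque())
    for _ in range(min(n, len(src))):
        dst.append(src.popleft())


# Apply Pop Chunk with N = 2 until the source collection is empty,
# recording (source, result) after each application.
snapshots = []
while working_data["source"]:
    pop_chunk(working_data, "source", "result", 2)
    snapshots.append((list(working_data["source"]), list(working_data["result"])))

for i, (src, dst) in enumerate(snapshots, start=1):
    print(f"after {i}: source={src} result={dst}")
```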

In the video below, I provide a comprehensive walk-through which explores the implementations of these different filters. I demonstrate how to configure and apply each of the filter workflow actions described above and explore the result data collections produced by their execution.

Check out the video here:
