How To Remove Duplicate Values in Tableau Prep
One feature of tableau prep is the ability to help with data cleansing. Data cleansing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a dataset, table, or database and refers to identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data. There are times when you want your dataset to only have unique values. In this example, we are going to use Tableau Prep to create a dataset that only has one record per customer. We would typically use this as a dimension/lookup table in our data model.
Option 1: Aggregate
We can use the built-in aggregate functionality to remove duplicates. By default, Tableau prep will remove duplicate values when you use group by
We have connected to the superstore dataset and removed the unnecessary columns. We now have a dataset that contains Customer ID and Customer Name. In this example, you can see that there are several customers with multiple rows in the dataset.
Next we will add an Aggregate step to the workflow and add Customer ID and Customer Name to the Grouped Field section. We will now see that we have one record per customer.
To test this out I’ve filtered Claire Gute and we can see only one record for this customer.
Option 2: Create a unique rank and filter out results
In this example, we will walk through removing records based on the latest order date. In our dataset, we have the order date, customer id, and customer name.
Next, we are going to create a calculated field and create a ID using the partition, order by, and Row Number functionality. We partitioned by Customer ID because we want our counts to reset after each new customer id. We ordered by order date DESC because want the id to be based on the latest date (if we wanted this to be based on the earliest date then we would use ASC)
We now have a unique ID for each record in sequential order for each customer.
Next we will filter our calculated field to only keep 1.
Now we will remove Rank and Order from our dataset and we will have a finished dataset with only unique values.
Tableau Prep can be a powerful tool that can save you a lot of time in preparing your data to visualize. I love helping clients understand their data at a new level through the art and science of data visualization. To learn more about how I and Lovelytics help clients do more with their data, please visit us at www.lovelytics.com or connect with us by email at [email protected].