How to… Split data fields
One of the most common actions you will take when preparing data for analysis will be to split a column in to it's sub-parts. This happens as data is picked up from operational systems, outputted as a unique ID for that record of data, or, squeezed together to fit in to a poorly designed database table. The human brain is fantastic at spotting patterns in data (that's why we create visual analytics) so you will often spot the need to split data fields (columns) by just looking at the original data.
Probably needs to become split apart to form the three separate columns that would help with the analysis of this data set.
Basic Splits
Splitting data in most data tools can be very easy to achieve; Tableau Prep is no different. Simply choose the data field you want to split, go and find the '...' icon, select, 'Split Values' and then 'Automatic Split' and Prep will decide what is the best logic to split the field.
In this case, the automatic split has worked as desired:
Prep actually write three calculated fields to perform this task which you can always edit if you want a slightly different result.
You will find these calculations in the Changes pane in Prep. You can also edit them from here, or alternatively learn how Prep completed the task you set it:
This formula is splitting the 'Product Code' field on the '-' separator and pulling back the third part (ie what comes after the second '-'. The resulting values are then 'trimmed' which means any leading or trailing spaces are removed as these can play havoc when matching text values.
When that doesn't work as intended - Advanced Splits
Let's tweak the data a little bit to give a pattern of delimiters (the character we split the data field by) but hide that a little more.
The Product Type (first part of the Product Code) has had the word 'Soap' removed and spaces added either side of the hyphen. The results from Prep showed this more irregular pattern doesn't give the results we are looking for when using an Automatic Split.
The Product Type has disappeared from the data resulting from the split completely. Notice all those blank records? The blanks are the result of the word 'Soap' being removed from the bars as there are no blanks resulting from the 'Liquid Soap' rows. Let's look at the calculation Prep is writing:
Well that's not clear, is it?
In these situations, this is where your human eye can take over and ignore the Prep automatic and set your own 'Custom' split. By setting up the Custom Split to work on the hyphen then the results are back to what we'd want from this data set.
In the Custom Split, not only do you set the delimiter, but you also set what you want returned from the split. Here, let's select 'All' to ensure all the data is returned.
Custom Split can be a very powerful way to save you from having to write a number of complex calculations. Here the third calculation Prep has built is quite complicated but you could simplify if you choose to but no need unless your flow is struggling for performance.
Where to not use Split
Although Split is a powerful function, there are times when you might want to check it is the right technique for the task you have been set.
Address Data
Here the address details are separated by columns. However, if you were to split them, due to some 'street' address lines being separated by commas in two parts and some all in one, that section of the address will reside in different columns. Addresses are typically difficult to pick apart and likely take more complex calculations / logic to align the correct parts to the correct columns. If you want to solve this problem have a go at 2019 Week 46 Challenge as split can be part of the solution, but won't be the only part.
No clear delimiter
If there is no clear and logical delimiter to split by, then the split technique is definitely not the one to follow. In this circumstance, more advanced string functions like FIND() or FINDNTH() might be a good approach. Otherwise, regular expressions (also known as regex) functions would be useful. These will covered in a future 'How to...' post.
Ultimately, split is a great first technique to investigate to break-up data columns to aid your analysis. In these situations Prep has a great couple of options but might not always be the solution and you'll need to form the logic yourself.