How to… use String functions

Rarely is the data you want to use clean and ready for analysis; that’s probably why you are reading this! The most complex data to clean is what we know as the string data type. If that name isn’t familiar, I’d recommend you read this post before continuing. There are many calculation functions that help you process your string data and turn it into something more useful for analysis. As with the previous post on date functions, this post will take you through examples of which functions you might want to focus on and what they will allow you to do.

Prep Builder click-only options

Just like the dates, Prep Builder allows you access to a number of common functions without having to build each of the calculations yourself. There are three main areas under the ellipsis menu (the … icon that appears when you hover over a data field on the profile pane):

Clean

Screenshot 2021-05-10 at 20.05.05.png

This section lets you complete lots of easy (and some not so easy) functions to make your string values fit what you need them to be. Here is a brief description of each:

  • Make Uppercase - Makes the alphabetical characters UPPERCASE

  • Make Lowercase - Makes the alphabetical character lowercase

  • Remove Letters - would remove all alphabetical characters from the values in the data field.

  • Remove Numbers - would remove all numerical characters between 0 - 9

  • Remove Punctuation - removes common punctuation marks (but not all symbols)

  • Trim Spaces - removes any space characters from the start or end of a value

  • Remove Extra Spaces - removes any double (or more) space characters from anywhere within the value leaving just one space

  • Removes All Spaces - removes any space from anywhere within a value

Group Values

Screenshot 2021-05-10 at 20.20.06.png

There are four options you have in this section but I wrote a separate post about them in much more detail here. Grouping is very useful to fix inconsistent values through spelling mistakes.

Split Values

Screenshot 2021-05-10 at 20.20.49.png

Some data sets will contain values that are amalgamations of multiple values. To be analyse each in turn you will need to split these values up. I wrote mote about these functions here.

String functions in a calculation

Prep Builder’s options for click-only solution continues to grow but sometimes you will need to go beyond even these options. This will involve creating a calculation and then using functions to instruct Prep Builder to complete a specific task. I will use a version of Superstore Sales (the demo data set that ships with Tableau) so you can follow along with the examples if you wish to.

Before we get to using a function in the calculation editor, there is one basic concept you will likely need when working with string data; joining two string fields together. In all of the Tableau tools, this is achieved by using a + sign. For example, if I wanted to have State and Region in the same field, I can use the calculation [State]+[Region]

Using Desktop, we can see the result of this calculation might not be ideal for the analysis you want to do.

Frequently you will want to add additional spaces or punctuation into the calculation to make the format more presentable and readable. You can use quote marks or speech marks around the punctuation you want to add into the resulting string field. Notice how only the spaces inside the quote marks in the image below make it into the resulting data field.

Let’s go through some of the specific functions that you will commonly call upon to solve more challenging use cases:

Contains

A contains function allow you to see whether a set of characters, in a certain order are contained within each value in the field being assessed by the calculation. These characters in a certain order is often referred to as a substring. The result of the calculation is just true is the substring is found in the value or false if the substring is not found.

Screenshot 2021-05-10 at 20.46.18.png

This result of the calculation is a field with either true or false returned depending is the word bookcase is found anywhere within the Product Name field.

Screenshot 2021-05-10 at 20.48.19.png

If you only want to see if a substring exists at the start of the value you can use the STARTSWITH( ) function. If you only want to assess the end of the value then you can use the ENDSWITH( ) function to see if the value ends with the substring you put in the function.

Find

One of my favourite functions that is frequently a building block in a more complex calculation. The function allows you to see the position (ie which character number in a string) where a substring is or starts. This function is useful where the substring isn’t always in the same place in each value.

For example, here’s how you could locate the space between someone’s first name and second name. This is useful as first names have many different lengths whether you’re assessing Tom, Carl, Jenny or Jonathan.

Screenshot 2021-05-10 at 20.57.31.png

In this example, the resulting value is the numeric value of the position where the space is found:

Screenshot 2021-05-10 at 21.00.52.png

If you didn’t want to just return the value for the first instance of a substring, you could use the FINDNTH( ) function instead. This involves adding an additional element to the function after the substring with a value to return that nth instance of the substring.

Screenshot 2021-05-10 at 21.07.42.png

If the nth instance isn’t found, a zero will be returned instead. Which results in the following values:

Screenshot 2021-05-10 at 21.07.28.png

Left and Right

These two functions are used interchangeably depending on what direction you want to bring a substring back from a longer string field. If you want to bring back the first n characters, you will need to use the left function:

Screenshot 2021-05-10 at 21.16.15.png

This results in the first 3 characters being returned from the data field being assessed:

Screenshot 2021-05-10 at 21.17.45.png

The right function works from the other end of the string and counts in the opposite direction. The Order ID field in the version of Superstore this post uses has a sequential number as the last six digits at the end of it that we can return using the right function.

Screenshot 2021-05-10 at 21.21.46.png

This results in the following values:

Screenshot 2021-05-10 at 21.22.08.png

If you don’t have a consistent number of characters you need to return as the point you return too isn’t consistent then you might want to couple the left or right function with a FIND( ) function.

Screenshot 2021-05-10 at 21.25.10.png

This means the customer’s first name, no matter what the length will be returned:

Screenshot 2021-05-10 at 21.26.14.png

Mid

If you don't want to bring back just the first or last characters of a string but some from the middle of a string then MID( ) will be your new best friend. Let’s use our Order ID field but this time you might want the year of the order that in the middle of the Order ID field. To get the year value back, you can use this calculation:

Screenshot 2021-05-10 at 21.29.51.png

This brings back the substring from the fourth character for the next four characters:

Screenshot 2021-05-10 at 21.31.03.png

Whilst you must specify where you want your MID( ) function to start, you don’t have to specify how many character to return. If you don't add in the third part of the function, Prep Builder will return all the character from the set start point. For more complex string calculations, FIND( ) will again prove to be very useful.

String calculations can take a lot of getting used to especially when you start using them inside each other. There is nothing wrong with building each calculation in turn and using the result of one inside the other. Getting used to working with string data will make your data cleaning and preparation much faster and more effective.

Previous
Previous

How to… create a Running Total

Next
Next

How to… use date functions