How to… connect to a Database
Using data files will be a significant part of most users data preparation but connecting to databases should not be ignored. Most organisations have built up significant data assets and the majority of this data will be held within databases.
What is a database?
Databases, data warehouses and data lakes will all be common terms to most people working close to those using data in their jobs everyday but not everyone will be familiar about the differences.
Database - a piece of software that resides on a computer (often a server) that specialises in ingesting, storing and providing data to other tools. The database is likely to be split up in to different objects namely tables and views, therefore, the data needs to be well structured.
Data Warehouse - a collection of databases, or particularly large databases, are commonly called a data warehouse. By storing multiple databases together, it becomes beneficial as resources like memory can be shared between the servers.
Data Lake - a newer concept that is being more commonly found. A Data Lake allows for more flexible storage of all data types and files. Data is often held in Data Lakes where the data is yet to be processed to provide the structure required for storage in a database.
Databases will be used as catch-all term for all of the above storage forms unless otherwise specified as Tableau Prep largely connects to the database-like software that is built into even Data Lakes.
Databases hold the majority of the data in Tables. The main type of storage is referred to as a Relational Database where data is held in tables but to reduce the storage space, most data will be held as numeric IDs that are looked up in Reference tables. Piecing the data together that you require add another set of potential challenges to working with data.
Connecting all of the tables together each time you want to use those data sets would be laborious and prone to mistakes. Views are created and are connected to in very similar ways that tables are. Only in the database is a View different to Table. The Table holds the data, whereas the View is the logic of how tables connect together to form the view.
How to connect to a database within Prep?
Connecting to a database is a different experience from connecting to a file.
The set-up of the database connection is relatively similar between the different database connections available within Prep. Let's walkthrough one of the most typical databases in the world - Microsoft SQL Server.
Enter the IP address or URL for your server
Pick the type of authentication that the set-up requires
The username and password are the same credentials set-up for each user on the database
The credentials the database uses means that any access set-up for the analyst who normally uses the database, will be available for them within Tableau. The same is true also for tables and views that the user doesn't have access to, they won't have access in Tableau.
After clicking 'Sign in', the user is taken through to the normal working screen within Prep but with a different left hand connection pane than when working with Files.
In a weird quirk of terminology and logic, the user is presented with a list of databases found at the address entered earlier. After selecting the database required, the user is presented with a list of tables to select where the data resides.
Danger!! What damage can you do?
Currently, very little. Tableau announced at their 2019 Global conference that Prep would gain the functionality to write back to a database. At the time of writing this (version 2020.1) this functionality has not been released.
The only damage that can be done is through making decisions based on incorrectly formed datasets. As Joins are much more common in databases, there is greater potential for poorly formed datasets. Prep has a great answer to this through - the Profile Pane. Seeing the data, having rogue values highlighted through data roles and gaps in the distribution of data values, the Profile Pane helps understand where Joins have formed incorrectly.
The benefits of working with a database far outweigh the downsides. Everyone has to learn how to get the most out of a database but using Prep is a great introduction to this. If you want to push this connection type further, look at the 'How to...' post on Custom SQL.
When should you avoid connecting to a database?
Databases are not the solution for everything that you will do within data preparation. Databases largely require structured, clean data. If you have files of data that contain useful information, it's worthwhile accessing these directly rather than wait to have them loaded into a database first. Obviously, the files can be added to the database if they become a data source for a productionalised report or analysis.
Gaining permissions to a database can become a blocker to quick, agile work. If the data only resides in a database on a table to what you don't have permission for, you have no choice but to wait. However, if you can source the information from source, faster than waiting for your permission to be granted to the official data source, then this might be worthwhile just getting started and switching to the official source once permission is granted.
Databases are heavily used in large organisations and can actually be slower to work with than extracts held on Tableau Server or elsewhere. If you find Prep begins to take a long time to process anything you do within the tool, work with your data team to understand if there are other options to host the data outside of the database.
Either way, once you gain experience connecting to databases and you know your tables inside and out, there is very little to stop you from creating strong analysis. Data held in databases will often require less cleaning than File Type data but setting up joins and applying filtering logic will create powerful resources. The time spent practising and learning how to use databases will reap a lot of rewards.