Connecting to Data in Tableau
Every Tableau project begins with a question and a data source. How you connect to that data — and how you structure it before analysis — determines the performance, accuracy, and flexibility of everything you build. This chapter covers every connection option Tableau offers, from simple CSV files to cloud databases, and walks through the critical decisions around joins, unions, blending, and the modern Relationships data model.
Types of Data Connections: Live vs Extract
When you connect to a data source, Tableau offers two fundamental connection modes. Choosing the right one affects performance, freshness, and portability.
Live Connection
A Live connection queries your data source in real time. Every time you interact with a view — dragging a filter, switching a dimension, opening a dashboard — Tableau sends a query directly to the underlying database and retrieves fresh results.
How it works: Tableau translates your VizQL actions into SQL (or the native query language of your source) and sends them to the database. The database executes the query and returns results. This happens every time the view updates.
Best scenarios for Live connections:
- Data changes frequently (hourly, daily) and analysts need the most current figures
- The database is powerful (Snowflake, BigQuery, Redshift) and can handle ad hoc queries efficiently
- Security requirements mandate that data never leaves the source system
- The dataset is so large that extracting it is impractical
Extract (.hyper)
An Extract is a snapshot of your data stored in Tableau's proprietary columnar format (.hyper). The extract is saved locally or on Tableau Server and can be queried at very high speed, independent of the source system.
How it works: Tableau connects to the source, pulls the data (either fully or incrementally), and saves it as a compressed, column-oriented file. All subsequent queries run against this local file rather than the original source.
Best scenarios for Extract connections:
- The source system is slow or has query limits (REST APIs, legacy databases)
- Analysts need to work offline or in locations without network access
- You need to share data with others without giving them source access
- Performance is critical for end-user dashboard responsiveness
- The source data is stable enough that a daily or hourly refresh is acceptable
Comparison Table: Live vs Extract
| Attribute | Live Connection | Extract (.hyper) |
|---|---|---|
| Data freshness | Real-time, always current | Snapshot at time of extract |
| Query performance | Depends on source DB | Very fast — optimized columnar store |
| Source dependency | Always requires connection | Can work offline after extract |
| Portability | Not portable | Fully portable (.hyper or .twbx) |
| Data volume | Can query full dataset | Best practice: filter at extract time |
| Security | Data stays in source | Data leaves source — requires controls |
| Refresh | No refresh needed | Must schedule or manually refresh |
| Source load | Queries hit source system | No source load after initial extract |
| Best for | Real-time ops dashboards | Self-service BI, shared workbooks |
Switching Between Live and Extract
On the Data Source page, look for the Connection radio buttons in the top-right area. You can switch from Live to Extract at any time by clicking Extract and then clicking Edit to configure what data to include. When you save an extract, Tableau creates a .hyper file. Packaged workbooks (.twbx) bundle the extract inside the workbook file for easy sharing.
Supported Data Source Types
Tableau connects to over 100 native data sources. Here is a structured overview:
File-Based Sources
| File Type | Extension | Notes |
|---|---|---|
| Microsoft Excel | .xls, .xlsx | Connects to individual sheets or named ranges |
| Comma-Separated Values | .csv | Text-based, widely compatible |
| JSON | .json | Hierarchical data — Tableau flattens arrays |
| Extracts tabular data from PDF tables | ||
| Text file | .txt, .tsv | Configurable delimiter |
| Statistical files | .sas7bdat, .sav, .rda | SAS, SPSS, R data files |
| Spatial files | .shp, .kml, .geojson | For geographic/mapping analysis |
| Microsoft Access | .accdb, .mdb | Windows only |
| OData | URL | REST-based data protocol |
Relational Databases
| Database | Notes |
|---|---|
| MySQL | Open-source; widely used in web applications |
| PostgreSQL | Advanced open-source; excellent for analytics |
| Microsoft SQL Server | Enterprise standard; deep Windows ecosystem |
| Oracle Database | Enterprise standard for large organizations |
| Amazon Redshift | AWS-native analytical data warehouse |
| Google BigQuery | GCP-native serverless analytical warehouse |
| Snowflake | Cloud-agnostic data warehouse — excellent Tableau compatibility |
| IBM Db2 | Enterprise database, common in financial services |
| Teradata | High-performance analytical database |
| SAP HANA | In-memory analytics for SAP environments |
Cloud and SaaS Sources
| Source | Use Case |
|---|---|
| Salesforce | CRM data — leads, opportunities, accounts |
| Google Analytics | Web traffic and behavior data |
| Adobe Analytics | Digital marketing analytics |
| Marketo | Marketing automation data |
| ServiceNow | IT service management data |
| HubSpot | Inbound marketing and sales CRM |
| Dropbox / Box | Cloud file storage |
| Google Sheets | Collaborative spreadsheets |
| Microsoft SharePoint | SharePoint lists and Excel files |
ODBC and JDBC
For sources without a native Tableau connector, you can use ODBC (Open Database Connectivity) or JDBC (Java Database Connectivity). This requires installing the appropriate driver and configuring a DSN (Data Source Name). Performance may be lower than native connectors.
Web Data Connector (WDC)
Tableau's Web Data Connector framework allows developers to build custom JavaScript connectors for any web-accessible API. The community has built WDCs for hundreds of services including Twitter, GitHub, weather APIs, and financial data feeds.
Connecting to Excel: Step by Step
Excel is one of the most common starting points for Tableau beginners.
Steps:
- Launch Tableau Desktop or Tableau Public
- On the Start Page, under Connect > To a File, click Microsoft Excel
- In the file browser, navigate to your Excel file and click Open
- The Data Source page opens. In the left panel, you will see all sheets and named ranges in the workbook
- Drag the sheet you want to analyze onto the canvas (the white area in the center)
- A preview of your data appears in the Data Grid at the bottom
- Verify that column names are correct and data types are recognized properly
- If your Excel file has a header row on row 2 or 3 (not row 1), click Data Interpreter — Tableau will attempt to automatically detect the real header row
- Toggle to Metadata Grid (the icon next to the data grid tabs) to see column names, data types, and remote field names. Rename or retype fields as needed
- Click a worksheet tab at the bottom to begin analysis
Common issues with Excel connections:
- Merged cells in headers confuse Tableau — avoid merged cells in source Excel files
- Multiple tables in a single sheet should be separated into separate sheets
- Formulas that depend on other sheets may not evaluate correctly — values-only exports are safer
- Date fields stored as text (e.g., "January 2024") must be cast to date type manually
Connecting to CSV: Step by Step
CSV files are the universal data format — clean, lightweight, and supported everywhere.
Steps:
- On the Start Page, under Connect > To a File, click Text File
- Browse to your
.csvfile and click Open - Tableau auto-detects the delimiter (comma, tab, semicolon, pipe) — verify this is correct in the Data Source page
- The Data Grid shows a preview. Scroll right to verify all columns are captured
- If column headers are missing, Tableau uses F1, F2, F3 as placeholders — rename them in the Metadata Grid
- Check that numeric columns are not typed as strings (look for the
Abcicon vs#icon on column headers) - To change a data type: click the type icon on the column header and select the correct type from the dropdown
- Proceed to worksheet tabs to begin analysis
Tips for CSVs:
- Use UTF-8 encoding to avoid character encoding issues
- Avoid commas within quoted fields if possible — they can cause parsing errors
- Ensure consistent date formats within a date column
The Data Source Page in Detail
The Data Source page is your data preparation workspace. Understanding its components makes everything downstream easier.
Left Panel
The left panel lists all available tables, sheets, or views from your connected source. For Excel, this shows each sheet. For a SQL database, it shows schemas and tables. You can:
- Search for a specific table using the search box
- Expand schemas to browse tables within them
- Drag tables directly onto the canvas
Canvas
The canvas is the central workspace for building your data model. Drag tables here to begin. When you drag a second table, Tableau automatically attempts to create a join or relationship based on matching field names. The canvas displays tables as boxes connected by lines representing joins or relationships.
You can:
- Double-click the join/relationship icon between tables to edit the join type or join conditions
- Drag additional tables to create multi-table models
- Click the union icon to stack tables vertically (union)
Data Grid
The Data Grid at the bottom shows a preview of your combined dataset — up to 10,000 rows by default. Use it to verify:
- Column names look correct
- Data types are appropriate
- Row counts seem reasonable
- Null values are handled properly
Metadata Grid
Toggle to the Metadata Grid (click the grid icon) to see:
- Field Name: What Tableau will call the field in the Data pane
- Remote Field Name: The actual column name in the source
- Data Type: The type Tableau has assigned
- Table: Which table the field comes from
In the Metadata Grid, you can rename fields, hide fields (right-click > Hide), and change data types — all non-destructively.
Joins in Tableau
Joins combine columns from two or more tables horizontally, matching rows based on a common key.
Join Types
| Join Type | Rows Included | SQL Equivalent | Best Use Case |
|---|---|---|---|
| Inner Join | Only rows with matching keys in both tables | INNER JOIN | When you only want records with complete data on both sides |
| Left Join | All rows from left table; matching rows from right | LEFT OUTER JOIN | Keep all records from primary table even if no match |
| Right Join | Matching rows from left; all rows from right | RIGHT OUTER JOIN | Keep all records from secondary table |
| Full Outer Join | All rows from both tables; nulls where no match | FULL OUTER JOIN | When you need every record from both sides regardless of match |
Creating a Join in Tableau
- On the Data Source page, drag the first table onto the canvas
- Drag the second table next to it — Tableau shows a join icon (two overlapping circles) between them
- Click the join icon to open the Join editor
- Verify the Join Type (default is Inner) and the Join Clauses (field pairs used to match rows)
- Add additional join conditions by clicking the + button if needed
- The Data Grid updates to show the joined result
Understanding Join Performance
- Joins happen at query time in Tableau
- For large tables, pre-joining in the database (using a SQL view or custom SQL) is often faster
- Cross-database joins (joining tables from two different sources) are possible but slower than single-source joins
When to Use Each Join Type
Scenario: You have an Orders table and a Returns table. You want to see all orders and flag which ones were returned.
- Answer: Left Join from Orders to Returns — keeps all orders, adds return information where it exists (NULL where not returned)
Scenario: You have a Products table and a Sales table. You want only products that have actually been sold.
- Answer: Inner Join — excludes products with no sales records
Unions in Tableau
A Union stacks tables vertically — appending rows from one table beneath another. Both tables must have the same (or compatible) column structure.
When to Use Unions
- Sales data split across multiple CSV files (one per month, one per region)
- Multiple Excel sheets with the same structure
- Historical data split across tables by year
Creating a Union
- Drag the first table onto the canvas
- Drag the second table directly on top of the first — Tableau shows a Union prompt (vs. a join prompt when you drop it beside)
- Tableau creates a union and shows a "Sheet" column identifying which source each row came from
- You can expand the union to add more tables, or use Wildcard Union to automatically include all sheets matching a pattern
Wildcard Union
Wildcard Union is a powerful feature for when you have many files with the same structure — for example, monthly CSV exports.
- Connect to one of the CSV files
- On the Data Source page, drag the file to the canvas
- Click the union dropdown arrow and select Edit Union
- Switch from Specific to Wildcard (automatic) — Tableau will include all matching files in the same folder
Data Blending vs Joins
Data Blending and Joins both combine data from multiple sources, but they work very differently and suit different scenarios.
Joins
- Happen at the row level before aggregation
- Both tables must be in the same data source (or cross-database join)
- Result is a single combined table
Data Blending
- Happens after aggregation — each source is queried independently
- Used when you cannot join at the row level (different granularities, different sources, different databases)
- One source is the Primary, the other is the Secondary
- The Secondary source contributes aggregated values to the Primary view
Key Differences Table
| Feature | Join | Data Blend |
|---|---|---|
| Timing | Before aggregation (row-level) | After aggregation |
| Data sources | Same source (or cross-DB join) | Can be completely different sources |
| Granularity | Must match on a key | Can differ — blending aggregates first |
| Performance | Generally faster | Can be slower for large secondaries |
| Flexibility | Less flexible with mismatched levels | More flexible |
| Nulls | Creates nulls for non-matching rows | Non-linked fields show asterisk (*) |
When to Blend
Use Data Blending when:
- Your data lives in incompatible sources (e.g., SQL Server + Google Sheets)
- The tables have different granularities (e.g., daily sales vs monthly targets)
- A join would cause row duplication due to one-to-many relationships
Setting Up a Blend
- Connect to your Primary source and build a view
- Connect to your Secondary source (via Data > New Data Source)
- In the view, both sources appear in the Data pane — the primary has a blue checkmark, the secondary has an orange link icon
- Ensure the linking fields (the common dimensions) have the same name, or manually define the relationship in Data > Edit Blend Relationships
Relationships (Tableau 2020.2+)
Starting with Tableau 2020.2, Relationships replaced joins as the default data modeling approach. Relationships are more flexible, more intelligent, and avoid many of the pitfalls of traditional joins.
How Relationships Differ from Joins
| Concept | Traditional Join | Relationship |
|---|---|---|
| When combined | At data source level, always | At query time, only when needed |
| Row duplication | Yes, with one-to-many | No — each table queried independently |
| Null handling | Depends on join type | Automatic — preserves row counts |
| LOD calculations | Can break with joins | Context-aware, table-level accurate |
| Flexibility | Rigid — fixed join type | Flexible — adapts to the analysis |
How Relationships Work
Think of Relationships as "noodles" connecting tables — they define how tables are related without immediately merging them. Tableau stores the tables separately and only combines them when the view actually needs both tables simultaneously. This means:
- A SUM([Sales]) from the Orders table never double-counts just because Returns is also related
- You can have different levels of granularity between related tables without distortion
- Tableau automatically determines the join type (inner/outer) based on what the visualization needs
Creating a Relationship
- Drag the first table onto the canvas (the canvas switches to "relationship mode" — it looks different from the join canvas)
- Drag the second table onto the canvas
- Tableau draws a line (noodle) between them and suggests matching fields
- Click the line to see and edit the relationship fields
- Optionally configure Performance Options: tell Tableau whether the relationship is one-to-one, one-to-many, or many-to-many (helps optimize queries)
Data Source Filters
Data Source Filters are applied at the connection level — before any data reaches the worksheet. They are the most efficient filters in Tableau because they reduce the dataset at the source.
Creating a Data Source Filter
- On the Data Source page, click Add in the Filters section (top-right corner)
- Click Add in the dialog box
- Select a field to filter on
- Configure the filter condition (e.g.,
[Year] = 2024or[Region] IN ("North", "South")) - Click OK
Best practices for Data Source Filters:
- Filter out irrelevant historical data to improve extract performance
- Use Data Source Filters to restrict data by geography, business unit, or date range
- They apply to all worksheets using that data source — use carefully if some sheets need the full dataset
Hiding and Renaming Fields
Keeping your Data pane organized dramatically improves analyst productivity — especially on large datasets with hundreds of columns.
Hiding Fields
Right-click any field in the Data pane or Metadata Grid and select Hide. Hidden fields do not appear in the Data pane but still exist in the data. They can be unhidden at any time.
Use hiding to: Remove ID columns, internal flags, system fields, or any field that analysts will never use directly.
Renaming Fields
Double-click a field name in the Data pane or Metadata Grid to rename it. Renaming is non-destructive — the original column name in the source is preserved (visible as "Remote Field Name" in the Metadata Grid).
Best practices:
- Rename abbreviations to full words (
Qty→Quantity,Prd→Product) - Remove underscores and add spaces (
order_date→Order Date) - Use consistent title case formatting
Changing Data Types
Tableau auto-detects data types, but sometimes gets them wrong — especially with dates stored as text, or ZIP codes stored as numbers.
Available Data Types
| Icon | Type | Examples |
|---|---|---|
Abc | String (text) | "Customer Name", "Region" |
# | Number (decimal) | 1234.56 |
# | Number (whole) | 1234 |
| Calendar icon | Date | 2024-01-15 |
| Calendar+clock icon | Date & Time | 2024-01-15 14:30:00 |
| T/F | Boolean | True / False |
| Globe icon | Geographic role | Country, State, ZIP Code |
Changing a Data Type
- In the Data Source page Metadata Grid, click the type icon on the column header
- Select the correct type from the dropdown
- Tableau converts all values in that column to the new type (values that cannot be converted become
Null)
Common corrections:
- ZIP codes detected as integers → change to String (to preserve leading zeros)
- Date columns stored as text → change to Date (ensure format matches, e.g.,
MM/DD/YYYY) - Boolean columns stored as integers (0/1) → change to Boolean
Geographic Roles
Tableau can plot geographic fields on maps automatically if they are assigned the correct Geographic Role.
- Right-click a field in the Data pane
- Hover over Geographic Role
- Select the appropriate role: Country/Region, State/Province, City, ZIP Code/Postcode, Latitude, Longitude, etc.
Tableau matches field values against its internal geographic database. Fields that match successfully get a globe icon and can be placed on the view to generate an automatic map.
Practice Exercises
Exercise 1: Live vs Extract Decision Making
Objective: Choose the right connection type for real-world scenarios.
For each scenario below, decide whether a Live connection or an Extract is more appropriate. Write a 2-3 sentence justification for each choice.
Scenarios:
- A retail chain's operations team monitors store sales every hour and needs to react immediately to anomalies. Their data lives in a high-performance Redshift warehouse.
- A marketing analyst needs to share a quarterly campaign performance report with 50 non-technical stakeholders. Data comes from Google Analytics and Salesforce. Updates are needed weekly.
- A financial analyst in a regulated industry cannot allow sensitive customer data to leave the on-premises Oracle database under any circumstances.
- A startup is building a public-facing COVID-19 tracker using daily CSV files published by a government agency.
Deliverable: A decision table with connection type choice and justification for each scenario.
Exercise 2: Join Exploration with Superstore
Objective: Experience different join types and understand how they affect row counts.
Steps:
- Download or locate the Sample - Superstore Excel file
- Connect to it in Tableau
- Drag the Orders sheet to the canvas
- Drag the Returns sheet next to it — observe the automatic join
- Click the join icon. What join type did Tableau choose? Why?
- Change the join to a Left Join — how does the row count in the Data Grid change?
- Change to Inner Join — how many rows are there now? What happened to non-returned orders?
- Change to Full Outer Join — what do the NULL values represent?
- Deliverable: A table recording row counts for each join type and an explanation of the difference
Exercise 3: Data Source Preparation
Objective: Clean and organize a data source using the Data Source page tools.
Steps:
- Connect Tableau to the Superstore Excel file
- On the Data Source page, open the Metadata Grid
- Rename the following fields:
Row ID→Record IDPostal Code→ZIP Code
- Hide the following fields that are not needed for analysis:
Row ID(renamed),Country/Region(since all data is USA) - Change
ZIP Codeto the String data type - Assign
State/Provincethe Geographic Role of State/Province - Add a Data Source Filter: include only orders from the year 2023 and 2024 (
Order Dateyear is 2023 or 2024) - Navigate to a worksheet and confirm only 2023-2024 data appears
- Deliverable: A screenshot of your cleaned Data Source page with the filter, renamed fields, and hidden fields visible
Summary
This chapter covered everything you need to know to confidently connect Tableau to any data source and prepare it for analysis.
Key takeaways:
- Live connections query data in real time — best for fresh, fast databases where timeliness matters
- Extracts (.hyper) are local snapshots — best for performance, portability, and slow/limited sources
- Tableau supports over 100 native connectors spanning files, relational databases, cloud platforms, and statistical tools
- Connecting to Excel and CSV is straightforward — use Data Interpreter for messy headers and always verify data types
- The Data Source page has four key areas: left panel (tables), canvas (data model), Data Grid (row preview), and Metadata Grid (column properties)
- Joins combine tables horizontally at the row level; choose Inner, Left, Right, or Full Outer based on which rows you need to preserve
- Unions stack tables vertically — both tables must have compatible column structures
- Data Blending combines aggregated results from separate sources — use it when row-level joins are impossible or inappropriate
- Relationships (Tableau 2020.2+) are the modern, flexible default — they avoid row duplication and adapt automatically to the analysis context
- Data Source Filters restrict data at the connection level for performance and security
- Always rename, hide, and retype fields on the Data Source page to keep your Data pane organized and analysis-ready
In the next chapter, you will put this data to work — building your first visualizations using Tableau's drag-and-drop interface, mastering chart types, and learning the visual design principles that separate a good chart from a great one.