Chapter 2 of 12

Connecting to Data in Tableau

A deep dive into Tableau's data connection options — Live vs Extract, supported sources, joins, unions, blending, relationships, and data preparation on the Data Source page.

Meritshot21 min read
TableauData ConnectionData SourcesETL
All Tableau Chapters

Connecting to Data in Tableau

Every Tableau project begins with a question and a data source. How you connect to that data — and how you structure it before analysis — determines the performance, accuracy, and flexibility of everything you build. This chapter covers every connection option Tableau offers, from simple CSV files to cloud databases, and walks through the critical decisions around joins, unions, blending, and the modern Relationships data model.


Types of Data Connections: Live vs Extract

When you connect to a data source, Tableau offers two fundamental connection modes. Choosing the right one affects performance, freshness, and portability.

Live Connection

A Live connection queries your data source in real time. Every time you interact with a view — dragging a filter, switching a dimension, opening a dashboard — Tableau sends a query directly to the underlying database and retrieves fresh results.

How it works: Tableau translates your VizQL actions into SQL (or the native query language of your source) and sends them to the database. The database executes the query and returns results. This happens every time the view updates.

Best scenarios for Live connections:

  • Data changes frequently (hourly, daily) and analysts need the most current figures
  • The database is powerful (Snowflake, BigQuery, Redshift) and can handle ad hoc queries efficiently
  • Security requirements mandate that data never leaves the source system
  • The dataset is so large that extracting it is impractical

Extract (.hyper)

An Extract is a snapshot of your data stored in Tableau's proprietary columnar format (.hyper). The extract is saved locally or on Tableau Server and can be queried at very high speed, independent of the source system.

How it works: Tableau connects to the source, pulls the data (either fully or incrementally), and saves it as a compressed, column-oriented file. All subsequent queries run against this local file rather than the original source.

Best scenarios for Extract connections:

  • The source system is slow or has query limits (REST APIs, legacy databases)
  • Analysts need to work offline or in locations without network access
  • You need to share data with others without giving them source access
  • Performance is critical for end-user dashboard responsiveness
  • The source data is stable enough that a daily or hourly refresh is acceptable

Comparison Table: Live vs Extract

AttributeLive ConnectionExtract (.hyper)
Data freshnessReal-time, always currentSnapshot at time of extract
Query performanceDepends on source DBVery fast — optimized columnar store
Source dependencyAlways requires connectionCan work offline after extract
PortabilityNot portableFully portable (.hyper or .twbx)
Data volumeCan query full datasetBest practice: filter at extract time
SecurityData stays in sourceData leaves source — requires controls
RefreshNo refresh neededMust schedule or manually refresh
Source loadQueries hit source systemNo source load after initial extract
Best forReal-time ops dashboardsSelf-service BI, shared workbooks

Switching Between Live and Extract

On the Data Source page, look for the Connection radio buttons in the top-right area. You can switch from Live to Extract at any time by clicking Extract and then clicking Edit to configure what data to include. When you save an extract, Tableau creates a .hyper file. Packaged workbooks (.twbx) bundle the extract inside the workbook file for easy sharing.


Supported Data Source Types

Tableau connects to over 100 native data sources. Here is a structured overview:

File-Based Sources

File TypeExtensionNotes
Microsoft Excel.xls, .xlsxConnects to individual sheets or named ranges
Comma-Separated Values.csvText-based, widely compatible
JSON.jsonHierarchical data — Tableau flattens arrays
PDF.pdfExtracts tabular data from PDF tables
Text file.txt, .tsvConfigurable delimiter
Statistical files.sas7bdat, .sav, .rdaSAS, SPSS, R data files
Spatial files.shp, .kml, .geojsonFor geographic/mapping analysis
Microsoft Access.accdb, .mdbWindows only
ODataURLREST-based data protocol

Relational Databases

DatabaseNotes
MySQLOpen-source; widely used in web applications
PostgreSQLAdvanced open-source; excellent for analytics
Microsoft SQL ServerEnterprise standard; deep Windows ecosystem
Oracle DatabaseEnterprise standard for large organizations
Amazon RedshiftAWS-native analytical data warehouse
Google BigQueryGCP-native serverless analytical warehouse
SnowflakeCloud-agnostic data warehouse — excellent Tableau compatibility
IBM Db2Enterprise database, common in financial services
TeradataHigh-performance analytical database
SAP HANAIn-memory analytics for SAP environments

Cloud and SaaS Sources

SourceUse Case
SalesforceCRM data — leads, opportunities, accounts
Google AnalyticsWeb traffic and behavior data
Adobe AnalyticsDigital marketing analytics
MarketoMarketing automation data
ServiceNowIT service management data
HubSpotInbound marketing and sales CRM
Dropbox / BoxCloud file storage
Google SheetsCollaborative spreadsheets
Microsoft SharePointSharePoint lists and Excel files

ODBC and JDBC

For sources without a native Tableau connector, you can use ODBC (Open Database Connectivity) or JDBC (Java Database Connectivity). This requires installing the appropriate driver and configuring a DSN (Data Source Name). Performance may be lower than native connectors.

Web Data Connector (WDC)

Tableau's Web Data Connector framework allows developers to build custom JavaScript connectors for any web-accessible API. The community has built WDCs for hundreds of services including Twitter, GitHub, weather APIs, and financial data feeds.


Connecting to Excel: Step by Step

Excel is one of the most common starting points for Tableau beginners.

Steps:

  1. Launch Tableau Desktop or Tableau Public
  2. On the Start Page, under Connect > To a File, click Microsoft Excel
  3. In the file browser, navigate to your Excel file and click Open
  4. The Data Source page opens. In the left panel, you will see all sheets and named ranges in the workbook
  5. Drag the sheet you want to analyze onto the canvas (the white area in the center)
  6. A preview of your data appears in the Data Grid at the bottom
  7. Verify that column names are correct and data types are recognized properly
  8. If your Excel file has a header row on row 2 or 3 (not row 1), click Data Interpreter — Tableau will attempt to automatically detect the real header row
  9. Toggle to Metadata Grid (the icon next to the data grid tabs) to see column names, data types, and remote field names. Rename or retype fields as needed
  10. Click a worksheet tab at the bottom to begin analysis

Common issues with Excel connections:

  • Merged cells in headers confuse Tableau — avoid merged cells in source Excel files
  • Multiple tables in a single sheet should be separated into separate sheets
  • Formulas that depend on other sheets may not evaluate correctly — values-only exports are safer
  • Date fields stored as text (e.g., "January 2024") must be cast to date type manually

Connecting to CSV: Step by Step

CSV files are the universal data format — clean, lightweight, and supported everywhere.

Steps:

  1. On the Start Page, under Connect > To a File, click Text File
  2. Browse to your .csv file and click Open
  3. Tableau auto-detects the delimiter (comma, tab, semicolon, pipe) — verify this is correct in the Data Source page
  4. The Data Grid shows a preview. Scroll right to verify all columns are captured
  5. If column headers are missing, Tableau uses F1, F2, F3 as placeholders — rename them in the Metadata Grid
  6. Check that numeric columns are not typed as strings (look for the Abc icon vs # icon on column headers)
  7. To change a data type: click the type icon on the column header and select the correct type from the dropdown
  8. Proceed to worksheet tabs to begin analysis

Tips for CSVs:

  • Use UTF-8 encoding to avoid character encoding issues
  • Avoid commas within quoted fields if possible — they can cause parsing errors
  • Ensure consistent date formats within a date column

The Data Source Page in Detail

The Data Source page is your data preparation workspace. Understanding its components makes everything downstream easier.

Left Panel

The left panel lists all available tables, sheets, or views from your connected source. For Excel, this shows each sheet. For a SQL database, it shows schemas and tables. You can:

  • Search for a specific table using the search box
  • Expand schemas to browse tables within them
  • Drag tables directly onto the canvas

Canvas

The canvas is the central workspace for building your data model. Drag tables here to begin. When you drag a second table, Tableau automatically attempts to create a join or relationship based on matching field names. The canvas displays tables as boxes connected by lines representing joins or relationships.

You can:

  • Double-click the join/relationship icon between tables to edit the join type or join conditions
  • Drag additional tables to create multi-table models
  • Click the union icon to stack tables vertically (union)

Data Grid

The Data Grid at the bottom shows a preview of your combined dataset — up to 10,000 rows by default. Use it to verify:

  • Column names look correct
  • Data types are appropriate
  • Row counts seem reasonable
  • Null values are handled properly

Metadata Grid

Toggle to the Metadata Grid (click the grid icon) to see:

  • Field Name: What Tableau will call the field in the Data pane
  • Remote Field Name: The actual column name in the source
  • Data Type: The type Tableau has assigned
  • Table: Which table the field comes from

In the Metadata Grid, you can rename fields, hide fields (right-click > Hide), and change data types — all non-destructively.


Joins in Tableau

Joins combine columns from two or more tables horizontally, matching rows based on a common key.

Join Types

Join TypeRows IncludedSQL EquivalentBest Use Case
Inner JoinOnly rows with matching keys in both tablesINNER JOINWhen you only want records with complete data on both sides
Left JoinAll rows from left table; matching rows from rightLEFT OUTER JOINKeep all records from primary table even if no match
Right JoinMatching rows from left; all rows from rightRIGHT OUTER JOINKeep all records from secondary table
Full Outer JoinAll rows from both tables; nulls where no matchFULL OUTER JOINWhen you need every record from both sides regardless of match

Creating a Join in Tableau

  1. On the Data Source page, drag the first table onto the canvas
  2. Drag the second table next to it — Tableau shows a join icon (two overlapping circles) between them
  3. Click the join icon to open the Join editor
  4. Verify the Join Type (default is Inner) and the Join Clauses (field pairs used to match rows)
  5. Add additional join conditions by clicking the + button if needed
  6. The Data Grid updates to show the joined result

Understanding Join Performance

  • Joins happen at query time in Tableau
  • For large tables, pre-joining in the database (using a SQL view or custom SQL) is often faster
  • Cross-database joins (joining tables from two different sources) are possible but slower than single-source joins

When to Use Each Join Type

Scenario: You have an Orders table and a Returns table. You want to see all orders and flag which ones were returned.

  • Answer: Left Join from Orders to Returns — keeps all orders, adds return information where it exists (NULL where not returned)

Scenario: You have a Products table and a Sales table. You want only products that have actually been sold.

  • Answer: Inner Join — excludes products with no sales records

Unions in Tableau

A Union stacks tables vertically — appending rows from one table beneath another. Both tables must have the same (or compatible) column structure.

When to Use Unions

  • Sales data split across multiple CSV files (one per month, one per region)
  • Multiple Excel sheets with the same structure
  • Historical data split across tables by year

Creating a Union

  1. Drag the first table onto the canvas
  2. Drag the second table directly on top of the first — Tableau shows a Union prompt (vs. a join prompt when you drop it beside)
  3. Tableau creates a union and shows a "Sheet" column identifying which source each row came from
  4. You can expand the union to add more tables, or use Wildcard Union to automatically include all sheets matching a pattern

Wildcard Union

Wildcard Union is a powerful feature for when you have many files with the same structure — for example, monthly CSV exports.

  1. Connect to one of the CSV files
  2. On the Data Source page, drag the file to the canvas
  3. Click the union dropdown arrow and select Edit Union
  4. Switch from Specific to Wildcard (automatic) — Tableau will include all matching files in the same folder

Data Blending vs Joins

Data Blending and Joins both combine data from multiple sources, but they work very differently and suit different scenarios.

Joins

  • Happen at the row level before aggregation
  • Both tables must be in the same data source (or cross-database join)
  • Result is a single combined table

Data Blending

  • Happens after aggregation — each source is queried independently
  • Used when you cannot join at the row level (different granularities, different sources, different databases)
  • One source is the Primary, the other is the Secondary
  • The Secondary source contributes aggregated values to the Primary view

Key Differences Table

FeatureJoinData Blend
TimingBefore aggregation (row-level)After aggregation
Data sourcesSame source (or cross-DB join)Can be completely different sources
GranularityMust match on a keyCan differ — blending aggregates first
PerformanceGenerally fasterCan be slower for large secondaries
FlexibilityLess flexible with mismatched levelsMore flexible
NullsCreates nulls for non-matching rowsNon-linked fields show asterisk (*)

When to Blend

Use Data Blending when:

  • Your data lives in incompatible sources (e.g., SQL Server + Google Sheets)
  • The tables have different granularities (e.g., daily sales vs monthly targets)
  • A join would cause row duplication due to one-to-many relationships

Setting Up a Blend

  1. Connect to your Primary source and build a view
  2. Connect to your Secondary source (via Data > New Data Source)
  3. In the view, both sources appear in the Data pane — the primary has a blue checkmark, the secondary has an orange link icon
  4. Ensure the linking fields (the common dimensions) have the same name, or manually define the relationship in Data > Edit Blend Relationships

Relationships (Tableau 2020.2+)

Starting with Tableau 2020.2, Relationships replaced joins as the default data modeling approach. Relationships are more flexible, more intelligent, and avoid many of the pitfalls of traditional joins.

How Relationships Differ from Joins

ConceptTraditional JoinRelationship
When combinedAt data source level, alwaysAt query time, only when needed
Row duplicationYes, with one-to-manyNo — each table queried independently
Null handlingDepends on join typeAutomatic — preserves row counts
LOD calculationsCan break with joinsContext-aware, table-level accurate
FlexibilityRigid — fixed join typeFlexible — adapts to the analysis

How Relationships Work

Think of Relationships as "noodles" connecting tables — they define how tables are related without immediately merging them. Tableau stores the tables separately and only combines them when the view actually needs both tables simultaneously. This means:

  • A SUM([Sales]) from the Orders table never double-counts just because Returns is also related
  • You can have different levels of granularity between related tables without distortion
  • Tableau automatically determines the join type (inner/outer) based on what the visualization needs

Creating a Relationship

  1. Drag the first table onto the canvas (the canvas switches to "relationship mode" — it looks different from the join canvas)
  2. Drag the second table onto the canvas
  3. Tableau draws a line (noodle) between them and suggests matching fields
  4. Click the line to see and edit the relationship fields
  5. Optionally configure Performance Options: tell Tableau whether the relationship is one-to-one, one-to-many, or many-to-many (helps optimize queries)

Data Source Filters

Data Source Filters are applied at the connection level — before any data reaches the worksheet. They are the most efficient filters in Tableau because they reduce the dataset at the source.

Creating a Data Source Filter

  1. On the Data Source page, click Add in the Filters section (top-right corner)
  2. Click Add in the dialog box
  3. Select a field to filter on
  4. Configure the filter condition (e.g., [Year] = 2024 or [Region] IN ("North", "South"))
  5. Click OK

Best practices for Data Source Filters:

  • Filter out irrelevant historical data to improve extract performance
  • Use Data Source Filters to restrict data by geography, business unit, or date range
  • They apply to all worksheets using that data source — use carefully if some sheets need the full dataset

Hiding and Renaming Fields

Keeping your Data pane organized dramatically improves analyst productivity — especially on large datasets with hundreds of columns.

Hiding Fields

Right-click any field in the Data pane or Metadata Grid and select Hide. Hidden fields do not appear in the Data pane but still exist in the data. They can be unhidden at any time.

Use hiding to: Remove ID columns, internal flags, system fields, or any field that analysts will never use directly.

Renaming Fields

Double-click a field name in the Data pane or Metadata Grid to rename it. Renaming is non-destructive — the original column name in the source is preserved (visible as "Remote Field Name" in the Metadata Grid).

Best practices:

  • Rename abbreviations to full words (QtyQuantity, PrdProduct)
  • Remove underscores and add spaces (order_dateOrder Date)
  • Use consistent title case formatting

Changing Data Types

Tableau auto-detects data types, but sometimes gets them wrong — especially with dates stored as text, or ZIP codes stored as numbers.

Available Data Types

IconTypeExamples
AbcString (text)"Customer Name", "Region"
#Number (decimal)1234.56
#Number (whole)1234
Calendar iconDate2024-01-15
Calendar+clock iconDate & Time2024-01-15 14:30:00
T/FBooleanTrue / False
Globe iconGeographic roleCountry, State, ZIP Code

Changing a Data Type

  1. In the Data Source page Metadata Grid, click the type icon on the column header
  2. Select the correct type from the dropdown
  3. Tableau converts all values in that column to the new type (values that cannot be converted become Null)

Common corrections:

  • ZIP codes detected as integers → change to String (to preserve leading zeros)
  • Date columns stored as text → change to Date (ensure format matches, e.g., MM/DD/YYYY)
  • Boolean columns stored as integers (0/1) → change to Boolean

Geographic Roles

Tableau can plot geographic fields on maps automatically if they are assigned the correct Geographic Role.

  1. Right-click a field in the Data pane
  2. Hover over Geographic Role
  3. Select the appropriate role: Country/Region, State/Province, City, ZIP Code/Postcode, Latitude, Longitude, etc.

Tableau matches field values against its internal geographic database. Fields that match successfully get a globe icon and can be placed on the view to generate an automatic map.


Practice Exercises

Exercise 1: Live vs Extract Decision Making

Objective: Choose the right connection type for real-world scenarios.

For each scenario below, decide whether a Live connection or an Extract is more appropriate. Write a 2-3 sentence justification for each choice.

Scenarios:

  1. A retail chain's operations team monitors store sales every hour and needs to react immediately to anomalies. Their data lives in a high-performance Redshift warehouse.
  2. A marketing analyst needs to share a quarterly campaign performance report with 50 non-technical stakeholders. Data comes from Google Analytics and Salesforce. Updates are needed weekly.
  3. A financial analyst in a regulated industry cannot allow sensitive customer data to leave the on-premises Oracle database under any circumstances.
  4. A startup is building a public-facing COVID-19 tracker using daily CSV files published by a government agency.

Deliverable: A decision table with connection type choice and justification for each scenario.

Exercise 2: Join Exploration with Superstore

Objective: Experience different join types and understand how they affect row counts.

Steps:

  1. Download or locate the Sample - Superstore Excel file
  2. Connect to it in Tableau
  3. Drag the Orders sheet to the canvas
  4. Drag the Returns sheet next to it — observe the automatic join
  5. Click the join icon. What join type did Tableau choose? Why?
  6. Change the join to a Left Join — how does the row count in the Data Grid change?
  7. Change to Inner Join — how many rows are there now? What happened to non-returned orders?
  8. Change to Full Outer Join — what do the NULL values represent?
  9. Deliverable: A table recording row counts for each join type and an explanation of the difference

Exercise 3: Data Source Preparation

Objective: Clean and organize a data source using the Data Source page tools.

Steps:

  1. Connect Tableau to the Superstore Excel file
  2. On the Data Source page, open the Metadata Grid
  3. Rename the following fields:
    • Row IDRecord ID
    • Postal CodeZIP Code
  4. Hide the following fields that are not needed for analysis: Row ID (renamed), Country/Region (since all data is USA)
  5. Change ZIP Code to the String data type
  6. Assign State/Province the Geographic Role of State/Province
  7. Add a Data Source Filter: include only orders from the year 2023 and 2024 (Order Date year is 2023 or 2024)
  8. Navigate to a worksheet and confirm only 2023-2024 data appears
  9. Deliverable: A screenshot of your cleaned Data Source page with the filter, renamed fields, and hidden fields visible

Summary

This chapter covered everything you need to know to confidently connect Tableau to any data source and prepare it for analysis.

Key takeaways:

  • Live connections query data in real time — best for fresh, fast databases where timeliness matters
  • Extracts (.hyper) are local snapshots — best for performance, portability, and slow/limited sources
  • Tableau supports over 100 native connectors spanning files, relational databases, cloud platforms, and statistical tools
  • Connecting to Excel and CSV is straightforward — use Data Interpreter for messy headers and always verify data types
  • The Data Source page has four key areas: left panel (tables), canvas (data model), Data Grid (row preview), and Metadata Grid (column properties)
  • Joins combine tables horizontally at the row level; choose Inner, Left, Right, or Full Outer based on which rows you need to preserve
  • Unions stack tables vertically — both tables must have compatible column structures
  • Data Blending combines aggregated results from separate sources — use it when row-level joins are impossible or inappropriate
  • Relationships (Tableau 2020.2+) are the modern, flexible default — they avoid row duplication and adapt automatically to the analysis context
  • Data Source Filters restrict data at the connection level for performance and security
  • Always rename, hide, and retype fields on the Data Source page to keep your Data pane organized and analysis-ready

In the next chapter, you will put this data to work — building your first visualizations using Tableau's drag-and-drop interface, mastering chart types, and learning the visual design principles that separate a good chart from a great one.