ETL Pipeline and Data Pipeline Comparison

Scott Guttenberger
Published in The Startup
5 min read, Jan 21, 2021

Have you noticed that ETL and Data Pipelines seem to crop up in every work conversation? As business becomes more digital and companies build out their tech stacks, these two concepts, ETL Pipelines and Data Pipelines, will become more critical.

Take, for example, a simple comment on an Instagram post. It’s picked up by your social media tool and perhaps added to an advertising segment. Around the same time, you might be reporting on this action and others in a weekly social media report or on a live dashboard. Perhaps sales or customer success is also triggered by a step along this prospect’s path. The data, and the message behind that data, are also part of several other pipelines.

In this article, let’s take a look at what a data pipeline is and what an ETL pipeline is. We will discuss some key differences that might help you understand your data workflows. Finally, we will look at why you might want to use an ETL pipeline and why you might want to use data pipelines.

What is a Data Pipeline?

Simply put, “Data Pipeline” is a term that can describe any process that moves data from one platform to another. It might transform the data, but sometimes it does not. Some real-world examples are duplicating data, filtering data, migrating data to the cloud, and any data enrichment process.
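
To make the definition concrete, here is a minimal sketch in Python of a pipeline that moves records from a source to a destination, with filtering and enrichment as optional steps. The function name `run_pipeline`, the sample `comments` data, and the `segment` field are all hypothetical and used only for illustration.

```python
from typing import Callable, Iterable, Optional

Record = dict  # one row of data, e.g. a comment or a CRM contact


def run_pipeline(
    source: Iterable[Record],
    sink: list,
    keep: Optional[Callable[[Record], bool]] = None,
    enrich: Optional[Callable[[Record], Record]] = None,
) -> int:
    """Move records from source to sink, optionally filtering and enriching."""
    moved = 0
    for record in source:
        if keep is not None and not keep(record):
            continue  # record is filtered out and never reaches the sink
        if enrich is not None:
            record = enrich(record)
        sink.append(record)
        moved += 1
    return moved


# Hypothetical sample data standing in for a social media tool.
comments = [
    {"user": "ana", "text": "Love this!", "platform": "instagram"},
    {"user": "bo", "text": "", "platform": "instagram"},
]

# 1. Plain duplication: the data is moved with no transformation at all.
backup: list = []
run_pipeline(comments, backup)

# 2. Filter out empty comments and enrich the rest on the way through.
segment: list = []
run_pipeline(
    comments,
    segment,
    keep=lambda r: bool(r["text"]),
    enrich=lambda r: {**r, "segment": "engaged"},
)
```

Both calls count as data pipelines under the definition above, even though only the second one changes the data.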

Use case examples for Data Pipeline

  • Predictive analytics
  • Real-time dashboards or reporting, metric updates/refresh
  • Moving, storing, enriching, or transforming

What is an ETL Pipeline?

ETL stands for “Extract”, “Transform”, and “Load”. It describes a collection of processes that extract data, transform that data, and load it into a selected destination. The source of this data might be Salesforce or a paid social campaign, but it can be any business or marketing system, including data warehouses and cloud-hosted databases like Amazon Redshift, Snowflake, and Google BigQuery.
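
The three steps can be sketched with nothing but the Python standard library. This is an illustrative outline, not a production design: the CSV export, its column names, and the `ad_spend` table are invented for the example, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export standing in for a CRM or ad platform.
RAW_EXPORT = """Email,Spend USD,Campaign
ANA@EXAMPLE.COM,12.50,spring_sale
bo@example.com,7.25,spring_sale
"""


def extract(raw: str) -> list[dict]:
    """Extract: read rows from the source export."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: normalize emails and convert spend from text to a number."""
    return [
        (row["Email"].strip().lower(), float(row["Spend USD"]), row["Campaign"])
        for row in rows
    ]


def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ad_spend "
        "(email TEXT, spend_usd REAL, campaign TEXT)"
    )
    conn.executemany("INSERT INTO ad_spend VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(transform(extract(RAW_EXPORT)), conn)

total = conn.execute("SELECT SUM(spend_usd) FROM ad_spend").fetchone()[0]
emails = [row[0] for row in conn.execute("SELECT email FROM ad_spend ORDER BY email")]
```

Keeping each step in its own function means the source or the destination can be swapped out without touching the transformation logic.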

Use case examples for ETL Pipeline

  • Centralizing data across your whole company. The “One-Truth” example.
  • Moving data between your different data silos
  • Enriching your CRM with all your additional related data

Key Differentiators

This is where many people get confused about what an ETL pipeline is and what a Data Pipeline is. The terms are related, which adds to the confusion. Both describe processes that move data from point A to point B, but they are different.

The end of the line

An ETL pipeline ends with Load, while a Data Pipeline doesn’t necessarily end once the data is loaded into a different platform. In fact, with Data Pipelines the load can spark a new process in other systems.
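
One way to picture a load that sparks further processes is a destination that notifies listeners each time a record arrives. The sketch below assumes a hypothetical `Destination` class and two made-up downstream actions, a sales follow-up and a dashboard counter.

```python
from typing import Callable

Record = dict
Listener = Callable[[Record], None]


class Destination:
    """A destination that notifies listeners whenever a record is loaded."""

    def __init__(self) -> None:
        self.rows: list[Record] = []
        self._listeners: list[Listener] = []

    def on_load(self, listener: Listener) -> None:
        """Register a downstream process to run after each load."""
        self._listeners.append(listener)

    def load(self, record: Record) -> None:
        self.rows.append(record)
        for listener in self._listeners:
            listener(record)  # the load triggers work in other systems


sales_tasks: list[str] = []
dashboard_counts: dict[str, int] = {}


def notify_sales(record: Record) -> None:
    sales_tasks.append(f"Follow up with {record['user']}")


def update_dashboard(record: Record) -> None:
    platform = record["platform"]
    dashboard_counts[platform] = dashboard_counts.get(platform, 0) + 1


crm = Destination()
crm.on_load(notify_sales)
crm.on_load(update_dashboard)
crm.load({"user": "ana", "platform": "instagram"})
```

In a strict ETL job the work would stop once `rows` is updated; here the same load also feeds two other pipelines.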

Transformation is key

Simply put, an ETL process always involves transformation followed by loading. A Data Pipeline does not have to include a transformation step: as noted earlier, it might transform data, but sometimes it does not.

Real-time or Batch Process

Probably the most notable difference is how the data is processed in practice. ETL usually runs in batches or chunks, typically on a schedule. This is a great choice when you need to move large blocks of data and want to take advantage of downtime, like hours when the business is closed. On the other side of the coin, Data Pipelines often run in real time. This is great when your output is a live, expansive dashboard. Personally, I see this as a major choice to make if you are offering a data visualization service.
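
The contrast between batch and real-time processing can be sketched with two small Python generators. The event shape (`{"amount": ...}`) and the batch size are assumptions made for the example; a real scheduler or streaming platform is out of scope here.

```python
from typing import Iterable, Iterator


def batches(events: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Group events into fixed-size chunks, as a scheduled batch job might."""
    chunk: list[dict] = []
    for event in events:
        chunk.append(event)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly smaller, batch


def running_total(events: Iterable[dict]) -> Iterator[int]:
    """Emit an updated metric after every event, as a live dashboard would."""
    total = 0
    for event in events:
        total += event["amount"]
        yield total


events = [{"amount": a} for a in (5, 10, 20, 40, 80)]

# Batch style: one result per chunk, computed after the chunk is complete.
batch_totals = [sum(e["amount"] for e in chunk) for chunk in batches(events, size=2)]

# Real-time style: one result per event, available immediately.
live_totals = list(running_total(events))
```

The batch version produces fewer, larger updates, while the streaming version produces a fresh value for every event, which is the property a live dashboard depends on.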

Why Use ETL Pipelines?

To make this easy: use ETL when you need to extract, transform, and then load data. If you are more focused on deep analytics and on moving data sets after some basic transformation, ETL is the better fit. If you are about to migrate data from a legacy system, ETL might also be a good choice.

As mentioned above, the heavy lifting here is in the transformation of the data. Imagine data coming from many different sources: website, PPC, CSV, CRM, and sales systems. During transformation, all these sources, with their different data types and field names, are converted into a unified format. Once the data is unified, the Load process takes over, and the data becomes easily available to many individuals.
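
As a rough illustration of unifying differently named fields, the sketch below maps two hypothetical sources onto one schema. The source names, field names, and the `FIELD_MAPS` table are invented for the example; real systems will have their own.

```python
# Hypothetical mappings: source field name -> unified field name.
FIELD_MAPS = {
    "crm": {"Email Address": "email", "Full Name": "name"},
    "ppc": {"user_email": "email", "display_name": "name"},
}


def unify(source: str, row: dict) -> dict:
    """Rename a row's fields to the unified schema and tidy the values."""
    mapping = FIELD_MAPS[source]
    unified = {target: str(row[field]).strip() for field, target in mapping.items()}
    unified["email"] = unified["email"].lower()  # one canonical email format
    unified["source"] = source  # keep track of where the row came from
    return unified


crm_rows = [{"Email Address": " Ana@Example.com ", "Full Name": "Ana Diaz"}]
ppc_rows = [{"user_email": "bo@example.com", "display_name": "Bo Li"}]

clean = [unify("crm", r) for r in crm_rows] + [unify("ppc", r) for r in ppc_rows]
```

After this step every row has the same `email`, `name`, and `source` keys, so the Load step can treat all sources identically.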

By the end of this process, your data is uniform and ready to be visualized or moved to a platform that your entire team can use. Your development team will love being able to apply logic rules during transformation. The output is what a business would call “clean data.”

Why Use Data Pipelines?

With cloud on the rise and remote work becoming the norm, companies are using many different platforms and applications to serve many different functions. Often the result of this messy, shallow tech stack is fragmented data. Data silos reduce your analytical ability, and fetching data from several places, then trying to combine and visualize it, is time-consuming. Let’s not even get started on trying to implement streaming data in this scenario!

Data Pipelines consolidate this data from all your third-party sources and move it to a shared destination. Consolidating the many into one lets you analyze your data quickly. It also helps keep the quality of your data high by removing some of the room for error.

What Should I Use?

Ah, the main question. Like most things in life, the answer is never as simple as you hope it will be. It depends on your needs: what are you trying to do?

A general rule of thumb might help: ask why you are moving the data. More often than not, we move data to allow a deeper analysis of something. ETL and Data Pipelines are both ways to manage and structure your data. Do you need to transform and load, or do you need to group data and move it to a different platform?

Written by Scott Guttenberger

Strategic executive marketer with more than a decade of experience in fast-paced organizations in Web3, blockchain, NFTs, and SaaS. https://linktr.ee/0xxerobit
