Friday, May 30, 2025
Techcratic

Building End-to-End Data Pipelines with Dask

Cornellius Yudha Wijaya
2025-05-05 10:00:00
www.kdnuggets.com

Image by Author | Ideogram

 

Data is a vital asset that businesses use to gain a competitive edge. Advances in technology have made it much easier to collect and store data. The problem is that this abundance makes processing harder: the larger the dataset, the slower it is to process.

Several tools can help speed up data processing, including Dask. Dask is a powerful Python library that provides a Pandas-compatible API for scaling data processing through parallel, out-of-core computation. It handles large datasets by partitioning the work into smaller batches and executing them concurrently across multiple cores or machines.
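To make the partition-then-run-concurrently idea concrete, here is a minimal sketch that uses only the Python standard library. It is not how Dask is implemented; a thread pool stands in for Dask's scheduler, and the helper names `split_into_partitions`, `partition_sum`, and `parallel_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(values, partition_size):
    """Split a list into consecutive partitions of at most partition_size items."""
    return [
        values[start:start + partition_size]
        for start in range(0, len(values), partition_size)
    ]


def partition_sum(partition):
    """Work that can be done independently on each partition."""
    return sum(partition)


def parallel_sum(values, partition_size=4, workers=2):
    """Reduce each partition separately, then combine the partial results."""
    partitions = split_into_partitions(values, partition_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(partition_sum, partitions))
    return sum(partial_sums)


if __name__ == "__main__":
    data = list(range(10))
    print(split_into_partitions(data, 4))  # partitions of 4, 4, and 2 items
    print(parallel_sum(data))
```

The same shape (split, process each piece independently, combine) is what lets a library like Dask work on data that does not fit in memory at once.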

Because Dask is such a valuable tool, it is worth learning how to build an end-to-end data pipeline with it that any data professional can use. This article will show you how to set up such a pipeline.

Let’s get into it.
 

Preparation

 
For this tutorial, we need to set up a few things. First, we need a database to store our data. We will use MySQL, so download it and follow the standard installation instructions.

For the dataset, we will use the Data Scientist Salary dataset, which is publicly available on Kaggle. Save the data in a folder called ‘data’ and leave it for now.

Next, create a virtual environment with the following command.

python -m venv dask_pipeline 

 
You can choose other names for your virtual environment, but I prefer a self-explanatory name. Activate the virtual environment and create a requirements.txt file, which will be populated with the necessary libraries for the project.

dask[complete]      
pandas            
numpy              
sqlalchemy            
PyMySQL        
luigi            
python-dotenv
setuptools

 
Once the file is ready, install the libraries with the following command.

pip install -r requirements.txt

 
Then, create a file called ‘.env’, where we will store all the variables used in this project, primarily for database access. Fill the file with the following information:

DB_USER=your_username
DB_PASS=your_password
DB_HOST=localhost
DB_PORT=3306
DB_NAME=analytics

 
Then, create a file called config.py, which will be used for connecting to the database.

from dotenv import load_dotenv
import os

load_dotenv()

DB_USER = os.getenv("DB_USER")
DB_PASS = os.getenv("DB_PASS")
DB_HOST = os.getenv("DB_HOST")
DB_PORT = os.getenv("DB_PORT")
DB_NAME = os.getenv("DB_NAME")

CONN_STR = (
    f"mysql+pymysql://{DB_USER}:{DB_PASS}@"
    f"{DB_HOST}:{DB_PORT}/{DB_NAME}"
)
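One caveat with building the URL through string formatting: if the username or password contains characters such as `@`, `/`, or `:`, the resulting URL can be parsed incorrectly. A minimal sketch of one way to guard against that is to percent-encode the credentials with the standard library's `urllib.parse.quote_plus`. The function name `build_conn_str` and the sample credentials below are hypothetical and not part of the original project.

```python
from urllib.parse import quote_plus


def build_conn_str(user, password, host, port, name):
    """Build a mysql+pymysql URL with percent-encoded credentials."""
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}@"
        f"{host}:{port}/{name}"
    )


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    print(build_conn_str("analyst", "p@ss/word", "localhost", "3306", "analytics"))
```

In config.py, the same helper could be called with the values read from the .env file instead of the literal strings used here.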

 

With everything in place, we can now build our end-to-end data pipeline with Dask.
 

Data Pipeline with Dask

 
To set up the data pipeline, we will use the Luigi Python library, which is typically used to build complex pipelines of batch jobs. In our case, Luigi will orchestrate a pipeline that uses Dask to ingest CSV data into the database, transform it, and load the result back into the database.

Let’s start with the code that creates the database, which goes in a Python file called luigi_pipeline.py. We will import all the necessary libraries and define a task that creates the database.

import luigi
from luigi import LocalTarget, Parameter, IntParameter
from sqlalchemy import create_engine, text
import pandas as pd
from dask import delayed
import dask.dataframe as dd
from config import DB_USER, DB_PASS, DB_HOST, DB_PORT, DB_NAME, CONN_STR

class CreateDatabase(luigi.Task):
    def output(self):
        return LocalTarget("tmp/db_created.txt")

    def run(self):
        engine = create_engine(
            f"mysql+pymysql://{DB_USER}:{DB_PASS}@{DB_HOST}:{DB_PORT}/"
        )
        with engine.connect() as conn:
            conn.execute(text(f"CREATE DATABASE IF NOT EXISTS {DB_NAME}"))
        self.output().makedirs()
        with self.output().open("w") as f:
            f.write("ok")

 
The code above creates the database when run, if a database with that name doesn’t already exist. We will use this task as a dependency of the Dask-based CSV ingestion task that we set up below.

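class IngestCSV(luigi.Task):
    csv_path   = Parameter()
    table_name = Parameter(default="ds_salaries")

    def requires(self):
        return CreateDatabase()

    def output(self):
        return LocalTarget("tmp/ingest_done.txt")

    def run(self):
        # Repeats the CreateDatabase step; harmless because of IF NOT EXISTS.
        url_no_db = f"mysql+pymysql://{DB_USER}:{DB_PASS}@{DB_HOST}:{DB_PORT}/"
        engine0 = create_engine(url_no_db)
        with engine0.connect() as conn:
            conn.execute(text(f"CREATE DATABASE IF NOT EXISTS {DB_NAME}"))

        # Read the CSV lazily; assume_missing=True reads integer columns as
        # floats so that missing values do not break dtype inference.
        ddf = dd.read_csv(self.csv_path, assume_missing=True)

        # Create (or replace) an empty table that has the CSV's columns.
        engine = create_engine(CONN_STR)
        empty = ddf.head(0)
        empty.to_sql(self.table_name, con=engine, if_exists="replace", index=False)

        # Append one partition (a pandas DataFrame) at a time.
        def append_part(pdf):
            pdf.to_sql(self.table_name, con=engine, if_exists="append", index=False)

        # meta is passed explicitly so Dask does not have to infer the output type.
        ddf.map_partitions(append_part, meta=()).compute()

        # Write the marker file that tells Luigi this task is complete.
        with self.output().open("w") as f:
            f.write("ok")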

 
In the code above, we use Dask to read the CSV files and send them to the database. We are using Dask to enhance the reading process and make it more manageable to send the data to the database.
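The ingestion pattern described here (create an empty table, then append one partition at a time) can be sketched without Dask or MySQL. The example below is a stand-in built on assumptions: it uses the standard library's `csv` and `sqlite3` modules, an in-memory database, and made-up sample rows rather than the Kaggle dataset. `ingest_in_chunks` is a hypothetical helper.

```python
import csv
import io
import sqlite3
from itertools import islice


def ingest_in_chunks(csv_file, conn, table_name, chunk_size):
    """Create the table from the CSV header, then append rows chunk by chunk."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)

    # Counterpart of writing the empty frame with if_exists="replace".
    conn.execute(f'DROP TABLE IF EXISTS "{table_name}"')
    conn.execute(f'CREATE TABLE "{table_name}" ({columns})')

    chunks_written = 0
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        # Counterpart of appending one partition with if_exists="append".
        conn.executemany(
            f'INSERT INTO "{table_name}" VALUES ({placeholders})', chunk
        )
        chunks_written += 1
    conn.commit()
    return chunks_written


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = io.StringIO(
        "work_year,employment_type,salary_in_usd\n"
        "2022,FT,100000\n"
        "2022,PT,40000\n"
        "2023,FT,120000\n"
    )
    conn = sqlite3.connect(":memory:")
    print(ingest_in_chunks(sample, conn, "ds_salaries", chunk_size=2))
    print(conn.execute("SELECT COUNT(*) FROM ds_salaries").fetchone()[0])
```

Only one chunk is held in memory at a time, which is the property that makes partition-wise loading manageable for large files.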

As the next part of the pipeline, we feed the ingested data into the ETL transformation task using the code below.

class TransformETL(luigi.Task):
    csv_path   = Parameter()
    table_name = Parameter(default="ds_salaries")
    chunk_size = IntParameter(default=100_000)

    def requires(self):
        return IngestCSV(csv_path=self.csv_path,
                         table_name=self.table_name)

    def output(self):
        return LocalTarget("tmp/etl_done.txt")

    def run(self):
        engine = create_engine(CONN_STR)

        # 1. Count total rows for chunking
        with engine.connect() as conn:
            total = conn.execute(
                text(f"SELECT COUNT(*) FROM {self.table_name}")
            ).scalar()

        # 2. Build delayed partitions
        @delayed
        def load_chunk(offset, limit):
            return pd.read_sql(
                f"SELECT * FROM {self.table_name} LIMIT {limit} OFFSET {offset}",
                engine
            )

        parts = [
            load_chunk(i * self.chunk_size, self.chunk_size)
            for i in range((total // self.chunk_size) + 1)
        ]

        # 3. Load zero‐row metadata and cast to correct dtypes
        meta = (
            pd.read_sql(f"SELECT * FROM {self.table_name} LIMIT 0", engine)
              .astype({
                  "work_year":     "int64",
                  "salary":        "float64",
                  "salary_in_usd": "float64",
                  "remote_ratio":  "int64",
                  # leave the rest as object
              })
        )

        # 4. Create Dask DataFrame with corrected meta
        ddf = dd.from_delayed(parts, meta=meta)

        # 5. Filter & clean
        ddf = (
            ddf
            .dropna(subset=["salary_in_usd"])
            .assign(
                salary_in_usd=ddf["salary_in_usd"].astype(float)
            )
        )

        # 6. Keep only full-time roles
        ddf = ddf[ddf["employment_type"] == "FT"]

        # 7. Compute salary bracket at 10k USD
        bracket = (ddf["salary_in_usd"] // 10_000).astype(int) * 10_000
        ddf = ddf.assign(salary_bracket=bracket)

        # 8. Aggregate: average salary by year
        result = (
            ddf.groupby("work_year")["salary_in_usd"]
               .mean()
               .compute()
               .reset_index()
               .rename(columns={"salary_in_usd": "avg_salary_usd"})
        )

        # 9. Persist results
        result.to_sql("avg_salary_by_year",
                      con=engine, if_exists="replace", index=False)

        with self.output().open("w") as f:
            f.write("ok")

 

The code above uses Dask to perform several steps that transform the data. Specifically, here is what happens within the pipeline:

  1. Load the dataset from the database in chunks.
  2. Set the metadata and create a Dask DataFrame.
  3. Filter and clean the data with Dask.
  4. Transform the data with Dask.
  5. Load the results into the database.
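The steps above can be sketched in plain Python to show the arithmetic involved. This is an illustration under stated assumptions, not the pipeline itself: it uses `sqlite3` with an in-memory table and made-up rows, and the helper names are hypothetical. It also differs from the task code in two labeled ways: it uses ceiling division for the chunk count (so no trailing empty chunk is requested when the row count is an exact multiple of the chunk size), and it adds an ORDER BY, since LIMIT/OFFSET without an ordering does not guarantee which rows land in which chunk.

```python
import sqlite3
from collections import defaultdict


def chunk_offsets(total_rows, chunk_size):
    """Return the (offset, limit) pairs needed to cover total_rows."""
    n_chunks = -(-total_rows // chunk_size)  # ceiling division
    return [(i * chunk_size, chunk_size) for i in range(n_chunks)]


def salary_bracket(salary_in_usd, width=10_000):
    """Round a salary down to the lower edge of its bracket."""
    return int(salary_in_usd // width) * width


def average_salary_by_year(conn, table_name, chunk_size):
    """Read the table chunk by chunk, keep full-time rows, average by year."""
    total = conn.execute(f'SELECT COUNT(*) FROM "{table_name}"').fetchone()[0]
    sums = defaultdict(float)
    counts = defaultdict(int)
    for offset, limit in chunk_offsets(total, chunk_size):
        rows = conn.execute(
            f"SELECT work_year, employment_type, salary_in_usd "
            f'FROM "{table_name}" ORDER BY rowid LIMIT ? OFFSET ?',
            (limit, offset),
        ).fetchall()
        for work_year, employment_type, salary in rows:
            # Drop rows with a missing salary and keep only full-time roles.
            if salary is None or employment_type != "FT":
                continue
            sums[work_year] += float(salary)
            counts[work_year] += 1
    return {year: sums[year] / counts[year] for year in sorted(sums)}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ds_salaries "
        "(work_year INTEGER, employment_type TEXT, salary_in_usd REAL)"
    )
    # Made-up rows, for illustration only.
    conn.executemany(
        "INSERT INTO ds_salaries VALUES (?, ?, ?)",
        [(2022, "FT", 100000.0), (2022, "FT", 80000.0),
         (2022, "PT", 40000.0), (2023, "FT", 120000.0), (2023, "FT", None)],
    )
    print(chunk_offsets(5, 2))
    print(salary_bracket(123_456.0))
    print(average_salary_by_year(conn, "ds_salaries", chunk_size=2))
```

Because the per-year sums and counts are accumulated across chunks, the final averages do not depend on the chunk size.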

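The data pipeline is now ready, and we can run it from the command line as shown below. Invoking the file directly like this assumes that luigi_pipeline.py ends with an entry point such as luigi.run() and that a Luigi scheduler is reachable; if no central scheduler is running, add the --local-scheduler flag.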

python luigi_pipeline.py TransformETL --csv-path data\ds_salaries.csv

 
Luigi then prints an execution summary like the following.

===== Luigi Execution Summary =====

Scheduled 1 tasks of which:
* 1 complete ones were encountered:
    - 1 TransformETL(csv_path=data\ds_salaries.csv, table_name=ds_salaries, chunk_size=100000)

Did not run any tasks
This progress looks :) because there were no failed tasks or missing dependencies
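The summary above reports that no tasks ran because Luigi found the task's output target already present and therefore treated the task as complete. A rough sketch of that idea, using only the standard library, is shown below. It is not Luigi's implementation; `run_once` and the marker path are hypothetical.

```python
import tempfile
from pathlib import Path


def run_once(marker: Path, work) -> bool:
    """Run work() only if the marker file is missing; return True if it ran."""
    if marker.exists():
        return False  # already complete, nothing to do
    work()
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.write_text("ok")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / "tmp" / "etl_done.txt"
        print(run_once(marker, lambda: None))  # first call does the work
        print(run_once(marker, lambda: None))  # second call is skipped
```

In the pipeline from this article, the marker files live under tmp/ (for example tmp/etl_done.txt), so deleting them makes Luigi run the corresponding tasks again.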

 

You can then check the Luigi UI to see whether the pipeline ran well. The dashboard output is shown in the image below.

[Image: Luigi dashboard output]
 

If the run succeeded, the dashboard will show the pipeline as completed, and you can check the result in your database.

SELECT * FROM analytics.avg_salary_by_year;

 
The output is shown below.

[Image: query output for avg_salary_by_year]
 
With that, you just built an end-to-end data pipeline with Dask. All the code is stored in the following GitHub repository.
 

Conclusion

 
Building a data pipeline is a crucial skill for data professionals, and Dask is a tool that speeds up the data processing and manipulation inside it. In this article, we learned how to build an end-to-end data pipeline, from ingesting data to loading the results back into the database.

I hope this has helped!
 
 

Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.
