Tuesday, June 10, 2025
Techcratic

Implementing Machine Learning Pipelines with Apache Spark

AI by AI
June 3, 2025

Jayita Gulati
2025-06-03 08:00:00
www.kdnuggets.com

Machine Learning Pipelines with Apache Spark
Image by Editor (Kanwal Mehreen) | Canva

 

Apache Spark is an open-source engine for processing big data. It distributes work across a cluster of machines, so it can handle datasets that don’t fit in a single computer’s memory. A machine learning pipeline is a series of steps to prepare data and train models. These steps include collecting data, cleaning it, selecting important features, training the model, and checking how well it works.

Spark’s MLlib library provides tools for building these pipelines at scale, so teams can train machine learning models on large datasets without first moving the data to a single machine. In this article, we will explain how to set up and use machine learning pipelines in Spark.

 

Components of a Machine Learning Pipeline in Spark

 
Spark’s MLlib library has many built-in tools. These tools can be linked together to build a complete machine learning process.

 

Transformers

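Transformers take a DataFrame and return a new DataFrame, usually with one or more columns added. They are used for tasks like combining feature columns or applying an already-fitted encoding or scaling. VectorAssembler, for example, is a transformer that merges several columns into one feature vector. StringIndexer (for encoding) and StandardScaler (for scaling) are strictly estimators, because they must first be fit to the data; the fitted models they return are transformers. Transformers are reusable, and because DataFrames are immutable, the original data is never modified.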
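To make the scaling idea concrete, here is a minimal plain-Python sketch (not Spark code) of what standardization does to a single numeric column: subtract the mean, then divide by the standard deviation. The function name standardize and the sample ages are made up for illustration. Spark's StandardScaler is documented as using the corrected sample standard deviation, which is also what statistics.stdev computes.

```python
import statistics

def standardize(values):
    """Rescale values to zero mean and unit (sample) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return [(v - mean) / std for v in values]

ages = [20.0, 30.0, 40.0]  # hypothetical column values
scaled = standardize(ages)
print(scaled)
```

After scaling, the column has mean 0, and a value one standard deviation above the mean becomes 1. This puts features such as age and balance on a comparable scale.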

 

Estimators

Estimators learn from data to create models. They include algorithms like LogisticRegression and RandomForestClassifier. An estimator’s fit method trains on a DataFrame and returns a Model object, which is itself a transformer that can make predictions.

 

Pipeline

A Pipeline is a tool to connect transformers and estimators into a single workflow. By organizing them in sequence, data flows smoothly from one step to the next. Pipelines make it easy to retrain models, repeat processes, and adjust parameters.
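The fit/transform contract behind these three components can be sketched in plain Python. This is a toy illustration, not Spark's implementation: the classes AddOne, MeanCenterer, MeanCentererModel, ToyPipeline, and ToyPipelineModel are hypothetical, and the data is a list of numbers rather than a DataFrame. What matters is the control flow: during fitting, each estimator is fit on the data as transformed by the earlier stages, and the fitted pipeline contains only transformers.

```python
class AddOne:
    """A transformer: needs no fitting, just maps data to new data."""
    def transform(self, data):
        return [x + 1 for x in data]

class MeanCentererModel:
    """A fitted model: a transformer that remembers what fit() learned."""
    def __init__(self, mean):
        self.mean = mean
    def transform(self, data):
        return [x - self.mean for x in data]

class MeanCenterer:
    """An estimator: fit() learns from data and returns a model."""
    def fit(self, data):
        return MeanCentererModel(sum(data) / len(data))

class ToyPipeline:
    """Chains stages; fit() turns every estimator into a fitted model."""
    def __init__(self, stages):
        self.stages = stages
    def fit(self, data):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):      # estimator: fit it, keep the model
                stage = stage.fit(data)
            fitted.append(stage)
            data = stage.transform(data)   # pass transformed data to the next stage
        return ToyPipelineModel(fitted)

class ToyPipelineModel:
    """The fitted pipeline: applies each fitted stage in order."""
    def __init__(self, stages):
        self.stages = stages
    def transform(self, data):
        for stage in self.stages:
            data = stage.transform(data)
        return data

model = ToyPipeline([AddOne(), MeanCenterer()]).fit([1.0, 2.0, 3.0])
print(model.transform([1.0, 2.0, 3.0]))  # training data, centered
print(model.transform([10.0]))           # new data reuses the training mean
```

Because the fitted model stores the mean learned from the training data, new data is transformed with the same statistics. This is why a pipeline should be fit on the training split only and then applied to the test split.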

Let’s go through a basic example of building a classification pipeline to predict customer churn. In this pipeline, we’ll:

  1. Load the Data: Import the dataset into Spark for processing.
  2. Preprocess the Data: Clean and prepare the data for modeling.
  3. Set Up the Model: Configure the logistic regression model.
  4. Train the Model: Fit a machine learning model to the data.
  5. Evaluate the Model: Check how well the model performs.

 

Initialize Spark Session and Load Dataset

 
First, we use SparkSession.builder to create a Spark session. Then, we load the customer churn dataset, which describes bank customers and records whether each one has closed their account.

from pyspark.sql import SparkSession

# Initialize Spark session
spark = SparkSession.builder.appName("MLPipeline").getOrCreate()

# Load dataset
data = spark.read.csv("/content/Customer Churn.csv", header=True, inferSchema=True)

# Show the first few rows of the dataset
data.show(5)

 

 

Data Preprocessing

 
First, we check the data for missing values. If any are present, we remove those rows so that the data is complete. Next, we convert categorical columns into a numerical format that the model can use, with StringIndexer followed by OneHotEncoder. Finally, we combine all the features into a single vector and scale it.

from pyspark.sql import functions as F
from pyspark.ml.feature import StringIndexer, OneHotEncoder, VectorAssembler, StandardScaler

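# Check for missing values: count nulls in every column, plus NaNs in float/double columns
# (isnan only applies to floating-point columns, so we restrict it to those)
float_columns = {name for name, dtype in data.dtypes if dtype in ("float", "double")}

def is_missing(c):
    return (F.col(c).isNull() | F.isnan(c)) if c in float_columns else F.col(c).isNull()

missing_values = data.select([F.count(F.when(is_missing(c), c)).alias(c) for c in data.columns])
missing_values.show()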

# Drop rows with any missing values
data = data.na.drop()  

# Identify categorical columns
categorical_columns = ['country', 'gender', 'credit_card', 'active_member']

# Create a list to hold the stages of the pipeline
stages = []

# Apply StringIndexer to convert categorical columns to numerical indices
for column in categorical_columns:
    indexer = StringIndexer(inputCol=column, outputCol=column + "_index")
    stages.append(indexer)

    # Apply OneHotEncoder for categorical features
    encoder = OneHotEncoder(inputCols=[column + "_index"], outputCols=[column + "_ohe"])
    stages.append(encoder)

label_column = 'churn'  # The label column
feature_columns = [column + "_ohe" for column in categorical_columns]

# Add numerical columns to the features list
numerical_columns = ['credit_score', 'age', 'tenure', 'balance', 'products_number', 'estimated_salary']
feature_columns += numerical_columns

# Create VectorAssembler to combine all feature columns
vector_assembler = VectorAssembler(inputCols=feature_columns, outputCol="features")
stages.append(vector_assembler)

# Scale the features using StandardScaler
scaler = StandardScaler(inputCol="features", outputCol="scaled_features", withMean=True, withStd=True)
stages.append(scaler)
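To see what the indexing and encoding stages produce, here is a plain-Python sketch (not Spark code). It assumes Spark's documented defaults: StringIndexer orders labels by descending frequency, so the most frequent label gets index 0, and OneHotEncoder drops the last category (dropLast=True), so that category is encoded as all zeros. The helper names and the sample values are hypothetical.

```python
from collections import Counter

def fit_string_index(values):
    """Map each label to an index; the most frequent label gets 0 (ties broken alphabetically)."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: index for index, label in enumerate(ordered)}

def one_hot(index, num_categories, drop_last=True):
    """Encode an index as a 0/1 list; with drop_last the final category is all zeros."""
    size = num_categories - 1 if drop_last else num_categories
    return [1.0 if position == index else 0.0 for position in range(size)]

countries = ["France", "Spain", "France", "Germany", "France", "Spain"]  # made-up sample
mapping = fit_string_index(countries)
print(mapping)
for country in ["France", "Spain", "Germany"]:
    print(country, one_hot(mapping[country], len(mapping)))
```

Dropping the last category avoids a redundant column: if every other indicator is 0, the row must belong to the remaining category. Spark stores the encoded result as a sparse vector rather than a Python list, but the values follow the same pattern.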

 

Logistic Regression Model Setup

 
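We import LogisticRegression from pyspark.ml.classification and Pipeline from pyspark.ml. We then create the logistic regression model, point it at the scaled features and the label column, add it as the final stage, and wrap all the stages in a Pipeline.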

from pyspark.ml.classification import LogisticRegression
from pyspark.ml import Pipeline

# Logistic Regression Model
lr = LogisticRegression(featuresCol="scaled_features", labelCol=label_column)
stages.append(lr)

# Create the pipeline (nothing runs until fit() is called)
pipeline = Pipeline(stages=stages)
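For intuition about what the final stage computes, the sketch below shows how a fitted binary logistic regression turns one feature vector into a probability and then a 0/1 prediction. It is plain Python, not Spark code. The weights, intercept, and feature values are invented, and the 0.5 cutoff mirrors the default threshold of Spark's LogisticRegression.

```python
import math

def predict_proba(features, weights, intercept):
    """Probability of the positive class: sigmoid of the weighted sum."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def predict(features, weights, intercept, threshold=0.5):
    """Return 1.0 if the probability exceeds the threshold, else 0.0."""
    return 1.0 if predict_proba(features, weights, intercept) > threshold else 0.0

weights = [0.8, -0.5]   # hypothetical learned coefficients
intercept = -0.2        # hypothetical learned intercept
customer = [1.5, 0.4]   # hypothetical scaled feature values

probability = predict_proba(customer, weights, intercept)
print(round(probability, 3), predict(customer, weights, intercept))
```

Training consists of finding the weights and intercept that make these probabilities match the observed churn labels as closely as possible. Scaling the features beforehand keeps the size of each weight comparable across features.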

 

Model Training and Predictions

 
We split the dataset into training and testing sets. Then, we fit the pipeline model to the training data and make predictions on the test data.

# Split data into training and testing sets
train_data, test_data = data.randomSplit([0.8, 0.2], seed=42)

# Fit the model
pipeline_model = pipeline.fit(train_data)

# Make Predictions
predictions = pipeline_model.transform(test_data)

# Show the predictions
predictions.select("prediction", label_column, "scaled_features").show(10)
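One detail about the split is worth knowing: randomSplit assigns rows to the splits at random, so the 80/20 proportions hold only approximately, and the seed makes the assignment repeatable. The plain-Python sketch below mimics that idea with one seeded random draw per row. It is an illustration under that assumption, not Spark's sampling code, and random_split is a hypothetical helper.

```python
import random

def random_split(rows, train_fraction, seed):
    """Send each row to train or test based on a seeded random draw."""
    rng = random.Random(seed)
    train, test = [], []
    for row in rows:
        (train if rng.random() < train_fraction else test).append(row)
    return train, test

rows = list(range(1000))  # stand-in for 1,000 customer records
train, test = random_split(rows, train_fraction=0.8, seed=42)
print(len(train), len(test))  # close to 800 / 200, but usually not exact

# The same seed reproduces the same split
assert random_split(rows, 0.8, seed=42) == (train, test)
```

If the exact sizes matter, it is safer to count the rows in each split than to assume an exact 80/20 division.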

 
 

Model Evaluation

 
We import MulticlassClassificationEvaluator from pyspark.ml.evaluation to evaluate the model. Using the predictions on the test set, we compute accuracy, weighted precision, weighted recall, and F1 score. The weighted metrics average the per-class scores, weighting each class by how often it appears in the true labels. Finally, we stop the Spark session to free up resources.

from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# Accuracy
evaluator_accuracy = MulticlassClassificationEvaluator(labelCol=label_column, predictionCol="prediction", metricName="accuracy")
accuracy = evaluator_accuracy.evaluate(predictions)
print(f"Accuracy: {accuracy}")

# Precision
evaluator_precision = MulticlassClassificationEvaluator(labelCol=label_column, predictionCol="prediction", metricName="weightedPrecision")
precision = evaluator_precision.evaluate(predictions)
print(f"Precision: {precision}")

# Recall
evaluator_recall = MulticlassClassificationEvaluator(labelCol=label_column, predictionCol="prediction", metricName="weightedRecall")
recall = evaluator_recall.evaluate(predictions)
print(f"Recall: {recall}")

# F1 Score
evaluator_f1 = MulticlassClassificationEvaluator(labelCol=label_column, predictionCol="prediction", metricName="f1")
f1_score = evaluator_f1.evaluate(predictions)
print(f"F1 Score: {f1_score}")

# Stop Spark session
spark.stop()
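To make the four metrics concrete, the sketch below computes them in plain Python from label and prediction lists. It follows the definitions behind Spark's weightedPrecision, weightedRecall, and f1 metric names: each is a per-class score averaged with weights proportional to how often the class appears among the true labels. The function name and the sample labels are hypothetical.

```python
from collections import Counter

def weighted_metrics(labels, predictions):
    """Return accuracy and support-weighted precision, recall, and F1."""
    n = len(labels)
    support = Counter(labels)           # true count per class
    predicted = Counter(predictions)    # predicted count per class
    correct = Counter(l for l, p in zip(labels, predictions) if l == p)

    accuracy = sum(correct.values()) / n
    precision = recall = f1 = 0.0
    for cls, count in support.items():
        p = correct[cls] / predicted[cls] if predicted[cls] else 0.0
        r = correct[cls] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = count / n              # share of this class among true labels
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

labels      = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # made-up ground truth: 8 stayed, 2 churned
predictions = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # the model misses one churner
metrics = weighted_metrics(labels, predictions)
print(metrics)
```

Two things stand out in this toy example. Weighted recall always equals accuracy, because the class weights cancel the per-class denominators. And on imbalanced data the weighted scores are dominated by the majority class: here accuracy is 0.9 even though only half of the churners were caught, so the recall of the churn class itself is worth checking separately.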

 

 

Conclusion

 
In this article, we learned about machine learning pipelines in Apache Spark. Pipelines organize each step of the ML process into a single, repeatable workflow. We started by loading and cleaning the customer churn dataset. Then, we defined the preprocessing stages and a logistic regression model and combined them into a pipeline. After fitting the pipeline on the training data, we made predictions on the held-out test data. Finally, we evaluated the model’s performance using accuracy, precision, recall, and F1 score.
 
 

Jayita Gulati is a machine learning enthusiast and technical writer driven by her passion for building machine learning models. She holds a Master’s degree in Computer Science from the University of Liverpool.
