Techcratic

Implementing Machine Learning Pipelines with Apache Spark

AI by AI
June 3, 2025

Jayita Gulati
2025-06-03 08:00:00
www.kdnuggets.com

Machine Learning Pipelines with Apache Spark
Image by Editor (Kanwal Mehreen) | Canva

 

Apache Spark is an open-source engine for processing big data. It distributes work across the machines in a cluster and keeps data in memory where it can, which makes it fast, and it can handle datasets too large to fit in a single computer's memory. A machine learning pipeline is a series of steps to prepare data and train models. These steps include collecting data, cleaning it, selecting important features, training the model, and checking how well it works.

Spark makes it easy to build these pipelines. With Spark, companies can quickly analyze large amounts of data and create machine learning models. This helps them make better decisions based on the information they have. In this article, we will explain how to set up and use machine learning pipelines in Spark.

 

Components of a Machine Learning Pipeline in Spark

 
Spark's MLlib library provides built-in, DataFrame-based building blocks in the pyspark.ml package. These building blocks can be linked together to form a complete machine learning process.

 

Transformers

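Transformers convert one DataFrame into another. Their transform() method takes a DataFrame and returns a new one, usually with one or more columns appended. These are used for tasks like assembling feature vectors or applying an encoding or scaling that has already been learned. Examples include VectorAssembler (for combining columns into one vector) and the fitted models that feature estimators return, such as StringIndexerModel (for encoding) and StandardScalerModel (for scaling). Transformers are reusable, and because Spark DataFrames are immutable, they never modify the original data.

 

Estimators

Estimators learn from data to create models. They include algorithms like LogisticRegression and RandomForestClassifier, as well as feature stages such as StringIndexer and StandardScaler, which must see the data before they can encode or scale it. Estimators use a fit method to train on data, and they output a Model object. That Model is itself a Transformer, so it can make predictions or apply the learned transformation.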

 

Pipeline

A Pipeline chains transformers and estimators into a single workflow. The stages run in sequence, so the columns produced by one step are available to the next. A Pipeline is itself an Estimator: calling fit() on it returns a PipelineModel that applies every fitted stage in order. Pipelines make it easy to retrain models, repeat processes, and adjust parameters.
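The fit/transform contract described above can be sketched in plain Python. This is only an illustration of the idea, not Spark's implementation: the classes MeanCenterer, MeanCentererModel, Doubler, ToyPipeline, and ToyPipelineModel are hypothetical names invented here, and plain lists stand in for DataFrames.

```python
from statistics import mean


class MeanCentererModel:
    """Toy 'Transformer': applies a mean that was learned earlier."""

    def __init__(self, center):
        self.center = center

    def transform(self, rows):
        # Return a new list; the input is left untouched
        return [x - self.center for x in rows]


class MeanCenterer:
    """Toy 'Estimator': learns the mean and returns a fitted model."""

    def fit(self, rows):
        return MeanCentererModel(mean(rows))


class Doubler:
    """Toy stateless 'Transformer': there is nothing to learn."""

    def transform(self, rows):
        return [2 * x for x in rows]


class ToyPipelineModel:
    """Fitted pipeline: every stage is now a transformer."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class ToyPipeline:
    """Fits estimators in order, feeding each stage the previous output."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)  # estimator -> fitted model
            rows = stage.transform(rows)
            fitted.append(stage)
        return ToyPipelineModel(fitted)


train = [1.0, 2.0, 3.0]
model = ToyPipeline([Doubler(), MeanCenterer()]).fit(train)
train_out = model.transform(train)   # centered with the training mean
new_out = model.transform([10.0])    # same learned mean applied to new data
```

The point of the sketch is that the mean is learned once during fit() and then reused unchanged on new data, which is the same reason a fitted pipeline can be applied to a test set without leaking information from it.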

Let’s go through a basic example of building a classification pipeline to predict customer churn. In this pipeline, we’ll:

  1. Load the Data: Import the dataset into Spark for processing.
  2. Preprocess the Data: Clean and prepare the data for modeling.
  3. Set Up the Model: Configure the logistic regression model.
  4. Train the Model: Fit a machine learning model to the data.
  5. Evaluate the Model: Check how well the model performs.

 

Initialize Spark Session and Load Dataset

 
First, we use SparkSession.builder to set up the session. Then, we load the customer churn dataset, which describes bank customers and records whether each one has closed their account.

from pyspark.sql import SparkSession

# Initialize Spark session
spark = SparkSession.builder.appName("MLPipeline").getOrCreate()

# Load dataset
data = spark.read.csv("/content/Customer Churn.csv", header=True, inferSchema=True)

# Show the first few rows of the dataset
data.show(5)

 

 

Data Preprocessing

 
First, we count the missing values in each column. If there are missing values, we remove those rows to make sure the data is complete. Next, we convert categorical data into numerical format, because MLlib algorithms operate on numeric feature vectors. We do this using StringIndexer and OneHotEncoder. Finally, we combine all the features into a single vector with VectorAssembler and scale it with StandardScaler. Each of these steps is appended to a list of stages rather than applied immediately.
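To make the two encoding steps concrete, here is a plain-Python sketch of what StringIndexer and OneHotEncoder do to a single column under their default settings. The sample values are made up, and the code only mimics the behavior; Spark itself stores the indices as doubles and the encoded values as sparse vectors.

```python
from collections import Counter

# Made-up sample of a categorical column
countries = ["France", "Spain", "France", "Germany", "France", "Spain"]

# StringIndexer's default ordering: the most frequent label gets index 0
counts = Counter(countries)
labels = sorted(counts, key=lambda label: (-counts[label], label))
index_of = {label: i for i, label in enumerate(labels)}
indexed = [index_of[c] for c in countries]


def one_hot(index, num_categories):
    """Mimic OneHotEncoder's default dropLast=True.

    The vector has num_categories - 1 slots, and the last
    category is represented by all zeros.
    """
    vector = [0.0] * (num_categories - 1)
    if index < num_categories - 1:
        vector[index] = 1.0
    return vector


encoded = [one_hot(i, len(labels)) for i in indexed]
```

Here France (3 rows) gets index 0, Spain (2 rows) index 1, and Germany (1 row) index 2, so Germany is encoded as the all-zero vector. Dropping the last category keeps the encoded columns from summing to one, which would otherwise make them linearly dependent.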

from pyspark.sql import functions as F
from pyspark.ml.feature import StringIndexer, OneHotEncoder, VectorAssembler, StandardScaler

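# Count missing values per column.
# NaN checks only make sense for float/double columns, so restrict isnan to those.
float_columns = {name for name, dtype in data.dtypes if dtype in ("float", "double")}
missing_values = data.select([
    F.count(F.when(F.col(c).isNull() | (F.isnan(c) if c in float_columns else F.lit(False)), c)).alias(c)
    for c in data.columns
])
missing_values.show()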

# Drop rows with any missing values
data = data.na.drop()  

# Identify categorical columns
categorical_columns = ['country', 'gender', 'credit_card', 'active_member']

# Create a list to hold the stages of the pipeline
stages = []

# Apply StringIndexer to convert categorical columns to numerical indices
for column in categorical_columns:
    indexer = StringIndexer(inputCol=column, outputCol=column + "_index")
    stages.append(indexer)

    # Apply OneHotEncoder for categorical features
    encoder = OneHotEncoder(inputCols=[column + "_index"], outputCols=[column + "_ohe"])
    stages.append(encoder)

label_column = 'churn'  # The label column
feature_columns = [column + "_ohe" for column in categorical_columns]

# Add numerical columns to the features list
numerical_columns = ['credit_score', 'age', 'tenure', 'balance', 'products_number', 'estimated_salary']
feature_columns += numerical_columns

# Create VectorAssembler to combine all feature columns
vector_assembler = VectorAssembler(inputCols=feature_columns, outputCol="features")
stages.append(vector_assembler)

# Scale the features using StandardScaler
scaler = StandardScaler(inputCol="features", outputCol="scaled_features", withMean=True, withStd=True)
stages.append(scaler)
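The scaling step with withMean=True and withStd=True amounts to standardization: subtract the mean, then divide by the standard deviation. The plain-Python sketch below shows the arithmetic on a made-up column of ages. It assumes the corrected sample standard deviation (dividing by n - 1), which is what Spark's StandardScaler documentation describes.

```python
from statistics import mean, stdev

# Made-up training values for one numeric feature
ages = [30.0, 40.0, 50.0]

mu = mean(ages)      # statistic learned during fit()
sigma = stdev(ages)  # sample standard deviation (n - 1 in the denominator)


def standardize(value):
    """Scale a value using the statistics learned from the training data."""
    return (value - mu) / sigma


scaled = [standardize(x) for x in ages]
scaled_new = standardize(45.0)  # a new row reuses the training statistics
```

For this column the mean is 40 and the standard deviation is 10, so the three ages map to -1, 0, and 1, and a new customer aged 45 maps to 0.5.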

 

Logistic Regression Model Setup

 
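We import LogisticRegression from pyspark.ml.classification and create the model with LogisticRegression(), pointing it at the scaled feature vector and the label column. The model becomes the final stage, and the full list of stages is wrapped in a Pipeline.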

from pyspark.ml.classification import LogisticRegression
from pyspark.ml import Pipeline

# Logistic Regression Model
lr = LogisticRegression(featuresCol="scaled_features", labelCol=label_column)
stages.append(lr)

# Create the Pipeline (nothing runs until fit() is called)
pipeline = Pipeline(stages=stages)
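As a reminder of what the logistic regression stage computes once it is trained, the sketch below scores one row by hand. The weights, intercept, and feature values are made-up numbers for illustration, not values learned from the churn data, and the helper name predict_churn_probability is hypothetical. The 0.5 cutoff matches the default threshold of Spark's LogisticRegression.

```python
import math


def predict_churn_probability(features, weights, intercept):
    """Binary logistic regression: sigmoid of a weighted sum of features."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients and one row of already-scaled features
weights = [0.8, -0.5]
intercept = -0.2
features = [1.0, 0.5]

probability = predict_churn_probability(features, weights, intercept)

# Predict churn (1.0) when the probability exceeds the threshold
prediction = 1.0 if probability > 0.5 else 0.0
```

With these numbers the weighted sum is 0.35, which the sigmoid maps to a probability a little under 0.59, so the row is classified as churn.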

 

Model Training and Predictions

 
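We split the dataset into training and testing sets, using roughly 80% of the rows for training and 20% for testing. Then, we fit the pipeline to the training data, which trains every stage in order, and use the resulting PipelineModel to make predictions on the test data.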

# Split data into training and testing sets
train_data, test_data = data.randomSplit([0.8, 0.2], seed=42)

# Fit the model
pipeline_model = pipeline.fit(train_data)

# Make Predictions
predictions = pipeline_model.transform(test_data)

# Show the predictions
predictions.select("prediction", label_column, "scaled_features").show(10)
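One detail of the split is worth knowing: randomSplit assigns rows by random sampling, so the 80/20 proportions are approximate rather than exact. The plain-Python sketch below illustrates the idea only. The helper random_split is a hypothetical stand-in, and Spark's real implementation works partition by partition on a distributed DataFrame.

```python
import random


def random_split(rows, train_fraction, seed):
    """Assign each row independently to train or test with a seeded RNG."""
    rng = random.Random(seed)
    train, test = [], []
    for row in rows:
        if rng.random() < train_fraction:
            train.append(row)
        else:
            test.append(row)
    return train, test


rows = list(range(1000))
train, test = random_split(rows, 0.8, seed=42)

# The same seed reproduces the same split
train_again, test_again = random_split(rows, 0.8, seed=42)
```

Every row lands in exactly one of the two sets, and the training set holds about 800 of the 1,000 rows, though usually not exactly 800. Fixing the seed is what makes the experiment repeatable.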

 
 

Model Evaluation

 
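We import MulticlassClassificationEvaluator from pyspark.ml.evaluation to evaluate our model's performance. We calculate the accuracy, weighted precision, weighted recall, and F1 score using the predictions from our model. The weighted metrics average the per-class values, weighting each class by its share of the true labels, so they are not the same as the precision and recall of the churn class alone. Finally, we stop the Spark session to free up resources.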

from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# Accuracy
evaluator_accuracy = MulticlassClassificationEvaluator(labelCol=label_column, predictionCol="prediction", metricName="accuracy")
accuracy = evaluator_accuracy.evaluate(predictions)
print(f"Accuracy: {accuracy}")

# Precision
evaluator_precision = MulticlassClassificationEvaluator(labelCol=label_column, predictionCol="prediction", metricName="weightedPrecision")
precision = evaluator_precision.evaluate(predictions)
print(f"Precision: {precision}")

# Recall
evaluator_recall = MulticlassClassificationEvaluator(labelCol=label_column, predictionCol="prediction", metricName="weightedRecall")
recall = evaluator_recall.evaluate(predictions)
print(f"Recall: {recall}")

# F1 Score
evaluator_f1 = MulticlassClassificationEvaluator(labelCol=label_column, predictionCol="prediction", metricName="f1")
f1_score = evaluator_f1.evaluate(predictions)
print(f"F1 Score: {f1_score}")

# Stop Spark session
spark.stop()
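To show what the weightedPrecision and weightedRecall metrics measure, the sketch below computes them by hand in plain Python. The labels and predictions are made up for illustration, and the helper weighted_metrics is a hypothetical name; it follows the usual definition in which each class's precision and recall is weighted by that class's share of the true labels.

```python
from collections import Counter

# Made-up ground truth and predictions (1 = churned)
labels      = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
predictions = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]


def weighted_metrics(labels, predictions):
    """Average per-class precision and recall, weighted by class support."""
    support = Counter(labels)
    total = len(labels)
    weighted_precision = 0.0
    weighted_recall = 0.0
    for cls, count in support.items():
        true_pos = sum(1 for y, p in zip(labels, predictions) if y == cls and p == cls)
        predicted = sum(1 for p in predictions if p == cls)
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / count
        weight = count / total
        weighted_precision += weight * precision
        weighted_recall += weight * recall
    return weighted_precision, weighted_recall


weighted_precision, weighted_recall = weighted_metrics(labels, predictions)
accuracy = sum(1 for y, p in zip(labels, predictions) if y == p) / len(labels)

# Recall for the churn class only, for comparison
churn_recall = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1) / labels.count(1)
```

In this example the weighted recall is 0.6, the same as the accuracy, while recall on the churn class alone is only 0.25. On an imbalanced dataset the weighted numbers can therefore look healthy even when the model misses most churners. In Spark 3.0 and later, MulticlassClassificationEvaluator also accepts metrics such as recallByLabel together with the metricLabel parameter for inspecting a single class.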

 

 

Conclusion

 
In this article, we learned about machine learning pipelines in Apache Spark. Pipelines help organize each step of the ML process. We started by loading and cleaning the customer churn dataset. Then, we defined the stages that transform the data and added a logistic regression model. After training the pipeline, we made predictions on the held-out test data. Finally, we evaluated the model's performance using accuracy, weighted precision, weighted recall, and F1 score.
 
 

Jayita Gulati is a machine learning enthusiast and technical writer driven by her passion for building machine learning models. She holds a Master’s degree in Computer Science from the University of Liverpool.
