Implementing Machine Learning Pipelines with Apache Spark

Jayita Gulati
2025-06-03 08:00:00
www.kdnuggets.com

Machine Learning Pipelines with Apache Spark
Image by Editor (Kanwal Mehreen) | Canva

 

Apache Spark is an open-source engine for large-scale data processing. It is free to use, and because it distributes work across a cluster of machines, it can process datasets that are too large to fit in a single computer's memory. A machine learning pipeline is a sequence of steps that prepares data and trains models. These steps typically include collecting data, cleaning it, selecting important features, training a model, and evaluating how well it performs.

Spark makes these pipelines straightforward to build. With Spark, teams can analyze large datasets and train machine learning models on them, which helps them make decisions based on the data they have. In this article, we explain how to set up and use machine learning pipelines in Spark.

 

Components of a Machine Learning Pipeline in Spark

 
Spark's MLlib library provides many built-in components. These components can be chained together to build a complete machine learning workflow. The DataFrame-based API used in this article lives in the pyspark.ml package.

 

Transformers

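Transformers change data in some way. A transformer's transform() method takes a DataFrame and returns a new DataFrame, usually with one or more columns appended. Examples include VectorAssembler, which combines several columns into a single feature vector, and any trained model, which appends prediction columns. Because Spark DataFrames are immutable, a transformer never modifies its input in place, so the same transformer can be reused on different datasets. Feature tools that first need to learn something from the data, such as StringIndexer (which learns the set of category labels) and StandardScaler (which learns each feature's mean and standard deviation), are estimators rather than transformers: fitting them produces the transformers StringIndexerModel and StandardScalerModel.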
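To make the transformer contract concrete, here is a plain-Python toy that mimics how a VectorAssembler-style transformer behaves. It is not Spark code: rows are modeled as a list of dicts, and the class name `ToyVectorAssembler` is hypothetical. The point it illustrates is that `transform` returns new rows with an added column and leaves its input untouched.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


class ToyVectorAssembler:
    """Hypothetical stand-in for a transformer: nothing to fit, only transform()."""

    def __init__(self, input_cols: List[str], output_col: str) -> None:
        self.input_cols = input_cols
        self.output_col = output_col

    def transform(self, rows: List[Row]) -> List[Row]:
        # Build new rows instead of mutating the input, mirroring how
        # Spark transformers return a new (immutable) DataFrame.
        return [
            {**row, self.output_col: [float(row[c]) for c in self.input_cols]}
            for row in rows
        ]


original = [{"age": 42, "tenure": 2}, {"age": 35, "tenure": 8}]
assembled = ToyVectorAssembler(["age", "tenure"], "features").transform(original)
print(assembled[0]["features"])   # [42.0, 2.0]
print("features" in original[0])  # False: the input rows are unchanged
```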

 

Estimators

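Estimators learn from data. They include learning algorithms like LogisticRegression and RandomForestClassifier, as well as data-dependent feature tools such as StringIndexer, OneHotEncoder, and StandardScaler. An estimator's fit() method trains on a DataFrame and returns a Model object, which is itself a transformer that can make predictions on, or otherwise transform, new data.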
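The estimator/model split can be sketched the same way. Below, a hypothetical `ToyStandardScaler` plays the role of an estimator: `fit` learns a mean and standard deviation and returns a separate model object that only knows how to `transform`. It centers and scales, like the withMean=True, withStd=True settings used later in this article, and uses the sample (n - 1) standard deviation; otherwise it is a simplified single-column illustration, not Spark's implementation.

```python
import statistics
from typing import List


class ToyStandardScalerModel:
    """The fitted 'model': a transformer holding the learned statistics."""

    def __init__(self, mean: float, std: float) -> None:
        self.mean = mean
        self.std = std

    def transform(self, values: List[float]) -> List[float]:
        # Standardize using the statistics learned during fit().
        return [(v - self.mean) / self.std for v in values]


class ToyStandardScaler:
    """The 'estimator': it has fit(), which returns a model with transform()."""

    def fit(self, values: List[float]) -> ToyStandardScalerModel:
        mean = statistics.mean(values)
        std = statistics.stdev(values)  # sample standard deviation (n - 1)
        return ToyStandardScalerModel(mean, std)


train_ages = [30.0, 40.0, 50.0]
model = ToyStandardScaler().fit(train_ages)
print(model.mean, model.std)          # 40.0 10.0
print(model.transform([40.0, 60.0]))  # [0.0, 2.0]
```

The statistics come only from the data passed to `fit`, which is why, in a real pipeline, fitting on the training split keeps test data from influencing the scaling.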

 

Pipeline

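A Pipeline connects transformers and estimators into a single workflow. Its stages run in sequence, so data flows from one step to the next. Calling fit() on a Pipeline fits each estimator stage on the output of the stages before it and returns a PipelineModel whose stages are all transformers. Pipelines make it easy to retrain models, repeat the same processing on new data, and adjust parameters.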
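A toy version of the pipeline logic shows why stage order matters. This is a simplified plain-Python reading of how a pipeline's fit step behaves, not Spark's source: each estimator stage is fit on the output of the earlier stages, its fitted model (or a plain transformer) is applied before the next stage, and the fitted stages are collected into a pipeline model. All class names here are hypothetical.

```python
from typing import List


class AddOne:
    """A plain transformer: nothing to learn."""

    def transform(self, xs: List[float]) -> List[float]:
        return [x + 1 for x in xs]


class CenterModel:
    def __init__(self, mean: float) -> None:
        self.mean = mean

    def transform(self, xs: List[float]) -> List[float]:
        return [x - self.mean for x in xs]


class Center:
    """An estimator: fit() learns the mean and returns a CenterModel."""

    def fit(self, xs: List[float]) -> CenterModel:
        return CenterModel(sum(xs) / len(xs))


class ToyPipelineModel:
    def __init__(self, stages: list) -> None:
        self.stages = stages  # every fitted stage is a transformer

    def transform(self, xs: List[float]) -> List[float]:
        for stage in self.stages:
            xs = stage.transform(xs)
        return xs


class ToyPipeline:
    def __init__(self, stages: list) -> None:
        self.stages = stages

    def fit(self, xs: List[float]) -> ToyPipelineModel:
        fitted = []
        for stage in self.stages:
            # Estimators are fit on the data as already transformed by earlier stages.
            if hasattr(stage, "fit"):
                stage = stage.fit(xs)
            fitted.append(stage)
            xs = stage.transform(xs)
        return ToyPipelineModel(fitted)


model = ToyPipeline([AddOne(), Center()]).fit([1.0, 2.0, 3.0])
print(model.stages[1].mean)     # 3.0, learned after AddOne ran
print(model.transform([10.0]))  # [8.0]
```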

Let’s go through a basic example of building a classification pipeline to predict customer churn. In this pipeline, we’ll:

  1. Load the Data: Import the dataset into Spark for processing.
  2. Preprocess the Data: Clean and prepare the data for modeling.
  3. Set Up the Model: Configure the logistic regression model.
  4. Train the Model: Fit a machine learning model to the data.
  5. Evaluate the Model: Check how well the model performs.

 

Initialize Spark Session and Load Dataset

 
First, we use SparkSession.builder to create (or reuse) a Spark session. Then, we load the customer churn dataset, which describes bank customers and records whether each one closed their account (the churn column).

from pyspark.sql import SparkSession

# Initialize Spark session
spark = SparkSession.builder.appName("MLPipeline").getOrCreate()

# Load dataset
data = spark.read.csv("/content/Customer Churn.csv", header=True, inferSchema=True)

# Show the first few rows of the dataset
data.show(5)

 

 

Data Preprocessing

 
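First, we check each column for missing values; if any rows contain them, we drop those rows so the data is complete. Next, we convert the categorical columns into a numerical format, because MLlib models expect numeric feature vectors. We do this with StringIndexer, which maps each category to an index, and OneHotEncoder, which turns each index into a binary vector. Finally, we combine all the features into a single vector with VectorAssembler and standardize it with StandardScaler. These steps are only collected in the stages list here; they run when the pipeline is fit.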

from pyspark.sql import functions as F
from pyspark.ml.feature import StringIndexer, OneHotEncoder, VectorAssembler, StandardScaler

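# Count missing values per column; isnan() only applies to float/double columns
missing_values = data.select([
    F.count(
        F.when(F.col(c).isNull() | (F.isnan(c) if t in ("float", "double") else F.lit(False)), c)
    ).alias(c)
    for c, t in data.dtypes
])
missing_values.show()

# Drop rows with any missing values
data = data.na.drop()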

# Identify categorical columns
categorical_columns = ['country', 'gender', 'credit_card', 'active_member']

# Create a list to hold the stages of the pipeline
stages = []

# Apply StringIndexer to convert categorical columns to numerical indices
for column in categorical_columns:
    indexer = StringIndexer(inputCol=column, outputCol=column + "_index")
    stages.append(indexer)

    # Apply OneHotEncoder for categorical features
    encoder = OneHotEncoder(inputCols=[column + "_index"], outputCols=[column + "_ohe"])
    stages.append(encoder)

label_column = 'churn'  # The label column
feature_columns = [column + "_ohe" for column in categorical_columns]

# Add numerical columns to the features list
numerical_columns = ['credit_score', 'age', 'tenure', 'balance', 'products_number', 'estimated_salary']
feature_columns += numerical_columns

# Create VectorAssembler to combine all feature columns
vector_assembler = VectorAssembler(inputCols=feature_columns, outputCol="features")
stages.append(vector_assembler)

# Scale the features using StandardScaler
scaler = StandardScaler(inputCol="features", outputCol="scaled_features", withMean=True, withStd=True)
stages.append(scaler)
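To see what the StringIndexer and OneHotEncoder stages produce for a column like `country`, here is a plain-Python sketch of their default behavior as described in the Spark documentation: categories are indexed by descending frequency (ties broken alphabetically in recent Spark versions), and one-hot encoding drops the last category (dropLast=True), so that category becomes an all-zero vector. The helper names are hypothetical, the country values are made-up sample data, and Spark actually emits sparse vectors rather than Python lists.

```python
from collections import Counter
from typing import Dict, List


def fit_string_indexer(values: List[str]) -> Dict[str, int]:
    # Most frequent label gets index 0; ties are ordered alphabetically.
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: i for i, label in enumerate(ordered)}


def one_hot_drop_last(index: int, num_categories: int) -> List[float]:
    # With dropLast=True the vector has num_categories - 1 slots,
    # and the last category maps to all zeros.
    vec = [0.0] * (num_categories - 1)
    if index < num_categories - 1:
        vec[index] = 1.0
    return vec


countries = ["France", "Spain", "France", "Germany", "France", "Spain"]
labels = fit_string_indexer(countries)
print(labels)  # {'France': 0, 'Spain': 1, 'Germany': 2}
for country in ["France", "Spain", "Germany"]:
    print(country, one_hot_drop_last(labels[country], len(labels)))
# France [1.0, 0.0]
# Spain [0.0, 1.0]
# Germany [0.0, 0.0]
```

Dropping the last category avoids a redundant column, since its value is implied when all the other slots are zero.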

 

Logistic Regression Model Setup

 
We import LogisticRegression from pyspark.ml.classification and create a logistic regression model that reads the scaled feature vector and the churn label. We then append it as the final stage and wrap all the stages in a Pipeline.

from pyspark.ml.classification import LogisticRegression
from pyspark.ml import Pipeline

# Logistic Regression Model
lr = LogisticRegression(featuresCol="scaled_features", labelCol=label_column)
stages.append(lr)

# Chain all stages into a single Pipeline (nothing runs until fit() is called)
pipeline = Pipeline(stages=stages)
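Once trained, a binary logistic regression model turns each scaled feature vector into a churn probability by applying the sigmoid function to a weighted sum of the features, then compares that probability with a threshold (0.5 by default in Spark's LogisticRegression). The sketch below shows only that scoring step in plain Python; the function name is hypothetical, and the weights and intercept are made-up illustrative numbers, not values learned from the churn data.

```python
import math
from typing import List, Tuple


def predict_churn(features: List[float], weights: List[float], intercept: float,
                  threshold: float = 0.5) -> Tuple[float, float]:
    # Linear score (the "margin"), then the sigmoid to get P(churn = 1).
    margin = intercept + sum(w * x for w, x in zip(weights, features))
    probability = 1.0 / (1.0 + math.exp(-margin))
    # Predict churn (1.0) when the probability exceeds the threshold.
    prediction = 1.0 if probability > threshold else 0.0
    return probability, prediction


# Hypothetical coefficients for two scaled features.
weights = [0.8, 0.3]
intercept = -1.5
print(predict_churn([2.0, 1.0], weights, intercept))  # margin 0.4: probability about 0.599, prediction 1.0
print(predict_churn([0.0, 0.0], weights, intercept))  # margin -1.5: probability about 0.182, prediction 0.0
```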

 

Model Training and Predictions

 
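We split the dataset into training (about 80%) and testing (about 20%) sets. Then, we fit the pipeline on the training data, which returns a PipelineModel, and use that model to make predictions on the test data. Because the indexers, encoders, and scaler are fit inside the pipeline, they learn only from the training data.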

# Split data into training and testing sets
train_data, test_data = data.randomSplit([0.8, 0.2], seed=42)

# Fit all pipeline stages on the training data (returns a PipelineModel)
pipeline_model = pipeline.fit(train_data)

# Make Predictions
predictions = pipeline_model.transform(test_data)

# Show the predictions
predictions.select("prediction", label_column, "scaled_features").show(10)
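Note that randomSplit([0.8, 0.2], seed=42) does not guarantee an exact 80/20 split: Spark assigns rows to splits randomly, so the resulting sizes are only approximately proportional to the (normalized) weights, and the fixed seed makes the split reproducible only for the same data and partitioning. The plain-Python sketch below mimics that idea by drawing one uniform number per row; the function name is hypothetical, and it illustrates the behavior rather than Spark's actual sampling algorithm.

```python
import random
from typing import List, Sequence, Tuple


def toy_random_split(rows: Sequence[int], weights: Tuple[float, float],
                     seed: int) -> Tuple[List[int], List[int]]:
    rng = random.Random(seed)
    cutoff = weights[0] / sum(weights)  # normalize the weights
    first: List[int] = []
    second: List[int] = []
    for row in rows:
        # Each row independently lands in the first split with probability `cutoff`.
        (first if rng.random() < cutoff else second).append(row)
    return first, second


train, test = toy_random_split(range(10_000), (0.8, 0.2), seed=42)
print(len(train), len(test))  # close to 8000 / 2000, but generally not exact
```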

 
 

Model Evaluation

 
We import MulticlassClassificationEvaluator from pyspark.ml.evaluation to evaluate the model. Using the test-set predictions, we calculate accuracy, precision, recall, and F1 score. The weightedPrecision, weightedRecall, and f1 metrics average the per-class scores, weighting each class by its number of true instances. Finally, we stop the Spark session to free up resources.

from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# Accuracy
evaluator_accuracy = MulticlassClassificationEvaluator(labelCol=label_column, predictionCol="prediction", metricName="accuracy")
accuracy = evaluator_accuracy.evaluate(predictions)
print(f"Accuracy: {accuracy}")

# Precision
evaluator_precision = MulticlassClassificationEvaluator(labelCol=label_column, predictionCol="prediction", metricName="weightedPrecision")
precision = evaluator_precision.evaluate(predictions)
print(f"Precision: {precision}")

# Recall
evaluator_recall = MulticlassClassificationEvaluator(labelCol=label_column, predictionCol="prediction", metricName="weightedRecall")
recall = evaluator_recall.evaluate(predictions)
print(f"Recall: {recall}")

# F1 Score
evaluator_f1 = MulticlassClassificationEvaluator(labelCol=label_column, predictionCol="prediction", metricName="f1")
f1_score = evaluator_f1.evaluate(predictions)
print(f"F1 Score: {f1_score}")

# Stop Spark session
spark.stop()
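The weighted metrics are easy to misread, so here is a small plain-Python sketch of how they are defined: per-class precision, recall, and F1 are averaged, with each class weighted by its share of the true labels. The function name is hypothetical and the labels below are made-up toy data, not output from the churn model. One consequence worth knowing: weighted recall always equals accuracy, so on an imbalanced churn dataset it adds no information beyond accuracy about how well the churned class is detected.

```python
from typing import Dict, List


def weighted_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    n = len(y_true)
    result = {"accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / n,
              "weightedPrecision": 0.0, "weightedRecall": 0.0, "f1": 0.0}
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        actual = sum(t == label for t in y_true)
        predicted = sum(p == label for p in y_pred)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        weight = actual / n  # this class's share of the true labels
        result["weightedPrecision"] += weight * precision
        result["weightedRecall"] += weight * recall
        result["f1"] += weight * f1
    return result


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]
for name, value in weighted_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.4f}")
# accuracy: 0.8333
# weightedPrecision: 0.8667
# weightedRecall: 0.8333
# f1: 0.8148
```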

 

 

Conclusion

 
In this article, we built a machine learning pipeline in Apache Spark. Pipelines organize each step of the ML process into a single, reusable workflow. We started by loading and cleaning the customer churn dataset. Then, we encoded and scaled the features and created a logistic regression model. After training the model, we made predictions on held-out test data. Finally, we evaluated the model's performance using accuracy, precision, recall, and F1 score.
 
 

Jayita Gulati is a machine learning enthusiast and technical writer driven by her passion for building machine learning models. She holds a Master’s degree in Computer Science from the University of Liverpool.
