Cornellius Yudha Wijaya
2025-05-21 08:00:00
www.kdnuggets.com

Image by Author | Ideogram.ai
CSV, or comma-separated values, is a file format used to store tabular data. Each line represents a data entry, and commas separate the individual fields within the data. It’s one of the most common file extensions for data and one of the simplest formats for data exchange within professional environments.
As data professionals with Python knowledge, most of us have read and loaded data with the csv module at some point. Usually, that's all we do with it: load the data and move on to other tasks.
For example, I read the following CSV file of Social Sentiment Data from Kaggle with the csv module and printed all the columns.
import csv

with open('sentimentdataset.csv', newline='', encoding='utf-8') as csvfile:
    reader = csv.reader(csvfile)
    header = next(reader)
    print("Columns:", header)
The output looks like the following:
Columns: ['', 'Unnamed: 0', 'Text', 'Sentiment', 'Timestamp', 'User', 'Platform', 'Hashtags', 'Retweets', 'Likes', 'Country', 'Year', 'Month', 'Day', 'Hour']
However, there is so much more you can do with the csv module that you might not know. In this article, we will explore all the surprising things you can do with the csv module.
1. Auto-Detect Format
The csv module is intended to work with files in comma-separated format; however, using the Sniffer class, you can detect how the data is actually separated. Sniffer infers the data's structure (its dialect) before you read the file in full.
For example, here is how we try to detect the dialect with the csv module.
import csv

with open('sentimentdataset.csv', newline='', encoding='utf-8') as f:
    sample = f.read(2048)
    dialect = csv.Sniffer().sniff(sample, delimiters=',;\t')
    print(f"Detected delimiter: {repr(dialect.delimiter)}")
Since this file is comma-separated, the detected delimiter is ','. In the code above, we pass Sniffer a sample consisting of the first 2 KB of the file, along with the candidate delimiters to consider; it returns the dialect it detected, including the delimiter used in the file.
2. Header Detection
The csv module can detect not only the file's format but also whether the file contains a header row.
We can do the detection with the following code.
has_header = csv.Sniffer().has_header(sample)
print("Header detected?", has_header)
For a file that begins with a header row, such as this one, the output is: Header detected? True
It seems simple, but there are many cases where a CSV file doesn't contain the headers you need, which means you cannot interpret the data correctly. This check is a great addition to a data pipeline for catching such mistakes when reading files.
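As a sketch of how this check could slot into a pipeline, the hypothetical helper below falls back to a supplied list of column names whenever Sniffer reports that no header row is present (the sample strings here are made up for illustration):

```python
import csv
import io

def read_rows(text, fallback_names):
    """Read CSV text as dictionaries; if Sniffer reports no header
    row, fall back to the supplied column names."""
    if csv.Sniffer().has_header(text[:2048]):
        return list(csv.DictReader(io.StringIO(text)))
    return list(csv.DictReader(io.StringIO(text), fieldnames=fallback_names))

with_header = "Text,Likes\nEnjoying the park,15\nTraffic was bad,30\n"
no_header = "Enjoying the park,15\nTraffic was bad,30\n"

# Both calls yield the same first record, header or not.
print(read_rows(with_header, ["Text", "Likes"])[0])
print(read_rows(no_header, ["Text", "Likes"])[0])
```

This way, downstream code can always rely on `DictReader`-style access regardless of whether the source file shipped with a header.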
3. Reading Data as a List
When we read a file with the csv module, we can structure the result in the format we want. One option is to represent each row as a list, which we can accomplish with the following code.
with open('sentimentdataset.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f, dialect)  # dialect detected in the Sniffer example above
    header = next(reader)
    for i, row in enumerate(reader):
        if i >= 1:
            break
        print(row)
The result is shown in the output below.
['0', '0', ' Enjoying a beautiful day at the park! ', ' Positive ', '2023-01-15 12:30:00', ' User123 ', ' Twitter ', ' #Nature #Park ', '15.0', '30.0', ' USA ', '2023', '1', '15', '12']
You can see that each data row is now presented as a list and can be processed for any further data work.
4. Map Column Names to Values
Using csv.DictReader, we can read each row as a dictionary: each column name is mapped to its corresponding value, allowing us to access fields with the column name as the key.
For example, here is how we could automatically pair the column names with the values of the Text and Sentiment columns:
with open('sentimentdataset.csv', newline='', encoding='utf-8') as f:
    dict_reader = csv.DictReader(f, dialect=dialect)
    for i, row in enumerate(dict_reader):
        if i >= 2:
            break
        print(row['Text'], row['Sentiment'])
The result is shown in the output below.
Enjoying a beautiful day at the park! Positive
Traffic was terrible this morning. Negative
The code above shows that we can access each value through a key-value relationship, which lets us process the data more flexibly.
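The reverse direction also works: csv.DictWriter writes dictionaries back out by column name. A minimal sketch, writing to an in-memory io.StringIO buffer instead of a real file, with made-up rows:

```python
import csv
import io

rows = [
    {"Text": "Enjoying a beautiful day at the park!", "Sentiment": "Positive"},
    {"Text": "Traffic was terrible this morning.", "Sentiment": "Negative"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["Text", "Sentiment"])
writer.writeheader()    # emits the "Text,Sentiment" header line
writer.writerows(rows)  # one CSV line per dictionary
print(buf.getvalue())
```

Because fields are looked up by name, the dictionaries can carry their keys in any order and DictWriter still emits the columns in the order given by fieldnames.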
5. Transform CSV file into Another Format
The csv module is not only for reading files; its writers can also convert your data into another output format.
For example, you can write the file into a gzip-compressed copy.
import csv, gzip

with open('sentimentdataset.csv', newline='', encoding='utf-8') as src, \
        gzip.open('sentiment.gz', 'wt', newline='', encoding='utf-8') as gz:
    writer = csv.writer(gz)
    for row in csv.reader(src, dialect=dialect):
        writer.writerow(row)
You can even write the file to standard output, like below.
import csv, sys

with open('sentimentdataset.csv', newline='', encoding='utf-8') as src:
    writer = csv.writer(sys.stdout)
    for row in csv.reader(src, dialect=dialect):  # dialect detected earlier
        writer.writerow(row)
Used this way, csv writers help you transform your data into whatever output format you need.
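For instance, simply switching the writer's delimiter converts the same rows into tab-separated output. A minimal in-memory sketch with made-up data:

```python
import csv
import io

csv_text = "Text,Likes\nEnjoying the park,15\n"

out = io.StringIO()
writer = csv.writer(out, delimiter="\t")  # write tab-separated output
for row in csv.reader(io.StringIO(csv_text)):
    writer.writerow(row)

print(out.getvalue())
```

The same pattern works for any delimiter or quoting convention the csv writer supports, since the reader and writer are configured independently.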
6. Quote Non-Numeric Values
In CSV files, fields can contain commas, quotes, or line breaks. By wrapping a value in double quotes, we ensure that everything inside it (even commas or line breaks) is treated as part of the value, not as a separator.
We can do the above using the following code.
import csv

INPUT = 'sentimentdataset.csv'
OUTPUT = 'quoted_nonnum.csv'

with open(INPUT, newline='', encoding='utf-8') as fin, \
        open(OUTPUT, 'w', newline='', encoding='utf-8') as fout:
    reader = csv.DictReader(fin)
    writer = csv.writer(fout, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerow(['Text', 'Likes'])
    for row in reader:
        # DictReader yields strings, so convert Likes to a number
        # to keep it unquoted under QUOTE_NONNUMERIC
        writer.writerow([row['Text'], float(row['Likes'])])
In the code above, we select the Text and Likes columns, quoting every non-numeric value while leaving the numeric values unquoted. Quoting the text values consistently ensures that any commas inside them are not mistaken for separators.
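To see why this matters, here is a sketch of a value containing a comma surviving a write/read round trip thanks to quoting (the sample values are made up):

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC)
writer.writerow(["Great park, great weather!", 15.0])

# The text field is quoted, so its embedded comma is not a separator.
print(buf.getvalue())

# Reading back with the same quoting mode also converts the
# unquoted field back into a float.
row = next(csv.reader(io.StringIO(buf.getvalue()), quoting=csv.QUOTE_NONNUMERIC))
print(row)
```

Without the quotes, the embedded comma would split the text into two fields and shift every column after it.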
Conclusion
As data professionals, we can manipulate CSV files using the Python csv module. However, there are surprising things you can do with this module, including format detection, data format conversion, and much more.
I hope this has helped!
Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.