CodexBloom - Programming Q&A Platform

Handling UnicodeDecodeError when reading CSV files with pandas in Python 3.10

👀 Views: 25 💬 Answers: 1 📅 Created: 2025-06-10
python pandas csv encoding unicode

I'm trying to read a CSV file using pandas in Python 3.10, but I'm running into a `UnicodeDecodeError`. The file was generated by an external system, and I suspect it contains some non-UTF-8 encoded characters. Here's the code I currently have:

```python
import pandas as pd

file_path = 'data.csv'
data = pd.read_csv(file_path)
```

When I run this, I get the following error:

```
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
```

I've tried specifying different encodings with the `encoding` parameter, such as `latin1` and `ISO-8859-1`, but I still can't get it to work. For example:

```python
data = pd.read_csv(file_path, encoding='latin1')
```

This also throws an error, but a different one that essentially says the file couldn't be parsed correctly. I also checked whether the file was binary or corrupted, but it opens fine in a text editor.

Is there a way to handle this gracefully, or should I check the file's encoding beforehand? How can I read this file without running into decoding issues? Could someone point me to the right documentation?

For context: this is legacy code in a production web app built with Python. Thanks in advance for your help!
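For anyone wanting to check the encoding beforehand, here is a minimal sketch of one possible approach, not a definitive fix. It looks at the raw bytes for a byte-order mark, then tries a short list of candidate encodings in order. The helper name `guess_encoding` and the candidate list are assumptions made for illustration. Note that `latin1` and `ISO-8859-1` are aliases for the same codec in Python, and that codec accepts every byte value, so trying both cannot give different results and a successful decode does not prove the guess is right.

```python
import codecs

# Byte-order marks and the codec name that handles each (illustrative subset).
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(raw: bytes, candidates=("utf-8", "cp1252")) -> str:
    """Hypothetical helper: return the first encoding that decodes `raw` cleanly."""
    # A BOM at the start of the data is the strongest hint available.
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    # Otherwise try each candidate strictly, in the order given.
    for name in candidates:
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    # latin1 maps every byte to a character, so it never raises,
    # but the decoded text may still be wrong.
    return "latin1"


# Possible usage with the file from the question (left commented out
# because it needs pandas and data.csv to exist):
#
# import pandas as pd
# with open("data.csv", "rb") as f:
#     raw = f.read()
# print(raw[:16])  # inspect the first bytes before trusting any guess
# data = pd.read_csv("data.csv", encoding=guess_encoding(raw))
```

Printing the first few bytes is worth doing in this case: the failing byte is `0x89` at position 0, and if those bytes turn out to be a binary signature (a PNG file, for example, starts with `b'\x89PNG'`), no choice of `encoding` will make the file parse as CSV.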