I am scraping data from websites. My question is: what is the best strategy for cleaning the scraped fields before persisting them in a database (e.g. MySQL)?
I work mainly with Python and SQLAlchemy. The data fields often do not arrive in a uniform, prepared form and can have values like “€ 29.30” for a price. Should I extract the numbers beforehand in Python, or is there a way to pass the raw values directly to the database table? Since I may want to use the numbers later for data analysis, I assume I should persist them as a float or a similar numeric type in the database. Values can also be empty, or contain a placeholder string when no value exists. So, what is the best approach to cleaning such data before persisting it in a table? Is there a general best-practice guide or a book that someone can recommend?
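To make the "parse beforehand in Python" option concrete, here is a minimal sketch of what I have in mind. The function name `parse_price` is hypothetical, and the sketch assumes prices use a dot as the decimal separator with no thousands separators, as in “€ 29.30”. Empty values and placeholder strings become `None`, which I would expect to end up as `NULL` in a nullable numeric column.

```python
import re
from decimal import Decimal
from typing import Optional

# Assumption: prices look like "29.30" (dot as decimal separator,
# no thousands separators). Other locales would need extra handling.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_price(raw: Optional[str]) -> Optional[Decimal]:
    """Return the first number found in a scraped string, or None.

    Hypothetical helper: None is returned for missing values, empty
    strings, and placeholder text that contains no number.
    """
    if raw is None:
        return None
    match = _NUMBER.search(raw)
    if match is None:
        return None
    # Decimal avoids binary floating-point rounding for money values.
    return Decimal(match.group())


if __name__ == "__main__":
    for value in ["€ 29.30", "", "n/a", None]:
        print(repr(value), "->", parse_price(value))
```

I used `Decimal` rather than `float` here only because the example is a price; whether that is the right choice for analysis is part of what I am asking.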