I have several columns (only two are shown here for reference):
Date        Region
Feb 2021    North america
Jan 2021    South america
Kinsley     Norway
As you can see, the Date column contains a value that is not a date, and I am trying to find the best way to deal with such data. You might suggest deleting the whole row, or only that value, but I am not sure that is the right approach, because I might lose important information about that specific row.
Please suggest the best approach.
I am using Alteryx to read this data from an Excel file.
From a data analytics perspective, domain knowledge plays a crucial role in data wrangling. Sometimes a dataset has no missing values but contains many invalid values, which have to be identified and removed manually.
In Bold BI, if any invalid values are present in the date or integer columns, the entire column will be treated as a string column to avoid data loss. For the above sample file, all the values will be extracted into Bold BI and there will be no loss of data.
However, the invalid values will cause errors when this string column is converted to a date column, so they should be cleaned up manually before the data is loaded into Bold BI.
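To illustrate the idea of checking values before a string-to-date conversion, here is a minimal sketch in plain Python (not Alteryx or Bold BI). It assumes the dates use the abbreviated "Mon YYYY" format from the sample and an English locale. The helper parse_month_year and the added fields ParsedDate and DateIsValid are hypothetical names chosen for this example. Rows are flagged rather than deleted, so the rest of each row is kept for review.

```python
from datetime import datetime

def parse_month_year(text):
    """Return a datetime for values like 'Feb 2021', or None if the text is not a date."""
    try:
        return datetime.strptime(text.strip(), "%b %Y")
    except ValueError:
        return None

# Sample rows taken from the question.
rows = [
    {"Date": "Feb 2021", "Region": "North america"},
    {"Date": "Jan 2021", "Region": "South america"},
    {"Date": "Kinsley", "Region": "Norway"},
]

# Keep every row; add the parsed date and a validity flag instead of dropping anything.
flagged = []
for row in rows:
    parsed = parse_month_year(row["Date"])
    flagged.append({**row, "ParsedDate": parsed, "DateIsValid": parsed is not None})

# Rows that need manual review (the original text is still in "Date").
invalid_rows = [r for r in flagged if not r["DateIsValid"]]
for r in invalid_rows:
    print("Needs review:", r["Date"], "/", r["Region"])
```

The original text stays in the Date field, so someone with domain knowledge can later decide whether to correct the value, leave it empty, or drop the row.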
When dealing with bad data from a data analytics perspective, it's important to handle it appropriately to maintain data integrity and ensure accurate analysis. Here are some suggestions for dealing with the specific issue you mentioned:
By validating, cleaning, and appropriately handling bad data, you can maintain the integrity of your analysis while minimizing the risk of losing important information.
Correct, domain knowledge is indeed crucial in data wrangling and data analytics.