freehorse 5 hours ago

I don't think these issues are close to the ones the article talks about. The author is not talking about data coverage, data collection methodologies, or missing values, but about data that is actually wrong: location coordinates, prices, and numbers that make no sense, including swapped latitude/longitude and misplaced decimal points.

On the other hand, I agree that bad (but usually fixable) data is better than no data.

stared 4 hours ago

Yep, in real data you should expect confusing columns, NaNs cast to values like 1673, duplicates, etc.

I would rather get data with swapped lat/lng (a trivial fix), or prices labeled as dollars that are actually in cents, than no data.
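
To make "trivial fix" concrete, here is a rough sketch of both repairs. Everything in it is an assumption for illustration: the function names, the bounding box (roughly Poland), and the premise that you already know which region the points should fall in and that the price column really is integer cents. A swap can only be detected when the swapped pair lands in the expected region and the original does not.

```python
from decimal import Decimal

# Hypothetical expected region as (south, west, north, east), roughly Poland.
EXPECTED_BOX = (49.0, 14.0, 55.0, 24.2)


def in_box(lat, lng, box):
    """Return True if (lat, lng) falls inside the bounding box."""
    south, west, north, east = box
    return south <= lat <= north and west <= lng <= east


def fix_swapped_lat_lng(lat, lng, box=EXPECTED_BOX):
    """Swap the pair back only when it is out of the box as given
    but inside the box once swapped; otherwise leave it untouched."""
    if not in_box(lat, lng, box) and in_box(lng, lat, box):
        return lng, lat
    return lat, lng


def cents_to_dollars(cents):
    """Convert integer cents to dollars, using Decimal to avoid float rounding."""
    return Decimal(cents) / 100


print(fix_swapped_lat_lng(21.01, 52.23))  # swapped Warsaw-like point
print(cents_to_dollars(1999))
```

Points that are wrong in some other way (outside the box both ways) are returned unchanged, so they still need a separate check.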