Remix.run Logo
marcellodb 2 days ago

Great question, this is super important. The agents in the platform have the ability to do some degree of cleaning on your data when building a model (for example, imputing missing values). However, major improvements to data quality are generally not possible without an understanding of the data domain (i.e. business context), so you'll get better results if you "help" the platform by providing data in a reasonably clean state, answering the agent's follow-up questions in the chat, etc. By doing so you can give the agent better context and help it understand your data better, in which case it will also be more capable of dealing with things like missing values, misnamed columns etc.

This also highlights the important role of the user as a (potentially non-technical) domain expert. Hope that makes sense!