tnt128 2 days ago

In the demo, you didn't show the process of cleaning and labeling data. Does your product do that somehow, or do you still expect the user to provide that after connecting the data source?

marcellodb 2 days ago

Great question, this is super important. The agents in the platform can do some degree of data cleaning when building a model (for example, imputing missing values). However, major improvements to data quality are generally not possible without an understanding of the data domain (i.e. business context), so you'll get better results if you "help" the platform by providing data in a reasonably clean state, answering the agent's follow-up questions in the chat, etc. Doing so gives the agent better context, which in turn makes it more capable of dealing with things like missing values, misnamed columns, etc.

This also highlights the important role of the user as a (potentially non-technical) domain expert. Hope that makes sense!

vaibhavdubey97 2 days ago

We have a data enricher feature (still in beta) that uses LLMs to generate labels for your data. For cleaning and feature engineering, we use agents that handle it automatically once you've connected your data and defined your ML problem.

P.S. Thanks for the feedback on the video! We'll update it to show the cleaning and labelling process :)