Remix.run Logo
cpard 4 hours ago

Replicating the Postgres WAL to S3 and Iceberg reliably is a hard problem but it’s not accurate to say that no ETL is needed here.

maybe you can say it’s more of an ELT pattern but anyone who’s interested into using this for realistic analytics they will have to transform the data at some point.

If an org is early enough to think that they can use a solution like this and just get in duckdb and start spitting out reports, they will be up for a really bad experience.

Please educate people to do the right thing and realize the scope of the work they are facing, it might feel that it hurts your growth in the short term but it will benefit you greatly in the mid-long term as a vendor.

kikimora 3 hours ago | parent | next [-]

IDK, AWS Zero ETL from Autora into Redshift really helped us at some point. You right that data transformation is very limited if not possible. But having data in an analytical store, being able to experiment with queries, understand what is wrong with your OLTP schema and then build ETL is way better than doing an upfront design.

cpard an hour ago | parent [-]

Of course it is. What you describe is one of the reasons that ELT became popular, if you couple it with a variant type and schema on read, you have a very powerful and flexible architecture.

But there’s no free lunch, building and maintains data infrastructure that is reliable requires work. Many companies don’t realise that when they start their analytical journey and aggressive marketing doesn’t help. That’s the point I was trying to make.

kikimora 32 minutes ago | parent [-]

I don’t disagree, just placing emphasis on a different aspect.

In an ideal world there is a tool that moves your schema into an analytical store “as is” with a single click. Then the same tool lets you add arbitrary transformations of the data. Surprisingly I have not come across such a tool. It is earthier “one click to move your data” or “any transformation you want” but only after a significant upfront investment :(

cpard 9 minutes ago | parent [-]

I think I didn’t articulate myself very well on my reply. I actually wanted to say that I agree with you and emphasise again the need for educating users for the complexity of these projects.

What you describe has been pitched by many different products for different parts of the data platform. Fivetran for example claims to do that for the extraction and loading part, good old Informatica was offering the ETL in a graphical interface etc.

The problem that many teams ended up having is the explosion of the tooling needed by data teams.

vira28 2 hours ago | parent | prev [-]

[flagged]