I really never understood how people could store very important information in ES like it was a database.

Even if they don't understand what ES is and what a "normal" database is, I'm sure some of those people ran into issues where their "db" either got corrupted or lost data, even while testing and building their system around it. This was general knowledge at the time, and still is: it was no secret that from time to time things got corrupted and indexes needed to be rebuilt.

It doesn't happen all the time, but it happens far more often than never, and that's understandable: Lucene is not a DB engine or a "DB grade" storage engine, and its developers had other, more important things to solve in their domain.

So when I read stories of data loss and things going south, I don't have sympathy for anyone involved other than the unsuspecting final clients. These people knew, or more or less knew, and chose to ignore it and be lazy.

kentm 11 hours ago

> I really never understood how people could store very important information in ES like it was a database.

I agree.

It's been a while since I touched it, but as far as I can remember ES has never pretended to be your primary store of information. It was mostly juniors that reached for it for transaction processing, and I had to disabuse them of the notion that it was fit for that purpose.

ES is for building a searchable replica of your data. Every ES deployment I made or consulted on sourced its data from some other durable store, and the only things that wrote to it were replication processes or backfills.
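
A minimal sketch of that arrangement, assuming SQLite as a stand-in for the durable primary store and a plain in-memory inverted index as a stand-in for the ES replica. The `documents` table, the `SearchReplica` class, and the `backfill` function are hypothetical names made up for illustration, not any real ES API. What it tries to show is that the search side can be wiped and rebuilt from the primary at any time:

```python
import sqlite3
from collections import defaultdict


class SearchReplica:
    """Toy stand-in for a search index: token -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def index(self, doc_id, text):
        for token in text.lower().split():
            self.postings[token].add(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))

    def clear(self):
        self.postings.clear()


def backfill(primary, replica):
    """Rebuild the replica from scratch using only the primary store."""
    replica.clear()
    for doc_id, body in primary.execute("SELECT id, body FROM documents"):
        replica.index(doc_id, body)


# The durable store is the only place application writes go.
primary = sqlite3.connect(":memory:")
primary.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
primary.executemany(
    "INSERT INTO documents (id, body) VALUES (?, ?)",
    [(1, "invoice for order 17"), (2, "refund for order 17"), (3, "invoice reminder")],
)
primary.commit()

replica = SearchReplica()
backfill(primary, replica)
hits_before = replica.search("invoice")

# Simulate losing the index entirely; the primary still has every row.
replica.clear()
hits_lost = replica.search("invoice")

# A backfill restores search without touching application data.
backfill(primary, replica)
hits_after = replica.search("invoice")
```

In this toy setup a corrupted or lost index costs a rebuild, not data, which is the property the comment above relies on.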

vjerancrnjak 10 hours ago

They market it as a general-purpose store. Successfully, too: even though hardcore CS wizards wouldn't ever touch it, the C-suite likes it.

The best example is the IoT marketing, as if it can handle the load without a bazillion shards. And since when does a text engine want telemetry?

simianwords 8 hours ago

Usually in companies, people have a main durable store of information that is then streamed to other databases, which store a transformation of this data with some augmentation.

These new data stores don't usually require that level of durability or reliability.

WASDx 10 hours ago

I've managed a 100+ node cluster for years without seeing any corruption. Where are you getting this from?

wdfx 9 hours ago

I'm actually struggling to imagine exactly what warrants a 100+ node cluster of ES.

	simianwords 8 hours ago
		We had something like this to scale out for higher throughput. Just tens of thousands of requests per second required 100+ nodes, simply because each query had an expensive scatter and gather.
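
A rough sketch of what scatter and gather means here, as a toy model only: the shards are plain dicts, scoring is a simple term count, and `search_shard` and `scatter_gather` are hypothetical helper names, not how ES implements it. It illustrates that every query is sent to every shard and the partial results are merged afterwards:

```python
import heapq


def search_shard(shard, term, k):
    """Each shard scores its own documents and returns its local top k."""
    scored = [(text.lower().split().count(term), doc_id) for doc_id, text in shard.items()]
    return heapq.nlargest(k, (hit for hit in scored if hit[0] > 0))


def scatter_gather(shards, term, k):
    """Fan the query out to every shard, then merge the partial results."""
    partials = [search_shard(shard, term, k) for shard in shards]  # scatter
    merged = heapq.nlargest(k, (hit for part in partials for hit in part))  # gather
    return merged, len(partials)


shards = [
    {1: "error in payment service", 2: "payment ok"},
    {3: "payment payment retry", 4: "login ok"},
    {5: "disk full"},
]

# Results are (score, doc_id) pairs; shards_touched counts the fan-out.
top, shards_touched = scatter_gather(shards, "payment", k=2)
```

In this model the third shard has nothing relevant but is still queried, so the work per request grows with the shard count rather than with the number of matches.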

gloryjulio 10 hours ago

We only used it on top of the primary databases, just like many other components for scaling or auxiliary functionality. Not sure how others use it.