Thoughts on the Future of Stream Processing (epsio.io)
7 points by gikl 15 hours ago | 3 comments
PaulHoule 14 hours ago

Alternatively: a stream processing system that knows how to tear itself down in the end, so that it gets the same answer every time, is a batch processing system.

gikl 14 hours ago

What do you mean by "in the end"? :) I.e., a stream processor that knows how to "remove" the streaming part from itself?

PaulHoule 14 hours ago

A streaming system has data inside of it at any given time. For instance, if it is joining data together to do complex event processing (https://en.wikipedia.org/wiki/Complex_event_processing), it holds data that is waiting to be joined with other data and hasn't been spit out yet.
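A minimal sketch of what "data waiting to be joined" looks like, assuming a simple in-memory symmetric hash join on a key. The class and method names are hypothetical and not taken from any real CEP engine. Each arriving record is buffered on its own side and joined against whatever is already buffered on the other side, so at any moment some records are sitting in state with no output yet.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Hypothetical streaming inner join of two event streams on a key."""

    def __init__(self):
        # Records seen so far on each side, kept around to wait for partners.
        self.left = defaultdict(list)
        self.right = defaultdict(list)

    def on_left(self, key, value):
        self.left[key].append(value)
        # Emit one joined row per right-side record already buffered under this key.
        return [(key, value, r) for r in self.right.get(key, [])]

    def on_right(self, key, value):
        self.right[key].append(value)
        # Emit one joined row per left-side record already buffered under this key.
        return [(key, l, value) for l in self.left.get(key, [])]

    def unmatched(self):
        """Buffered records that have not been joined with anything yet."""
        pending = []
        for key, values in self.left.items():
            if not self.right.get(key):
                pending.extend(("left", key, v) for v in values)
        for key, values in self.right.items():
            if not self.left.get(key):
                pending.extend(("right", key, v) for v in values)
        return pending


join = SymmetricHashJoin()
print(join.on_left("order-1", "placed"))    # []
print(join.on_right("order-1", "shipped"))  # [('order-1', 'placed', 'shipped')]
print(join.on_left("order-2", "placed"))    # []
print(join.unmatched())                     # [('left', 'order-2', 'placed')]
```

The `order-2` record is the in-flight state: it has been consumed but has produced nothing. This sketch keeps state forever for simplicity; practical systems usually bound it with time windows.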

You can process a batch by streaming it through a streaming system, but you can't just shut the system down at the end; you have to flush out everything that is in flight if you want to get the right answers. In other words, a streaming system is a moving target and might not be entirely consistent at any single moment, and that's fine. In a "batch" system, the calculation has a definite beginning and end. That's the difference.
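To make the flushing point concrete, here is a minimal sketch (hypothetical names, not the commenter's system) of a streaming per-window sum fed a finite batch. It assumes events arrive in timestamp order and uses fixed 10-unit windows. A window is emitted only when an event for a later window shows up, so when the input runs out, the last window is still in flight and the streamed output disagrees with a batch computation over the same data until `flush()` runs.

```python
from collections import defaultdict


class TumblingWindowSum:
    """Hypothetical streaming operator: sums values per fixed-size time window.

    Assumes in-order timestamps, so a window can be closed as soon as an
    event belonging to a later window arrives.
    """

    def __init__(self, width):
        self.width = width
        self.open_window = None  # start time of the window still accumulating
        self.total = 0

    def push(self, ts, value):
        start = ts - ts % self.width
        emitted = []
        if self.open_window is not None and start != self.open_window:
            # A later window has begun, so the previous one is complete: emit it.
            emitted.append((self.open_window, self.total))
            self.total = 0
        self.open_window = start
        self.total += value
        return emitted

    def flush(self):
        """End of input: emit the window that is still in flight, then reset."""
        if self.open_window is None:
            return []
        out = [(self.open_window, self.total)]
        self.open_window, self.total = None, 0
        return out


def batch_sums(events, width):
    """Reference batch computation over the whole input at once."""
    sums = defaultdict(int)
    for ts, value in events:
        sums[ts - ts % width] += value
    return sorted(sums.items())


events = [(0, 1), (3, 2), (10, 5), (12, 1), (25, 4)]
op = TumblingWindowSum(width=10)
streamed = []
for ts, value in events:
    streamed.extend(op.push(ts, value))

print(streamed)  # [(0, 3), (10, 6)]: window 20 is still in flight
streamed.extend(op.flush())
print(streamed == batch_sums(events, 10))  # True
```

Real stream processors typically drive this with watermarks or an end-of-stream signal rather than an explicit `flush()` call; the point here is only that the end-of-input flush is what makes the streamed answer match the batch one.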

I built a hybrid system that used streaming ideas to solve batch problems, and I had to face this problem. Later I worked at a place that was developing a system a lot like the one I built, except that I knew what algebra mine supported and they refused to admit theirs had an algebra. I had learned the hard way that I had to tear the system down carefully in the end to get correct answers, and they would argue with me over whether that mattered. So, of course, their system never got the same answer twice, but they were in such a hurry that they didn't really care.

Customers did.