tombert 3 hours ago

I'm a little sad that I got laid off from a previous job for a variety of reasons, but one big one was that there were discussions of letting me open source some very big changes I had made to the Kafka Streams library.

I rewrote a lot of stuff while keeping the API mostly compatible, focusing on emphasizing non-blocking IO with backpressure semantics available if necessary. It was really cool and enabled a lot of interesting stuff involving the state store and mixing+matching blocking and non-blocking IO in a way that was still relatively performant. I think it was really neat and it's one of the projects I am most proud of because I was able to squeeze out performance in a lot of places that were non-obvious.

I was pushing to allow us to release it to Github and/or make a PR to the upstream Kafka Streams project, but sadly they did layoffs before that was completed and afterwards there was really no "champion" to do that, so it's stuck in proprietary land.

I might still do it from scratch and FOSS it. It's been long enough that I don't think I'd get in trouble if I rewrote it and released it (there weren't any patents or anything attached to it), and there are a few things I'd like to change anyway (like getting rid of the dependency on Vert.x). Maybe if I ever get a week off I'll do that.

sheepscreek 3 hours ago | parent | next [-]

Fixing a bug in something open-source should be acceptable to most employers. However, if new functionality is being added, then it becomes an entirely different conversation.

I think it is good that you were taking the legal + compliance sign-offs before pursuing it.

tombert 2 hours ago | parent [-]

When I worked at Apple, they were extremely strict about contributions to FOSS stuff, even on your own time, even for simple stuff like bugfixes or opening a Github issue.

I am sure they have their reasons even if I don't agree with them, but it's made me very cautious about making PRs and the like while working at BigCos and making outside contributions to FOSS stuff.

This even more so, though, because of course I was doing it on company time, so I wouldn't really blame them for wanting to audit stuff to ensure I'm not divulging company secrets and the like.

wiether 2 hours ago | parent [-]

> even on your own time, even for simple stuff like bugfixes or opening a Github issue

During a recruitment process with a company a few years ago, they asked for my GH profile quite early and complained that I didn't have much content available.

Later, they asked me to do a small exercise and put it on my GH account.

When they sent me the contract, there was a clause stating that I would work for them exclusively and would not be allowed to contribute to anything other than company projects, even in my own time.

I didn't sign, and every person in the process seemed unable to understand what was wrong.

em-bee an hour ago | parent [-]

fortunately that's illegal in many jurisdictions, but i still would not sign unless they removed that passage.

i had one contract where i was able to replace the standard "we own all your work" passage with: "all your work will be released under the GPL"

vibbix 2 hours ago | parent | prev [-]

This would be amazing to see upstreamed! I did some similar work for optimizing batch calls for the Quartz Scheduler. I left before it got merged, but was able to follow up and get it in as it was already PR'd.

tombert an hour ago | parent [-]

Yeah, as I said I wouldn't mind reimplementing it anyway. I still more or less remember what I did so I don't think it would be too hard for me to reimplement if I get enough free time.

Kafka Streams is one of those things that I think is like 95% cool, but there are some design decisions I have mixed feelings on. One big thing is that while it does give an API for making new state stores out of the box and it's not too hard to write that, it is all dependent on blocking IO. This isn't that big of a deal with the built-in RocksDB store because the latency isn't that high for it, so you can get away with everything being blocking, but if you want to substitute another store (e.g. Redis or even something like PostgreSQL), the naive version has you dealing with round trip latency for every item for a join, which can be two or three orders of magnitude more expensive.
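To put rough numbers on that, here's a back-of-the-envelope sketch. It's a toy cost model, not a benchmark: the class and method names are made up for illustration, and it deliberately ignores payload transfer and server-side time, so it only captures the round-trip part of the cost.

```java
// Toy cost model for remote state store lookups during a join.
// Hypothetical names; every figure passed in is an assumption, not a measurement.
final class JoinLookupCost {
    private JoinLookupCost() {}

    // Naive version: one blocking round trip per record.
    static long perItemMicros(long records, long roundTripMicros) {
        return records * roundTripMicros;
    }

    // Batched version: one round trip per batch of up to batchSize records.
    static long batchedMicros(long records, long roundTripMicros, long batchSize) {
        long batches = (records + batchSize - 1) / batchSize; // ceiling division
        return batches * roundTripMicros;
    }
}
```

If you assume 100,000 records and a 500 microsecond round trip, `perItemMicros` comes out to 50,000,000 microseconds (about 50 seconds) spent waiting on the network, while `batchedMicros` with batches of 500 comes out to 100,000 microseconds (0.1 seconds).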

Less naively you can implement batching and the like in your driver, which is what my first version did, but you do eventually have to add blocking, and the Kafka Streams library doesn't really utilize virtual threads so you're paying the full cost of it at the end. Eventually I just found it more elegant to make Kafka Streams non-blocking-aware and add built-in semantics for batching to automatically amortize the cost of these things.
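For anyone curious what batching without blocking looks like in general, here's a minimal sketch. To be clear, this is not the code I wrote and it's not a Kafka Streams API: `BatchingLookup` and `bulkFetch` are hypothetical names, and `bulkFetch` stands in for whatever asynchronous bulk read your store's client offers. Callers get a future back immediately, and the whole batch is resolved with a single bulk call once enough keys have queued up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical helper, not part of Kafka Streams: queues key lookups and
// resolves each batch with one bulk call instead of one call per key.
final class BatchingLookup<K, V> {
    private final int maxBatchSize;
    // Stand-in for a remote store's bulk read; assumed to return without blocking.
    private final Function<List<K>, CompletableFuture<Map<K, V>>> bulkFetch;
    private final Map<K, CompletableFuture<V>> pending = new HashMap<>();

    BatchingLookup(int maxBatchSize,
                   Function<List<K>, CompletableFuture<Map<K, V>>> bulkFetch) {
        this.maxBatchSize = maxBatchSize;
        this.bulkFetch = bulkFetch;
    }

    // Returns immediately; the future completes when the batch holding this key is fetched.
    synchronized CompletableFuture<V> get(K key) {
        CompletableFuture<V> result = pending.computeIfAbsent(key, k -> new CompletableFuture<>());
        if (pending.size() >= maxBatchSize) {
            flush();
        }
        return result;
    }

    // Sends everything queued so far as one bulk request.
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        Map<K, CompletableFuture<V>> batch = new HashMap<>(pending);
        pending.clear();
        bulkFetch.apply(new ArrayList<>(batch.keySet())).whenComplete((values, error) ->
            batch.forEach((key, future) -> {
                if (error != null) {
                    future.completeExceptionally(error);
                } else {
                    future.complete(values.get(key)); // null if the store has no value
                }
            }));
    }
}
```

A real version would also need a timer-based flush so a half-full batch doesn't wait forever, plus some limit on outstanding batches for backpressure. I left both out to keep the sketch short.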

Anyway, sorry, just kind of miss working on that project. I really should redo it.