Open sourcing Dicer: Databricks's auto-sharder(databricks.com)
64 points by vivek-jain 5 hours ago | 9 comments
charleshn an hour ago | parent | next [-]

> Application pods learn the current assignment through a library called the Slicelet (S for server side). The Slicelet maintains a local cache of the latest assignment by fetching it from the Dicer service and watching for updates. When it receives an updated assignment, the Slicelet notifies the application via a listener API.

For a critical control plane component like this, I tend to prefer a constant work pattern [0], to avoid metastable failures [1], e.g. periodically pull the data instead of relying on notifications.

[0] https://aws.amazon.com/builders-library/reliability-and-cons...

[1] https://brooker.co.za/blog/2021/05/24/metastable.html
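To make the contrast concrete, here is a minimal sketch of the constant-work alternative: a client that re-fetches the full assignment on a fixed cadence instead of reacting to push notifications. All names (`PollingSlicelet`, `fetch_assignment`) are hypothetical, not Dicer's actual API.

```python
import threading
import time

class PollingSlicelet:
    """Hypothetical client that refreshes the shard assignment on a fixed
    cadence (constant work) instead of reacting to push notifications."""

    def __init__(self, fetch_assignment, interval_s=5.0):
        # fetch_assignment stands in for an RPC to the assignment service.
        self._fetch = fetch_assignment
        self._interval = interval_s
        self._assignment = {}
        self._lock = threading.Lock()

    def current_assignment(self):
        with self._lock:
            return dict(self._assignment)

    def _poll_once(self):
        # The same work happens every tick whether or not the assignment
        # changed -- that uniform load is the point of the pattern: a burst
        # of updates cannot create a burst of client work.
        latest = self._fetch()
        with self._lock:
            self._assignment = latest

    def run(self, ticks):
        for _ in range(ticks):
            self._poll_once()
            time.sleep(self._interval)
```

The trade-off is staleness bounded by the poll interval, in exchange for a load profile on the control plane that does not spike during failure or churn.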

khaki54 3 hours ago | parent | prev | next [-]

Seems weird to call it sharding since it's not sharding indexed datasets or anything like that. Is this just a tool to mitigate Databricks’ internal service-scaling challenges?

atuladya 2 hours ago | parent [-]

Right - this is not about sharding data/datasets. This is for sharding in-memory state that a service might have. The problem of building services at low cost, high scale, low latency, and high throughput is common in many environments, including our services at Databricks, and Dicer helps with that.
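For readers unfamiliar with the abstraction: the core of a key-to-pod assignment can be illustrated with a toy consistent-hash ring. This is only a sketch of the general idea; Dicer's actual algorithms differ, and all names here are made up.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy key -> pod assignment for routing requests to the pod that
    owns the corresponding in-memory state. Illustrative only."""

    def __init__(self, pods, vnodes=64):
        # Each pod gets several virtual nodes on the ring so that load
        # spreads more evenly and rebalancing moves fewer keys.
        self._ring = sorted(
            (self._hash(f"{pod}#{i}"), pod)
            for pod in pods
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s):
        return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

    def owner(self, key):
        # Walk clockwise to the first virtual node at or after the key's
        # hash, wrapping around the ring.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

Static hashing like this is where systems such as Slicer and Dicer go further: they move assignments dynamically in response to observed load rather than fixing them by hash alone.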

ayf 5 hours ago | parent | prev [-]

Does anyone else have something similar?

What are some use cases that you found are useful?

louis-paul 3 hours ago | parent | next [-]

Sounds related to Google Slicer: https://research.google/pubs/slicer-auto-sharding-for-datace...

atuladya 2 hours ago | parent [-]

It is similar to Slicer in terms of the abstraction (I built Slicer at Google), but the architecture, implementation, and algorithms have a lot of differences.

bigwheels 2 hours ago | parent [-]

Did you also work on this Databricks Dicer?

WookieRushing 2 hours ago | parent | prev | next [-]

These show up once you reach a certain scale where static partitioning is either cost-inefficient or the hot spots are too dynamic. They also avoid adding request latency by running as eventually consistent sidecars rather than as proxies in the request path.

I’ve seen them used for traffic routing, storage-system metadata, distributed caches, etc.
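The sidecar-vs-proxy distinction boils down to where the lookup happens. A hedged sketch (hypothetical names, not any real system's API): the router holds a locally cached copy of the assignment, so routing a request is just an in-memory read with no extra network hop, at the cost of tolerating staleness between updates.

```python
class SidecarRouter:
    """Routes keys using a locally cached, eventually consistent
    assignment. A lookup is a dict read in-process -- no extra hop,
    unlike a proxy sitting in the request path."""

    def __init__(self):
        self._cache = {}

    def apply_update(self, assignment):
        # Called whenever a refreshed assignment arrives. Until then,
        # lookups may return stale owners, which callers must tolerate
        # (e.g. the old owner redirects or rejects misrouted requests).
        self._cache = dict(assignment)

    def route(self, key, default=None):
        return self._cache.get(key, default)
```

The staleness is the price of keeping the data path fast; correctness then depends on servers being able to reject or forward requests for keys they no longer own.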

vivek-jain 5 hours ago | parent | prev [-]

Sharded in-memory caching turns out to be rather useful at scale :)

Some of the key examples highlighted on our blog are Unity Catalog, which is essentially the metadata layer for Databricks, our Query Orchestration Engine, and our distributed remote cache. See the blog post for more!