| ▲ | atherton94027 11 hours ago |
| I'm a bit confused here, do they have a single database they're writing to? Wouldn't it be easier and more reliable to shard the data per customer? |
|
| ▲ | hinkley 9 hours ago | parent | next [-] |
When one customer is 50 times bigger than your average customer, sharding doesn't do much.
| |
| ▲ | BatteryMountain 9 hours ago | parent | next [-] |
A combination of partitioning + sharding, perhaps? Oftentimes it is only a handful of tables that grow large, and even fewer for a single large customer, so sharding that customer out and then partitioning the data by a common/natural boundary should get you 90% of the way there. The majority of data can be partitioned, and it doesn't have to be by date: it pays dividends to sit with the data and consider what is being stored, its read/write pattern, and its overall shape, to determine where to slice the partitions. Sometimes splitting a wide table into two or three smaller tables can work if your joins aren't too frequent or complex. It can also help to determine which rows are hot or cold, and move the colder data to separate tables to speed up reads/writes on the hot path. There are always opportunities for storage optimization in large datasets, but it takes time and careful attention to get it right.
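[ed.: the "shard the big customer out, partition the rest by a natural boundary" idea above can be sketched roughly as follows. This is an illustrative Python sketch, not code from the thread; the table names, the `orders` prefix, and the `region` boundary are all hypothetical.]

```python
# Hypothetical sketch: route each row to a physical partition. One
# outsized tenant gets its own partition set; everyone else shares
# partitions split along a natural boundary (here, region).

BIG_CUSTOMERS = {"acme"}  # illustrative: the one tenant 50x the average

def partition_for(customer_id: str, region: str) -> str:
    """Pick a physical table/partition name for a row."""
    if customer_id in BIG_CUSTOMERS:
        # The large tenant is sharded out, then further partitioned
        # by the same natural boundary as everyone else.
        return f"orders_{customer_id}_{region}"
    # Smaller tenants share region-based partitions.
    return f"orders_shared_{region}"

print(partition_for("acme", "eu"))     # orders_acme_eu
print(partition_for("smallco", "eu"))  # orders_shared_eu
```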
| ▲ | PunchyHamster 7 hours ago | parent | prev [-] |
It does if you have thousands of customers.
|
|
| ▲ | atsjie 10 hours ago | parent | prev | next [-] |
I wouldn't call that "easier" per se.
|
| ▲ | thayne 8 hours ago | parent | prev [-] |
Sharding is often not easy. Depending on the application, it can add significant complexity. For example, what do you do with data that relates to multiple customers? How do you handle customers of significantly different sizes? And that is assuming you already have a solution for things like rebalancing and routing queries to the correct shard.
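[ed.: the routing and cross-customer problems mentioned above can be made concrete with a small sketch. This is an illustrative Python example, not from the thread; the shard count and function names are assumptions. Hash-based routing is simple for a single customer, but any query spanning customers has to fan out.]

```python
import hashlib

NUM_SHARDS = 4  # hypothetical fixed shard count

def shard_for(customer_id: str) -> int:
    # Stable hash so the same customer always maps to the same shard.
    digest = hashlib.sha256(customer_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def shards_for_report(customer_ids: list[str]) -> set[int]:
    # A report touching multiple customers may need to query
    # several shards and merge the results in the application.
    return {shard_for(c) for c in customer_ids}
```

Note the problems this sketch does not solve: a 50x customer still lands on one shard (size skew), and changing `NUM_SHARDS` remaps nearly every customer, which is why rebalancing usually needs something like consistent hashing or a directory service.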
| |