Why do you have to keep the dataset in memory? We had good distributed streaming datasets for a good while now, no?