jcgrillo 5 hours ago
> This can be a ton of data though, so we're trying to figure out what to compress and how. We also have the challenge of figuring out how to scrub logs of any potentially sensitive information.

This is fundamentally a data modeling problem. Currently, computer telemetry data are just little bags of UTF-8 bytes, or at best something like list<map<bytes, bytes>>. IMO this needs to change from the ground up. Logging libraries should emit structured data conforming to a user-supplied schema, not some open-ended schema that tries to be everything to everyone.

Then it's easy to solve both problems: each field is a typed column that can be compressed optimally, and marking a field as "safe" is something encoded in its type. So upon export, only the safe fields make it off the box, or out of the VPC, or whatever. Note that you can have a richer ACL structure than just "safe yes/no".

I applaud the industry for trying so hard for so long to make everything backwards compatible with the unstructured-bytes base case, but I'm not sure that's ever really been the right north star.
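A rough Python sketch of what that could look like, assuming the schema is declared as a dataclass and that the ACL is a hypothetical three-level `Sensitivity` enum (none of these names come from any real logging library). Each field's type annotation carries its export level, records are pivoted into one column per field, and export drops every column above the allowed level. Delta encoding of the timestamp column is just a stand-in for "compressed optimally"; a real system would choose an encoding per column type.

```python
from dataclasses import dataclass, fields
from enum import IntEnum
from typing import Annotated, get_type_hints


class Sensitivity(IntEnum):
    """Hypothetical ACL levels for log fields; higher means more restricted."""
    PUBLIC = 0    # may leave the box
    INTERNAL = 1  # may leave the box, but not the VPC
    SECRET = 2    # never exported


@dataclass(frozen=True)
class RequestLog:
    """Hypothetical user-supplied schema: every field has a type and an ACL tag."""
    timestamp_ms: Annotated[int, Sensitivity.PUBLIC]
    status: Annotated[int, Sensitivity.PUBLIC]
    route: Annotated[str, Sensitivity.INTERNAL]
    user_email: Annotated[str, Sensitivity.SECRET]


def field_levels(schema: type) -> dict[str, Sensitivity]:
    """Read each field's sensitivity tag out of its type annotation.

    Untagged fields default to SECRET, so a forgotten tag fails closed.
    """
    hints = get_type_hints(schema, include_extras=True)
    levels = {}
    for f in fields(schema):
        metadata = getattr(hints[f.name], "__metadata__", ())
        levels[f.name] = next(
            (m for m in metadata if isinstance(m, Sensitivity)), Sensitivity.SECRET
        )
    return levels


def to_columns(records: list) -> dict[str, list]:
    """Pivot row records into one list per field, i.e. one typed column each."""
    if not records:
        return {}
    names = [f.name for f in fields(records[0])]
    return {name: [getattr(r, name) for r in records] for name in names}


def export(columns: dict[str, list], levels: dict[str, Sensitivity],
           max_level: Sensitivity) -> dict[str, list]:
    """Keep only the columns whose sensitivity is at or below max_level."""
    return {name: col for name, col in columns.items() if levels[name] <= max_level}


def delta_encode(values: list[int]) -> list[int]:
    """Keep the first value, then successive differences (small for sorted timestamps)."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


logs = [
    RequestLog(1_700_000_000_000, 200, "/api/items", "a@example.com"),
    RequestLog(1_700_000_000_005, 200, "/api/items", "b@example.com"),
    RequestLog(1_700_000_000_012, 500, "/api/cart", "a@example.com"),
]
cols = to_columns(logs)
levels = field_levels(RequestLog)
off_box = export(cols, levels, Sensitivity.PUBLIC)
print(sorted(off_box))                        # ['status', 'timestamp_ms']
print(delta_encode(off_box["timestamp_ms"]))  # [1700000000000, 5, 7]
```

Defaulting untagged fields to `SECRET` means a forgotten annotation keeps data on the box rather than leaking it; whether that is the right default is a policy call, not something the type system decides for you.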

quesera 4 hours ago
Grand solutions require broad coordination, and they often devolve back into a modified-but-equivalent version of the previous problem. :( Stream-of-bytes is a classically difficult model to escape. Many have tried.