| ▲ | xmddmx 7 days ago |
| I share the author's sentiment. I hate these things. True story: trying to reverse engineer macOS Photos.app sqlite database format to extract human-readable location data from an image. I eventually figured it out, but it was: A base64 encoded
Binary Plist format
with one field containing a ProtoBuffer
which contained another protobuffer
which contained a unicode string
which contained improperly encoded data (for example, U+2013 EN DASH was encoded as \342\200\223). This could have been a simple JSON string. |
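For anyone untangling similar data, here is a minimal sketch of the two ends of that chain: peeling off the base64 and plist layers, and repairing a string in which UTF-8 bytes were written out as literal octal escapes. It assumes the escapes appear as backslash plus three octal digits, as in the `\342\200\223` example. The helper names and the `place` key are invented for illustration, and the protobuf layers in the middle are left out because they need a schema.

```python
import base64
import plistlib
import re

# A run of one or more \ooo octal escapes, each limited to a single byte (000-377).
_OCTAL_RUN = re.compile(r"(?:\\[0-3][0-7]{2})+")

def decode_octal_escapes(text: str) -> str:
    """Replace literal \\ooo escape runs with the UTF-8 text they spell out."""
    def fix(match: re.Match) -> str:
        # Collect the escaped bytes of the whole run, then decode them together,
        # since one character may span several bytes.
        raw = bytes(int(code, 8) for code in re.findall(r"[0-7]{3}", match.group(0)))
        return raw.decode("utf-8", errors="replace")
    return _OCTAL_RUN.sub(fix, text)

def unwrap_plist(b64_text: str):
    """Peel the two outer layers: base64 text, then a (binary or XML) plist."""
    return plistlib.loads(base64.b64decode(b64_text))

# Hypothetical stand-in for a database cell; the key name "place" is invented.
sample = base64.b64encode(
    plistlib.dumps({"place": r"Paris \342\200\223 France"}, fmt=plistlib.FMT_BINARY)
).decode("ascii")

record = unwrap_plist(sample)
print(decode_octal_escapes(record["place"]))
```

Decoding a whole run at once matters here: the three escaped bytes only form U+2013 when they are decoded together as UTF-8.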
|
| ▲ | tgma 7 days ago | parent | next [-] |
| > This could have been a simple JSON string. There's nothing "simple" about parsing JSON as a serialization format. |
| |
| ▲ | Zambyte 6 days ago | parent | next [-] | | Having attempted writing a JSON parser from scratch and a protobuf parser from scratch and only completing one of them, I disagree. | |
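For a sense of what the protobuf half of that comparison involves, here is a rough sketch of the first step of reading the wire format: decoding a base-128 varint and splitting a field key into field number and wire type. It is nowhere near a full parser (no length-delimited fields, no bounds or overflow checks), and the function names are made up for the example.

```python
def read_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        # The low 7 bits carry payload, least significant group first.
        result |= (byte & 0x7F) << shift
        # A clear high bit marks the last byte of the varint.
        if not byte & 0x80:
            return result, pos
        shift += 7

def read_key(buf: bytes, pos: int = 0) -> tuple[int, int, int]:
    """Decode a field key; return (field_number, wire_type, next_pos)."""
    key, pos = read_varint(buf, pos)
    return key >> 3, key & 0x07, pos

# Bytes 08 96 01: field 1, wire type 0 (varint), value 150.
message = bytes([0x08, 0x96, 0x01])
field_number, wire_type, pos = read_key(message)
value, pos = read_varint(message, pos)
print(field_number, wire_type, value)
```

The bytes alone only tell you "field 1 is a varint". Whether that varint is an int32, a bool, or an enum comes from the schema, which is part of why a blob like this is hard to read without documentation.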
| ▲ | wvenable 6 days ago | parent | prev [-] | | Except that most often you can just look at it and figure it out. | | |
| ▲ | tgma 6 days ago | parent | next [-] | | Sure you can look at it[1], but you're not expected to look at Apple Photos database. The computer is. Write a correct JSON parser, compare with protobuf on various metrics, and then we can talk. [1]: although to be fair, I am older than kids whose first programming language was JavaScript, so I do not think of JSON object format with property names in quotes and integers that need to be wrapped as strings to be safe, etc., lack of comma after the last entry--to be fair this last one is a problem in writing, not reading JSON--as the most natural thing | | |
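The point about integers needing to be wrapped as strings can be reproduced in a few lines. This sketch uses `parse_int=float` to imitate a consumer that reads every JSON number as an IEEE 754 double, as JavaScript does. Python's own `json` module keeps integers exact by default, so the override is only there for demonstration, and the `id` key is made up.

```python
import json

big = 2**53 + 1  # smallest positive integer a double cannot represent exactly

# Imitate a parser that maps every JSON number to a double.
as_double = json.loads(json.dumps({"id": big}), parse_int=float)["id"]

# The usual workaround: ship the integer as a string and convert explicitly.
as_string = int(json.loads(json.dumps({"id": str(big)}))["id"])

print(int(as_double) == big)  # precision was lost on the way through
print(as_string == big)       # the string round-trips exactly
```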
| ▲ | wvenable 6 days ago | parent [-] | | I'm also "older" but I don't think that means anything. > Sure you can look at it[1], but you're not expected to look at Apple Photos database. How else are you supposed to figure it out? If you're older then you know that you can't rely on the existence or correctness of documentation. Being able to look at JSON and understand it as a human on the wire is a huge advantage. JSON being pretty simple in structure is an advantage. I don't see a problem with quoting property names! As for large integers and datetimes, yes that could be much better designed. But that's true of every protocol and file format that has any success. JSON parsers and writers are common and plentiful and are far less crazy than any complete XML parser/writer library. | | |
| ▲ | tgma 6 days ago | parent [-] | | > Being able to look at JSON and understand it as a human on the wire is huge advantage I don’t think this is a given at all. Depends on the context. I think it’s often overvalued. A lot of times the performance matters more. If human readability was the only thing that mattered, I would still not count JSON as the winner. You will have to pipe it to jq, realistically. You’d do the same for any other serialization format too. Inside Google where proto is prevalent, that is just as easy if not more convenient. The point is how hard or easy it is for an app’s end user to decipher its file database is not a design goal for the serialization library chosen by Apple Photos developers here. The constraints and requirements are all on different axes. |
|
| |
| ▲ | IshKebab 6 days ago | parent | prev [-] | | Sure but unless you want to embed an LLM in every JSON library, computers can't do that. |
|
|
|
| ▲ | bobbylarrybobby 7 days ago | parent | prev | next [-] |
| https://github.com/RhetTbull/osxphotos |
|
| ▲ | fluoridation 7 days ago | parent | prev | next [-] |
| I mean... you can nest-encode stuff in any serial format. You're not describing a problem either intrinsic or unique to Protobuf, you're just seeing the development org chart manifested into a data structure. |
| |
| ▲ | xmddmx 7 days ago | parent [-] | | Good points: this wasn't entirely a protobuf-specific issue, so much as it was a (likely hierarchical and historical) set of bad decisions to use it at all. Using Protobuffers for a few KB of metadata, when the photo library otherwise is taking multiple GB of data, is just penny wise, pound foolish. Of course, even my preference for a simple JSON string would be problematic: data in a database really should be stored properly normalized to a separate table and fields. My guess is that protobuffers did play a role here in causing this poor design. I imagine this scenario: - Photos.app wants to look up location data - the server returns structured data in a ProtoBuffer - there's no easy or reasonable way to map a protobuf to database fields (one point of TFA) - Surrender! Just store the binary blob in SQLite and let the next poor sod deal with it | | |
| ▲ | tgma 6 days ago | parent [-] | | You have to take into account the fact that iPhoto app has had many iterations. The binary plist stuff is very likely the native NSArchive "object archiving (serialization)" that is done by Obj-C libraries. They probably started using protobuf at some point later after iCloud. I suspect the unicode crap you are facing may even predate Cocoaization of the app (they probably used Carbon API). So it would make it a set of historical decisions, but I am not convinced they are necessarily bad decisions given the constraints. Each layer is likely responsible for handling edge cases in the application that you and I are not privy to. |
|
|
|
| ▲ | pjjpo 6 days ago | parent | prev | next [-] |
| The JSON version would have also had the wrong encoding - all formats are just a framing for data fed in from code written by a human. In mac's case, em dash will always be an issue because that's just what Mac decided on intentionally. |
|
| ▲ | seanw444 7 days ago | parent | prev | next [-] |
| That's horrendous. For some reason I imagine Apple's software to be much cleaner, but I guess that's just the marketing getting to my head. Under the hood it's still the same spaghetti. |
| |
| ▲ | ninkendo 6 days ago | parent [-] | | Yeah, the problem is Apple and all the other contemporary tech companies have engineers bounce around between them all the time, and they take their habits with them. At some point there becomes a critical mass of xooglers in an org, and when a new use case happens no one bothers to ask “how is serialization typically done in Apple frameworks”, they just go with what they know. And then you get protobuf serialization inside a plist. (A plist being the vanilla “normal” serialization format at Apple. Protobuf inside a plist is a sign that somebody was shoehorning what they’re comfortable with into the code.) |
|
|
| ▲ | 05 6 days ago | parent | prev [-] |
If that's any consolation, in the current version's schema they are just plain ZLATITUDE FLOAT, ZLONGITUDE FLOAT in ZASSET table.
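If the schema really is that plain, reading coordinates becomes a one-line query. The sketch below runs against an in-memory stand-in that has only the table and the two columns named above. Everything else about the real Photos database (its other columns, how a missing location is stored, where the file lives) is unknown here, so the NULL filter and the sample row are assumptions.

```python
import sqlite3

def asset_locations(conn: sqlite3.Connection) -> list[tuple[float, float]]:
    """Return (latitude, longitude) pairs for rows that have both values."""
    query = (
        "SELECT ZLATITUDE, ZLONGITUDE FROM ZASSET "
        "WHERE ZLATITUDE IS NOT NULL AND ZLONGITUDE IS NOT NULL"
    )
    return conn.execute(query).fetchall()

# In-memory stand-in; only the table and column names come from the comment above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ZASSET (ZLATITUDE FLOAT, ZLONGITUDE FLOAT)")
conn.executemany(
    "INSERT INTO ZASSET (ZLATITUDE, ZLONGITUDE) VALUES (?, ?)",
    [(48.8584, 2.2945), (None, None)],  # made-up sample rows
)

for latitude, longitude in asset_locations(conn):
    print(latitude, longitude)
```

When trying this on a real library, working on a copy of the database file rather than the live one is the safer choice.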