rfw300 13 hours ago
The author (author's operator?) does not understand the data they are working with. And in doing so, they inadvertently make the case against their own "dark factory" nonsense.

For one, nothing about this project makes "every law" a commit. It just takes the _annual_ snapshots published by the House clerk and diffs chunks of those files against each other. A project which actually traced the edits in each annual snapshot to a specific passed bill would be incredibly cool (and is probably tractable now for the first time with current AI agents).

This is not that! All this does, as far as I can tell, is parse a set of well-structured XML files into chunks and commit those chunks to Git. It's not literally nothing, but it's something that the author's own README credits multiple people with doing years ago using ~100 line Python scripts.

I don't mean to be overly harsh. But this is exactly the problem with treating your software as a "factory": you release something you do not understand, in a domain you did not care to learn. And we are all the poorer for it.
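To make the distinction concrete, here is a rough sketch of what "parse snapshots into chunks and diff them" amounts to. Everything in it is an assumption for illustration: the two inline snapshots, the `section`/`heading`/`content` element names, and the helper functions are hypothetical and much simpler than the real published XML, and this is not the project's actual code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified snapshots. The real files use a far richer schema.
SNAPSHOT_OLD = """
<title>
  <section id="s1"><heading>Definitions</heading><content>A person includes a corporation.</content></section>
  <section id="s2"><heading>Scope</heading><content>This title applies to all agencies.</content></section>
</title>
"""

SNAPSHOT_NEW = """
<title>
  <section id="s1"><heading>Definitions</heading><content>A person includes a corporation or partnership.</content></section>
  <section id="s2"><heading>Scope</heading><content>This title applies to all agencies.</content></section>
  <section id="s3"><heading>Reports</heading><content>Each agency shall report annually.</content></section>
</title>
"""


def split_into_chunks(xml_text):
    """Map each section's id to its flattened text (one chunk per section)."""
    root = ET.fromstring(xml_text)
    chunks = {}
    for section in root.iter("section"):
        # itertext() walks all nested text; drop whitespace-only fragments.
        parts = [t.strip() for t in section.itertext() if t.strip()]
        chunks[section.get("id")] = " ".join(parts)
    return chunks


def diff_snapshots(old_xml, new_xml):
    """Report which sections were added, removed, or changed between snapshots."""
    old, new = split_into_chunks(old_xml), split_into_chunks(new_xml)
    report = {}
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before == after:
            continue  # unchanged sections produce no entry
        if before is None:
            report[key] = "added"
        elif after is None:
            report[key] = "removed"
        else:
            report[key] = "changed"
    return report


print(diff_snapshots(SNAPSHOT_OLD, SNAPSHOT_NEW))
```

Note what the report does not contain: it says that a section changed between two snapshots, but nothing about which bill changed it. Attributing each change to a specific enacted law would need a separate data source and a matching step on top of this.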
nickvido 12 hours ago
Oof. You’re not totally wrong. I’ve parsed XML with XSDs since the days of Java. I looked at the 100 line Ruby implementation of parsing these files and thought “ack. (Not ACK) why do I need all of this?!”

Well, it has a data loader, and hits APIs with retry logic, and has a CLI that can take arguments to run data downloads that can resume on failure. And yeah, it parses the stupid XML with a “chapeau” tag - did you know that is French for hat? There is a tag that is the “hat” for a section, and it is basically just another title. So yeah, I would’ve had to learn all of that.

But it also tests all of these things with actual tests. And the adversary complains if you write a test that isn’t actually testing anything meaningful. And if I needed to, I could reason about the architecture by reading the architecture design documents, which I have done at least a little bit, and they are pretty nice, I have to admit.

Anyway, it’s a next step in the evolution of putting the laws on GitHub, and it is actually interesting to see them change and imagine what we can do with more data overlaid. Sadly the other repos were not maintained, so this one has the latest laws, and you can view the diff from one Congress to another. Or you can git blame one of the files and see how old certain sections are. The data we have right now only goes back to 2013.