jakupovic 21 hours ago
I would like to know more. For example, how did you get the county records?
mapsperson 19 hours ago
One at a time. The county is the sole unit of authority for land records in the US (with a few exceptions). Luckily, these days, most of them publish this data via web services or APIs. I was able to automate a big chunk of this work by crawling county websites and looking for these web services that I could download from. But there is no agreed-upon schema standard -- they all store the data in different formats, schemas, etc. About 50% of the effort in maintaining a dataset like this is maintaining the mappings from the source data to the target schema. That's where I am making heavy use of LLMs. This turns out to be something they are very good at. I found gemma3 to have the best balance of reliability, ease of use, and speed for my use case.
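
To illustrate the mapping step: a minimal sketch, not the actual pipeline. The county names, source column names, and target fields are all hypothetical, and the mapping dictionaries are hardcoded here where an LLM would presumably propose them for review.

```python
from typing import Any

# Hypothetical target schema: every county's records are normalized to these keys.
TARGET_FIELDS = ("parcel_id", "owner", "acres")

# Hypothetical per-county mappings from source column names to target fields.
# These are the artifacts that need ongoing maintenance as sources change.
COUNTY_MAPPINGS: dict[str, dict[str, str]] = {
    "county_a": {"PARCEL_NO": "parcel_id", "OWNER_NAME": "owner", "ACREAGE": "acres"},
    "county_b": {"pin": "parcel_id", "own1": "owner", "gis_acres": "acres"},
}


def normalize(county: str, record: dict[str, Any]) -> dict[str, Any]:
    """Rename a source record's fields to the target schema.

    Target fields with no mapped source column are set to None;
    source columns with no mapping are dropped.
    """
    mapping = COUNTY_MAPPINGS[county]
    out: dict[str, Any] = {field: None for field in TARGET_FIELDS}
    for source_key, value in record.items():
        target_key = mapping.get(source_key)
        if target_key is not None:
            out[target_key] = value
    return out


row_a = normalize("county_a", {"PARCEL_NO": "12-345", "OWNER_NAME": "J. Doe", "ACREAGE": 1.5})
row_b = normalize("county_b", {"pin": "98765", "own1": "A. Smith", "shape_len": 410.2})
```

Keeping each mapping as plain data, separate from the code that applies it, means a changed source schema only requires regenerating one dictionary.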