danbruc 7 days ago
With such a system, how do you ensure that the extracted data matches the data in the source document? Do you run the process several times and check that the results are identical? Can it reject inputs and route them to manual processing, or is the output always meant to be checked manually? How good is it, i.e. how many errors does it make per, say, million extracted values?
glorpsicle 7 days ago
Perhaps there's still value in having this tool transform the documents and someone review them manually, but the real value would obviously come from reducing manual review. I don't think there's a world, for now, in which manual review can be eliminated completely. However, if you process, say, 1 million documents, you could manually review a small random sample of them (a power calculation would tell you how large that sample needs to be). Assuming your random sample is representative of the "distribution" (which may be tough to define or summarize) of the 1 million documents, you could then extrapolate the accuracy you measure on the sample to the full set without having to review every single document.
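A minimal Python sketch of that sampling idea, assuming each reviewed document gets a simple correct/incorrect verdict. It uses a precision-based sample-size formula (normal approximation with a finite population correction), which is a close relative of a power calculation rather than one proper, plus a Wilson score interval to extrapolate the sampled error rate to the whole set. The `review` callback and all function names are hypothetical stand-ins for a human reviewer and your own pipeline.

```python
import math
import random
from typing import Callable, Optional, Sequence, Tuple


def sample_size_for_proportion(margin: float, z: float = 1.96, p: float = 0.5,
                               population: Optional[int] = None) -> int:
    """Documents to review so the estimated error rate is within +/- margin.

    Normal approximation n0 = z^2 * p * (1 - p) / margin^2, where p = 0.5 is the
    worst case when you have no prior guess of the error rate. If the population
    size is known, the finite population correction shrinks n0 slightly.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


def wilson_interval(errors: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score confidence interval for the true error rate."""
    phat = errors / n
    denom = 1 + z ** 2 / n
    center = (phat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(phat * (1 - phat) / n + z ** 2 / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def estimate_error_rate(doc_ids: Sequence, review: Callable[[object], bool], n: int,
                        z: float = 1.96, seed: Optional[int] = None):
    """Review a simple random sample of n documents and extrapolate.

    `review` is a hypothetical callback: True if the extraction was correct.
    Returns the sampled error rate and its confidence interval, which is the
    estimate for the full set only if the sample is truly random.
    """
    rng = random.Random(seed)
    sample = rng.sample(doc_ids, n)  # sampling without replacement
    errors = sum(1 for d in sample if not review(d))
    return errors / n, wilson_interval(errors, n, z)
```

For 1 million documents and a ±1 percentage point margin at 95% confidence, `sample_size_for_proportion(0.01, population=1_000_000)` asks for roughly 9,500 reviews. If you care about errors per extracted value rather than per document, the same approach works with individual values as the sampling unit.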