I've experienced this problem, but I haven't come across any papers about it. In this context, it would be interesting to compare the accuracy of transcribing one page at a time with that of transcribing batches of n pages.
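A rough sketch of how that comparison could be scored, assuming you have a ground-truth transcript for each page. Character error rate (CER) is my choice of metric here, not something from a paper, and `transcribe_batch` is a hypothetical stand-in for whatever model call you use: it takes a list of pages and returns one transcript per page.

```python
from typing import Callable, List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def mean_cer_for_batch_size(
    pages: Sequence,
    references: Sequence[str],
    transcribe_batch: Callable[[Sequence], List[str]],  # hypothetical model wrapper
    n: int,
) -> float:
    """Transcribe the pages in batches of n and return the mean per-page CER."""
    hypotheses: List[str] = []
    for start in range(0, len(pages), n):
        hypotheses.extend(transcribe_batch(pages[start:start + n]))
    if len(hypotheses) != len(references):
        raise ValueError("expected exactly one transcript per page")
    return sum(cer(r, h) for r, h in zip(references, hypotheses)) / len(references)
```

Calling `mean_cer_for_batch_size` with n = 1 and then with larger n on the same pages gives directly comparable numbers. Scoring per page rather than on the concatenated text also shows whether errors cluster on later pages within a batch.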