Tagged: Text Correction
August 2, 2018 at 7:54 am #1836
Hi all, I’m not sure if this is the right forum for this question, but:
In the past few days, I’ve come across a few papers that have multiple scans of the same page. Here’s one example (two copies of page 1):
In some cases the duplication is actually helpful, because each scan has sections that are clearer than the same sections on the other copy. In other words, it’s not a case of one copy being better overall and the other worse. How are you handling cases like this? Are you copying and pasting transcriptions so that both copies are transcribed?
August 2, 2018 at 2:49 pm #1837
Thanks for letting me know that you are encountering duplicate pages within the database. We try very hard to avoid this issue, but the microfilm that we use to make the digital copies often includes duplicate pages, and there was no way to exclude retakes from the filming process. These duplicates need to be discovered after scanning and removed, and it appears we are not catching them all. It would be great if you could send us the title/year-day/page when you come across these, so that we can continue to weed out pages as necessary.
Regarding the OCR: since you are the first person to point this problem out to us, we do not have a procedure in place for migrating the OCR from one page to another. However, you are spot on regarding the process. If one were to consolidate the accurate and helpful OCR onto one page in preparation for the other page to be removed, cutting and pasting between the two records would be the way to go. You can, of course, do this yourself if you like, as OCR correction is open to everyone, or you can send the duplicate information to us and we will take care of it on our end. It is up to you.
Let me know how you would like to proceed. You can send corrections/edits directly to me if you like at firstname.lastname@example.org.
I look forward to hearing from you!
August 3, 2018 at 7:03 am #1841
Thanks Regan! This is super helpful. Going forward, I’ll consolidate the corrected text onto one page, then email you about which pages to delete, if that works. Or, if you want to start a master forum post where folks can list pages to delete, that works too.
[It actually is kind of nice when there are two copies of a page, since, as I mentioned, each one sometimes has sections that are clearer than on the other.]
August 3, 2018 at 7:12 am #1842
Thanks again for your help with this. I will talk with Leigh Jeremias, the manager of the Historic Newspapers Collection, and see what she wants to do about this going forward. I think you have some great ideas, and it is worth us putting our heads together and seeing what can be done.
Happy text correcting!