Text Correcting: FAQs

Frequently Asked Questions

  1. Should I correct misspelled words?  It is a bit of a judgement call.  Generally we encourage users to correct misspellings because the system retrieves articles by searching the computer-generated text.  For example, if the town Uravan is misspelled in the original newspaper as “Urvana”, it may affect users’ search results.  Most people are going to search for Uravan as “Uravan”, so their search would not retrieve the article that says “Urvana.”
  2. Should I correct an incomplete word, or a word split by a hyphen, for example “depart-ment”?  If the word has a hyphen, the system searches for it by removing the hyphen and joining the two parts together, giving the expected “department”.  This only happens if the hyphen is at the end of the first part (“depart-”), where the line breaks, so you do not want to remove the hyphen in these instances.  If the word is split between two lines without a hyphen, please correct it by joining the two parts together.
  3. Do I have to correct all the blank spaces and miscellaneous punctuation and symbols?  You can fix those if you want, but they do not affect searching (except as described in question #2), so it is not necessary.  Some users like to clean up those types of OCR mistakes, if only for the sake of appearance.  Plus, it’s sometimes easy enough to do while you’re correcting other mistakes.
  4. Can anyone correct the OCR errors in the newspaper text?  Yes, anyone can correct text.  All you have to do is sign up for a free user account.
  5. Is it important to “exit” periodically?  I have been saving every five minutes, and returning to the site nearly daily to find myself still logged in and ready to go.  No, you do not have to exit or log out of your account.  You do have to save your progress periodically, but the system will remind you to save every five minutes.
  6. If I absolutely cannot read a line or several words, what should I do?  Erase the garbled OCR and leave a blank space, leave the garbled OCR, or what?  I would just leave the garbled text.  Another user might come along and be able to read it.
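
For the curious, the hyphen behavior described in question #2 can be illustrated with a small sketch. This is not the site's actual search code, which is not shown here; the function name searchable_words and the exact matching rules are assumptions made only to show why a hyphen at the end of a line should be left in place.

```python
import re


def searchable_words(ocr_text: str) -> list[str]:
    """Hypothetical helper: list the words a search index might see.

    Assumption: a hyphen immediately followed by a line break marks a
    word split across two lines, so the two parts are joined.
    """
    # "depart-\nment" becomes "department"; other hyphens are untouched here.
    joined = re.sub(r"(\w)-[ \t]*\n[ \t]*(\w)", r"\1\2", ocr_text)
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", joined.lower())


# Hyphen kept at the end of the line: the full word is searchable.
with_hyphen = searchable_words("The depart-\nment met today")

# Hyphen removed by a well-meaning corrector: the word stays in two pieces.
without_hyphen = searchable_words("The depart\nment met today")

print("department" in with_hyphen)
print("department" in without_hyphen)
```

Under these assumptions, the first text can be found by searching for “department” and the second cannot, which is why a split word with no hyphen should be joined by hand.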
