Text Correcting: FAQs

Frequently Asked Questions

Where do I start?  Many correctors start with newspapers from their chosen area, either because they live there or because their ancestors lived there. Other correctors base their corrections not on a geographic area but on a subject they are researching, so they correct articles on that subject regardless of which newspaper title the article appears in. All users are welcome to correct in whatever way they would like. However, if you are looking for titles in need, any recent addition to CHNC (see our home page – New News) will likely need some amount of correcting. Those titles will always be a good place to start.

The CHNC database also includes statistics on the articles, pages and issues that are mostly complete.  See here.  This information can help guide your corrections and help us complete them.

How do I know what has already been corrected?  In many cases it’s fairly obvious that the text has not been corrected. In the example below, the OCR-transcribed text is gibberish because of a poor microfilming process. Even when the OCR-transcribed text is good, random characters in the text (such as / | /) suggest it hasn’t been corrected, since most correctors remove those characters.

In the example of the Aurora Democrat below, one can tell with just a cursory look that the transcribed text looks pretty good. Also, notice above the article title the list of “Contributors.” This means those 3 users have done some amount of correcting in the article.

When you are in text correcting mode, you will see a checkbox that notes whether the block of text has been completely corrected or not. Currently this information only displays when you are in text correcting mode.

Should I correct all the text of a given paragraph by typing the text into one line of the text correcting interface?  It is easier for me if I do it this way.  We understand that it may seem time-consuming to correct the text line by line.  However, there is a reason the system asks you to do it this way.  Correcting the text line by line ensures that the search results highlight the correct line of text.

For example – Correcting line by line and the resulting search results

Example – Typing all of the corrected text into one line and the resulting search results.  All the corrected text of the two paragraphs has been placed in the 3rd line of the text correcting window.

The system then believes the requested search term “scalp almost torn” is in the 3rd line of the first paragraph rather than the 2nd to last line of the second paragraph. This produces an error in the highlighted search results and causes user confusion.
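The highlighting problem described above can be illustrated with a small sketch. This is not the CHNC software; it only assumes that each line of corrected text is stored against one line position on the page, and that a search highlights the position of whichever line contains the phrase. The function name `find_highlight_line` and the sample text are made up for illustration.

```python
def find_highlight_line(text_lines, phrase):
    """Return the 1-based number of the first line containing the phrase, or None."""
    needle = phrase.lower()
    for number, text in enumerate(text_lines, start=1):
        if needle in text.lower():
            return number
    return None


# Hypothetical article text, corrected line by line (each entry is one printed line).
line_by_line = [
    "A runaway team threw the",
    "driver from his wagon on",
    "Main Street Tuesday. He was",
    "found with his scalp almost torn",
    "off but is expected to recover.",
]

# The same text typed entirely into the 3rd line, leaving the other lines empty.
all_in_one = ["", "", " ".join(line_by_line), "", ""]

correct_line = find_highlight_line(line_by_line, "scalp almost torn")
wrong_line = find_highlight_line(all_in_one, "scalp almost torn")
print(correct_line, wrong_line)
```

With line-by-line correction the phrase is found on the line where it is actually printed. With everything typed into one line, the phrase is still found, but the highlight lands on the line that holds all the text rather than on the printed location.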

Should I correct misspelled words?  It is a bit of a judgement call.  Generally we encourage users to correct misspellings because the system retrieves articles by searching the computer-generated text.  For example, if the town Uravan is misspelled in the original newspaper as “Urvana”, it may affect users’ search results.  Most people will search for Uravan as “Uravan”, so their search would not retrieve the article containing “Urvana.”

Should I correct incomplete words or words connected by a hyphen, for example “depart-ment”?  If the word has a hyphen, the system searches the word by removing the hyphen and joining the two parts together, giving the expected “department”.  This only happens when the hyphen is present at the end of the first part (“depart-”), so do not remove the hyphen in these cases. If the word is split between two lines without a hyphen, please correct it by joining the two parts together.
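The following is a rough sketch of the hyphen-joining behaviour described above, to show why the trailing hyphen matters. It is only an assumption about how such joining could work, not the actual CHNC search code, and the function name `join_hyphenated_lines` is hypothetical.

```python
def join_hyphenated_lines(lines):
    """Return the searchable words, rejoining words split by a trailing hyphen."""
    words = []
    carry = ""  # first half of a word whose line ended with a hyphen
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if carry:
            # Attach the carried first half to the first word of this line.
            tokens[0] = carry + tokens[0]
            carry = ""
        if tokens[-1].endswith("-") and len(tokens[-1]) > 1:
            # Hold back the hyphenated fragment, minus the hyphen, for the next line.
            carry = tokens.pop()[:-1]
        words.extend(tokens)
    if carry:
        words.append(carry)
    return words


with_hyphen = join_hyphenated_lines(["the fire depart-", "ment arrived"])
without_hyphen = join_hyphenated_lines(["the fire depart", "ment arrived"])
print("department" in with_hyphen)     # True
print("department" in without_hyphen)  # False
```

When the hyphen is kept, the two halves are joined and a search for “department” can match. When the hyphen is missing, the halves stay separate words, which is why a word split across lines without a hyphen should be joined by hand.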

Do I have to correct all the blank spaces and miscellaneous punctuation and symbols?  You can fix those if you want, but they do not affect searching (except as noted in question #2), so it is not necessary.  Some users like to clean up those types of OCR mistakes if only for the sake of appearance.  Plus, it’s sometimes easy enough to do while you’re correcting other mistakes.

Can anyone correct the OCR errors in the newspaper text?  Yes, anyone can correct text.  All you have to do is register for a free user account.

Is it important to “exit” periodically? I have been saving every five minutes, and returning to the site nearly daily to find myself still logged in and ready to go.  No, you do not have to exit and log out of your account.  You do have to save your progress periodically, but the system will remind you to save every five minutes.

If I absolutely cannot read a line or several words, what should I do?  Erase the garbled OCR and leave a blank space, leave the garbled OCR, or what?  I would just leave the garbled text. Another user might come along and be able to read it.

 
