CHNC’s Text Correction Champions


This topic contains 0 replies, has 1 voice, and was last updated by Regan 4 months, 3 weeks ago.

  • Post
    Regan
    Keymaster

    Correcting OCR makes the details easier to find.

    The Colorado Historic Newspapers Collection (CHNC) is an outstanding resource for researchers and genealogists who want to learn about daily life in Colorado's past and about the people who helped make Colorado what it is today. With more than 1,400,000 pages of historic newspaper content, the CHNC provides a window into the past that cannot be found in textbooks or reference material. Newspapers were how individuals and communities shared their activities, their concerns, their viewpoints, and their beliefs. If you look at the society columns in these publications, you might think they were the Facebook of their time.

    Optical Character Recognition (OCR) is one of the main ways that users and researchers can find the details of individuals and incidents of the past. OCR is the mechanical or electronic conversion of images of typed, handwritten, or printed text into machine-encoded text. In the CHNC, every newspaper page image is converted into machine-readable text, and that text is what the search engine matches against the words you enter in the search box. The quality of this OCR text therefore determines whether something in the CHNC can be found at all.

    The honest truth is that much of the OCR in the CHNC database is not very good. There are many reasons for this. Some relate to the limitations of the OCR software available at the time of digitization; others relate to the original newspapers themselves, such as poor paper quality, hard-to-decipher fonts, or ink bleeding through from the other side of the page. Whatever the cause, this OCR can be corrected. The CHNC software has a built-in feature that lets individuals like you correct the OCR text, making it a more accurate representation of the actual newspaper article and therefore far more discoverable with ordinary keyword searches.

    The CHNC has more than 490 users correcting text for articles in the online collection. The system records each corrector's changes and keeps track of their activity. Collectively, these CHNC users have corrected more than 2,728,531 lines of OCR text. With each correction, the information in that article or newspaper page becomes easier to find through searching and more valuable to the researchers who follow.

    We warmly thank everyone who makes even a single correction to the OCR in the CHNC database, since each correction makes a line of text, or even a single word, more discoverable to everyone else. Several champions, however, have made especially significant contributions, each correcting more than 100,000 lines of text. One individual has corrected more than 600,000 lines. We are so grateful for the dedication and contributions of our users; together you make the database far more valuable to everyone. A special thanks goes to our top six “Journeymen Editors” for giving so much back to the rest of the online community.

    Why do people correct? There are many reasons, but here are just a few:

    When I am correcting text, I feel like I am bring[ing] the people and events back to life, if only for a moment.

    For me, originally I was looking for information on grandparents in Routt County. … Since then I just correct because I realize that there are other people who are looking for their family histories as well.

    For those willing to donate their time and give back to the service in this way, here are the rewards we offer:

    • Over 100,000 lines of text corrected – Acknowledgement on the Contributors list
    • Over 300,000 lines of text corrected – A small gift from CHNC staff
    • Over 500,000 lines of text corrected – A reel of microfilm for a title and time period of your choice, digitized and added to CHNC.

    None of these tokens of appreciation come close to expressing the gratitude we feel for your involvement and contributions. We appreciate you and everything you contribute to our shared resource. Thank you.

    Become a Journeyman Editor with the Colorado Historic Newspapers Collection. It’s easy, it’s fun, and it is so valuable to others throughout the state, the nation and the world.
