2 Comments

  1. Jun
    Posted April 17, 2013 at 11:58 am | Permalink

    This is amazing! After having OCR’d countless German and Dutch texts, I appreciate this so much. Will the software for recognizing and parsing out the sections be made open source at some point? I have noticed that Acrobat’s OCR technology is not as good as whatever Google Books uses, and that it has trouble with serif scripts, mixing up the t’s and r’s, and the e’s and c’s.

    • Joe Shubitowski
      Posted April 18, 2013 at 10:26 am | Permalink

      Hi Jun,
      We have never actually discussed open-sourcing the parsing code, but there is really no reason why we couldn’t. That said, the code is highly specific to the texts we are parsing, so it is one of those “your mileage may vary” situations when it comes to using the code effectively out of the box.

      I’ll talk with my development team about how we might package and document the code base to make it distributable.

      Best regards,
      Joe Shubitowski
      Head, Information Systems
      Getty Research Institute


