2 Comments

  1. Jun
    Posted April 17, 2013 at 11:58 am | Permalink

    This is amazing! After having OCR’d countless German and Dutch texts, I appreciate this so much. Will the software for recognizing and parsing out the sections be made open source at some point? I have noticed that Acrobat’s OCR technology is not as good as whatever Google Books uses, and that it has trouble with serif scripts, mixing up the t’s and r’s, and the e’s and c’s.

    • Joe Shubitowski
      Posted April 18, 2013 at 10:26 am | Permalink

      Hi Jun,
      We have actually never discussed open sourcing the parsing code, but there is really no reason why we couldn’t. That said, the code is highly specific to the texts we are parsing, so it is one of those “your mileage may vary” situations when it comes to using the code effectively out of the box.

      I’ll talk with my development team about how we might package and document the code base to make it distributable.

      Best regards,
      Joe Shubitowski
      Head, Information Systems
      Getty Research Institute

