2 Comments

  1. Jun
    Posted April 17, 2013 at 11:58 am

    This is amazing! After having OCR’d countless German and Dutch texts, I appreciate this so much. Will the software for recognizing and parsing out the sections be made open source at some point? I have noticed that Acrobat’s OCR technology is not as good as whatever Google Books uses, and that it has trouble with serif typefaces, mixing up the t’s and r’s, and the e’s and c’s.

    • Joe Shubitowski
      Posted April 18, 2013 at 10:26 am

      Hi Jun,
      We have actually never discussed open sourcing the parsing code, but there is really no reason why we couldn’t. That said, the code is highly specific to the texts we are parsing, so it is one of those “your mileage may vary” situations when it comes to using the code effectively out of the box.

      I’ll talk with my development team about how we might package and document the code base to make it distributable.

      Best regards,
      Joe Shubitowski
      Head, Information Systems
      Getty Research Institute
