1. Jun
    Posted April 17, 2013 at 11:58 am | Permalink

    This is amazing! After having OCR’d countless German and Dutch texts, I appreciate this so much. Will the software for recognizing and parsing out the sections be made open source at some point? I have noticed that Acrobat’s OCR technology is not as good as whatever Google Books uses, and that it has trouble with serif typefaces, mixing up the t’s and r’s, and the e’s and c’s.

    • Joe Shubitowski
      Posted April 18, 2013 at 10:26 am | Permalink

      Hi Jun,
      We have actually never discussed open sourcing the parsing code, but there is really no reason why we couldn’t. That said, the code is highly specific to the texts we are parsing, so it is one of those “your mileage may vary” situations when it comes to using the code effectively out of the box.

      I’ll talk with my development team about how we might package and document the code base to make it distributable.

      Best regards,
      Joe Shubitowski
      Head, Information Systems
      Getty Research Institute

