Eight hardcover books (one brown; one blue; and five in forest green) sit vertically on a beige metal bookshelf. Gold lettering on their spines displays their titles.

Early print volumes of some of the sales indexes that are now part of the Provenance Index databases.

The Getty Provenance Index® is a highly regarded digital humanities resource that documents the history of collecting and art markets across four centuries. A team at the Getty is at work on a three-year project to transform its 1.7 million records into linked open data, which will be accessible through a new search tool starting in 2019. Here research database editor Eric Hormell provides insight into the on-the-ground work of data standardization, which is a critical component of the project. —Ed.

A key part of the project to remodel the Getty Provenance Index is the transformation of our datasets into Linked Open Data (LOD). Transforming to LOD will allow our data to be more easily discovered on the web and more easily linked to data sets at other institutions. But this transformation to LOD required us to standardize many of our data elements that had previously been left to be understood in the context of the surrounding data.

As a database editor for the Provenance Index, I focus on data from the sales projects, which is the largest group of records in the Index. In this post I’ll describe some of the thorny data-standardization issues involved in the move to LOD through the lens of a single group of fields—ones that presented interesting challenges.


A Brief History of the Sales Projects

If you’re not familiar with the Getty Provenance Index, here’s a bit of history. Founded in the early 1980s by Burton Fredericksen, the first curator of paintings for the Getty Museum, the Provenance Index has grown over the past 30-plus years into a collection of online databases with 1.7 million records. Database records represent art objects drawn from archival source documents such as auction catalogs, inventories, and dealer stock books, ranging from the seventeenth to the twentieth centuries. This data can be used to trace the ownership of works of art and to examine patterns in collecting and art markets in Europe and the US.

The largest group of records is what we call the sales projects, which currently include 1.25 million object records from about 16,000 sales catalogs. The sales projects began in the 1980s as an attempt to record all the paintings appearing in British sales catalogs in the nineteenth century. The work was done decade by decade, starting from 1801, and, so far, we’ve gotten up to 1840. This work was followed by projects for French, Belgian, and Dutch sales. Then, through collaborations with other institutions, we produced projects for earlier sales, from the seventeenth and eighteenth centuries, and most recently, a twentieth-century German sales project. We also started to include additional object types beyond paintings, such as drawings and sculptures. You can see an overview of what’s covered in the sales projects here.

These sales can all be searched together online, but they were originally developed as individual projects over four decades, with the first projects produced as print publications. One of our challenges has been to make sure that all of these individual datasets work together and adapt to changing formats and technology. This challenge has continued into the current remodel project. Below you can see examples of how our data looked in our early print volumes compared to our current web-based platform.

Part of a printed page shows a list of works by Hendrik van Balen and Jan Breughel.

Sale lots appearing in an early print volume of the Provenance Index sales indexes.

A screenshot from the Getty Provenance Index Databases of the sale for Lot 0039 from Sale Catalog Br-862, which was for a painting by Jan Brueghel the Elder and Hendrik van Balen, The Holy Family in a landscape with angels, a pleasing composition, finely colored and exquisitely finished -- on copper.

A sale record as it appears on our current website. This record corresponds to the first item in the image above from the print volume.

Our country-based sales projects always had the same basic structure, and we attempted to capture the same basic information in a standard way. There are small differences between sales catalogs in different countries and periods, but they don’t cause a problem in our current web-based platform. For example, you can search by sale date, artist, buyer, seller, object type, title words, etc., across all the sales databases with no problem, other than the fact that the title words will be in the original language of the document.

Screenshot of the search screen for Sales Catalogs in the Getty Provenance Index Databases, which includes fields for "Artist Name," "Artist Nationality," "Lot Title/Description," "Auction House," "Lugt #," etc.

Sale Contents search screen for the current (2018) web platform of the Getty Provenance Index Databases.

Things changed, however, when we began making the shift to linked open data, and some of the differences in the data from country to country became an issue. In the LOD model, each data element needs a specific definition. And some of these elements rely on other elements in order to provide meaning. So changes in the way one element was handled often had a ripple effect.

I’ve been working on content in the Provenance Index databases for many years and am not an IT expert. Therefore, I will try to avoid highly technical descriptions, such as “The software development team took the thingy and did some stuff to it.” You’re welcome. Instead, I’ll focus on the content side of the standardization process and explore how one issue led to the next as we sought to create more granular definitions for concepts that were more ambiguous in our old data model.

I could have written about any of the fields you see in the search screen above—artists, owners, object types, subject matter, sale locations, etc. But instead, I’m going to write about something that doesn’t even appear on that search screen. Ooh, do I have your attention now? What could this mysterious data element be? Well, it’s—wait for it—sale price. Oh, yes. We’re going there.

The Many Types of Art Sales Prices

You read that right. There are different types of prices. I hope you are sitting down and holding on to something, because we are in for a bumpy ride. Many sales catalogs contain prices, but the prices don’t all mean the same thing. Moreover, prices can be written in by hand, they can come from published sale results, or they can be printed in a catalog.

On the left is a print page with sale lots from an auction; on the right is a lined page with handwritten notes about the sales.

An example of prices recorded by hand in the auctioneer’s copy of a catalog.

The top half of a page from Weltkunst, a German art newspaper, for January 1931, that shows prices of published sales from three auctions.

An example of prices appearing in published sale results.

A photocopied image of two pages from a sales catalog that shows lot numbers with artist names and numbers in three columns labeled with the symbol for pound sterling, "s.", and "d."

An example of prices printed in a catalog.

Prices printed in an auction catalog obviously cannot be the actual auction results, because they were published before the auction took place. So the printed prices in auction catalogs are estimates, starting prices, or reserves. To add complexity, it is fairly unusual to have any prices printed in auction catalogs in the seventeenth, eighteenth, and early nineteenth centuries. Because of this rarity, we did not have a separate way to record this information. We simply recorded any price in our main price field and added a note if we needed to explain that it was a different “type” of price, like an estimate.

Price: 3. 10 | c pounds | d estimate

An example of a price field in our production database, with subfields “c” for currency and “d” for description.

In twentieth-century German catalogs, printed estimates and starting prices are common. So, when our Nazi-era German sales project started in 2011, individual data fields were added to record the different types of prices that could exist: price, estimated price, or starting price. In the remodeled Provenance Index, these will be linked to specific concepts and be standard across databases. So we had to go back to our older sales projects and identify which prices needed to be moved to new fields.

Three lines that read: Price, Estimated Price 3.10 | c pounds, Starting Price

Estimates were moved into a new estimated price field.
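
The move from a single price field with a description note to separate price fields can be sketched roughly as follows. This is only an illustration in Python, not the project's actual code: the function names, the dictionary keys, and the note-to-field mapping are all assumptions. Only the `|`-delimited layout with subfields `c` (currency) and `d` (description) is taken from the example above.

```python
def parse_legacy_price(raw):
    """Split a legacy price string such as '3. 10 | c pounds | d estimate'."""
    parts = [p.strip() for p in raw.split("|")]
    record = {"amount": parts[0], "currency": None, "description": None}
    for part in parts[1:]:
        # Each subfield starts with a one-letter code followed by its value.
        code, _, value = part.partition(" ")
        if code == "c":
            record["currency"] = value.strip()
        elif code == "d":
            record["description"] = value.strip()
    return record


# Hypothetical mapping from description notes to the new price fields.
NOTE_TO_FIELD = {
    "estimate": "estimated_price",
    "starting price": "starting_price",
}


def route_price(raw):
    """Place a legacy price into one of three assumed target fields."""
    parsed = parse_legacy_price(raw)
    note = (parsed["description"] or "").lower()
    target = NOTE_TO_FIELD.get(note, "price")  # unrecognized notes stay in 'price'
    fields = {"price": None, "estimated_price": None, "starting_price": None}
    fields[target] = {"amount": parsed["amount"], "currency": parsed["currency"]}
    return fields
```

In practice the description notes were free text, so a lookup table like this would probably only catch the most regular cases and leave the rest for manual review.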

Prices vs. Transactions

After this initial pass, in which we separated price information into three fields, we realized that our main sale price field could still mean two very different things. A price often represents an actual purchase price, but it can also be what’s known as the “bought-in” price. When bidding does not reach the reserve price, the highest bid is often recorded; but without a sale, this “price” does not represent a transfer of ownership. That distinction—whether or not the price also represents an actual sale—is not made in the price field. Instead, we note this in our databases through the “transaction” field.

This is an example of how the information in one field is dependent on the information in an entirely separate field for meaning. If the transaction is noted as “sold,” then the price represents the purchase price. If the transaction is noted as “bought-in,” it is not a purchase price because a transfer of ownership did not occur.

For the remodel project, we decided that we would consider all the prices that appear in the sale price field to be “bid” prices. In other words, they represent a bid that was made at auction. It might be the winning bid; it might be a high bid that didn’t reach the reserve; or we might simply not know what the bid represents. The LOD modeling of the price will then be dependent on the transaction, so that only the “sold” transactions will have the price linked to a sale event.
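
The dependency between the price and the transaction can be pictured with a small Python sketch. The function name, the dictionary keys, and the labels `"bid"` and `"purchase"` are hypothetical and do not come from the actual LOD model. The sketch only shows the rule described above: every value in the sale price field is treated as a bid, and only a "sold" transaction links that bid to a sale event.

```python
def model_price(amount, transaction):
    """Treat a sale-price value as a bid; attach a sale event only when sold."""
    statement = {"type": "bid", "amount": amount, "sale_event": None}
    status = (transaction or "").strip().lower()
    if status == "sold":
        # Ownership changed hands, so the bid is also the purchase price.
        statement["sale_event"] = {"type": "purchase", "price": amount}
    # "bought-in" or unknown transactions remain bids with no sale event.
    return statement
```

For example, `model_price("3.10", "Bought-in")` keeps the amount as a bid but leaves `sale_event` empty, because no transfer of ownership occurred.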

Oh No, Another Price Type!

Once we had sorted out the purchase price, bought-in price, starting price, and estimated price, we thought we had taken care of all the price-type issues for the sales catalogs. But, as we were separating the prices into their distinct fields, another price type surfaced. Something we didn’t consider in the initial analysis was the fact that 267 sales in our sales databases (less than 2%) are not auctions at all. Most of these are sales by private contract.

Unlike auctions, in which objects are sold to the highest bidder at a specified time, in a private contract sale the works are exhibited in a gallery for an extended period, usually from a few weeks to a few months. Customers can view and purchase the works at any time during the exhibition for set or negotiated prices.

A catalogue of several first-rate pictures. The property of an emigrant nobleman, and of the remaining part of the truly-superb cabinet of a well-known collector; which will be sold on Monday, May 26, 1794, and following days, (Sundays excepted) by private contract, by a committee appointed for that purpose, at the Great Rooms, (late Cox's Museum) Spring Gardens; where the nobility and gentry, provided with tickets, may view the said pictures.

A catalog for an exhibition of paintings for sale “By Private Contract.”

In most cases we don’t have any prices at all for such sales, so the price type is often not an issue. But in a few cases there are prices printed in this type of catalog, and we realized that our existing price types didn’t adequately define them. Because there is no bidding, the prices printed in a private contract sale catalog are not starting or reserve prices. They also don’t necessarily represent the eventual sale price, because customers could negotiate them down.

Title page for catalog for an exhibition of drawings collected by the late Sir Joshua Reynolds, including mention of "Michel-angelo, Raffaelle, Coreggio, Titiano," with information about where and when the auction will take place.

Catalog for an exhibition of drawings collected by the late Sir Joshua Reynolds, which “will be sold at the prices marked in this catalogue.”

There are about 4,500 object records with this price type, out of 1.25 million sales records. By comparison, there are almost 100,000 records with estimated prices. Despite this relatively small number, we decided that we needed a new price type, which we would call the “asking” price. We requested that this term be added to the Getty’s Art & Architecture Thesaurus® so that it can be linked to a standard concept, which will be used in our new LOD model.

Click the [hierarchy] icon to view the hierarchy. Semantic View (JSON, RDF, N3/Turtle, N-Triples). ID: 300417241 Record Type: concept Page Link: http://vocab.getty.edu/page/aat/300417241 [hierarchy icon] asking prices (prices, economic concepts, ... Associated Concepts (hierarchy name)) Note: The prices asked for or set by a seller. Terms: asking prices (preferred, C,U,English-P,D,U,PN) asking price (C,U,English,AD,U,SN) prices, asking (C,U,English,UF,U,U) Facet/Hierarchy Code: B.BM Hierarchical Position: [hierarchy icon] Associated Concepts Facet [hierarchy icon] .... Associated Concepts (hierarchy name) (G) [hierarchy icon] ....... social science concepts (G) [hierarchy icon] ........... economic concepts (G) ............... prices (G) [hierarchy icon] .................... asking prices (G)

Art & Architecture Thesaurus record for the concept “asking prices.”

Types of Sales

In order to identify asking prices, we first had to identify all of the private contract sales. Many were clearly labeled as such in the notes for the sale. But others only included language such as “for ready money” or “sold out of hand” or similar terms in French or German. Still others didn’t include any of these specific terms, but we could identify them through other factors. For example, The European Museum, an exhibition space in London that held long-term sale exhibitions during the late eighteenth and early nineteenth centuries, would often include only the month, year, or season for the beginning of the exhibition. So these vague dates were a clue about the type of sale. Once all these sales were identified, the catalogs had to be checked to see which ones had prices that should be considered asking prices.
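
A first pass at finding candidate private contract sales could look something like the Python sketch below. It is an assumption-laden illustration rather than the method actually used: the function name and the phrase list are hypothetical, only the three English phrases quoted above are included, and the French and German equivalents are omitted because they are not given here.

```python
# English phrases quoted in the text; French and German terms are not listed.
PRIVATE_SALE_PHRASES = ("private contract", "for ready money", "sold out of hand")


def looks_like_private_contract(sale_notes):
    """Flag a sale whose notes contain one of the known phrases."""
    text = sale_notes.lower()
    return any(phrase in text for phrase in PRIVATE_SALE_PHRASES)
```

A keyword match like this would only produce a list of candidates. Sales identified through other clues, such as the vague exhibition dates used by The European Museum, would still need to be found and checked by hand.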

Lotteries? Where Did These Come From?

Most of the non-auction sales in our databases are private contract sales. However, there are a few exceptions even to this. While I was searching through our 16,000 sales, I came across five catalogs for eighteenth-century lotteries, four German and one French. Lotteries were events in which participants would purchase tickets that gave them the chance to win one of the objects being raffled. They were a common way of dispersing art until the end of the seventeenth century, at which point auctions gained in popularity. But they continued to occur in the eighteenth and into the nineteenth centuries. In general, we have not included them in our sales projects, but apparently five crept in. Not only that, but one of the German lotteries includes prices that are valuations. Yes, valuations. Yet another price type!

Shared Prices

As I mentioned above, some of this price description information, such as the fact that a price was an estimate, had previously been included as a note attached to the main price field in our database. But other price description information also appeared in that price note field. The most common note explained that the object being viewed had been sold together with another object for a single price. Without the note it would appear that the price represented the purchase price for the individual object, when in fact, only a portion of the price would have been for that object. In the following example, you can see in the “Transaction” field that the price was actually for two separate lots.

At top: Lot 0005 from Sale Catalog Br-347. Artist Name: Dughet, Gaspard (Gaspard Poussin) (French and Italian), from catalog: Gaspar Poussin; Title / Description: Landscape and Figures, &c.; Object Type: Painting; Transaction: Sold, 0.7 pounds for lots 5 & 6; Buyer: Palsa, from catalog: Palsa; Lot Notes: This lot was sold with lot 6 by Goyen.; Sale Date: 1805 Jun 15; Auction House: Richardson (William); Sale Location: London, England; Lugt Number: 6977; See Also: Sale Description [hyperlinked]; Art Sales Catalogues Online [hyperlinked]

Sale record for a painting that was sold together with another painting for a shared price.

There are over 82,000 sales records with shared prices in our databases. This shared price information usually comes from hand annotations, often indicated by brackets that link two or more sale lots to a single price, as in the following example, in which the above lot 5 is shown to have been sold with lot 6:

Catalogue, &c. Next to the names of artists and titles of artworks are annotations in script. For example, the first reads Vandeveld, 1, Sea Piece, with the annotations "7"; Solomon, 17, on the same line. Lots that were sold together are indicated by handwritten brackets.

Sale catalog with handwritten annotations showing lots that were sold together for a shared price.

This concept was not a problem for the new model, as long as the specified lots were identified and could be linked together by matching the notes in the corresponding records. In order to create the link, though, the notes had to match perfectly, so this took some cleanup effort. The real problem was that there were also hundreds of lots that had notations for a shared price, but lacked any associated lot to share the price with. It’s easy to assume this was a mistake, but it wasn’t. These missing records occurred when an object that was in scope for the project was sold with another object that was out of scope, and therefore not included.
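
The check for lots that share a price with a lot missing from the database can be sketched in Python as follows. The note format (`for lots 5 & 6`) is modeled on the "Transaction" line in the example record above. The function names and the assumption that records are keyed by lot number within a single catalog are hypothetical.

```python
import re


def shared_lot_numbers(note):
    """Extract lot numbers from a note such as '0.7 pounds for lots 5 & 6'."""
    match = re.search(r"for lots\s+(.*)", note)
    if not match:
        return []
    return [int(n) for n in re.findall(r"\d+", match.group(1))]


def missing_partners(records):
    """Return lot numbers named in shared-price notes but absent from records.

    `records` maps a lot number to its shared-price note for one catalog.
    """
    missing = set()
    for note in records.values():
        for lot in shared_lot_numbers(note):
            if lot not in records:
                missing.add(lot)
    return sorted(missing)
```

A result such as `[239]` would point to a lot that was out of scope for the project, not to a data-entry mistake.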

At top, the text reads: Lot 0240 from Sale Catalog Br-A2024. The table data reads: "Artist Name: Bartolozzi, Francesco (Italian) from catalog: Bartolozzi; Title / Description: Four friezes; Object Type: Drawing; Seller: Serres, Dominic from catalog: Dom. Serres, Esq. R.A.; Transaction: Sold, 0.1 pounds for lots 239 & 249 (CL); Buyer: Cash, from catalog: Cash (CL); Lot Notes: This lot was sold with lot 239, which is a print; Sale Date: 1794 Mar 13 - 1794 Mar 15 (This Lot: Mar 13); Auction House: Christie's; Sale Location: London, England; Lugt Number: 5167"

Object record with lot notes explaining that the object was sold together with another lot, which was a print.

As I mentioned at the beginning of this post, our sales projects record specific types of art objects. We have not included prints, books, most decorative arts, etc. So, for example, when a drawing lot was sold with a print lot, we would only have the drawing lot in our database and not the print lot. But the drawing lot would still include a shared price note indicating that it was sold together with another lot. The new model didn’t know what to do with this shared price note because it had nothing to link to.

The words "Computer says no..." are overlaid on a photo of the head and shoulders of a frowning man, who is dressed like a woman with long brown hair, red glasses, and pink lipstick.

Apparently, the computer said “No.”

We decided that the solution would be to create shell records that would stand in for the lots that were out of scope.

Catalog entry for Lot 0239 from Sale Catalog Br-a2024. Lot notes read: "This record is blank because it represents a lot that was out of scope during the original data collection process, but was sold or purchased together with one or more in-scope lots. Thus, this record exist [sic] solely to allow for linking to in-scope lots."

Shell record representing an object that was out of scope for the project, but which was sold together with an object that was in scope.
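
The idea of a shell record can be illustrated with a short Python sketch. The field names, the `shell` flag, and the wording of the note are assumptions made for the example and are not the database's actual schema. The point is simply that each out-of-scope lot receives an otherwise empty placeholder so that in-scope lots have something to link to.

```python
# Hypothetical note text for placeholder records.
SHELL_NOTE = "Placeholder for an out-of-scope lot sold together with in-scope lots."


def add_shell_records(records, referenced_lots, catalog):
    """Return a copy of `records` with a stand-in for each missing lot.

    `records` maps lot numbers to record dictionaries for one catalog.
    `referenced_lots` lists lot numbers named in shared-price notes.
    """
    completed = dict(records)
    for lot in referenced_lots:
        if lot not in completed:
            completed[lot] = {
                "catalog": catalog,
                "lot": lot,
                "shell": True,  # marks the record as carrying no object data
                "note": SHELL_NOTE,
            }
    return completed
```

Existing records are left untouched, so running the function more than once with the same inputs does not create duplicates.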


Thank you for going on this action-packed adventure with me. As you can see, one seemingly simple data element can turn out to be remarkably complicated once you start digging in. And this post only covers a small part of the issues related to prices—don’t even get me started on currencies—which is itself only one small part of the overall data standardization process. But I hope this gives you a little bit of an idea of the work we’ve been doing as part of this remodel project, all with the aim of making Provenance Index records richer tools for research.