April 28, 2004

Kendall Grant Clark

Let's review where we've been and where we're going. In January I started this new column, Hacking the Library, by talking about dijalog lifestyles ("Geeks and the Dijalog Lifestyle"), my attempt to describe the challenges facing people roughly like myself who have one foot in the purely digital lifestyle best represented by Apple's Digital Hub idea, but who have the other foot weighed down by a couple hundred CDs, a thousand-odd books, and assorted piles and files of papers.

I suggested that, in addition to the kind of XML and information management geekery that this site is pretty good at educating folks about, we also needed to start peeking over the walls to see what our library science friends are doing. I began reporting on my peeking last month ("The Library of Congress Comes Home") by introducing LCC@Home, a project to catalog and organize your personal, non-digital media collection.

In this month's column, I describe in six easy steps how to implement LCC@Home in all its concrete detail and messy glory.

We're Off to See the Wizard

Cataloging is daunting, non-trivial work. You don't have to read the 600 pages of Wynar's Introduction to Cataloging and Classification; you just have to avoid dropping it on your foot. Cataloging ain't for wimps or amateurs! So after I (not a librarian, after all) decided to teach readers how to organize their personal libraries, I realized I would have to reveal some of my amateurish tricks. These all come down to organizing a personal library by avoiding all the hard work. Remember that for computer programmers, as for customs agents, laziness is a virtue.

There are at least three kinds of hard work implied by LCC@Home: cataloging, indexing, and labeling. At some point during my interminable graduate school career in the mid-90s, about the time my personal library hit the 2,000-book mark, I figured out three things pretty much simultaneously:

  1. Most of the books I owned were already cataloged by the Library of Congress (or, as I learned much later on, by a consortium of institutions that does LC cataloging). In other words, I don't have to catalog much, if at all.
  2. All of the books cataloged by the Library of Congress could be searched—by author, title, keyword, or an odd dozen other metadata bits—over the Web, either at the Library of Congress Online Catalog or at any one of hundreds of publicly accessible university library catalog sites. In other words, I don't have to build a computerized index or database of my collection.
  3. Most of the books I owned or was likely to own in the future already contained LC cataloging information. In other words, I don't really have to label anything.

You should award yourself five bonus points if you already knew these three things and realized their implications, or if you've already figured out that they correspond to the three kinds of hard work required by LCC@Home. In other words, LCC@Home is practically possible because you probably won't have to do any cataloging or indexing, and the labeling task is very tractable.

Knowing these little tricks convinced me that LCC@Home could be done; but I still needed some way to persuade you that it should be.

From Nicholson Baker...

While conspiring with myself and with others about the shape of these columns, I finally hit upon two different strategies to persuade you that implementing LCC@Home is a worthwhile way to spend some of your time. I call the strategies "the Nicholson Baker" and "the Martha Stewart".

First, in last month's column, I tried to excite your imagination by talking about what libraries are conceptually. In this way I hoped to appeal to the information scientist and library weenie inside all of us. Among other things, I said that

Libraries are (1) chunks of physical space, (2) highly organized and regimented, which exist, in part, to facilitate (3) the navigation of a virtual space, in this case, the information space of all (ideally, anyway) recorded human knowledge...Libraries are places, sites, locations in the physical world. A library is a place that you can visit, around and in and through which you can move...Libraries aren't merely spaces: they are highly regimented, organized, controlled spaces...A library is a habitation...a human dwelling place...where human projects, goals, purposes, and ends can be acted out...Libraries...are social spaces organized to aid people's navigation of another, a non-physical space, namely, the information space made up of and by all recorded human knowledge.

I was trying to convince you to pursue LCC@Home or something similar by doing my best impersonation of Nicholson Baker—whose book, which oddly enough I haven't yet read, Double Fold: Libraries and the Assault on Paper, forms something like the mostly-unacknowledged backdrop of the Hacking the Library series.

To Martha Stewart

Second, in this column I've already done my best Martha Stewart impersonation: that is, I've tried to convince you to undertake some impossibly hard, impractical domestic project by attempting to persuade you that it's actually quite trivial, if only you'll use my three secret tricks. This is roughly equivalent to Martha Stewart claiming that it's easy to cook a seven-course meal for your family and 10 closest friends because, after all, she can do it within the span of a single 30-minute television fantasia program.

Despite the conceptual complexity of the Nicholson Baker strategy, the Martha Stewart strategy suggests that LCC@Home isn't nearly as complex or daunting as it may seem. It's not that it's trivial. It's anything but trivial. The Martha Stewart strategy works because we're successfully offloading almost all of the complexity onto large public institutions, in the same way that Martha successfully offloaded almost all of the chopping, slicing, dicing, mixing, stirring, and baking onto her production staff, an army of underemployed culinary school graduates. As I wrote last month, it doesn't make sense to undertake such a domestic improvement project by oneself. It's too hard, too expensive, too complex.

Six Easy Pieces

Without any further delay, here's the simple recipe for LCC@Home. Only six steps!

1. Survey

Summary: Form an initial impression of the distribution of your collection in terms of LCC top-level categories and major subcategories.

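The idea here is to get started with something that's relatively easy. You're trying to form an overall impression of two things: how your books are distributed across LCC top-level category space, and how that distribution might map onto your domestic space. Recall from last month's column that the LCC top-level categories look like this:

A -- GENERAL WORKS
B -- PHILOSOPHY. PSYCHOLOGY. RELIGION
C -- AUXILIARY SCIENCES OF HISTORY
D -- WORLD HISTORY AND HISTORY OF EUROPE, ASIA, AFRICA, AUSTRALIA, NEW ZEALAND, ETC.
E -- HISTORY OF THE AMERICAS
F -- HISTORY OF THE AMERICAS
G -- GEOGRAPHY. ANTHROPOLOGY. RECREATION
H -- SOCIAL SCIENCES
J -- POLITICAL SCIENCE
K -- LAW
L -- EDUCATION
M -- MUSIC
N -- FINE ARTS
P -- LANGUAGE AND LITERATURE
Q -- SCIENCE
R -- MEDICINE
S -- AGRICULTURE
T -- TECHNOLOGY
U -- MILITARY SCIENCE
V -- NAVAL SCIENCE
Z -- BIBLIOGRAPHY. LIBRARY SCIENCE. INFORMATION RESOURCES (GENERAL)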

If you're a geek there's a good chance you have a considerable cluster of T and Q items. If you did American history as an undergrad, your library may bulge in the E and F sections. Law degree? Consider having a large K section. Avid fiction reader? P. And so on.
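If you'd rather tally than eyeball, the survey can be sketched in a few lines of Python. This is only a rough illustration: the function names are mine, and the sample call numbers are made up, standing in for whatever you jot down while walking your shelves.

```python
from collections import Counter

def top_level_class(call_number):
    """Return the LCC top-level category letter, e.g. 'Q' for 'QA76.73 .P98'."""
    return call_number.strip()[0].upper()

def survey(call_numbers):
    """Count items per LCC top-level category, skipping blank entries."""
    return Counter(top_level_class(cn) for cn in call_numbers if cn.strip())

# Hypothetical sample; these identifiers are illustrative only.
sample = ["QA76.73 .P98", "B105 .A35", "TK5105.888 .C5", "BH39 .D36", "PS3552 .A4"]
tally = survey(sample)

# Print the distribution in alphabetical order of category.
for letter, count in sorted(tally.items()):
    print(letter, count)
```

A rough count per letter is all this step needs; precision can wait until the labeling step.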

Why is this important? Well, it's easy, won't take long, and you'll only have five steps left when you're done. But, also, remember that we're arranging physical spaces; in this case, your domestic space. Home libraries and library projects need to play well with the dog's bed, the kids' collection of dinosaur toys, and Aunt Annie's herb garden. It's especially important to survey your collection if any special conditions apply in your case:

  a. A large collection, say, more than 1,000 books
  b. A collection with many irregularly shaped items. For example, many art and art history books are oversized. You need to plan for that earlier rather than later.
  c. A collection that will be distributed among several discontinuous physical spaces.

Let's consider (a) briefly. When I organized my personal library a few years ago, I started by making a diagram of my living space like the diagram in Figure 1.

Figure 1: Diagram of Living Space

The hatched black rectangles represent bookshelves. In my case I don't have enough living space to keep all of my bookshelves together, and I actually like having books in every room. But it's handy not to have to go from one room to another when looking for a book in a particular top-level category. So you will increase the usability of your personal library if you can avoid splitting top-level categories, especially very crowded ones, across multiple rooms.

2. Allocate

Summary: Allocate physical and storage space (bookshelves, primarily) in a way that corresponds roughly with (1), taking into consideration your present and expected future interests.

So (2) is the physical counterpart to the conceptual work of (1). Once you've formed an impression of the lay of your collection's land, you need to begin reconfiguring the relevant chunks of your living space to take account of that impression. One crucial thing to recognize here is that, just as in institutional libraries, the LCC top-level categories do not have to be arranged in physical space in any specific way. In other words, just because the top-level categories are named for most of the letters in the Latin alphabet, that obligates no one to make, say, A and B sections adjacent to each other.

Thinking back to the five or six university libraries that I know very well, and to the 10 or 20 I have visited, I cannot recall a single one in which the alphabetic nature of the LCC top-level categories had anything whatever to do with how they were mapped onto physical space. In a big library the thing one does before heading off to the stacks to find an item is to grab a copy of the layout map, which is typically a diagram conceptually akin to the diagram in Figure 2.

Figure 2: Top-level Category Layout Map

This "layout map" (my term, not a libsci term of art) corresponds very roughly to the one I made as a result of completing (2) during my LCC@Home project. Because I have very many books falling within the B top-level category, I allocated roughly one-third of my available shelf space to books in that section. I also did this because I know that in the future I will continue to acquire books reflecting one of my stable, long-term interests, namely, philosophy. And philosophy books belong to the B top-level category.
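The allocation arithmetic can be sketched roughly as follows, assuming you have per-category counts from the survey step. The function name, the `growth` weighting for long-term interests, and the sample counts are all my own illustrative assumptions, not part of any library science practice.

```python
def allocate_shelves(counts, total_shelves, growth=None):
    """Split total_shelves among categories in proportion to weighted counts.

    growth maps a category to a multiplier (> 1.0 for categories you expect
    to keep buying). Largest-remainder rounding keeps the total exact.
    """
    growth = growth or {}
    weights = {k: n * growth.get(k, 1.0) for k, n in counts.items()}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("need at least one book to allocate shelves")
    exact = {k: total_shelves * w / total for k, w in weights.items()}
    shelves = {k: int(x) for k, x in exact.items()}
    leftover = total_shelves - sum(shelves.values())
    # Hand any leftover shelves to the categories with the largest fractions.
    by_fraction = sorted(exact, key=lambda k: exact[k] - shelves[k], reverse=True)
    for k in by_fraction[:leftover]:
        shelves[k] += 1
    return shelves

# Hypothetical counts in which B holds one-third of the collection.
counts = {"B": 400, "P": 300, "Q": 250, "T": 250}
plan = allocate_shelves(counts, total_shelves=24)
weighted_plan = allocate_shelves(counts, total_shelves=24, growth={"B": 1.5})
```

With these made-up numbers, `plan` gives B one-third of the shelves, and `weighted_plan` tilts the split further toward B to leave room for future purchases.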

Another consideration at this stage is to allocate shelf space to top-level categories depending on where in your living space you may want to use those items. It improves the usability of your collection if, all other things being equal, your technology and science books are near your computer or study, your culinary arts books are near the kitchen, and your oversized art books are in a commons area for use by and enjoyment of your guests.

But that way of arranging a collection reflects my interests and tastes. Your way of arranging your collection should reflect yours.

3. Gather

Summary: Gather item-labeling materials—including a variety of labels, stickers, and pens of various kinds—taking into consideration any special requirements presented by unusual items in your collection.

What you need to do here depends entirely on how you want to complete the next step, (4). I chose to use sticky labels and felt tip pens to label the items of my collection. You may choose to tape or loosely place a 3x5 card into each item. Or you may have a fancy barcode printer and a reader, though this can be a tricky choice.

Rather than give specific advice about labeling items, I'll mention some of the issues that aren't obvious at first glance:

  • Choose a labeling technique appropriate for both the items of your collection and your domestic space. For example, if you have mostly mass-market books, then labels are a good choice. But if you have rare or valuable books, or you cultivate a particular aesthetic sensibility in your living space, you should choose a labeling technique that will not harm your books, reduce their resale value, or offend your aesthetic sensibility. Of course you can always choose more than one labeling technique.
  • Even if your books are not valuable in the rare book sense, you don't want to mar them unnecessarily. I regret not doing more research before choosing ordinary mailing labels for my books. I should have chosen a non-acidic, archival quality label and ink. Unless there's a fire or flood, I'm going to own the bulk of my library for the rest of my life, and it will very likely outlast me. Choosing an archival quality labeling technique is smart.
  • Given that this column's audience is full of geeks, I want to say a few words about barcodes. It's a sexy solution: it makes certain automation tasks easier, including building a computerized index of only your collection. But it has significant labor costs, in addition to the equipment costs (you need at least one barcode reader and printer setup). The labor costs include keying in rather than writing the LC catalog identifiers for each item, affixing them to each book, and then figuring out an alternative labeling technique for rare or valuable books. You also should ensure that, if you choose to barcode your collection, the alphanumeric LC catalog number is included on the barcode label. It's impossible to scan one's shelves rapidly if the only identifier on each item is machine-readable but not human-readable.
  • If I were writing this article in, say, 10 years, I'd be talking about RFID tags instead of barcodes. I think it's likely that in 10 years books will include active RFID tags, which will largely obviate the need to label them in order to manage your collection. Ubiquitous RFID tags in books seem more likely to me than the pure digital lifestyle scenario about the future of books, namely, that we'll all be reading books on some electronic paper device in 10 years, having forgone a 500-year-old tradition of relishing the tactile pleasures of books as physical objects.

4. Label

Summary: For each item in your collection, find its unique LCC identifier and affix that identifier to the item, using the techniques and materials from (3).

(4) is the most interesting step because it's the simplest, conceptually, but also the step that represents 90% of the required work. Labeling the item, however, is what I have come to call a dijalog inflection point—the point at which the actual correlation between physical and virtual space is made concrete. Once labeled, a physical item, a book, has a unique identifier in a namespace, the Library of Congress Classification scheme, that can be manipulated by a computer.

There are some practical issues here. How do you find the LCC identifier for an item in your collection? The first step is to look at the verso of the book's title page, that is, the reverse side of the title page. If you're lucky, you'll find a bit of the copyright page called the "Library of Congress Cataloging in Publication Data". The LC Cataloging in Publication (CIP) Data is the third of my LCC@Home tricks. (In some books you may want to look at the back of the book for the CIP if it's not on the verso of the title page.) Figure 3 is the CIP block for one of my favorite books, Arthur Danto's The Transfiguration of the Commonplace, among the most important philosophies of art from the contemporary analytic school.

Figure 3: CIP Data block

Not every book has a CIP Data block. Older books often don't. Nearly all books sold within the past 10 years will have one, as well as just about every book published by a university press. More and more these days it's hard to find a book published by houses other than small, boutique, or certain trade publishing houses that doesn't have a CIP Data block.

So why is the CIP Data block so important? Because, as you can see in the part of Figure 3 circled in red, the CIP Data block includes the unique LCC identifier for that book. That means that you don't have to do anything to find the identifier. It also means that labeling the book is as simple as copying that identifier onto a label, then affixing that label to, say, the book's spine. It also means you don't have to label the book explicitly, though having to open every book and find the CIP Data block is vastly less efficient than reading a label on the spine.

Most of the work of step (4) consists of locating the CIP Data block, writing the LCC identifier onto a label, and affixing the label to the book's spine. Fun? Not really. Rote? Yes, a bit. Able to be done while watching TV or listening to the radio? You bet. Complex? Not in the least.

But what about books, like the one in Figure 4, that don't have a CIP Data block? Vanity presses, some kinds of trade or corporate presses, or presses that publish only one or two authors aren't eligible to participate in the CIP program. And, obviously, books that are older than the CIP program won't have benefited from it.

Figure 4: A book without a CIP Data block

For a book like the one in Figure 4, you have a few options. You can try to find the book in the Library of Congress Online Catalog or in another large online catalog. And you may want to use the LCC identifier of a more recent edition of an older book. That's utterly anathema to real library science, but it's a perfectly legitimate shortcut for LCC@Home. It's also possible to use the ISBN in a book without a CIP Data block, such as in Figure 4, to find the LCC identifier. As that's the subject of next month's column, I won't say much more about it here.

More About CIP

Large publishers that handle lots of titles and lots of authors, and whose books are widely acquired by libraries, are eligible to participate in the CIP program. This includes all university presses and most, if not all, large commercial publishers. I suspect that about two-thirds of the new titles published each year and available for purchase in retail outlets contain a CIP Data block. As for the number of new books actually purchased each year, I suspect something like 90% or higher contain a CIP Data block.

The CIP process goes something like this: publishers send CIP data applications for each eligible title to the LC, which assigns an LC Control Number. Catalogers do descriptive cataloging, assign subject headings, and assign full LC and Dewey classification identifiers. This complete CIP data is sent back to the publisher, which puts some or all of this cataloging data onto the verso of the title page. A MARC record for the book is also sent to the large libraries, consortiums, and bibliographic vendors. Finally, the publisher sends a copy of the book to the LC, which then adds some final metadata (the number of pages and the book's size) to the book's MARC record. After the records are updated and checked for consistency and accuracy, the new MARC records are redistributed.

5. Punt

Summary: Depending on the number and type of items in your collection that are not LCC cataloged, apply some other classification scheme, leave the items unidentified, or consider cataloging the item yourself.

Okay, this is where things get hard and the Martha Stewart strategy comes unraveled. What if you have items in your collection that haven't been assigned an LC identifier? You have four choices:

First, you can do your own LC cataloging, which is practically impossible and not a very good choice anyway. Second, you can see if the item has been assigned an identifier in some other classification scheme. That's unlikely but possible; it doesn't buy you a whole lot, practically. Third, you can do your own cataloging using a scheme other than LC. One of the modern synthetic schemes would be a good choice. Probably UDC would be best. Last, you can leave the item uncatalogued, which isn't a bad choice.

Which choice you make depends in part on the number of uncatalogued items in your collection. If it's fewer than the maximum number you can easily search through by hand, I would leave them uncatalogued. If it's the bulk of your collection and more than you can manage by hand, then LCC@Home isn't a very good choice for you. I'll leave you with a bit of advice: consider doing your own cataloging with UDC, but then be prepared to build your own computerized database of items. I'll have more to say about this case in a future column later this year, but for now you're on your own.

6. Arrange

Summary: Physically arrange the distribution of items matching LCC categories according to some locally-derived, sensible plan.

Now that you've labeled all of the items in your collection, it's time to place them on your shelves in a way that corresponds to the plan you developed in step (2). I did this in two steps: first, I put all the books that belonged to a top-level category near the shelves that were meant to house that category. Call that the gross sort. Then I did a fine sort. For each top-level category I sorted the books in order of their LCC identifier, which was now on a label on each book's spine.
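The fine sort is worth a small sketch, because LCC identifiers don't sort correctly as plain strings: the class number is numeric, so B72 comes before B105. What follows is a simplified approximation of shelf order, not the full filing rules librarians use; the function name, the regular expressions, and the sample identifiers are my own assumptions.

```python
import re

# Class letters, optional class number (possibly decimal), then the rest.
_CALL_RE = re.compile(r"^\s*([A-Z]{1,3})\s*(\d+(?:\.\d+)?)?\s*(.*)$")
# A Cutter is a letter followed by digits that are read as a decimal fraction.
_CUTTER_RE = re.compile(r"\.?\s*([A-Z])(\d+)")

def lcc_sort_key(call_number):
    """Build a sort key that approximates LCC shelf order."""
    m = _CALL_RE.match(call_number.upper())
    if not m:
        # Unparsable identifiers sort to the end, after every letter.
        return ("~", 0.0, (), call_number)
    letters, number, rest = m.groups()
    # Comparing Cutter digits as strings matches decimal order ("36" < "4").
    cutters = tuple(_CUTTER_RE.findall(rest))
    return (letters, float(number or 0), cutters, rest)

# Hypothetical labels, illustrative only.
labels = ["QA76.73 .P98", "B105 .A35", "QA9 .H63", "B72 .R8", "QA76.5 .K4"]
shelf_order = sorted(labels, key=lcc_sort_key)
print(shelf_order)
```

A plain alphabetical sort would put B105 before B72 and QA76 before QA9, which is exactly the mistake that's easy to make by hand during the fine sort.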

Next to step (4), this is the most work, but it's not that hard. Now the only thing left to do is to maintain your LCC@Home implementation by repeating steps (4) through (6) for each new item you acquire.


What do I really want next? I want my online bookseller to include, for every book it sells, a label containing the LCC and Dewey identifiers so that I can easily maintain my LCC@Home collection. That would cost the bookseller almost nothing to do, since every book it ships already includes a print-out with various bits of data, on paper that is readily usable as a sticky label. I wonder if anyone there is thinking about that level or kind of customization?

If you decide you don't want to "copy catalog", which is the term for the technique I've described here, you should know about the LC's Cataloging Directorate. And if you do catalog yourself, then you'll need a subscription to the LC's Classification Web service, which is insanely expensive. In other words, you really don't want to do LC cataloging yourself, but there are some really fascinating issues and resources to explore here.

What's the biggest limitation of LCC@Home? In addition to having to do the work, when you're done you don't have a computerized database or index of your collection only. Your collection is, ignoring uncatalogued items, a subset of the universe of LC cataloged items. So you can use computerized databases of that universe to search your subset of it. But those search results will often include items not in your collection. The only way to verify whether items are in your collection is to remember your collection or go to your shelves and look. I'll address this issue in a future column when we look at some open source projects, most of which are XML-powered web applications, that can be used to build databases of your collection.
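Short of a real database, one stopgap is to jot each identifier into a plain list as you label, and then check catalog search results against it. Here is a minimal sketch under that assumption; the function names and sample identifiers are hypothetical, and the matching is a naive comparison that ignores case and whitespace.

```python
def normalize(call_number):
    """Uppercase and strip all whitespace so label variants compare equal."""
    return "".join(call_number.upper().split())

def split_results(search_results, owned):
    """Partition catalog hits into (on my shelves, not in my collection)."""
    owned_norm = {normalize(cn) for cn in owned}
    mine, others = [], []
    for cn in search_results:
        (mine if normalize(cn) in owned_norm else others).append(cn)
    return mine, others

# Hypothetical data: identifiers copied from labels, and hits from a catalog search.
owned = ["BH39 .D36", "QA76.73 .P98"]
hits = ["bh39 .d36", "BH201 .K3", "QA76.73.P98"]
mine, others = split_results(hits, owned)
print(mine, others)
```

This only narrows search results down to what you own; it doesn't replace a proper index of your collection.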

What's coming next month? Code. Actual live, running, working code. Well, maybe. My goal for the May column is to write a bit of Python script, deployable as a web application or as a command-line tool, to turn ISBN numbers into LC catalog numbers.


My thanks to the very talented designer and illustrator, Kate Krizan, who cooked up the clever illustrations in this column. She's projects.