Amy's Practicum Blog

Location: St. Louis, Missouri, United States

Monday, October 30, 2006

Metadata and cataloging

Today I spent a bit more time on the metadata mapping spreadsheet and sent it to Andrew and Cassandra to review. I have the feeling it's a bit more complicated than they were expecting...

I cataloged two more pieces in the bound volume.

Not much else exciting to report!

Hours today: 2 (4:15-6:15pm)
Hours this week: 2
Total hours completed: 105

Thursday, October 26, 2006

Yet even more metadata mapping

OW! My brain hurts! I transferred everything to a spreadsheet and added another field, BibClass index, since I realized we'll need to know what indexes should be called (if, indeed, we'll be able to add new ones). I emailed Jenn Riley at Indiana to ask some questions and see if they'd be willing to share the scripts and bib.map they used. She might not have been the right person to ask - hopefully she'll be willing to forward it on to the right person(s) if that's the case. I hope they'll be able to help. I don't know why I've been so reluctant to ask for their help; things might have gone a lot smoother and faster if I had done so sooner (or not - I guess we'll never know).

Oh, for there to be ONE metadata standard and format that would miraculously work for everything...!

I'll do more this weekend and send everything to Cassandra and Andrew to see what they think, and what the next step is.

Hours today: 2 (4:30-6:30pm)
Hours this week: 8.5
Total hours completed: 103

Wednesday, October 25, 2006

Metadata and cataloging

Today I met with Mark for a bit and we talked through a few more metadata issues, as I'm trying to get as much info to Andrew and Cassandra as I can to help with the transfer from MARCXML to BibClass. There are still a lot of unanswered questions. I think we may just need to try a few and see what happens! Mark made some very good points today, since the questions I was asking were geared mainly to the records I'm dealing with... Yes, there's only so much I can do in my time frame, but if this is going to serve as a guide for Gaylord in the future, other parts of the collection will raise issues that may or may not be dealt with in the same way. Call numbers, for example - the part of the collection I'm working with doesn't have them, but other parts do (bound volumes, for example).

The rights question is tricky - we do have a statement geared toward print Special Collections materials that we could link to, but it doesn't make any mention of digital manifestations at all. I'll need to talk with Brad more about that.

He reminded me that the actual date of digitization could prove to be useful to someone at some point in the future, and even if we can't get it in there easily for this part of the collection, it would be good to know how to do it. Since DLXS will need to pull the pdfs themselves from the server, and those files contain the date information, there must be a way to automatically pull out that date and add it somehow. I really have no idea how, but it sounds like there must be a way!
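One way the "pull the date out of the pdf automatically" idea might look is sketched below in Python. It rests on an assumption that would need checking against our own files: that each pdf carries a plain-text /CreationDate entry in the usual PDF form D:YYYYMMDDHHmmSS. The function names and the sample bytes are made up for illustration, and the stored date may reflect when the pdf was assembled rather than when the page was scanned.

```python
import re
from datetime import date
from typing import Optional

# PDF date strings normally look like D:YYYYMMDDHHmmSS plus an optional time zone.
# Assumption: the entry is stored uncompressed and unencrypted in the file.
_CREATION_DATE = re.compile(rb"/CreationDate\s*\(D:(\d{4})(\d{2})(\d{2})")


def creation_date_from_pdf_bytes(data: bytes) -> Optional[str]:
    """Return the first /CreationDate found as YYYY-MM-DD, or None if absent."""
    match = _CREATION_DATE.search(data)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day).isoformat()


def creation_date_from_pdf(path: str) -> Optional[str]:
    """Convenience wrapper that reads a pdf from disk (path is hypothetical)."""
    with open(path, "rb") as handle:
        return creation_date_from_pdf_bytes(handle.read())


# Made-up fragment of a pdf information dictionary, for illustration only.
sample = b"<< /Producer (hypothetical scanner) /CreationDate (D:20061025163000-05'00') >>"
print(creation_date_from_pdf_bytes(sample))
```

If the entry turns out to be missing or compressed in our files, a proper PDF library or the file's modification time on the server would be the fallback.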

Without knowing how the [whatever it is that's going to transform the MARCXML stuff to BibClass] is going to treat indicators and subfields, it's a little hard to know how specific I need to be in my mappings. I guess I'll get as reasonably specific as I can, let them know what I "hope" it can do, and then see what happens. I may be sorely disappointed and end up needing to do a lot of manual tweaking, but oh well!

I also did some cataloging today - these last pieces seem to be a bit more straightforward, which is a nice change. I should have no problem finishing them this week or next. I hope that Mark will look at them and let me know if anything looks amiss. I came across an interesting problem today - it turns out there was already a record in the suppl. catalog for one I cataloged, but it wasn't immediately obvious. The copy already cataloged was missing the t.p., so some of the information was different. I looked at it, though, and realized it was, indeed, the same piece, just without the t.p.! I'm not sure how to handle it (I guess they both need to belong to the same record, which will be awkward since there's more info for one than for the other) - I'll have to ask Mark.

Hours today: 3.5
Hours this week: 6.5
Total hours completed: 101

Monday, October 23, 2006

Metadata mapping, again

Today I met with Cassandra to go over what I had come up with so far in terms of mapping from MARC to BibClass to DC. I have a good start, but there are some questions, and it's going to need to get much more detailed when it comes to mapping to specific BibClass indexes. It might end up being a trial and error sort of thing - we run it through and see what happens and then tweak from there. I need to go over this with Mark and then get back to Cassandra by the end of the week. I tried emailing Stephen Davison again to see if there is a required minimum number of records to be considered as a Data Provider. If we don't meet the minimum at this point, at least the procedure will (hopefully!) be set out for the future. And, I certainly will have learned a lot in the process!

After meeting with Cassandra, I continued to tweak the table a bit and tried to find some additional examples that would help clear up some questions. I found a few examples of rights management statements that might help us develop one of our own. I'm still perplexed by the date issue - I think we can get at the "date of digitization" of each pdf, but it's going to be very time-consuming to look those up and add them manually. After further reading of the SMC's guidelines, I see that multiple date fields can be entered. The example gives a [date of digitization] field (in ISO YYYY-MM-DD format) as well as a date of publication as it appears in the MARC record.
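To picture the two-date idea, here is a minimal Python sketch that builds a Dublin Core record with two dc:date elements, one as it might appear in the MARC record and one in ISO format for the digitization. The title and both dates are invented, and the helper name build_record is hypothetical; only the two namespace URIs are the standard ones.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC)
ET.register_namespace("oai_dc", OAI_DC)


def build_record(title: str, publication_date: str, digitization_date: str) -> ET.Element:
    """Build an oai_dc record carrying two repeated dc:date elements."""
    record = ET.Element(f"{{{OAI_DC}}}dc")
    ET.SubElement(record, f"{{{DC}}}title").text = title
    # Date of publication, copied as it appears in the MARC record.
    ET.SubElement(record, f"{{{DC}}}date").text = publication_date
    # Date of digitization, in ISO YYYY-MM-DD form.
    ET.SubElement(record, f"{{{DC}}}date").text = digitization_date
    return record


# All values below are made up for illustration.
record = build_record("Hypothetical waltz", "[185-?]", "2006-10-23")
print(ET.tostring(record, encoding="unicode"))
```

One thing the sketch makes visible: in plain Dublin Core the two dates are not labeled, so nothing in the record itself says which one is the digitization date.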

Hours today: 3
Hours this week: 3
Total hours completed: 97.5

Wednesday, October 18, 2006

More metadata mapping

Spent some time today cleaning off the desk in "the dungeon" (as Emily so fondly calls it). Found that the computer had restarted again and everything I had fixed (twice now) was gone! Connexion authorizations, bookmarks, etc. At least documents seem to be staying put, but I'm backing them up on my flash drive just in case.

I spent several hours today adding MARC tags to my MARC/BibClass/DC table. I should probably transfer it to a spreadsheet at some point. Here are some tricky things I'm coming across that I'm not quite sure what to do with, or how to get there...

dc:source - call #, where the physical item resides. These don't have call numbers per se, but do have location code gzshb which translates to 'Sheet Mus StL Publ File' in the public view. Some indication of 'Balmer & Weber' needs to be given, since that's what the boxes are actually labeled. The example Lois Schultz gives is 'UCLA. Music Library. Archive of Popular American Music. SY118524 [call number].' So, I guess ours could be something like 'Washington University in St. Louis. Gaylord Music Library. St. Louis Publishers Sheet Music. Balmer & Weber'?

Local notes - should these be included? Things like the Keck number? Important descriptive things should be included, I would think, like if pages are missing from our copy, or if there is a signature or annotation. Also local added entries for people like donors...

dc:format is very strange - I can't tell if it's referring to the physical or the digital manifestation. Schultz gives 300 as one of the related MARC fields, so I guess the physical? This is the only place where the extent (number of pages, size, etc.) could be given. Other examples she gives are 'image/tiff' and 'image/jpeg', so I don't know if something like 'application/pdf' would also be appropriate for the digital manifestation?

dc:rights - not sure about the rights information. Indiana has a link to information about their sheet music collection, but there's nothing specific about rights. I'll try to look into this more. I'm pretty sure that all of what's been digitized thus far is in the public domain. The example Schultz gives is just a statement: 'University of California, Los Angeles, Music Library. c2002 The Regents of the University of California. All rights reserved.' Obviously this is if the institution holds the copyright.

dc:date - strange that Schultz gives the recommendation to use YYYY-MM-DD ("date will be associated with the creation or availability of the resource") - she gives the 260 field as a related MARC field, but the date there is usually just a year, estimated year, or range of years.
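The field-by-field questions above can be pictured as a small crosswalk table. The Python sketch below is only that, a sketch: the table covers a handful of common MARC-to-DC pairings (100 to creator, 245 to title, 260 $b to publisher, 260 $c to date, 300 to format), the sample record is entirely made up, and the function name marc_to_dc is hypothetical. It ignores indicators and says nothing about how the real MARCXML-to-BibClass transformation will behave.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# (MARC tag, subfield codes to keep, Dublin Core element). Illustrative subset only.
CROSSWALK = [
    ("100", "a", "creator"),
    ("245", "ab", "title"),
    ("260", "b", "publisher"),
    ("260", "c", "date"),
    ("300", "ac", "format"),
]

# A made-up MARCXML record, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Composer, Hypothetical.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Hypothetical waltz /</subfield>
    <subfield code="c">by H. Composer.</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">St. Louis :</subfield>
    <subfield code="b">Example Publisher,</subfield>
    <subfield code="c">[185-?]</subfield>
  </datafield>
  <datafield tag="300" ind1=" " ind2=" ">
    <subfield code="a">5 p. ;</subfield>
    <subfield code="c">35 cm.</subfield>
  </datafield>
</record>"""


def marc_to_dc(record_xml: str) -> dict:
    """Map one MARCXML record to a dict of Dublin Core element -> list of values."""
    root = ET.fromstring(record_xml)
    dc = {}
    for tag, codes, element in CROSSWALK:
        for field in root.findall(f"marc:datafield[@tag='{tag}']", MARC_NS):
            parts = [
                subfield.text.strip()
                for subfield in field.findall("marc:subfield", MARC_NS)
                if subfield.get("code") in codes and subfield.text
            ]
            if parts:
                dc.setdefault(element, []).append(" ".join(parts))
    return dc


for element, values in marc_to_dc(SAMPLE).items():
    print(f"dc:{element} = {values}")
```

Running it on the sample shows two of the problems noted above in miniature: the date comes through as '[185-?]' rather than anything like YYYY-MM-DD, and the ISBD punctuation ('/', ',') tags along with the values unless it is stripped separately.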

Hours today: 3 (4:00-7:00pm)
Hours this week: 5
Total hours completed: 94.5

Monday, October 16, 2006

Metadata mapping

Today I spent several hours looking at some BibClass and Dublin Core info. I started a table so that we can picture how things are going to be mapped from MARC to BibClass to Dublin Core. I couldn't get the MARCXML file to open for some reason, which was rather frustrating. I have a lot more work to do on this during the week, and then Cassandra and I are going to meet on Monday to see where we are.

Hours today: 2 (4:30-6:30pm)
Hours this week: 2
Total hours completed: 91.5

Monday, October 09, 2006

Cataloging

Today Mark and I spent about an hour trying to figure out the 'Taming of the shrew' pieces and how they should be entered in terms of main entry and uniform titles, etc. Very confusing because it's not clear who is ultimately responsible for the music of the entire work - it seems that multiple people were responsible for different parts of it (it was a very good review of AACR2 Chapter 21). Thus, main entry goes to the particular person who seems to have written the music for the individual song, uniform title is simply the song title (plus Vocal score, since it seems reasonable that these were originally orchestrated), and added uniform title entry for Taming of the shrew (Opera) and Shakespeare...|tTaming of the shrew are warranted. Wow, my head is still spinning from that one...

So, I fixed those up. Mark also noticed that I had forgotten the engraver, so I added that information for all three pieces. I found lithographer and printer information on the next piece in the volume, so I added that to the record. I also spent a bit of time trying to figure out how to deal with the 'Swiss boy' song from the Tyrolese melodies. Most records in OCLC for the collection of these songs have main entry as Moscheles, even though it appears he just arranged them. These were songs (presumably Tyrolienne/Tyrolese folk songs) that a family called the Rainer family sang in concerts on a trip to England in 1827, making these folk melodies very popular. Moscheles then arranged these for one to four voices with piano. So, my inclination is to not make him main entry, use the song title as the main entry, and make 'Tyrolese melodies' a series statement and a title added entry. Another wrinkle is that the original German title is listed with the English title in the caption. Various records in OCLC treated this differently, and I'm not sure which is best. I'll review some rules and such. There is also an added "song" at the end, with different words (3 verses) to the same tune. Different author of the words and different arranger. Not sure how best to present that information...

I had no idea I'd be taking the entire time on cataloging today, but I definitely learned a lot!

I sent a message to Andrew and Cassandra today, asking if they had any more detailed information about BibClass, so that I could start the whole comparison/mapping thing from MARCXML and try to figure out how it's going to translate to Dublin Core. I found quite a bit of info on the website, as reported last week, but it would be nice if the allowed fields, their "tags" and what goes in them was nicely laid out in a table somewhere... That might be up to me!

Hours today: 3 (4:15-7:15pm)
Hours this week: 3
Total hours completed: 89.5

Tuesday, October 03, 2006

Scanning, cataloging

I tested the scanning software today, since the machine had been messed with. Just wanted to make sure everything was still working the way it's supposed to. All seems to be well.

I did some additional cleaning up of cataloging records. Still have some questions for Mark about the crazy 'Taming of the Shrew' pieces. It's very, very complicated and I still can't quite seem to figure out how to deal with main/added entries even though we've talked about it once already! It's just not very clear-cut how to deal with this particular issue. I also transferred over (well, copied and pasted is more like it!) the information for no. 11 - there was a record for another copy of it in a different bound volume that Mark had already cataloged in the regular catalog. I just brought over the info that is normally found in the suppl. catalog, though - I need to find out if that's what I was supposed to do or if I should have just brought over everything.

Not much else to report at this point...

Hours today: 3 (4:30-7:30pm)
Hours this week: 6
Total hours completed: 86.5

Monday, October 02, 2006

Cleaning up the machine, DLXS BibClass info, etc.

Today I spent some time talking with Brad about various things and cleaning up the computer I've been using down in the Spec room. Systems apparently replaced the machine since I was here last: some shortcuts were missing, all of the browser bookmarks are gone, and at first I couldn't find any documents (including my revised scanning procedure - yikes!), though I found them eventually. So, it took me a bit of time to clean things up as best I could. I made sure all the shortcuts that used to be on the desktop were there, added some shortcuts to the Start menu, set Gaylord's main page as the home page on each of the browsers, and started adding some bookmarks to Firefox. I hope no one had anything important bookmarked on this machine - we might be able to get them back from SOS, but I'm not sure...
Connexion was okay except that the authorizations had disappeared so I'll have to bring over that info tomorrow and fix it (unless Mark needs to fix it before I get here). At least version 1.6 is still on it - hooray! Several machines mysteriously migrated back to 1.5 recently...

I also spent some time looking at the DLXS website for more information about BibClass. I found a document that is essentially a simple tree structure for the bib.dtd. I think this will be helpful as we figure out how things are going to be mapped/transformed. I found information about a script that was used to transform MARC records from NOTIS into bib.dtd, but I don't know how useful this would be since a lot of information would need to be changed (a lot of it was specific to NOTIS). It's a nice idea, though. If I knew more about programming, I'd write a script to transform MARC records from III into bib.dtd!

I explored a little bit about Unicode also. Apparently if the data does not come to you in Unicode UTF-8 encoded XML, conversion (one or two steps) will be necessary. I was wondering how diacritics would transfer. I'm not really sure what will be needed in this regard. I'm embarrassed to say I'm not even sure what encoding our MARCXML document ended up with - that's something I'll need to look into. It's probably just ASCII...?
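A quick way to look into the encoding question could be something like the Python sketch below. It only checks three things: what the XML declaration claims, whether the bytes actually decode as UTF-8, and whether the file is pure ASCII (in which case it is already valid UTF-8 and diacritics are not an issue). The function names and the sample text are made up, and the sketch cannot identify other encodings such as MARC-8; it can only say that the bytes are not UTF-8.

```python
import re
from typing import Optional

# Matches the encoding pseudo-attribute in an XML declaration at the start of a file.
_DECLARATION = re.compile(rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z0-9._-]+)")


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, if there is one."""
    if data.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        data = data[3:]
    match = _DECLARATION.match(data)
    return match.group(1).decode("ascii") if match else None


def describe_encoding(data: bytes) -> dict:
    """Report the declared encoding and whether the bytes are UTF-8 or plain ASCII."""
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {
        "declared": declared_encoding(data),
        "valid_utf8": valid_utf8,
        "ascii_only": all(byte < 0x80 for byte in data),
    }


# Made-up sample with one diacritic, saved two different ways.
text = '<?xml version="1.0" encoding="UTF-8"?><title>Liebestr\u00e4ume</title>'
print(describe_encoding(text.encode("utf-8")))
print(describe_encoding(text.encode("latin-1")))
```

The second call shows the case to watch for: a file that declares UTF-8 but does not actually decode as UTF-8, which would mean a conversion step is still needed.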

I also read some more about broker20, the program that produces XML responses to OAI verbs as dictated by version 2.0 of the OAI protocol. This step seems relatively simple, once everything is living happily in BibClass (knock on wood).
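For reference, version 2.0 of the OAI protocol defines six verbs, and a harvester's request is just a URL with the verb and its arguments as query parameters. The Python sketch below builds such URLs; the base URL and the record identifier are hypothetical placeholders, not the actual address where broker20 would live.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the real one depends on where the OAI provider is installed.
BASE_URL = "https://example.org/oai"

# The six verbs defined by version 2.0 of the OAI protocol.
OAI_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def oai_request(verb: str, **arguments: str) -> str:
    """Build the request URL for one OAI verb with its arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI 2.0 verb: {verb}")
    return f"{BASE_URL}?{urlencode({'verb': verb, **arguments})}"


print(oai_request("Identify"))
print(oai_request("ListRecords", metadataPrefix="oai_dc"))
# The identifier below is made up for illustration.
print(oai_request("GetRecord", identifier="oai:example.org:b1234567", metadataPrefix="oai_dc"))
```

The metadataPrefix of oai_dc is the Dublin Core format, which is where the MARC to BibClass to DC mapping ends up mattering.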

I'm having trouble with some of the BibClass syntax. For example, I'm not sure how "collections" and "groups" relate, except that as far as I can tell, one can have several collections in a group? A collection is essentially a bibliographic index that consists of all bibliographic fields in SGML or XML format, conforming to bib.dtd. I'm not sure I quite follow that, either, now that I think about it...

Reading more about the BibClass DTD, I saw it reiterated that it is not intended to replace the robustness of MARC. It is simple and succinct, and it is a "tight" data model, which means that it often constrains which fields may or must be used as well as what sorts of information can go into those fields. That is in stark contrast to Dublin Core, which is very flexible: fields are repeatable and optional.

Another useful thing I found is an example of a UMR (Univ. of Michigan Reports, I believe) record in BibClass DTD. It shows the populated data fields as the user would see them and then also as the code source appears. I think this will be very helpful in trying to figure out what's going to go where.

I learned that the basic fields currently include: author, title, entire record, publisher, place of publication, year (of publication), series, notes, collection ID, format, type, language, ID (of the record, like the III .b number I would assume), and dt (OAI-specified date of last update for a record). Apparently it IS possible to define additional fields or substitute values for fields already specified, but one must create a NEW bib.map instead of revising the old one. I'm not entirely clear on what the bib.map is to begin with, but it's good to know that things aren't quite as hard-and-fast as we perhaps initially thought.

Hours today: 3 (4:15-7:15pm)
Hours this week: 3
Total hours completed: 83.5