Thursday, October 4, 2007

The Affiliation Issue

Let's talk about my favourite repository issue, as seen on my blog: author affiliation. How do you describe the institutional affiliation for each author on an item in your repository? In some repository software it's basically impossible (I'm looking at you, Eprints and DSpace). In Fedora it's easy if you use a metadata schema that supports it, but once you start talking about harvesting via OAI-PMH (which by default uses flat Dublin Core metadata) it becomes a problem again.

Here's another quick trip through some of the issues that will face AANRO.

Here's a bit of raw AANRO data, a record for a single article in the format we received from AANRO-central.

"AG199600004" "Journal article" "1996" "Grazing and anthelmintic treatments to increase growth of Cashmere and Angora weaner goats" "Robertson JA (Victoria University of Technology, Centre for Bioprocessing and Food Technology, Melbourne)|Ritar AJ (Tasmanian Department of Primary Industry and Fisheries, Marine Research Laboratories, Taroona)|Evans G (University of Sydney, Department of Animal Science)" "Rural Industries Research and Development Corporation" "Australian Veterinary Journal, 1996-09, 74 (3), p246-248, 1 table, 7 refs, ISSN 0005-0423." "0005-0423" "A study was undertaken to examine whether growth of kids after weaning and internal parasitism were affected by grazing on pasture or fodder crop and by an initial treatment of ivermectin at weaning. Experiments were conducted at Cressy Tas on 21 Angora and 94 Cashmere kids, grazing forage rape, uncontaminated pastures, or pastures contaminated with nematodes, and treated with ivermectin anthelmintic at weaning or not treated. The results demonstrated the benefit to weaner goats of grazing a fodder crop of rape compared with pasture treatments. Grazing a crop of high nutritional value maintained a weight advantage and there were no production losses due to nematode infection. It was shown that careful grazing management and strategic use of anthelmintics reduces the reliance on and the cost of anthelmintics and probably decreases selection pressure on parasites for anthelmintic resistance while increasing rates of growth for weaner goats." "Goats|Kids|Cashmere|Angora|Animal parasitic nematodes|Anthelmintics|Ivermectin|Pest control|Grazing|Pastures|Rape|Fodder crops|Growth|Liveweight" "Cressy Tas|Tamar River (III18)|AER (1)"

Look at the info we have about one of the authors:

Robertson JA (Victoria University of Technology, Centre for Bioprocessing and Food Technology, Melbourne)

The author's name is followed by their affiliation. The first step in our transformation process is to put this into simple XML:

<author>Robertson JA (Victoria University of Technology, Centre for Bioprocessing and Food Technology, Melbourne)</author>

Then we transform it via a few more steps to this MODS (confession: we actually still have work to do on the process):

<name type="personal">
  <namePart>Robertson JA</namePart>
  <role>
    <roleTerm type="text">author</roleTerm>
  </role>
  <affiliation>Victoria University of Technology, Centre for Bioprocessing and Food Technology, Melbourne</affiliation>
</name>
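The step from the raw author field to a MODS name can be sketched in a few lines of Python. This is only an illustration, not the transformation we actually run: the function names are made up for the example, the MODS namespace is left out for brevity, and it assumes every author looks like "Name (Affiliation)" with authors separated by "|", as in the sample record. It puts type="personal" on the name element, which is where the MODS schema defines that attribute.

```python
import re
import xml.etree.ElementTree as ET

# Matches "Name (Affiliation)"; the bracketed affiliation is optional.
AUTHOR_RE = re.compile(r"^\s*(?P<name>[^(]+?)\s*(?:\((?P<affiliation>.*)\))?\s*$")


def parse_author(raw):
    """Split one 'Name (Affiliation)' string into (name, affiliation or None)."""
    match = AUTHOR_RE.match(raw)
    if match is None:
        # Assumed fallback for strings that do not fit the pattern:
        # keep the whole string as the name.
        return raw.strip(), None
    return match.group("name"), match.group("affiliation")


def author_to_mods(raw):
    """Build a MODS-style <name> element (no namespace) for one author string."""
    name, affiliation = parse_author(raw)
    name_el = ET.Element("name", {"type": "personal"})
    ET.SubElement(name_el, "namePart").text = name
    role = ET.SubElement(name_el, "role")
    ET.SubElement(role, "roleTerm", {"type": "text"}).text = "author"
    if affiliation:
        ET.SubElement(name_el, "affiliation").text = affiliation
    return name_el


def authors_field_to_mods(field):
    """Turn a '|'-separated author field into a list of <name> elements."""
    return [author_to_mods(part) for part in field.split("|") if part.strip()]
```

Calling authors_field_to_mods on the author field of the sample record gives three name elements, and ET.tostring(element, encoding="unicode") serializes each one. Real data will have authors with no affiliation or with stray brackets, which is part of the work we still have to do.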

This says that J A Robertson authored this item while affiliated with the Victoria University of Technology, Centre for Bioprocessing and Food Technology, Melbourne. The same person might have multiple affiliations over time, though, and the MODS schema allows us to keep track of this.

(I picked this item at random but it serves to illustrate another point. That uni has changed its name to Victoria University. It would be nice to be able to preserve the name of the institution as it used to be but also to be able to show that it's the same place. More on this currently impossible dream soon.)

As noted, it's hard to deal with author affiliation nicely in Eprints and DSpace repositories. Our USQ Eprints team is working on the problem, though.

For non-federated repositories, VITAL and Fez can both handle metadata like this, although in VITAL there is no web-based editing of Fedora items. You have to load the MODS metadata into an XML editor, which makes it a task that's not for general users. We'll be exploring how this works in Fez real soon now, as soon as we can get Version 2 to run.

There are two other issues, though.

  1. Harvesting via OAI-PMH.

    Now it is possible to harvest any metadata stream, so in a potential federated architecture, some AANRO repositories might be able to supply MODS (or similar) metadata to harvesters.

    But there are other discovery services that AANRO will want to work with, and by default they just use Dublin Core metadata. To play with those you need to walk a delicate line. On one side is jamming metadata into the available transport mechanism to get it out there. On the other is the user experience when people try to use the harvesting service to find things and discover that different services have used or misused different metadata fields in different ways.

    And given that some otherwise very good software can't handle nested metadata like MODS, we need to think about how to support AANRO contributing organizations that might be running such software.

    One way to get the information out there would be to do something like this:

    <dc:creator>Robertson JA (Victoria University of Technology, Centre for Bioprocessing and Food Technology, Melbourne)</dc:creator>

    I'm going to see what Neil Godfrey, USQ's metadata maven, has to say about this approach. Comment away, Neil!

  2. Indexing in a portal

    VITAL and Fez both have configuration that will let you search by affiliation, no problems, but what about in a federated architecture?

    Provided the affiliation information is in the OAI-PMH feed then it can be used to build an interface where you can search or browse for all items associated with a particular institution; an indexer could easily find the affiliation information that's in the brackets in the example above.
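To make the two points above a bit more concrete, here is a rough Python sketch of both ends: flattening a name and affiliation into a single dc:creator string, and an indexer pulling the affiliation back out of the brackets in a harvested oai_dc record. The function names and the sample record are invented for illustration, and the sketch assumes the affiliation is always the last bracketed chunk of the creator string, which nothing in Dublin Core guarantees.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Standard namespaces for the oai_dc metadata format.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Assumption: a trailing "(...)" on a creator string is the affiliation.
TRAILING_PARENS = re.compile(r"\((?P<affiliation>[^()]*)\)\s*$")


def to_dc_creator(name, affiliation=None):
    """Flatten a name and optional affiliation into one dc:creator string."""
    return f"{name} ({affiliation})" if affiliation else name


def affiliations_in_record(oai_dc_xml):
    """Return the affiliations found in the dc:creator elements of one record."""
    root = ET.fromstring(oai_dc_xml)
    found = []
    for creator in root.findall(".//dc:creator", NS):
        match = TRAILING_PARENS.search(creator.text or "")
        if match:
            found.append(match.group("affiliation").strip())
    return found


def build_affiliation_index(records):
    """Map each affiliation string to the set of record identifiers using it.

    records is an iterable of (identifier, oai_dc_xml) pairs.
    """
    index = defaultdict(set)
    for identifier, xml_text in records:
        for affiliation in affiliations_in_record(xml_text):
            index[affiliation].add(identifier)
    return index


# Invented sample record, built from two of the authors in the example above.
SAMPLE = (
    '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:creator>"
    + to_dc_creator("Robertson JA", "Victoria University of Technology, Centre for Bioprocessing and Food Technology, Melbourne")
    + "</dc:creator>"
    "<dc:creator>"
    + to_dc_creator("Evans G", "University of Sydney, Department of Animal Science")
    + "</dc:creator>"
    "</oai_dc:dc>"
)

index = build_affiliation_index([("AG199600004", SAMPLE)])
```

The index is keyed on the affiliation string exactly as it appears in the feed, so two repositories that spell the same institution differently will end up under different keys. That is the field-misuse problem again, just one level down.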

And finally, relevant to both of the above issues, there was an interesting presentation from Alison Dellit at the recent ARROW community day, where she talked about how the National Library of Australia is building harvesting and search services that can adapt to the different ways that repositories serve up data. Looks like a promising approach to me.

Copyright 2007 The University of Southern Queensland

Content license: Creative Commons Attribution-ShareAlike 2.5 Australia.
