Assessing OAI-PMH Compliance of Indian Institutional Repositories

by K. P. Saxena*, Dr. Rochana Srivastava

- Published in Journal of Advances in Science and Technology, E-ISSN: 2230-9659

Volume 6, Issue No. 12, Feb 2014

Published by: Ignited Minds Journals


ABSTRACT

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is one of the key considerations while implementing repositories. It is an HTTP-based protocol developed by the Open Archives Initiative (OAI) with the objective of achieving interoperability between repositories. The protocol allows service providers or metadata harvesting services to seamlessly harvest metadata exposed by the repositories or data providers. A study of selected functional Institutional Repositories archiving their institutional research output was conducted to test the compliance of repositories with OAI-PMH. The repositories’ base URLs were tested to determine compliance. It was found that the majority of repositories (77.55%) passed the test, as would be expected for such a crucial component. Metadata of repositories not fully complying with the conformance standards set by OAI and other harvesting services like OAIster, BASE and CASSIR cannot be harvested. This affects the access and visibility of the repositories as well as of the metadata archived. Registration of repositories with OAI and other service providers will ensure OAI-PMH compliance and accessibility of repositories.

KEYWORDS

Open Archives Initiative Protocol, Meta-data Harvesting, OAI-PMH, repositories, compliance

INTRODUCTION

Institutional Repositories have emerged as a viable means to achieve the basic objectives of open access: access and visibility. Compliance of Institutional Repositories with the OAI-PMH protocol is one of the requirements for achieving these objectives through interoperability of repositories. Interoperability is a relatively broad term, defined by the Open Archives Initiative (2013) as “the ability of systems, services and organizations to work together seamlessly towards common or diverse goals.” With reference to Institutional Repositories (IRs), interoperability refers to the communication between distributed repository systems for achieving discovery of and access to data and metadata. Compliance with OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is the primary requirement for ensuring interoperability of IRs.

OAI-PMH

OAI-PMH is a low-barrier mechanism for interoperability between Service Providers and Data Providers (the repositories). The repositories expose their metadata via OAI-PMH, and service providers or metadata harvesting services like OAIster, BASE and CASSIR make service requests to harvest the metadata. OAI-PMH thus provides a mechanism for communication between two systems: the Data Providers (repositories) and the Service Providers. Repositories are web-based servers which respond to the HTTP-based request of the harvester with an XML-encoded response. It is important for the repositories to expose their metadata in a standard format like Dublin Core (DC). Richer metadata standards like MARC-21 are available, but Dublin Core (DC) is almost the de facto standard.

Metadata is at the core of OAI-PMH. Unqualified Dublin Core is an easily understandable set of 15 elements used to describe the bibliographical information of a document. Unqualified Dublin Core metadata (oai_dc) is the minimum requirement for a repository to implement OAI-PMH. Some metadata elements are filled in by the creator at the time of deposit, but to ensure vocabulary control and consistency the metadata must be reviewed by the repository managers. This descriptive metadata is used for resource identification, resource discovery and indexing of digital resources. Digital objects are becoming more and more complex; they no longer consist of a single file but of a collection of several interrelated files. Repositories may use, in parallel, more complex metadata schemas, e.g. METS (Metadata Encoding and Transmission Standard), MODS (Metadata Object Description Schema), MPEG-21 DIDL (Digital Item Declaration Language) and PREMIS (Preservation Metadata: Implementation Strategies), bringing these metadata schemas into one overall description that layers bibliographic, structural and preservation metadata as one set.

OAI-PMH Implementation

It is important for the repositories to expose their metadata in a standard metadata format that can be encoded in XML; the minimum agreed-upon requirement is unqualified DC. The following are the minimum requirements for implementing OAI-PMH at the data provider end.

I. Metadata Format

The Dublin Core (DC) metadata standard is the first and foremost requirement for the functioning of OAI-PMH. Repositories may use rich metadata formats like MARC-21 to store metadata, but when a repository receives an OAI-PMH request the metadata is dynamically converted to the DC (oai_dc) format. Unqualified DC is a set of 15 metadata elements. Although all these elements are optional and repeatable, some harvesters define minimum sets of DC elements; e.g. in OVAL (2013), the BASE OAI-PMH Validator, the minimal DC elements defined are Date, Creator, Identifier, Type and Title.

II. Sets

Sets provide a method to define groups of metadata records within a repository. Defining sets is optional, so sets are defined only when required to fulfill the needs of specific communities, e.g. departments in a university. Sets may or may not be hierarchical, depending upon the requirements. Both data providers and service providers support sets and their hierarchy.

III. Base URL

The base URL of a repository is its unique identifier and is required for the implementation of OAI-PMH. All OAI-compliant repositories have an OAI base URL (e.g. http://oar.icrisat.org/cgi/oai2) for machine-to-machine communication. Through this base URL, repositories provide valid XML responses to HTTP requests issued in the form of OAI-PMH verbs.
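How a request is formed from the base URL can be sketched in Python. This is only an illustration under stated assumptions: the function name build_request_url and the constant OAI_VERBS are hypothetical, not part of any harvester library, and the base URL is the one quoted above.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH version 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_request_url(base_url, verb, arguments=None):
    """Append an OAI-PMH verb and its arguments to a repository base URL.

    Hypothetical helper for illustration only.
    """
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    query.update(arguments or {})
    # urlencode escapes reserved characters in the argument values.
    return base_url.rstrip("?") + "?" + urlencode(query)
```

Calling build_request_url("http://oar.icrisat.org/cgi/oai2", "Identify") yields the Identify request URL for that repository; arguments such as metadataPrefix are passed as a dictionary.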

IV. Datestamp

OAI-PMH provides datestamps to support incremental harvesting. Each metadata record carries a datestamp recording when the record was created or last modified, and each response to an OAI-PMH request carries a response date. Datestamps have a granularity of either days or seconds; repositories are expected to support seconds granularity. A datestamp with seconds granularity is expressed as 2013-09-16T09:05:50Z.

V. Resumption Token

Repositories hold large numbers of records, and OAI-PMH provides a mechanism for harvesting them in batches without losing any data or changes in data. When a response is incomplete, the repository issues a resumption token so that the service provider can resume from the point where the previous response ended, without harvesting the already harvested records again. A resumption token may carry the following optional attributes.

  • expirationDate (date after which the resumption token no longer works)
  • completeListSize (size of the complete result)
  • cursor (number of records already disseminated)
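The resumption mechanism can be sketched in Python as a loop that keeps requesting batches until no token is returned. This is a minimal sketch, not a full harvester: fetch is a hypothetical callable, supplied by the caller, that takes a dictionary of request arguments and returns the XML response text, and the name harvest_identifiers is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses, in ElementTree's {uri}tag notation.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_identifiers(fetch, metadata_prefix="oai_dc"):
    """Collect record identifiers across batches by following resumption tokens.

    `fetch` is a hypothetical callable: dict of arguments -> XML text.
    """
    arguments = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    identifiers = []
    while True:
        root = ET.fromstring(fetch(arguments))
        for header in root.iter(OAI_NS + "header"):
            identifiers.append(header.findtext(OAI_NS + "identifier"))
        token = root.find(f".//{OAI_NS}resumptionToken")
        # An absent or empty token marks the end of the complete list.
        if token is None or not (token.text or "").strip():
            return identifiers
        # A follow-up request carries only the verb and the token.
        arguments = {"verb": "ListIdentifiers",
                     "resumptionToken": token.text.strip()}
```

Passing fetch in as a parameter keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.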

VI. Compression

OAI-PMH provides for compression of responses to ease lengthy harvesting of metadata. Compression is optional; a repository declares the encodings it supports in its Identify response, which is itself sent without compression. Both the service provider and the data provider should support the chosen compression, e.g. gzip, compress or deflate.
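On the harvester side, handling a compressed response might look like the following Python sketch. The function name decode_body is hypothetical, and the sketch assumes the harvester has already read the response body as bytes together with the HTTP Content-Encoding header. The Python standard library has no decoder for the compress encoding, so the sketch rejects it rather than guess.

```python
import gzip
import zlib


def decode_body(body, content_encoding):
    """Undo the HTTP content encoding of a harvested response body (bytes).

    Hypothetical helper; `content_encoding` is the value of the
    Content-Encoding response header, or None when the header is absent.
    """
    encoding = (content_encoding or "identity").lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        # HTTP "deflate" bodies are normally zlib-wrapped.
        return zlib.decompress(body)
    if encoding == "identity":
        return body
    raise ValueError(f"unsupported content encoding: {encoding}")
```

A harvester would advertise the encodings it can decode through the Accept-Encoding request header, for example "gzip, deflate".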

VII. Error Messages

Whenever a request in any of the six OAI-PMH verbs cannot be answered with a regular response, the repository issues an error message according to the error incurred. If the verb is missing or its syntax is incorrect, the request results in the error badVerb. If required parameters (arguments) are missing or their syntax is incorrect, the request results in the error badArgument. Other error codes include badResumptionToken, noRecordsMatch, noMetadataFormats and noSetHierarchy.
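A harvester can read these error codes from the response itself. The Python sketch below assumes an OAI-PMH 2.0 response, in which each error is an error element with a code attribute directly under the root; the function name extract_errors is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses, in ElementTree's {uri}tag notation.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def extract_errors(response_xml):
    """Return (code, message) pairs for every error element in a response."""
    root = ET.fromstring(response_xml)
    return [
        (error.get("code"), (error.text or "").strip())
        for error in root.findall(OAI_NS + "error")
    ]
```

An empty list means the repository answered the request without reporting an error.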

VIII. OAI-PMH Verbs

There is a set of six standard requests, the “OAI-PMH Verbs”, that a harvester or service provider can send to a repository or data provider. These requests, in proper syntax and with the required arguments, are sent over the web using the HTTP protocol. Once the HTTP request is received by the data provider, its base URL responds with a valid XML response according to the OAI-PMH verb used, as specified in the protocol. A service provider may use any one verb or all the verbs, but repositories are required to implement all of them.

Required parameters must be supplied as arguments with the verbs, while optional parameters may be supplied to limit the results. Use of the set parameter limits the results to specific sets. Use of the from and until parameters limits the results to the specified time period.
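The optional arguments can be assembled as in the following Python sketch, which formats the from and until values as UTC datestamps at day or seconds granularity. The function names format_datestamp and selective_arguments are hypothetical, and the sketch assumes timezone-aware datetime values are passed in.

```python
from datetime import datetime, timezone


def format_datestamp(moment, granularity="seconds"):
    """Format a timezone-aware datetime as a UTC datestamp."""
    moment = moment.astimezone(timezone.utc)
    if granularity == "day":
        return moment.strftime("%Y-%m-%d")
    if granularity == "seconds":
        return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError(f"unknown granularity: {granularity}")


def selective_arguments(metadata_prefix, start=None, end=None,
                        set_spec=None, granularity="seconds"):
    """Build ListRecords arguments limited by date range and set."""
    arguments = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if start is not None:
        arguments["from"] = format_datestamp(start, granularity)
    if end is not None:
        arguments["until"] = format_datestamp(end, granularity)
    if set_spec is not None:
        arguments["set"] = set_spec
    return arguments
```

Both from and until should use the same granularity, and that granularity should be one the repository declares in its Identify response.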

i. Identify

An Identify request retrieves important information about the repository, such as the repository name, its base URL, the version of the protocol supported by the repository, the e-mail address of the repository administrator and rights information. All this information is stored in the database of OAI. A valid response to an Identify request shows that the repository is configured for the OAI-PMH protocol. For example:

Request: http://oar.icrisat.org/cgi/oai2?verb=Identify

Response:

Request was of type Identify

OAI-Identifier

EPrints Description

Content

http://oar.icrisat.org/policies.html

Submission Policy

No submission-data policy defined. This server has not yet been fully configured.

Metadata Policy

No metadata policy defined. This server has not yet been fully configured. Please contact the admin for more information, but if in doubt assume that NO rights at all are granted to this data.

Data Policy

No data policy defined. This server has not yet been fully configured. Please contact the admin for more information, but if in doubt assume that NO rights at all are granted to this data.

This system is running eprints server software (EPrints 3.3.10) developed at the University of Southampton. For more information see http://www.eprints.org/

The response to the OAI-PMH verb “Identify” therefore gives valuable information regarding the OAI-PMH protocol version implemented, date granularity, earliest datestamp, metadata schema and rights information stored in the repository database.
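The fields of an Identify response can be extracted from the raw XML with a few lines of Python. This sketch assumes an OAI-PMH 2.0 response; the function name parse_identify and the constant IDENTIFY_FIELDS are hypothetical, and only the standard top-level elements are read, not the policy descriptions shown above.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses, in ElementTree's {uri}tag notation.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Top-level elements of an OAI-PMH 2.0 Identify response.
IDENTIFY_FIELDS = (
    "repositoryName", "baseURL", "protocolVersion", "adminEmail",
    "earliestDatestamp", "deletedRecord", "granularity",
)


def parse_identify(response_xml):
    """Return the Identify fields as a dict, or None if Identify is absent."""
    root = ET.fromstring(response_xml)
    identify = root.find(OAI_NS + "Identify")
    if identify is None:
        return None
    # findtext returns None for any field the repository did not supply.
    return {name: identify.findtext(OAI_NS + name) for name in IDENTIFY_FIELDS}
```

A None result indicates that the response did not contain an Identify element, for instance because the repository returned an error.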

ii. ListMetadataFormats

A ListMetadataFormats request must retrieve at least one metadata format supported by the repository. The oai_dc metadata format is the minimum result retrieved. For example:

Request: http://oar.icrisat.org/cgi/oai2?verb=ListMetadataFormats

Response: (Sample Metadata Formats)

Metadata Format

Similarly other supported Metadata Formats: Mets, Rdf, open biblio, didl, uketd_dc

iii. ListSets

A ListSets request retrieves information about the sets of records defined in the repository.

Request: http://oar.icrisat.org/cgi/oai2?verb=ListSets

Response: (Sample Sets)

Set Set

All the defined sets are retrieved as above.

iv. ListIdentifiers

A ListIdentifiers request may be used to harvest record headers rather than complete metadata records as in ListRecords. This request may be issued with a date range given by the from and until arguments. For example:

Request: http://oar.icrisat.org/cgi/oai2?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2012-12-01&until=2013-10-10

Response: (Sample Identifiers)

OAI Record Header

All the OAI Record headers are retrieved depending upon the number of records and batch size.

v. ListRecords

A ListRecords request is used to harvest metadata records. This request may be issued with a date range given by the from and until arguments. For example:

Request: http://oar.icrisat.org/cgi/oai2?verb=ListRecords&metadataPrefix=oai_dc

Response: (Sample Record)

OAI Record: oai:oar.icrisat.org:1

Dublin Core Metadata (oai_dc)

There are more results depending upon the number of records and list

vi. GetRecord

A GetRecord request is used to retrieve a single record using its record identifier. The repository must return the record in at least the oai_dc format. For example:

Request: http://oar.icrisat.org/cgi/oai2?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:oar.icrisat.org:1

Response: (Retrieves the above-mentioned record)

All OAI-compliant repositories have an OAI base URL (e.g. http://oar.icrisat.org/cgi/oai2) for machine use besides the commonly known URL (e.g. http://oar.icrisat.org/) for human users. When an HTTP request is sent in the form of an OAI-PMH verb and other arguments appended to the URL, the OAI base URL returns an XML response. Repositories responding with valid XML responses and error messages to the OAI-PMH verbs are called OAI-PMH compliant. Repositories providing an OAI base URL may be tested for compliance with the OAI-PMH verbs directly through HTTP requests. OAI conducts conformance tests on a repository and validates its base URL before registering it. The following type of response is received while validating the base URL of Eprints@CMFRI (http://eprints.cmfri.org.in/cgi/oai2/).

Request: GET http://eprints.cmfri.org.in/cgi/oai2?verb=Identify

Administrator email address madhan@nitrkl.ac.in

  • [PASS] Correctly reports OAI-PMH protocol version 2.0
  • [PASS] baseURL supplied matches the Identify response
  • [PASS] Datestamp granularity is 'seconds'
  • [PASS] earliestDatestamp is 0001-01-01T00:00:00Z
  • [PASS] oai-identifier description for version 2.0 is being used
  • [PASS] namespace-identifier (repositoryIdentifier element) in oai-identifier declaration is generic.eprints.org

Retrieved October 25, 2013 from http://www.openarchives.org/data/registerasprovider.html/

OAI Repository Explorer is a tool provided by OAI for testing the compliance of repositories with OAI-PMH through an interactive user interface.
The University of Illinois maintains a database of OAI-PMH compliant repositories, the OAI-PMH Data Provider Registry (http://gita.grainger.uiuc.edu/registry/). In OVAL, the OAI-PMH Validator of BASE (Bielefeld Academic Search Engine), the following result is retrieved when testing the base URL of Eprints@CMFRI (http://eprints.cmfri.org.in/cgi/oai2/).

Repository Information

  • Name: CMFRI
  • Admin: madhan@nitrkl.ac.in

Server communication

  • SUCCESS: Server supports both GET and POST requests.
  • SUCCESS: OAI-PMH version is 2.0

XML Validation

  • SUCCESS: Identify response well-formed and valid.
  • SUCCESS: ListRecords response well-formed and valid.

Harvesting

  • SUCCESS: Deleting strategy is "persistent"
  • SUCCESS: ListRecords batch size is 100.
  • SUCCESS: Resumption requests work
  • RECOMMENDATION: resumptionToken should contain expirationDate information.
  • RECOMMENDATION: resumptionToken should contain completeListSize information.
  • SUCCESS: Incremental harvesting (day granularity) of ListRecords works.
  • SUCCESS: Incremental harvesting (full granularity) of ListRecords works.
  • UNVERIFIED: dc:language conformance to ISO 639 could not be checked: no dc:language element found
  • SUCCESS: dc:date elements conform to ISO 8601.
  • SUCCESS: Minimal DC elements (date, creator, identifier, type, title) are present.

Explanation of message categories:

  • INFO: Potentially interesting information
  • RECOMMENDATION: Deprecated protocol feature or content-related issue
  • WARNING: Protocol violation or malformed content data
  • UNVERIFIED: Feature could not be checked for some reason

Retrieved October 25, 2013 from http://oval.base-search.net/
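The minimal Dublin Core check reported in the OVAL output can be approximated in Python as follows. This is a sketch of the idea, not OVAL's actual implementation: the function name missing_minimal_dc is hypothetical, and the sketch assumes it is given the XML of a single oai_dc record.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set, in ElementTree's {uri}tag notation.
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Minimal elements named in the OVAL output.
MINIMAL_DC = ("date", "creator", "identifier", "type", "title")


def missing_minimal_dc(record_xml):
    """List the minimal DC elements that are absent or empty in one record."""
    root = ET.fromstring(record_xml)
    present = {
        element.tag[len(DC_NS):]
        for element in root.iter()
        if element.tag.startswith(DC_NS) and (element.text or "").strip()
    }
    return [name for name in MINIMAL_DC if name not in present]
```

An empty list means all five minimal elements carry a value in the record.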

The validation criteria set by the OAI-PMH validators may vary, but it was observed that very few repositories conformed to the complete validation test. Therefore the response to OAI-PMH requests was tested directly against the base URL using the OAI-PMH verb “Identify”.

RESEARCH METHOD

The repositories, being web-based Open Access archives of digital content, were easily accessible through the internet. The 49 functional Institutional Research Repositories were selected from various sources like OpenDOAR, ROAR and “CSIR-CENTRAL”, the central platform for CSIR repositories. Compliance with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is one of the important requirements for Open Access Institutional Repositories. The OAI-PMH verb “Identify” was used to test the implementation of OAI-PMH in the repositories. It was important to know the correct base URL of each repository in order to run the “Identify” request against it. The base URL information available in OpenDOAR and the OAI Registry was taken as the reference point for the base URLs of the selected repositories. The base URLs were tested directly with the OAI-PMH verb “Identify” to see the response.
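The test applied to each base URL can be expressed as a short Python sketch. It is an illustration of the idea rather than the procedure actually used in the study: the function names responds_to_identify and compliance_rate are hypothetical, the response text is assumed to have been fetched already, and the check looks for an Identify element regardless of XML namespace so that responses from different protocol versions are treated alike.

```python
import xml.etree.ElementTree as ET


def responds_to_identify(response_text):
    """True when the text is well-formed XML containing an Identify element."""
    try:
        root = ET.fromstring(response_text)
    except ET.ParseError:
        return False
    # Compare local names only, ignoring the {namespace} prefix of each tag.
    return any(el.tag.rsplit("}", 1)[-1] == "Identify" for el in root.iter())


def compliance_rate(results):
    """Percentage of repositories whose Identify check succeeded.

    `results` maps a repository name to True or False.
    """
    return 100 * sum(results.values()) / len(results)
```

Well-formedness plus the presence of an Identify element is a weaker test than full schema validation, which is what the validators described earlier perform.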

DATA COLLECTION

The base URLs of only 19 repositories were registered in OpenDOAR, while 10 repositories were registered in the OAI Registry. OAI-PMH version 2.0 is the stable version of the protocol, released in 2002. Of the repositories under study, 26 were using OAI-PMH version 2.0 while the remaining 23 were using version 1.0 of the protocol. As depicted in the Table, repositories using OAI-PMH version 2.0 are indicated by oai2 in the base URL and repositories using version 1.0 by oai. The following checklist shows the base URLs of the selected repositories tested for a valid XML response to the OAI-PMH verb “Identify”, along with the source (OAI Registry / OpenDOAR) where the base URL was listed. Repositories responding properly to the HTTP request are marked as compliant (√).

FINDINGS & SUGGESTIONS

Altogether, the base URLs of 41 repositories (77.55%) responded to the verb “Identify”; the remaining repositories did not respond to OAI-PMH requests, either because their correct base URLs were not known or because the repositories were not OAI-PMH compliant. It is suggested that repository administrators register the latest base URL of the repository with the OAI registry to ensure OAI-PMH compliance. Only 10 repositories (20.40%) were registered in the OAI registry. Registration of repositories in the databases of metadata harvesters like OAIster, BASE or the Indian harvester CASSIR will also help, yet only 13 repositories (26.53%) under study were registered with CASSIR. It is therefore not a common practice among repository administrators to get their repository registered with OAI or with other harvesters like OAIster or the Indian repository harvester CASSIR. Visibility of repositories in OpenDOAR and ROAR is also important for locating a repository. Only 28 repositories (57.14%) under study were listed in ROAR and OpenDOAR. It is therefore suggested that minimum implementation of OAI-PMH and registration with various registries and metadata harvesting services like OAIster, BASE, Scientific Commons and CASSIR should be a necessary condition for the implementation of Open Access Institutional Repositories. The policy statements of the institutions and organizations implementing repositories may also make such provisions to ensure access and visibility of repositories.

REFERENCES:

1. CSIR Central (2013). Central Platform for Open Archive Repositories and Harvesters. Retrieved September 16, 2013 from http://www.csircentral.net/
2. OAR@ICRISAT (2013). Open Access Repository of International Crops Research Institute for the Semi Arid Tropics. Retrieved October 25, 2013 from http://oar.icrisat.org/
3. Open Archives Initiative (2013). Open Archives Initiative Protocol for Metadata Harvesting; Registering as Data Provider OAI-PMH Version 2.0. Retrieved October 25, 2013 from http://www.openarchives.org/pmh/
4. OpenDOAR (2013). Directory of Open Access Repositories. Retrieved September 16, 2013 from http://www.opendoar.org/
5. OVAL (2013). OVAL: BASE OAI-PMH Validator. Retrieved October 25, 2013 from http://oval.base-search.net/
6. ROAR (2013). Registry of Open Access Repositories. Retrieved September 16, 2013 from http://roar.eprints.org/