The Metadata API is intended for fast, flexible, and reliable reading and writing of Internet Archive items.
Metadata Read API
The Metadata Read API is the fastest and most flexible way to retrieve metadata for items on archive.org. We’ve seen upwards of 500 reads per second for some collections!
Overview
Returns all of an item’s metadata in JSON.
Resource URL
http://archive.org/metadata/:identifier
Parameters
identifier
: The globally unique ID of a given item on archive.org.
Usage
For example, frenchenglishmed00gorduoft is the identifier for http://archive.org/details/frenchenglishmed00gorduoft. You can retrieve all of this item’s metadata from the Metadata API using the following curl command:
$ curl http://archive.org/metadata/frenchenglishmed00gorduoft
The Metadata API also supports HTTPS:
$ curl https://archive.org/metadata/frenchenglishmed00gorduoft
Sub-item Access
The Metadata API returns all of an item’s metadata by default. You can access specific metadata elements like so:
http://archive.org/metadata/:identifier/metadata
http://archive.org/metadata/:identifier/server
http://archive.org/metadata/:identifier/files_count
http://archive.org/metadata/:identifier/files?start=1&count=2
http://archive.org/metadata/:identifier/metadata/collection
http://archive.org/metadata/:identifier/metadata/collection/0
http://archive.org/metadata/:identifier/metadata/title
http://archive.org/metadata/:identifier/files/0/name
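The sub-item URL patterns can be assembled with a small shell helper. This is a minimal sketch: metadata_url is a hypothetical function name, not part of the Metadata API, and it only prints URLs, so it runs without network access.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Metadata API): builds a read URL.
#   $1 = item identifier
#   $2 = optional sub-item path, e.g. "metadata/title" or "files/0/name"
metadata_url() {
  md_url="https://archive.org/metadata/$1"
  if [ -n "${2:-}" ]; then
    md_url="$md_url/$2"
  fi
  printf '%s\n' "$md_url"
}

# Print a few URLs for the example item used in this post.
metadata_url frenchenglishmed00gorduoft
metadata_url frenchenglishmed00gorduoft metadata/title
metadata_url frenchenglishmed00gorduoft 'files?start=1&count=2'
```

To perform the request, pass the result to curl, for example: curl "$(metadata_url frenchenglishmed00gorduoft metadata/title)". Quote any URL that contains ? or & so the shell does not interpret those characters.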
Metadata Write API
The metadata write API is intended to make changes to metadata timely, safe and flexible.
It uses draft 02 of the JSON Patch standard.
Overview
timely
- Callers receive results (success or failure) immediately.
- Changes are quickly reflected through the metadata read API.
safe
- All writes pass through the catalog, so all changes are recorded.
- All writes are checked before they’re submitted to the catalog.
- If there’s a problem, no catalog task is created. Goal: no redrows!
- All checks are repeated when the catalog task is executed.
flexible
- Supports arbitrary changes to multiple metadata targets through a unified API.
- Changes are easy — no string concatenation or libraries needed.
Resource URL
http://archive.org/metadata/:identifier
Parameters
identifier
: The globally unique ID of a given item on archive.org.
Targets
The Metadata Write API supports three kinds of target:
metadata
: Changes item_meta.xml (e.g. http://archive.org/metadata/:identifier/metadata).
files/:filename
: Changes the file entry in the item’s files.xml (e.g. http://archive.org/metadata/:identifier/files).
other
: Changes other.json (e.g. http://archive.org/metadata/:identifier/other).
For XML targets (e.g. ‘metadata’ and ‘files’), patches should be composed against their JSON representation, as found in Metadata Read API results.
Usage
Send an HTTP POST or GET request to:
http://archive.org/metadata/:identifier
With the following URL-encoded arguments:
-target
: The metadata target you would like to modify.
-patch
: The patch you are submitting to the Metadata API.
access
: Your IA-S3 access key.
secret
: Your IA-S3 secret key.
Authentication
NOTE: These calls must be made with appropriate authentication – at the moment, this means passing your Archive.org IA-S3 credentials. Please visit http://archive.org/account/s3.php to obtain your IA-S3 access key and secret key.
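The four arguments and the credentials can be bundled into one reusable function. The sketch below is an illustration, not an official client: ia_metadata_write is a hypothetical name, IA_ACCESS and IA_SECRET are assumed environment variable names chosen here to keep keys out of the script, and the use of HTTPS for writes is an assumption based on the read API supporting it.

```shell
#!/bin/bash
# Hypothetical wrapper around the write call described above.
# IA_ACCESS / IA_SECRET are assumed environment variable names, not API names.
# Usage: ia_metadata_write IDENTIFIER TARGET PATCH
ia_metadata_write() {
  if [ "$#" -ne 3 ]; then
    echo "usage: ia_metadata_write IDENTIFIER TARGET PATCH" >&2
    return 2
  fi
  if [ -z "${IA_ACCESS:-}" ] || [ -z "${IA_SECRET:-}" ]; then
    echo "error: set IA_ACCESS and IA_SECRET first" >&2
    return 1
  fi
  # Same arguments as in the post: -target, -patch, access, secret.
  curl --silent --show-error \
       --data-urlencode "-target=$2" \
       --data-urlencode "-patch=$3" \
       --data-urlencode "access=$IA_ACCESS" \
       --data-urlencode "secret=$IA_SECRET" \
       "https://archive.org/metadata/$1"
}
```

Running the block only defines the function. A call would look like: ia_metadata_write metadata_test_item metadata '{"add":"/scan_sponsor", "value":"Starfleet"}'. The function refuses to contact the server when an argument or a credential is missing.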
Patches
Patches are JSON strings. They should comply with the draft JSON Patch standard:
http://tools.ietf.org/html/draft-ietf-appsawg-json-patch-02
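A patch in the form used by the examples in this post can be produced with printf. This is a rough sketch under stated assumptions: add_patch is a hypothetical helper name, the path is assumed to contain no characters that need JSON escaping, and the value must already be valid JSON text (so a string value carries its own double quotes).

```shell
#!/bin/bash
# Hypothetical helper: prints an "add" patch in the form used in this post.
#   $1 = path of the member to add, e.g. /scan_sponsor
#        (assumed to need no JSON escaping)
#   $2 = value as JSON text, e.g. '"Starfleet"' or '["a", "b"]'
add_patch() {
  printf '{"add":"%s", "value":%s}\n' "$1" "$2"
}

# A string value and an array value.
add_patch /scan_sponsor '"Starfleet"'
add_patch /subject '["maps", "atlases"]'
```

The output of add_patch can be stored in a variable and passed as the -patch argument. Only the "add" form is shown because it is the only operation that appears in this post.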
Examples
Writing to an item’s meta.xml
Add ‘scan_sponsor’ with value ‘Starfleet’ to the target ‘metadata’ of the item metadata_test_item:
#!/bin/bash
ACCESS=<redacted>
SECRET=<redacted>
IDENTIFIER=metadata_test_item
TARGET=metadata
PATCH='{"add":"/scan_sponsor", "value":"Starfleet"}'
curl --data-urlencode -target="$TARGET" \
     --data-urlencode -patch="$PATCH" \
     --data-urlencode access="$ACCESS" \
     --data-urlencode secret="$SECRET" \
     http://archive.org/metadata/$IDENTIFIER
returns a JSON object, like the following:
{"success":true,"task_id":114350522,"log":"http://www.us.archive.org/log_show.php?task_id=114350522"}
or perhaps
{"error":"Some problem applying the patch"}
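A script can branch on these two response shapes. The sketch below uses plain pattern matching and assumes the compact formatting shown above; response_ok and response_task_id are hypothetical helper names, and a real client would more safely use a JSON parser.

```shell
#!/bin/bash
# Hypothetical helpers for inspecting a write response.
# They rely on crude text matching and assume the compact JSON shown above.

# Succeeds (exit status 0) if the response contains "success":true.
response_ok() {
  case "$1" in
    *'"success":true'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the numeric task_id, or nothing if the response has none.
response_task_id() {
  printf '%s\n' "$1" | sed -n 's/.*"task_id":\([0-9][0-9]*\).*/\1/p'
}

# Sample response copied from this post.
sample='{"success":true,"task_id":114350522,"log":"http://www.us.archive.org/log_show.php?task_id=114350522"}'
if response_ok "$sample"; then
  echo "write queued as task $(response_task_id "$sample")"
else
  echo "write failed: $sample" >&2
fi
```

The helpers take the response body as a string, so they can be applied directly to the captured output of the curl call.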
Writing to a files.xml entry
#!/bin/bash
ACCESS=<redacted>
SECRET=<redacted>
IDENTIFIER=metadata_test_item
TARGET='files/glogo.png'
PATCH='{"add":"/camera", "value":"Canon A150"}'
curl --data-urlencode -target="$TARGET" \
     --data-urlencode -patch="$PATCH" \
     --data-urlencode access="$ACCESS" \
     --data-urlencode secret="$SECRET" \
     http://archive.org/metadata/$IDENTIFIER
Writing to metadata_test_item/foo_client.json
NOTE: Keys and values are binary-safe and unrestricted
#!/bin/bash
ACCESS=<redacted>
SECRET=<redacted>
IDENTIFIER=metadata_test_item
TARGET='foo_client'
PATCH='{"add":"/of concern to foo", "value":{"foo-ness":["buckle", "shoe"]}}'
curl --data-urlencode -target="$TARGET" \
     --data-urlencode -patch="$PATCH" \
     --data-urlencode access="$ACCESS" \
     --data-urlencode secret="$SECRET" \
     http://archive.org/metadata/$IDENTIFIER
After the above call, a metadata read of metadata_test_item will have a top-level member ‘foo_client’. Assuming foo_client.json was empty beforehand, its value will be:
{"of concern to foo":{"foo-ness":["buckle", "shoe"]}}