Memento Project
Memento is a project funded by the United States National Digital Information Infrastructure and Preservation Program (NDIIPP), aimed at making Web-archived content more readily discoverable.
The project is being led by the Los Alamos National Laboratory and Old Dominion University.
Rather than expecting people to know about the growing number of Web archives, and to guess which archive might hold an older version of the resource they’re looking for, Memento proposes to make archived content discoverable via the original URL that the searcher already knew about. Essentially, Memento is an attempt to permit users to view any web page as it looked on a given date in the past.
Technical description
A variety of web archives exist, each collecting specific revisions of web pages as they existed at a particular point in time. Memento allows a user to move seamlessly between these archives in search of the archived page that best matches the desired datetime.
Memento is defined in RFC 7089 as an implementation of the time dimension of content negotiation, as defined by Tim Berners-Lee in 1996. HTTP accomplishes content negotiation via headers. The table below shows the HTTP headers that allow clients and servers to find the content that the user desires.
Request Header | Response Header | Dimension | Examples | Reference |
Accept | Content-Type | content-type of the representation | text/html, text/plain, image/png | RFC 7231, RFC 2616 |
Accept-Language | Content-Language | language of the representation | en, en-US, cz | RFC 7231, RFC 2616 |
Accept-Encoding | Content-Encoding | medium, typically compression, that the content has been encoded with | compress, gzip, deflate | RFC 7231, RFC 2616 |
Accept-Charset | Content-Type | the character set used by the web page | iso-8859-5, unicode-1-1 | RFC 7231, RFC 2616 |
Accept-Datetime | Memento-Datetime | time of the representation | Fri, 15 Aug 2014 13:43:03 GMT | RFC 7089 |
Memento provides the Accept-Datetime request header so that clients can provide a date to the server, and the server can provide the best archived version of a page for that date. This is referred to as datetime negotiation.
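As a minimal sketch of what a client sends for datetime negotiation, the Python snippet below builds an Accept-Datetime header in the HTTP date format shown in the table. The function name accept_datetime_header is hypothetical and chosen for illustration only; it is not part of any Memento library.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when: datetime) -> dict:
    """Build an Accept-Datetime request header for the desired datetime.

    `when` should be timezone-aware; a naive datetime is interpreted as
    local time by astimezone().
    """
    # The header value is an HTTP date expressed in GMT.
    when_utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}


headers = accept_datetime_header(
    datetime(2014, 8, 15, 13, 43, 3, tzinfo=timezone.utc)
)
print(headers["Accept-Datetime"])  # Fri, 15 Aug 2014 13:43:03 GMT
```

The resulting dictionary can be passed as the request headers of whichever HTTP client is in use.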
To understand Memento fully, one must realize that the Last-Modified header provided by HTTP does not necessarily reflect when a particular version of a web page came into existence. Also, the Last-Modified header may not exist in some cases. To provide more information, the Memento-Datetime header has been introduced to indicate when a specific representation of a web page was observed on the web.
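To illustrate the distinction, the following sketch reads both headers from a response and parses them into datetimes. The helper name observed_and_modified and the sample header values are made up for illustration; only the header names come from the text above.

```python
from email.utils import parsedate_to_datetime


def observed_and_modified(response_headers: dict):
    """Return (memento_datetime, last_modified) parsed from response headers.

    Either value is None when the corresponding header is absent.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in response_headers.items()}

    def parse(name):
        value = lowered.get(name)
        return parsedate_to_datetime(value) if value else None

    return parse("memento-datetime"), parse("last-modified")


# Hypothetical response headers from an archived page.
sample = {
    "Memento-Datetime": "Fri, 15 Aug 2014 13:43:03 GMT",
    "Last-Modified": "Mon, 04 Aug 2014 09:00:00 GMT",
}
observed, modified = observed_and_modified(sample)
print(observed.isoformat())  # when the archive observed this representation
print(modified.isoformat())  # when the server claims the page last changed
```

A client that needs to know when a representation was seen on the web should rely on the first value, since the second may be missing or unrelated to the capture time.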
The diagram above shows the three-step process by which Memento finds the best archived web page for the datetime supplied by the user. The process works as follows:
- The Memento client contacts the original resource (URI-R) to see if it will return information about a TimeGate (URI-G) in the Link header.
- The Memento client then uses the Accept-Datetime request header to submit the datetime desired by the user to the URI-G discovered in the previous step. Most resources on the web do not return a URI-G yet, so most Memento clients use a predefined list of TimeGates to accomplish this step. The TimeGate then returns a 302 redirection status code and a Location header to tell the client where to find the archived resource, known as a memento (URI-M).
- The Memento client then requests the archived resource as it would any other web page. The response for the URI-M contains a Memento-Datetime header indicating when it was observed on the web.
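The three steps can be sketched in Python as follows. This is an illustrative outline rather than a reference client: the fetch callable (taking a URL and request headers, returning a status code and a dict of response headers) is an assumption standing in for a real HTTP library, fallback_timegate is a hypothetical TimeGate URL prefix, and the Link header parsing is deliberately simplified.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime

# Simplified Link header parsing: "<uri>; rel=\"...\"" entries.
LINK_RE = re.compile(r'<([^>]+)>([^<]*)')
REL_RE = re.compile(r'rel\s*=\s*"?([^";,]+)"?')


def find_link(link_header, rel):
    """Return the first target URI whose rel values include `rel`, else None."""
    for uri, params in LINK_RE.findall(link_header or ""):
        match = REL_RE.search(params)
        if match and rel in match.group(1).split():
            return uri
    return None


def find_memento(uri_r, when, fetch, fallback_timegate):
    """Follow the three steps and return (URI-M, Memento-Datetime value)."""
    # Step 1: ask the original resource whether it advertises a TimeGate.
    _, headers = fetch(uri_r, {})
    uri_g = find_link(headers.get("Link"), "timegate")
    if uri_g is None:
        # Assumption: a preconfigured TimeGate accepts the URI-R as a suffix.
        uri_g = fallback_timegate + uri_r

    # Step 2: send the desired datetime to the TimeGate.
    accept = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    status, headers = fetch(uri_g, {"Accept-Datetime": accept})
    if status != 302 or "Location" not in headers:
        raise RuntimeError(f"TimeGate did not redirect (status {status})")
    uri_m = headers["Location"]

    # Step 3: request the memento like any other web page.
    _, headers = fetch(uri_m, {})
    return uri_m, headers.get("Memento-Datetime")
```

Passing fetch in as a parameter keeps the sketch independent of any particular HTTP client and makes it easy to exercise with canned responses.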
Usage
One can find copies of a page by navigating, in a web browser, to a link in the following format, replacing urltoarchive with the full URL of the desired page:
JSON description of a Memento:
Redirect to a Memento with a datetime that is close to a desired datetime: