Digital preservation

Digital preservation is a set of processes and activities that maintain information stored in digital formats in order to ensure continued access to that information; it encompasses the preservation of materials resulting from digital reformatting as well as strategies such as data migration, emulation, replication, refreshing, and metadata attachment. These measures must be taken because of rapid and constant changes in software and hardware environments, the deterioration of storage media such as CDs, DVDs, and computer hard drives, and other reasons.

The preservation of digital information is widely considered to require more constant, proactive, and ongoing attention than preservation of other media[1]. This constant input of effort, time, and money to handle rapid technological and organizational advance is considered the main stumbling block for preserving digital information. Indeed, while we are still able to read our written heritage from several thousand years ago, digital information created merely a decade ago is in serious danger of being lost.

The digitization of all forms of cultural heritage is becoming increasingly popular at all levels of society. The preservation of valuable information is an important responsibility that people now must bear for the sake of future generations. Furthermore, the preservation of cultural tradition is a process of constant renewal and reaffirmation, and digital preservation is not possible without the combined efforts of private and public sectors, and domestic and international organizations.

Challenges

A society's heritage has historically been recorded on many different materials, including stone, vellum, bamboo, silk, paper, and others. Today, a large quantity of information exists in digital form, including documents, music, sounds, and images found in emails, blogs, social networking websites, web photo albums, and websites of various kinds. According to a report by the U.S. Library of Congress, 44 percent of the sites available on the internet in 1998 had vanished one year later[2].

The unique characteristic of digital forms makes it easy to create content and keep it up-to-date, but at the same time raises many difficulties in the preservation of this content. Margaret Hedstrom points out that “digital preservation raises challenges of a fundamentally different nature which are added to the problems of preserving traditional format materials.”[3]

Physical deterioration

The first challenge digital preservation faces is that the media on which digital content is stored are more vulnerable to deterioration and catastrophic loss. While acidic paper is prone to deterioration in the form of brittleness and yellowing, the deterioration does not become apparent for at least six decades, and when it does occur, it progresses over decades. It is also highly possible to retrieve all information without loss after deterioration is spotted. The recording media for digital data deteriorate at a much more rapid pace, and once the deterioration becomes apparent, in most cases data has already been lost.

Digital obsolescence

Another challenge, perhaps a more serious and important one, is the problem of long-term access. Digital technology is developing extremely fast, and a given retrieval and playback technology can become obsolete in a matter of years. When faster, more capable, and cheaper storage and processing devices are developed, the older versions are replaced almost immediately. When software or a decoding technology is abandoned, or a hardware device is no longer in production, records created under such technologies are at great risk of loss, simply because they can no longer be accessed. This process is known as digital obsolescence.

This challenge is exacerbated by the lack of established standards, protocols, and proven methods for preserving digital information[4]. For example, copies of data used to be saved on magnetic tapes, but media standards for tapes have changed considerably over the years, and there is no guarantee that tapes will be readable in the future[5]. Hedstrom further explained that almost all digital library research has been focused on “architectures and systems for information organization and retrieval, presentation and visualization, and administration of intellectual property rights” and that “digital preservation remains largely experimental and replete with the risks associated with untested methods.” While the rapid advance of technology threatens access to digital content in one dimension, the lack of digitizing standards affects the issue in another.

Strategies

In 2006, the Online Computer Library Center (OCLC) developed a four-point strategy for the long-term preservation of digital objects that consisted of:

  • Assessing the risks for loss of content posed by technology variables such as commonly used proprietary file formats and software applications.
  • Evaluating the digital content objects to determine what type and degree of format conversion or other preservation actions should be applied.
  • Determining the appropriate metadata needed for each object type and how it is associated with the objects.
  • Providing access to the content[6].

There are several additional strategies that individuals and organizations may use to actively combat the loss of digital information.

Refreshing

Refreshing is the transfer of data between two instances of the same type of storage medium so that no bits are changed and the data is not altered[7]. For example, census data may be transferred from an aging gold preservation CD to a new one. This strategy may need to be combined with migration when the software or hardware required to read the data is no longer available or is unable to understand the format of the data. Refreshing will likely always be necessary due to the deterioration of physical media.
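Because refreshing is only successful when the new copy is bit-for-bit identical to the old one, the copy is usually checked after it is made. The following Python sketch illustrates one way this could be done, assuming the data is an ordinary file and using SHA-256 digests as the comparison; the function names `sha256_of` and `refresh` are hypothetical and are not taken from any cited source.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target: Path) -> str:
    """Copy source to target and confirm the copy is bit-identical.

    Raises IOError if the digests differ, so that a damaged copy is
    never mistaken for a successful refresh.
    """
    before = sha256_of(source)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise IOError(f"refresh failed: {source} and {target} differ")
    return after
```

The returned digest can be stored alongside the new copy so that later refresh cycles can be checked against it.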

Migration

Migration is the transferring of data to newer system environments[8]. This may include conversion of resources from one format to another (e.g., conversion of Microsoft Word to PDF or OpenDocument), from one operating system to another (e.g., Solaris to Linux), or from one programming language to another (e.g., C to Java) so the resource remains fully accessible and functional. Resources that are migrated run the risk of losing some functionality, since newer formats may be incapable of capturing all the functionality of the original format, or the converter itself may be unable to interpret all the nuances of the original format. The latter is often a concern with proprietary data formats.
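The risk of loss during migration can be reduced by checking that the original can be reconstructed from the migrated form. The Python sketch below is purely illustrative: it assumes a hypothetical legacy format of pipe-delimited text records with three fields (`title`, `author`, `year`) and migrates each record to XML, rejecting any record that does not survive a round trip. Real migrations between complex formats are far harder to verify.

```python
import xml.etree.ElementTree as ET

# Hypothetical column order of the legacy pipe-delimited format.
FIELDS = ("title", "author", "year")


def migrate_record(legacy_line: str) -> ET.Element:
    """Convert one pipe-delimited legacy record into an XML element."""
    values = legacy_line.rstrip("\n").split("|")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = ET.Element("record")
    for name, value in zip(FIELDS, values):
        ET.SubElement(record, name).text = value
    return record


def restore_record(record: ET.Element) -> str:
    """Rebuild the legacy line from the XML; used only to check for loss."""
    return "|".join(record.findtext(name, default="") for name in FIELDS)


def migrate_checked(legacy_line: str) -> str:
    """Migrate a record and fail loudly if the round trip loses data."""
    record = migrate_record(legacy_line)
    xml_text = ET.tostring(record, encoding="unicode")
    if restore_record(ET.fromstring(xml_text)) != legacy_line.rstrip("\n"):
        raise ValueError("migration altered the record")
    return xml_text
```

For example, `migrate_checked("Moby-Dick|Herman Melville|1851")` returns `<record><title>Moby-Dick</title><author>Herman Melville</author><year>1851</year></record>`.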

The National Archives Electronic Records Archives (ERA) and Lockheed Martin are jointly developing a migration system that will preserve any type of document, created on any application or platform, and delivered to the archives on any type of digital media. In the system, files are translated into flexible formats, such as XML, so that they will remain accessible to future technologies[9]. Lockheed Martin argues that it would be impossible to develop an emulation system for the National Archives ERA because the volume of records and the cost would be prohibitive.

Replication

Creating duplicate copies of data on one or more systems is called replication. Data that exists as a single copy in only one location is highly vulnerable to software or hardware failure, intentional or accidental alteration, and environmental catastrophes like fire, flooding, etc. Digital data is more likely to survive if it is replicated in several locations. Replicated data may introduce difficulties in refreshing, migration, versioning, and access control since the data is located in multiple places.
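A simple way to picture replication, and the auditing it makes possible, is sketched below in Python. The sketch assumes that each "location" is just a directory and that a copy is considered damaged when its SHA-256 digest disagrees with the digest shared by most of the other copies; the function names are hypothetical, and production systems use considerably more elaborate protocols.

```python
import hashlib
import shutil
from collections import Counter
from pathlib import Path


def digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def replicate(source: Path, locations: list[Path]) -> list[Path]:
    """Copy source into every location directory and return the copies."""
    copies = []
    for location in locations:
        location.mkdir(parents=True, exist_ok=True)
        copies.append(Path(shutil.copy2(source, location / source.name)))
    return copies


def audit(copies: list[Path]) -> list[Path]:
    """Return the copies whose digest disagrees with the majority digest.

    If two digests are tied, the one encountered first is treated as
    the majority, so an odd number of replicas is preferable.
    """
    digests = {copy: digest(copy) for copy in copies}
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return [copy for copy, value in digests.items() if value != majority]
```

A copy reported by `audit` could then be repaired by refreshing it from one of the copies that agrees with the majority.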

Emulation

Emulation is the replicating of functionality of an obsolete system.[10] For example, emulating an Atari 2600 on a Windows system or emulating WordPerfect 1.0 on a Macintosh. Emulators may be built for applications, operating systems, or hardware platforms. Emulation has been a popular strategy for retaining the functionality of old video game systems. The feasibility of emulation as a catch-all solution has been debated in the academic community.[11]

Raymond A. Lorie, a scientist in the area of computer languages working with Almaden Research Lab in California, has suggested a Universal Virtual Computer (UVC) could be used to run any software in the future on a yet unknown platform[12]. The UVC strategy uses a combination of emulation and migration. The UVC strategy has not yet been widely adopted by the digital preservation community.

Jeff Rothenberg, a major proponent of emulation for digital preservation in libraries, working in partnership with the Koninklijke Bibliotheek (National Library) and the Nationaal Archief (National Archives) of the Netherlands, has helped launch Dioscuri, a modular emulator that succeeds in running MS-DOS, WordPerfect 5.1, DOS games, and more[13].

Metadata attachment

Metadata is data about a digital file that includes information on creation, access rights, restrictions, preservation history, and rights management[14]. Metadata attached to digital files may be affected by file format obsolescence. ASCII is considered to be the most durable format for metadata[15] because it is widespread, is backward compatible with Unicode, and uses human-readable characters rather than numeric codes. Plain ASCII retains the information but not the structure in which it is presented. For higher functionality, SGML or XML should be used. Both markup languages can be stored in ASCII format, but contain tags that denote structure and format.
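As an illustration of structured metadata stored in ASCII, the following Python sketch builds a small XML record with the standard library and serializes it as ASCII bytes. The element names used in the usage note (`creator`, `created`, `rights`) are hypothetical and do not follow any particular metadata standard; the field names passed in are assumed to be valid XML element names.

```python
import xml.etree.ElementTree as ET


def build_metadata(fields: dict[str, str]) -> bytes:
    """Serialize metadata fields as an ASCII-encoded XML document."""
    root = ET.Element("metadata")
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    # With encoding="us-ascii", characters outside ASCII are written as
    # numeric character references, so the output bytes stay pure ASCII
    # while the tags preserve the structure of the record.
    return ET.tostring(root, encoding="us-ascii")
```

For instance, `build_metadata({"creator": "Zoë", "created": "2008-08-14", "rights": "public domain"})` yields bytes that any ASCII-capable tool can display, and parsing them with `ET.fromstring` recovers the original values, including the non-ASCII character.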

Trustworthy digital objects

Digital objects that can speak to their own authenticity are called trustworthy digital objects (TDOs). TDOs were proposed by Henry M. Gladney to enable digital objects to maintain a record of their change history so future users can know with certainty that the contents of the object are authentic[16]. Other preservation strategies like replication and migration are necessary for the long-term preservation of TDOs.
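The idea of an object carrying a verifiable record of its own change history can be illustrated with a hash chain, in which each history entry includes the hash of the entry before it. The Python sketch below is an assumption made for illustration only and is not Gladney's design; in particular, a bare hash chain can be recomputed by anyone who alters it, so a real scheme would also need digital signatures or a trusted witness.

```python
import hashlib
import json


def _entry_hash(entry: dict) -> str:
    """Hash an entry's fields in a stable (sorted-key) JSON encoding."""
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_change(history: list[dict], content: bytes, note: str) -> list[dict]:
    """Return a new history with one more entry linked to the previous one."""
    previous = history[-1]["entry_hash"] if history else ""
    entry = {
        "note": note,
        "content_hash": hashlib.sha256(content).hexdigest(),
        "previous_hash": previous,
    }
    # The entry hash covers the three fields above, not itself.
    entry["entry_hash"] = _entry_hash(entry)
    return history + [entry]


def verify(history: list[dict], content: bytes) -> bool:
    """Check every link in the chain and that content matches the last entry."""
    previous = ""
    for entry in history:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["previous_hash"] != previous:
            return False
        if _entry_hash(body) != entry["entry_hash"]:
            return False
        previous = entry["entry_hash"]
    current = hashlib.sha256(content).hexdigest()
    return bool(history) and history[-1]["content_hash"] == current
```

Editing any earlier entry without recomputing every later hash causes `verify` to return `False`, as does presenting content that differs from the most recently recorded version.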

Digital sustainability

Digital sustainability encompasses a range of issues and concerns that contribute to the longevity of digital information[17]. Unlike traditional, temporary strategies and more permanent solutions, digital sustainability implies a more active and continuous process. Digital sustainability concentrates less on the solution and technology and more on building an infrastructure and approach that is flexible with an emphasis on interoperability, continued maintenance and continuous development[18]. Digital sustainability incorporates activities in the present that will facilitate access and availability in the future.

Digital preservation standards

To standardize digital preservation practice and provide a set of recommendations for preservation program implementation, the Reference Model for an Open Archival Information System (OAIS) was developed. The reference model (ISO 14721:2003) includes the following responsibilities that an OAIS archive must abide by:

  • Negotiate for and accept appropriate information from information producers.
  • Obtain sufficient control of the information provided to the level needed to ensure long-term preservation.
  • Determine, either by itself or in conjunction with other parties, which communities should become the designated community and, therefore, should be able to understand the information provided.
  • Ensure that the information to be preserved is independently understandable to the designated community. In other words, the community should be able to understand the information without needing the assistance of the experts who produced the information.
  • Follow documented policies and procedures which ensure that the information is preserved against all reasonable contingencies, and which enable the information to be disseminated as authenticated copies of the original, or as traceable to the original.
  • Make the preserved information available to the designated community[19].

OAIS is concerned with all technical aspects of a digital object’s life cycle: ingest into and storage in a preservation infrastructure, data management, accessibility, and distribution. The model also addresses metadata issues and recommends that five types of metadata be attached to a digital object: reference (identification) information, provenance (including preservation history), context, fixity (authenticity indicators), and representation (formatting, file structure, and what “imparts meaning to an object’s bitstream”)[20].

Prior to Gladney's proposal of TDOs, the Research Libraries Group (RLG) developed “attributes and responsibilities” that denote the practices of a “Trusted Digital Repository” (TDR). The seven attributes of a TDR are: “compliance with the Reference Model for an Open Archival Information System (OAIS), Administrative responsibility, Organizational viability, Financial sustainability, Technological and procedural suitability, System security, Procedural accountability.” Among RLG’s attributes and responsibilities were recommendations calling for the collaborative development of digital repository certifications, models for cooperative networks, and sharing of research and information on digital preservation with regard to intellectual property rights[21].

Digital sound preservation standards

In January 2004, the Council on Library and Information Resources (CLIR) hosted a roundtable meeting of audio experts discussing best practices, which culminated in a report delivered March 2006. This report investigated procedures for reformatting sound from analog to digital, summarizing discussions and recommendations for best practices for digital preservation. Participants made a series of 15 recommendations for improving the practice of analog audio transfer for archiving:

  • Develop core competencies in audio preservation engineering. Participants noted with concern that the number of experts qualified to transfer older recordings is shrinking and emphasized the need to find a way to ensure that the technical knowledge of these experts can be passed on.
  • Develop arrangements among smaller institutions that allow for cooperative buying of esoteric materials and supplies.
  • Pursue a research agenda for magnetic-tape problems that focuses on a less destructive solution for hydrolysis than baking, relubrication of acetate tapes, and curing of cupping.
  • Develop guidelines for the use of automated transfer of analog audio to digital preservation copies.
  • Develop a web-based clearinghouse for sharing information on how archives can develop digital preservation transfer programs.
  • Carry out further research into nondestructive playback of broken audio discs.
  • Develop a flowchart for identifying the composition of various types of audio discs and tapes.
  • Develop a reference chart of problematic media issues.
  • Collate relevant audio engineering standards from organizations.
  • Research safe and effective methods for cleaning analog tapes and discs.
  • Develop a list of music experts who could be consulted for advice on transfer of specific types of musical content (e.g., determining the proper key so that correct playback speed can be established).
  • Research the life expectancy of various audio formats.
  • Establish regional digital audio repositories.
  • Cooperate to develop a common vocabulary within the field of audio preservation.
  • Investigate the transfer of technology from such fields as chemistry and materials science to various problems in audio preservation.[22]

Large-scale digital preservation initiatives (LSDIs)

Many research libraries and archives have begun or are about to begin large-scale digital preservation initiatives (LSDIs). The main institutions that have begun LSDIs are cultural institutions, commercial companies such as Google and Microsoft, and non-profit groups including the Open Content Alliance (OCA) and the Million Book Project (MBP). The primary motivation of these groups is to expand access to scholarly resources.

LSDIs: library perspective

Approximately 30 cultural entities, including the 12-member Committee on Institutional Cooperation (CIC), have signed digitization agreements with either Google or Microsoft. Several of these cultural entities are participating in the Open Content Alliance (OCA) and the Million Book Project (MBP). Some libraries are involved in only one initiative, and others have diversified their digitization strategies through participation in multiple initiatives. The three main reasons for library participation in LSDIs are access, preservation, and research and development. It is hoped that digital preservation will ensure that library materials remain accessible for future generations. Libraries have a perpetual responsibility for their materials and a commitment to archive their digital materials. Libraries plan to use digitized copies as backups for works in case they go out of print, deteriorate, or are lost or damaged.

Examples of digital preservation initiatives

  • National Digital Information Infrastructure and Preservation Program. The Library of Congress's National Digital Information Infrastructure and Preservation Program (NDIIPP) is dedicated to ensuring that the digital information that conveys our history and heritage is available and accessible for generations to come. As a pioneer in the field of digital information, the Library has continued to provide digitized access to its vast collections, especially through sites such as American Memory, America's Library, and Exhibits.
  • Portico. Portico, originally launched by JSTOR in 2002, is an electronic archiving service which provides "a permanent archive of electronic scholarly journals."
  • FDsys. FDsys is a system being developed by the United States Government Printing Office to authenticate, preserve, and provide access to government information from all three branches of the Federal government.
  • Elsevier Science digital archive. In 2002, the Koninklijke Bibliotheek became the official digital archive for 7 terabytes of Elsevier Science journals.
  • LOCKSS. The LOCKSS Program ("Lots Of Copies Keep Stuff Safe"), under the auspices of Stanford University, develops and supports open-source software for digital preservation based on a distributed network of preservation appliances running a sophisticated voting protocol. Originally designed to preserve scholarly journals, the LOCKSS technology is now being used to preserve electronic theses and dissertations, government documents, books, blogs, websites, image collections, etc. The LOCKSS Program also runs its own preservation network.
    • MetaArchive Project. Six universities (Emory University, the Georgia Institute of Technology, the Virginia Polytechnic Institute and State University, Florida State University, Auburn University and the University of Louisville) and the Library of Congress are developing "a cooperative for the preservation of at-risk digital content [about] the culture and history of the American South" in a private LOCKSS network.
    • Alabama Digital Preservation Network (ADPNet). In 2006, the Institute of Museum and Library Services (IMLS) awarded a two-year National Leadership Grant to the Network of Alabama Academic Libraries (NAAL) and seven Alabama institutions to build a low-cost distributed digital preservation network for the state. The resulting system, ADPNet, is a private LOCKSS network; its mission is to preserve digital content created by cultural heritage organizations in Alabama and to serve as a model for digital preservation networks in other states. The participating institutions are the Alabama Department of Archives & History, Auburn University, Spring Hill College, Troy University, the University of Alabama, the University of Alabama at Birmingham and the University of North Alabama.
    • ASERL ETDs. Eight universities of the Association of Southeastern Research Libraries (Florida State University, the Georgia Institute of Technology, North Carolina State University, the University of Kentucky, the University of Miami, the University of Tennessee, Vanderbilt University and the Virginia Polytechnic Institute and State University) are preserving each other's collections of electronic theses and dissertations (ETDs) in a private LOCKSS network.
    • GPO LOCKSS Pilot. The Government Printing Office conducted a pilot program to "manage, disseminate, and preserve access to Web-based Federal Government e-journals that are within the scope of the FDLP and IES" (Federal Depository Library Program and International Exchange Service), using LOCKSS technology. Pilot participants included 18 universities, the German National Library, the United States National Agricultural Library and the Government Printing Office.
    • Alaska State Publications Program. To continue complying with its obligations under Alaska state statutes to "make state publications freely available to Alaskans by distributing them to local depository libraries," the Alaska State Library is expanding its depository program to preserve Alaska State publications that are Web-only by making them accessible to LOCKSS collection.
    • CLOCKSS. CLOCKSS ("Controlled LOCKSS") is "a not-for-profit community partnership among publishers and libraries that is developing a distributed, validated, comprehensive archive that preserves and ensures continuing access to electronic scholarly content" using a private LOCKSS network. It mobilizes the resources of twelve large publishers (American Chemical Society, American Medical Association, American Physiological Society, Blackwell Publishing, Elsevier, Institute of Physics, Nature Publishing Group, Oxford University Press, SAGE Publications, Springer Science+Business Media, Taylor and Francis and John Wiley & Sons) and seven institutions (Indiana University, the New York Public Library, the OCLC, Rice University, Stanford University, the University of Virginia and the University of Edinburgh).
  • New media art preservation. Arts organizations (including the Solomon R. Guggenheim Museum, the Berkeley Art Museum, the Daniel Langlois Foundation for Art, Science and Technology, the New Museum of Contemporary Art's Rhizome.org and the Franklin Furnace Archive, amongst others) have been collaborating on various initiatives in the research of new media art preservation. Such initiatives include the Variable Media Network and the Archiving the Avant Garde project.
  • NDHA. The National Digital Heritage Archive (NDHA) Programme is a partnership between the National Library of New Zealand, Ex Libris Group and Sun Microsystems to develop 'Preservation' a digital archive and preservation management system. Established in 2004, the NDHA Programme is due to be completed in late 2009.
  • The Rose Goldsen Archive of New Media Art. The Archive was founded in 2002 by Timothy Murray and was named after the pioneering critic of the commercialization of mass media, the late Professor Rose Goldsen of Cornell University. The Archive hosts international art work produced on CD-Rom, DVD-Rom, video, digital interfaces, and the internet. Its collection of supporting materials includes unpublished manuscripts and designs, catalogues, monographs, and resource guides to new media art. The curatorial vision emphasizes digital interfaces and artistic experimentation by international, independent artists. Designed as an experimental center of research and creativity, the Goldsen Archive includes materials by individual artists and collaborates on conceptual experimentation and archival strategies with international curatorial and fellowship projects.
  • DSpace. DSpace is open source software that is freely available over the World Wide Web. It takes data in multiple formats (text, video, audio, or data), distributes it over the web, indexes it for easy retrieval, and preserves it over time. Posting data on DSpace is fairly simple, but those posting it must hold the copyright to the material or have permission to post it. The descriptive information entered into DSpace (title, author, publication information, and keywords) is called metadata, and DSpace catalogs this metadata chiefly in order to preserve it over time.
  • PADI. PADI is a comprehensive archive of information on the topic of digital preservation from the National Library of Australia.

Notes

  1. R. McLeod, P. Wheatley, and P. Ayris, (2006). Information for E-literature. LIFE. Retrieved August 14, 2008.
  2. "U.S. Congress Approves Library of Congress Plan for Preservation of Digital Materials", Library of Congress, 2003-02-07. Retrieved on 2008-08-13
  3. M. Hedstrom, (1997). Digital preservation: a time bomb for Digital Libraries. Retrieved August 13, 2008.
  4. D. M. Levy & C. C. Marshall, (1995). "Going digital: a look at assumptions underlying digital libraries." Communications of the ACM 38 (4): 77-84.
  5. Myron Flugstad, (2007). "Website Archiving: the Long-Term Preservation of Local Born Digital Resources." Arkansas Libraries 64 (3) (Fall 2007): 5-7
  6. Online Computer Library Center, Inc. (2006). OCLC Digital Archive Preservation Policy and Supporting Documentation, 5. oclc.org. Retrieved August 14, 2008.
  7. Cornell University Library. (2005) Digital Preservation Management: Implementing Short-term Strategies for Long-term Problems. Retrieved August 14, 2008.
  8. J. Garrett, et al. (1996). Preserving digital information: Report of the task force on archiving of digital information. Commission on Preservation and Access and the Research Libraries Group.
  9. Reagan,
  10. Jeff Rothenberg, (1998). Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation. (Washington, DC, USA: Council on Library and Information Resources.)
  11. Stewart Granger, (2000). "Emulation as a Digital Preservation Strategy." D-Lib Magazine 6 (10).
  12. Raymond A. Lorie, (2001). "Long Term Preservation of Digital Information." Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '01), 346-352.
  13. J. Hoeven, (2007). "Dioscuri: emulator for digital preservation." D-Lib Magazine 13 (11/12).
  14. NISO Framework Advisory Group. (2004). A Framework of Guidance for Building Good Digital Collections, 2nd edition, 27. NISO.org. Retrieved August 14, 2008.
  15. National Initiative for a Networked Cultural Heritage. (2002). NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials. Retrieved August 14, 2008.
  16. H. M. Gladney, (2004). "Trustworthy 100-year digital objects: Evidence after every witness is dead." ACM Transactions on Information Systems 22 (3): 406–436.
  17. K. Bradley, (Summer 2007). "Defining digital sustainability." Library Trends 56 (1): 148-163
  18. Sustainability of Digital Resources. (2008). TASI: Technical Advisory Service for Images. Retrieved August 14, 2008.
  19. Consultative Committee for Space Data Systems. (2002). Reference Model for an Open Archival Information System (OAIS). (Washington, DC: CCSDS Secretariat), 3-1
  20. (2005) Digital Preservation Management: Implementing Short-term Strategies for Long-term Problems. Cornell University Library. Retrieved August 14, 2008.
  21. Research Libraries Group. (2002). Trusted Digital Repositories: Attributes and Responsibilities. Retrieved August 14, 2008.
  22. Council on Library and Information Resources. Publication 137: Capturing Analog Sound for Digital Preservation: Report of a Roundtable Discussion of Best Practices for Transferring Analog Discs and Tapes March 2006. CLIR.org. Retrieved August 14, 2008.

Credits

New World Encyclopedia writers and editors rewrote and completed the Wikipedia article in accordance with New World Encyclopedia standards. This article abides by terms of the Creative Commons CC-by-sa 3.0 License (CC-by-sa), which may be used and disseminated with proper attribution. Credit is due under the terms of this license that can reference both the New World Encyclopedia contributors and the selfless volunteer contributors of the Wikimedia Foundation.

Note: Some restrictions may apply to use of individual images which are separately licensed.