A distributed Integrity Catalog for digital repositories
2014
Abstract
Digital repositories, whether digital preservation systems or archival systems, periodically check the integrity of stored objects to assure users that those objects are correct. To do so, prior solutions compute integrity metadata and require the repository to store it alongside the data objects themselves. This integrity metadata is essential for regularly verifying the correctness of the stored objects. To safeguard this metadata and detect damage to it, prior solutions rely on widely visible media, that is, unaffiliated third parties, which store digests of the metadata and provide them back so that the repository can verify the metadata is intact. However, these solutions do not address recovering the integrity metadata when it is damaged or attacked by an adversary; in essence, they do not preserve this metadata. We introduce IntegrityCatalog, a system that collects all integrity-related metadata in a single component and treats it as first-class objects, managing both its integrity and its preservation. We introduce a treap-based persistent authenticated dictionary that manages arbitrary-length key/value pairs, which we use to store all integrity metadata, accessible simply by object name. IntegrityCatalog is also a distributed system: its network protocol handles both corruption detection and preservation of this metadata, using administrator-selected network peers in one of two roles. Verifiers store and offer attestations on digests and have minimal storage requirements, while preservers efficiently synchronize a complete copy of the catalog to assist in recovery if a compromise of the local catalog is detected. We describe our prototype implementation of IntegrityCatalog, measure its performance empirically, and demonstrate its effectiveness in real-world situations; the worst measured throughput was approximately 1K insertions per second and 2K verified search operations per second.
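To make the central data structure concrete, the following is a minimal in-memory sketch of a persistent authenticated treap. It is not the IntegrityCatalog implementation, and several choices here are assumptions: SHA-256 as the hash, treap priorities derived from a hash of each key, persistence obtained by path copying (every update returns a new root and leaves older versions intact), and the hypothetical function names `insert`, `lookup`, and `verify`. Each node's digest covers its key, its value, and both children's digests, so a single root digest authenticates an entire catalog version. A verified search returns the value together with the sibling digests along the search path, which a client holding only the trusted root digest can check, including proofs that a name is absent.

```python
import hashlib
from typing import Optional


def _h(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts, so arbitrary-length keys and
    values cannot collide through simple concatenation."""
    m = hashlib.sha256()
    for p in parts:
        m.update(len(p).to_bytes(8, "big"))
        m.update(p)
    return m.digest()


EMPTY = _h(b"empty")  # digest of an empty subtree


class Node:
    """Treap node; never modified after construction (persistence by path copying)."""
    __slots__ = ("key", "value", "left", "right", "prio", "digest")

    def __init__(self, key: bytes, value: bytes,
                 left: Optional["Node"], right: Optional["Node"]):
        self.key, self.value, self.left, self.right = key, value, left, right
        # Assumption: priority is derived from the key, so shape depends only on the key set.
        self.prio = int.from_bytes(_h(b"prio", key), "big")
        # Merkle-style digest binding key, value and both child digests.
        self.digest = _h(b"node", key, value, digest_of(left), digest_of(right))


def digest_of(n: Optional[Node]) -> bytes:
    return EMPTY if n is None else n.digest


def insert(node: Optional[Node], key: bytes, value: bytes) -> Node:
    """Return the root of a NEW version; `node` and its subtrees are left untouched."""
    if node is None:
        return Node(key, value, None, None)
    if key == node.key:  # update in place: same priority, so no rotation needed
        return Node(key, value, node.left, node.right)
    if key < node.key:
        l = insert(node.left, key, value)
        if l.prio > node.prio:  # rotate right to restore max-heap order
            return Node(l.key, l.value, l.left, Node(node.key, node.value, l.right, node.right))
        return Node(node.key, node.value, l, node.right)
    r = insert(node.right, key, value)
    if r.prio > node.prio:  # rotate left to restore max-heap order
        return Node(r.key, r.value, Node(node.key, node.value, node.left, r.left), r.right)
    return Node(node.key, node.value, node.left, r)


def lookup(root: Optional[Node], key: bytes):
    """Verified search: return (value or None, proof). For each ancestor on the
    search path, the proof records its key, value, the direction taken, and the
    digest of the sibling subtree that was not taken."""
    path, node = [], root
    while node is not None and node.key != key:
        if key < node.key:
            path.append((node.key, node.value, "L", digest_of(node.right)))
            node = node.left
        else:
            path.append((node.key, node.value, "R", digest_of(node.left)))
            node = node.right
    if node is None:  # absence: the search ended at an empty subtree
        return None, (path, None)
    return node.value, (path, (node.key, node.value, digest_of(node.left), digest_of(node.right)))


def verify(root_digest: bytes, key: bytes, value: Optional[bytes], proof) -> bool:
    """Recompute the root digest from the proof; needs only the trusted root digest."""
    path, terminal = proof
    if terminal is None:
        if value is not None:
            return False
        d = EMPTY
    else:
        tkey, tval, ld, rd = terminal
        if tkey != key or tval != value:
            return False
        d = _h(b"node", tkey, tval, ld, rd)
    for akey, aval, direction, sibling in reversed(path):
        # The path must be exactly the one a binary search for `key` would follow.
        if (direction == "L") != (key < akey) or key == akey:
            return False
        d = _h(b"node", akey, aval, d, sibling) if direction == "L" else _h(b"node", akey, aval, sibling, d)
    return d == root_digest


if __name__ == "__main__":
    v1 = insert(None, b"photos/cat.jpg", b"digest-1")
    v2 = insert(v1, b"docs/report.pdf", b"digest-2")
    v3 = insert(v2, b"photos/cat.jpg", b"digest-1b")  # an update yields a new version
    val, proof = lookup(v3, b"photos/cat.jpg")
    print(val, verify(v3.digest, b"photos/cat.jpg", val, proof))  # b'digest-1b' True
    old, old_proof = lookup(v2, b"photos/cat.jpg")  # the older version is still readable
    print(old, verify(v2.digest, b"photos/cat.jpg", old, old_proof))  # b'digest-1' True
```

Because priorities in this sketch depend only on the key, the tree shape depends only on the set of stored names, not on insertion order, so two peers holding the same entries compute the same root digest. That property is convenient when comparing digests across verifiers or preservers, although the abstract does not state whether IntegrityCatalog relies on it.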