Index-based persistent document identifiers

2005

https://doi.org/10.1023/B:INRT.0000048494.05013.6A

Abstract

The infrastructure of a typical search engine can be used to calculate and resolve persistent document identifiers: strings that uniquely identify and locate a document on the Internet without reference to its original location (URL). Bookmarking a document with such an identifier allows it to be retrieved even if the document's URL and, in many cases, its contents change. Web client applications can let users bookmark a page by reference to a search engine and the persistent identifier instead of the original URL.
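A minimal sketch of the idea, under stated assumptions. We assume the identifier is a small set of words drawn from the document, chosen so that the intersection of their posting lists in the search engine's inverted index contains that document alone. Resolving the identifier then means running that conjunctive query against the current index. The function names (`build_index`, `calculate_identifier`, `resolve`), the tokenizer, the example URLs, and the greedy term-selection rule are hypothetical illustrations, not the method described in the paper.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lower-case alphanumeric terms (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index mapping each term to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def calculate_identifier(url, docs, index):
    """Greedily choose terms of docs[url] until only that document contains them all.

    Hypothetical heuristic: returns a sorted tuple of terms, or None when no set
    of the document's terms singles it out (e.g. another document contains all
    of its words).
    """
    candidates = set(docs)          # documents still matching every chosen term
    terms = set(tokenize(docs[url]))
    chosen = []
    while candidates != {url}:
        # Prefer the term that leaves the fewest candidates; break ties
        # alphabetically so the result is deterministic.
        best = min(terms - set(chosen),
                   key=lambda t: (len(candidates & index[t]), t),
                   default=None)
        if best is None or candidates & index[best] == candidates:
            return None             # no remaining term narrows the set further
        candidates &= index[best]
        chosen.append(best)
    return tuple(sorted(chosen))


def resolve(identifier, index):
    """Return the URLs of all documents containing every term of the identifier."""
    postings = [index.get(term, set()) for term in identifier]
    return set.intersection(*postings) if postings else set()


if __name__ == "__main__":
    docs = {
        "http://example.com/a": "persistent identifiers for web documents",
        "http://example.com/b": "search engine index for web documents",
        "http://example.com/c": "search engine ranking of web pages",
    }
    ident = calculate_identifier("http://example.com/a", docs, build_index(docs))
    print(ident)  # ('identifiers',)

    # Later the page moves and is edited; re-crawling rebuilds the index.
    docs_later = {
        "http://example.org/archive/a": "persistent identifiers for documents on the web",
        "http://example.com/b": docs["http://example.com/b"],
        "http://example.com/c": docs["http://example.com/c"],
    }
    print(resolve(ident, build_index(docs_later)))  # {'http://example.org/archive/a'}
```

Because the identifier in this sketch depends only on the document's words, it still resolves after the page moves, and it survives edits that keep the chosen terms. It fails when those terms are removed or when another document acquires all of them. The greedy choice is only a heuristic and does not guarantee the shortest possible identifier.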