Information Retrieval on the World Wide Web
1997, IEEE Internet Computing
Abstract
The World Wide Web is a very large distributed digital information space. From its origins in 1991 as an organization-wide collaborative environment at CERN for sharing research documents in nuclear physics, the Web has grown to encompass diverse information resources: personal home pages; online digital libraries; virtual museums; product and service catalogs; government information for public dissemination; research publications; and Gopher, FTP, Usenet news, and mail servers. Some estimates suggest that the Web currently includes about 150 million pages and that this number doubles every four months.
Venkat N. Gudivada is a senior database designer at Dow Jones Markets. His research interests are in multimedia information retrieval and heterogeneous distributed database management. He received his PhD in computer science from the University of Southwestern Louisiana in 1993.
Vijay V. Raghavan is a distinguished professor of computer science at the University of Southwestern Louisiana. His research focuses on information retrieval strategies for text and image databases. He received a B. Tech in mechanical engineering from the Indian Institute of Technology, Madras; an MBA from McMaster University; and a PhD in computing science from the University of Alberta. Raghavan is currently an ACM National Lecturer. He is a member of the ACM and the IEEE.
William I. Grosky is professor and chair of the Computer Science Department at Wayne State University in Detroit, Michigan. His research interests include multimedia information systems, hypermedia, and Web technology. He received a BS in mathematics from MIT in 1965, MS in applied mathematics from Brown University in 1968, and PhD in engineering and applied science from Yale University in 1971. Grosky is currently on the editorial boards of IEEE MultiMedia, Pattern Recognition, and the Journal of Database Management.
Rajesh Kasanagottu is a graduate student at the University of Missouri, Rolla. He received his bachelor's degree in computer science from Osmania University, India. His research interests include software agents, search engines, and information retrieval.
Readers may contact Raghavan at the Center for Advanced Computer Studies, University of Southwestern Louisiana, Lafayette, LA 70504, USA; raghavan@cacs.usl.edu.