AROOOGA: An Audio Search Engine for the World Wide Web
2004, International Computer Music Conference
Abstract
Existing search engines use web crawlers to gather web pages. The extracted information is used to build indexes, which are later used to answer user queries. This approach is useful for general queries, but ignores the special properties of sound files, making it difficult to accurately locate specific sound files on the web. AROOOGA, or the Articulated Resource for Obsequious Opinionated Observations into Gathered Audio, is a web crawling system designed specifically to find and analyze audio resources on the web. The AROOOGA web crawler uses both audio information and the associated web pages to produce higher-quality search indexes for music information retrieval. Information about sound files on the web is discussed, and some preliminary search results are included.