An ontology-based semantic extraction approach for B2C eCommerce
2011, The International Arab Journal of Information Technology
Abstract
Although a variety of studies have examined human semantic interaction with Web resources, no significant progress has yet been achieved. Comparative shopping systems can be regarded as the latest generation of B2C eCommerce systems: they connect to multiple online stores and collect the information requested by the user. In some cases, this information is extracted from the online stores' sites through keyword search and other forms of textual analysis, which rely on assumptions about the proximity of certain pieces of information, for example that a price appears close to the product name. Such heuristic approaches are error-prone and are not guaranteed to work. In this paper, we propose an ontology-based approach that extracts product information and vendor prices from vendors' public Web pages. Although most vendors on the Web present their product information in HTML documents, which are not a semantic format, our approach is based on understanding the semantics of these HTML documents and extracting the information automatically.
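The abstract does not spell out the extraction algorithm, so what follows is only a minimal sketch, under our own assumptions, of what ontology-guided extraction from an HTML product page can look like. `PRODUCT_ONTOLOGY`, `RowCollector`, and `extract_product` are hypothetical names, not the authors' code. The "ontology" is reduced to a Python dictionary in which each product property carries the lexical labels a vendor might use for it and an optional value pattern; a real system would likely rely on a richer formal ontology. Only the standard library's `html.parser` is used.

```python
import re
from html.parser import HTMLParser

# Hypothetical mini-ontology (illustration only): each property of the
# Product concept lists labels vendors might use for it and, optionally,
# a regular expression describing the expected value type.
PRODUCT_ONTOLOGY = {
    "name": {"labels": {"product", "product name", "model"}, "pattern": None},
    "brand": {"labels": {"brand", "manufacturer", "maker"}, "pattern": None},
    "price": {"labels": {"price", "our price", "sale price"},
              # a currency amount such as "$249.99" or "$1,299.00"
              "pattern": re.compile(r"\$\s*(\d[\d,]*(?:\.\d+)?)")},
}


class RowCollector(HTMLParser):
    """Collects the text of <td>/<th> cells, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per <tr>
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if self.rows:  # ignore cells that are not inside any <tr>
                # collapse runs of whitespace inside the cell text
                self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_product(html):
    """Map label/value table rows onto properties of the mini-ontology."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    product = {}
    for row in parser.rows:
        if len(row) < 2:
            continue
        label = row[0].lower().rstrip(":").strip()
        value = row[1]
        for prop, spec in PRODUCT_ONTOLOGY.items():
            if label not in spec["labels"] or prop in product:
                continue  # label unknown for this property, or already filled
            if spec["pattern"] is None:
                product[prop] = value
            else:
                match = spec["pattern"].search(value)
                if match:  # accept only values of the expected type
                    product[prop] = float(match.group(1).replace(",", ""))
    return product


page = """
<table>
  <tr><th>Product name:</th><td>Acme X100 Camera</td></tr>
  <tr><td>Manufacturer</td><td>Acme</td></tr>
  <tr><td>Our Price</td><td>$249.99</td></tr>
  <tr><td>Shipping</td><td>Free</td></tr>
</table>
"""
print(extract_product(page))
# {'name': 'Acme X100 Camera', 'brand': 'Acme', 'price': 249.99}
```

This sketch still assumes that a label and its value share a table row. What it illustrates is that the field a value belongs to is decided by the ontology's vocabulary and value types (a price must look like a currency amount) rather than by keyword position alone, so an unrecognized row such as "Shipping" is simply ignored.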