Privacy preserving C4.5 using Gini index

2011, 2011 2nd National Conference on Emerging Trends and Applications in Computer Science

https://doi.org/10.1109/NCETACS.2011.5751385

Abstract

Privacy has become a major concern nowadays; the traditional security goals of confidentiality, integrity, and availability do not by themselves ensure privacy. Data mining is a threat to privacy, and researchers today focus on how to ensure privacy while performing data mining tasks. Because data mining algorithms are typically complex and their input usually consists of massive data sets, generic protocols are of no practical use in this setting, and more efficient protocols are required. This paper focuses on the problem of decision tree learning with the popular C4.5 algorithm. C4.5, an extension of ID3, is a very popular decision tree building method in data mining. Entropy and the Gini index are two different splitting criteria used in ID3. While there is some work on privacy-preserving ID3 using entropy, not much has been done for the Gini index. This paper proposes modified protocols based on secure multiparty computation for privacy-preserving C4.5 using the Gini index over distributed partitioned data, where the protocols do not require any third-party server. However, some communication overhead is necessary so that the parties can carry out the secure protocols. Results such as the ROC (Receiver Operating Characteristic) graph and detailed accuracy through a cost counting index are shown.
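To make the Gini criterion and the multiparty setting more concrete, the following is a minimal Python sketch, not the paper's protocol. It assumes horizontally partitioned data, in which each party holds class counts for its own records, and it combines those counts with a simple additive secret-sharing sum that needs no third-party server. The names `gini`, `gini_split`, `share`, `secure_sum`, and the constant `MODULUS` are hypothetical and chosen only for illustration. Unlike a full secure protocol, this sketch reveals the aggregate counts to all parties.

```python
import secrets

# Hypothetical modulus for additive shares; it only needs to exceed any true sum.
MODULUS = 2**61 - 1


def gini(counts):
    """Gini index of a node given its per-class record counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_split(partitions):
    """Weighted Gini index of a split; each partition is a list of class counts."""
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * gini(p) for p in partitions)


def share(value, n_parties):
    """Split a private integer into n_parties random additive shares mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Simulate a secure sum: party i sends its j-th share to party j,
    each party publishes only the sum of the shares it received."""
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]
    partials = [sum(all_shares[i][j] for i in range(n)) % MODULUS
                for j in range(n)]
    return sum(partials) % MODULUS


# Two parties hold class counts [yes, no] for the same attribute value.
party_a = [2, 3]
party_b = [3, 2]
combined = [secure_sum([a, b]) for a, b in zip(party_a, party_b)]
print(combined, gini(combined))
```

Each individual share is uniformly random, so no single party learns another party's local counts from the shares alone; only the combined counts used to evaluate the Gini index become visible.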
