A methodology for semi-automatic classification schema building
2009, arXiv preprint arXiv: …
Abstract
This paper describes a methodology for the semi-automatic definition of a classification schema, that is, a taxonomy of categories used for automatic document classification. The methodology combines two approaches: (i) an extensional approach, which creates a typology starting from a document base, and (ii) an intensional approach, which builds the classification schema starting from that typology. The extensional approach uses clustering techniques to group documents on the basis of a similarity measure, whereas the intensional approach uses a set of operations (aggregation, reduction, generalization, specialization) to define classes.
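As a rough illustration of the two steps, the sketch below groups a toy document base by similarity (the extensional step) and then merges two of the resulting types into a broader class (one possible reading of the intensional "aggregation" operation). The abstract does not specify the similarity measure, the clustering algorithm, or the exact semantics of the operations, so everything here is an assumption made for illustration: the bag-of-words cosine similarity, the single-link threshold grouping, the threshold value, and the helper names `vectorize`, `cosine`, `cluster`, and `aggregate`.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (lowercased, whitespace-split). Assumed representation."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold):
    """Extensional step (assumed form): single-link grouping.

    Two documents fall into the same type when they are connected by a
    chain of pairs whose similarity is at least `threshold`.
    """
    vectors = [vectorize(d) for d in docs]
    parent = list(range(len(docs)))  # union-find forest over document indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def aggregate(typology, a, b):
    """Intensional step (assumed reading of 'aggregation'): merge types a and b into one class."""
    merged = sorted(typology[a] + typology[b])
    rest = [g for k, g in enumerate(typology) if k not in (a, b)]
    return sorted(rest + [merged])


# Hypothetical document base, invented for this example.
docs = [
    "invoice payment due amount",
    "payment invoice overdue amount",
    "football match score goal",
    "goal score football league",
    "tennis match score set",
]

typology = cluster(docs, threshold=0.6)  # types found from the documents
schema = aggregate(typology, 1, 2)       # analyst merges the two sports types
print(typology)
print(schema)
```

The split mirrors the "semi-automatic" character of the methodology: the grouping is computed from the documents, while the choice of which types to merge into a class is left to a human decision (here, the indices passed to `aggregate`).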