Experiences in developing a spatio-temporal information system

2003, IN OFFICIAL STATISTICS

Abstract

The Italian National Statistics Institute is currently integrating its various legacy spatio-temporal data collections. The SIT-IN project has delivered a first release, whose development relied on web and relational technologies to manage data heterogeneity. The final system provides users with many classes of functions for analysing and visualising territorial data. It can be viewed as a spatio-temporal data warehouse, where space and time are the main access dimensions to statistical data, but also where the ...
