
Knowledge and data fusion in probabilistic networks

2003

Abstract

Probability theory provides the theoretical basis for a logically coherent process of combining prior knowledge with empirical data to draw plausible inferences and to refine theories as observations accrue. Increases in the expressive power of languages for expressing probabilistic theories have been accompanied by refinements and adaptations of Bayesian learning methods to handle the more expressive constructs. These innovations have established Bayesian learning as a unifying theoretical framework for learning in intelligent systems, and have given rise to practical techniques that are receiving wide application. This paper describes theory and methods for exact and approximate learning of probabilistic theories from a combination of background knowledge and observations. The concepts and methods can be adapted to any knowledge representation framework that can express probability distributions over interpretations of a first-order logic. We focus specifically on methods to learn theories that can be expressed in the Multi-Entity Bayesian Network (MEBN) probabilistic logic. MEBN logic is sufficiently general to represent a probability distribution over interpretations of any set of statements that can be expressed in first-order predicate calculus. Bayesian inference provides both a proof theory for combining prior knowledge with observational evidence to derive plausible conclusions and a learning theory for refining a representation as observational evidence accrues. A formal specification is provided for the MEBN logic. A semantics based on random variables provides a logically coherent foundation for open-world reasoning. The paper describes modifications of standard Bayesian learning methods to handle the repeated structures that occur in MEBN theories. Methods are given for specifying domain knowledge as MEBN fragments with structure and parameter prior distributions.
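The abstract's central claim, that prior knowledge and empirical data combine coherently under Bayes' rule, can be illustrated with the simplest conjugate case. The following is a minimal sketch of our own, not code from the paper: an expert's belief about a binary event is encoded as Beta pseudocounts, and observations refine the posterior exactly as the learning process described above.

```python
# Illustrative sketch (not the paper's method): Bayesian fusion of
# expert knowledge and data via a Beta-Binomial conjugate update.
# Expert belief is encoded as pseudocounts; data shift the posterior.

def beta_binomial_update(alpha, beta, successes, failures):
    """Return the posterior (alpha, beta) after observing counts."""
    return alpha + successes, beta + failures

def posterior_mean(alpha, beta):
    """Posterior probability of the event under Beta(alpha, beta)."""
    return alpha / (alpha + beta)

# Expert prior: event occurs ~70% of the time, worth 10 observations.
alpha, beta = 7.0, 3.0
# Empirical data: only 2 occurrences in 10 trials pull the estimate down.
alpha, beta = beta_binomial_update(alpha, beta, successes=2, failures=8)
print(posterior_mean(alpha, beta))  # 9/20 = 0.45
```

The posterior mean (0.45) sits between the expert's prior (0.7) and the empirical frequency (0.2), weighted by the strength of each source.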

FAQs


How does knowledge-data fusion enhance learning in probabilistic networks?

The paper reveals that integrating expert knowledge with empirical data significantly improves the performance of intelligent agents by allowing them to adapt and refine their models effectively under diverse conditions.

What is the role of MEBN logic in probabilistic reasoning?

MEBN logic extends Bayesian networks by allowing the representation of complex first-order theories, which enhances expressive power and modularity in learning probabilistic models.

How do Bayesian networks learn structure and parameters from data?

The study discusses general methods that learn both the structural elements and parameters of Bayesian networks from observations, allowing for a coherent integration of expert guidance.

What methods facilitate knowledge integration in MEBN learning?

The paper describes techniques such as using prior distributions over structures and parameters based on expert input to enhance the fusion of knowledge and data.
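To illustrate how expert-supplied prior distributions enter learning, here is a hedged sketch (our own construction; the function names are hypothetical) of the Dirichlet-multinomial marginal likelihood that underlies Bayesian structure scores of the kind the paper builds on. Expert input appears as pseudocounts in the Dirichlet prior.

```python
# Hedged sketch (ours, not the paper's): log marginal likelihood of
# observed counts for one multinomial node under a Dirichlet prior
# whose pseudocounts encode expert knowledge.
from math import lgamma

def log_marginal_likelihood(pseudocounts, counts):
    """Log P(data | structure) for a single multinomial node."""
    a = sum(pseudocounts)
    n = sum(counts)
    score = lgamma(a) - lgamma(a + n)
    for ak, nk in zip(pseudocounts, counts):
        score += lgamma(ak + nk) - lgamma(ak)
    return score

# A weak uniform prior fits skewed data (8 vs. 2) better than a
# strong symmetric expert prior, which regularizes the score.
weak = log_marginal_likelihood([1, 1], [8, 2])
strong = log_marginal_likelihood([10, 10], [8, 2])
```

Summing such node scores over a candidate network gives a structure score; stronger expert pseudocounts pull the learned model toward the expert's assessments.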

What challenges arise from incomplete data in MEBN logic?

MEBN logic tackles the bias introduced by non-ignorable missing data through observability modeling and expert assessments to ensure accurate parameter estimates despite data limitations.
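As a toy illustration of why non-ignorable missingness biases estimates, and how an expert-assessed observability model can correct it (again our own construction, not the paper's algorithm): if positive and negative cases are recorded with different probabilities, the naive rate over observed cases is biased, and inverse-observability weighting removes the bias.

```python
# Toy sketch (ours): correcting bias from non-ignorable missingness.
# q_pos and q_neg are expert-assessed probabilities that a positive or
# negative case is actually observed.

def corrected_rate(n_pos_obs, n_neg_obs, q_pos, q_neg):
    """Estimate the true positive rate via inverse-observability weights."""
    est_pos = n_pos_obs / q_pos   # recover expected true positive count
    est_neg = n_neg_obs / q_neg   # recover expected true negative count
    return est_pos / (est_pos + est_neg)

# Positives are seen only half the time: the naive rate 20/100 = 0.2
# understates the corrected rate 40/120 = 0.333...
naive = 20 / (20 + 80)
fixed = corrected_rate(20, 80, q_pos=0.5, q_neg=1.0)
```

Ignoring the observability model here would understate the positive rate by a factor of nearly two, which is the bias the FAQ answer refers to.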
