
Coordination of parallel learning processes in animals and robots

https://doi.org/10.3389/FNBEH.2012.00079

Abstract

This HDR manuscript presents research at the intersection of Computational Neuroscience and Cognitive Robotics, focusing on understanding how animals and robots adapt behaviorally in dynamic environments. The study aims to uncover the behavioral and neural correlates of learning processes while inspiring the design of autonomous learning robots. The research examines the combination of model-based and model-free reinforcement learning to explain behavior during conditioning and navigation. It proposes computational solutions for coordinating parallel learning processes and highlights the benefits of cross-disciplinary collaboration in refining neuro-inspired robotic models and confirming their efficacy in real-world applications.
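The coordination of parallel learning processes described above can be illustrated with a toy sketch (not the manuscript's actual models): a model-free Q-learner and a model-based planner compute action values in parallel on a two-step chain task, and a simple reliability signal, here a running average of the model-free reward-prediction error, loosely in the spirit of the uncertainty-based competition of Daw et al. (2005), arbitrates between them. All class names and the task itself are hypothetical.

```python
import random

class ModelFreeQ:
    """Habit-like learner: tabular Q-learning, cheap but slow to adapt."""
    def __init__(self, n_states, n_actions, alpha=0.2, gamma=0.9):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma = alpha, gamma

    def update(self, s, a, r, s_next, done):
        target = r + (0.0 if done else self.gamma * max(self.q[s_next]))
        delta = target - self.q[s][a]  # reward-prediction error
        self.q[s][a] += self.alpha * delta
        return abs(delta)

class ModelBasedPlanner:
    """Goal-directed learner: stores a deterministic world model and
    recomputes action values by value iteration on each query."""
    def __init__(self, n_states, n_actions, gamma=0.9):
        self.n_states, self.n_actions, self.gamma = n_states, n_actions, gamma
        self.model = {}  # (s, a) -> (s_next, reward, done)

    def update(self, s, a, r, s_next, done):
        self.model[(s, a)] = (s_next, r, done)

    def q_values(self, n_sweeps=20):
        q = [[0.0] * self.n_actions for _ in range(self.n_states)]
        for _ in range(n_sweeps):
            for (s, a), (s_next, r, done) in self.model.items():
                q[s][a] = r + (0.0 if done else self.gamma * max(q[s_next]))
        return q

def mixed_values(mf, mb_q, s, w):
    """Arbitration: blend the two controllers' action values;
    w is the weight given to the model-free system."""
    return [w * qf + (1.0 - w) * qb for qf, qb in zip(mf.q[s], mb_q[s])]

# Toy chain task: state 0 -> state 1 -> reward on reaching state 2.
TRANS = {(0, 0): (0, 0.0, False), (0, 1): (1, 0.0, False),
         (1, 0): (0, 0.0, False), (1, 1): (2, 1.0, True)}

random.seed(0)
mf = ModelFreeQ(3, 2)
mb = ModelBasedPlanner(3, 2)
avg_delta, w = 0.0, 0.5
for _ in range(300):
    s = 0
    for _ in range(10):
        vals = mixed_values(mf, mb.q_values(), s, w)
        a = (random.randrange(2) if random.random() < 0.15
             else max(range(2), key=vals.__getitem__))
        s_next, r, done = TRANS[(s, a)]
        delta = mf.update(s, a, r, s_next, done)
        mb.update(s, a, r, s_next, done)
        # Trust the habitual system more as its prediction errors shrink.
        avg_delta = 0.9 * avg_delta + 0.1 * delta
        w = 1.0 / (1.0 + avg_delta)
        if done:
            break
        s = s_next
```

The planner dominates early, when the model-free values are still uninformative and prediction errors are large; as the habit consolidates, the arbitration weight shifts back toward the cheaper model-free controller.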

References (647)

  1. Adams, S., Kesner, R. P., and Ragozzino, M. E. (2001). Role of the medial and lateral caudate-putamen in mediating an auditory conditional response association. Neurobiol. Learn. Mem. 76, 106-116.
  2. Albertin, S. V., Mulder, A. B., Tabuchi, E., Zugaro, M. B., and Wiener, S. I. (2000). Lesions of the medial shell of the nucleus accumbens impair rats in finding larger rewards, but spare reward-seeking behavior. Behav. Brain Res. 117, 173-183.
  3. Alexander, G. E., Crutcher, M. D., and DeLong, M. R. (1990). Basal ganglia-thalamocortical circuits: parallel substrates for motor, oculomotor, "prefrontal" and "limbic" functions. Prog. Brain Res. 85, 119-146.
     Arleo, A., and Gerstner, W. (2000). Spatial cognition and neuromimetic navigation: a model of hippocampal place cell activity. Biol. Cybern. 83, 287-299.
  4. Arleo, A., and Rondi-Reig, L. (2007). Multimodal sensory integration and concurrent navigation strategies for spatial cognition in real and artificial organisms. J. Int. Neurosci. 6, 327-366.
  5. Atallah, H. E., Lopez-Paniagua, D., Rudy, J. W., and O'Reilly, R. C. (2007). Separate neural substrates for skill learning and performance in the ventral and dorsal striatum. Nat. Neurosci. 10, 126-131.
  6. Balleine, B. (2005). Neural bases of food-seeking: affect, arousal and reward in corticostriatolimbic circuits. Physiol. Behav. 86, 717-730.
  7. Balleine, B., and Dickinson, A. (1998). Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology 37, 407-419.
  8. Balleine, B., and Killcross, S. (1994). Effects of ibotenic acid lesions of the nucleus accumbens on instrumental action. Behav. Brain Res. 65, 181-193.
  9. Banquet, J. P., Gaussier, P., Quoy, M., Revel, A., and Burnod, Y. (2005). A hierarchy of associations in hippocampo-cortical systems: cognitive maps and navigation strategies. Neural Comput. 17, 1339-1384.
  10. Barnes, T. D., Kubota, Y., Hu, D., Jin, D. Z., and Graybiel, A. M. (2005). Activity of striatal neurons reflects dynamic encoding and recoding of procedural memories. Nature 437, 1158-1161.
  11. Bauter, M. R., Brockel, B. J., Pankevich, D. E., Virgolini, M. B., and Cory-Slechta, D. A. (2003). Glutamate and dopamine in nucleus accumbens core and shell: sequence learning versus performance. Neurotoxicology 24, 227-243.
  12. Bayer, H. M., and Glimcher, P. W. (2005). Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47, 129-141.
  13. Berke, J. D., Breck, J. T., and Eichenbaum, H. (2009). Striatal versus hippocampal representations during win-stay maze performance. J. Neurophysiol. 101, 1575-1587.
  14. Bolam, J. P., Bergman, H., Graybiel, A. M., Kimura, M., Plenz, D., Seung, H. S., et al. (2006). "Microcircuits in the striatum," in Microcircuits: The Interface Between Neurons and Global Brain Function, eds S. Grillner and A. M. Graybiel (Cambridge, MA: MIT Press), 165-190.
  15. Bornstein, A. M., and Daw, N. D. (2011). Multiplicity of control in the basal ganglia: computational roles of striatal subregions. Curr. Opin. Neurobiol. 21, 374-380.
  16. Bornstein, A. M., and Daw, N. D. (2012). Dissociating hippocampal and striatal contributions to sequential prediction learning. Eur. J. Neurosci. 35, 1011-1023.
  17. Botreau, F., and Gisquet-Verrier, P. (2010). Re-thinking the role of the dorsal striatum in egocentric/response strategy. Front. Behav. Neurosci. 4:7. doi: 10.3389/neuro.08.007.2010
  18. Brown, L., and Sharp, F. (1995). Metabolic mapping of rat striatum: somatotopic organization of sensorimotor activity. Brain Res. 686, 207-222.
  19. Burgess, N., Recce, M., and O'Keefe, J. (1994). A model of hippocampal function. Neural Netw. 7, 1065-1081.
  20. Chang, Q., and Gold, P. E. (2003). Switching memory systems during learning: changes in patterns of brain acetylcholine release in the hippocampus and striatum in rats. J. Neurosci. 23, 3001-3005.
  21. Chang, Q., and Gold, P. E. (2004). Inactivation of dorsolateral striatum impairs acquisition of response learning in cue-deficient, but not cue-available, conditions. Behav. Neurosci. 118, 383-388.
  22. Cohen, J. Y., Haesler, S., Vong, L., Lowell, B. B., and Uchida, N. (2012). Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482, 85-88.
  23. Corbit, L. H., and Balleine, B. W. (2011). The general and outcome-specific forms of pavlovian-instrumental transfer are differentially mediated by the nucleus accumbens core and shell. J. Neurosci. 31, 11786-11794.
  24. Corbit, L. H., Muir, J. L., and Balleine, B. W. (2001). The role of the nucleus accumbens in instrumental conditioning: evidence of a functional dissociation between accumbens core and shell. J. Neurosci. 21, 3251-3260.
  25. Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P., and Dolan, R. J. (2011). Model-based influences on humans' choices and striatal prediction errors. Neuron 69, 1204-1215.
  26. Daw, N. D., Niv, Y., and Dayan, P. (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat. Neurosci. 8, 1704-1711.
  27. Dayan, P., and Balleine, B. (2002). Reward, motivation, and reinforcement learning. Neuron 36, 285-298.
  28. Dayan, P., and Niv, Y. (2008). Reinforcement learning: the good, the bad and the ugly. Curr. Opin. Neurobiol. 18, 185-196.
  29. De Leonibus, E., Costantini, V. J. A., Massaro, A., Mandolesi, G., Vanni, V., Luvisetto, S., et al. (2011). Cognitive and neural determinants of response strategy in the dual-solution plus-maze task. Learn. Mem. 18, 241-244.
  30. De Leonibus, E., Oliverio, A., and Mele, A. (2005). A study on the role of the dorsal striatum and the nucleus accumbens in allocentric and egocentric spatial memory consolidation. Learn. Mem. 12, 491-503.
  31. Devan, B. D., and White, N. M. (1999). Parallel information processing in the dorsal striatum: relation to hippocampal function. J. Neurosci. 19, 2789-2798.
  32. Dickinson, A. (1980). Contemporary Animal Learning Theory. Cambridge, UK: Cambridge University Press.
  33. Dickinson, A. (1985). Actions and habits: the development of behavioural autonomy. Philos. Trans. R. Soc. B Biol. Sci. 308, 67-78.
  34. Dollé, L., Sheynikhovich, D., Girard, B., Chavarriaga, R., and Guillot, A. (2010). Path planning versus cue responding: a bio-inspired model of switching between navigation strategies. Biol. Cybern. 103, 299-317.
  35. Euston, D. R., Tatsuno, M., and McNaughton, B. L. (2007). Fast-forward playback of recent memory sequences in prefrontal cortex during sleep. Science 318, 1147-1150.
  37. Faure, A., Haberland, U., Condé, F., and Massioui, N. E. (2005). Lesion to the nigrostriatal dopamine system disrupts stimulus-response habit formation. J. Neurosci. 25, 2771-2780.
  38. Foster, D., Morris, R., and Dayan, P. (2000). Models of hippocampally dependent navigation using the temporal difference learning rule. Hippocampus 10, 1-16.
  39. Foster, D. J., and Wilson, M. A. (2006). Reverse replay of behavioural sequences in hippocampal place cells during the awake state. Nature 440, 680-683.
  40. Franz, M. O., and Mallot, H. A. (2000). Biomimetic robot navigation. Rob. Auton. Syst. 30, 133-153.
  41. Gallistel, C. R. (1990). The Organization of Learning. Cambridge, MA: MIT Press.
  42. Gaussier, P., Revel, A., Banquet, J. P., and Babeau, V. (2002). From view cells and place cells to cognitive map learning: processing stages of the hippocampal system. Biol. Cybern. 86, 15-28.
  43. Girard, B., Tabareau, N., Pham, Q. C., Berthoz, A., and Slotine, J. J. (2008). Where neuroscience and dynamic system theory meet autonomous robotics: a contracting basal ganglia model for action selection. Neural Netw. 21, 628-641.
  44. Gläscher, J., Daw, N., Dayan, P., and O'Doherty, J. P. (2010). States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 66, 585-595.
  45. Gorny, J. H., Gorny, B., Wallace, D. G., and Whishaw, I. Q. (2002). Fimbria-fornix lesions disrupt the dead reckoning (homing) component of exploratory behavior in mice. Learn. Mem. 9, 387-394.
  46. Graybiel, A. M. (1998). The basal ganglia and chunking of action repertoires. Neurobiol. Learn. Mem. 70, 119-136.
  47. Groenewegen, H. J., Wright, C. I., and Beijer, A. V. (1996). The nucleus accumbens: gateway for limbic structures to reach the motor system? Prog. Brain Res. 107, 485-511.
  48. Gruber, A. J., Hussain, R. J., and O'Donnell, P. (2009). The nucleus accumbens: a switchboard for goal-directed behaviors. PLoS ONE 4:e5062. doi: 10.1371/journal.pone.0005062
  49. Gupta, A. S., van der Meer, M. A., Touretzky, D. S., and Redish, A. D. (2010). Hippocampal replay is not a simple function of experience. Neuron 65, 695-705.
  50. Haber, S. N. (2003). The primate basal ganglia: parallel and integrative networks. J. Chem. Neuroanat. 26, 317-330.
  51. Haber, S. N., Fudge, J. L., and McFarland, N. R. (2000). Striatonigrostriatal pathways in primates form an ascending spiral from the shell to the dorsolateral striatum. J. Neurosci. 20, 2369-2382.
  52. Hannesson, D. K., and Skelton, R. W. (1998). Recovery of spatial performance in the Morris water maze following bilateral transection of the fimbria/fornix in rats. Behav. Brain Res. 90, 35-56.
  53. Hartley, T., and Burgess, N. (2005). Complementary memory systems: competition, cooperation and compensation. Trends Neurosci. 28, 169-170.
  54. Hasselmo, M. (2005). A model of prefrontal cortical mechanisms for goal-directed behavior. J. Cogn. Neurosci. 17, 1115-1129.
  55. Heimer, L., Alheid, G. F., de Olmos, J. S., Groenewegen, H. J., Haber, S. N., Harlan, R. E., et al. (1997). The accumbens: beyond the core-shell dichotomy. J. Neuropsychiatry Clin. Neurosci. 9, 354-381.
  56. Hok, V., Save, E., Lenck-Santini, P. P., and Poucet, B. (2005). Coding for spatial goals in the prelimbic/infralimbic area of the rat frontal cortex. PNAS 102, 4602-4607.
  57. Honzik, C. H. (1936). The sensory basis of maze learning in rats. Comp. Psychol. Monogr. 13, 113.
  58. Houk, J. C., and Wise, S. P. (1995). Distributed modular architectures linking basal ganglia, cerebellum, and cerebral cortex: their role in planning and controlling action. Cereb. Cortex 5, 95-110.
  59. Humphries, M. D., Khamassi, M., and Gurney, K. (2012). Dopaminergic control of the exploration-exploitation trade-off via the basal ganglia. Front. Neurosci. 6:9. doi: 10.3389/fnins.2012.00009
  60. Humphries, M. D., and Prescott, T. J. (2010). The ventral basal ganglia, a selection mechanism at the crossroads of space, strategy, and reward. Prog. Neurobiol. 90, 385-417.
  61. Humphries, M. D., Stewart, R. D., and Gurney, K. N. (2006). A physiologically plausible model of action selection and oscillatory activity in the basal ganglia. J. Neurosci. 26, 12921-12942.
  62. Ikemoto, S. (2002). Ventral striatal anatomy of locomotor activity induced by cocaine, D-amphetamine, dopamine and D1/D2 agonists. Neuroscience 113, 939-955.
  63. Ito, M., and Doya, K. (2011). Multiple representations and algorithms for reinforcement learning in the cortico-basal ganglia circuit. Curr. Opin. Neurobiol. 21, 368-373.
  64. Jacobson, T. K., Gruenbaum, B. F., and Markus, E. J. (2012). Extensive training and hippocampus or striatum lesions: effect on place and response strategies. Physiol. Behav. 105, 645-652.
  65. Jadhav, S. P., Kemere, C., German, P. W., and Frank, L. M. (2012). Awake hippocampal sharp-wave ripples support spatial memory. Science 336, 1454-1458.
  66. Joel, D., Niv, Y., and Ruppin, E. (2002). Actor-critic models of the basal ganglia: new anatomical and computational perspectives. Neural Netw. 15, 535-547.
  67. Joel, D., and Weiner, I. (1994). The organization of the basal ganglia-thalamocortical circuits: open interconnected rather than closed segregated. Neuroscience 63, 363-379.
  68. Joel, D., and Weiner, I. (2000). The connections of the dopaminergic system with the striatum in rats and primates: an analysis with respect to the functional and compartmental organization of the striatum. Neuroscience 96, 451-474.
  69. Jog, M. S., Kubota, Y., Connolly, C. I., Hillegaart, V., and Graybiel, A. M. (1999). Building neural representations of habits. Science 286, 1745-1749.
  70. Johnson, A., and Redish, A. D. (2005). Hippocampal replay contributes to within session learning in a temporal difference reinforcement learning model. Neural Netw. 18, 1163-1171.
  71. Johnson, A., and Redish, A. D. (2007). Neural ensembles in CA3 transiently encode paths forward of the animal at a decision point. J. Neurosci. 27, 12176-12189.
  72. Kelley, A. E. (1999). Neural integrative activities of nucleus accumbens subregions in relation to learning and motivation. Psychobiology 27, 198-213.
  73. Khamassi, M. (2007). Complementary Roles of the Rat Prefrontal Cortex and Striatum in Reward-based Learning and Shifting Navigation Strategies. PhD thesis, Université Pierre et Marie Curie.
  74. Khamassi, M., Lacheze, L., Girard, B., Berthoz, A., and Guillot, A. (2005). Actor-critic models of reinforcement learning in the basal ganglia: from natural to artificial rats. Adapt. Behav. 13, 131-148.
  75. Khamassi, M., Mulder, A. B., Tabuchi, E., Douchamps, V., and Wiener, S. I. (2008). Anticipatory reward signals in ventral striatal neurons of behaving rats. Eur. J. Neurosci. 28, 1849-1866.
  76. Kim, S. M., and Frank, L. M. (2009). Hippocampal lesions impair rapid learning of a continuous spatial alternation task. PLoS ONE 4:e5494. doi: 10.1371/journal.pone.0005494
  77. Kimchi, E. Y., and Laubach, M. (2009). Dynamic encoding of action selection by the medial striatum. J. Neurosci. 29, 3148-3159.
  78. Kimchi, E. Y., Torregrossa, M. M., Taylor, J. R., and Laubach, M. (2009). Neuronal correlates of instrumental learning in the dorsal striatum. J. Neurophysiol. 102, 475-489.
  79. Krech, D. (1932). The genesis of "hypotheses" in rats. Publ. Psychol. 6, 45-64.
  80. Lansink, C. S., Goltstein, P. M., Lankelma, J. V., McNaughton, B. L., and Pennartz, C. M. A. (2009). Hippocampus leads ventral striatum in replay of place-reward information. PLoS Biol. 7:e1000173. doi: 10.1371/journal.pbio.1000173
  81. Leblois, A., Boraud, T., Meissner, W., Bergman, H., and Hansel, D. (2006). Competition between feedback loops underlies normal and pathological dynamics in the basal ganglia. J. Neurosci. 26, 3567-3583.
  82. Lex, B., Sommer, S., and Hauber, W. (2011). The role of dopamine in the dorsomedial striatum in place and response learning. Neuroscience 172, 212-218.
  83. Martel, G., Blanchard, J., Mons, N., Gastambide, F., Micheau, J., and Guillou, J. (2007). Dynamic interplays between memory systems depend on practice: the hippocampus is not always the first to provide solution. Neuroscience 150, 743-753.
  84. Martinet, L.-E., Sheynikhovich, D., Benchenane, K., and Arleo, A. (2011). Spatial learning and action planning in a prefrontal cortical network model. PLoS Comput. Biol. 7:e1002045. doi: 10.1371/journal.pcbi.1002045
  85. Maurin, Y., Banrezes, B., Menetrey, A., Mailly, P., and Deniau, J. M. (1999). Three-dimensional distribution of nigrostriatal neurons in the rat: relation to the topography of striatonigral projections. Neuroscience 91, 891-909.
  86. McDannald, M. A., Lucantonio, F., Burke, K. A., Niv, Y., and Schoenbaum, G. (2011). Ventral striatum and orbitofrontal cortex are both required for model-based, but not model-free, reinforcement learning. J. Neurosci. 31, 2700-2705.
  87. Middleton, F. A., and Strick, P. L. (2000). Basal ganglia and cerebellar loops: motor and cognitive circuits. Brain Res. Brain Res. Rev. 31, 236-250.
  88. Mink, J. W. (1996). The basal ganglia: focused selection and inhibition of competing motor programs. Prog. Neurobiol. 50, 381-425.
  89. Mogenson, G. J., Jones, D. L., and Yim, C. Y. (1980). From motivation to action: functional interface between the limbic system and the motor system. Prog. Neurobiol. 14, 69-97.
  90. Morris, R. G. M. (1981). Spatial localization does not require the presence of local cues. Learn. Motiv. 12, 239-260.
  91. Moussa, R., Poucet, B., Amalric, M., and Sargolini, F. (2011). Contributions of dorsal striatal subregions to spatial alternation behavior. Learn. Mem. 18, 444-451.
  92. Mulder, A. B., Tabuchi, E., and Wiener, S. I. (2004). Neurons in hippocampal afferent zones of rat striatum parse routes into multi-pace segments during maze navigation. Eur. J. Neurosci. 19, 1923-1932.
  93. Nicola, S. M. (2007). The nucleus accumbens as part of a basal ganglia action selection circuit. Psychopharmacology (Berl.) 191, 521-550.
  94. O'Doherty, J., Dayan, P., Schultz, J., Deichmann, R., Friston, K., and Dolan, R. J. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304, 452-454.
  95. O'Keefe, J., and Nadel, L. (1978). The Hippocampus as a Cognitive Map. Oxford, UK: Oxford University Press.
     Packard, M. (1999). Glutamate infused posttraining into the hippocampus or caudate-putamen differentially strengthens place and response learning. PNAS 96, 12881-12886.
  96. Packard, M., and McGaugh, J. (1992). Double dissociation of fornix and caudate nucleus lesions on acquisition of two water maze tasks: further evidence for multiple memory systems. Behav. Neurosci. 106, 439-446.
  97. Packard, M., and McGaugh, J. (1996). Inactivation of hippocampus or caudate nucleus with lidocaine differentially affects the expression of place and response learning. Neurobiol. Learn. Mem. 65, 65-72.
  98. Packard, M. G., Hirsh, R., and White, N. M. (1989). Differential effects of fornix and caudate nucleus lesions on two radial maze tasks: evidence for multiple memory systems. J. Neurosci. 9, 1465-1472.
  99. Packard, M. G., and Knowlton, B. J. (2002). Learning and memory functions of the basal ganglia. Annu. Rev. Neurosci. 25, 563-593.
  100. Pearce, J. M., Roberts, A. D., and Good, M. (1998). Hippocampal lesions disrupt navigation based on cognitive maps but not heading vectors. Nature 396, 75-77.
  101. Pennartz, C. M., Groenewegen, H. J., and da Silva, F. H. L. (1994). The nucleus accumbens as a complex of functionally distinct neuronal ensembles: an integration of behavioural, electrophysiological and anatomical data. Prog. Neurobiol. 42, 719-761.
  102. Penner, M. R., and Mizumori, S. J. Y. (2012). Neural systems analysis of decision making during goal-directed navigation. Prog. Neurobiol. 96, 96-135.
  103. Peoples, L. L., Gee, F., Bibi, R., and West, M. O. (1998). Phasic firing time locked to cocaine self-infusion and locomotion: dissociable firing patterns of single nucleus accumbens neurons in the rat. J. Neurosci. 18, 7588-7598.
  104. Ploeger, G. E., Spruijt, B. M., and Cools, A. R. (1994). Spatial localization in the Morris water maze in rats: acquisition is affected by intra-accumbens injections of the dopaminergic antagonist haloperidol. Behav. Neurosci. 108, 927-934.
     Potegal, M. (1972). The caudate nucleus egocentric localization system. Acta Neurobiol. Exp. 32, 479-494.
  105. Poucet, B., Lenck-Santini, P. P., Hok, V., Save, E., Banquet, J. P., Gaussier, P., et al. (2004). Spatial navigation and hippocampal place cell firing: the problem of goal encoding. Rev. Neurosci. 15, 89-107.
  106. Pych, J. C., Chang, Q., Colon-Rivera, C., and Gold, P. E. (2005). Acetylcholine release in hippocampus and striatum during testing on a rewarded spontaneous alternation task. Neurobiol. Learn. Mem. 84, 93-101.
  107. Ragozzino, M. E., and Choi, D. (2004). Dynamic changes in acetylcholine output in the medial striatum during place reversal learning. Learn. Mem. 11, 70-77.
  108. Redgrave, P., Prescott, T. J., and Gurney, K. (1999). The basal ganglia: a vertebrate solution to the selection problem? Neuroscience 89, 1009-1023.
  109. Redish, A. D. (1999). Beyond the Cognitive Map: From Place Cells to Episodic Memory. Cambridge, MA: MIT Press.
  110. Redish, A. D., and Touretzky, D. S. (1997). Cognitive maps beyond the hippocampus. Hippocampus 7, 15-35.
  111. Redish, A. D., and Touretzky, D. S. (1998). The role of the hippocampus in solving the Morris water maze. Neural Comput. 10, 73-111.
  112. Restle, F. (1957). Discrimination of cues in mazes: a resolution of the "place-vs.-response" question. Psychol. Rev. 64, 217-228.
  113. Reynolds, J. N., Hyland, B. I., and Wickens, J. R. (2001). A cellular mechanism of reward-related learning. Nature 413, 67-70.
  114. Reynolds, S. M., and Berridge, K. C. (2003). Glutamate motivational ensembles in nucleus accumbens: rostrocaudal shell gradients of fear and feeding. Eur. J. Neurosci. 17, 2187-2200.
  115. Rudy, J. W. (2009). Context representations, context functions, and the parahippocampal-hippocampal system. Learn. Mem. 16, 573-585.
  116. Sargolini, F., Florian, C., Oliverio, A., Mele, A., and Roullet, P. (2003). Differential involvement of NMDA and AMPA receptors within the nucleus accumbens in consolidation of information necessary for place navigation and guidance strategy of mice. Learn. Mem. 10, 285-292.
  117. Schmitzer-Torbert, N. C., and Redish, A. D. (2008). Task-dependent encoding of space and events by striatal neurons is dependent on neural subtype. Neuroscience 153, 349-360.
  118. Schultz, W., Dayan, P., and Montague, P. R. (1997). A neural substrate of prediction and reward. Science 275, 1593-1599.
  119. Setlow, B., and McGaugh, J. (1998). Sulpiride infused into the nucleus accumbens posttraining impairs memory of spatial water maze training. Behav. Neurosci. 112, 603-610.
  120. Setlow, B., Schoenbaum, G., and Gallagher, M. (2003). Neural encoding in ventral striatum during olfactory discrimination learning. Neuron 38, 625-636.
  121. Shen, W., Flajolet, M., Greengard, P., and Surmeier, D. J. (2008). Dichotomous dopaminergic control of striatal synaptic plasticity. Science 321, 848-851.
  122. Shibata, R., Mulder, A. B., Trullier, O., and Wiener, S. I. (2001). Position sensitivity in phasically discharging nucleus accumbens neurons of rats alternating between tasks requiring complementary types of spatial cues. Neuroscience 108, 391-411.
     Smith, D. M., and Mizumori, S. J. Y. (2006). Hippocampal place cells, context, and episodic memory. Hippocampus 16, 716-729.
  123. Sutherland, R. J., and Hamilton, D. A. (2004). Rodent spatial navigation: at the crossroads of cognition and movement. Neurosci. Biobehav. Rev. 28, 687-697.
  124. Sutherland, R. J., and Rodriguez, A. J. (1989). The role of the fornix/fimbria and some related subcortical structures in place learning and memory. Behav. Brain Res. 32, 265-277.
  125. Sutton, R. S., and Barto, A. G. (1998). Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.
  126. Taha, S. A., Nicola, S. M., and Fields, H. L. (2007). Cue-evoked encoding of movement planning and execution in the rat nucleus accumbens. J. Physiol. 584, 801-818.
  127. Tang, C., Pawlak, A. P., Prokopenko, V., and West, M. O. (2007). Changes in activity of the striatum during formation of a motor habit. Eur. J. Neurosci. 25, 1212-1227.
  128. Tepper, J. M., Koos, T., and Wilson, C. J. (2004). GABAergic microcircuits in the neostriatum. Trends Neurosci. 27, 662-669.
  129. Thorn, C. A., Atallah, H., Howe, M., and Graybiel, A. M. (2010). Differential dynamics of activity changes in dorsolateral and dorsomedial striatal loops during learning. Neuron 66, 781-795.
  130. Tolman, E. C. (1948). Cognitive maps in rats and men. Psychol. Rev. 55, 189-208.
  131. Trullier, O., Wiener, S., Berthoz, A., and Meyer, J.-A. (1997). Biologically-based artificial navigation systems: review and prospects. Prog. Neurobiol. 51, 483-544.
  132. Uylings, H. B. M., Groenewegen, H. J., and Kolb, B. (2003). Do rats have a prefrontal cortex? Behav. Brain Res. 146, 3-17.
  133. van der Meer, M. A. A., Johnson, A., Schmitzer-Torbert, N. C., and Redish, A. D. (2010). Triple dissociation of information processing in dorsal striatum, ventral striatum, and hippocampus on a learned spatial decision task. Neuron 67, 25-32.
  134. van der Meer, M. A. A., Kurth-Nelson, Z., and Redish, A. D. (2012). Information processing in decision-making systems. Neuroscientist 18, 342-359.
  135. van der Meer, M. A. A., and Redish, A. D. (2009). Covert expectation-of-reward in rat ventral striatum at decision points. Front. Integr. Neurosci. 3:1. doi: 10.3389/neuro.07.001.2009
  136. van der Meer, M. A. A., and Redish, A. D. (2011). Theta phase precession in rat ventral striatum links place and reward information. J. Neurosci. 31, 2843-2854.
  137. van der Meer, M. A. A., and Redish, A. D. (2011). Ventral striatum: a critical look at models of learning and evaluation. Curr. Opin. Neurobiol. 21, 387-392.
  138. Voorn, P., Vanderschuren, L. J., Groenewegen, H. J., Robbins, T. W., and Pennartz, C. M. (2004). Putting a spin on the dorsal-ventral divide of the striatum. Trends Neurosci. 27, 468-474.
  139. Watabe-Uchida, M., Zhu, L., Ogawa, S. K., Vamanrao, A., and Uchida, N. (2012). Whole-brain mapping of direct inputs to midbrain dopamine neurons. Neuron 74, 858-873.
  140. Whishaw, I. Q., Cassel, J. C., and Jarrad, L. E. (1995). Rats with fimbria-fornix lesions display a place response in a swimming pool: a dissociation between getting there and knowing where. J. Neurosci. 15, 5779-5788.
  141. Whishaw, I. Q., Mittleman, G., Bunch, S. T., and Dunnett, S. B. (1987). Impairments in the acquisition, retention and selection of spatial navigation strategies after medial caudate-putamen lesions in rats. Behav. Brain Res. 24, 125-138.
  142. White, N. M., and McDonald, R. J. (2002). Multiple parallel memory systems in the brain of the rat. Neurobiol. Learn. Mem. 77, 125-184.
  143. Wiener, S. I. (1993). Spatial and behavioral correlates of striatal neurons in rats performing a self-initiated navigation task. J. Neurosci. 13, 3802-3817.
  144. Wiener, S. I., Paul, C. A., and Eichenbaum, H. (1989). Spatial and behavioral correlates of hippocampal neuronal activity. J. Neurosci. 9, 2737-2763.
  145. Willingham, D. B. (1998). What differentiates declarative and procedural memories: reply to Cohen, Poldrack, and Eichenbaum (1997). Memory 6, 689-699.
  146. Yin, H. H., and Knowlton, B. J. (2004). Contributions of striatal subregions to place and response learning. Learn. Mem. 11, 459-463.
  147. Yin, H. H., and Knowlton, B. J. (2006). The role of the basal ganglia in habit formation. Nat. Rev. Neurosci. 7, 464-476.
  148. Yin, H. H., Knowlton, B. J., and Balleine, B. W. (2004). Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur. J. Neurosci. 19, 181-189.
  149. Yin, H. H., Knowlton, B. J., and Balleine, B. W. (2005a). Blockade of NMDA receptors in the dorsomedial striatum prevents action-outcome learning in instrumental conditioning. Eur. J. Neurosci. 22, 505-512.
  150. Yin, H. H., Ostlund, S. B., Knowlton, B. J., and Balleine, B. W. (2005b). The role of the dorsomedial striatum in instrumental conditioning. Eur. J. Neurosci. 22, 513-523.
  151. Yin, H. H., Ostlund, S. B., and Balleine, B. W. (2008). Reward-guided learning beyond dopamine in the nucleus accumbens: the integrative functions of cortico-basal ganglia networks. Eur. J. Neurosci. 28, 1437-1448.
  152. References for Lesaint, Sigaud, Flagel, Robinson, and Khamassi (2014), PLoS Computational Biology:
  153. Sutton RS, Barto AG (1998) Reinforcement learning: An introduction. The MIT Press.
  154. Sutton RS, Barto AG (1987) A temporal-difference model of classical conditioning. In: Proceedings of the ninth annual conference of the cognitive science society. Seattle, WA, pp. 355-378.
  155. Barto AG (1995) Adaptive critics and the basal ganglia. In: Houk JC, Davis JL, Beiser DG, editors, Models of information processing in the basal ganglia, The MIT Press. pp. 215-232.
  156. Clark JJ, Hollon NG, Phillips PEM (2012) Pavlovian valuation systems in learning and decision making. Curr Opin Neurobiol 22: 1054-1061.
  157. Simon DA, Daw ND (2012) Dual-system learning models and drugs of abuse. In: Computational Neuroscience of Drug Addiction, Springer. pp. 145-161.
  158. Cardinal RN, Parkinson JA, Hall J, Everitt BJ (2002) Emotion and motivation: the role of the amygdala, ventral striatum, and prefrontal cortex. Neurosci Biobehav Rev 26: 321-352.
  159. Yin HH, Ostlund SB, Knowlton BJ, Balleine BW (2005) The role of the dorsomedial striatum in instrumental conditioning. Eur J Neurosci 22: 513-523.
  160. Solway A, Botvinick MM (2012) Goal-directed decision making as probabilistic inference: a computational framework and potential neural correlates. Psychol Rev 119: 120-154.
  161. Daw ND, Gershman SJ, Seymour B, Dayan P, Dolan RJ (2011) Model-based influences on humans' choices and striatal prediction errors. Neuron 69: 1204-1215.
  162. Graybiel AM (2008) Habits, rituals, and the evaluative brain. Annu Rev Neurosci 31: 359-387.
  163. Yin HH, Knowlton BJ, Balleine BW (2004) Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur J Neurosci 19: 181-189.
  164. Schultz W (1998) Predictive reward signal of dopamine neurons. J Neurophysiol 80: 1-27.
  165. Fiorillo CD, Tobler PN, Schultz W (2003) Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299: 1898-1902.
  166. Flagel SB, Clark JJ, Robinson TE, Mayo L, Czuj A, et al. (2011) A selective role for dopamine in stimulus-reward learning. Nature 469: 53-57.
  167. Danna CL, Elmer GI (2010) Disruption of conditioned reward association by typical and atypical antipsychotics. Pharmacol Biochem Behav 96: 40-47.
  168. Dayan P, Niv Y, Seymour B, Daw ND (2006) The misbehavior of value and the discipline of the will. Neural Netw 19: 1153-1160.
  169. Daw ND, Niv Y, Dayan P (2005) Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci 8: 1704-1711.
  170. Keramati M, Dezfouli A, Piray P (2011) Speed/Accuracy trade-off between the habitual and the goal-directed processes. PLoS Comput Biol 7: e1002055.
  171. Gläscher J, Daw ND, Dayan P, O'Doherty JP (2010) States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 66: 585-595.
  172. Flagel SB, Watson SJ, Robinson TE, Akil H (2007) Individual differences in the propensity to approach signals vs goals promote different adaptations in the dopamine system of rats. Psychopharmacology 191: 599-607.
  173. Flagel SB, Akil H, Robinson TE (2009) Individual differences in the attribution of incentive salience to reward-related cues: Implications for addiction. Neuropharmacology 56: 139-148.
  174. Robinson TE, Flagel SB (2009) Dissociating the predictive and incentive motivational properties of reward-related cues through the study of individual differences. Biol Psychiatry 65: 869-873.
  175. Mahler SV, Berridge KC (2009) Which cue to "want"? Central amygdala opioid activation enhances and focuses incentive salience on a prepotent reward cue. J Neurosci 29: 6500-6513.
  176. DiFeliceantonio AG, Berridge KC (2012) Which cue to 'want'? Opioid stimulation of central amygdala makes goal-trackers show stronger goal-tracking, just as sign-trackers show stronger sign-tracking. Behav Brain Res 230: 399-408.
  177. Saunders BT, Robinson TE (2012) The role of dopamine in the accumbens core in the expression of pavlovian-conditioned responses. Eur J Neurosci 36: 2521-2532.
  178. Meyer PJ, Lovic V, Saunders BT, Yager LM, Flagel SB, et al. (2012) Quantifying individual variation in the propensity to attribute incentive salience to reward cues. PLoS ONE 7: e38987.
  179. Berridge KC (2007) The debate over dopamine's role in reward: the case for incentive salience. Psychopharmacology 191: 391-431.
  180. Lovic V, Saunders BT, Yager LM, Robinson TE (2011) Rats prone to attribute incentive salience to reward cues are also prone to impulsive action. Behav Brain Res 223: 255-261.
  181. Williams BA (1994) Conditioned reinforcement: Experimental and theoretical issues. Behav Anal 17: 261-285.
  182. Skinner BF (1938) The behavior of organisms: An experimental analysis. New York: Appleton-Century-Crofts.
  183. Lomanowska AM, Lovic V, Rankine MJ, Mooney SJ, Robinson TE, et al. (2011) Inadequate early social experience increases the incentive salience of reward-related cues in adulthood. Behav Brain Res 220: 91-99.
  184. Humphries MD, Khamassi M, Gurney K (2012) Dopaminergic control of the exploration-exploitation trade-off via the basal ganglia. Front Neurosci 6: 9.
  185. Khamassi M, Humphries MD (2012) Integrating cortico-limbic-basal ganglia architectures for learning model-based and model-free navigation strategies. Front Behav Neurosci 6: 79.
  186. Huys QJM, Eshel N, O'Nions E, Sheridan L, Dayan P, et al. (2012) Bonsai trees in your head: How the pavlovian system sculpts goal-directed choices by pruning decision trees. PLoS Comput Biol 8: e1002410.
  187. Doya K, Samejima K, Katagiri K, Kawato M (2002) Multiple model-based reinforcement learning. Neural Comput 14: 1347-1369.
  188. Redish AD, Jensen S, Johnson A, Kurth-Nelson Z (2007) Reconciling reinforcement learning models with behavioral extinction and renewal: Implications for addiction, relapse, and problem gambling. Psychol Rev 114: 784-805.
  189. Takahashi YK, Roesch MR, Stalnaker TA, Haney RZ, Calu DJ, et al. (2009) The orbitofrontal cortex and ventral tegmental area are necessary for learning from unexpected outcomes. Neuron 62: 269-280.
  190. McDannald MA, Lucantonio F, Burke KA, Niv Y, Schoenbaum G (2011) Ventral striatum and orbitofrontal cortex are both required for model-based, but not model-free, reinforcement learning. J Neurosci 31: 2700-2705.
  191. McDannald MA, Takahashi YK, Lopatina N, Pietras BW, Jones JL, et al. (2012) Model-based learning and the contribution of the orbitofrontal cortex to the model-free world. Eur J Neurosci 35: 991-996.
  192. Cleland GG, Davey GCL (1983) Autoshaping in the rat: The effects of localizable visual and auditory signals for food. J Exp Anal Behav 40: 47-56.
  193. Meyer PJ, Aldridge JW, Robinson TE (2010) Auditory and visual cues are differentially attributed with incentive salience but similarly affected by amphetamine. In: 2010 Neuroscience Meeting Planner, Society for Neuroscience Annual Meeting (SfN10).
  194. Schmajuk NA, Lam YW, Gray JA (1996) Latent inhibition: A neural network approach. J Exp Psychol Anim Behav Process 22: 321-349.
  195. Balkenius C (1999) Dynamics of a classical conditioning model. Auton Robots 7: 41-56.
  196. Stout SC, Miller RR (2007) Sometimes-competing retrieval (SOCR): A formalization of the comparator hypothesis. Psychol Rev 114: 759-783.
  197. Courville AC, Daw ND, Touretzky DS (2006) Bayesian theories of conditioning in a changing world. Trends Cogn Sci 10: 294-300.
  198. Gershman SJ, Niv Y (2012) Exploring a latent cause theory of classical conditioning. Anim Learn Behav 40: 255-268.
  199. Kamin LJ (1967) Predictability, surprise, attention, and conditioning. In: Campbell BA, Church RMa, editors, Punishment and aversive behavior, New York: Appleton-Century-Crofts. pp. 279-296.
  200. Lattal KM, Nakajima S (1998) Overexpectation in appetitive pavlovian and instrumental conditioning. Anim Learn Behav 26: 351-360.
  201. Bellman R (1957) Dynamic programming. Princeton University Press.
  202. Khamassi M, Martinet LE, Guillot A (2006) Combining self-organizing maps with mixtures of experts: application to an actor-critic model of reinforcement learning in the basal ganglia. In: From Animals to Animats 9, Springer. pp. 394-405.
  203. Elfwing S, Uchibe E, Doya K (2013) Scaled free-energy based reinforcement learning for robust and efficient learning in high-dimensional state spaces. Front Neurorobot 7: 3.
  204. Boutilier C, Dearden R, Goldszmidt M (2000) Stochastic dynamic programming with factored representations. Artif Intell 121: 49-107.
  205. Degris T, Sigaud O, Wuillemin PH (2006) Learning the structure of factored Markov decision processes in reinforcement learning problems. In: Proceedings of the 23rd international conference on Machine learning. ACM, pp. 257-264.
  206. Vigorito CM, Barto AG (2008) Autonomous hierarchical skill acquisition in factored MDPs. In: Yale Workshop on Adaptive and Learning Systems, New Haven, Connecticut. volume 63, p. 109.
  207. Guitart-Masip M, Huys QJM, Fuentemilla L, Dayan P, Duzel E, et al. (2012) Go and no-go learning in reward and punishment: interactions between affect and effect. Neuroimage 62: 154-166.
  208. Huys QJM, Cools R, Gölzer M, Friedel E, Heinz A, et al. (2011) Disentangling the roles of approach, activation and valence in instrumental and pavlovian responding. PLoS Comput Biol 7: e1002028.
  209. Yin HH, Ostlund SB, Balleine BW (2008) Reward-guided learning beyond dopamine in the nucleus accumbens: the integrative functions of cortico-basal ganglia networks. Eur J Neurosci 28: 1437-1448.
  210. Corbit LH, Balleine BW (2005) Double dissociation of basolateral and central amygdala lesions on the general and outcome-specific forms of pavlovian- instrumental transfer. J Neurosci 25: 962-970.
  211. Balsam PD, Payne D (1979) Intertrial interval and unconditioned stimulus durations in autoshaping. Anim Learn Behav 7: 477-482.
  212. Gibbon J, Balsam P (1981) Spreading association in time. In: Locurto CM, Terrace HS, Gibbon J, editors, Autoshaping and conditioning theory. New York: Academic Press. pp. 219-253.
  213. Gallistel CR, Gibbon J (2000) Time, rate, and conditioning. Psychol Rev 107: 289-344.
  214. Tomie A, Festa ED, Sparta DR, Pohorecky LA (2003) Lever conditioned stimulus-directed autoshaping induced by saccharin-ethanol unconditioned stimulus solution: effects of ethanol concentration and trial spacing. Alcohol 30: 35-44.
  215. Morris G, Nevet A, Arkadir D, Vaadia E, Bergman H (2006) Midbrain dopamine neurons encode decisions for future action. Nat Neurosci 9: 1057-1063.
  216. Roesch MR, Calu DJ, Schoenbaum G (2007) Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nat Neurosci 10: 1615-1624.
  217. Bellot J, Sigaud O, Khamassi M (2012) Which temporal difference learning algorithm best reproduces dopamine activity in a multi-choice task? In: From Animals to Animats 12, Springer. pp. 289-298.
  218. Tomie A, Lincks M, Nadarajah SD, Pohorecky LA, Yu L (2012) Pairings of lever and food induce pavlovian conditioned approach of sign-tracking and goal-tracking in C57BL/6 mice. Behav Brain Res 226: 571-578.
  219. Kobayashi S, Schultz W (2008) Influence of reward delays on responses of dopamine neurons. J Neurosci 28: 7837-7846.
  220. Daw ND, Courville AC, Touretzky DS (2006) Representation and timing in theories of the dopamine system. Neural Comput 18: 1637-1677.
  221. Fiorillo CD, Newsome WT, Schultz W (2008) The temporal precision of reward prediction in dopamine neurons. Nat Neurosci 11: 966-973.
  222. Gurney KN, Humphries MD, Wood R, Prescott TJ, Redgrave P (2004) Testing computational hypotheses of brain systems function: a case study with the basal ganglia. Network 15: 263-290.
  223. Robinson MJF, Berridge KC (2013) Instant transformation of learned repulsion into motivational "wanting". Curr Biol 23: 282-289.
  224. Panlilio LV, Thorndike EB, Schindler CW (2007) Blocking of conditioning to a cocaine-paired stimulus: testing the hypothesis that cocaine perpetually produces a signal of larger-than-expected reward. Pharmacol Biochem Behav 86: 774-777.
  225. Redish AD (2004) Addiction as a computational process gone awry. Science 306: 1944-1947.
  226. Daw ND, Niv Y, Dayan P (2006) Actions, policies, values and the basal ganglia. In: Bezard E, editor, Recent Breakthroughs in Basal Ganglia Research, Nova Science Publishers, Inc Hauppauge, NY. pp. 91-106.
  227. Yin HH, Knowlton BJ (2006) The role of the basal ganglia in habit formation. Nat Rev Neurosci 7: 464-476.
  228. Thorn CA, Atallah H, Howe M, Graybiel AM (2010) Differential dynamics of activity changes in dorsolateral and dorsomedial striatal loops during learning. Neuron 66: 781-795.
  229. Bornstein AM, Daw ND (2011) Multiplicity of control in the basal ganglia: computational roles of striatal subregions. Curr Opin Neurobiol 21: 374-380.
  230. van der Meer M, Kurth-Nelson Z, Redish AD (2012) Information processing in decision-making systems. Neuroscientist 18: 342-359.
  231. Flagel SB, Cameron CM, Pickup KN, Watson SJ, Akil H, et al. (2011) A food predictive cue must be attributed with incentive salience for it to induce c-fos mRNA expression in cortico-striatal-thalamic brain regions. Neuroscience 196: 80-96.
  232. Mink JW (1996) The basal ganglia: focused selection and inhibition of competing motor programs. Prog Neurobiol 50: 381-425.
  233. Redgrave P, Prescott TJ, Gurney K (1999) The basal ganglia: a vertebrate solution to the selection problem? Neuroscience 89: 1009-1023.
  234. Gurney K, Prescott TJ, Redgrave P (2001) A computational model of action selection in the basal ganglia. I. A new functional anatomy. Biol Cybern 84: 401-410.
  235. Baird III LC (1993) Advantage updating. Technical report, DTIC Document.
  236. Dayan P, Balleine BW (2002) Reward, motivation, and reinforcement learning. Neuron 36: 285-298.
  237. Jacobs RA, Jordan MI, Nowlan SJ, Hinton GE (1991) Adaptive mixtures of local experts. Neural Comput 3: 79-87.
  238. Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 6: 182-197.
  239. Mouret JB, Doncieux S (2010) SFERESv2: Evolvin' in the Multi-Core World. In: WCCI 2010 IEEE World Congress on Computational Intelligence, Congress on Evolutionary Computation (CEC). pp. 4079-4086.
  240. Alexander WH, Brown JW. 2010. Computational Models of Performance Monitoring and Cognitive Control. Topics in Cognitive Science. 2: 658-677.
  241. Alexander WH, Brown JW. 2011. Medial prefrontal cortex as an action-outcome predictor. Nat Neurosci. 14: 1338-1344.
  242. Amiez C, Joseph JP, Procyk E. 2005. Anterior cingulate error-related activity is modulated by predicted reward. Eur J Neurosci. 21: 3447-3452.
  243. Amiez C, Neveu R, Warrot D, Petrides M, Knoblauch K, Procyk E. 2013. The location of feedback-related activity in the midcingulate cortex is predicted by local morphology. J Neurosci. 33: 2217-2228.
  244. Aston-Jones G, Cohen JD. 2005. An integrative theory of locus coeruleus-norepinephrine function: adaptive gain and optimal performance. Annu Rev Neurosci. 28: 403-450.
  245. Barkley RA. 2001. Linkages between attention and executive functions. In: Reid Lyon G, Krasnegor NA, eds. Attention, Memory and Executive Function. Baltimore: P.H. Brookes. p 307-326.
  246. Barraclough DJ, Conroy ML, Lee D. 2004. Prefrontal cortex and decision making in a mixed-strategy game. Nat Neurosci. 7: 404-410.
  247. Bartumeus F, da Luz MG, Viswanathan GM, Catalan J. 2005. Animal search strategies: a quantitative random-walk analysis. Ecology. 86: 3078-3087.
  248. Behrens TE, Hunt LT, Rushworth MF. 2009. The computation of social behavior. Science. 324: 1160-1164.
  249. Behrens TE, Woolrich MW, Walton ME, Rushworth MF. 2007. Learning the value of information in an uncertain world. Nat Neurosci. 10: 1214-1221.
  250. Botvinick MM, Braver TS, Barch DM, Carter CS, Cohen JD. 2001. Conflict monitoring and cognitive control. Psychol Rev. 108: 624-652.
  251. Brown JW, Braver TS. 2005. Learned predictions of error likelihood in the anterior cingulate cortex. Science. 307: 1118-1121.
  252. Cohen JD, Aston-Jones G, Gilzenrat MS. 2004. A systems-level perspective on attention and cognitive control. In: Posner MI, ed. Cognitive Neuroscience of Attention. New York: Guilford. p 71-90.
  253. Cohen JD, McClure SM, Yu AJ. 2007. Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration. Philos Trans R Soc Lond B Biol Sci. 362: 933-942.
  254. Collins AG, Frank MJ. 2012. How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis. Eur J Neurosci. 35: 1024-1035.
  255. Daw ND. 2011. Trial-by-trial data analysis using computational models. In: Decision Making, Affect, and Learning: Attention and Performance XXIII. New York: Oxford University Press.
  256. Daw ND, O'Doherty JP, Dayan P, Seymour B, Dolan RJ. 2006. Cortical substrates for exploratory decisions in humans. Nature. 441: 876-879.
  257. De Lillo C, Visalberghi E, Aversano M. 1997. The organization of exhaustive searches in a patchy space by capuchin monkeys (Cebus apella). J Comp Psychol. 111.
  258. Dehaene S, Kerszberg M, Changeux JP. 1998. A neuronal model of a global workspace in effortful cognitive tasks. Proc Natl Acad Sci U S A. 95: 14529-14534.
  259. Desrochers TM, Jin DZ, Goodman ND, Graybiel AM. 2010. Optimal habits can develop spontaneously through sensitivity to local cost. Proc Natl Acad Sci U S A. 107: 20512-20517.
  260. Domenech P, Dreher JC. 2010. Decision threshold modulation in the human brain. J Neurosci. 30: 14305-14317.
  261. Dosenbach NU, Visscher KM, Palmer ED, Miezin FM, Wenger KK, Kang HC, Burgund ED, Grimes AL, Schlaggar BL, Petersen SE. 2006. A core system for the implementation of task sets. Neuron. 50: 799-812.
  262. Doya K. 2002. Metalearning and neuromodulation. Neural Netw. 15: 495-506.
  263. Durstewitz D, Seamans JK. 2008. The dual-state theory of prefrontal cortex dopamine function with relevance to catechol-o-methyltransferase genotypes and schizophrenia. Biol Psychiatry. 64: 739-749.
  264. Durstewitz D, Vittoz NM, Floresco SB, Seamans JK. 2010. Abrupt transitions between prefrontal neural ensemble states accompany behavioral transitions during rule learning. Neuron. 66: 438-448.
  265. Enomoto K, Matsumoto N, Nakai S, Satoh T, Sato TK, Ueda Y, Inokawa H, Haruno M, Kimura M. 2011. Dopamine neurons learn to encode the long-term value of multiple future rewards. Proc Natl Acad Sci U S A. 108: 15462-15467.
  266. Fonio E, Benjamini Y, Golani I. 2009. Freedom of movement and the stability of its unfolding in free exploration of mice. Proc Natl Acad Sci U S A. 106: 21335-21340.
  267. Frank MJ, Doll BB, Oas-Terpstra J, Moreno F. 2009. Prefrontal and striatal dopaminergic genes predict individual differences in exploration and exploitation. Nat Neurosci. 12: 1062-1068.
  268. Holroyd CB, Coles MG. 2002. The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity. Psychol Rev. 109: 679-709.
  269. Humphries MD, Khamassi M, Gurney K. 2012. Dopaminergic control of the exploration-exploitation trade-off via the basal ganglia. Front Neurosci. 6: 9.
  270. Ishii S, Yoshida W, Yoshimoto J. 2002. Control of exploitation-exploration meta-parameter in reinforcement learning. Neural Netw. 15: 665-687.
  271. Ito M, Doya K. 2009. Validation of decision-making models and analysis of decision variables in the rat basal ganglia. J Neurosci. 29: 9861-9874.
  272. Kaping D, Vinck M, Hutchison RM, Everling S, Womelsdorf T. 2011. Specific contributions of ventromedial, anterior cingulate, and lateral prefrontal cortex for attentional selection and stimulus valuation. PLoS Biol. 9: e1001224.
  273. Kennerley SW, Wallis JD. 2009. Evaluating choices by single neurons in the frontal lobe: outcome value encoded across multiple decision variables. Eur J Neurosci. 29: 2061-2073.
  274. Kennerley SW, Walton ME. 2011. Decision making and reward in frontal cortex: complementary evidence from neurophysiological and neuropsychological studies. Behav Neurosci. 125: 297-317.
  275. Kennerley SW, Walton ME, Behrens TE, Buckley MJ, Rushworth MF. 2006. Optimal decision making and the anterior cingulate cortex. Nat Neurosci. 9: 940-947.
  276. Kerns JG, Cohen JD, MacDonald AW, 3rd, Cho RY, Stenger VA, Carter CS. 2004. Anterior cingulate conflict monitoring and adjustments in control. Science. 303: 1023-1026.
  277. Khamassi M, Enel P, Dominey PF, Procyk E. 2013. Medial prefrontal cortex and the adaptive regulation of reinforcement learning parameters. Prog Brain Res. 202: 441-464.
  278. Khamassi M, Lallee S, Enel P, Procyk E, Dominey PF. 2011. Robot cognitive control with a neurophysiologically inspired reinforcement learning model. Front Neurorobot. 5: 1.
  279. Kolling N, Behrens TE, Mars RB, Rushworth MF. 2012. Neural mechanisms of foraging. Science. 336: 95-98.
  280. Kouneiher F, Charron S, Koechlin E. 2009. Motivation and cognitive control in the human prefrontal cortex. Nat Neurosci. 12: 939-945.
  281. Krichmar JL. 2008. The neuromodulatory system -a framework for survival and adaptive behavior in a challenging world. Adapt Behav. 16: 385-399.
  282. Landmann C, Dehaene S, Pappata S, Jobert A, Bottlaender M, Roumenov D, Le Bihan D. 2007. Dynamics of prefrontal and cingulate activity during a reward-based logical deduction task. Cereb Cortex. 17: 749-759.
  283. Lau B. 2014. Matlab code for diagnosing collinearity in a regression design matrix. figshare. http://dx.doi.org/10.6084/m9.figshare.1008225.
  284. Leung HC, Gore JC, Goldman-Rakic PS. 2002. Sustained mnemonic response in the human middle frontal gyrus during on-line storage of spatial memoranda. J Cogn Neurosci. 14: 659-671.
  285. Luk CH, Wallis JD. 2009. Dynamic encoding of responses and outcomes by neurons in medial prefrontal cortex. J Neurosci. 29: 7526-7539.
  286. MacDonald AW, 3rd, Cohen JD, Stenger VA, Carter CS. 2000. Dissociating the role of the dorsolateral prefrontal and anterior cingulate cortex in cognitive control. Science. 288: 1835-1838.
  287. Matsumoto M, Matsumoto K, Abe H, Tanaka K. 2007. Medial prefrontal cell activity signaling prediction errors of action values. Nat Neurosci. 10: 647-656.
  288. McClure SM, Gilzenrat MS, Cohen JD. 2006. An exploration-exploitation model based on norepinephrine and dopamine activity. In: Weiss Y, Schölkopf B, Platt J, eds. Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press. p 867-874.
  289. Miller EK, Cohen JD. 2001. An integrative theory of prefrontal cortex function. Annu Rev Neurosci. 24: 167-202.
  290. Panzeri S, Senatore R, Montemurro MA, Petersen RS. 2007. Correcting for the sampling bias problem in spike train information measures. J Neurophysiol. 98: 1064-1072.
  291. Panzeri S, Treves A. 1996. Analytical estimates of limited sampling biases in different information measures. Network: Computation in Neural Systems. 7: 87-107.
  292. Procyk E, Goldman-Rakic PS. 2006. Modulation of dorsolateral prefrontal delay activity during self-organized behavior. J Neurosci. 26: 11313-11323.
  293. Procyk E, Tanaka YL, Joseph JP. 2000. Anterior cingulate activity during routine and non-routine sequential behaviors in macaques. Nat Neurosci. 3: 502-508.
  294. Quian Quiroga R, Panzeri S. 2009. Extracting information from neuronal populations: information theory and decoding approaches. Nat Rev Neurosci. 10: 173-185.
  295. Quilodran R, Rothé M, Procyk E. 2008. Behavioral shifts and action valuation in the anterior cingulate cortex. Neuron. 57(2): 314-325.
  296. Rothé M, Quilodran R, Sallet J, Procyk E. 2011. Coordination of high gamma activity in anterior cingulate and lateral prefrontal cortical areas during adaptation. J Neurosci. 31: 11110-11117.
  297. Rushworth MF, Behrens TE. 2008. Choice, uncertainty and value in prefrontal and cingulate cortex. Nat Neurosci. 11: 389-397.
  298. Satoh T, Nakai S, Sato T, Kimura M. 2003. Correlated coding of motivation and outcome of decision by dopamine neurons. J Neurosci. 23: 9913-9923.
  299. Schultz W, Dayan P, Montague PR. 1997. A neural substrate of prediction and reward. Science. 275: 1593-1599.
  300. Schweighofer N, Doya K. 2003. Meta-learning in reinforcement learning. Neural Netw. 16: 5-9.
  301. Seo H, Lee D. 2007. Temporal filtering of reward signals in the dorsal anterior cingulate cortex during a mixed-strategy game. J Neurosci. 27: 8366-8377.
  302. Seo H, Lee D. 2008. Cortical mechanisms for reinforcement learning in competitive games. Philos Trans R Soc Lond B Biol Sci. 363: 3845-3857.
  303. Seo H, Lee D. 2009. Behavioral and neural changes after gains and losses of conditioned reinforcers. J Neurosci. 29: 3627-3641.
  304. Sutton RS, Barto AG. 1998. Reinforcement learning: an introduction. Cambridge, MA London, England: MIT Press.
  305. Treves A, Panzeri S. 1995. The upward bias in measures of information derived from limited data samples. Neural Comput. 7: 399-407.
  306. Vogt BA, Vogt L, Farber NB, Bush G. 2005. Architecture and neurocytology of monkey cingulate gyrus. J Comp Neurol. 485: 218-239.
  307. Wang XJ. 2010. Neurophysiological and computational principles of cortical rhythms in cognition. Physiol Rev. 90: 1195-1268.
  308. Wilson CR, Gaffan D, Browning PG, Baxter MG. 2010. Functional localization within the prefrontal cortex: missing the forest for the trees? Trends Neurosci. 33: 533-540.
  309. B. W Balleine and J. P O'Doherty. Human and rodent homologies in action control: corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology, 35(1):48-69, January 2010.
  310. H. M Bayer and P. W Glimcher. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47(1):129-141, 2005.
  311. D. P Bertsekas and J. N Tsitsiklis. Neuro-dynamic programming: An overview. In Proceedings of the 34th IEEE Conference on Decision and Control, volume 1, pages 560-564. IEEE, 1995.
  312. S. Bhatnagar, M. Ghavamzadeh, M. Lee, and R. S Sutton. Incremental natural actor-critic algorithms. In Advances in Neural Information Processing Systems, pages 105-112, 2007.
  313. F. Brischoux, S. Chakraborty, D. I Brierley, and M. A Ungless. Phasic excitation of dopamine neurons in ventral VTA by noxious stimuli. Proceedings of the National Academy of Sciences of the United States of America, 106(12):4894-9, March 2009.
  314. N. D Daw. Dopamine: at the intersection of reward and action. Nat Neurosci, 10(12):1505-1507, December 2007.
  315. N. D Daw, Y. Niv, and P. Dayan. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci, 8(12):1704-1711, December 2005.
  316. J. J Day, J. L Jones, R. M Wightman, and R. M Carelli. Phasic nucleus accumbens dopamine release encodes effort- and delay-related costs. Biological Psychiatry, 68(3):306-9, August 2010.
  317. L. Dollé, D. Sheynikhovich, B. Girard, R. Chavarriaga, and A. Guillot. Path planning versus cue responding: a bio-inspired model of switching between navigation strategies. Biological cybernetics, 103(4):299-317, October 2010.
  318. K. Doya. Reinforcement learning: Computational theory and biological mechanisms. HFSP Journal, 1(1):30, 2007.
  319. K. Enomoto, N. Matsumoto, S. Nakai, T. Satoh, T. K Sato, Y. Ueda, H. Inokawa, M. Haruno, and M. Kimura. Dopamine neurons learn to encode the long-term value of multiple future rewards. PNAS, 2011.
  320. C. D Fiorillo. Two dimensions of value: dopamine neurons represent reward but not aversiveness. Science (New York, N.Y.), 341(6145):546-9, August 2013.
  321. C. D Fiorillo, P. N Tobler, and W. Schultz. Discrete coding of reward probability and uncertainty by dopamine neurons. Science, 299(5614):1898, 2003.
  322. S. B Flagel, J. J Clark, T. E Robinson, L. Mayo, A. Czuj, I. Willuhn, C. A Akers, S. M Clinton, P. E. M Phillips, and H. Akil. A selective role for dopamine in stimulus-reward learning. Nature, 469(7328):53-57, 2011.
  323. S. N. Haber. The primate basal ganglia: parallel and integrative networks. Journal of Chemical Neuroanatomy, 26(4):317-330, December 2003.
  324. S. N Haber and R. Calzavara. The cortico-basal ganglia integrative network: the role of the thalamus. Brain Research Bulletin, 78(2-3):69-74, February 2009.
  325. S. N Haber, J. L Fudge, and N. R McFarland. Striatonigrostriatal pathways in primates form an ascending spiral from the shell to the dorsolateral striatum. The Journal of Neuroscience, 20(6):2369-82, March 2000.
  326. J. R Hollerman and W. Schultz. Dopamine neurons report an error in the temporal prediction of reward during learning. Nat Neurosci, 1(4):304-309, 1998.
  327. M. Ito and K. Doya. Multiple representations and algorithms for reinforcement learning in the cortico-basal ganglia circuit. Current Opinion in Neurobiology, 21(3):368-73, June 2011.
  328. D. Joel, Y. Niv, and E. Ruppin. Actor-critic models of the basal ganglia: new anatomical and computational perspectives. Neural Networks, 15(4-6):535-547, 2002.
  329. D. Joel and I. Weiner. The connections of the dopaminergic system with the striatum in rats and primates: an analysis with respect to the functional and compartmental organization of the striatum. Neuroscience, 96(3):451-474, 2000.
  330. M. Keramati, A. Dezfouli, and P. Piray. Speed/Accuracy trade-off between the habitual and the goal-directed processes. PLoS Comput Biol, 7(5):e1002055, May 2011.
  331. V. R Konda and J. N Tsitsiklis. Actor-critic algorithms. In NIPS, pages 1008-1014. Citeseer, 1999.
  332. S. Lammel, D. I Ion, J. Roeper, and R. C Malenka. Projection-specific modulation of dopamine neuron synapses by aversive and rewarding stimuli. Neuron, 70(5):855-62, June 2011.
  333. S. Lammel, B. K. Lim, C. Ran, K. W. Huang, M. J Betley, K. M Tye, K. Deisseroth, and R. C Malenka. Input-specific control of reward and aversion in the ventral tegmental area. Nature, October 2012.
  334. F. Lesaint, O. Sigaud, S. B. Flagel, T. E. Robinson, and M. Khamassi. Modelling Individual Differences in the Form of Pavlovian Conditioned Approach Responses: A Dual Learning Systems Approach with Factored Representations. PLoS Computational Biology, 2014.
  335. T. Ljungberg, P. Apicella, and W. Schultz. Responses of monkey dopamine neurons during learning of behavioral reactions. Journal of Neurophysiology, 67(1):145-163, January 1992.
  336. K. Lloyd, N. Becker, M. W Jones, and R. Bogacz. Learning to use working memory: a reinforcement learning gating model of rule acquisition in rats. Frontiers in Computational Neuroscience, 6:87, 2012.
  337. M. Matsumoto and O. Hikosaka. Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature, 459(7248):837-841, 2009.
  338. J. Mirenowicz and W. Schultz. Importance of unpredictability for reward responses in primate dopamine neurons. Journal of Neurophysiology, 72(2):1024-1027, 1994.
  339. G. Morris, A. Nevet, D. Arkadir, E. Vaadia, and H. Bergman. Midbrain dopamine neurons encode decisions for future action. Nat Neurosci, 9(8):1057-1063, 2006.
  340. Y. Niv, N. D Daw, and P. Dayan. Choice values. Nature neuroscience, 9(8):987-988, 2006.
  341. Y. Niv, M.O. Duff, and P. Dayan. Dopamine, uncertainty and td learning. Behavioral and Brain Functions, 1:6:1-9, 2005.
  342. M. R Roesch, D. J Calu, and G. Schoenbaum. Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nat Neurosci, 10(12):1615-1624, December 2007.
  343. K. Samejima and K. Doya. Multiple representations of belief states and action values in corticobasal ganglia loops. Annals of the New York Academy of Sciences, 1104:213-28, May 2007.
  344. W. Schultz. Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 80(1):1-27, July 1998.
  345. W. Schultz, P. Dayan, and P. R. Montague. A neural substrate of prediction and reward. Science, 275(5306):1593-1599, March 1997.
  346. W. Schultz and R. Romo. Responses of nigrostriatal dopamine neurons to high-intensity somatosensory stimulation in the anesthetized monkey. Journal of Neurophysiology, 57(1):201-217, 1987.
  347. R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. The MIT Press, March 1998.
  348. S. C Tanaka, K. Doya, G. Okada, K. Ueda, Y. Okamoto, and S. Yamawaki. Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops. Nature Neuroscience, 7(8):887-893, 2004.
  349. D. V Wang and J. Z Tsien. Convergent processing of both positive and negative motivational signals by the VTA dopamine neuronal populations. PloS one, 6(2):e17047, January 2011.
  350. H. H Yin and B. J Knowlton. The role of the basal ganglia in habit formation. Nature Reviews Neuroscience, 7(6):464-76, June 2006.
  351. H. H Yin, B. J Knowlton, and B. W Balleine. Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. European Journal of Neuroscience, 19:181-189, 2004.
  352. H. H Yin, S. B Ostlund, B. J Knowlton, and B. W Balleine. The role of the dorsomedial striatum in instrumental conditioning. The European journal of neuroscience, 22(2):513-23, July 2005.
  353. Meyer J-A, Guillot A, Girard B, Khamassi M, Pirim P and Berthoz A 2005 The Psikharpax project: towards building an artificial rat Robot. Auton. Syst. 50 211-23
  354. N'Guyen S, Pirim P and Meyer J-A 2010 Tactile texture discrimination in the robot-rat Psikharpax BIOSIGNALS 2010: 3rd Int. Conf. on Bio-Inspired Systems and Signal Processing (Valencia, Spain) pp 74-81
  355. Bernard M, N'Guyen S, Pirim P, Gas B and Meyer J-A 2010 Phonotaxis behavior in the artificial rat Psikharpax IRIS2010: Int. Symp. on Robotics and Intelligent Sensors (Nagoya, Japan) pp 118-22
  356. N'Guyen S 2010 Mise au point du système vibrissal du robot-rat Psikharpax et contribution à la fusion de ses capacités visuelle, auditive et tactile PhD Thesis Université Pierre et Marie Curie
  357. N'Guyen S, Pirim P, Meyer J-A and Girard B 2010 An integrated neuromimetic model of the saccadic eye movements for the Psikharpax robot SAB '10: Proc. 11th Int. Conf. on Simulation of Adaptive Behavior: From Animals to Animats (Paris, France, 25-28 August 2010) (LNAI vol 6226) ed S Doncieux et al (Berlin: Springer) pp 188-98
  359. Trullier O, Wiener S, Berthoz A and Meyer J-A 1997 Biologically-based artificial navigation systems: review and prospects Prog. Neurobiol. 51 483-544
  360. Redish A D 1999 Beyond the Cognitive Map: From Place Cells to Episodic Memory (Cambridge, MA: MIT Press)
  361. Khamassi M 2007 Complementary roles of the rat prefrontal cortex and striatum in reward-based learning and shifting navigation strategies PhD Thesis Université Pierre et Marie Curie
  362. Arleo A and Rondi-Reig L 2007 Multimodal sensory integration and concurrent navigation strategies for spatial cognition in real and artificial organisms J. Integr. Neurosci. 6 327-66
  363. Morris R 1984 Developments of a water-maze procedure for studying spatial learning in the rat J. Neurosci. Methods 11 47-60
  364. Packard M G, Hirsh R and White N M 1989 Differential effects of fornix and caudate nucleus lesions on two radial maze tasks: evidence for multiple memory systems J. Neurosci. 9 1465-72
  365. Burgess N 2008 Spatial cognition and the brain Ann. NY Acad. Sci. 1124 77-97
  366. Alexander G E, DeLong M R and Strick P L 1986 Parallel organization of functionally segregated circuits linking basal ganglia and cortex Annu. Rev. Neurosci. 9 357-81
  367. Mink J W 1996 The basal ganglia: focused selection and inhibition of competing motor programs Prog. Neurobiol. 50 381-425
  368. Houk J C, Adams J L and Barto A G 1995 A model of how the basal ganglia generate and use neural signals that predict reinforcement Models of Information Processing in the Basal Ganglia ed J C Houk, J L Davis and D G Beiser (Cambridge, MA: MIT Press) pp 249-71
  369. Graybiel A M 1998 The basal ganglia and chunking of action repertoires Neurobiol. Learn. Memory 70 119-36
  370. Yin H H and Knowlton B J 2006 The role of the basal ganglia in habit formation Nature Rev. Neurosci. 7 464-76
  371. Devan B D and White N M 1999 Parallel information processing in the dorsal striatum: relation to hippocampal function J. Neurosci. 19 2789-98
  372. Packard M G and Knowlton B J 2002 Learning and memory functions of the basal ganglia Annu. Rev. Neurosci. 25 563-93
  373. Morris R G M 1981 Spatial localization does not require the presence of local cues Learn. Motiv. 12 239-60
  374. O'Keefe J and Nadel L 1978 The Hippocampus as a Cognitive Map (Oxford: Clarendon)
  375. Yin H H and Knowlton B J 2004 Contributions of striatal subregions to place and response learning Learn. Memory 11 459-63
  376. Albertin S V, Mulder A B, Tabuchi E, Zugaro M B and Wiener S I 2000 Lesions of the medial shell of the nucleus accumbens impair rats in finding larger rewards, but spare reward-seeking behavior Behav. Brain Res. 117 173-83
  377. Pfeifer R, Lungarella M and Iida F 2007 Self-organization, embodiment, and biologically inspired robotics Science 318 1088-93
  378. Arbib M, Metta G and van der Smagt P 2008 Neurorobotics: from vision to action Handbook of Robotics (Berlin: Springer) pp 1453-80
  379. Meyer J-A and Guillot A 2008 Biologically-inspired robots Handbook of Robotics ed B Siciliano and O Khatib (Berlin: Springer) pp 1395-422
  380. Arleo A and Gerstner W 2000 Spatial cognition and neuro-mimetic navigation: a model of hippocampal place cell activity Biol. Cybern. 83 287-99
  381. Krichmar J L, Seth A K, Nitz D A, Fleischer J G and Edelman G M 2005 Spatial navigation and causal analysis in a brain-based device modeling cortical-hippocampal interactions Neuroinformatics 3 147-69
  382. Fleischer J G, Gally J A, Edelman G M and Krichmar J L 2007 Retrospective and prospective responses arising in a modeled hippocampus during maze navigation by a brain-based device Proc. Natl Acad. Sci. 104 3556-61
  383. Barrera A and Weitzenfeld A 2008 Biologically-inspired robot spatial cognition based on rat neurophysiological studies Auton. Robots 25 147-69
  384. Giovannangeli C and Gaussier P 2008 Autonomous vision-based navigation: goal-oriented action planning by transient states prediction, cognitive map building, and sensory-motor learning Proc. Int. Conf. on Intelligent Robots and Systems vol 1 (Berkeley, CA: University of California Press) pp 281-97
  385. Milford M and Wyeth G 2010 Persistent navigation and mapping using a biologically inspired slam system Int. J. Robot. Res. 29 1131-53
  386. Dollé L, Sheynikhovich D, Girard B, Chavarriaga R and Guillot A 2010 Path planning versus cue responding: a bioinspired model of switching between navigation strategies Biol. Cybern. 103 299-317
  387. Pearce J M, Roberts A D and Good M 1998 Hippocampal lesions disrupt navigation based on cognitive maps but not heading vectors Nature 396 75-7
  388. Ragozzino M E, Detrick S and Kesner R P 1999 Involvement of the prelimbic-infralimbic areas of the rodent prefrontal cortex in behavioral flexibility for place and response learning J. Neurosci. 19 4585-94
  389. Birrell J M and Brown V J 2000 Medial frontal cortex mediates perceptual attentional set shifting in the rat J. Neurosci. 20 4320-4
  390. Killcross S and Coutureau E 2003 Coordination of actions and habits in the medial prefrontal cortex of rats Cereb. Cortex 13 400-8
  391. Ujfalussy B, Erős P, Somogyvári Z and Kiss T 2008 Episodes in space: a modeling study of hippocampal place representation SAB '08: Proc. 10th Int. Conf. on Simulation of Adaptive Behavior: From Animals to Animats (Osaka, Japan, 7-12 July 2008) (LNAI vol 5040) ed M Asada et al (Berlin: Springer) pp 123-36
  392. Block M T 1969 A note on refraction and image formation of the rat's eye Vis. Res. 9 705-11
  393. Parker A J 2007 Binocular depth perception and the cerebral cortex Nature Rev. Neurosci. 8 379-91
  394. Dollé L, Sheynikhovich D, Girard B, Ujfalussy B, Chavariagga R and Guillot A 2010 Analyzing interactions between cue-guided and place-based navigation with a computational model of action selection: influence of sensory cues and training SAB '10: Proc. 11th Int. Conf. on Simulation of Adaptive Behavior: From Animals to Animats (Paris, France, 25-28 August 2010) (LNAI vol 6226) ed S Doncieux et al (Berlin: Springer) pp 335-46
  395. Pearson K 1901 On lines and planes of closest fit to systems of points in space Phil. Mag. 2 559-72
  396. Kohonen T, Schroeder M R and Huang T S 2001 Self-Organizing Maps (Secaucus, NJ: Springer)
  397. Fritzke B 1995 A growing neural gas network learns topologies Advances in Neural Information Processing Systems 7 (Cambridge, MA: MIT Press) pp 625-32
  398. Bishop C M 2007 Pattern Recognition and Machine Learning (Berlin: Springer)
  399. Schölkopf B, Smola A and Muller K-R 1998 Nonlinear component analysis as a kernel eigenvalue problem Neural Comput. 10 1299-319
  400. Kim K I, Franz M O and Schölkopf B 2003 Kernel Hebbian algorithm for iterative kernel principal component analysis Technical Report Max Planck Institute for Biological Cybernetics
  401. Hasselmo M E 2005 A model of prefrontal cortical mechanisms for goal-directed behavior J. Cognitive Neurosci. 17 1115-29
  402. Martinet L-E, Sheynikhovich D, Benchenane K and Arleo A 2011 Spatial learning and action planning in a prefrontal cortical network model PLoS Comput. Biol. 7 e1002045
  403. Banquet J P, Gaussier P, Quoy M, Revel A and Burnod Y 2005 A hierarchy of associations in hippocampo-cortical systems: cognitive maps and navigation strategies Neural Comput. 17 1339-84
  404. Cuperlier N, Quoy M and Gaussier P 2007 Neurobiologically inspired mobile robot navigation and planning Front. Neurorobot. 1 3
  405. Floyd R W 1962 Algorithm 97: shortest path Commun. ACM 5 345
  406. Sutton R S and Barto A G 1998 Reinforcement Learning: An Introduction (Cambridge, MA: MIT Press)
  407. Khamassi M, Martinet L E and Guillot A 2006 Combining self-organizing maps with mixture of experts: application to an actor-critic model of reinforcement learning in the basal ganglia SAB '06: Proc. 9th Int. Conf. on Simulation of Adaptive Behavior: From Animals to Animats (Rome, Italy, 25-29 September 2006) (LNAI vol 4095) ed S Nolfi et al (Berlin: Springer) pp 394-405
  408. Packard M and McGaugh J 1996 Inactivation of hippocampus or caudate nucleus with lidocaine differentially affects the expression of place and response learning Neurobiol. Learn. Memory 65 65-72
  409. Save E and Poucet B 2000 Hippocampal-parietal cortical interactions in spatial cognition Hippocampus 10 491-9
  410. Eilam D and Golani I 1989 Home base behavior of rats (Rattus norvegicus) exploring a novel environment Behav. Brain Res. 34 199-211
  411. Doya K 2000 Complementary roles of basal ganglia and cerebellum in learning and motor control Curr. Opin. Neurobiol. 10 732-9
  412. Samejima K, Ueda Y, Doya K and Kimura M 2005 Representation of action-specific reward values in the striatum Science 310 1337-40
  413. Roesch M R, Calu D J and Schoenbaum G 2007 Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards Nature Neurosci. 10 1615-24
  414. Matsumoto M, Matsumoto K, Abe H and Tanaka K 2007 Medial prefrontal cell activity signaling prediction errors of action values Nature Neurosci. 10 647-56
  415. Chavarriaga R, Strösslin T, Sheynikhovich D and Gerstner W 2005 A computational model of parallel navigation systems in rodents Neuroinformatics 3 223-41
  416. Daw N D, O'Doherty J P, Dayan P, Seymour B and Dolan R J 2006 Cortical substrates for exploratory decisions in humans Nature 441 876-9
  417. Watkins C J C H and Dayan P 1992 Technical note: Q-learning Mach. Learn. 8 279-92
  418. Miller E K and Cohen J D 2001 An integrative theory of prefrontal cortex function Annu. Rev. Neurosci. 24 167-202
  419. Peyrache A, Khamassi M, Benchenane K, Wiener S I and Battaglia F P 2009 Replay of rule-learning related neural patterns in the prefrontal cortex during sleep Nature Neurosci. 12 919-26
  420. Uylings H B, Groenewegen H J and Kolb B 2003 Do rats have a prefrontal cortex? Behav. Brain Res. 146 3-17
  421. Granon S and Poucet B 2000 Involvement of the rat prefrontal cortex in cognitive functions: a central role for the prelimbic area Psychobiology 28 229-37
  422. Baeg E H, Kim Y B, Huh K, Mook-Jung I, Kim H T and Jung M W 2003 Dynamics of population code for working memory in the prefrontal cortex Neuron 40 177-88
  423. Hok V, Save E, Lenck-Santini P P and Poucet B 2005 Coding for spatial goals in the prelimbic/infralimbic area of the rat frontal cortex Proc. Natl Acad Sci. USA 102 4602-7
  424. Mulder A B, Nordquist R E, Orgut O and Pennartz C M 2003 Learning-related changes in response patterns of prefrontal neurons during instrumental conditioning Behav. Brain Res. 146 77-88
  425. Kargo W J, Szatmary B and Nitz D A 2007 Adaptation of prefrontal cortical firing patterns and their fidelity to changes in action-reward contingencies J. Neurosci. 27 3548-59
  426. Daw N D, Niv Y and Dayan P 2005 Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control Nature Neurosci. 8 1704-11
  427. Ostlund S B and Balleine B W 2005 Lesions of medial prefrontal cortex disrupt the acquisition but not the expression of goal-directed learning J. Neurosci. 25 7763-70
  428. Johnson A and Redish A D 2007 Neural ensembles in CA3 transiently encode paths forward of the animal at a decision point J. Neurosci. 27 12176-89
  429. Johnson A, van der Meer M A A and Redish A D 2007 Integrating hippocampus and striatum in decision-making Curr. Opin. Neurobiol. 17 692-7
  430. van der Meer M A A and Redish A D 2011 Theta phase precession in rat ventral striatum links place and reward information J. Neurosci. 31 2843-54
  431. van der Meer M A A and Redish A D 2011 Ventral striatum: a critical look at models of learning and evaluation Curr. Opin. Neurobiol. 21 387-92
  432. Salazar R F, White W, Lacroix L, Feldon J and White I M 2004 NMDA lesions in the medial prefrontal cortex impair the ability to inhibit responses during reversal of a simple spatial discrimination Behav. Brain Res. 152 413-24
  433. Naneix F, Marchand A R, DiScala G, Pape J R and Coutureau E 2009 A role of the medial prefrontal cortex dopaminergic innervation in instrumental conditioning J. Neurosci. 29 6599-606
  434. Battaglia F P, Peyrache A, Khamassi M and Wiener S I 2008 Spatial decisions and neuronal activity in hippocampal projection zones in prefrontal cortex and striatum Hippocampal Place Fields: Relevance to Learning and Memory ed S Mizumori (Oxford: Oxford University Press) chapter 18, pp 289-311
  435. Rich E L and Shapiro M 2009 Rat prefrontal cortical neurons selectively code strategy switches J. Neurosci. 29 7208-19
  436. Bonasso P and Dean T 1997 A retrospective of the AAAI robot competitions AI Mag. 18 11-23
  437. Gat E 1998 On three-layer architectures Artificial Intelligence and Mobile Robots: Case Studies of Successful Robot Systems ed D Kortenkamp, R P Bonnasso and R Murphy (Cambridge, MA: MIT Press) pp 195-210
  438. Kortenkamp D and Simmons R 2008 Robotic systems architectures and programming Handbook of Robotics ed B Siciliano and O Khatib (Berlin: Springer) pp 187-206
  439. Minguez J, Lamiraux F and Laumond J P 2008 Motion planning and obstacle avoidance Handbook of Robotics ed B Siciliano and O Khatib (Berlin: Springer) pp 827-52
  440. Dickinson A 1985 Actions and habits: The development of behavioural autonomy Phil. Trans. R. Soc. B 308 67-78
  441. Keramati M, Dezfouli A and Piray P 2011 Speed/accuracy trade-off between the habitual and goal-directed processes PLoS Comput. Biol. 7 1-25
  442. Comon P 1994 Independent component analysis, a new concept? Signal Process. 36 287-314
  443. MacQueen J B 1967 Some methods for classification and analysis of multivariate observations Proc. 5th Berkeley Symp. on Mathematical Statistics and Probability vol 1 ed L M Le Cam and J Neyman (Berkeley, CA: University of California Press) pp 281-97
  444. Zito T, Wilbert N, Wiskott L and Berkes P 2009 Modular toolkit for data processing (MDP): a python data processing framework Front. Neuroinform. 2 8
  445. Mahalanobis P C 1936 On the generalised distance in statistics Proc. National Institute of Science (India) vol 2 pp 49-55
  446. B. W. Balleine and A. Dickinson. Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology, 37:407-419, 1998.
  447. B. W. Balleine and J. P. O'Doherty. Human and rodent homologies in action control: corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology, 35:48-69, 2010.
  448. K. Caluwaerts, A. Favre-Félix, M. Staffa, S. N'Guyen, C. Grand, B. Girard, and M. Khamassi. Neuro-inspired navigation strategies shifting for robots: Integration of a multiple landmark taxon strategy. In T.J. Prescott et al., editors, Living Machines 2012, LNAI volume 7375, pages 62-73. Springer, 2012.
  449. K. Caluwaerts, M. Staffa, S. N'Guyen, C. Grand, L. Dollé, A. Favre-Félix, B. Girard, and M. Khamassi. A biologically inspired meta-control navigation system for the Psikharpax rat robot. Bioinspiration & Biomimetics, 7(2):025009, 2012.
  450. N.D. Daw, Y. Niv, and P. Dayan. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience, 8(12):1704-1711, 2005.
  451. A. Dezfouli and B.W. Balleine. Habits, action sequences and reinforcement learning. European Journal of Neuroscience, 35(7):1036-1051, 2012.
  452. A. Dickinson. Contemporary animal learning theory. Cambridge: Cambridge University Press, 1980.
  453. A. Dickinson. Actions and habits: The development of behavioural autonomy. Philosophical Transactions of the Royal Society (London), 308:67-78, 1985.
  454. L. Dollé, D. Sheynikhovich, B. Girard, R. Chavarriaga, and A. Guillot. Path planning versus cue responding: a bioinspired model of switching between navigation strategies. Biological Cybernetics, 103(4):299-317, 2010.
  455. E. Gat. On three-layer architectures. In Artificial Intelligence and Mobile Robots. MIT Press, 1998.
  456. Q.J. Huys, N. Eshel, E. O'Nions, L. Sheridan, P. Dayan, and J.P. Roiser. Bonsai trees in your head: how the Pavlovian system sculpts goal-directed choices by pruning decision trees. PLoS Computational Biology, 8(3):e1002410, 2012.
  457. L.P. Kaelbling, M.L. Littman, and A.W. Moore. Reinforcement learning: a survey. Journal of Artificial Intelligence Research, 4:237-285, 1996.
  458. M. Keramati, A. Dezfouli, and P. Piray. Speed/accuracy trade-off between the habitual and goal-directed processes. PLoS Computational Biology, 7(5):1-25, 2011.
  459. M. Khamassi and M.D. Humphries. Integrating cortico-limbic-basal ganglia architectures for learning model-based and model-free navigation strategies. Frontiers in Behavioral Neuroscience, 6:79, 2012.
  460. J. Kober, D. Bagnell, and J. Peters. Reinforcement learning in robotics: a survey. International Journal of Robotics Research, 32(11):1238-1274, 2013.
  461. F. Lesaint, O. Sigaud, S.B. Flagel, T.E. Robinson, and M. Khamassi. Modelling individual differences in the form of Pavlovian conditioned approach responses: a dual learning systems approach with factored representations. PLoS Computational Biology, 10(2):e1003466, 2014.
  462. J. Minguez, F. Lamiraux, and J.P. Laumond. Motion planning and obstacle avoidance. In B. Siciliano and O. Khatib, editors, Handbook of Robotics, pages 827-852. Springer, 2008.
  464. M. Quigley, K. Conley, B.P. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A.Y. Ng. ROS: an open-source robot operating system. In ICRA Workshop on Open Source Software, 2009.
  465. Richard S. Sutton and Andrew G. Barto. Introduction to Reinforcement Learning. MIT Press, Cambridge, MA, USA, 1st edition, 1998.
  466. C. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, UK, 1989.
  467. H.H. Yin, S.B. Ostlund, and B.W. Balleine. Reward-guided learning beyond dopamine in the nucleus accumbens: the integrative functions of cortico-basal ganglia networks. European Journal of Neuroscience, 28:1437-1448, 2008.
  References
  468. R. Alami, R. Chatila, S. Fleury, M. Ghallab, and F. Ingrand. An architecture for autonomy. International Journal of Robotics Research, 17(4):315-337, 1998. (Cited on pages 11 and 200.)
  469. R. Alami, A. Clodic, V. Montreuil, E.A. Sisbot, and R. Chatila. Toward human-aware robot task planning. In AAAI Spring Symposium: To Boldly Go Where No Human-Robot Team Has Gone Before, pages 39-46. 2006. (Cited on pages vii, 11, 197, and 198.)
  470. R. Alami, M. Warnier, J. Guitton, S. Lemaignan, and E.A. Sisbot. When the robot considers the human... In O. Khatib and H. Christensen, editors, Proceedings of the 15th International Symposium on Robotics Research. Springer, 2013. (Cited on pages vii, 197, and 198.)
  471. G.E. Alexander, M.D. Crutcher, and M.R. DeLong. Basal ganglia-thalamocortical circuits: parallel substrates for motor, oculomotor, "prefrontal" and "limbic" functions. Progress in Brain Research, 85:119-146, 1990. (Cited on page 199.)
  472. W.H. Alexander and O. Sporns. An embodied model of learning, plasticity, and reward. Adaptive Behavior, 10:143-159, 2002. (Cited on page 10.)
  473. J.R. Anderson, D. Bothell, M.D. Byrne, S. Douglas, C. Lebiere, and Y. Qin. An integrated theory of the mind. Psychological Review, 111(4):1036-1060, 2004. (Cited on page 200.)
  474. A. Angeli, D. Filliat, S. Doncieux, and J.-A. Meyer. A fast and incremental method for loop-closure detection using bags of visual words. IEEE Transactions on Robotics, Special Issue on Visual SLAM, 24(5), 2008. (Cited on page 12.)
  475. M. Arbib, G. Metta, and P. van der Smagt. Neurorobotics: from vision to action. In B. Siciliano and O. Khatib, editors, Handbook of Robotics, pages 1453-1480. Springer-Verlag, Berlin, 2008. (Cited on page 12.)
  476. A. Arleo and W. Gerstner. Spatial cognition and neuro-mimetic navigation: a model of hippocampal place cell activity. Biological Cybernetics, 83(3):287-299, 2000. (Cited on page 12.)
  477. A. Arleo, F. Smeraldi, and W. Gerstner. Cognitive navigation based on nonuniform Gabor space sampling, unsupervised growing networks, and reinforcement learning. IEEE Transactions on Neural Networks, 15:639-652, 2004. (Cited on page 10.)
  478. F.G. Ashby, B.O. Turner, and J.C. Horvitz. Cortical and basal ganglia contributions to habit learning and automaticity. Trends in Cognitive Sciences, 14:208-215, 2010. (Cited on page 6.)
  479. P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47:235-256, 2002. (Cited on pages 9 and 194.)
  480. D. Badre and M. D'Esposito. Is the rostro-caudal axis of the frontal lobe hierarchical? Nature Reviews Neuroscience, 10:659-669, 2009. (Cited on page 6.)
  481. G. Baldassarre. A modular neural-network model of the basal ganglia's role in learning and selecting motor behaviours. Cognitive Systems Research, 3(1):5-13, 2002. (Cited on page 5.)
  482. B.W. Balleine and J.P. O'Doherty. Human and rodent homologies in action control: corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology, 35(1):48-69, 2010. (Cited on pages 5, 6, and 192.)
  483. A. Barrera, A. Caceres, A. Weitzenfeld, and V. Ramirez-Amaya. Comparative experimental studies on spatial memory and learning in rats and robots. Journal of Intelligent and Robotic Systems, 63(3-4):361-397, 2011. (Cited on page 13.)
  484. F.P. Battaglia, A. Peyrache, M. Khamassi, and S.I. Wiener. Spatial decisions and neuronal activity in hippocampal projection zones in prefrontal cortex and striatum. In S. Mizumori, editor, Hippocampal Place Fields: Relevance to Learning and Memory, chapter 18, pages 289-311. Oxford University Press, 2008. (Cited on pages 9 and 202.)
  485. H.M. Bayer and P.W. Glimcher. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47(1):129-141, 2005. (Cited on pages 5 and 203.)
  486. T.E. Behrens, M.W. Woolrich, M.E. Walton, and M.F. Rushworth. Learning the value of information in an uncertain world. Nature Neuroscience, 10(9):1214-1221, 2007. (Cited on page 10.)
  487. J. Bellot, O. Sigaud, and M. Khamassi. Which temporal difference learning algorithm best reproduces dopamine activity in a multi-choice task? In T. Ziemke, C. Balkenius, and J. Hallam, editors, From Animals to Animats: Proceedings of the 12th International Conference on Adaptive Behaviour (SAB 2012), pages 289-298. Springer, 2012. (Cited on pages 18 and 202.)
  489. J. Bellot, O. Sigaud, M.R. Roesch, G. Schoenbaum, B. Girard, and M. Khamassi. What do VTA dopamine neurons encode: value, RPE or other behavior? In preparation. (Cited on pages 6, 17, 69, 194, and 202.)
  490. K. Benchenane, A. Peyrache, M. Khamassi, P.L. Tierney, Y. Gioanni, F.P. Battaglia, and S.I. Wiener. Coherent theta oscillations and reorganization of spike timing in the hippocampal-prefrontal network upon learning. Neuron, 66(6):921-936, 2010. (Cited on pages 9, 147, 199, and 202.)
  491. P. Biber and T. Duckett. Dynamic maps for long-term operation of mobile service robots. In Proceedings of the ECML-98 Workshop on Upgrading Learning to Meta-Level: Model Selection and Data Transformation, pages 11-17. 2005. (Cited on page 12.)
  492. P. Brazdil. Data transformation and model selection by experimentation and meta-learning. In Robotics: science and systems, pages 17-24. 1998. (Cited on page 9.)
  493. R. Brooks. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, RA-2:14-23, 1986. (Cited on page 11.)
  494. A. Brovelli, N. Laksiri, B. Nazarian, M. Meunier, and D. Boussaoud. Understanding the neural computations of arbitrary visuomotor learning through fMRI and associative learning theory. Cerebral Cortex, 18(1):1485-1495, 2008. (Cited on page 5.)
  495. A. Brovelli, B. Nazarian, M. Meunier, and D. Boussaoud. Differential roles of caudate nucleus and putamen during instrumental learning. NeuroImage, 57(4):1580-1590, 2011. (Cited on page 192.)
  496. K. Caluwaerts, A. Favre-Félix, M. Staffa, S. N'Guyen, C. Grand, B. Girard, and M. Khamassi. Neuro-inspired navigation strategies shifting for robots: Integration of a multiple landmark taxon strategy. In T.J. Prescott, N.F. Lepora, A. Mura, and P.F.M.J. Verschure, editors, 1st Living Machines Conference, Lecture Notes in Artificial Intelligence 7375, pages 62-73. Springer, 2012a. (Cited on pages 13, 18, and 201.)
  498. K. Caluwaerts, M. Staffa, S. N'Guyen, C. Grand, L. Dollé, A. Favre-Félix, B. Girard, and M. Khamassi. A biologically inspired meta-control navigation system for the Psikharpax rat robot. Bioinspiration and Biomimetics, 7(2):025009, 2012b. (Cited on pages 8, 13, 18, 147, 195, 201, and 205.)
  499. J. Catanese, E. Cerasti, M. Zugaro, A. Viggiano, and S.I. Wiener. Dynamics of decision-related activity in hippocampus. Hippocampus, 22(9):1901-1911, 2012. (Cited on page 202.)
  500. R. Chatila, R. Alami, B. Degallaix, and H. Laruelle. Integrated planning and execution control of autonomous robot actions. In Proceedings of the IEEE International Conference on Robotics and Automation, ICRA'92, pages 2689-2696. 1992. (Cited on page 11.)
  501. R. Chavarriaga, T. Strösslin, D. Sheynikhovich, and W. Gerstner. A computational model of parallel navigation systems in rodents. Neuroinformatics, 3:223-241, 2005. (Cited on page 201.)
  502. A.G.E. Collins and M.J. Frank. How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis. European Journal of Neuroscience, 35:1024-1035, 2012. (Cited on page 192.)
  503. A.G.E. Collins and E. Koechlin. Reasoning, learning, and creativity: frontal lobe function and human decision-making. PLoS Biology, 10(3):e1001293, 2012. (Cited on page 5.)
  504. A. Coninx, A. Guillot, and B. Girard. Adaptive motivation in a biomimetic action selection mechanism. In Proceedings of the NeuroComp Conference 2008, pages 158-162, Marseille, France, 2008. (Cited on page 199.)
  505. L.H. Corbit and B.W. Balleine. The general and outcome-specific forms of Pavlovian-instrumental transfer are differentially mediated by the nucleus accumbens core and shell. The Journal of Neuroscience, 31(33):11786-11794, 2011. (Cited on page 192.)
  506. G. Corrado and K. Doya. Understanding neural coding through the model-based analysis of decision making. Journal of Neuroscience, 27(31):8178-8180, 2007. (Cited on page 5.)
  507. I. Cos, N. Bélanger, and P. Cisek. The influence of predicted arm biomechanics on decision making. Journal of Neurophysiology, 105:3022-3033, 2011. (Cited on page 205.)
  508. I. Cos, F. Medleg, and P. Cisek. The modulatory influence of end-point controllability on decisions between actions. Journal of Neurophysiology, 108:1764-1780, 2012. (Cited on page 205.)
  509. I. Cos, M. Khamassi, and B. Girard. Modeling the learning of biomechanics and visual planning for decision-making of motor actions. Journal of Physiology Paris, 107(5):399-408, 2013. (Cited on pages 205 and 206.)
  510. N.D. Daw. Trial-by-trial data analysis using computational models. In Decision Making, Affect, and Learning: Attention and Performance XXIII, pages 3-38. Oxford University Press, Oxford, UK, 2011. (Cited on pages 5 and 199.)
  511. N.D. Daw, Y. Niv, and P. Dayan. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience, 8(12):1704-1711, 2005. (Cited on pages 6, 7, 192, 195, and 196.)
  512. N.D. Daw, J.P. O'Doherty, P. Dayan, B. Seymour, and R.J. Dolan. Cortical substrates for exploratory decisions in humans. Nature, 441(7095):876-879, 2006. (Cited on page 5.)
  513. N.D. Daw, S.J. Gershman, B. Seymour, P. Dayan, and R.J. Dolan. Model-based influences on humans' choices and striatal prediction errors. Neuron, 69:1204-1215, 2011. (Cited on page 6.)
  514. P. Dayan and B.W. Balleine. Reward, motivation, and reinforcement learning. Neuron, 36(2):285-298, 2002. (Cited on page 6.)
  515. P. Dayan and Y. Niv. Reinforcement learning and the brain: The good, the bad and the ugly. Current Opinion in Neurobiology, 18(2):185-196, 2008. (Cited on page 3.)
  516. B.D. Devan and N.M. White. Parallel information processing in the dorsal striatum: relation to hippocampal function. The Journal of Neuroscience, 19(7):2789-2798, 1999. (Cited on page 201.)
  517. A. Dezfouli and B.W. Balleine. Habits, action sequences and reinforcement learning. European Journal of Neuroscience, 35(7):1036-1051, 2012. (Cited on page 7.)
  518. A. Dickinson. Actions and habits: The development of behavioural autonomy. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 308(1135):67-78, 1985. (Cited on page 6.)
  519. L. Dollé, M. Khamassi, B. Girard, A. Guillot, and R. Chavarriaga. Analyzing interactions between navigation strategies using a computational model of action selection. In Spatial Cognition Conference, Lecture Notes in Computer Science 5248, pages 71-86. Springer, 2008. (Cited on pages 8, 195, and 201.)
  520. L. Dollé, D. Sheynikhovich, B. Girard, R. Chavarriaga, and A. Guillot. Path planning versus cue responding: a bioinspired model of switching between navigation strategies. Biological Cybernetics, 103(4):299-317, 2010. (Cited on pages 8, 195, 196, and 201.)
  521. L. Dollé, R. Chavarriaga, M. Khamassi, and A. Guillot. Interactions between spatial strategies producing generalization gradient and blocking: a computational approach. Psychological Review, submitted. (Cited on pages 8, 195, and 201.)
  522. S. Doncieux, N. Bredeche, and J-B. Mouret, editors. New Horizons in Evolutionary Robotics. Springer-Verlag, Berlin Heidelberg, 2011. (Cited on pages 9 and 205.)
  523. K. Doya. Complementary roles of basal ganglia and cerebellum in learning and motor control. Current Opinion in Neurobiology, 10:732-739, 2000a. (Cited on pages vii, 3, and 5.)
  524. K. Doya. Reinforcement learning in continuous time and space. Neural Computation, 12(1):219-245, 2000b. (Cited on page 205.)
  525. K. Doya. Metalearning and neuromodulation. Neural Networks, 15(4-6):495-506, 2002. (Cited on page 8.)
  526. K. Doya. Modulators of decision making. Nature Neuroscience, 11(4):410-416, 2008. (Cited on pages 194 and 203.)
  527. A. Faure, U. Haberland, F. Condé, and N. El Massioui. Lesion to the nigrostriatal dopamine system disrupts stimulus-response habit formation. Journal of Neuroscience, 25:2771-2780, 2005. (Cited on page 5.)
  528. C.D. Fiorillo. Two dimensions of value: dopamine neurons represent reward but not aversiveness. Science, 341(6145):546-549, 2013. (Cited on pages 194 and 203.)
  529. C.D. Fiorillo, P.N. Tobler, and W. Schultz. Discrete coding of reward probability and uncertainty by dopamine neurons. Science, 299(5614):1898-1902, 2003. (Cited on page 203.)
  530. J. Fix, N. Rougier, and F. Alexandre. From physiological principles to computational models of the cortex. Journal of Physiology Paris, 101(1-3):32-39, 2007. (Cited on page 3.)
  531. J.G. Fleischer, J.A. Gally, G.M. Edelman, and J.L. Krichmar. Retrospective and prospective responses arising in a modeled hippocampus during maze navigation by a brain-based device. Proceedings of the National Academy of Science, 104(9) :3556-3561, 2007. (Cited on page 13.)
  532. J. Folkesson and H.I. Christensen. Integrated planning and execution control of autonomous robot actions. In Proceedings of the IEEE Inter- national Conference on Robotics and Automation, ICRA'04, pages 383-390. 2004. (Cited on page 12.)
  533. D. Foster, R. Morris, and P. Dayan. Models of hippocampally dependent navigation using the temporal difference learning rule. Hippocampus, 10 :1-16, 2000. (Cited on page 201.)
  534. M.J. Frank. Dynamic dopamine modulation in the basal ganglia : a neurocomputational account of cognitive deficits in medicated and nonmedicated parkinsonism. Journal of Cognitive Neuroscience, 17(1) :51-72, 2005. (Cited on pages 5 and 193.)
  535. M.J. Frank, B.B. Doll, J. Oas-Terpstra, and F. Moreno. Prefrontal and striatal dopaminergic genes predict individual differences in exploration and exploitation. Nature Neuroscience, 12(8) :1062-1068, 2009. (Cited on page 5.)
  536. H. Frezza-Buet, N. Rougier, and F. Alexandre. Integration of biologically inspired temporal mechanisms into a cortical framework for sequence processing. In R. Sun and C.L. Giles, editors, Sequence Learning : Paradigms, Algorithms, and Applications, LNAI 1828, pages 321-348. Springer, 2001. (Cited on page 12.)
  537. C. Giovannangeli and P. Gaussier. Autonomous vision-based navigation : Goal-oriented action planning by transient states prediction, cognitive map building, and sensory-motor learning. In Proceedings of the International Conference on Intelligent Robots and Systems, volume 1, pages 281-297. University of California Press, 2008. (Cited on page 13.)
  538. B. Girard, D. Filliat, J.A. Meyer, A. Berthoz, and A. Guillot. Integration of navigation and action selection functionalities in a computational model of cortico-basal-ganglia-thalamo-cortical loops. Adaptive Behavior, 13(2) :115-130, 2005. (Cited on page 201.)
  539. C. Giraud-Carrier, P. Brazdil, and R. Vilalta. Introduction to the special issue on meta-learning. Machine Learning, 54(3) :187-193, 2004. (Cited on page 8.)
  540. J. Gläscher, N.D. Daw, P. Dayan, and J.P. O'Doherty. States versus rewards : dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron, 66 :585-595, 2010. (Cited on page 6.)
  541. A.M. Graybiel. Habits, rituals, and the evaluative brain. Annual Review of Neuroscience, 31 :359-387, 2008. (Cited on page 6.)
  542. A. Guazzelli, F.J. Corbacho, M. Bota, and M.A. Arbib. Affordances, motivations and the world graph theory. Adaptive Behavior : Special issue on biologically inspired models of spatial navigation, 6(3-4) :435-471, 1998. (Cited on page 201.)
  543. K. Gurney, T.J. Prescott, and P. Redgrave. A computational model of action selection in the basal ganglia. I. A new functional anatomy. Biological Cybernetics, 84(6) :401-410, 2001. (Cited on page 5.)
  544. S.N. Haber, J.L. Fudge, and N.R. McFarland. Striatonigrostriatal pathways in primates form an ascending spiral from the shell to the dorsolateral striatum. The Journal of Neuroscience, 20(6) :2369-2382, 2000. (Cited on pages 194 and 199.)
  545. M.E. Hasselmo. A model of prefrontal cortical mechanisms for goal-directed behavior. Journal of Cognitive Neuroscience, 17(7) :1115-1129, 2005. (Cited on page 3.)
  546. J.C. Houk, J.L. Adams, and A.G. Barto. A model of how the basal ganglia generate and use neural signals that predict reinforcement. In J.C. Houk, J.L. Davis, and D.G. Beiser, editors, Models of Information Processing in the Basal Ganglia, pages 249-271. The MIT Press, Cambridge, MA, 1995. (Cited on page 5.)
  547. M.W. Howe, P.L. Tierney, S.G. Sandberg, P.E. Phillips, and A.M. Graybiel. Prolonged dopamine signalling in striatum signals proximity and value of distant rewards. Nature, 500(7464) :575-579, 2013. (Cited on page 194.)
  548. M.D. Humphries and T.J. Prescott. The ventral basal ganglia, a selection mechanism at the crossroads of space, strategy, and reward. Progress in Neurobiology, 90 :385-417, 2010. (Cited on page 5.)
  549. M.D. Humphries, M. Khamassi, and K. Gurney. Dopaminergic control of the exploration-exploitation trade-off via the basal ganglia. Frontiers in Neuroscience, 6 :9, 2012. (Cited on pages 5, 195, 203, 204, and 205.)
  550. Q.J. Huys, N. Eshel, E. O'Nions, L. Sheridan, P. Dayan, and J.P. Roiser. Bonsai trees in your head : how the Pavlovian system sculpts goal-directed choices by pruning decision trees. PLoS Computational Biology, 8(3) :e1002410, 2012. (Cited on page 196.)
  551. S. Ishii, W. Yoshida, and J. Yoshimoto. Control of exploitation-exploration meta-parameter in reinforcement learning. Neural Networks, 15(4-6) :665-687, 2002. (Cited on page 9.)
  552. M. Ito and K. Doya. Validation of decision-making models and analysis of decision variables in the rat basal ganglia. Journal of Neuroscience, 29(31) :9861-9874, 2009. (Cited on page 5.)
  553. M. Ito and K. Doya. Multiple representations and algorithms for reinforcement learning in the cortico-basal ganglia circuit. Current Opinion in Neurobiology, 21 :368-373, 2011. (Cited on page 6.)
  554. D. Joel, Y. Niv, and E. Ruppin. Actor-critic models of the basal ganglia : new anatomical and computational perspectives. Neural Networks, 15(4-6) :535-547, 2002. (Cited on page 5.)
  555. O. Kanoun, J.P. Laumond, and E. Yoshida. Planning foot placements for a humanoid robot : A problem of inverse kinematics. International Journal of Robotics Research, 30(4) :476-485, 2011. (Cited on page 11.)
  556. M. Kawato, S. Kuroda, and N. Schweighofer. Cerebellar supervised learning revisited : biophysical modeling and degrees-of-freedom control. Current Opinion in Neurobiology, 21(5) :791-800, 2011. (Cited on page 2.)
  557. M. Keramati and B. Gutkin. Imbalanced decision hierarchy in addicts emerging from drug-hijacked dopamine spiraling circuit. PLoS ONE, 8(4) :e61489, 2013. (Cited on pages 3 and 199.)
  558. M. Keramati and B.S. Gutkin. A reinforcement learning theory for homeostatic regulation. In J. Shawe-Taylor, R.S. Zemel, P.L. Bartlett, F.C.N. Pereira, and K.Q. Weinberger, editors, NIPS, pages 82-90, 2011. (Cited on page 199.)
  559. M. Keramati, A. Dezfouli, and P. Piray. Speed/accuracy trade-off between the habitual and the goal-directed processes. PLoS Computational Biology, 7(5) :e1002055, 2011. (Cited on pages 7, 195, and 196.)
  560. M. Khamassi. Complementary roles of the rat prefrontal cortex and striatum in reward-based learning and shifting navigation strategies. PhD thesis, Université Pierre et Marie Curie, 2007. (Cited on pages 5, 15, and 201.)
  561. M. Khamassi and M.D. Humphries. Integrating cortico-limbic-basal ganglia architectures for learning model-based and model-free navigation strategies. Frontiers in Behavioral Neuroscience, 6 :79, 2012. (Cited on pages 4, 8, 17, 19, 191, 200, 201, and 202.)
  562. M. Khamassi, L. Lacheze, B. Girard, A. Berthoz, and A. Guillot. Actor-critic models of reinforcement learning in the basal ganglia : from natural to artificial rats. Adaptive Behavior, 13 :131-148, 2005. (Cited on pages 5 and 10.)
  563. M. Khamassi, L.-E. Martinet, and A. Guillot. Combining self-organizing maps with mixtures of experts : application to an actor-critic model of reinforcement learning in the basal ganglia. In From Animals to Animats 9, LNAI 4095, pages 394-405. Berlin, Heidelberg : Springer-Verlag, 2006. (Cited on pages 5, 10, and 205.)
  564. M. Khamassi, A.B. Mulder, E. Tabuchi, V. Douchamps, and S.I. Wiener. Anticipatory reward signals in ventral striatal neurons of behaving rats. European Journal of Neuroscience, 28 :1849-1866, 2008. (Cited on page 5.)
  565. M. Khamassi, S. Lallée, P. Enel, E. Procyk, and P.F. Dominey. Robot cognitive control with a neurophysiologically inspired reinforcement learning model. Frontiers in Neurorobotics, 5 :1, 2011a. (Cited on page 10.)
  566. M. Khamassi, C. Wilson, R. Rothé, R. Quilodran, P.F. Dominey, and E. Procyk. Meta-learning, cognitive control, and physiological interactions between medial and lateral prefrontal cortex. In R.B. Mars, J. Sallet, M.F.S. Rushworth, and N. Yeung, editors, Neural Basis of Motivational and Cognitive Control, pages 351-370. Cambridge, MA : MIT Press, 2011b. (Cited on pages 9 and 202.)
  568. M. Khamassi, P. Enel, P.F. Dominey, and E. Procyk. Medial prefrontal cortex and the adaptive regulation of reinforcement learning parameters. Progress in Brain Research, 202 :441-464, 2013. (Cited on pages 9 and 202.)
  569. M. Khamassi, R. Quilodran, P. Enel, P.F. Dominey, and E. Procyk. Behavioral regulation and the modulation of information coding in the lateral prefrontal and cingulate cortex. Cerebral Cortex, 2014. In press. (Cited on pages 5, 10, 17, 69, 193, and 202.)
  570. S. Killcross and E. Coutureau. Coordination of actions and habits in the medial prefrontal cortex of rats. Cerebral Cortex, 13 :400-408, 2003. (Cited on page 7.)
  571. J. Kober and J. Peters. Policy search for motor primitives in robotics. Machine Learning, 84 :171-203, 2011. (Cited on page 11.)
  572. E. Koechlin and C. Summerfield. An information theoretical approach to prefrontal executive function. Trends in Cognitive Sciences, 11(6) :229-235, 2007. (Cited on page 9.)
  573. E. Koechlin, C. Ody, and F. Kouneiher. The architecture of cognitive control in the human prefrontal cortex. Science, 302(5648) :1181-1185, 2003. (Cited on page 6.)
  574. J.L. Krichmar and G.M. Edelman. Machine psychology : Autonomous behavior, perceptual categorization, and conditioning in a brain-based device. Cerebral Cortex, 12 :818-830, 2002. (Cited on page 10.)
  575. J.L. Krichmar, A.K. Seth, D.A. Nitz, J.G. Fleischer, and G.M. Edelman. Spatial navigation and causal analysis in a brain-based device modeling cortical-hippocampal interactions. Neuroinformatics, 3(3) :147-169, 2005. (Cited on page 13.)
  576. S. Lammel, A. Hetzel, O. Hackel, I. Jones, B. Liss, and J. Roeper. Unique properties of mesoprefrontal neurons within a dual mesocorticolimbic dopamine system. Neuron, 57 :760-773, 2008. (Cited on pages 194 and 203.)
  577. N. Lavesson and P. Davidsson. Quantifying the impact of learning algorithm parameter tuning. AAAI National Conference on Artificial Intelligence, 21(1) :395-400, 2006. (Cited on page 8.)
  578. S. Lemaignan, R. Ros, E.A. Sisbot, R. Alami, and M. Beetz. Grounding the interaction : anchoring situated discourse in everyday human-robot interaction. International Journal of Social Robotics, 4(2) :181-199, 2012. (Cited on pages vii, 197, and 198.)
  579. N. Lepora, P. Verschure, and T. Prescott. The state of the art in biomimetics. Bioinspiration and Biomimetics, 8 :013001, 2013. (Cited on page 13.)
  580. F. Lesaint, O. Sigaud, S.B. Flagel, T.E. Robinson, and M. Khamassi. Modelling individual differences observed in Pavlovian autoshaping in rats using a dual learning systems approach and factored representations. PLoS Computational Biology, 10(2) :e1003466, 2014. (Cited on pages 8, 17, 19, and 192.)
  581. F. Lesaint, O. Sigaud, J.J. Clark, S.B. Flagel, and M. Khamassi. Experimental predictions drawn from a computational model of sign-trackers and goal-trackers. Journal of Physiology Paris, submitted. (Cited on pages 18 and 192.)
  582. J. Liénard and B. Girard. A biologically constrained model of the whole basal ganglia addressing the paradoxes of connections and selection. Journal of Computational Neuroscience, pages 1-24, 2013. (Cited on page 193.)
  583. M. Likhachev, M. Kaess, and R.C. Arkin. Learning behavioral parameterization using spatio-temporal case-based reasoning. In Proceedings of the IEEE International Conference on Robotics and Automation, ICRA'02, pages 1282-1289, 2002. (Cited on page 11.)
  584. M. Lungarella, G. Metta, R. Pfeifer, and G. Sandini. Developmental robotics : a survey. Connection Science, 15(4) :151-190, 2004. (Cited on pages 13 and 203.)
  585. T. Maia and M.J. Frank. From reinforcement learning models to psychiatric and neurological disorders. Nature Neuroscience, 14(2) :154-162, 2011. (Cited on page 3.)
  586. A. Marchand, V. Fresno, M. Khamassi, and E. Coutureau. Dopaminergic modulation of the exploration level in a non-stationary probabilistic task. In FENS Abstract, 2014. (Cited on page 195.)
  587. L.-E. Martinet, D. Sheynikhovich, K. Benchenane, and A. Arleo. Spatial learning and action planning in a prefrontal cortical network model. PLoS Computational Biology, 7(5) :e1002045, 2011. (Cited on page 3.)
  588. M. Matsumoto and O. Hikosaka. Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature, 459(7248) :837-841, 2009. (Cited on pages 5, 194, and 203.)
  589. S.M. McClure, N.D. Daw, and P.R. Montague. A computational substrate for incentive salience. Trends in Neurosciences, 26(8) :423-428, 2003. (Cited on page 203.)
  590. J.-A. Meyer and A. Guillot. Biologically-inspired robots. In B. Siciliano and O. Khatib, editors, Handbook of Robotics, pages 1395-1422. Springer-Verlag, Berlin, 2008. (Cited on page 12.)
  591. M. Milford and G. Wyeth. Persistent navigation and mapping using a biologically inspired slam system. The International Journal of Robotics Research, 29(9) :1131-1153, 2010. (Cited on page 13.)
  592. E.K. Miller and J.D. Cohen. An integrative theory of prefrontal cortex function. Annual Review of Neuroscience, 24 :167-202, 2001. (Cited on page 9.)
  593. J. Minguez, F. Lamiraux, and J.P. Laumond. Motion planning and obstacle avoidance. In B. Siciliano and O. Khatib, editors, Handbook of Robotics, pages 827-852. Springer-Verlag, 2008. (Cited on page 11.)
  594. B. Mitchinson, M.J. Pearson, A.G. Pipe, and T.J. Prescott. Biomimetic robots as scientific models : A view from the whisker tip, 2011. (Cited on page 13.)
  595. M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. FastSLAM : A factored solution to the simultaneous localization and mapping problem. In Proceedings of the AAAI National Conference on Artificial Intelligence, pages 593-598, 2002. (Cited on page 12.)
  596. J. Morimoto and K. Doya. Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning. Robotics and Autonomous Systems, 36 :37-51, 2001. (Cited on page 10.)
  597. G. Morris, A. Nevet, D. Arkadir, E. Vaadia, and H. Bergman. Midbrain dopamine neurons encode decisions for future action. Nature Neuroscience, 9(8) :1057-1063, 2006. (Cited on pages 5 and 203.)
  598. P. Moutarlier and R. Chatila. Stochastic multisensory data fusion for mobile robot location and environment modeling. 5th International Symposium on Robotics Research, pages 85-94, 1985. (Cited on page 12.)
  599. S. N'Guyen, P. Pirim, and J.-A. Meyer. Texture discrimination with artificial whiskers in the robot-rat Psikharpax. In Biomedical Engineering Systems and Technologies : Third International Joint Conference, BIOSTEC 2010, pages 127-152. Valencia, Spain, 2011. (Cited on page 13.)
  600. Y. Niv, M.O. Duff, and P. Dayan. Dopamine, uncertainty and TD learning. Behavioral and Brain Functions, 1 :6, 2005. (Cited on page 203.)
  601. J. O'Doherty, P. Dayan, J. Schultz, R. Deichmann, K. Friston, and R.J. Do- lan. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science, 304 :452-454, 2004. (Cited on page 5.)
  602. P.-Y. Oudeyer and F. Kaplan. What is intrinsic motivation ? A typology of computational approaches. Frontiers in Neurorobotics, 1 :6, 2007. (Cited on page 13.)
  603. S. Palminteri, M. Lebreton, Y. Worbe, D. Grabli, A. Hartmann, and M. Pessiglione. Pharmacological modulation of subliminal learning in Parkinson's and Tourette's syndromes. Proceedings of the National Academy of Sciences, 106(45) :19179-19184, 2009. (Cited on page 5.)
  604. A. Parent and L.N. Hazrati. Functional anatomy of the basal ganglia. I. The cortico-basal ganglia-thalamo-cortical loop. Brain Research Reviews, 20(1) :91-127, 1995a. (Cited on page 193.)
  605. A. Parent and L.N. Hazrati. Functional anatomy of the basal ganglia. II. The place of subthalamic nucleus and external pallidum in basal ganglia circuitry. Brain Research Reviews, 20(1) :128-154, 1995b. (Cited on page 193.)
  606. J.M. Pearce, A.D. Roberts, and M. Good. Hippocampal lesions disrupt navigation based on cognitive maps but not heading vectors. Nature, 396(6706) :75-77, 1998. (Cited on page 201.)
  607. M. Pessiglione, B. Seymour, G. Flandin, R.J. Dolan, and C.D. Frith. Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature, 442(7106) :1042-1045, 2006. (Cited on page 5.)
  608. J. Peters and S. Schaal. Policy gradient methods for robotics. In Proceedings of the IEEE International Conference on Intelligent Robotics Systems (IROS), pages 2219-2225. 2006. (Cited on page 7.)
  609. J. Peters and S. Schaal. Reinforcement learning of motor skills with policy gradients. Neural networks, 21(4) :682-697, 2008. (Cited on pages 11 and 205.)
  610. A. Peyrache, M. Khamassi, K. Benchenane, S.I. Wiener, and F.P. Battaglia. Replay of rule-learning related neural patterns in the prefrontal cortex during sleep. Nature Neuroscience, 12(7) :919-926, 2009. (Cited on pages 9, 147, and 202.)
  611. A. Peyrache, K. Benchenane, M. Khamassi, S.I. Wiener, and F.P. Battaglia. Principal component analysis of ensemble recordings reveals cell assemblies at high temporal resolution. Journal of Computational Neuroscience, 29(1-2) :309-325, 2010a. (Cited on page 9.)
  612. A. Peyrache, K. Benchenane, M. Khamassi, S.I. Wiener, and F.P. Battaglia. Sequential reinstatement of neocortical activity during slow oscillations depends on cells' global activity. Frontiers in Systems Neuroscience, 3 :18, 2010b. (Cited on page 9.)
  613. R. Pfeifer, M. Lungarella, and F. Iida. Self-organization, embodiment, and biologically inspired robotics. Science, 318 :1088-1093, 2007. (Cited on page 12.)
  614. P. Redgrave, T.J. Prescott, and K. Gurney. The basal ganglia : a vertebrate solution to the selection problem ? Neuroscience, 89(4) :1009-1023, 1999. (Cited on page 5.)
  615. E. Renaudo, B. Girard, R. Chatila, and M. Khamassi. Design of a decision architecture for habit learning in robots. In A. Duff, N.F. Lepora, A. Mura, T.J. Prescott, and P.F.M.J. Verschure, editors, 3rd Living Machines Conference, Lecture Notes in Artificial Intelligence 8608, pages 249-260. Springer, 2014. (Cited on pages 18, 147, and 196.)
  617. R.A. Rescorla and A.R. Wagner. A theory of Pavlovian conditioning : variations in the effectiveness of reinforcement and nonreinforcement. In A.H. Black and W.F. Prokasy, editors, Classical Conditioning II : Current Research and Theory, pages 64-99. Appleton-Century-Crofts, New York, 1972. (Cited on page 4.)
  618. J.N. Reynolds, B.I. Hyland, and J.R. Wickens. A cellular mechanism of reward-related learning. Nature, 413 :67-70, 2001. (Cited on page 5.)
  619. L. Rigoux and E. Guigon. A model of reward- and effort-based optimal decision making and motor control. PLoS Computational Biology, 8(10) :e1002716, 2012. (Cited on page 205.)
  620. M.R. Roesch, D.J. Calu, and G. Schoenbaum. Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nature Neuroscience, 10(12) :1615-1624, 2007. (Cited on pages 5 and 203.)
  621. L. Rondi-Reig, G.H. Petit, C. Tobin, S. Tonegawa, J. Mariani, and A. Berthoz. Impaired sequential egocentric and allocentric memories in forebrain-specific-NMDA receptor knock-out mice during a new task dissociating strategies of navigation. The Journal of Neuroscience, 26(15) :4071-4081, 2006. (Cited on page 202.)
  622. P. Rosenbloom, J. Laird, and A. Newell, editors. The Soar Papers : Research on Integrated Intelligence. MIT Press, Cambridge, Massachusetts, 1993. (Cited on page 200.)
  623. M.F. Rushworth and T.E. Behrens. Choice, uncertainty and value in prefrontal and cingulate cortex. Nature Neuroscience, 11(4) :389-397, 2008. (Cited on pages 5 and 203.)
  624. K. Samejima and K. Doya. Multiple representations of belief states and action values in corticobasal ganglia loops. Annals of the New York Academy of Sciences, 1104 :213-228, 2007. (Cited on pages 9 and 192.)
  625. K. Samejima, Y. Ueda, K. Doya, and M. Kimura. Representation of action-specific reward values in the striatum. Science, 310 :1337-1340, 2005. (Cited on page 5.)
  626. J. Schmidhuber, J. Zhao, and N. Schraudolph. Reinforcement learning with self-modifying policies. In Learning to Learn, pages 293-309. Boston : Kluwer, 1997. (Cited on page 8.)
  627. W. Schultz. Introduction. Neuroeconomics : the promise and the profit. Philosophical Transactions of the Royal Society B, 363 :3767-3769, 2008. (Cited on page 203.)
  628. W. Schultz, P. Dayan, and P.R. Montague. A neural substrate of prediction and reward. Science, 275(5306) :1593-1599, 1997. (Cited on pages 5 and 203.)
  630. N. Schweighofer and K. Doya. Meta-learning in reinforcement learning. Neural Networks, 16(1) :5-9, 2003. (Cited on page 10.)
  631. W. Shen, M. Flajolet, P. Greengard, and D.J. Surmeier. Dichotomous dopaminergic control of striatal synaptic plasticity. Science, 321 :848-851, 2008. (Cited on page 5.)
  632. B. Siciliano and O. Khatib, editors. Handbook of Robotics. Springer-Verlag, Berlin, 2008. (Cited on page 10.)
  633. O. Sigaud and J. Peters. From motor learning to interaction learning in robots. In O. Sigaud and J. Peters, editors, From Motor Learning to Interaction Learning in Robots, pages 1-12. Springer-Verlag, 2010. (Cited on page 11.)
  634. O. Sigaud, C. Salaun, and V. Padois. On-line regression algorithms for learning mechanical models of robots : a survey. Robotics and Autonomous Systems, 59(12) :1115-1129, 2011. (Cited on page 205.)
  635. W.D. Smart and L.P. Kaelbling. Effective reinforcement learning for mobile robots. In Proceedings of the IEEE International Conference on Robotics and Automation, ICRA'02, pages 3404-3410. 2002. (Cited on page 10.)
  636. R. Smith, M. Self, and P. Cheeseman. Estimating uncertain spatial relationships in robotics. In I. Cox and G. Wilfong, editors, Autonomous Robot Vehicles, pages 167-193. Springer-Verlag, 1990. (Cited on page 12.)
  637. F. Stulp and O. Sigaud. Robot skill learning : From reinforcement learning to evolution strategies. Paladyn Journal of Behavioral Robotics, 4(1) :49-61, 2013. (Cited on page 11.)
  638. R.S. Sutton and A.G. Barto. Reinforcement Learning : An Introduction. Cambridge, MA : MIT Press, 1998. (Cited on pages 4, 6, and 9.)
  639. Y.K. Takahashi, M.R. Roesch, R.C. Wilson, K. Toreson, P. O'Donnell, Y. Niv, and G. Schoenbaum. Expectancy-related changes in firing of dopamine neurons depend on orbitofrontal cortex. Nature Neuroscience, 14(12) :1590-1597, 2011. (Cited on page 203.)
  640. S.C. Tanaka, K. Doya, G. Okada, K. Ueda, Y. Okamoto, and S. Yamawaki. Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops. Nature Neuroscience, 7(8) :887-893, 2004. (Cited on page 10.)
  641. M.A.A. van der Meer, Z. Kurth-Nelson, and A.D. Redish. Information processing in decision-making systems. The Neuroscientist, XX(X) :1-18, 2012. (Cited on page 192.)
  642. M.A.A. van der Meer and A.D. Redish. Ventral striatum : a critical look at models of learning and evaluation. Current Opinion in Neurobiology, 21(3) :387-392, 2011. (Cited on page 5.)
  643. H. van Hasselt and M. Wiering. Reinforcement learning in continuous action spaces. In IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, pages 272-279. 2007. (Cited on page 7.)
  644. R. Volpe, I. Nesnas, T. Estlin, D. Mutz, R. Petras, and H. Das. The CLARAty architecture for robotic autonomy. In Proceedings of the 2001 IEEE Aerospace Conference, pages 39-46, 2001. (Cited on pages 11 and 200.)
  645. H.H. Yin and B.J. Knowlton. The role of the basal ganglia in habit formation. Nature Reviews Neuroscience, 7(6) :464-476, 2006. (Cited on pages 5, 6, and 195.)
  646. H.H. Yin, S.B. Ostlund, and B.W. Balleine. Reward-guided learning beyond dopamine in the nucleus accumbens : the integrative functions of cortico-basal ganglia networks. European Journal of Neuroscience, 28(8) :1437-1448, 2008. (Cited on page 5.)
This document was prepared with the TeXworks text editor and the LaTeX 2ε typesetting system.