Papers by Sylvain Calinon

In minimally invasive surgery, tools go through narrow openings and manipulate soft organs to perform surgical tasks. Current robot-assisted surgical systems are limited by the rigidity of their tools. The aim of the STIFF-FLOP European project is to develop a soft robotic arm to perform surgical tasks. The flexibility of the robot allows the surgeon to move within organs to reach remote areas inside the body and perform challenging procedures in laparoscopy. This article addresses the problem of designing learning interfaces that enable the transfer of skills from human demonstration. Robot programming by demonstration encompasses a wide range of learning strategies, from simple mimicking of the demonstrator's actions to higher-level imitation of the underlying intent extracted from the demonstrations. Focusing on this latter form, we study the problem of extracting an objective function explaining the demonstrations from an over-specified set of candidate reward functions, and of using this information for self-refinement of the skill. In contrast to inverse reinforcement learning strategies that attempt to explain the observations with reward functions defined for the entire task (or a set of pre-defined reward profiles active for different parts of the task), the proposed approach is based on context-dependent reward-weighted learning, where the robot learns the relevance of candidate objective functions with respect to the current phase of the task or the encountered situation. The robot then exploits this information to refine the skill in the policy parameter space. The proposed approach is tested in simulation with a cutting task performed by the STIFF-FLOP flexible robot, using kinesthetic demonstrations from a Barrett WAM manipulator.
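
As an illustration of the context-dependent reward-weighted update described above, here is a minimal sketch in Python. All names (refine_policy, the candidate objective callables) are hypothetical, and the relevance weighting is a simplified stand-in for the paper's scheme; it assumes the candidate objectives return non-negative rewards.

```python
import numpy as np

def refine_policy(theta, sigma, candidates, n_rollouts=20, beta=5.0):
    """One iteration of context-dependent reward-weighted refinement (sketch).

    theta:      (D,) current policy parameters
    sigma:      exploration noise standard deviation
    candidates: list of callables, each scoring a rollout's parameters under
                one candidate objective (assumed to return non-negative values)
    """
    samples = theta + sigma * np.random.randn(n_rollouts, len(theta))
    # (K, N): reward of every rollout under every candidate objective
    R = np.array([[c(s) for s in samples] for c in candidates])
    # relevance of each candidate for the current phase, here crudely
    # approximated by its mean reward over the rollouts
    relevance = R.mean(axis=1)
    relevance = relevance / relevance.sum()
    r = relevance @ R                     # combined, context-weighted reward
    w = np.exp(beta * (r - r.max()))      # exponentiate for reward-weighting
    w /= w.sum()
    return w @ samples                    # reward-weighted mean of the samples
```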

In robotics, the ultimate goal of reinforcement learning is to endow robots with the ability to learn, improve, adapt and reproduce tasks with dynamically changing constraints, based on exploration and autonomous learning. We give a summary of the state of the art of reinforcement learning in the context of robotics, in terms of both algorithms and policy representations. Numerous challenges faced by the policy representation in robotics are identified. Three recent examples of the application of reinforcement learning to real-world robots are described: a pancake flipping task, a bipedal walking energy minimization task and an archery-based aiming task. In all examples, a state-of-the-art expectation-maximization-based reinforcement learning algorithm is used, and different policy representations are proposed and evaluated for each task. The proposed policy representations offer viable solutions to six rarely addressed challenges in policy representations: correlations, adaptability, multi-resolution, globality, multi-dimensionality and convergence. Both the successes and the practical difficulties encountered in these examples are discussed. Based on insights from these particular cases, conclusions are drawn about the state of the art and future directions for reinforcement learning in robotics.
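
The three examples share an expectation-maximization-based policy search. The following is a PoWER-style sketch of such a reward-weighted update, not the authors' exact algorithm; the function names and the toy return are hypothetical.

```python
import numpy as np

def em_rl_step(theta, Sigma, rollout_return, n_samples=15):
    """One EM-style policy improvement step: sample perturbations, weight them
    by their return, and move the mean toward the reward-weighted average."""
    eps = np.random.multivariate_normal(np.zeros(len(theta)), Sigma, n_samples)
    returns = np.array([rollout_return(theta + e) for e in eps])
    w = returns - returns.min() + 1e-9    # shift so all weights are positive
    w /= w.sum()
    return theta + w @ eps                # reward-weighted parameter update

# usage sketch: maximize a toy quadratic "return"
reward = lambda th: -np.sum((th - 1.0) ** 2)
theta = np.zeros(3)
Sigma = 0.1 * np.eye(3)
for _ in range(100):
    theta = em_rl_step(theta, Sigma, reward)
print(theta)  # approaches [1, 1, 1]
```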

When collaboration between human users and robots involves physical interaction, safety becomes a central issue. We propose a method to transfer to robots several tasks demonstrated by the user through kinesthetic teaching and subsequently learned using a weighted combination of dynamical systems (DS). The approach used to encode the desired skills ensures a safe robot behavior during task reproduction, allowing physical interaction with the user, who can employ the manipulator as a tangible interface. By using a force-sensorless impedance controller with a back-drivable robot, this concept is exploited in two physical human-robot interaction (pHRI) scenarios. The first considers an emergency situation in which the user can stop or pause a task execution by grasping the robot and moving it away from the region of space associated with the skill. The second studies the possibility of selecting one among several learned tasks and switching to its execution by physically guiding the robot towards the task region.
We study the use of different weighting mechanisms in robot learning to represent a movement as a combination of linear systems. Kinesthetic teaching is used to acquire a skill from demonstrations, which the robot then reproduces. The behaviors of the systems are analyzed when the robot faces perturbations introduced by a user who physically interacts with the robot to momentarily stop the task. We propose a hidden semi-Markov model (HSMM) representation to encapsulate duration and position information in a robust manner, with a parameterization of the involvement of time and space constraints. The approach is tested in simulation and in two robot experiments, where a 7-DOF manipulator is taught to play a melody by pressing three big keys and to pull a model train along its track.
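
A hidden semi-Markov model differs from a standard HMM in that state durations are modeled explicitly rather than through the geometric decay implied by self-transitions. Below is an illustrative sketch of sampling a state sequence under Gaussian duration models; the names are hypothetical and this is not the paper's implementation.

```python
import numpy as np

def sample_hsmm_states(trans, dur_mu, dur_sigma, T):
    """Sample a state sequence from an HSMM with explicit Gaussian duration
    models for each state.

    trans:     (K, K) transition matrix with zero diagonal (no self-transitions)
    dur_mu:    (K,) mean dwell time of each state, in time steps
    dur_sigma: (K,) standard deviation of each state's dwell time
    """
    states, s, t = [], 0, 0
    while t < T:
        # draw an explicit duration instead of decaying geometrically
        d = max(1, int(round(np.random.randn() * dur_sigma[s] + dur_mu[s])))
        states.extend([s] * d)
        t += d
        s = np.random.choice(len(dur_mu), p=trans[s])
    return np.array(states[:T])
```
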
We present a learning-based approach for minimizing the electric energy consumption during walking of a passively compliant bipedal robot. The energy consumption is reduced by learning a varying-height center-of-mass trajectory that efficiently exploits the robot's passive compliance. To do this, we propose a reinforcement learning method that evolves the policy parameterization dynamically during the learning process and thus manages to find better policies faster than with a fixed parameterization. The method is first tested on a function approximation task, and then applied to the humanoid robot COMAN, where it achieves significant energy reduction.
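
The key idea, a policy parameterization that evolves during learning, can be sketched as follows: start with a coarse radial-basis encoding and periodically double its resolution, initializing the new parameters by interpolation so the represented trajectory is approximately preserved. This is a simplification with hypothetical names, using plain hill-climbing in place of the paper's RL machinery.

```python
import numpy as np

def grow(weights, centers):
    """Double the resolution of an RBF-encoded policy: insert a midpoint basis
    between each neighboring pair, initializing its weight by interpolation so
    the represented trajectory is approximately preserved."""
    mid_c = (centers[:-1] + centers[1:]) / 2
    mid_w = (weights[:-1] + weights[1:]) / 2
    new_c = np.empty(2 * len(centers) - 1)
    new_w = np.empty(2 * len(weights) - 1)
    new_c[0::2], new_c[1::2] = centers, mid_c
    new_w[0::2], new_w[1::2] = weights, mid_w
    return new_w, new_c

def evolving_search(weights, centers, ret, iters=60, grow_every=20, noise=0.05):
    """Hill-climb in the current parameterization, periodically enriching it so
    the search starts coarse and becomes progressively finer."""
    for it in range(iters):
        cand = weights + noise * np.random.randn(len(weights))
        if ret(cand, centers) > ret(weights, centers):
            weights = cand
        if (it + 1) % grow_every == 0:
            weights, centers = grow(weights, centers)
    return weights, centers
```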

The availability of haptic interfaces in music content processing offers interesting possibilities of performer-instrument interaction for musical expression. These new musical instruments can precisely modulate haptic feedback and map it to a sonic output, thus offering new possibilities for artistic content creation. In this article, we investigate the use of a robotic arm as a bidirectional tangible interface for musical expression, actively modifying the compliant control strategy to create a coupling between gestural input and music output. The user can define recursive modulations of music parameters by grasping and gradually refining periodic movements on a gravity-compensated robot manipulator. The robot learns the new desired trajectory online, increasing its stiffness as the modulation refinement proceeds. This article reports early results of an artistic performance carried out in collaboration with a musician, who played with the robot as part of his live stage setup.
We present an integrated approach allowing a free-standing humanoid robot to acquire new motor skills by kinesthetic teaching. The proposed method simultaneously controls the upper and lower body of the robot with different control strategies. Imitation learning is used to train the upper body of the humanoid robot via kinesthetic teaching, while the Reaction Null Space method keeps the robot balanced. During demonstration, a force/torque sensor records the exerted forces; during reproduction, a hybrid position/force controller applies the learned trajectories, in terms of positions and forces, to the end-effector. The proposed method is tested on a 25-DOF Fujitsu HOAP-2 humanoid robot with a surface cleaning task.

Our research focuses on exploring new modalities to make robots acquire skills in a fast and user-friendly manner. In this work, we present a novel active interface with perception and projection capabilities for simplifying the skill transfer process. The interface allows humans and robots to interact with each other in the same environment through visual feedback. During the learning process, the real workspace is used as a tangible interface that helps the user better understand what the robot has learned so far, displays information about the task, and provides feedback and guidance. The user can thus incrementally visualize and assess the learner's state while focusing on the skill transfer, without disrupting the continuity of the teaching interaction. We also propose a proof of concept, as a core element of the architecture, based on an experimental setup in which a pico-projector and an RGB-depth sensor are mounted on the end-effector of a 7-DOF robotic arm.

Research in learning from demonstration has focused on transferring movements from humans to robots. However, a need is arising for robots that do not just replicate tasks on their own, but that also interact with humans in a safe and natural way to accomplish tasks cooperatively. Robots with variable impedance capabilities open the door to new challenging applications, where the learning algorithms must be extended to encapsulate force and vision information. In this paper we propose a framework to transfer impedance-based behaviors to a torque-controlled robot by kinesthetic teaching. The proposed model encodes the examples as a task-parameterized statistical dynamical system, where the robot impedance is shaped by estimating virtual stiffness matrices from the set of demonstrations. A collaborative assembly task is used as a testbed. The results show that the model can be used to modify the robot impedance along the task execution to facilitate the collaboration, triggering stiff and compliant behaviors in an online manner to adapt to the user's actions.

Figure 1: Top: two humans assembling a wooden table. Bottom: demonstration (left) and reproduction (right) of the impedance-based behavior.
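
One common way to shape virtual stiffness from demonstrations is to make the robot stiff in directions where the demonstrations show little variance and compliant elsewhere. The sketch below uses this heuristic as an illustrative stand-in for the paper's estimation scheme; all names and gain bounds are hypothetical.

```python
import numpy as np

def stiffness_from_demos(demos, k_min=50.0, k_max=500.0):
    """Map demonstration variability to a virtual stiffness matrix: eigen-
    directions with low variance across demonstrations get gains near k_max,
    high-variance directions stay compliant near k_min.

    demos: (N, 3) end-effector positions observed at one phase of the task
    """
    cov = np.cov(demos, rowvar=False) + 1e-6 * np.eye(3)
    vals, vecs = np.linalg.eigh(np.linalg.inv(cov))   # precision eigenstructure
    span = vals.max() - vals.min() + 1e-9
    gains = k_min + (k_max - k_min) * (vals - vals.min()) / span
    return vecs @ np.diag(gains) @ vecs.T             # symmetric PD stiffness
```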

The democratization of robotics technology and the development of new actuators progressively bring robots closer to humans. The applications that can now be envisaged contrast drastically with the requirements of industrial robots. In standard manufacturing settings, the criteria used to assess performance are usually related to the robot's accuracy, repeatability, speed or stiffness. Learning a control policy for such robots is characterized by the search for a single solution to the task, with a policy representation consisting of moving the robot through a set of points along a trajectory. With new environments such as homes and offices populated with humans, reproduction performance is judged differently. These robots are expected to acquire rich motor skills that generalize to new situations, while behaving safely in the vicinity of users. Skill acquisition can no longer be guided by a single form of learning, and must instead combine different approaches to continuously create, adapt and refine policies. The family of search strategies based on expectation-maximization (EM) looks particularly promising for coping with these new requirements. The exploration can be performed directly in the policy parameter space, by refining the policy together with exploration parameters represented in the form of covariances. With this formulation, RL can be extended to a multi-optima search problem in which several policy alternatives can be considered. We present here two applications exploiting EM-based exploration strategies, by considering parameterized policies based on dynamical systems, and by using Gaussian mixture models for the search of multiple policy alternatives.
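
A minimal sketch of the EM-based exploration idea, refining the policy mean together with the exploration covariance so that exploration contracts along directions that are already optimized. Names are hypothetical and the update is simplified relative to the strategies described above.

```python
import numpy as np

def rw_gaussian_update(samples, returns, beta=5.0):
    """Reward-weighted update of the policy mean and exploration covariance:
    directions in parameter space that no longer improve the return receive
    shrinking exploration noise (sketch)."""
    w = np.exp(beta * (returns - returns.max()))
    w /= w.sum()
    mu = w @ samples                                  # new policy mean
    diff = samples - mu
    Sigma = diff.T @ (w[:, None] * diff) + 1e-6 * np.eye(samples.shape[1])
    return mu, Sigma                                  # next exploration distribution
```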

Learning by imitation in humanoids is challenging due to the unpredictable environments these robots have to face during reproduction. Two sets of tools are relevant for this purpose: 1) probabilistic machine learning methods that can extract and exploit the regularities and important features of the task; and 2) dynamical systems that can cope with perturbations in real time without having to replan the whole movement. We present a learning by imitation approach combining these two benefits. It is based on a superposition of virtual spring-damper systems to drive a humanoid robot's movement. The method relies on a statistical description of the springs' attractor points acting in different candidate frames of reference. It extends dynamic movement primitive models by formulating the estimation of the dynamical systems' parameters as a Gaussian mixture regression problem with projections in different coordinate systems. The robot exploits the local variability information extracted from multiple demonstrations to determine which frames are relevant for the task, and how the movement should be modulated with respect to these frames. The approach is tested on the new prototype of the COMAN compliant humanoid with time-based and time-invariant movements, including bimanual coordination skills.
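
The fusion of attractor information across candidate frames can be expressed as a product of Gaussians: each frame maps its local attractor distribution into world coordinates, and frames with low local variance dominate the result. A sketch of this step, with hypothetical names; the full task-parameterized model involves more machinery.

```python
import numpy as np

def fuse_frames(mus, sigmas, A_list, b_list):
    """Product of Gaussians over candidate frames: frame p holds a local
    attractor Gaussian (mu_p, Sigma_p), mapped into world coordinates via
    x = A_p @ mu_p + b_p; precisions add, so low-variance (relevant) frames
    dominate the fused attractor."""
    dim = len(b_list[0])
    P, Pmu = np.zeros((dim, dim)), np.zeros(dim)
    for mu, Sig, A, b in zip(mus, sigmas, A_list, b_list):
        mu_w = A @ mu + b                 # local mean in world coordinates
        Sig_w = A @ Sig @ A.T             # local covariance in world coordinates
        Lam = np.linalg.inv(Sig_w)        # precision = confidence of this frame
        P += Lam
        Pmu += Lam @ mu_w
    Sigma_hat = np.linalg.inv(P)
    return Sigma_hat @ Pmu, Sigma_hat     # fused mean and covariance
```
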
Skills can often be performed in many different ways. In order to provide robots with human-like adaptation capabilities, it is of great interest to learn several ways of achieving the same skill in parallel, since changes in the environment or in the robot can make some solutions unfeasible. In this case, the knowledge of multiple solutions can avoid having to relearn the task. This problem is addressed in this paper within the framework of reinforcement learning, as the automatic determination of multiple optimal parameterized policies. For this purpose, a model handling a variable number of policies is built using a Bayesian non-parametric approach. The algorithm is first compared to single-policy algorithms on known benchmarks. It is then applied to a typical robotic problem presenting multiple solutions.
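
As a rough intuition for maintaining a variable number of policies, the following DP-means-style sketch assigns a promising sample to the nearest existing policy component, or spawns a new component when none is close. This is only an illustrative stand-in for the paper's Bayesian non-parametric model; all names and the threshold tau are hypothetical.

```python
import numpy as np

def assign_or_spawn(sample, means, tau=1.0):
    """Assign a promising policy sample to the nearest existing component, or
    spawn a new component when the sample lies farther than tau from all of
    them. Returns the updated component means and the chosen index."""
    if not means:
        return [np.asarray(sample)], 0
    d = [np.linalg.norm(sample - m) for m in means]
    k = int(np.argmin(d))
    if d[k] > tau:
        means.append(np.asarray(sample))  # a new way of achieving the skill
        return means, len(means) - 1
    return means, k
```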

In learning by exploration problems such as reinforcement learning (RL), direct policy search, stochastic optimization or evolutionary computation, the goal of an agent is to maximize some form of reward function (or minimize a cost function). Often, these algorithms are designed to find a single policy solution. We address the problem of representing the space of control policy solutions by treating exploration as a density estimation problem. Such a representation provides additional information, such as the shape and curvature of local peaks, that can be exploited to analyze the discovered solutions and guide the exploration. We show that the search process can easily be generalized to multi-peaked distributions by employing a Gaussian mixture model (GMM) with an adaptive number of components. The GMM has a dual role: representing the space of possible control policies and guiding the exploration of new policies. A variation of expectation-maximization (EM) applied to reward-weighted policy parameters is presented to model the space of possible solutions, as if this space were a probability distribution. The approach is tested in a dart game experiment formulated as a black-box optimization problem, where the agent's throwing capability increases while it searches for the best strategy to play the game. This experiment is used to study how the proposed approach can exploit new promising solution alternatives in the search process when the optimality criterion slowly drifts over time. The results show that the proposed multi-optima search approach can anticipate such changes by exploiting promising candidates to smoothly adapt to the change of global optimum.
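
Below is a sketch of one reward-weighted EM iteration on a GMM over policy parameters, where each component simultaneously represents a candidate optimum and an exploration distribution. Names are hypothetical, and the birth/death of components used for the adaptive number of components is omitted.

```python
import numpy as np

def rw_gmm_step(samples, rewards, mus, sigmas, priors, beta=5.0):
    """One reward-weighted EM step on a GMM over policy parameters (sketch).

    samples: (N, D) explored policy parameters; rewards: (N,) their returns
    mus: (K, D); sigmas: (K, D, D); priors: (K,)
    """
    N, D = samples.shape
    K = len(priors)
    w = np.exp(beta * (rewards - rewards.max()))
    w /= w.sum()
    # E-step: responsibilities of each component for each sample
    resp = np.zeros((N, K))
    for k in range(K):
        diff = samples - mus[k]
        inv = np.linalg.inv(sigmas[k])
        maha = np.einsum('ni,ij,nj->n', diff, inv, diff)
        resp[:, k] = priors[k] * np.exp(-0.5 * maha) \
                     / np.sqrt(np.linalg.det(sigmas[k]))
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step, with samples weighted by reward as well as responsibility
    rw = resp * w[:, None]
    for k in range(K):
        nk = rw[:, k].sum() + 1e-12
        mus[k] = rw[:, k] @ samples / nk
        diff = samples - mus[k]
        sigmas[k] = diff.T @ (rw[:, k, None] * diff) / nk + 1e-6 * np.eye(D)
        priors[k] = nk
    priors = priors / priors.sum()
    return mus, sigmas, priors
```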

We present a human-robot interface for projecting information onto arbitrary planar surfaces by sharing a visual understanding of the workspace. A compliant 7-DOF robotic arm endowed with a pico-projector and a depth sensor is used for the experiment. The perceptual capabilities allow the system to detect geometric features of the environment, which are used to superimpose undistorted projections on planar surfaces. In the first phase of the proposed scenario, the user physically interacts with the gravity-compensated robot to choose where the projection will appear. In the second phase, the robotic arm autonomously superimposes visual information in the selected area and actively adapts to perturbations. We also present a proof of concept for managing occlusions and tracking the position of the projection whenever obstacles enter the projection field.

Teaching a humanoid: A user study on learning by demonstration with HOAP-3
Abstract: This article reports the results of a user study investigating the satisfaction of naïve users conducting two learning by demonstration tasks with the HOAP-3 robot. The main goal of this study was to gain insights on how to ensure a successful as well as satisfactory experience for naïve users. The participants performed two tasks: they taught the robot to (1) push a box, and to (2) close a box.
Human-robot teaching experiment, Cogniron Winter School
Using a learning algorithm based on Gaussian Mixture Regression, the task constraints are extracted from several demonstrations. These constraints take the form of desired velocity profiles of the end-effector and joint angle variables. The velocity profiles are then used to modulate a dynamical system that has the reaching target as attractor. In this way, the reaching trajectory can be reshaped to satisfy the constraints of the task, while preserving the adaptability and robustness provided by the dynamical system.
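
A minimal sketch of this modulation, assuming a one-dimensional input-output case: Gaussian mixture regression conditions a joint model over (time, velocity) to produce the desired velocity profile, which is then added to spring-like attractor dynamics. All names and gains are hypothetical.

```python
import numpy as np

def gmr_velocity(t, priors, mu, sigma):
    """Gaussian mixture regression, 1-D input (time) and 1-D output (velocity):
    condition a joint GMM over (t, v) on the input t.
    mu: (K, 2) joint means; sigma: (K, 2, 2) joint covariances."""
    h = np.array([p * np.exp(-0.5 * (t - m[0]) ** 2 / s[0, 0]) / np.sqrt(s[0, 0])
                  for p, m, s in zip(priors, mu, sigma)])
    h /= h.sum()
    return sum(hk * (m[1] + s[1, 0] / s[0, 0] * (t - m[0]))
               for hk, m, s in zip(h, mu, sigma))

def ds_step(x, target, v_des, dt=0.01, alpha=4.0):
    """One Euler step of a dynamical system with the target as attractor,
    modulated by the learned desired velocity v_des: the learned profile
    reshapes the trajectory while the spring term keeps the target attractive."""
    v = alpha * (target - x) + v_des
    return x + dt * v
```
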
Challenges for the policy representation when applying reinforcement learning in robotics
Abstract: A summary of the state-of-the-art reinforcement learning in robotics is given, in terms of both algorithms and policy representations. Numerous challenges faced by the policy representation in robotics are identified. Two recent examples of the application of reinforcement learning to robots are described: a pancake flipping task and a bipedal walking energy minimization task.
Special issue on robot learning by observation, demonstration, and imitation