Papers by Darius Burschka

Software Architecture of a System for Robotic Surgery
At the German Heart Center Munich we have installed and evaluated a novel system for robotic surgery. Its main features are the incorporation of haptics (by means of strain gauge sensors at the instruments) and partial automation of surgical tasks. However, in this paper we focus on the software engineering aspects of the system. We present a hierarchical approach, which is inspired by the modular architecture of the hardware. Each component of the system, and therefore each component of the control software, can easily be exchanged for another instance (e.g. different types of robots may be employed to carry the surgical instruments). All operations are abstracted by an intuitive user interface, which provides a high level of transparency. In addition, we have included techniques known from character animation (so-called key-framing) in order to enable operation of the system by users with non-engineering backgrounds. The introduced concepts have proven effective during an extensive evaluation with 30 surgeons, in which the system was used to conduct simplified operations in the field of heart surgery, including the replacement of a papillary tendon and the occlusion of an atrial septal defect.
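The key-framing idea borrowed from character animation lends itself to a compact illustration. The sketch below is a minimal example under stated assumptions (function names, timing, and poses are invented, not taken from the system): a taught sequence of tool poses is replayed by interpolating between key-frames.

```python
# Illustrative sketch (not the authors' implementation): key-framing adapted
# to replay a taught tool trajectory between recorded poses.
import numpy as np

def interpolate_keyframes(times, poses, t):
    """Linearly interpolate a 3D tool position between taught key-frames.

    times : sorted 1D array of key-frame timestamps
    poses : (N, 3) array of tool positions recorded at those timestamps
    t     : query time, clamped to the taught interval
    """
    t = np.clip(t, times[0], times[-1])
    i = np.searchsorted(times, t, side="right") - 1
    i = min(i, len(times) - 2)
    alpha = (t - times[i]) / (times[i + 1] - times[i])
    return (1.0 - alpha) * poses[i] + alpha * poses[i + 1]

# Example: a taught approach motion replayed at t = 1.5 s
key_times = np.array([0.0, 1.0, 2.0])
key_poses = np.array([[0.0, 0.0, 0.1], [0.0, 0.05, 0.05], [0.0, 0.1, 0.0]])
print(interpolate_keyframes(key_times, key_poses, 1.5))
```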

Proceedings of the International Conference on Computer Vision Theory and Applications, 2013
This paper describes an approach to consistently model and characterize potential object candidates present in non-static scenes. With a stereo camera rig we collect and collate range data from different views around a scene. Three principal procedures support our method: i) the segmentation of the captured range images into 3D clusters or blobs, by which we obtain a first gross impression of the spatial structure of the scene; ii) the maintenance and reliability of the map, which are obtained through the fusion of the captured and mapped data, to which we assign a degree of existence (confidence value); iii) the visual motion estimation of potential object candidates, which, through the combination of texture and 3D-spatial information, allows us not only to update the state of the actors and perceive their changes in a scene, but also to maintain and refine their individual 3D structures over time. The validation of the visual motion estimation is supported by a dual-layered 3D mapping framework in which we are able to store the geometric and abstract properties of the mapped entities or blobs, and determine which entities were moved in order to update the map to the actual scene state.
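A minimal sketch of the "degree of existence" bookkeeping from procedure ii); the function name and update constants are illustrative assumptions, not taken from the paper:

```python
# Raise confidence of re-observed blobs, decay the rest, prune dead ones.
def update_confidences(map_blobs, observed_ids, gain=0.2, decay=0.1):
    """map_blobs: dict blob_id -> confidence in [0, 1];
    observed_ids: set of blob ids matched in the current range image."""
    for blob_id in list(map_blobs):
        if blob_id in observed_ids:
            map_blobs[blob_id] = min(1.0, map_blobs[blob_id] + gain)
        else:
            map_blobs[blob_id] -= decay
            if map_blobs[blob_id] <= 0.0:
                del map_blobs[blob_id]  # no longer supported by observations
    for blob_id in observed_ids - map_blobs.keys():
        map_blobs[blob_id] = gain  # newly segmented candidate enters the map

blobs = {"blob_1": 0.6, "blob_2": 0.1}
update_confidences(blobs, observed_ids={"blob_1", "blob_3"})
print(blobs)  # blob_1 raised, blob_2 decayed out, blob_3 newly added
```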

arXiv (Cornell University), Jun 20, 2018
In this work, we address the challenging problem of fine-grained and coarse-grained recognition of object manipulation actions. Due to variations in geometrical and motion constraints, different sets of manipulation actions can be performed with the same object. Moreover, most object manipulation actions involve only subtle movements. This makes the task of object manipulation action recognition difficult with motion information alone. We propose to use grasp and motion-constraint information to recognise and understand action intention with different objects. We also provide an extensive experimental evaluation on the recent Yale Human Grasping dataset, consisting of a large set of 455 manipulation actions. The evaluation involves a) different contemporary multiclass classifiers, and binary classifiers with a one-vs-one multiclass voting scheme, b) differential comparison results based on subsets of attributes involving grasp and motion-constraint information, c) fine-grained and coarse-grained object manipulation action recognition based on fine-grained as well as coarse-grained grasp-type information, and d) a comparison between instance-level and sequence-level modeling of object manipulation actions. Our results justify the efficacy of grasp attributes for the task of fine-grained and coarse-grained object manipulation action recognition.
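A minimal sketch of setup (a) above, assuming concatenated grasp-type and motion-constraint attribute vectors; the feature layout and data are placeholders, not the actual Yale Human Grasping encoding:

```python
# Multiclass action recognition over grasp + motion-constraint attributes.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
grasp_attrs = rng.integers(0, 5, size=(n, 3))   # e.g. coarse grasp-type codes
motion_attrs = rng.random((n, 4))               # e.g. motion-constraint measures
X = np.hstack([grasp_attrs, motion_attrs]).astype(float)
y = rng.integers(0, 4, size=n)                  # action labels (placeholder)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(decision_function_shape="ovo")        # one-vs-one voting, as in (a)
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```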

Context-Aware 3D Visualization of the Dynamic Environment
We present a graphical interface that provides context-dependent information about possible interactions with a complex cluttered environment, which can be used as a video overlay system in Head Mounted Devices (HMDs) in Augmented Reality applications. The system identifies task-relevant objects in a cluttered scene, learns the typical interaction patterns with them in a teaching phase, and supports the user with information about the object functionality and handling in the interaction phase. The underlying system is capable of identifying objects in cluttered environments, providing information about possible task-dependent grasps, tracking object motion at a high frame rate, and predicting possible actions from a learned action graph that represents possible handling variants for a specific object. The system can be used to train a novice in an unknown environment by providing expert knowledge. We present our implementation of the system in a service scenario of a kitchen environment, where the system supports the user while performing the actions.
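As a toy illustration of the interaction phase, the lookup below maps an identified object and the current task to an overlay hint; all object names, grasps, and actions are invented for the example, not taken from the kitchen scenario:

```python
# Hypothetical knowledge base: (object, task) -> learned grasp, likely next step.
EXPERT_KNOWLEDGE = {
    ("kettle", "make_tea"): ("wrap grasp on handle", "fill with water"),
    ("knife", "prepare_salad"): ("pinch grasp on handle", "cut vegetables"),
}

def overlay_hint(object_id, task):
    """Return the text shown in the HMD overlay for a detected object."""
    entry = EXPERT_KNOWLEDGE.get((object_id, task))
    if entry is None:
        return f"{object_id}: no expert knowledge for task '{task}'"
    grasp, action = entry
    return f"{object_id}: use {grasp}, then {action}"

print(overlay_hint("kettle", "make_tea"))
```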

Direct Image Based Traffic Junction Crossing System for Autonomous Vehicles
One of the most common traffic scenarios when navigating in urban areas is the traffic junction. Crossing a traffic junction is not trivial for an autonomous vehicle, as it needs to perform both scene understanding and decision-making tasks. In this work we introduce a two-stage vision-based system for an autonomous vehicle that is capable of deciding when to cross a traffic junction safely. The first stage of the system consists of various convolutional neural network (CNN) models that are utilized to obtain information about the traffic junction. This information is then used in the second stage of the system to decide whether to cross the traffic junction: it is represented as affordances and directly used by a Bayesian network to infer the final decision, without the need for an environment model. The Bayesian network models the decision-making process by taking into consideration the traffic rules associated with a traffic junction and avoiding collision with other traffic participants entering the junction. We evaluated the feasibility of the system as well as its various components using real-world data and achieved encouraging accuracy results. The results show the potential of the system to help autonomous vehicles cross a traffic junction safely.
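To make the affordance-to-decision step concrete, here is a hand-rolled toy network; the structure, affordance names, and probabilities are all invented for illustration (the paper's actual network is richer). A binary decision node is marginalized over two uncertain CNN-reported affordances:

```python
# P(cross | right_of_way, junction_clear): a hypothetical conditional table.
P_CROSS = {
    (True, True): 0.95,
    (True, False): 0.05,
    (False, True): 0.20,
    (False, False): 0.01,
}

def p_cross(p_row, p_clear):
    """Marginalize P(cross) over uncertain CNN detections (soft evidence)."""
    total = 0.0
    for row in (True, False):
        for clear in (True, False):
            p = (p_row if row else 1 - p_row) * (p_clear if clear else 1 - p_clear)
            total += p * P_CROSS[(row, clear)]
    return total

# CNN stage reports right-of-way with 0.9 confidence, clear junction with 0.7.
print("P(cross) =", p_cross(0.9, 0.7))
```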

We propose an extension for a dynamic 3D model that allows a hierarchical labeling of continuous interactions in scenes. While most systems focus on labels for pure transportation tasks, we show how Atlas information attached to objects identified in the scene can be used to label not only transportation tasks but also physical interactions, like writing, erasing a board, tightening a screw, etc. We analyse the dynamic motion observed by a camera system at different abstraction levels, ranging from simple motion primitives, over single physical actions, to complete processes. The associated observation time horizons range from a single turning motion on a screw tightened during a task, over the process of inserting screws, to the entire process of building a device. The complexity and the time horizon for possible predictions about actions in the scene increase with the abstraction level. We present the extension using the example of typical tasks observed by a camera, like writing on and erasing a whiteboard.
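The primitive → action → process hierarchy can be sketched with a toy grammar; the primitive names, the fixed-window grouping, and the rules below are assumptions for illustration, not the paper's actual labeling scheme:

```python
# Collapse a stream of motion primitives into action labels, and actions into
# a process label (hypothetical grammar).
ACTION_OF = {("approach", "turn", "retract"): "tighten_screw",
             ("approach", "stroke", "retract"): "write"}

def label_actions(primitives, window=3):
    actions = []
    for i in range(0, len(primitives) - window + 1, window):
        actions.append(ACTION_OF.get(tuple(primitives[i:i + window]), "unknown"))
    return actions

prims = ["approach", "turn", "retract", "approach", "turn", "retract"]
acts = label_actions(prims)
process = "assemble_device" if acts.count("tighten_screw") >= 2 else "unknown"
print(acts, "->", process)
```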
Framework for consistent maintenance of geometric data and abstract task-knowledge from range observations
We present a framework for on-line exploration of object attributes from range data, designed to include the cognitive aspects for surprise detection. In this framework we introduce a layered representation of the environment that couples the pure geometric 3D representation of the world to the abstract knowledge about the structures in the scene. This knowledge in the higher layer represents a-priori known, task-relevant information about structures in the world, with mass, handling properties and grasping points being examples ...
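A minimal data-structure sketch of such a two-layer coupling; the field names are assumptions, since the framework's actual representation is not specified here:

```python
# Illustrative two-layer map entity: geometry below, abstract knowledge above.
from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np

@dataclass
class KnowledgeLayer:
    """A-priori, task-relevant attributes attached to a mapped structure."""
    mass: Optional[float] = None            # kg, if known
    handling: Optional[str] = None          # e.g. "fragile", "rigid"
    grasp_points: List[np.ndarray] = field(default_factory=list)

@dataclass
class MapEntity:
    points: np.ndarray                      # geometric layer: raw 3D points
    knowledge: KnowledgeLayer = field(default_factory=KnowledgeLayer)

cup = MapEntity(points=np.zeros((100, 3)),
                knowledge=KnowledgeLayer(mass=0.3, handling="fragile"))
```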
Biologically Motivated Optical Flow-Based Navigation

Object-Centric Approach to Prediction and Labeling of Manipulation Tasks
We propose an object-centric framework to label and predict human manipulation actions from observations of the object trajectories in 3D space. The goal is to lift the low-level sensor observation to a context-specific human vocabulary. The low-level visual sensory input from a depth camera is processed into high-level descriptive action labels using a directed action graph representation. It is built on the concepts of pre-computed Location Areas (LAs), regions within a scene where an action typically occurs, and Sector-Maps (SMs), reference trajectories between the LAs. The framework consists of two stages: an offline teaching phase for graph generation, and an online action recognition phase that maps the current observations to the generated graph. This graph representation allows the framework to predict the most probable action from the observed motion in real time and to adapt its structure whenever a new LA appears. Furthermore, the descriptive action labels not only enable a better exchange of information between a human and a robot, but also allow the robot to perform high-level reasoning. We present experimental results on real human manipulation actions, using a system designed with this framework, to show the performance of prediction and labeling that can be achieved.
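The following sketch shows how a Location-Area graph might support on-line prediction: LAs are circular regions, edges carry action labels, and the predicted action is the outgoing edge whose target best matches the current motion direction. All names, geometry, and the scoring rule are invented for illustration, not the paper's Sector-Map matching:

```python
import numpy as np

LAS = {"stove": np.array([0.0, 0.0]), "sink": np.array([1.0, 0.0]),
       "table": np.array([0.5, 1.0])}
EDGES = {("stove", "sink"): "rinse_pot", ("stove", "table"): "serve_food",
         ("sink", "table"): "set_table"}

def current_la(p, radius=0.3):
    """Return the Location Area containing point p, if any."""
    for name, center in LAS.items():
        if np.linalg.norm(p - center) < radius:
            return name
    return None

def predict_action(start_la, p, velocity):
    """Pick the outgoing edge whose target LA lies closest to the motion ray."""
    best, best_score = None, -np.inf
    for (src, dst), action in EDGES.items():
        if src != start_la:
            continue
        direction = LAS[dst] - p
        score = np.dot(velocity, direction) / (np.linalg.norm(direction) + 1e-9)
        if score > best_score:
            best, best_score = action, score
    return best

p = np.array([0.1, 0.1])
print(predict_action(current_la(p), p, velocity=np.array([0.2, 0.5])))
```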
Intelligent Robots and Systems, 2012
A robotic manipulation system that is supposed to replace a human operator needs to deal with a variety of possible manipulation actions. These actions may be more or less constrained in their motion profile and in the accuracy of the transport goals. Some of this variation can be used to simplify the control and to optimize the base placement to improve the efficiency of the generated motion. We present an analysis tool that uses the abstraction of the human actions to generate paths with efficient motion profiles. We compare the dynamics of a robot for different paths. We show in experiments that certain path properties are preferred to support efficient control, and that the intuitive solution does not necessarily agree with the results when optimizing for efficiency.
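As a stand-in for the kind of comparison such an analysis tool performs, the sketch below scores sampled joint-space paths by a crude effort proxy (sum of squared finite-difference accelerations); the metric and paths are illustrative assumptions, not the tool's actual dynamics model:

```python
import numpy as np

def effort(path, dt=0.1):
    """Effort proxy for a (T, dof) joint-space path sampled at fixed dt."""
    acc = np.diff(path, n=2, axis=0) / dt**2   # finite-difference accelerations
    return float(np.sum(acc**2))

# Two candidate paths to the same goal: a smooth one and one with a sharp kink.
t = np.linspace(0.0, 1.0, 21)[:, None]
smooth = (3 * t**2 - 2 * t**3) * np.array([1.0, 0.5])   # ease-in/ease-out
kinked = np.vstack([np.linspace([0.0, 0.0], [0.2, 0.5], 11),
                    np.linspace([0.2, 0.5], [1.0, 0.5], 11)[1:]])
print("smooth:", effort(smooth), "kinked:", effort(kinked))
```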

Visual Prediction of Driver Behavior in Shared Road Areas
We propose a framework to analyze and predict vehicle behavior within shared road segments like intersections or narrow passages. The system first identifies critical interaction regions based on topological knowledge. It then checks for possibly colliding trajectories from the current state of the vehicles in the scene, defined by overlapping occupation times in road segments. For each possible interaction area, it analyzes the behavioral profile of both vehicles. Depending on right of way and (unpredictable) behavior parameters, different outcomes are expected and are tested against the input. The interaction between vehicles is analyzed over a short time horizon, based on an initial action from one vehicle and the reaction by the other. The vehicle required to yield most often performs the first action, and the response of the opponent vehicle is measured after a reaction time. The observed reaction is classified by attention (whether there was a reaction at all) and by the collaboration of the opponent vehicle (whether it helps to resolve the situation or hinders it). The output is a classification of the behavior of the involved vehicles in terms of active participation in the interaction, and of the assertiveness of the driving style in terms of collaborative or disruptive behavior. This additional knowledge is used to refine the prediction of the intention and outcome of a scene, which is then compared to the current status to catch unexpected behavior. The applicability of the concept and the ideas of the approach are validated on scenarios from the recent Intersection Drone (inD) dataset.
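The colliding-trajectory check reduces to testing whether predicted occupation intervals of a shared segment overlap; the sketch below uses hand-made intervals as stand-ins for the framework's trajectory predictions:

```python
# Two vehicles conflict in a shared segment if their predicted occupation
# intervals (in seconds from now) overlap.
def intervals_overlap(a, b):
    return max(a[0], b[0]) < min(a[1], b[1])

ego_occupancy = {"segment_7": (2.0, 4.5)}
other_occupancy = {"segment_7": (3.8, 6.0), "segment_9": (0.0, 1.0)}

conflicts = [seg for seg, t_ego in ego_occupancy.items()
             if seg in other_occupancy
             and intervals_overlap(t_ego, other_occupancy[seg])]
print("interaction regions:", conflicts)   # -> ['segment_7']
```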

2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct 23, 2022
Planning collision-free motions for robots with many degrees of freedom is challenging in environments with complex obstacle geometries. Recent work introduced the idea of speeding up the planning by encoding prior experience of successful motion plans in a neural network. However, this "neural motion planning" did not scale to complex robots in unseen 3D environments as needed for real-world applications. Here, we introduce the "basis point set", well known in computer vision, to neural motion planning as a modern compact environment encoding that enables efficient supervised training of networks that generalize well over diverse 3D worlds. Combined with a new elaborate training scheme, we reach a planning success rate of 100%. We use the network to predict an educated initial guess for an optimization-based planner (OMP), which quickly converges to a feasible solution, massively outperforming random multi-starts when tested on previously unseen environments. For the DLR humanoid Agile Justin with 19 DoF and in challenging obstacle environments, optimal paths can be generated in 200 ms using only a single CPU core. We also show a first successful real-world experiment based on a high-resolution world model from an integrated 3D sensor.
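The basis point set encoding itself is compact enough to sketch: a fixed random set of basis points turns an arbitrarily sized obstacle point cloud into a fixed-length vector of nearest-point distances. Sizes and sampling below are illustrative choices, not necessarily the paper's:

```python
import numpy as np

rng = np.random.default_rng(42)
basis = rng.uniform(-1.0, 1.0, size=(512, 3))   # fixed once for all environments

def bps_encode(point_cloud):
    """point_cloud: (N, 3). Returns (512,) min distance per basis point."""
    d = np.linalg.norm(basis[:, None, :] - point_cloud[None, :, :], axis=-1)
    return d.min(axis=1)

obstacle_cloud = rng.uniform(-1.0, 1.0, size=(2000, 3))
encoding = bps_encode(obstacle_cloud)
print(encoding.shape)   # fixed-length input for the planning network
```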
arXiv (Cornell University), Sep 7, 2017
We present a processing technique for a robust reconstruction of motion properties of single points in large-scale, dynamic environments. We assume that the acquisition camera is moving and that there are other independently moving agents in a large environment, as in road scenarios. The separation of the direction and magnitude of the reconstructed motion allows for a robust reconstruction of the dynamic state of the objects in situations where conventional binocular systems fail because the signal (disparity) in the images is small relative to the constant detection error, and where structure-from-motion approaches fail due to unobserved motion of other agents between the camera frames. We present the mathematical framework and the sensitivity analysis for the resulting system.
This paper presents our approach to laser-based local position tracking based on the data explored in a three-dimensional environmental model of an indoor environment. This algorithm is used to substitute the dead reckoning on a mobile robot, to allow robust map generation and position-dependent task triggering. The underlying concept of the local environmental model used for filtering the sensor information allows an easy fusion of different sources of the available information, such as a-priori knowledge, explored information, and even information from other sensor systems. This system is implemented and tested on our mobile robot.
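Not the paper's algorithm, but a generic building block for this kind of dead-reckoning correction: a least-squares alignment (Kabsch/SVD) of corresponding 2D laser points against the model, yielding the rigid pose correction. Correspondences are assumed given here:

```python
import numpy as np

def align_scans(P, Q):
    """Find R, t minimizing ||R @ P + t - Q|| for paired 2D point sets (2, N)."""
    p0, q0 = P.mean(axis=1, keepdims=True), Q.mean(axis=1, keepdims=True)
    H = (P - p0) @ (Q - q0).T                     # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = q0 - R @ p0
    return R, t

# Demo: recover a known rotation and translation from paired points.
theta = 0.1
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
P = np.random.default_rng(2).random((2, 50))
Q = R_true @ P + np.array([[0.3], [0.1]])
R, t = align_scans(P, Q)
print(np.allclose(R, R_true), t.ravel())
```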

arXiv (Cornell University), Sep 18, 2017
We present a direct method to calculate a 6DoF pose change of a monocular camera for mobile navigation. The calculated pose is estimated up to an unknown scale parameter that is kept constant over the entire reconstruction process. This method allows a direct calculation of the metric position and rotation, without any need to fuse the information in a probabilistic approach over longer frame sequences, as is the case in most currently used VSLAM approaches. The algorithm provides two novel aspects to the field of monocular navigation: it allows a direct pose estimation, without any a-priori knowledge about the world, from any two images, and it provides a quality measure for the estimated motion parameters that allows the resulting information to be fused in Kalman filters. We present the mathematical formulation of the approach together with an experimental validation on real scene images.
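The paper's direct method is not reproduced here; for orientation, the classical two-view baseline below recovers the same kind of quantity (rotation plus translation up to scale) from two monocular frames via the essential matrix. Intrinsics and matched points are placeholders:

```python
import cv2
import numpy as np

def relative_pose(pts1, pts2, K):
    """pts1, pts2: (N, 2) matched pixel coordinates in two monocular frames."""
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t   # t has unit norm: translation is only known up to scale

# Placeholder intrinsics; real matches would come from a feature tracker.
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
```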
[Figure: scene → tentative object candidates → encapsulated 3D blobs → motion estimation]
An approach to consistently model and characterize potential object candidates present in non-static scenes. Three principal procedures support our method: i) the segmentation of the captured range images into 3D clusters or blobs, by which we obtain a first gross impression of the spatial structure of the scene; ii) the maintenance and reliability of the map, which are obtained through the fusion of the captured and mapped data, to which we assign a degree of existence (confidence value); iii) the visual motion estimation of potential object candidates, which, through the combination of texture and 3D-spatial information, allows us not only to update the state of the actors and perceive their changes in a scene, but also to maintain and refine their individual 3D structures over time.

IEEE Transactions on Medical Imaging, Aug 1, 2018
Isotropic three-dimensional (3D) acquisition is a challenging task in Magnetic Resonance Imaging (MRI). Particularly in cardiac MRI, due to hardware and time limitations, current 3D acquisitions are limited to low resolution, especially in the through-plane direction, leading to poor image quality in that dimension. To overcome this problem, super-resolution (SR) techniques have been proposed to reconstruct a single isotropic 3D volume from multiple anisotropic acquisitions. Previously, local regularization techniques such as Total Variation (TV) have been applied to limit noise amplification while preserving sharp edges and small features in the images. In this paper, inspired by recent progress in patch-based reconstruction, we propose a novel isotropic 3D reconstruction scheme that integrates non-local and self-similarity information from 3D patch neighborhoods. By grouping 3D patches with similar structures, we enforce the natural sparsity of MR images, which can be expressed by a low-rank structure, leading to robust image reconstruction with high SNR efficiency. An Augmented Lagrangian formulation of the problem is proposed to efficiently decompose the optimization into a low-rank volume denoising and an SR reconstruction. Experimental results on simulations, brain imaging and clinical cardiac MRI demonstrate that the proposed joint SR and self-similarity learning framework outperforms current state-of-the-art methods. The proposed isotropic 3D reconstruction may be particularly useful for cardiac applications such as myocardial infarction scar assessment by late gadolinium enhancement MRI.
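The low-rank denoising half of that decomposition can be illustrated in a few lines: similar patches stacked as columns form a nearly low-rank matrix, which is denoised by soft-thresholding its singular values. Patch grouping, the SR operator, and the Augmented Lagrangian loop are omitted; sizes and the threshold are arbitrary:

```python
import numpy as np

def low_rank_denoise(patch_matrix, tau):
    """patch_matrix: (patch_size, n_similar_patches); tau: threshold."""
    U, s, Vt = np.linalg.svd(patch_matrix, full_matrices=False)
    s = np.maximum(s - tau, 0.0)            # soft-threshold singular values
    return (U * s) @ Vt

rng = np.random.default_rng(1)
clean = np.outer(rng.random(64), rng.random(20))   # rank-1 group of patches
noisy = clean + 0.05 * rng.standard_normal(clean.shape)
denoised = low_rank_denoise(noisy, tau=0.5)
print("error noisy:   ", np.linalg.norm(noisy - clean))
print("error denoised:", np.linalg.norm(denoised - clean))
```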