IEEE transactions on visualization and computer graphics, 2024
Figure 1: Left: top-down view of a 3D point cloud reconstruction with inaccurate SUE data visualized for different infrastructure types as simple line strings. Middle: Segmentation results from virtual cameras (top), highlighting inaccurate (red) and aligned (green) infrastructure data. Right: fitted 3D infrastructure model shown on a top-down view of a 3D point cloud reconstruction.
Indoor reconstruction using depth camera algorithms (e.g., InfiniTAMv3) is becoming increasingly popular. Simple reconstruction methods solely use the frames of the depth camera, leaving any imagery from the adjunct RGB camera untouched. Recent approaches also incorporate color camera information to improve consistency. However, the results heavily depend on the accuracy of the rig calibration, which can strongly vary in quality. Unfortunately, any errors in the rig calibration result in apparent visual discrepancies when it comes to colorization of the 3D reconstruction. We propose an easy approach to fix this issue for the purpose of image-based rendering. We show that a relatively simple warping function can be calculated from a 3D checkerboard pattern for a rig with poor calibration between cameras. The warping is applied to the RGB images online during reconstruction, leading to a significantly improved visual result.
In this paper, we present the maintenance assistance user interface (MAUI), a novel approach for providing tele-assistance to a worker charged with maintenance of a cyber-physical system. Such a system comprises both physical and digital interfaces, making it challenging for a worker to understand the required steps and to assess work progress. A remote expert can access the digital interfaces and provide the worker with timely information and advice in an augmented reality display. The remote expert has full control over the user interface of the worker in a manner comparable to remote desktop systems. The worker needs to perform all physical operations and retrieve physical information, such as reading physical labels or meters. Thus, worker and remote expert collaborate not only via shared audio, video or pointing, but also share control of the digital interface presented in the augmented reality space. We report results on two studies: The first study evaluates the benefits of our system against a condition with the same cyber-physical interface, but without tele-assistance. Results indicate significant benefits concerning speed, cognitive load and subjective comfort of the worker. The second study explores how interface designers use our system, leading to initial design guidelines for tele-presence interfaces like ours.
Augmented reality for medical applications allows physicians to obtain an inside view into the patient without surgery. In this context, we present an augmented reality application running on a standard smartphone or tablet computer, providing visualizations of medical image data, overlaid with the patient, in a video see-through fashion. Our system is based on the registration of medical imaging data to the patient using a single 2D photograph of the patient. From this image, a 3D model of the patient's face is reconstructed using a convolutional neural network, to which a pre-operative CT scan is automatically registered. For efficient processing, this is performed on a server PC. Finally, anatomical and pathological information is sent back to the mobile device and can be displayed, accurately registered with the live patient, on the screen. Hence, our cost-effective, markerless approach needs only a smartphone and a server PC for image processing. We present a qualitative and quantitative evaluation using real patient photos and CT from the clinical routine in facial surgery, reporting overall processing times and registration errors.
Figure 1: Collaborative Tracking and Mapping. Top: Keyframes of individual SLAM clients observing the same scene simultaneously. Middle: Sparse point map created by the server using the keyframes from four clients. Bottom: Densified server point cloud reconstruction of the scene.
In contrast to indoor tracking using computer vision, which has reached a good amount of maturity, outdoor tracking still suffers from comparably poor localization on a global scale. Smartphones and other commodity devices contain consumer-grade sensors for GPS, compass and inertial measurements, which are not accurate enough for augmented reality (AR) in most situations. This restricts what AR can offer to application areas such as surveying or building construction. We present a self-contained localization device which connects wirelessly to any AR device, such as a smartphone or headset. The device gives centimeter-level accuracy and can be built out of commercial off-the-shelf components for less than 500 EUR. We demonstrate the performance of the localization device using a variety of position and orientation sensing benchmarks.
In this work we present a mobile computer vision system which simplifies the task of identifying pharmaceutical pills. A single input image of pills on a special marker-based target is processed by an efficient method for object segmentation on structured background. Estimators for the object properties size, shape and color deliver parameters that can be used for querying an online database about an unknown pill. A prototype application is constructed using the Studierstube ES framework, which allows pill recognition to be performed on off-the-shelf mobile phones. System runtime and retrieval performance with the estimated features are subsequently evaluated on a realistic test set. The retrieval performance on the Identa database, used here as an example, confirms that the system can facilitate the task of mobile pill recognition in a realistic scenario.
We propose an efficient method for estimating the motion of a multi-camera rig from a minimal set of feature correspondences. Existing methods for solving the multi-camera relative pose problem require extra correspondences, are slow to compute, and/or produce a multitude of solutions. Our solution uses a first-order approximation to relative pose in order to simplify the problem and produce an accurate estimate quickly. The solver is applicable to sequential multi-camera motion estimation and is fast enough for real-time implementation in a random sampling framework. Our experiments show that our approach is both stable and efficient on challenging test sequences.
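As a hedged illustration of what a first-order approximation to relative pose can look like, the block below gives the standard linearization of a rotation for a small rotation vector. The symbols r and [r]_× are introduced here for illustration and are not taken from the abstract; the paper's exact parameterization may differ.

```latex
R(\mathbf{r}) = \exp\left([\mathbf{r}]_\times\right) \approx I + [\mathbf{r}]_\times,
\qquad
[\mathbf{r}]_\times =
\begin{pmatrix}
0 & -r_3 & r_2 \\
r_3 & 0 & -r_1 \\
-r_2 & r_1 & 0
\end{pmatrix}
```

The approximation is reasonable when the rotation between consecutive frames is small, which matches the sequential motion estimation setting named in the abstract.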
Creating panoramic images in real-time is an expensive operation for mobile devices. Depending on the size of the camera image, the mapping of individual pixels into the panoramic image is one of the most time-consuming parts. This part is the main focus of this paper and is discussed in detail. To speed things up and to allow larger images, the pixel-mapping process is transferred from the Central Processing Unit (CPU) to the Graphics Processing Unit (GPU). The independence of pixels being projected on the panoramic image allows OpenGL shaders to do the mapping very efficiently. Different approaches to the pixel-mapping process are demonstrated and compared with an existing solution. The application is implemented for Android phones and runs smoothly on current generation devices.
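The per-pixel mapping described in this abstract can be sketched on the CPU as follows. This is a minimal NumPy illustration, not the paper's OpenGL shader code: the cylindrical panorama model, the function name `map_to_cylinder`, the intrinsics `K`, the rotation `R` and the panorama size are all assumptions made for the example.

```python
import numpy as np

def map_to_cylinder(pixels, K, R, pano_w, pano_h, v_fov=np.pi / 2):
    """Map camera pixels (u, v) to coordinates in a cylindrical panorama.

    K is the 3x3 camera intrinsics matrix, R the 3x3 camera-to-panorama
    rotation, and v_fov the vertical field of view covered by the panorama
    (all hypothetical parameters for this sketch).
    """
    pix = np.asarray(pixels, dtype=float)
    homog = np.column_stack([pix, np.ones(len(pix))])
    # Back-project each pixel to a viewing ray and rotate it into the panorama frame.
    rays = (R @ np.linalg.inv(K) @ homog.T).T
    x, y, z = rays[:, 0], rays[:, 1], rays[:, 2]
    theta = np.arctan2(x, z)       # azimuth around the cylinder axis
    height = y / np.hypot(x, z)    # height on a unit-radius cylinder
    max_height = np.tan(v_fov / 2)
    px = (theta / (2 * np.pi) + 0.5) * pano_w
    py = (height / (2 * max_height) + 0.5) * pano_h
    return np.column_stack([px, py])

# Every pixel is mapped independently, which is the property that makes
# the operation a good fit for a fragment shader.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
center = map_to_cylinder([(320, 240)], K, np.eye(3), pano_w=2048, pano_h=512)
```

With an identity rotation, the principal point of the camera lands in the middle of the panorama, which is a quick sanity check for the mapping.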
Communications in Computer and Information Science, 2019
Performing accurate measurements on non-planar targets using a robotic total station in reflectorless mode is prone to errors. Besides requiring a fully reflected laser beam of the electronic distance meter, a proper orientation of the pan-tilt unit is required for each individual accurate 3D point measurement. Dominant physical 3D structures like corners and edges often do not fulfill these requirements and are not directly measurable. In this work, three algorithms and user interfaces are evaluated through simulation and physical measurements for simple and efficient construction-side measurement correction of systematic errors. We incorporate additional measurements close to the non-measurable target, and our approach does not require any post-processing of single-point measurements. Our experimental results show that the systematic error can be lowered by almost an order of magnitude by using support geometries, i.e. incorporating a 3D point, a 3D line or a 3D plane as additional measurements.
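One plausible way to use planes as support geometry can be sketched as follows: fit planes to points measured on the surfaces next to a corner, then intersect them to recover the corner that cannot be measured directly. This is an assumption-laden illustration and not necessarily one of the three algorithms evaluated in the paper; the function names and the sample coordinates are hypothetical.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through 3D points, returned as (n, d) with n . x = d."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The right singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    return normal, float(normal @ centroid)

def intersect_planes(planes):
    """Point shared by three mutually non-parallel planes given as (n, d) pairs."""
    normals = np.array([n for n, _ in planes])
    offsets = np.array([d for _, d in planes])
    return np.linalg.solve(normals, offsets)

# Hypothetical measurements on three surfaces meeting at the corner (1, 2, 3).
wall_x = [(1.0, 2.5, 3.5), (1.0, 3.0, 4.0), (1.0, 2.2, 4.5)]
wall_y = [(1.5, 2.0, 3.5), (2.0, 2.0, 4.0), (1.2, 2.0, 4.5)]
wall_z = [(1.5, 2.5, 3.0), (2.0, 3.0, 3.0), (1.2, 2.2, 3.0)]
corner = intersect_planes([fit_plane(w) for w in (wall_x, wall_y, wall_z)])
```

Because each plane is fitted from points that lie well inside a flat surface, none of the individual measurements has to hit the corner itself.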
IECON 2018 - 44th Annual Conference of the IEEE Industrial Electronics Society, 2018
The accurate registration of a robotic total station with respect to a given CAD model is a crucial task in the construction industry. Common registration techniques rely on a reference network of control points in the CAD model. One must establish correspondences between control points in the CAD model and measured points in the field. Usually physical markers or natural points of interest are selected as control points. We present a user-guided algorithm for simple and efficient registration of a robotic total station with a CAD model in indoor environments without the need for control points. The user interaction is reduced to selecting a local Manhattan-like corner structure for initial model alignment; accurate registration of the device is carried out automatically. Our algorithm relies on angle and distance measurements only and, therefore, is not limited to vision-based robotic total stations. In particular, we propose a new algorithm for robust Manhattan corner extraction.
2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 2018
Finding high-level semantic information from a point cloud is a challenging task, and it can be used in various applications. For instance, it is useful to compactly represent the scene structure and efficiently understand the scene context. This task is even more challenging when using a hand-held monocular visual SLAM system that outputs a noisy sparse point cloud. In order to tackle this issue, we propose an incremental primitive modeling method using both geometric and statistical analyses for such point clouds. The main idea is to select only reliably modeled shapes by analyzing the geometric relationship between the point cloud and the estimated shapes. Besides that, a statistical evaluation is incorporated to filter wrongly detected primitives in a noisy point cloud. As a result of this processing, our approach substantially improved precision when compared with state-of-the-art methods. We also show the impact of segmenting and representing a scene using primitives instead of a poi...
Papers by Clemens Arth