Automatic data processing

6,060 papers

5 followers

About this topic

Automatic data processing refers to the use of computer systems and software to perform data manipulation tasks without human intervention. This includes data collection, storage, analysis, and reporting, enabling efficient handling of large volumes of information and facilitating decision-making processes in various applications.

Key research themes

1. How can automated program synthesis techniques improve data completion and extraction tasks in tabular data?

This theme focuses on methods that leverage programming-by-example (PBE) and predictive synthesis to automate filling missing data, transforming raw tabular inputs into usable forms. Automating data completion reduces manual effort, making complex data wrangling accessible to non-experts and improving data usability across domains such as spreadsheets, databases, and web data extraction.
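As a rough illustration of programming-by-example, the sketch below enumerates a tiny, hypothetical string-transformation language and returns the first program consistent with every input-output example. The DSL (`DELIMITERS`, `INDICES`, `CASES`) and the function names are assumptions made for this example; they are not taken from any of the papers listed here, which use far richer languages and search strategies.

```python
from itertools import product
from typing import Callable, Optional

# Hypothetical mini-DSL: split on a delimiter, pick one token, adjust case.
DELIMITERS = [" ", ",", "-", "@", "."]
INDICES = [0, 1, 2, -1]
CASES = {"id": lambda s: s, "upper": str.upper, "lower": str.lower}


def make_program(delim: str, index: int, case: str) -> Callable[[str], Optional[str]]:
    """Build one candidate program from the DSL parameters."""
    def program(text: str) -> Optional[str]:
        tokens = text.split(delim)
        try:
            return CASES[case](tokens[index])
        except IndexError:
            return None  # the candidate is undefined on this input
    return program


def synthesize(examples):
    """Return (spec, program) for the first candidate matching all examples, else None."""
    for delim, index, case in product(DELIMITERS, INDICES, CASES):
        program = make_program(delim, index, case)
        if all(program(inp) == out for inp, out in examples):
            return (delim, index, case), program
    return None


# Two filled-in cells act as the examples; the learned program completes the rest.
examples = [
    ("ada.lovelace@example.org", "example.org"),
    ("alan.turing@example.com", "example.com"),
]
result = synthesize(examples)
```

With these two examples the search settles on splitting at `@` and taking the second token, so the returned program can be applied to the remaining rows of the column.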

Synthesis of data completion scripts using finite tree automata

by Rishabh Singh

2022, Proceedings of the ACM on Programming Languages

Key finding: Proposes a novel program synthesis technique combining program sketching and PBE for automating complex data completion tasks on tabular datasets, requiring users to provide formula sketches and few input-output examples...

Automated Data Extraction Using Predictive Program Synthesis

by Mohammad Suhail Raza

2023, Proceedings of the AAAI Conference on Artificial Intelligence

Key finding: Introduces a predictive program synthesis algorithm that infers extraction scripts from input-only examples, eliminating the need for explicit input-output examples in diverse domains such as text log extraction and web data...

2. What are effective data preprocessing strategies to address real-world data quality issues in automatic data analysis?

Addressing real-world data quality challenges—such as missing data, out-of-range values, inconsistencies, and incomplete records—is crucial for downstream analysis. This research theme covers systematic preprocessing methodologies integrating domain knowledge and iterative refinement, aiming to preserve valuable information while improving data integrity. Automated or semi-automated frameworks for detecting data issues and recommending suitable cleaning techniques help streamline the preprocessing pipeline and improve analytical outcomes.
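The kinds of checks described above can be sketched in a few lines. This is a minimal example under stated assumptions: the field names, the valid ranges, and the `profile_records` helper are all hypothetical, and a real pipeline would choose them with domain experts and refine them iteratively.

```python
from collections import Counter


def profile_records(records, ranges):
    """Report missing fields, out-of-range values, and exact duplicate rows."""
    issues = {"missing": [], "out_of_range": [], "duplicates": []}
    seen = Counter()
    for i, row in enumerate(records):
        for field, value in row.items():
            if value is None or value == "":
                issues["missing"].append((i, field))
            elif field in ranges:
                low, high = ranges[field]
                if not (low <= value <= high):
                    issues["out_of_range"].append((i, field))
        # Field names are unique per row, so sorting never compares the values.
        key = tuple(sorted(row.items()))
        seen[key] += 1
        if seen[key] > 1:
            issues["duplicates"].append(i)
    return issues


# Hypothetical records and plausibility ranges, for illustration only.
records = [
    {"age": 34, "weight_kg": 70.5},
    {"age": None, "weight_kg": 82.0},
    {"age": 212, "weight_kg": 64.0},
    {"age": 34, "weight_kg": 70.5},
]
report = profile_records(records, {"age": (0, 120), "weight_kg": (1, 400)})
```

The function only flags problems and leaves the data unchanged, so the decision to impute, correct, or drop a record stays with the analyst.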

Data preprocessing and intelligent data analysis

by Bharath Kumar

2021, Intelligent Data …

Key finding: Presents a comprehensive overview of common real-world data issues including missing values, outliers, and incomplete records, emphasizing the necessity of domain expertise and iterative feedback in preprocessing. Highlights...

Auto-Prep: Efficient and Automated Data Preprocessing Pipeline

by Rabiah Abdul Kadir

2024, IEEE Access

Key finding: Develops an automated Python-based preprocessing approach that detects common data problems (duplicates, missing values, categorical encoding, feature scaling) and recommends or applies optimal solutions with minimal user...

Data Preparation: A Technological Perspective and Review

by Pavel Pankin

2023, SN Computer Science

Key finding: Provides a detailed survey of data preparation functionalities including profiling, matching, mapping, format transformation, and data repair. Categorizes approaches as program-based, workflow-based, dataset-based, and...

3. How can machine learning enhance reconstruction and cleaning of complex, heterogeneous tabular datasets?

Imperfect and messy datasets with missing columns, mixed delimiters, multi-valued attributes, and varying attribute orders present substantial data structuring challenges. This theme investigates machine learning–based algorithms to reconstruct original table schemas and accurately allocate data into columns, facilitating subsequent analysis. Domain-independent modular ML approaches offer scalable, robust solutions to recovering structured information from diverse noisy datasets.

Restoration of Data Structures Using Machine Learning Techniques

by Branislava Cvijetic

2023, IEEE Access

Key finding: Introduces STCExtract, an algorithm utilizing clustering (k-means, hierarchical) and other machine learning techniques to identify original table structures and allocate data to columns in messy delimited files without prior...

All papers in Automatic data processing

GENOTROUT-Apport des nouvelles technologies de séquençage (NGS) à l'analyse du génome de la truite arc-en-ciel (2010-2013)

by Olivier Jaillon

2025

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or...

Automatically Quantified Diffuse Excessive High Signal Intensity on MRI Predicts Cognitive Development in Preterm Infants

by Lili He

2025, Pediatric Neurology

Background: Cognitive and language impairments constitute the majority of disabilities observed in preterm infants. It remains unclear if diffuse excessive high signal intensity (DEHSI) on MRI at term represents delayed white matter...

Classification of signatures of Bovine Spongiform Encephalopathy in serum using infrared spectroscopy

by Thomas Udelhoven

2025, Analyst

Signatures of Bovine Spongiform Encephalopathy (BSE) have been identified in serum by means of "Diagnostic Pattern Recognition (DPR)". For DPR-analysis, mid-infrared spectroscopy of dried films of 641 serum samples was performed using...

Data Processing Algorithms for Analysis of High Resolution MSMS Spectra of Peptides with Complex Patterns of Posttranslational Modifications

by Shenheng Guan

2025, Molecular & Cellular Proteomics

The emergence of efficient fragmentation methods such as electron capture dissociation (ECD) or electron transfer dissociation (ETD) provides the opportunity for detailed structural characterization of heavily covalently modified large...

Potential Use of DNA Barcodes in Regulatory Science: Identification of the U.S. Food and Drug Administration's "Dirty 22," Contributors to the Spread of Foodborne Pathogens

by Yolanda Jones

2025, Journal of Food Protection

The U.S. Food, Drug, and Cosmetic Act prohibits the distribution of food that is adulterated, and the regulatory mission of the U.S. Food and Drug Administration (FDA) is to enforce this Act. FDA field laboratories have identified the 22...

Evaluating the Efficacy of Innovative Coding System for Ceramic Restorations

by Rajani Kanth

2025

Background: Forensic odontology has evolved with the evolution of mankind. Since ages it has been of our interest to identify the dead. Because of events beyond our control many human beings may not die a natural death or in familiar...

Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data

by Mark Grimes

2025, Scientific Data

Most tools developed to visualize hierarchically clustered heatmaps generate static images. Clustergrammer is a web-based visualization tool with interactive features such as: zooming, panning, filtering, reordering, sharing, performing...

Epidemiology of multiple congenital anomalies in Europe: A EUROCAT population‐based registry study

by Christine Verellen-Dumoulin

2025, Birth Defects Research Part A: Clinical and Molecular Teratology

Background: This study describes the prevalence, associated anomalies, and demographic characteristics of cases of multiple congenital anomalies (MCA) in 19 population‐based European registries (EUROCAT) covering 959,446 births in 2004 and...

Generalization Capacity of Handwritten Outlier Symbols Rejection with Neural Network

by Éric Anquetil

2025

Different problems of generalization of outlier rejection exist depending on the context. In this study we first define three different problems depending on the outlier availability during the learning phase of the classifier. Then we...

Acoustic-phonetic features for the automatic classification of stop consonants

by Jan Van der Spiegel

2025, IEEE Transactions on Speech and Audio Processing

In this paper, the acoustic-phonetic characteristics of American English stop consonants are investigated. Features studied in the literature are evaluated for their information content and new features are proposed. A statistically...

A robust automated method to analyze rodent motion during fear conditioning

by David Bush

2025, Neuropharmacology

A central question in the study of LTP has been to determine what role it plays in memory formation and storage. One valuable form of learning for addressing this issue is associative fear conditioning. In this paradigm an animal learns...

Wedding biodiversity inventory of a large and complex Lepidoptera fauna with DNA barcoding

by Daniel Janzen

2025, Philosophical Transactions of the Royal Society B

By facilitating bioliteracy, DNA barcoding has the potential to improve the way the world relates to wild biodiversity. Here we describe the early stages of the use of cox1 barcoding to supplement and strengthen the taxonomic platform...

Defining growth in small pulmonary nodules using volumetry: results from a “coffee-break” CT study and implications for current nodule management guidelines

by Samuel Kemp

2025, European Radiology

Objectives: An increase in lung nodule volume on serial CT may represent true growth or measurement variation. In nodule guidelines, a 25% increase in nodule volume is frequently used to determine that growth has occurred; this is based on...

Getting patients the services they need using a computer-assisted system for patient assessment and referral—CASPAR

by Karen Dugosh

2025, Drug and Alcohol Dependence

The two goals of this technology transfer study were to: (1) increase the number and appropriateness of services received by substance abuse patients, and thereby (2) give clinical meaning and value to research-based assessment...

Modern Strategy (ATR-IR) Spectroscopy Technique to Distinguish Between Normal and Lung Cancer Tissue

by Ammar Alhasan

2025

Corresponding author: Ammar Alhasan, MSc, College of Pharmacy, AlMuthanna University, 66001, Samawah, AL-Muthanna, Iraq. Email: ammar@knights.ucf.edu. Introduction: Detection of cancer is one of the challenges in medicine. Lung cancer is the...

Analysis and correction of spatial distortions produced by the gamma camera

by Chester Kylstra

2025, Journal of Nuclear Medicine: Official Publication, Society of Nuclear Medicine

BIGODM System in the Social Media Mining for Health Applications Shared Task 2019

by Hong-jie Dai

2025

In this study, we describe our methods to automatically classify Twitter posts conveying events of adverse drug reaction (ADR). Based on our previous experience in tackling the ADR classification task, we empirically applied the...

Big Data: Survey, Technologies, Opportunities, and Challenges

by Abdullah Gani

2025, The Scientific World Journal

Big Data has gained much attention from academia and the IT industry. In the digital and computing world, information is generated and collected at a rate that rapidly exceeds the boundary range. Currently, over 2 billion people...

Explaining artificial neural network ensembles: A case study with electrocardiograms from chest pain patients

by Mattias Ohlsson

2025, International Conference on Machine Learning

Artificial neural networks are among the most commonly used machine learning algorithms in medical applications. However, they are still not used in clinical practice, partly due to their lack of explanatory capacity. We compare two...

A comparative approach for the investigation of biological information processing: An examination of the structure and function of computer hard drives and DNA

by David telescope

2025, Theoretical Biology and Medical Modelling

Background: The robust storage, updating and utilization of information are necessary for the maintenance and perpetuation of dynamic systems. These systems can exist as constructs of metal-oxide semiconductors and silicon, as in a digital computer, or in the "wetware" of organic compounds, proteins and nucleic acids that make up biological organisms. We propose that there are essential functional properties of centralized information-processing systems; for digital computers these properties reside in the computer's hard drive, and for eukaryotic cells they are manifest in the DNA and associated structures. Methods: Presented herein is a descriptive framework that compares DNA and its associated proteins and sub-nuclear structure with the structure and function of the computer hard drive. We identify four essential properties of information for a centralized storage and processing system: (1) orthogonal uniqueness, (2) low level formatting, (3) high level formatting and (4) translation of stored to usable form. The corresponding aspects of the DNA complex and a computer hard drive are categorized using this classification. This is intended to demonstrate a functional equivalence between the components of the two systems, and thus the systems themselves. Results: Both the DNA complex and the computer hard drive contain components that fulfill the essential properties of a centralized information storage and processing system. The functional equivalence of these components provides insight into both the design process of engineered systems and the evolved solutions addressing similar system requirements. However, there are points where the comparison breaks down, particularly when there are externally imposed information-organizing structures on the computer hard drive. A specific example of this is the imposition of the File Allocation Table (FAT) during high level formatting of the computer hard drive and the subsequent loading of an operating system (OS). 
Biological systems do not have an external source for a map of their stored information or for an operational instruction set; rather, they must contain an organizational template conserved within their intra-nuclear architecture that "manipulates" the laws of chemistry and physics into a highly robust instruction set. We propose that the epigenetic structure of the intra-nuclear environment and the non-coding RNA may play the roles of a Biological File Allocation Table (BFAT) and biological operating system (Bio-OS) in eukaryotic cells.

Barcoding Life to Conserve Biological Diversity: Beyond the Taxonomic Imperative

by David E Schindel

2025, PLoS Biology

The Community Page is a forum for organizations and societies to highlight their efforts to enhance the dissemination and value of scientific knowledge.

Automation for Genomics, Part One: Preparation for Sequencing

by D. Meldrum

2025, Genome Research

In the past four years, automation for genomics has enabled a 43-fold increase in the total finished human genomic sequence in the world. This two-part noncomprehensive review will provide an overview of different types of automation...

A Barcoding Strategy Enabling Higher-Throughput Library Screening by Microscopy

by Austin Jones

2025, ACS Synthetic Biology

Dramatic progress has been made in the design and build phases of the design-build-test cycle for engineering cells. However, the test phase usually limits throughput, as many outputs of interest are not amenable to rapid analytical...

Complex objects are represented in macaque inferotemporal cortex by the combination of feature columns

by Yukako Yamane

2025, Nature Neuroscience

Measures of Location and Dispersion of Sleep State Distributions Within the Circular Frame of a 12:12 Light: Dark Schedule in the Rat

by U. Wyneken

2025, Sleep

Distributions within a 12:12 light:dark schedule of wakefulness (W), active sleep (AS), quiet sleep (QS) and of QS rich in delta (QSD) and in spindle (QSS) activities were evaluated for 52 days from 15 rats. Angular statistics were...

Automatic endpoint detection to support the systematic review process

by Ana Lucic

2025, Journal of Biomedical Informatics

Preparing a systematic review can take hundreds of hours to complete, but the process of reconciling different results from multiple studies is the bedrock of evidence-based medicine. We introduce a two-step approach to automatically...

Hospital blood bank laboratory data processing system

by J J Parekh

2025, Journal of Clinical Pathology

A data processing system designed to improve the management and usage of blood and blood products has been developed as a pilot for general application throughout the West Midlands Regional Health Authority. The package provides for the...

Classification of EEG signals using normal inverse Gaussian parameters in the dual-tree complex wavelet transform domain for seizure detection

by Anindya Bijoy Das

2025, Signal, Image and Video Processing

In this paper, a comprehensive method using symmetric normal inverse Gaussian (NIG) parameters of the sub-bands of EEG signals calculated in the dual-tree complex wavelet transformation domain is proposed for classifying EEG data. The...

The Cuticular Stress Detector (CSD2) of the Crayfish I. Physiological Properties

by Friedrich Barth

2025, The Journal of Experimental Biology

1. The cuticular stress detector (CSD2) is coupled to a patch of compliant cuticle in the ischiopodite. It is excited when the compliant cuticle is deformed directly or by deformation of the surrounding stiffer cuticle. 2. Sinusoidal...

MEMOPS: data modelling and automatic code generation

by Ernest Laue

2025

In recent years the amount of biological data has exploded to the point where much useful information can only be extracted by complex computational analyses. Such analyses are greatly facilitated by metadata standards, both in terms of...

A comparative study of supervised evaluation criteria for image segmentation

by Pierre Marché

2025, 2004 12th European Signal Processing Conference

This paper presents a comparative study of five supervised evaluation criteria for image segmentation. The different criteria have been tested on a selection of one hundred images extracted from the Corel database for which manual...

E-charting Point-of-Care Data Entry Dilemma

by Cheryl Stewart

2025, JONA: The Journal of Nursing Administration

Computerized charting and point-of-care testing came together in December 2007 on our nursing unit. When E-charting was implemented, results from glucometer readings went into cyberspace if the medical record numbers (MRNs) were not...

Clinical Utility and Analysis of the Run-Roll-Aim Task: Informing Return-to-Duty Readiness Decisions in Active-Duty Service Members

by Karen McCulloch

2025, Carolina Digital Repository (University of North Carolina at Chapel Hill)

different than trials 2,3 and 4 (F(3,47) = 4.60, p < 0.01, Tukey HSD p < 0.05) while the mTBI group showed no significant difference in time between trials. During testing individuals with mTBI were less likely to complete the multiple...

Creation of a Local Interface Terminology to SNOMED CT

by Adrian Gomez

2025

This paper describes the steps followed in the creation of a local Interface Terminology to SNOMED CT (as reference terminology) with a strong focus on user acceptability. The resulting list of terms is used for clinical data input by...

Regional changes in hippocampal T2 relaxation and volume: a quantitative magnetic resonance imaging study of hippocampal sclerosis

by Kim Mawhinney

2025, Journal of Neurology, Neurosurgery & Psychiatry

Objective: The principal MRI features of hippocampal sclerosis are volume loss and increased T2 weighted signal intensity. Minor and localised abnormalities may be overlooked without careful quantitation. Hippocampal T2 relaxation time...

Non-invasive imaging of cellulose microfibril orientation within plant cell walls by polarized Raman microspectroscopy

by Manfred Auer

2025, Biotechnology and Bioengineering

Cellulose microfibrils represent the major scaffold of plant cell walls. Different packing and orientation of the microfibrils at the microscopic scale determine the macroscopic properties of cell walls and thus affect their functions...

Neonatal Ilio-Psoas Abscess: Report of Two Cases

by Minakshi Bhosale

2025, Journal of Neonatal Surgery

Ilio-psoas abscess (IPA) is rare in children and exceptional in the neonate. However, we recently managed two consecutive male neonates with right-sided IPA. The first baby was born two days after rupture of the membranes and had thick...

Cloud CPFP: A Shotgun Proteomics Data Analysis Pipeline Using Cloud and High Performance Computing

by Hamid Mirzaei

2025, Journal of Proteome Research

We have extended the functionality of the Central Proteomics Facilities Pipeline (CPFP) to allow use of remote cloud and high performance computing (HPC) resources for shotgun proteomics data processing. CPFP has been modified to include...

Cost-effectiveness of a shared computerized decision support system for diabetes linked to electronic medical records

by Gord Blackhouse

2025, Journal of the American Medical Informatics Association

Background: Computerized decision support systems (CDSSs) are believed to enhance patient care and reduce healthcare costs; however, the current evidence is limited and the cost-effectiveness remains unknown. Objective: To estimate the...

Writer identification using intra-stroke and inter-stroke information for security enhancements in P2P systems

by Jungpil Shin

2025, Peer-to-Peer Networking and Applications

The Chinese language has an enormous number of characters and complicated stroke structures, so it is very difficult to efficiently and accurately identify a Chinese writer from his/her handwriting. This paper proposes a novel writer...

Joint Action EUROCAT 2011-2013 Funded by the Public Health Programme 2008-2013 of the European Commission

by Ingeborg Barisic

2025

Congenital anomalies (CA) are an important cause of morbidity and mortality in infants and children. More than 100,000 children with CA are born in the EU each year. The European Surveillance of Congenital Anomalies (EUROCAT) is a network of...

Measurement of children's asthma medication adherence by self report, mother report, canister weight, and Doser CT

by Marianne Wamboldt

2025, Annals of Allergy, Asthma & Immunology

Background: Accurate assessment of medication adherence has been difficult to achieve but is essential to drug evaluation in clinical trials and improved outcomes in clinical care. Objective: This study was conducted to compare four...

Circulating Brain-Derived Neurotrophic Factor Concentration Is Downregulated by Intralipid/Heparin Infusion or High-Fat Meal in Young Healthy Male Subjects

by Sadiq Hassan

2025, Diabetes Care

Objective: Insulin resistance and type 2 diabetes are associated with an increased risk of neurodegenerative diseases. Brain-derived neurotrophic factor (BDNF) regulates neuronal differentiation and synaptic plasticity, and its decreased...

High‐latitude seasonal variation of meteoric and nonmeteoric oblique propagation at a frequency of 45 MHz

by Jay Weitzen

2025, Radio Science

This paper examines the seasonal duty cycle variation of meteor and nonmeteor propagation at 45 MHz on two high‐latitude links. Software techniques for automatic data processing and analysis of the data are discussed. It is shown that for...

The COGITAT holeboard system as a valuable tool to assess learning, memory and activity in mice

by Sarah D Kittel-Schneider

2025, Behavioural Brain Research

The comprehensive and stress-free assessment of various aspects of learning and memory is a prerequisite to evaluate mouse models for neuropsychiatric disorders such as Alzheimer's disease or attention deficit/hyperactivity disorder...

Detecting emotional expression in face-to-face and online breast cancer support groups

by Maya Yutsis

2025, Journal of Consulting and Clinical Psychology

Accurately detecting emotional expression in women with primary breast cancer participating in support groups may be important for therapists and researchers. In 2 small studies (N = 20 and N = 16), the authors examined whether video...

Relation Between Digital Peripheral Arterial Tonometry and Brachial Artery Ultrasound Measures of Vascular Function in Patients With Coronary Artery Disease and in Healthy Volunteers

by Bich Tran

2025, The American Journal of Cardiology

Digital peripheral arterial tonometry (PAT) is an emerging, noninvasive method to assess vascular function. The physiology underlying this phenotype, however, remains unclear. Therefore, we evaluated the relation between digital PAT and...

Automated identification of pneumonia in chest radiograph reports in critically ill patients

by Vincent Liu

2025, BMC Medical Informatics and Decision Making

Background: Prior studies demonstrate the suitability of natural language processing (NLP) for identifying pneumonia in chest radiograph (CXR) reports; however, few evaluate this approach in intensive care unit (ICU) patients. Methods:...