Academia.edu

Automatic data processing

6,060 papers
5 followers

About this topic
Automatic data processing refers to the use of computer systems and software to perform data manipulation tasks without human intervention. This includes data collection, storage, analysis, and reporting, enabling efficient handling of large volumes of information and facilitating decision-making processes in various applications.

Key research themes

1. How can automated program synthesis techniques improve data completion and extraction tasks in tabular data?

This theme focuses on methods that leverage programming-by-example (PBE) and predictive synthesis to automate filling missing data, transforming raw tabular inputs into usable forms. Automating data completion reduces manual effort, making complex data wrangling accessible to non-experts and improving data usability across domains such as spreadsheets, databases, and web data extraction.

Key finding: Proposes a novel program synthesis technique combining program sketching and PBE for automating complex data completion tasks on tabular datasets, requiring users to provide formula sketches and a few input-output examples…
Key finding: Introduces a predictive program synthesis algorithm that infers extraction scripts from input-only examples, eliminating the need for explicit input-output examples in diverse domains such as text log extraction and web data…
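The programming-by-example idea behind these findings can be illustrated with a minimal sketch: enumerate a tiny domain-specific language (DSL) of column transformations, keep the first program consistent with the user's input-output examples, and apply it to complete the remaining cells. The DSL operators and function names below are illustrative assumptions, not the synthesis algorithms of the surveyed papers.

```python
# Minimal PBE sketch: search a toy DSL for a program consistent with the
# user's (input, output) examples, then use it to fill the rest of a column.
DSL = {
    "upper":   lambda s: s.upper(),
    "lower":   lambda s: s.lower(),
    "first3":  lambda s: s[:3],
    "initial": lambda s: s[0] + ".",
}

def synthesize(examples):
    """Return the name of a DSL program matching all (input, output) pairs."""
    for name, prog in DSL.items():
        if all(prog(i) == o for i, o in examples):
            return name
    return None

def complete(column, examples):
    """Fill a column using the program synthesized from the examples."""
    name = synthesize(examples)
    if name is None:
        raise ValueError("no DSL program consistent with the examples")
    return [DSL[name](v) for v in column]

# A single input-output example is enough to select "first3" here.
print(complete(["london", "madrid", "berlin"], [("london", "lon")]))
# -> ['lon', 'mad', 'ber']
```

Real systems search vastly larger DSLs with pruning and ranking; the one-example disambiguation shown here is why such tools feel usable to non-experts.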

2. What are effective data preprocessing strategies to address real-world data quality issues in automatic data analysis?

Addressing real-world data quality challenges—such as missing data, out-of-range values, inconsistencies, and incomplete records—is crucial for downstream analysis. This research theme covers systematic preprocessing methodologies integrating domain knowledge and iterative refinement, aiming to preserve valuable information while improving data integrity. Automated or semi-automated frameworks for detecting data issues and recommending suitable cleaning techniques help streamline the preprocessing pipeline and improve analytical outcomes.

Key finding: Presents a comprehensive overview of common real-world data issues including missing values, outliers, and incomplete records, emphasizing the necessity of domain expertise and iterative feedback in preprocessing. Highlights…
Key finding: Develops an automated Python-based preprocessing approach that detects common data problems (duplicates, missing values, categorical encoding, feature scaling) and recommends or applies optimal solutions with minimal user…
Key finding: Provides a detailed survey of data preparation functionalities including profiling, matching, mapping, format transformation, and data repair. Categorizes approaches as program-based, workflow-based, dataset-based, and…
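A minimal sketch of the kind of automated preprocessing pass these findings describe: deduplicate records, mean-impute missing numeric values, and min-max scale numeric columns. The function and field names are illustrative assumptions, not the API of any surveyed system.

```python
# Hedged preprocessing sketch over a list-of-dicts "table":
# 1) drop exact duplicates, 2) mean-impute missing numerics (None),
# 3) min-max scale numeric columns into [0, 1].
def preprocess(rows, numeric_cols):
    # 1. Remove exact duplicate records.
    seen, deduped = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            deduped.append(dict(r))  # copy so inputs are not mutated

    # 2. Mean-impute missing numeric values.
    for col in numeric_cols:
        present = [r[col] for r in deduped if r[col] is not None]
        mean = sum(present) / len(present)
        for r in deduped:
            if r[col] is None:
                r[col] = mean

    # 3. Min-max scale each numeric column.
    for col in numeric_cols:
        lo = min(r[col] for r in deduped)
        hi = max(r[col] for r in deduped)
        span = (hi - lo) or 1.0  # avoid division by zero on constant columns
        for r in deduped:
            r[col] = (r[col] - lo) / span
    return deduped

rows = [
    {"id": 1, "age": 20},
    {"id": 1, "age": 20},    # exact duplicate, dropped
    {"id": 2, "age": None},  # imputed with mean(20, 40) = 30
    {"id": 3, "age": 40},
]
print([r["age"] for r in preprocess(rows, ["age"])])  # -> [0.0, 0.5, 1.0]
```

The automated frameworks surveyed go further by profiling each column first and choosing among several cleaning strategies, but the pipeline shape (detect, repair, normalize) is the same.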

3. How can machine learning enhance reconstruction and cleaning of complex, heterogeneous tabular datasets?

Imperfect and messy datasets with missing columns, mixed delimiters, multi-valued attributes, and varying attribute orders present substantial data structuring challenges. This theme investigates machine learning–based algorithms to reconstruct original table schemas and accurately allocate data into columns, facilitating subsequent analysis. Domain-independent modular ML approaches offer scalable, robust solutions to recovering structured information from diverse noisy datasets.

Key finding: Introduces STCExtract, an algorithm utilizing clustering (k-means, hierarchical) and other machine learning techniques to identify original table structures and allocate data to columns in messy delimited files without prior…
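A toy stand-in for the column-reconstruction step: group each field of a messy file by a crude feature signature (digit ratio, presence of "@"), regardless of delimiter or attribute order. STCExtract itself uses k-means and hierarchical clustering over richer features; the hand-rolled signature below is an assumption made purely for illustration.

```python
import re

def signature(tok):
    """Crude feature signature standing in for a learned cluster label."""
    if "@" in tok:
        return "email"
    digits = sum(c.isdigit() for c in tok)
    return "number" if digits > len(tok) / 2 else "text"

def reconstruct(lines, delimiters=",;|"):
    """Allocate fields of mixed-delimiter, mixed-order rows to columns."""
    columns = {}
    for line in lines:
        for tok in re.split("[" + re.escape(delimiters) + "]", line):
            tok = tok.strip()
            if tok:
                columns.setdefault(signature(tok), []).append(tok)
    return columns

messy = [
    "alice;alice@x.org;34",
    "bob|41|bob@y.net",      # different delimiter and attribute order
    "carol,carol@z.io,28",
]
print(reconstruct(messy)["number"])  # -> ['34', '41', '28']
```

Replacing the fixed signature with clustering is what makes the real approach domain-independent: column identities emerge from the data rather than from hand-written rules.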

All papers in Automatic data processing

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or... more
Background-Cognitive and language impairments constitute the majority of disabilities observed in preterm infants. It remains unclear if diffuse excessive high signal intensity (DEHSI) on MRI at term represents delayed white matter... more
Signatures of Bovine Spongiform Encephalopathy (BSE) have been identified in serum by means of ''Diagnostic Pattern Recognition (DPR)''. For DPR-analysis, mid-infrared spectroscopy of dried films of 641 serum samples was performed using... more
The emergence of efficient fragmentation methods such as electron capture dissociation (ECD) or electron transfer dissociation (ETD) provide the opportunity for detailed structural characterization of heavily covalently modified large... more
The U.S. Food, Drug, and Cosmetic Act prohibits the distribution of food that is adulterated, and the regulatory mission of the U.S. Food and Drug Administration (FDA) is to enforce this Act. FDA field laboratories have identified the 22... more
Background: Forensic odontology has evolved with the evolution of mankind. Since ages it has been of our interest to identify the dead. Because of events beyond our control many human beings may not die a natural death or in familiar... more
Most tools developed to visualize hierarchically clustered heatmaps generate static images. Clustergrammer is a web-based visualization tool with interactive features such as: zooming, panning, filtering, reordering, sharing, performing... more
BackgroundThis study describes the prevalence, associated anomalies, and demographic characteristics of cases of multiple congenital anomalies (MCA) in 19 population‐based European registries (EUROCAT) covering 959,446 births in 2004 and... more
Different problems of generalization of outlier rejection exist depending on the context. In this study we first define three different problems depending on outlier availability during the learning phase of the classifier. Then we... more
In this paper, the acoustic-phonetic characteristics of American English stop consonants are investigated. Features studied in the literature are evaluated for their information content and new features are proposed. A statistically... more
A central question in the study of LTP has been to determine what role it plays in memory formation and storage. One valuable form of learning for addressing this issue is associative fear conditioning. In this paradigm an animal learns... more
By facilitating bioliteracy, DNA barcoding has the potential to improve the way the world relates to wild biodiversity. Here we describe the early stages of the use of cox1 barcoding to supplement and strengthen the taxonomic platform... more
Objectives An increase in lung nodule volume on serial CT may represent true growth or measurement variation. In nodule guidelines, a 25% increase in nodule volume is frequently used to determine that growth has occurred; this is based on... more
by Karen Dugosh and 1 more
The two goals of this technology transfer study were to: (1) increase the number and appropriateness of services received by substance abuse patients, and thereby (2) give clinical meaning and value to research-based assessment... more
Detection of cancer is one of the challenges in medicine. Lung cancer is the... more
In this study, we describe our methods to automatically classify Twitter posts conveying events of adverse drug reaction (ADR). Based on our previous experience in tackling the ADR classification task, we empirically applied the... more
Big Data has gained much attention from the academia and the IT industry. In the digital and computing world, information is generated and collected at a rate that rapidly exceeds the boundary range. Currently, over 2 billion people... more
Artificial neural networks is one of the most commonly used machine learning algorithms in medical applications. However, they are still not used in practice in the clinics partly due to their lack of explanatory capacity. We compare two... more
Background: The robust storage, updating and utilization of information are necessary for the maintenance and perpetuation of dynamic systems. These systems can exist as constructs of metal-oxide semiconductors and silicon, as in a... more
The Community Page is a forum for organizations and societies to highlight their efforts to enhance the dissemination and value of scientific knowledge.
In the past four years, automation for genomics has enabled a 43-fold increase in the total finished human genomic sequence in the world. This two-part noncomprehensive review will provide an overview of different types of automation... more
Dramatic progress has been made in the design and build phases of the design-build-test cycle for engineering cells. However, the test phase usually limits throughput, as many outputs of interest are not amenable to rapid analytical... more
Distributions within a 12:12 light:dark schedule of wakefulness (W), active sleep (AS), quiet sleep (QS) and of QS rich in delta (QSD) and in spindle (QSS) activities were evaluated for 52 days from 15 rats. Angular statistics were... more
Preparing a systematic review can take hundreds of hours to complete, but the process of reconciling different results from multiple studies is the bedrock of evidence-based medicine. We introduce a two-step approach to automatically... more
A data processing system designed to improve the management and usage of blood and blood products has been developed as a pilot for general application throughout the West Midlands Regional Health Authority. The package provides for the... more
In this paper, a comprehensive method using symmetric normal inverse Gaussian (NIG) parameters of the sub-bands of EEG signals calculated in the dual-tree complex wavelet transformation domain is proposed for classifying EEG data. The... more
1. The cuticular stress detector (CSD2) is coupled to a patch of compliant cuticle in the ischiopodite. It is excited when the compliant cuticle is deformed directly or by deformation of the surrounding stiffer cuticle. 2. Sinusoidal... more
In recent years the amount of biological data has exploded to the point where much useful information can only be extracted by complex computational analyses. Such analyses are greatly facilitated by metadata standards, both in terms of... more
This paper presents a comparative study of five supervised evaluation criteria for image segmentation. The different criteria have been tested on a selection of hundred images extracted from the c Corel database for which manual... more
Computerized charting and point-of-care testing came together in December 2007 on our nursing unit. When E-charting was implemented, results from glucometer readings went into cyberspace if the medical record numbers (MRNs) were not... more
different than trials 2,3 and 4 (F(3,47) = 4.60, p < 0.01, Tukey HSD p < 0.05) while the mTBI group showed no significant difference in time between trials. During testing individuals with mTBI were less likely to complete the multiple... more
This paper describes the steps followed in the creation of a local Interface Terminology to SNOMED CT (as reference terminology) with a strong focus on user acceptability. The resulting list of terms is used for clinical data input by... more
Objective-The principal MRI features of hippocampal sclerosis are volume loss and increased T2 weighted signal intensity. Minor and localised abnormalities may be overlooked without careful quantitation. Hippocampal T2 relaxation time... more
Cellulose microfibrils represent the major scaffold of plant cell walls. Different packing and orientation of the microfibrils at the microscopic scale determines the macroscopic properties of cell walls and thus affect their functions... more
Ilio-psoas abscess (IPA) is rare in children and exceptional in the neonate. However, we recently managed two consecutive male neonates with right-sided IPA. The first baby was born two days after rupture of the membranes and had thick... more
We have extended the functionality of the Central Proteomics Facilities Pipeline (CPFP) to allow use of remote cloud and high performance computing (HPC) resources for shotgun proteomics data processing. CPFP has been modified to include... more
Background Computerized decision support systems (CDSSs) are believed to enhance patient care and reduce healthcare costs; however the current evidence is limited and the cost-effectiveness remains unknown. Objective To estimate the... more
The Chinese language has an enormous number of characters and complicated stroke structures, so it is very difficult to efficiently and accurately identify a Chinese writer from his/her handwriting. This paper proposes a novel writer... more
Congenital anomalies (CA) are an important cause of morbidity and mortality in infants and children. More than 100,000 children with CA are born in EU each year. The European Surveillance of Congenital Anomalies (EUROCAT) is a network of... more
Background: Accurate assessment of medication adherence has been difficult to achieve but is essential to drug evaluation in clinical trials and improved outcomes in clinical care. Objective: This study was conducted to compare four... more
OBJECTIVE Insulin resistance and type 2 diabetes are associated with an increased risk of neurodegenerative diseases. Brain-derived neurotrophic factor (BDNF) regulates neuronal differentiation and synaptic plasticity, and its decreased... more
This paper examines the seasonal duty cycle variation of meteor and nonmeteor propagation at 45 MHz on two high‐latitude links. Software techniques for automatic data processing and analysis of the data are discussed. It is shown that for... more
The comprehensive and stress-free assessment of various aspects of learning and memory is a prerequisite to evaluate mouse models for neuropsychiatric disorders such as Alzheimer's disease or attention deficit/hyperactivity disorder... more
Accurately detecting emotional expression in women with primary breast cancer participating in support groups may be important for therapists and researchers. In 2 small studies (N = 20 and N = 16), the authors examined whether video... more
Digital peripheral arterial tonometry (PAT) is an emerging, noninvasive method to assess vascular function. The physiology underlying this phenotype, however, remains unclear. Therefore, we evaluated the relation between digital PAT and... more
Background: Prior studies demonstrate the suitability of natural language processing (NLP) for identifying pneumonia in chest radiograph (CXR) reports, however, few evaluate this approach in intensive care unit (ICU) patients. Methods:... more