Key research themes
1. What are the challenges and limitations of information extraction methods across different types of unstructured big data?
This line of research examines how well information extraction (IE) techniques perform, and where they fall short, on unstructured data types such as text, images, audio, and video, particularly at big-data scale. It matters because heterogeneous, high-volume unstructured data must be transformed into structured formats before it can support analytics and decision-making. Understanding these challenges is essential for improving the scalability, accuracy, and applicability of IE systems across varied unstructured datasets.
2. How can the usability of unstructured text in big data analytics be enhanced to improve insight extraction?
This research area focuses on understanding and improving the practical usability of unstructured text within big data analytics workflows. Unstructured text poses technical and conceptual challenges that reduce its usability for analytics. Work in this area proposes models and validation techniques to address these challenges, with the aim of streamlining the path from raw data to insight while accounting for subjective intentions and contextual needs.
3. What algorithmic and machine learning methods can reconstruct or extract structure from partially structured or uncertain unstructured data?
This theme addresses the problem of restoring structure, a core challenge when dealing with messy, semi-structured, or corrupted datasets that lack consistent schemas. Supervised or unsupervised machine learning techniques that infer latent table or relational structure provide a pathway to converting such data into analyzable formats. This is broadly relevant wherever data export or storage processes disrupt the original organization and intelligent recovery mechanisms are needed.
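As a rough illustration of what recovering latent table structure can mean in the simplest case, the sketch below rebuilds columns from text whose delimiters were lost on export. It is not a method taken from the works surveyed here. It swaps in a plain unsupervised heuristic (treat any character position that is blank in every row as a column gap) in place of a learned model. The function names `infer_column_starts` and `split_rows` and the sample data are hypothetical, and the input is assumed to be non-empty, space-aligned, and free of spaces inside cell values.

```python
from typing import List


def infer_column_starts(lines: List[str]) -> List[int]:
    """Guess column start offsets from space-aligned rows (hypothetical helper)."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    # A character position counts as a gap if it is blank in every row.
    is_gap = [all(row[i] == " " for row in padded) for i in range(width)]
    # A column starts wherever a non-gap position follows a gap (or the line start).
    return [
        i for i in range(width)
        if not is_gap[i] and (i == 0 or is_gap[i - 1])
    ]


def split_rows(lines: List[str], starts: List[int]) -> List[List[str]]:
    """Cut each row at the inferred offsets and trim the padding."""
    bounds = starts + [None]  # None lets the last slice run to the end of the line
    return [
        [line[a:b].strip() for a, b in zip(bounds, bounds[1:])]
        for line in lines
    ]


# Illustrative input: a table whose delimiters were lost, leaving only alignment.
raw = [
    "id   name     city",
    "1    Alice    Oslo",
    "22   Bob      Lima",
]
starts = infer_column_starts(raw)
table = split_rows(raw, starts)
```

The heuristic fails as soon as a cell contains an internal space that lines up across rows or the alignment drifts. That is the kind of uncertainty that motivates the supervised and unsupervised learning approaches this theme describes.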