Key research themes
1. How can Generative Adversarial Networks (GANs) generate privacy-preserving synthetic tabular data while retaining its utility?
This theme examines GAN architectures tailored to tabular data synthesis that balance privacy preservation (resisting re-identification, attribute disclosure, and membership inference attacks) against data utility, in particular model compatibility: models trained on the synthetic data should perform well on downstream machine learning tasks. It matters because tabular data is ubiquitous and often sensitive, so synthetic versions must not compromise privacy yet should still support model training comparable to training on the original data.
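One simple heuristic sometimes used to screen synthetic tables for re-identification risk is the distance to closest record (DCR): for each synthetic row, the distance to its nearest real training row. Rows at or near zero distance suggest the generator may have copied real records. The sketch below is illustrative only; it is not the evaluation used by any particular work under this theme, it gives no formal guarantee (unlike differential privacy), and it assumes numeric features already scaled to comparable ranges. The function names are hypothetical.

```python
import numpy as np


def distance_to_closest_record(real, synthetic):
    """For each synthetic row, return the Euclidean distance to its nearest real row.

    Assumes both inputs are numeric 2-D arrays with the same columns,
    already scaled so that distances are meaningful.
    """
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    # Pairwise distance matrix of shape (n_synthetic, n_real).
    d = np.linalg.norm(synthetic[:, None, :] - real[None, :, :], axis=-1)
    return d.min(axis=1)


def exact_copy_rate(real, synthetic, tol=1e-9):
    """Fraction of synthetic rows that (near-)exactly duplicate some real row."""
    return float((distance_to_closest_record(real, synthetic) <= tol).mean())


real = np.array([[0.0, 0.0], [1.0, 1.0]])
synth = np.array([[0.0, 0.0], [0.5, 0.5], [3.0, 4.0]])
print(distance_to_closest_record(real, synth))
print(exact_copy_rate(real, synth))
```

Here the first synthetic row duplicates a real record, so one of three rows is flagged. A utility check, such as training a model on synthetic data and testing it on held-out real data, would complement this privacy screen.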
2. What strategies are effective for generating high-quality synthetic data for domain-specific applications where real data are limited?
This theme explores synthetic data generation techniques tailored to particular domains, such as healthcare, social networks, fraud detection, and specialized imaging systems, where real data are scarce, confidential, or costly to obtain. The focus is on methods that capture domain-specific characteristics and structure, yielding realistic, privacy-preserving datasets suitable for system training, testing, and validation. This is crucial for advancing AI applications in domains constrained by data availability.
3. How can synthetic data generation techniques address small-sample and class-imbalance problems in machine learning?
This theme focuses on algorithmic strategies for producing synthetic samples that augment scarce or imbalanced datasets and thereby improve machine learning model performance. Approaches include geometric oversampling methods, GAN variants, and AI-driven text generation for augmenting minority classes. The significance lies in enabling reliable model training when data are limited or skewed, for example because of small sample sizes, infrequent classes, or underrepresented emotional states.
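To make "geometric oversampling" concrete, the sketch below implements the interpolation step of SMOTE, one well-known method of this kind: each synthetic minority-class point is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours. The theme does not name SMOTE specifically, so treat this as an illustrative stand-in rather than the technique of any work surveyed here; the function name `smote_oversample` is hypothetical, and the brute-force distance computation assumes a small minority class.

```python
import numpy as np


def smote_oversample(minority, n_new, k=5, seed=None):
    """Generate n_new synthetic rows by SMOTE-style interpolation.

    minority: array of shape (n_samples, n_features) containing only
    minority-class rows. Each synthetic row lies on the segment between
    a randomly chosen minority row and one of its k nearest minority
    neighbours.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Brute-force pairwise Euclidean distances (acceptable for small n).
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a row is never its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)   # anchor row for each new sample
    pick = rng.integers(0, k, size=n_new)   # which of its k neighbours to use
    partner = neighbours[base, pick]
    gap = rng.random((n_new, 1))            # interpolation factor in [0, 1)
    return minority[base] + gap * (minority[partner] - minority[base])


X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_oversample(X_min, n_new=6, k=2, seed=0)
print(synthetic.shape)  # (6, 2)
```

Because every new point is a convex combination of two existing minority rows, the synthetic samples stay inside the region the minority class already occupies; this keeps them plausible but also means the method cannot extrapolate beyond the observed data, one motivation for the GAN-based alternatives the theme mentions.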