Generation and Evaluation of Tabular Data in Different Domains Using Gans

Persevearance Marecha *

School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China.

Lu Ye

School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China.

*Author to whom correspondence should be addressed.


Abstract

Deep learning techniques such as Generative Adversarial Networks (GANs) offer solutions in many domains where real data must be kept private. Synthesizing tabular data is difficult because such data usually mixes discrete and continuous columns, which makes it hard to model. The contributions of this paper include training and generating data with the original vanilla GAN, followed by CGAN, WGAN-GP, and WCGAN-GP, the last of which performs better than the former models. The experiments use the Adult Income Census dataset, whose task is to predict from census data whether income exceeds 50,000 per year; the accuracy of machine learning models is then compared and F1 scores are calculated. TimeGAN is then applied to a stock dataset, and the synthetic data is compared with the real data. Overall, this paper explores the use of GANs for generating and evaluating tabular data in different domains.

Keywords: Generative adversarial networks, tabular data, synthetic


How to Cite

Marecha, P., & Ye, L. (2023). Generation and Evaluation of Tabular Data in Different Domains Using Gans. Asian Journal of Research in Computer Science, 16(1), 15–27. https://doi.org/10.9734/ajrcos/2023/v16i1331

