For further information and to register for the upcoming workshop in our series, please visit the following link.
Andrew Kennedy & Prof. Richard Everson of the DDRC’s Synthetic Data research team.
As part of the Synthetic Data Workshop Series.
Watch this inaugural workshop at the University of Exeter, focusing on the evolving and intriguing domain of Synthetic Data. This series welcomes a diverse set of speakers and audience members from academia, industry, and governmental sectors.
Synthetic Data is an evolving field of data science, pushing the bounds of what is possible with AI and machine learning. This seminar introduces our series on synthetic data, with an overview of the topic, and a look at research currently underway.
We focus on use cases of synthetic data for data augmentation and for privacy, looking at current and emerging methods for synthetic data creation and validation. Synthetic data for privacy enables private data to be shared anonymously while retaining key characteristics and statistical features. Augmenting datasets allows machine learning models to be trained on larger and broader datasets, addressing imbalances and minimising the requirement for real-data collection.
This is an opportunity for learning and engagement amongst professionals interested in and working with the science of synthetic data.
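To make the augmentation use case concrete, the following is a minimal sketch of SMOTE-style oversampling, the geometric method named in the agenda. It is an illustration under stated assumptions, not the workshop's own code: the function name `smote_like_oversample` and its parameters are hypothetical, and it only interpolates between rows of an under-represented class and their nearest neighbours within that class.

```python
import numpy as np


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class
    rows and their nearest minority-class neighbours (SMOTE-style sketch).

    Hypothetical helper for illustration; not a library API.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority rows to interpolate")
    k = min(k, n - 1)

    # Pairwise Euclidean distances between minority rows.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a row is not its own neighbour

    # Indices of the k nearest neighbours of every row.
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a random base row and one of its neighbours for each new sample.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Place the new point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return minority[base] + gap * (minority[chosen] - minority[base])


minority_rows = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic_rows = smote_like_oversample(minority_rows, n_new=10)
```

Because every new row lies on a line segment between two real rows, the synthetic points stay inside the region already covered by the minority class.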
This session’s agenda is as follows:
- What is synthetic data?
  - Algorithmically generated
  - Aims to solve data science tasks or train other models
- Different data types
  - E.g. images, audio, other media, tabular, text
- Main use cases
  - Privacy
  - Augmentation & de-biasing
- Privacy (further detail)
- Augmentation (further detail)
- Data creation methods
  - Geometric methods, e.g. SMOTE
  - Discriminator-based methods, e.g. GANs
  - DDRC’s work with ImageGPT
- Validation methods
  - Discuss whether to validate models or the generated data
  - KL divergence
  - Maximum mean discrepancy
  - FID & variants
    - Andrew Kennedy’s work on FID
  - Performance on given task
- Other case studies
- Alternatives to synthetic data
  - Dummy data
  - Differential privacy
- References
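
Of the validation measures listed in the agenda, KL divergence is perhaps the simplest to illustrate. The sketch below estimates it for a single numeric feature by comparing histograms of real and synthetic values. It is a rough illustration only: the function name `histogram_kl`, the bin count, and the smoothing constant `eps` are assumptions made for this example, not details taken from the workshop.

```python
import numpy as np


def histogram_kl(real, synthetic, bins=20, eps=1e-9):
    """Estimate KL(real || synthetic) for one numeric feature using
    histograms over a shared range. Hypothetical helper for illustration."""
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)

    # Use the same bin edges for both samples so the bins are comparable.
    lo = min(real.min(), synthetic.min())
    hi = max(real.max(), synthetic.max())
    p, _ = np.histogram(real, bins=bins, range=(lo, hi))
    q, _ = np.histogram(synthetic, bins=bins, range=(lo, hi))

    # Smooth with a small constant so empty bins do not produce log(0),
    # then normalise the counts into probabilities.
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))


rng = np.random.default_rng(0)
real_values = rng.normal(0.0, 1.0, size=5000)
close_synthetic = rng.normal(0.0, 1.0, size=5000)
shifted_synthetic = rng.normal(3.0, 1.0, size=5000)

kl_close = histogram_kl(real_values, close_synthetic)
kl_shifted = histogram_kl(real_values, shifted_synthetic)
```

A lower value suggests the synthetic feature follows the real distribution more closely. The estimate depends on the choice of bins, so it is better suited to comparing generators against each other than to reading as an absolute score.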
The next session takes place on 27th February 2024, 12:00 – 13:30