Audio generation from a single sample using deep convolutional generative adversarial networks
Date
2021-12
Author
Pfantz, Levi
Publisher
University of Wisconsin - Whitewater
Advisor(s)
Gunawardena, Athula
Mukherjee, Lopamudra
Zhou, Jiazhen
Abstract
Training neural networks requires sizeable datasets to produce meaningful output, and large datasets are difficult to acquire for many types of data. This is especially challenging for individuals and small organizations. We have taken SinGAN [1], a model that addresses these issues in the image domain, and extended it to the audio domain. Our new model, called AudioSinGAN, uses deep convolutional generative adversarial networks (DCGANs) trained on a single audio sample to generate new, unique audio samples. Like SinGAN, AudioSinGAN uses a pyramid of unique GANs, each responsible for learning and generating a different level of detail. Our system is capable of generating unique audio with clear features from the single input audio clip. We explore and discuss the practical realities of converting and tuning a generative adversarial network (GAN) built for images into one built for audio, and we present our results. We also present a database of audio clips generated by AudioSinGAN and use Singular Value Decomposition to analyze the dataset, confirming that our model successfully generates audio belonging to unique classes. We also find that audio containing multiple overlapping sources poses a challenge for our system. Finally, we discuss methods to address this issue, including splitting audio into frequency bands before processing.
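To make the pyramid idea concrete, the following is a minimal sketch, not the thesis code, of how a SinGAN-style coarse-to-fine pyramid can be adapted to 1-D audio: each scale's generator receives the upsampled output of the coarser scale plus fresh noise and learns a residual refinement. The class name AudioGenerator, the layer sizes, and the scale lengths are illustrative assumptions.

# Sketch of a SinGAN-style generator pyramid for 1-D audio.
# Names, channel counts, and kernel sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioGenerator(nn.Module):
    """One generator in the pyramid: a small 1-D conv stack that
    learns a residual refinement of the upsampled coarser output."""
    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=9, padding=4),
            nn.LeakyReLU(0.2),
            nn.Conv1d(channels, channels, kernel_size=9, padding=4),
            nn.LeakyReLU(0.2),
            nn.Conv1d(channels, 1, kernel_size=9, padding=4),
        )

    def forward(self, noise, coarser):
        # Residual learning: refine the upsampled coarser waveform.
        return coarser + self.net(noise + coarser)

def generate(generators, lengths, device="cpu"):
    """Run the pyramid coarse to fine. `lengths` holds the number of
    audio samples at each scale, shortest first."""
    audio = torch.zeros(1, 1, lengths[0], device=device)
    for gen, length in zip(generators, lengths):
        # Upsample the previous scale's output to the current length,
        # inject fresh noise, and let this scale add finer detail.
        audio = F.interpolate(audio, size=length, mode="linear",
                              align_corners=False)
        noise = torch.randn_like(audio)
        audio = gen(noise, audio)
    return audio.squeeze()

# Example: a five-scale pyramid from 1,000 to 16,000 samples.
# In a SinGAN-style setup each generator would be trained one scale
# at a time against downsampled versions of the single input clip.
generators = [AudioGenerator() for _ in range(5)]
lengths = [1000, 2000, 4000, 8000, 16000]
waveform = generate(generators, lengths)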
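As a sketch of the proposed frequency-band preprocessing, the code below splits a mono clip into low, mid, and high bands with Butterworth filters; each band could then be modeled separately and the results summed to reconstruct the full signal. The band edges and filter order are illustrative assumptions, not values from the thesis.

# Sketch of splitting audio into frequency bands before processing.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def split_into_bands(audio, sample_rate, edges=(200.0, 2000.0)):
    """Return (low, mid, high) bands of a mono waveform. The bands
    sum approximately back to the original signal."""
    nyquist = sample_rate / 2.0
    low_sos = butter(4, edges[0] / nyquist, btype="lowpass",
                     output="sos")
    band_sos = butter(4, [edges[0] / nyquist, edges[1] / nyquist],
                      btype="bandpass", output="sos")
    high_sos = butter(4, edges[1] / nyquist, btype="highpass",
                      output="sos")
    # Zero-phase filtering avoids shifting features in time.
    return (sosfiltfilt(low_sos, audio),
            sosfiltfilt(band_sos, audio),
            sosfiltfilt(high_sos, audio))

# Example: split one second of noise sampled at 16 kHz.
low, mid, high = split_into_bands(np.random.randn(16000), 16000)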
Subject
Machine learning
Neural networks (Computer science)
Audio frequency
Permanent Link
http://digital.library.wisc.edu/1793/82595
Type
Thesis
Description
This file was last viewed in Adobe Acrobat Pro.