Researchers at Sony Computer Science Laboratories (CSL) have developed a new deep learning method to restore and improve the quality of heavily compressed music and audio recordings.

Today, many sophisticated tools and technologies allow us to store large amounts of music and audio recordings on electronic devices. Codecs, each of which pairs an encoder with a decoder, are used to encode, compress and decompress media files.

Codecs fall into two categories: lossless and lossy. Lossless codecs, such as those used by PKZIP and PNG, reproduce the original data exactly after decompression. Lossy compression techniques, on the other hand, produce an approximation of the original file that looks and/or sounds much the same but takes up less space on electronic devices.
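
The lossless property can be checked with a short round trip. The sketch below uses Python's standard `zlib` module, which implements DEFLATE, the compression method used by PKZIP and PNG. The repetitive payload is a made-up example, not real audio data.

```python
import zlib


def lossless_round_trip(data):
    """Compress with DEFLATE, decompress again, and return (compressed, restored)."""
    compressed = zlib.compress(data, level=9)
    restored = zlib.decompress(compressed)
    return compressed, restored


# Hypothetical, highly repetitive payload chosen so that it compresses well.
original = b"la la la " * 1000
compressed, restored = lossless_round_trip(original)

# The restored bytes are identical to the input, and the compressed form is smaller.
print(len(original), len(compressed), restored == original)
```

A lossy codec would fail the final equality check by design, since it trades exact reconstruction for a smaller file.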

Lossy audio codecs essentially work by discarding some of the data in a digital audio stream during compression and then reconstructing an approximation of the stream on decompression. At moderate compression rates, it is generally difficult, if not impossible, for humans to tell the difference between the original file and the decompressed file.
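
As a toy illustration of discarding data, the sketch below rounds each audio sample to a small number of levels. This is not how MP3 works (real lossy codecs rely on psychoacoustic models and frequency-domain coding); the `quantize` helper, the 440 Hz test tone and the bit depths are all assumptions chosen only to show that fewer bits mean a larger reconstruction error.

```python
import math


def quantize(samples, bits):
    """Round samples in [-1, 1] to 2**bits evenly spaced values (a lossy step)."""
    levels = 2 ** bits - 1
    return [round((s + 1) / 2 * levels) / levels * 2 - 1 for s in samples]


def max_error(a, b):
    """Largest absolute difference between two equally long sample lists."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical test signal: 64 samples of a 440 Hz sine at an 8 kHz sample rate.
signal = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(64)]

coarse = quantize(signal, bits=3)  # aggressive: only 8 possible values
fine = quantize(signal, bits=8)    # gentle: 256 possible values

print(max_error(signal, coarse), max_error(signal, fine))
```

The coarse version deviates far more from the original than the fine one, which mirrors the trade-off between file size and fidelity described above.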

However, at high compression rates, lossy codecs can introduce artifacts and audibly alter audio signals. Deep learning techniques have recently been used to try to overcome these disadvantages and enhance compressed files.

A new deep learning technique created by researchers at Sony Computer Science Laboratories (CSL) improves and restores the quality of heavily compressed audio and music recordings. Their approach relies on generative adversarial networks (GANs), machine learning models in which two neural networks “compete” with each other, each improving in response to the other.

The proposed model is composed of two distinct networks, the “generator (G)” and the “critic (D)”. The generator receives as input a spectrogram (a visual representation of the frequency spectrum of an audio signal over time) of an excerpt of MP3-compressed musical audio.
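
To make the idea of a spectrogram concrete, here is a minimal sketch that computes one with a plain discrete Fourier transform in standard-library Python. The frame size, hop size, Hann window and 1 kHz test tone are illustrative assumptions; the article does not say which parameters the researchers used, and a practical implementation would use an FFT library instead.

```python
import cmath
import math


def magnitude_spectrogram(samples, frame_size=64, hop=32):
    """Return one list of DFT magnitudes (bins 0..frame_size//2) per frame."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window reduces leakage between neighbouring frequency bins.
        windowed = [
            x * 0.5 * (1 - math.cos(2 * math.pi * n / frame_size))
            for n, x in enumerate(frame)
        ]
        bins = []
        for k in range(frame_size // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            bins.append(abs(acc))
        frames.append(bins)
    return frames


# Hypothetical input: 256 samples of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(256)]

spec = magnitude_spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64  # 64 is the frame size used above

print(len(spec), len(spec[0]), peak_hz)
```

Each row of `spec` is one time slice and each column one frequency band, which is the kind of two-dimensional, image-like representation a generator network can take as input.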

The generator gradually improves its ability to produce a restored version of the original signal. Meanwhile, the critic learns to distinguish original high-quality files from restored ones. The feedback from the critic is in turn used to improve the restored files, so that the music or audio they contain comes as close as possible to the original.
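
The interplay between the two networks can be sketched through their loss functions. The article does not state the training objective, so the Wasserstein-style losses below are an assumption (suggested only by the use of the term "critic"), and the scores are made-up numbers standing in for the critic's outputs.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def critic_loss(real_scores, fake_scores):
    """Assumed Wasserstein-style critic loss.

    It decreases as the critic scores original excerpts higher than restored ones.
    """
    return mean(fake_scores) - mean(real_scores)


def generator_loss(fake_scores):
    """Assumed generator loss: it decreases as the critic rates restorations higher."""
    return -mean(fake_scores)


# Hypothetical critic outputs for a batch of original and restored excerpts.
real_scores = [0.9, 0.8, 1.1]
fake_scores = [0.1, -0.2, 0.3]

print(critic_loss(real_scores, fake_scores), generator_loss(fake_scores))
```

In training, the two losses would be minimized in alternation: the critic gets better at telling restored excerpts from originals, and the generator uses that signal to make its restorations harder to tell apart.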

In a series of tests, the researchers evaluated the performance of their GAN-based architecture. The main goal was to see whether it could improve the quality of the MP3 inputs and produce restored samples that are better and closer to the original file than those produced by existing baseline models. Their findings show that the model's restorations of heavily compressed MP3 songs (16 kbit/s and 32 kbit/s) often sounded better to expert human listeners than the compressed MP3 inputs. On the other hand, the team found that their model produced slightly worse results at a lower compression rate (64 kbit/s mono).
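
To put these bitrates in perspective, the following sketch computes the size of a constant-bitrate stream. The three-minute duration is a hypothetical example, and uncompressed CD audio (44.1 kHz, 16-bit, stereo) is added here purely as a familiar reference point; it is not part of the researchers' evaluation.

```python
def size_bytes(bits_per_second, seconds):
    """Size in bytes of a constant-bitrate stream of the given duration."""
    return bits_per_second * seconds // 8


DURATION_S = 180  # hypothetical three-minute song
CD_BPS = 44_100 * 16 * 2  # uncompressed CD audio: 1,411,200 bit/s

# Sizes at the three MP3 bitrates mentioned in the article (1 kbit = 1000 bits).
sizes = {kbps: size_bytes(kbps * 1000, DURATION_S) for kbps in (16, 32, 64)}
cd_size = size_bytes(CD_BPS, DURATION_S)

for kbps, size in sizes.items():
    print(f"{kbps} kbit/s: {size} bytes, {cd_size / size:.1f}x smaller than CD audio")
```

At 16 kbit/s the same song takes a quarter of the space it needs at 64 kbit/s, which is why restoring quality at the lowest bitrates is the most interesting case.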

According to their paper, the architecture can generate and add realistic high-frequency content that improves the audio quality of compressed songs. The generated content included percussive elements, guitar sounds and the sibilant (hissing) sounds of singing voices.

The team believes that their work could help significantly reduce the size of MP3 audio files without noticeably affecting their quality or producing artifacts that are obvious to the human ear.

This article is a research summary written by Marktechpost staff, based on the research paper 'Stochastic Restoration of Heavily Compressed Musical Audio using Generative Adversarial Networks'. All credit for this research goes to the researchers on this project.

Tanushree Shenwai is an intern consultant at MarktechPost. She is currently pursuing her B.Tech from Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast and has a keen interest in the scope of application of artificial intelligence in various fields. She is passionate about exploring new technological advancements and applying them to real life.
