#### By comparing sound files we can generate a mathematical score to determine call quality.

## The problem we were facing:

As a service provider we needed to understand **service availability** across our suite of Data Centres. We also needed to understand the quality of the broadband connection that our customers had.

Scheduling a series of automatic calls was the easy bit. The harder question was: were all the calls of sufficient quality, or were there local connection issues on the customer side that were degrading the calls?

## Our solution:

We record a call at the edge and compare it to the original using spectrograms. To overlay the original call with the recorded call (uploaded from the edge to the server), both recordings first have to be transformed into spectrograms using a fast Fourier transform (FFT).

This process divides the sound signal into components of distinct frequencies.

Once the sound waves are broken into their distinct frequencies, the algorithm creates a discrete Fourier transform (DFT) of the component frequencies. Once this has been done, a mathematical comparison can be made between the original sound file and the recorded sound file.

This produces a similarity score between the recorded call and its original. The user can then decide the numerical level of similarity that they require.

We developed a similarity scale where a score of 70/100 or above meant no audible difference to the human ear, and a score between 60/100 and 70/100 meant only very marginal differences.

Anything below this we thought noticeable to the human ear and therefore a call of unacceptable quality. We were able to create alarms and call logs for these calls. The recorded calls were available for download so that the validity of the scoring system could be manually checked if required.
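
The bands above can be expressed as a small helper. This is a minimal sketch: the function name `classify_score`, the band labels, and the treatment of exactly 60 and 70 as inclusive lower bounds are assumptions made for illustration, not part of the original system.

```python
def classify_score(score: float) -> str:
    """Map a 0-100 similarity score to one of the quality bands described above."""
    if score >= 70:
        return "no audible difference"
    if score >= 60:
        return "marginal difference"
    # Anything lower is treated as an unacceptable call (alarm and log it).
    return "unacceptable"
```

A caller could raise an alarm whenever `classify_score(score)` returns `"unacceptable"`.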

**Essentially we are creating a digital fingerprint of the original call and the recorded call.**

## To generate fingerprints, the following process is performed:

- Both files are resampled to a fixed rate of 12 kHz. This helps to compare like for like. Resampling is done using simple linear interpolation of the amplitudes in the WAV file.
- We build a spectrogram of the WAV file. To do this, we:
  - First, apply an overlap factor so that each frame (where a frame is 1/6 of a second of data) has no large discontinuities.
  - Then apply a window function (Hamming) which, along with the overlapping above, smooths each frame to prevent large distortions in the transformation below.
  - Next, apply a one-dimensional discrete Fourier transform to each frame.
  - Lastly, normalise all signals in all frames to be between 0 and 1.

- We use this normalised spectrogram and divide each frame into a number of filter banks. We calculate the robust points (those with the highest spectrogram value) in each filter bank.
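
The resampling and spectrogram steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not our production code: the function names, the 50% overlap, and the use of a naive DFT (instead of an FFT library) are all choices made here to keep the example self-contained.

```python
import cmath
import math

def resample_linear(samples, src_rate, dst_rate=12_000):
    """Resample amplitudes to dst_rate using simple linear interpolation."""
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # fractional index into the source
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)     # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

def hamming(n):
    """Hamming window coefficients for a frame of n samples."""
    if n < 2:
        return [1.0] * n
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def dft_magnitudes(frame):
    """Naive one-dimensional DFT; magnitudes of the non-negative frequency bins."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(samples, rate=12_000, frame_seconds=1 / 6, overlap=0.5):
    """Overlapping, Hamming-windowed frames -> DFT -> values normalised to [0, 1].

    overlap=0.5 is an assumed value; the text does not state the factor used.
    """
    frame_len = round(rate * frame_seconds)
    hop = max(1, int(frame_len * (1 - overlap)))
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append(dft_magnitudes([s * w for s, w in zip(chunk, window)]))
    peak = max((m for f in frames for m in f), default=0.0)
    if peak > 0:
        frames = [[m / peak for m in f] for f in frames]
    return frames
```

The naive DFT is quadratic in the frame length, so for real 12 kHz audio an FFT routine would be the practical choice; the structure of the steps stays the same.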

Each robust point is scaled by the maximum value of a signed 32-bit integer (2,147,483,647) and appended to the fingerprint as an intensity value (along with its coordinates), resulting in a **complete fingerprint of the WAV file**.
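
As a rough sketch of this step, the function below picks one robust point per filter bank and stores it with its coordinates. The name `fingerprint`, the `bank_size` parameter, the choice of a single point per bank, and the `(frame, bin, intensity)` tuple layout are assumptions for illustration only.

```python
INT32_MAX = 2_147_483_647  # maximum value of a signed 32-bit integer

def fingerprint(spec, bank_size=4):
    """Build a fingerprint from a normalised spectrogram.

    spec is a list of frames, each a list of values in [0, 1].
    Returns a list of (frame_index, bin_index, intensity) tuples.
    """
    points = []
    for x, frame in enumerate(spec):
        # Split the frame into consecutive filter banks of bank_size bins.
        for start in range(0, len(frame), bank_size):
            bank = frame[start:start + bank_size]
            # Robust point: the bin with the highest value in this bank.
            y = start + max(range(len(bank)), key=bank.__getitem__)
            points.append((x, y, int(frame[y] * INT32_MAX)))
    return points
```

For example, `fingerprint([[0.1, 0.9, 0.2, 0.3]], bank_size=2)` keeps bin 1 from the first bank and bin 3 from the second.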

## To compare two fingerprints, the following process is performed:

- We build the pair position list table for each fingerprint. This is built by collecting all the positions for each pair. A pair is defined as a hash (a simple, one-way function) of two different (x,y) positions, subject to certain constraints, within the fingerprint.
- We generate an offset score by searching each fingerprint's pair position list table for the other's pairs. When we find a matching pair, we take the difference between the pair's positions in the two fingerprints (the offset) and keep a tally of the number of times each offset occurs.
- We then determine which offset occurred most often. We use this number (the number of times the most common offset occurred) as our similarity score. We also add to our score half the count of the offset values immediately before and after the most common offset value.
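
The comparison steps can be sketched as follows. Several details are assumptions, since the exact hash and constraints are not spelled out here: the pair key is the tuple `(y1, y2, dx)` (the dictionary hashes it internally), pairs are limited to points at most `max_dx` frames apart in different frames, a pair's position is the frame of its first point, and "half the count" of the neighbouring offsets is read as half of their combined count.

```python
from collections import Counter, defaultdict

def pair_table(points, max_dx=3):
    """Map each pair key to the list of frame positions where that pair occurs.

    points is an iterable of (x, y) positions taken from a fingerprint.
    """
    table = defaultdict(list)
    pts = sorted(points)
    for i, (x1, y1) in enumerate(pts):
        for x2, y2 in pts[i + 1:]:
            dx = x2 - x1
            if dx > max_dx:
                break          # points are sorted by x, so later ones are further away
            if dx == 0:
                continue       # assumed constraint: pair points from different frames
            table[(y1, y2, dx)].append(x1)
    return table

def offset_tally(table_a, table_b):
    """Count how often each position offset occurs between matching pairs."""
    tally = Counter()
    for key, positions_a in table_a.items():
        for pa in positions_a:
            for pb in table_b.get(key, ()):
                tally[pb - pa] += 1
    return tally

def peak_offset_score(tally):
    """Count of the most common offset plus half the counts of its two neighbours."""
    if not tally:
        return 0.0
    best, count = tally.most_common(1)[0]
    return count + 0.5 * (tally.get(best - 1, 0) + tally.get(best + 1, 0))
```

Two recordings of the same call that are simply shifted in time produce one dominant offset, which is why the peak of the tally is a useful measure of similarity.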

We divide this value by the number of frames in our fingerprints (or, if each fingerprint has a different number of frames, the smaller of the two fingerprints’ frame counts), and multiply our value by 100 to scale it to be between 0 and 100.
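
The final scaling is a one-line calculation. In this sketch the function name `similarity_score` and its parameters are hypothetical, and returning 0 for an empty fingerprint is an assumption.

```python
def similarity_score(raw_score, frames_a, frames_b):
    """Scale a raw offset score by the smaller frame count, as a value out of 100."""
    frames = min(frames_a, frames_b)
    if frames <= 0:
        return 0.0  # assumed behaviour for empty fingerprints
    return 100.0 * raw_score / frames
```

For instance, a raw score of 42 for fingerprints with 60 and 75 frames is divided by 60, giving a score of 70.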

**This is a rough approximation of our algorithm for comparing the similarity of two WAV files.**