It is currently Fri Sep 20, 2019 5:23 am

 All times are UTC - 7 hours

 Print view Previous topic | Next topic
Author Message Posted: Sat Jul 14, 2018 9:35 am Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21594
Location: NE Indiana, USA (NTSC)
Autocorrelation, cepstrum, and Mozer transform are three algorithms for processing pitched signals. I realized in the shower this morning how closely related these algorithms are. The only difference is what function is applied to each element of a spectrum in the frequency domain to change the balance of weak and strong signals.

Trigger warning: This follow-up to a post from six years ago will be very math-heavy, as I'm taking notes for a possible future project.

Given two functions f[] and g[] on the integers, their convolution (f * g)[] is defined as

(f * g)[t] = Σ[m=-∞ to ∞](f[m]g[t - m])

The related cross-correlation is

(f *_ g)[t] = Σ[m=-∞ to ∞](f*[m]g[t + m])

where f* is the complex conjugate (real part same, imaginary part negated) of f. This is equivalent to convolution of the time reversal of first signal's complex conjugate (or the signal itself if it is real-valued) with a second signal. If one signal is real and symmetric, correlation and convolution are essentially equivalent.

The support (width of domain where the function is nonzero) of a discrete convolution or cross-correlation equals the sum of the support of the inputs minus one. For example, convolving functions of length 3 and 6 produces a function of length 8.

Convolution with a finite impulse response kernel is often used to filter discrete-time signals, such as audio. Cross-correlation of a signal with itself, or autocorrelation, is used to determine the pitch of a signal. A signal's autocorrelation is time symmetric and has a peak once per period of the signal with an overall simpler shape than the signal itself.

Convolution of two long signals takes O(n^2) operations, where n is the width of their support. But because convolution in the time domain correspods to multiplication in the frequency domain (and vice versa), this can be reduced to O(n log n) through an algorithm using the Fourier transform:

1. Pad both signals to at least the sum of their support (to avoid circular convolution effects)
2. Take the Fourier transform of both signals, producing two spectra F{f[]} and F{g[]}
3. If performing cross-correlation, take the complex conjugate of F{g[]}
4. Multiply the spectra together
5. Take the inverse Fourier transform of the resulting spectrum

Thus autocorrelation becomes this:

1. Pad the signal to at least twice its support
2. Take the FT of the signal
3. Multiply each element of the spectrum by its complex conjugate, or equivalently, take the square of the magnitude
4. Take the inverse FT

Autocorrelation{f[]} = F-1{|F{f[]}|^2}

Time symmetry thus results from loss of phase information, and only the positive half of the autocorrelation is considered.

A power cepstrum is similar to an autocorrelation, except a logarithm is inserted to boost the weak frequencies and separate the formants (timbre) of a periodic signal from its pitch.

PowerCepstrum{f[]} = F-1{log(|F{f[]}|^2)}

1. Pad the signal to at least twice its support
2. Take the FT of the signal
3. Take the logarithm of the square of the magnitude of each element of the spectrum
4. Take the inverse FT

The power cepstrum discards phase and is suitable for analysis rather than synthesis, such as pitch detection. There also exists a complex cepstrum that preserves phase and is thus useful for synthesis, but the documents I initially found about calculating the unwrapped phase were paywalled at \$33.00 per article from IEEE.

Another approach is to replace squared magnitude with magnitude. This results in the Mozer transform, which discards phase information to find a time-symmetric representation for a given timbre. This was used to compress voice in several products using a codec developed by Electronic Speech Systems, such as Impossible Mission for C64 ("Another visitor") and Mito Koumon for Famicom.

1. Pad the signal to at least twice its support
2. Take the FT of the signal
3. Take the magnitude and discard the phase
4. Take the inverse FT

Mozer{f[]} = F-1{|F{f[]}|}

Because the signal is symmetric, half of it can be discarded and computed from the time reversal of the other half. It also concentrates the signal's power close to zero in the time domain, allowing lossy compression of the timbre by windowing out small components far from zero time. And theoretically, the only difference between this and the power cepstrum is that the logarithm (which boosts weak signals) is replaced with the square root (which also boosts weak signals).

So a speech codec designed for real-time playback on a 1.8 MHz microprocessor without a hardware multiply might break the signal into segments on the order of 25 ms and then do the following to each segment:

1. Across the previous, current, and next segment, apply a suitable window function to taper out edge discontinuities.
2. Apply one of these dephasing transforms to the windowed signal.
3. Pick a peak of the signal with a lag (aka quefrency) greater than 2 ms to find the period. One suggestion is to find the amplitude of the strongest such peak, then find the lowest-lag peak with at least 80% of that amplitude. If needed, use the peaks found when analyzing the previous and next segment to help avoid transient octave errors.
4. Apply the Mozer transform to the signal.
5. Crop out the signal more than half a period from zero, and see how much of the signal between a quarter and half period you can crop out.
6. Store the cropped time domain formants and pitch data.

_________________
Pin Eight | Twitter | GitHub | Patreon

Top  Posted: Sat Jul 14, 2018 10:31 am  Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7582
Well, they're related in that they all do some work in the frequency domain, but I think that's true of most sound/music/DSP operations? It's just the natural place to do signal analysis most of the time.

The Fourier Transform step is just the practical way to get to that domain. Like you can do the autocorrelation convolution in the signal domain if you want, but the existence of the Fast Fourier Transform makes it way more practical to go to the frequency domain to apply the convolution there. (Convolutions where one of the kernels is very short, on the other hand, are often faster if done directly in the signal domain.)

Top  Posted: Sat Jul 14, 2018 11:14 am Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21594
Location: NE Indiana, USA (NTSC)
My point is that all three transforms can be thought of as multiband compressors that discard phase, just with different compression curves. So algorithms using one, particularly pitch detection, can be ported to another.

_________________
Pin Eight | Twitter | GitHub | Patreon

Top  Posted: Sat Jul 14, 2018 2:07 pm Joined: Sun Apr 13, 2008 11:12 am
Posts: 8564
Location: Seattle
The cepstrum does not discard phase, and if you do you get garbage out.

You have to preserve phase, you have to take the complex log of the input function, and you have to "linearize" the phase too so that there are no discontinuities.

If you don't linearize phase, the central assumption (i.e. ifft(log(fft(x)·fft(y)))=ifft(log(fft(x)))+ifft(log(fft(y))) is no longer true.

Top  Posted: Sat Jul 14, 2018 3:36 pm Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21594
Location: NE Indiana, USA (NTSC)
lidnariq wrote:
The cepstrum does not discard phase

The complex cepstrum does not. The power cepstrum does. I will edit the above to clarify the difference.

_________________
Pin Eight | Twitter | GitHub | Patreon

Top  Posted: Sat Jul 14, 2018 4:59 pm Joined: Sun Apr 13, 2008 11:12 am
Posts: 8564
Location: Seattle
I hadn't even heard of the Power cepstrum before. Sorry about the tangent.

(From my experience with the complex cepstrum, I'm not altogether convinced the power cepstrum is actually useful for anything... it's hard enough to get something useful out of the reversible complex cepstrum)

Top Display posts from previous: All posts1 day7 days2 weeks1 month3 months6 months1 year Sort by AuthorPost timeSubject AscendingDescending

 All times are UTC - 7 hours

#### Who is online

Users browsing this forum: No registered users and 5 guests

 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot post attachments in this forum

Search for:
 Jump to:  Select a forum ------------------ NES / Famicom    NESdev    NESemdev    NES Graphics    NES Music    Homebrew Projects       2019 NESdev Competition       2018 NESdev Competition       2017 NESdev Competition       2016 NESdev Competition       2014 NESdev Competition       2011 NESdev Competition    Newbie Help Center    NES Hardware and Flash Equipment       Reproduction    NESdev International       FCdev       NESdev China       NESdev Middle East Other    General Stuff    Membler Industries    Other Retro Dev       SNESdev       GBDev    Test Forum Site Issues    phpBB Issues    Web Issues    nesdevWiki