Reconstructing the original signal

So far we've looked at a WFIR (Warped Finite Impulse Response) filter, and we get a residual r[n] from a signal x[n] as follows:

r_n = wfir(x_n, fs, order)

We could use a WIIR (Warped Infinite Impulse Response) filter to reconstruct the original signal ([1]):

The WFIR filter is applied in encoding to produce a whitened residual signal. After quantization this is transmitted to the decoder where a WIIR is used to reconstruct a degraded version of the original signal.

Other DSP/filter terminology ([2]) is analysis for the deconstruction phase (i.e. our wfir() function) and synthesis for the reconstruction phase (i.e. our future wiir() function).

In [1]:
import numpy, scipy, scipy.signal


def bark_warp_coef(fs):
        return 1.0674 * numpy.sqrt((2.0 / numpy.pi) * numpy.arctan(0.06583 * fs / 1000.0)) - 0.1916
    
    
def warped_remez_coefs(fs, order):
    l = 20
    r = min(20000, fs/2 - 1)
    t = 1
      
    c = scipy.signal.remez(order+1, [0, l-t, l, r, r+t, 0.5*fs], [0, 1, 0], fs=fs)
    return c.tolist()

    
def wfir(x: numpy.ndarray, fs: float, order: int) -> numpy.ndarray:
    a = bark_warp_coef(fs)

    B = [-a.conjugate(), 1]
    A = [1, -a]
    ys = [0] * order

    ys[0] = scipy.signal.lfilter(B, A, x)
    for i in range(1, len(ys)):
        ys[i] = scipy.signal.lfilter(B, A, ys[i - 1])
        
    c = warped_remez_coefs(fs, order)

    x_hat = c[0] * x
    for i in range(order):
        x_hat += c[i+1] * ys[i]

    r = x - x_hat
    return r

The WIIR is apparently impossible to build directly ([3]). The WIIR transfer function given in [3] and [4] is:

$$H_{WIIR}(z) = \frac{\sum_{i=0}^M\beta_i\left[D(z)\right]^i}{1 + \sum_{i=1}^{R}\alpha_i\left[D(z)\right]^i}$$

The numerator here is the output of the WFIR we're familiar with, where βi are the filter coefficients (c), M is the filter order, and D(z) are all-pass elements like we've seen before. The new introduction here is the denominator, with it's own order R and αi cofficients.

For now I will ignore WIIR, and try to find out for myself what a reconstructed x[n] from r[n] could look like. First, X(z) in terms of R(z) and HWFIR:

$$r[n] = x[n] - \hat x[n]$$$$R(z) = X(z) - \hat X(z)$$$$R(z) = X(z)(1 - H_{WFIR}(z)) \implies X(z) = \frac{R(z)}{1 - H_{WFIR}(z)} = \frac{R(z)}{1 - \sum_{i=0}^M\beta_i\left[D(z)\right]^i}$$

Therefore:

$$H_{RECONSTRUCTION} = \frac{1}{1 - \sum_{i=0}^M\beta_i\left[D(z)\right]^i}$$

With M = 1 (first order WFIR):

$$H_{RECONSTRUCTION} = \frac{1}{1 - \beta_0D(z)^0 - \beta_1D(z)^1}$$

Recall:

$$D(z) = \frac{-\lambda + z^{-1}}{1 - \lambda z^{-1}}$$

Therefore:

$$H_{RECONSTRUCTION}(z) = \frac{1}{1 - \beta_0 - \beta_1\frac{-\lambda + z^{-1}}{1 - \lambda z^{-1}}} = \frac{1 - \lambda z^{-1}}{(1-\beta_0+\lambda\beta_1) + (-\lambda + \lambda\beta_0 - \beta_1)z^{-1}}$$

In B/A numerator/denominator terms:

$$\frac{B}{A} = \frac{[1, -\lambda]}{[1-\beta_0+\lambda\beta_1, -\lambda + \beta_0\lambda - \beta_1]}$$

Let's try it out.

In [2]:
def wfir_reconstruction_firstorder(remainder: numpy.ndarray, fs: float) -> numpy.ndarray:     
    order = 1
    c = warped_remez_coefs(fs, order)

    a = bark_warp_coef(fs)

    B = [1, -a]
    A = [1-c[0]+a*c[1], -a + c[0]*a - c[1]]
    
    x = scipy.signal.lfilter(B, A, remainder, axis=0)
    return x
In [3]:
from IPython.display import Audio
import scipy.io, scipy.io.wavfile


female_speech_clip = './speech_f.wav'
fs, x = scipy.io.wavfile.read(female_speech_clip)              
Audio(x, rate=fs)
Out[3]:
In [4]:
remainder = wfir(x, fs, 1)
x_reconstructed = wfir_reconstruction_firstorder(remainder, fs)
Audio(x_reconstructed, rate=fs)
Out[4]:
In [10]:
import matplotlib.pyplot as plt

l = 5000
r = 25000
samples = numpy.arange(r)

fig1, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 14))

ax1.plot(samples[l:r], x[l:r], 'b', alpha=0.8, label='x[n]')
ax1.set_title(r'1st order WFIR on female speech - residual signal')
ax1.set_xlabel('n (samples)')
ax1.set_ylabel('amplitude')
ax1.plot(samples[l:r], remainder[l:r], 'r', linestyle='--', alpha=0.8, label='r[n]')

ax1.grid()
ax1.legend(loc='upper right')

ax2.plot(samples[l:r], x[l:r], 'b', alpha=0.8, label='x[n]')
ax2.set_title(r'1st order WFIR on female speech - reconstructed signal')
ax2.set_xlabel('n (samples)')
ax2.set_ylabel('amplitude')
ax2.plot(samples[l:r], x_reconstructed[l:r], 'r', linestyle=':', alpha=0.8, label='x[n] reconstructed')

ax2.grid()
ax2.legend(loc='upper right')

plt.axis('tight')
fig1.tight_layout()
plt.show()

The naive reconstruction function seems to be working well. Ideally, I could generalize it for higher orders. Let's refer to HRECONSTRUCTION(z) as H'(z), and derive an expression for an H'(z)M:

$$H'(z) = \frac{1}{1 - \sum_{i=0}^M\beta_i\left[D(z)\right]^i} = \frac{1}{1 - \sum_{i=0}^M\beta_i\left[\frac{-\lambda + z^{-1}}{1 - \lambda z^{-1}}\right]^i} = \frac{(1-\lambda z^{-1})^M}{(1-\lambda z^{-1})^M - \sum_{i=0}^{M}\beta_i(-\lambda + z^{-1})^i(1-\lambda z^{-1})^{M-i}}$$

Let's play around with numpy's poly1d ([5]).

In [6]:
def wfir_reconstruct_numerator(a, M):
    n = (numpy.poly1d([1, -a]))**M
    return n.c
    
print(wfir_reconstruct_numerator(0.5, 5))

def wfir_reconstruct_denominator(a, M, c):
    if len(c) != M+1:
        raise ValueError('len filter coefficient array must be order+1')
    expr1 = (numpy.poly1d([1, -a]))**M
    i = 0
    expr2 = c[i]*(numpy.poly1d([-a, 1])**0)*(numpy.poly1d([1, -a]))**(M-0)
    for i in range(1, len(c)):
        expr2 += c[i]*(numpy.poly1d([-a, 1])**i)*(numpy.poly1d([1, -a]))**(M-i)
    den = expr1 - expr2
    return den.c

print(wfir_reconstruct_denominator(0.5, 5, [1.0]*6))
[ 1.      -2.5      2.5     -1.25     0.3125  -0.03125]
[ 0.34375 -1.28125  1.84375 -1.90625  1.53125 -0.6875 ]
In [7]:
def wfir_reconstruct(r: numpy.ndarray, fs: float, order: int) -> numpy.ndarray:
    a = bark_warp_coef(fs)
    c = warped_remez_coefs(fs, order)
    
    B = wfir_reconstruct_numerator(a, order)
    A = wfir_reconstruct_denominator(a, order, c)
    
    x = scipy.signal.lfilter(B, A, r)
    return x
In [8]:
remainder = wfir(x, fs, 1)
x_reconstructed = wfir_reconstruct(remainder, fs, 1)
Audio(x_reconstructed, rate=fs)
Out[8]:
In [9]:
remainder = wfir(x, fs, 12)
x_reconstructed = wfir_reconstruct(remainder, fs, 12)
Audio(x_reconstructed, rate=fs)
Out[9]:
In [11]:
l = 5000
r = 25000
samples = numpy.arange(r)

fig1, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 14))

ax1.plot(samples[l:r], x[l:r], 'b', alpha=0.8, label='x[n]')
ax1.set_title(r'12th order WFIR on female speech - residual signal')
ax1.set_xlabel('n (samples)')
ax1.set_ylabel('amplitude')
ax1.plot(samples[l:r], remainder[l:r], 'r', linestyle='--', alpha=0.8, label='r[n]')

ax1.grid()
ax1.legend(loc='upper right')

ax2.plot(samples[l:r], x[l:r], 'b', alpha=0.8, label='x[n]')
ax2.set_title(r'12th order WFIR on female speech - reconstructed signal')
ax2.set_xlabel('n (samples)')
ax2.set_ylabel('amplitude')
ax2.plot(samples[l:r], x_reconstructed[l:r], 'r', linestyle=':', alpha=0.8, label='x[n] reconstructed')

ax2.grid()
ax2.legend(loc='upper right')

plt.axis('tight')
fig1.tight_layout()
plt.show()

Did we just create a WIIR?

Recall that an IIR is an infinite impulse response filter. Since everything we're doing takes warping into account, I have a hunch that wfir_reconstruct() is indeed wiir(). Let's verify the impulse response.

In [23]:
fig3, (ax1, ax2, ax3, ax4) = plt.subplots(4, 1, figsize=(10, 20))
plt.xlabel('n (samples)')

for i, (ax, o, l) in enumerate([(ax1, 1, 200), (ax2, 5, 500), (ax3, 12, 1000), (ax4, 40, 5000)]):
    x_axis = numpy.arange(l)
    impulse = numpy.zeros(l); impulse[0] = 1.0
    ax.set_title('wfir_reconstruct impulse response, order = {0}'.format(o))

    impulse_response1 = wfir(impulse, 48000, o)
    impulse_response2 = wfir_reconstruct(impulse, 48000, o)
    
    ax.plot(x_axis, impulse_response1, color='b', alpha=0.9, label='wfir')
    ax.plot(x_axis, impulse_response2, color='r', alpha=0.9, label='wfir_reconstruct')
    ax.legend(loc='upper right')
    ax.grid()

fig3.tight_layout()
plt.show()

The answer seems to be no - but wfir_reconstruct() works, and I'll move on to the next and final step: incorporating wfir() and wfir_reconstruct() in an audio codec.

References

  1. M. Karjalainen, A. Harma, U. K. Laine and J. Huopaniemi, "Warped filters and their audio applications," Proceedings of 1997 Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, 1997, pp. 4 pp.-.doi: 10.1109/ASPAA.1997.625615

  2. Gerez S., "Lecture 3 : Filter Banks," VLSI Signal Processing Lecture, https://wwwhome.ewi.utwente.nl/~gerezsh/sendfile/vlsi_sp_k3a.pdf

  3. M. Karjalainen, A. Harma and U. K. Laine, "Realizable warped IIR filters and their properties," 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, Munich, 1997, pp. 2205-2208 vol.3. doi: 10.1109/ICASSP.1997.599488

  4. Härmä, Aki & Karjalainen, Matti & Avioja, Lauri & Välimäki, Vesa & Laine, Unto & Huopaniemi, Jyri. (2000). Frequency-Warped Signal Processing for Audio Applications. Journal of the Audio Engineering Society. 48. 1011-1031.

  5. NumPy documentation, "numpy.poly1d," https://docs.scipy.org/doc/numpy/reference/generated/numpy.poly1d.html