Architectural Optimization for Low power in a Reconfigurable UMTS filter

Dasalukunte, Deepak; Palsson, Andri; Kamuf, Matthias; Persson, Per; Veljanovski, Ronny; Öwall, Viktor

2006

Architectural Optimization for Low Power in a Reconfigurable UMTS Filter

Deepak Dasalukunte†, Andri Pålsson†, Matthias Kamuf†, Per Persson†, Ronny Veljanovski‡ and Viktor Öwall†
† Department of Electroscience, Lund University, SE-22100 Lund, Sweden
‡ Victoria University, Melbourne, Victoria 8001, Australia

Abstract—This paper presents an improved architecture for a UMTS filter that reduces the switching power within the filter by 75% compared to the original design. The filter length is varied dynamically depending on the adjacent channel noise. This achieves the lowest power consumption in noise-free environments and the required filter performance in the presence of noise. As a consequence, the battery life of the mobile device is improved without compromising the 3GPP standard filter specifications. Furthermore, the design is simplified by reducing the number of clock domains from 3 to 2. In the improved design most blocks run at a slower clock, which reduces switching power and thus the overall power consumption.

Index Terms—UMTS, 3GPP, reconfigurable, low power

I. INTRODUCTION

W-CDMA (UMTS) is the third generation (3G) standard for wireless communication, which offers a variety of wide-band data and multimedia services. The emergence of the W-CDMA standard has increased the amount of computing on mobile devices. These systems must consume minimal power in order to extend the battery life of the device. Furthermore, Adjacent Channel Interference (ACI) has become a critical issue due to the increase in information transmission within a finite bandwidth. Thus, the design of digital filters in mobile terminal receivers has become a challenge, with tight and complex filter specifications to be met at minimal power consumption.

The receiver block diagram is shown in Fig. 1. The digital channel filter is placed after the RF front-end and the analog-to-digital converter (ADC) and performs the filtering required to obtain the desired frequency spectrum. The resulting frequency spectrum is descrambled, despread, and demodulated to retrieve the data that corresponds to the mobile terminal user. This data is termed the desired signal. UTRA-TDD, one of the two duplex modes developed by the 3rd Generation Partnership Project (3GPP) [1], has a chip rate of 3.84 megachips per second (Mcps). A frame has a duration of 10ms and consists of 15 timeslots (2560 chips/slot), as illustrated in Fig. 2.

A. The original FIR Filter

The FIR filter employs a maximum filter length of 65 and a minimum length of 5. This has been derived in [2] in order to satisfy the filter specifications of the UTRA-TDD receiver.

B. The Signal Power Measurement Unit

The Signal Power Measurement (SPM) unit consists of a full wave rectifier (FWR) and a first order infinite impulse response (IIR) filter. The block diagram of the SPM unit proposed in [2] and implemented in [3] is shown in Fig. 5. The IIR filter accumulates the absolute value of the signal from the FWR over a period of time to calculate the signal power. A truncation unit T is placed just before the output is fed back into the system in order to maintain a fixed wordlength [5]. The SPM unit calculates the signal power in every timeslot and starts afresh in the next timeslot. The filter coefficients $\delta$ and $\gamma$ are calculated according to

$$\delta = \frac{\cos \theta}{1 + \sin \theta} \quad (1)$$
$$\gamma = \frac{1 - \delta}{2} \quad , \quad (2)$$

where $\theta$ is the normalised frequency of $0.002\pi$ [2].
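
The following Python sketch evaluates (1) and (2) and runs a floating-point model of the SPM unit. The difference equation in `spm_power` is an assumption (a first-order low-pass section with a zero at the Nyquist frequency, chosen because it has unity DC gain with these coefficients); the exact structure of Fig. 5 and the truncation unit T are not modeled, and the function names are illustrative only.

```python
import math


def spm_coefficients(theta):
    """Return (delta, gamma) as defined by Eqs. (1) and (2)."""
    delta = math.cos(theta) / (1.0 + math.sin(theta))
    gamma = (1.0 - delta) / 2.0
    return delta, gamma


def spm_power(samples, delta, gamma):
    """Rectify the samples and smooth them with a first-order IIR filter.

    Assumed filter form (not taken from the paper):
        y[n] = delta * y[n-1] + gamma * (|x[n]| + |x[n-1]|)
    """
    y = 0.0
    prev_rect = 0.0
    for x in samples:
        rect = abs(x)  # full wave rectifier
        y = delta * y + gamma * (rect + prev_rect)
        prev_rect = rect
    return y


delta, gamma = spm_coefficients(0.002 * math.pi)
print(round(delta, 4), round(gamma, 4))
```

With $\theta = 0.002\pi$ the two coefficients round to 0.9937 and 0.0031, the values quoted in Section III-B.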

C. Control Unit

The control unit is the intelligence behind the reconfigurability of the system. It uses the inband, out-of-band and desired signal powers from SPM units to calculate the filter length. The desired signal is obtained by despreading the inband signal using the user’s orthogonal variable spreading code. The Adjacent Channel Performance (ACP) derived in [2] from the $E_b/N_0$ model is calculated according to the equation:

$$ACP = \frac{P_{\text{outband}}}{P_{\text{desired}}(1 + \frac{P_g}{E_b/N_0}) - P_{\text{inband}} - \eta} \quad (3)$$

where $P_{\text{outband}}$, $P_{\text{inband}}$, and $P_{\text{desired}}$ are out-of-band, in-band and desired signal powers respectively. The processing gain $P_g$ and thermal noise $\eta$ are constants, while the target $E_b/N_0$ is set. Using ACP and Adjacent Channel Leakage Ratio (ACLR) specified by the 3GPP, Adjacent Channel Selectivity (ACS) is calculated as:

$$ACS = \frac{1}{(\frac{1}{ACP}) - (\frac{1}{ACLR})} \quad . \quad (4)$$
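
As an illustration of (3) and (4), the sketch below evaluates both expressions in Python. It assumes that all powers and ratios are given as linear quantities (not in dB). The numeric inputs are made-up values chosen only to keep the arithmetic easy to follow, and the guards against non-positive denominators are an addition that the paper does not describe.

```python
def adjacent_channel_performance(p_outband, p_inband, p_desired,
                                 pg, ebn0_target, eta):
    """Eq. (3): ACP from the three measured powers (linear units assumed)."""
    denom = p_desired * (1.0 + pg / ebn0_target) - p_inband - eta
    if denom <= 0.0:
        # Guard added for this sketch; not part of the paper.
        raise ValueError("denominator of Eq. (3) is not positive")
    return p_outband / denom


def adjacent_channel_selectivity(acp, aclr):
    """Eq. (4): ACS from ACP and the ACLR specified by 3GPP (linear ratios)."""
    denom = 1.0 / acp - 1.0 / aclr
    if denom <= 0.0:
        # Guard added for this sketch; not part of the paper.
        raise ValueError("denominator of Eq. (4) is not positive")
    return 1.0 / denom


# Hypothetical example values, not measurements from the paper.
acp = adjacent_channel_performance(p_outband=2.0, p_inband=3.0, p_desired=1.0,
                                   pg=16.0, ebn0_target=4.0, eta=1.0)
acs = adjacent_channel_selectivity(acp, aclr=2000.0)
print(acp, acs)
```

For these inputs the denominator of (3) is $1 \cdot (1 + 16/4) - 3 - 1 = 1$, so ACP equals the out-of-band power.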

The control unit adopts a lookup table based approach, using the new ACS value to determine the length of the filter. The control unit monitors the out-of-band and inband signal powers every timeslot and varies the filter length accordingly in order to keep filter specifications such as stopband attenuation consistent. The filter is reconfigured by switching taps on or off depending on whether the filter length needs to be increased or decreased. The number of filter taps that need to be switched on or off is obtained from a lookup table for a particular value of ACS. The filter length is increased at once, while filter taps are switched off gradually. This avoids poor filtering of the out-of-band signals when many taps have been switched off and there is a surge in noise levels. This is termed hysteresis protection [2].
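
The update rule described above can be sketched in Python as follows. The ACS thresholds, the intermediate filter lengths and the shrink step of 4 taps per timeslot are hypothetical placeholders; only the length limits of 5 and 65 and the grow-at-once, shrink-gradually behaviour come from the text.

```python
import bisect

# Hypothetical lookup table: ACS thresholds (in dB) and the filter length
# selected for each interval. Only the limits 5 and 65 come from the paper.
ACS_THRESHOLDS_DB = [10.0, 20.0, 30.0]
FILTER_LENGTHS = [5, 25, 45, 65]


def target_length(acs_db):
    """Look up the filter length required for a given ACS value."""
    return FILTER_LENGTHS[bisect.bisect_right(ACS_THRESHOLDS_DB, acs_db)]


def next_length(current, acs_db, shrink_step=4):
    """Filter length for the next timeslot, with hysteresis protection.

    Growing happens in a single step; shrinking is limited to
    `shrink_step` taps per timeslot (an assumed value).
    """
    target = target_length(acs_db)
    if target >= current:
        return target
    return max(target, current - shrink_step)


length = 5
history = []
for acs_db in [35.0] + [5.0] * 16:  # one noisy timeslot, then quiet ones
    length = next_length(length, acs_db)
    history.append(length)
print(history)
```

In this run the filter jumps to full length in the first timeslot and then steps back down over the following timeslots, so a renewed surge in adjacent channel noise meets a filter that is still relatively long.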

III. OPTIMIZED ARCHITECTURE

The FIR filter occupies more than 50% of the entire design and is always active. By moving the decimators prior to the filter, the filter can be run at a lower clock, resulting in power savings due to reduced switching activity.
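
As a rough check on the expected saving, the standard first-order model of dynamic CMOS power can be applied. The symbols $\alpha$ (switching activity), $C$ (switched capacitance) and $V_{DD}$ (supply voltage) are not defined in this paper and are assumed here to be the same for both architectures:

$$P_{\text{dyn}} = \alpha \, C \, V_{DD}^{2} \, f$$

$$\frac{P_{\text{dyn}}(f/4)}{P_{\text{dyn}}(f)} = \frac{\alpha \, C \, V_{DD}^{2} \, (f/4)}{\alpha \, C \, V_{DD}^{2} \, f} = \frac{1}{4}$$

Under these assumptions, clocking the filter four times slower lowers its switching power to one quarter, i.e., a reduction of 75%.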

Since the decimation factor is 4, the FIR filter is divided into four filterbanks [5], each about one quarter of the length of the original filter. The optimized architecture of the reconfigurable filter is shown in Fig. 6, in which the decimators now appear at the input and the filterbanks are denoted FIR1, FIR3, and FIR24. As a result, the rate of arithmetic operations is reduced to a fourth. The coefficients corresponding to these banks are shown in Fig. 7. The coefficients of the 1st and 3rd filterbanks have inherent symmetry, while the 2nd and 4th filterbanks are merged to achieve coefficient symmetry. This halves the number of multiplications since the filter can be folded.
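
The equivalence between filtering followed by decimation and decimation followed by four filterbanks can be checked numerically. The Python sketch below uses a randomly generated symmetric 65-tap filter as a stand-in for the actual coefficients of Fig. 7, which are not reproduced here; the function names are illustrative. Banks are indexed from 0, so `banks[0]` to `banks[3]` correspond to the 1st to 4th filterbanks.

```python
import random


def convolve(x, h):
    """Plain linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def filter_then_decimate(x, h, m=4):
    """Original order: run the full filter at the input rate, keep every m-th output."""
    return convolve(x, h)[:len(x)][::m]


def decimate_then_filter(x, h, m=4):
    """Optimized order: m decimated input streams, each feeding one filterbank."""
    n_out = (len(x) + m - 1) // m
    y = [0.0] * n_out
    for p in range(m):
        bank = h[p::m]  # coefficients h[p], h[p+m], h[p+2m], ...
        # Input stream of bank p: x_p[k] = x[m*k - p], zero before the start.
        x_p = [x[m * k - p] if m * k - p >= 0 else 0.0 for k in range(n_out)]
        out = convolve(x_p, bank)[:n_out]
        y = [a + b for a, b in zip(y, out)]
    return y


random.seed(0)
half = [random.uniform(-1.0, 1.0) for _ in range(33)]
h = half + half[-2::-1]  # symmetric stand-in filter, 65 taps
x = [random.uniform(-1.0, 1.0) for _ in range(200)]

banks = [h[p::4] for p in range(4)]
y_ref = filter_then_decimate(x, h)
y_opt = decimate_then_filter(x, h)
print(max(abs(a - b) for a, b in zip(y_ref, y_opt)))
print([len(b) for b in banks])
```

For any symmetric 65-tap filter, `banks[0]` and `banks[2]` are themselves symmetric, while `banks[1]` is the mirror image of `banks[3]`, which is why the 2nd and 4th filterbanks can be merged into one folded structure.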

A. Clock domain reduction

Under the assumption that integration in the SPMs had to be performed over an entire timeslot, the previous implementation [3] of the reconfigurable filter used 3 clocks. Two clocks, 15.36MHz and 3.84MHz, are required because of the different input and output data rates. Calculations within the control unit involve division operations to calculate ACP as in (3) and the new filter length. As a result, a faster clock was needed to complete the new filter length calculations before the data from the next timeslot arrived. Thus, a third clock was needed for the control unit. Its frequency was estimated to be 32 times 3.84MHz, i.e., 122.88MHz, as the control unit required 32 clock cycles to perform division and other operations to obtain the ACS [6].
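
The clock figures above follow from the frame structure given in Section I. The short Python calculation below reproduces them; the reading that the 32 control unit cycles must fit within a single chip period, and that the input is oversampled by a factor of 4, is an interpretation of the text rather than something stated explicitly.

```python
CHIP_RATE_HZ = 3.84e6    # UTRA-TDD chip rate
CHIPS_PER_SLOT = 2560    # chips in one timeslot
SLOTS_PER_FRAME = 15
CONTROL_CYCLES = 32      # cycles needed by the control unit
OVERSAMPLING = 4         # assumed ratio of input rate to chip rate

slot_duration_s = CHIPS_PER_SLOT / CHIP_RATE_HZ
frame_duration_s = SLOTS_PER_FRAME * slot_duration_s
input_clock_hz = OVERSAMPLING * CHIP_RATE_HZ
# 32 cycles within one chip period (interpretation, see lead-in).
control_clock_hz = CONTROL_CYCLES * CHIP_RATE_HZ

print(f"timeslot duration : {slot_duration_s * 1e6:.1f} us")
print(f"frame duration    : {frame_duration_s * 1e3:.1f} ms")
print(f"input clock       : {input_clock_hz / 1e6:.2f} MHz")
print(f"control unit clock: {control_clock_hz / 1e6:.2f} MHz")
```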

However, simulations show that the SPM units saturate quite early, see Fig. 8, which shows the signal power for data in one timeslot. The early saturation of the SPM units is used to trigger the control unit to start the new shaver value calculations a few clock cycles before the last data sample arrives. As the later samples do not contribute significantly to the measured signal powers, calculating the new filter length while neglecting the last few data samples introduces almost no error. This allows the control unit to run on a slower clock, i.e., at 15.36MHz instead of 122.88MHz. The reduction in the number of clock domains is important as it makes hardware implementation easier. The reduced clock frequency also lowers the demands on the hardware blocks being implemented. The control unit clock can be reduced further, to 3.84MHz, by neglecting more data samples in the received timeslots. However, the number of clock domains cannot be reduced further because of the different input and output data rates of the system.
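
The early saturation can be illustrated with a small Python experiment. It assumes a constant-amplitude rectified input and the same assumed first-order filter form, $y[n] = \delta\,y[n-1] + \gamma\,(|x[n]| + |x[n-1]|)$, with the coefficients from (1) and (2); it counts the samples needed to reach 99% of the final value and compares that with the 2560 chips of a timeslot. This is a sketch of the effect, not a reproduction of Fig. 8.

```python
import math

THETA = 0.002 * math.pi
DELTA = math.cos(THETA) / (1.0 + math.sin(THETA))  # Eq. (1)
GAMMA = (1.0 - DELTA) / 2.0                        # Eq. (2)
CHIPS_PER_SLOT = 2560


def samples_to_settle(fraction=0.99, max_samples=CHIPS_PER_SLOT):
    """Samples needed for the assumed SPM filter to reach `fraction`
    of its final value when the rectified input is a constant 1.0."""
    y = 0.0
    prev_rect = 0.0
    for n in range(max_samples):
        y = DELTA * y + GAMMA * (1.0 + prev_rect)
        prev_rect = 1.0
        if y >= fraction:
            return n + 1
    return None  # did not settle within max_samples


settle = samples_to_settle()
print(settle, CHIPS_PER_SLOT)
```

Under these assumptions the output settles well before the end of the timeslot, even at one sample per chip, so the final samples change the measured power very little. At four samples per chip the margin is correspondingly larger.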

B. SPM Optimization

The SPM units are implemented in the same way as in the original design, but with a few more optimizations. The coefficients obtained from (1) and (2) are $\delta = 0.9937$ and $\gamma = 0.0031$, respectively. Scaling the coefficients by $2^8$, instead of $2^{12}$ as in the original design, results in $\gamma \approx 1$, so that one multiplication can be omitted. Thus, one of the two coefficient multiplications is avoided in each of the three SPM units. The performance and functionality of the SPM unit are unaffected by this optimization compared to the original unit.
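
The effect of the two scaling factors can be checked with a few lines of Python. Rounding to the nearest integer is an assumption made for this sketch; the paper only states that the scaled $\gamma$ is approximately 1.

```python
import math

theta = 0.002 * math.pi
delta = math.cos(theta) / (1.0 + math.sin(theta))  # Eq. (1)
gamma = (1.0 - delta) / 2.0                        # Eq. (2)


def quantize(value, frac_bits):
    """Scale by 2**frac_bits and round to the nearest integer (assumed rounding)."""
    return round(value * (1 << frac_bits))


for frac_bits in (8, 12):
    print(frac_bits, quantize(delta, frac_bits), quantize(gamma, frac_bits))
```

With a scaling of $2^8$ the quantized $\gamma$ becomes 1, so the multiplication by $\gamma$ is trivial, whereas a scaling of $2^{12}$ leaves an integer coefficient that still requires a multiplier.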

The complete system was initially modeled and simulated in MATLAB. The improved architecture was designed and implemented in a 0.35\(\mu\)m standard CMOS process. A photo of the fabricated chip, which measures 2.7mm\(\times\)2.1mm, is shown in Fig. 9. In the optimized design, the FIR filter contributes a significant reduction in power because the decimators were moved ahead of it and it therefore runs 4 times slower. The original design was implemented in an FPGA, but its power was estimated by simulation using an NEC 0.25\(\mu\)m process [3] [6]. To compare the power consumption of both architectures, 130nm CMOS standard cell libraries were used. Since the FIR filter occupies a large portion of the entire design and is the block that was optimized most, the power comparison was made between the two FIR filters, one running at 15.36MHz and the other at 3.84MHz. The results are presented in Table I.

Table II compares further parameters of the two designs. The FIR filter now runs at a lower clock frequency, which could be exploited further by lowering the supply voltage. The control unit clock has also been reduced, which in turn lowers the highest clock frequency from 122.88MHz to 15.36MHz. The reduction in clock domains simplifies hardware design considerably, since designs with fewer clock domains are easier to manage.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Original Design</th>
<th>Improved Design</th>
</tr>
</thead>
<tbody>
<tr>
<td>No. of clock domains</td>
<td>3</td>
<td>2</td>
</tr>
<tr>
<td>Control unit clock</td>
<td>122.88MHz</td>
<td>15.36MHz</td>
</tr>
<tr>
<td>Highest Clk frequency required</td>
<td>122.88MHz</td>
<td>15.36MHz</td>
</tr>
<tr>
<td>FIR filter clock</td>
<td>15.36MHz</td>
<td>3.84MHz</td>
</tr>
<tr>
<td>No. of decimators</td>
<td>2</td>
<td>4</td>
</tr>
<tr>
<td>No. of multipliers in SPM unit</td>
<td>2</td>
<td>1</td>
</tr>
</tbody>
</table>

V. CONCLUSION

It is difficult to make a fair comparison of power figures obtained from the two different processes in which the original (0.25\(\mu\)m) and the optimized (0.35\(\mu\)m) architectures were implemented. The power consumption of the two architectures was therefore compared using a common process (130nm). The dynamic power consumption of the FIR filter was reduced by a factor of 4 by moving the decimators prior to the filter. The control unit clock was also reduced by exploiting the early saturation of the SPM units, which in turn reduced the number of clock domains. A better representation of the coefficients in the SPM units reduced the number of multiplications from 2 to 1. The overall reduction in power is due to the reduced switching activity in the FIR filter.

REFERENCES