VLSI Implementation of High Speed FFT Processor

Mrs. Purnima Savadatti; Prof. V.  P.  Gejji; Prof. Mahesh Yanahgimath

VLSI Implementation of High Speed FFT Processor

Design and Performance Analysis of a High Speed FFT Processor for Wireless Technology

by Mrs. Purnima Savadatti*, Prof. V. P. Gejji, Prof. Mahesh Yanahgimath,

- Published in Journal of Advances in Science and Technology, E-ISSN: 2230-9659

Volume 12, Issue No. 25, Dec 2016, Pages 759 - 764 (6)

Published by: Ignited Minds Journals

ABSTRACT

In today’s wireless technology, use of FFT processor for processing the data adds more advantages like high speed and less power consumption and reduction in circuit complexity etc. Hence to meet the real time requirements it is very much necessary to design high performance FFT architecture. Here attempt is made to design, an alternative architecture for 16 point radix-2 based decimation in frequency (DIF), fast fourier transform (FFT) processor by combining two 8 point FFT’s successfully. The new 16 point DIF-FFT architecture design is simulated by using ModelSim and synthesized by Xilinx ISE project navigator. The synthesis reports of the proposed architectures are compared with the existing architectures. Finally comparison reports shows that the proposed 16 point DIF-FFT architecture is more efficient in speed, power and memory utilization. Hence the proposed architecture can be used for any application that requires low power and high speed operation.

KEYWORD

VLSI implementation, high speed, FFT processor, wireless technology, data processing, power consumption, circuit complexity, real time requirements, performance, architecture, simulation, synthesis, comparison, efficiency, speed, power, memory utilization, application, low power operation, high speed operation

I. INTRODUCTION

Fourier transform is among the most widely used tool for transforming data sequences in fourier analysis. It transforms time domain function into the frequency domain representation. The DFT requires discrete inputs which are created by sampling a continuous time function. The discrete input function is also having a limited (finite) duration. Therefore it is repeatedly said that the discrete fourier transform is a required for analysis of finite duration discrete-time functions. DFT is having a finite sequence of real or complex inputs which makes it suitable for processing information stored in computers. In signal processing, to analyze the frequency content in a sampled signal and to perform convolution operation DFT is widely used. A key enabling factor is that by using Fast Fourier Transform (FFT) algorithm DFT can be computed efficiently in practice for these above applications. Fast Fourier Transform (FFT) has become ubiquitous in many engineering applications. High speed FFT architectures are very much necessary to implement several communication systems. In today‘s wireless technology, by using FFT processor for the wireless communication adds more advantages like high speed and less power consumption etc. Hence to meet the real time requirements it is very much necessary to design high performance FFT architecture. The main motivation of this work is to design an alternative architecture for 16 point Decimation in Frequency (DIF), Fast Fourier Transform (FFT) processor. The main reason is that, if FFT algorithm satisfies the timing constraints, we can expect it to be more power efficient. Hence here attempt is made to design DIF-FFT for 16 point inputs using Verilog HDL as a design entity and its synthesis is done by Xilinx synthesis tool. The synthesis result shows that 16 point DIF-FFT processor is more efficient in Speed and memory utilization.

II. FAST FOURIER TRANSFORM ALGORITHM

Fast fourier transform (FFT) algorithm is one of most useful algorithm to compute the discrete Fourier transform (DFT) and inverse discrete fourier transform efficiently.

7

Discrete Fourier Transform (DFT) and Inverse Discrete Fourier Transform (IDFT) is of order N2 as there are N data points to calculate, each of which requires N complex arithmetic operations. To compute discrete fourier transform of given input signal fast fourier transforms are used. In the same way it reduces total number of computations needed for DFT calculation. Basically there are two different Radix-2 algorithms, those are ―Decimation in Time‖ (DIT) and ―Decimation in Frequency‖ (DIF) algorithms. These two algorithms are mainly based on the recursive decomposition of an N point transforms. Here N point discrete fourier transform decomposed into two (N/2) point transforms. This process can be applied to any composite number (N). If N is divisible by 2 and if N is a regular power of two, the process of decomposition becomes easy but this process will repeatedly applied until the one point transform is reached at the end. Here we are using ―Radix-2 decimation-in-frequency FFT algorithm‖ to compute 16 point discrete fourier transform algorithms. Basically this algorithm also based on divide-conquer approach. Complete flow diagram of 8 point radix 2 decimation frequency fast fourier transform algorithms is clearly shown in Figure 2.1. Basically number computation required for fast fourier transform reduces as it requires very less number of complex multiplication and complex additions. In case of decimation in frequency fast fourier transform algorithms input sequence is in natural order but output sequence is in bit reversal form as shown in figure below.

Figure 2.1: Decimation in Frequency fast fourier transform algorithms for 8 point

As shown in the butterfly structure in every stage output requires shuffling of data inputs. And the in structure is as shown in the Figure 2.2.

Figure 2.2: Butterfly Scheme. A. Twiddle Factor

The twiddle factor WNk is most important evaluation part of the FFT, referring to the fig 2.2. We can check it that for every downward direction i.e. after subtraction of the two data‘s the resultant data signal should be multiplied with a term called ‗Twiddle Factor‘, it clearly shows that any wrong calculation of this multiplication with Twiddle Factor results in entirely different results, so it is necessary to pay attention at Twiddle Factor while calculating the FFT. The ‗Twiddle Factor‘ is expressed as follows: WNk = e -2πk/N_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ (i) Where: N – size of the FFT (i.e N-point FFT) and K – 0, 1,. . . . N-1

B. Applications of FFT

a) FFTs are finding great importance to a wide variety of applications, from digital signal processing and solving partial differential equations to algorithms for quick multiplication of large integers. b) In broadband wireless systems based on Orthogonal Frequency Division Multiplexing (OFDM).

III. DESIGN OF 16-POINT FFT PROCESSOR

In communication systems one of the complex and intensive computation module is FFT processor. But for such processors execution time and low power consumption are the main constraints. Complex multiplication is one of the most essential arithmetic operation in FFT, which is most expensive. The performance of such processors is determined by two factors. Those are speed and power consumption. In FFT processors more than 70% of power is consumed by complex multipliers. Hence it is very

Mrs. Purnima Savadatti1*, Prof. V. P. Gejji2, Prof. Mahesh Yanagimath3 7

low power application in mind. One new method is proposed in this paper i.e to reduce the complexity to replace the complex multiplication with less expensive real multiplications. We applied this method to an efficient 16-point FFT based on 8-point module.

A. FFT Algorithm

As we know Fast fourier transform is derived from the main function Discrete Fourier Transform. Using standard direct method N point DFT can be evaluated one by one for every point but in FFT the overall evaluation is done at a time which definitely reduces the calculation time. Radix-2 based 16 point FFT architecture contains 4 stages. Here there are 16 input values such as x[0] to x[15]. And output values are X(0) to X(15). Butterfly structure will give both addition and subtraction operation in FFT. The upward arrow in butterfly structure indicates addition operation. Similarly the downward arrow will indicate subtraction operation. The subtracted value is again multiplied with twiddle factor value WKN before processing into next stage. And this computation accomplished concurrently. Usually four real multiplications and two addition or subtraction operations are required for the complex multiplication with the twiddle factor.

B. Architectural design

The block diagram of the 16-point FFT processor is shown in figure figure. 3.1.

Figure 3.1: block diagram of the 16-point FFT processor

It consists of an input unit, two 8-point FFT units, a multiplier unit, an output unit, and a 4-bit binary counter that acts as the master controller for the entire architecture. However there are two bottlenecks in such a scheme. First, there is a large number of global wires resulting from multiplexing of complex data to the 8-point FFTs. the required speed.

Input Unit

Here the input unit is lookup table. After assertion of clock signal, the input data are serially inputted. When the input register bank is completely full, the appropriate data is treated as the input to the 8-point FFT. A 4-bit internal counter controls the serial inputting of the data. This counter is enabled by the assertion of the clock signal.

8-Point FFT Units:

To construct the 8-point FFT units, we have choosen the radix-2 DIF FFT algorithm.

Multiplier Unit:

As stated earlier 16 interdimentional constants are to be multiplied to the intermediate results coming out from the first 8-point FFT unit. Use of one complex multiplier, operating once in a clock cycle, needs seven clock cycles to complete the interdimentional constant multiplication to a full set of data coming out from the first 8-point FFT unit. Thus, the use of a single complex multiplier effectively results in a degradation of the speed. To achieve the full speed advantage it is necessary to use seven complex multipliers.

Output Unit:

For output unit we follow the same strategy as with the input unit. In essence, the output unit is an complementary structure to the input unit. To control the output from the output unit, a 4-bit internal counter is used. This counter is enabled with the assertion of the clock signal generated by the master control counter and counts up to 15 until the last output data is available at the serial output. This signal essentially means that the output data will be available from the next clock cycle.

7

Figure 3.2: Connecting 2, 8- point FFT’s to get 16- point output. The Control Mechanism:

The controller for the overall architecture is a simple 4-bit binary counter. The counter starts counting from 0 with the assertion of the clock signal from the input unit. At the 16th clock cycle the first output data is available. Here the method of divide and conquer is used. Firstly 8- point FFT is designed and then the first stage is connected to it by multiplying the bottom half with the twiddle factor. Tha idea is clearly shown in figure 3.2.

IV.SIMULATION RESULTS

(a) Design Summary of 8-point FFT

The following figure shows the Design Summary of 8-point FFT, where we can note that the number of slice registers usage is 11%, which is very less and can be mapped on any board.

(b) Design summary of 16-point FFT

The 16-point FFT also follows the same procedure as in case of 8-point FFT. First clock is applied and reset signal is made low to load the input data, after which the reset signal is made high and approx the output is available from 57th cycle to 73rd cycle. The figure 4.1 shows the 16-point FFT input.

Where x(n)={1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0}.

Figure 4.1 Input of 16 point DIF FFT

Fig 4.2 shows the output i.e

X(K)={8,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0}

Figure 4.2. Ouput of 16 point DIF FFT A) Device utilization summary report Selected Device : 7v285tlffg1157-1l

Selected Logic Utilization Used Available Utilization

No of Slice Registers 3533 357600 1% No of slice LUT‘s 2123 178800 1% No used as Logic 2039 178800 1% No used as Memory 84 55900 1% No used as SRL 84 Slice logic distribution No of LUT F/F pairs 3864 No with an unused F/F 331 3864 8%

Mrs. Purnima Savadatti1*, Prof. V. P. Gejji2, Prof. Mahesh Yanagimath3 7

No of fully used LUT-FF pairs 1792 3864 46% No of unique control sets 49 I/O Utilization No of IO‘s 870 No of bondedIOB‘s 490 600 81% Specific Feature Utilization: No of BUFG/BUFGCTRLs 2 32 6% No of DSP48E1s 15 700 2%

Table 4.1 Design Utilization summary

Design summary shows that number of slice register usage is 1% which is very less and can be mapped to any board. The above synthesis results shows that the computation for calculating the 16 point FFT is more efficient in terms of power consumption, memory utilization and time delay using the proposed method. Our proposed design for VLSI based high speed FFT processor optimizes the circuit complexity level as compared to the existing architectures. The new design deemed to result in a considerable reduction in cost, memory usage and power consumption for the existing wireless system.

B) Performance comparison of proposed and existing 16-point FFT architectures

The proposed 16 point FFT architecture design is simulated by using ModelSim and synthesised by Xilinx ISE project navigator. We simulate and synthesis overall FFT architecture compared to existing methodology.

Paper title Author Power (mW) Mem in kb Min Del

CORDIC Based 16-Point FFT Processor Allu Thanuja & Sarada V Dept. of ECE SRM university, Chennai

109 --- ---

Efficient vlsi architecture using dit-fft radix-2 and Split radix fft algorithm Rashmi M J, G S Biradar, Meenakshi P Dept of PG Studies, VTU RC, Gulbarga.

------ 195800

12.5 ms VLSI Architecture based 16-pt Adaptive Split Radix-2 FFT Architecture T.yasodha Dept. of E&C Engg, Christian College of Engg.& Techn. Odanchatram. Area Efficient 16 pt Complex fft algorithm for Efficient FPGA Implementation Using NEDA with Modified CSLA Sangeetha Vijayan, Dept of ECE, Saintgits, College of Engineering, Kottayam, Kerala, India 4.195 16.3ms

Proposed project ----------- 1.523 132208 10.57 ms Table 4.2: Performance comparison

Fig 4.3 Bar chart showing Time Delay

Fig 4.4 Bar chart showing Memory

7

Fig 4.5 Bar chart showing Power

V. CONCLUSION

A new low-power high-performance 16-point FFT processor for wireless applications has been successfully designed based on the architecture described. This architecture is based on a decomposition of the 16-point FFT in to two 8-point FFT‘s. Our proposed design for VLSI based high speed FFT processor optimizes the circuit complexity level as compared to the existing architectures. The computation for calculating the 16 point FFT is more efficient in terms of memory utilization and time delay using the proposed method and also results in a considerable reduction in cost, size, and power dissipation for the existing wireless systems.

REFERENCES

―Draft supplement of standard for Information Technology- Telecommunication and Information Exchange Between Systems- Local and Metropolitan Area Networks – Special Requirements – Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specifications: High Speed Physical Layer in the 5 GH: Band, IEEE P802.11a/D7.0‖ S. Monicadevi, Dr. T. Yasodha, ―An Efficient High Speed VLSI Architecture Based 16-pt Adaptive Split Radix-2 FFT‖ Int.Journal of Science Tech. & Engineering [Vol 2,issue 10,Apr- 2016.

K. Maharatna. E. Grass and U. Jagdhold. ― A novel 64-point FFT/IFFT Processor for IEEE 802.11a standard.‖ In Proc, ICASSP’03. Vol. II. Pp.II-321_II-324.

G.Purna Chandra Rao, B.Ashok, B.Saritha, ―Design and Implantation of High Speed FFT Processor for OFDMA System Using FPGA‖ International Journal of Advanced research in Computer and Communication Engineering, Vol. 2, Issue 7, July 2013. Implementation Using NEDA with Modified CSLA‖ International Journal of Advanced Research in Electrical, E & I Engineering Vol. 3,issue 10, October 2014. K. Maharatna, E. Grass and U. Jagdhold, ―A lower-power 64-point FFT/IFFT architecture for wireless broadband communication.‖ In proc. 5th int. OFDM workshop. Hamburg. Germany. Sept .2000.p.36. ―Xilix Product Specification. High Performance 64-point complex FFT/IFFT V1.0.5 [online]. Available: http://www.xilinx.com/ipcenter‖. Allu Thanuja& Sarada V, ―CORDIC Based 16-Point FFT Processor‖ in International Journal of research & innovative Technology vol 1, issue-1 April-14. Rashmi M J, G S Biradar, Meenakshi P, ―Efficient vlsi architecture using dit-fft radix-2 and Split radix fft algorithm‖ Int. Journal for technological Research in Engg.vol 1, issue 10, June-2014. Esai Amudha Vani G and Joseph Gladwin , ―Low Power FFT Architectures via Folding Transformation‖, ELSEVIER Journal.

Corresponding Author Mrs. Purnima Savadatti*

PG Student, Department of Electronics and Communication Engineering, Gogte institute of technology, Belagavi, India

E-Mail – purnimamy2013@gmail.com