A methodology for speeding up fast Fourier transform focusing on memory architecture utilization

Michail, Harris; Kelefouras, Vasileios I.; Athanasiou, George S.; Alachiotis, Nikolaos; Kritikakou, Angeliki S.; Goutis, Costas E.

doi:10.1109/TSP.2011.2168525

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14279/1640

Title:	A methodology for speeding up fast Fourier transform focusing on memory architecture utilization
Authors:	Michail, Harris; Kelefouras, Vasileios I.; Athanasiou, George S.; Alachiotis, Nikolaos; Kritikakou, Angeliki S.; Goutis, Costas E.
Major Field of Science:	Engineering and Technology
Field Category:	Electrical Engineering - Electronic Engineering - Information Engineering
Keywords:	Cache memory; Embedded computer systems; Multiplication; Compilers (Computer programs)
Issue Date:	Dec-2011
Source:	IEEE Transactions on Signal Processing, 2011, vol. 59, no. 12, pp. 6217-6226
Volume:	59
Issue:	12
Start page:	6217
End page:	6226
Journal:	IEEE Transactions on Signal Processing
Abstract:	Several state-of-the-art (SOA) self-tuning software libraries exist, such as the Fastest Fourier Transform in the West (FFTW) for the fast Fourier transform (FFT). The FFT is a highly important kernel, and the performance of its software implementations depends on how well the memory hierarchy is utilized. FFTW minimizes register spills and data cache accesses by finding a schedule that is independent of the number of registers and of the number and size of the cache levels; this independence from the actual hardware is a serious drawback. In this paper, a new methodology is presented that achieves improved performance by focusing on memory hierarchy utilization. The proposed methodology has three major advantages. First, it fully and simultaneously exploits the combined production and consumption of butterfly results, data reuse, FFT parallelism, the symmetries of the twiddle factors, and the elimination of additions of zero and multiplications by zero and one when the real or imaginary parts of the twiddle factors are zero or one. Second, the optimal solution is found according to the number of registers, the data cache sizes, the number of levels in the data cache hierarchy, the main memory page size, the associativity of the data caches and the data cache line sizes, all of which are considered simultaneously rather than separately. Third, compilation time and source code size are very small compared with FFTW. Compared with FFTW, the proposed methodology achieves a performance gain of about 40% (a speed-up of about 1.7) on architectures with small data caches, where memory management has a larger effect on performance, and of 20% on average (a speed-up of 1.25) on architectures with large data caches (Pentium). These figures correspond to reductions in execution time, since 1/(1 − 0.40) ≈ 1.67 and 1/(1 − 0.20) = 1.25.
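Two of the arithmetic savings listed in the abstract, twiddle-factor symmetry and skipping trivial multiplications, can be illustrated with a textbook iterative radix-2 decimation-in-time FFT. The sketch below is a generic illustration under stated assumptions, not the authors' methodology: it does not model registers, cache sizes, associativity or page size, and the function name `fft_radix2` and its multiplication counter are hypothetical. It stores twiddle factors only for k < N/2 (the butterfly's `u - t` covers W^(k+N/2) = −W^k), skips the multiplication when W = 1, and replaces multiplication by W = −j with a swap and negation.

```python
import cmath


def fft_radix2(x):
    """Iterative radix-2 DIT FFT; len(x) must be a power of two.

    Hypothetical illustration (not the paper's method). Returns the
    transform and the number of general complex multiplications performed.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]

    # Bit-reversal permutation of the input.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # Twiddle table W_N^k only for k < N/2: the butterfly's "u - t"
    # applies the symmetry W^(k+N/2) = -W^k, so no second table is needed.
    tw = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]

    mults = 0
    size = 2
    while size <= n:
        half = size // 2
        step = n // size  # stride into the twiddle table for this stage
        for start in range(0, n, size):
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half]
                idx = k * step
                if idx == 0:
                    t = v  # W = 1: multiplication skipped
                elif 4 * idx == n:
                    t = complex(v.imag, -v.real)  # W = -j: swap and negate
                else:
                    t = v * tw[idx]
                    mults += 1
                a[start + k] = u + t
                a[start + k + half] = u - t  # reuses t via the symmetry
        size *= 2
    return a, mults
```

For N = 8 the function reports only 2 general complex multiplications, compared with the N/2 · log2 N = 12 butterflies of the transform; all other twiddle factors are 1 or −j. The paper's gains come mainly from memory-hierarchy-aware scheduling, which this sketch deliberately leaves out.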
URI:	https://hdl.handle.net/20.500.14279/1640
ISSN:	1941-0476
DOI:	10.1109/TSP.2011.2168525
Rights:	© IEEE
Type:	Article
Affiliation:	University of Patras
Publication Type:	Peer Reviewed
Appears in Collections:	Articles

SCOPUS™ citations: 9 (checked on 9 Nov 2023)

Web of Science™ citations: 9 (last week: 0; last month: 0; checked on 29 Oct 2023)

Page views: 553 (last week: 0; last month: 33; checked on 14 Mar 2025)

All items on this site are protected by copyright.