A Data Locality Methodology for Matrix-matrix Multiplication Algorithm

Athanasiou, George S.; Kritikakou, Angeliki S.; Goutis, Costas E.; Kelefouras, Vasileios I.; Alachiotis, Nikolaos; Michail, Harris

doi:10.1007/s11227-010-0474-3

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14279/13927

Title:	A Data Locality Methodology for Matrix-matrix Multiplication Algorithm
Authors:	Athanasiou, George S.; Kritikakou, Angeliki S.; Goutis, Costas E.; Kelefouras, Vasileios I.; Alachiotis, Nikolaos; Michail, Harris
Major Field of Science:	Engineering and Technology
Field Category:	Electrical Engineering - Electronic Engineering - Information Engineering
Keywords:	Compilers; Data locality; Data reuse; Matrix-matrix multiplication; Memory management; Recursive array layouts; Scheduling; Strassen's algorithm
Issue Date:	7-Sep-2010
Source:	Journal of Supercomputing, 2012, vol. 59, no. 2, pp. 830-851
Volume:	59
Issue:	2
Start page:	830
End page:	851
Journal:	Journal of Supercomputing
Abstract:	Matrix-Matrix Multiplication (MMM) is a highly important kernel in linear algebra algorithms, and the performance of its implementations depends on memory utilization and data locality. Several MMM algorithms exist, such as the standard algorithm and the Strassen-Winograd variant, as well as many recursive array layouts, such as Z-Morton or U-Morton. However, their data locality is lower than that of the proposed methodology. Moreover, several state-of-the-art (SOA) self-tuning libraries exist, such as ATLAS, which tests many MMM implementations. Installing ATLAS requires an extremely complex empirical tuning step and uses a large number of compiler options, both of which are outside the scope of this paper. In this paper, a new methodology using the standard MMM algorithm is presented, achieving improved performance by focusing on data locality (both temporal and spatial). This methodology finds the scheduling that conforms with the optimum memory management. Compared with (Chatterjee et al. in IEEE Trans. Parallel Distrib. Syst. 13:1105, 2002; Li and Garzaran in Proc. of Lang. Compil. Parallel Comput., 2005; Bilmes et al. in Proc. of the 11th ACM Int. Conf. Supercomput., 1997; Aberdeen and Baxter in Concurr. Comput. Pract. Exp. 13:103, 2001), the proposed methodology has two major advantages. First, the scheduling used at the tile level is different from that used at the element level, giving better data locality, suited to the sizes of the memory hierarchy. Second, its exploration time is short, because it searches only for the number of tiling levels used, and between (1, 2) (Sect. 4) for finding the best tile size for each cache level. A software tool (C code) implementing the above methodology was developed, taking the hardware model and the matrix sizes as input. This methodology performs better than the others across a wide range of architectures. Compared with the best existing related work, which we implemented, performance improvements of up to 55% over the standard MMM algorithm and up to 35% over Strassen's are observed, both under recursive data array layouts. © 2010 Springer Science+Business Media, LLC.
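As a rough illustration of the tiling idea the abstract refers to, the sketch below shows a generic one-level loop-tiled version of the standard MMM algorithm next to the plain triple loop. It is not the scheduling proposed in the paper, which uses different schedules at the tile and element levels and selects tile sizes from a hardware model. The function names `mmm_naive` and `mmm_tiled` and the `tile` parameter are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Reference triple loop: C += A * B for row-major n x n matrices. */
void mmm_naive(size_t n, const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            double a = A[i * n + k];
            for (size_t j = 0; j < n; j++) {
                C[i * n + j] += a * B[k * n + j];
            }
        }
    }
}

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* One level of tiling (assumption: a single, caller-chosen tile size).
 * The outer three loops walk over tiles; the inner three loops work on
 * one tile of A, B and C at a time so that the tiles can be reused
 * while they are still resident in cache. */
void mmm_tiled(size_t n, size_t tile,
               const double *A, const double *B, double *C) {
    assert(tile > 0);
    for (size_t ii = 0; ii < n; ii += tile) {
        size_t i_end = min_sz(ii + tile, n);
        for (size_t kk = 0; kk < n; kk += tile) {
            size_t k_end = min_sz(kk + tile, n);
            for (size_t jj = 0; jj < n; jj += tile) {
                size_t j_end = min_sz(jj + tile, n);
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < j_end; j++) {
                            C[i * n + j] += a * B[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

Both functions accumulate into `C`, so the caller is expected to zero it first. In practice `tile` would be chosen so that the working set of three tiles fits in the targeted cache level; how that choice is made is exactly what the paper's methodology addresses and is not reproduced here.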
ISSN:	15730484
DOI:	10.1007/s11227-010-0474-3
Rights:	© Springer
Type:	Article
Affiliation:	University of Patras; Cyprus University of Technology
Publication Type:	Peer Reviewed
Appears in Collections:	Articles

Scopus citations: 10 (checked on 14 Mar 2024)
Web of Science citations: 10 (last week: 0; last month: 0; checked on 29 Oct 2023)
Page views: 371 (last week: 0; last month: 36; checked on 14 Mar 2025)

All items on this site are protected by copyright.