Toward data-driven architectural support in improving the performance of future HPC architectures

Matheou, George; Soteriou, Vassos; Evripidou, Paraskevas

doi:10.1016/j.parco.2019.04.011

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14279/18978

Title:	Toward data-driven architectural support in improving the performance of future HPC architectures
Authors:	Matheou, George; Soteriou, Vassos; Evripidou, Paraskevas
Major Field of Science:	Natural Sciences
Field Category:	Computer and Information Sciences
Keywords:	Data-driven multithreading;Data-flow execution;Multi-core architecture;Hardware thread scheduler;FPGA;HPC
Issue Date:	Aug-2019
Source:	Parallel Computing, 2019, vol. 86, pp. 82-106
Volume:	86
Start page:	82
End page:	106
Journal:	Parallel Computing
Abstract:	We propose architectures based on Data-Driven Multithreading (DDM), a hybrid control-flow/data-flow model, to address the concurrency challenges faced by future High-Performance Computing (HPC) systems. We focus on the design and implementation of an optimized hardware Thread Scheduling Unit (TSU) and its integration into a multi-core system dubbed MiDAS. The TSU is the core of the DDM model and it orchestrates the execution of multiple threads on sequential processors based on data availability. MiDAS was prototyped on a Xilinx Virtex-6 FPGA and extensively evaluated using several micro-benchmarks, showing that it achieves linearly-growing performance as the processing core count increases even when running benchmarks comprising very small problem sizes. Under the largest problem size tested and with all 8 available cores being utilized, MiDAS achieves an average speedup of 7.91×, exhibiting 98.8% utilization efficiency. Further, several results pertaining to the proposed hardware TSU are provided, including FPGA real estate requirements, where it is found that MiDAS’s TSU demands relatively small overheads and reduced power consumption, while various TSU operations adhere to low latency responses. To back said claims, the proposed DDM-based TSU is compared with the Task Superscalar architecture that implements the StarSs programming framework in hardware. As such, comparison results show that the proposed TSU requires much less of both hardware investment and energy consumption to operate. Specifically, Task Superscalar is found to be 4.94× larger than the DDM-supporting TSU in terms of slice register requirements and 11.34× larger with respect to the slice look-up table count. Last, the hardware TSU is compared with a software TSU implementation offering identical functionalities, with both being run on an FPGA fabric under a synthetic application, where a detailed performance evaluation shows that MiDAS’s hardware-implemented TSU significantly outperforms its software-based TSU counterpart.
URI:	https://hdl.handle.net/20.500.14279/18978
ISSN:	0167-8191
DOI:	10.1016/j.parco.2019.04.011
Rights:	© Elsevier
Type:	Article
Affiliation:	University of Cyprus; Cyprus University of Technology
Publication Type:	Peer Reviewed
Appears in Collections:	Άρθρα/Articles
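
Illustrative note: the abstract describes a TSU that schedules threads "based on data availability". The following is a minimal software sketch of that general idea only, not the authors' hardware design or the MiDAS implementation. The function name `schedule`, the `consumers` dependency map, and the per-thread ready counter are all assumptions introduced here for illustration.

```python
from collections import deque

def schedule(consumers):
    """Return an order in which each thread runs only after all of its producers.

    consumers maps every thread name to the list of threads that consume
    its output. Every thread must appear as a key (hypothetical format).
    """
    # Ready count: how many producers a thread is still waiting on.
    ready_count = {name: 0 for name in consumers}
    for outputs in consumers.values():
        for consumer in outputs:
            ready_count[consumer] += 1

    # Threads with no pending inputs can be dispatched immediately.
    ready = deque(name for name, count in ready_count.items() if count == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)  # stand-in for executing the thread on a core
        for consumer in consumers[name]:
            ready_count[consumer] -= 1
            if ready_count[consumer] == 0:
                ready.append(consumer)  # all inputs now available
    return order

# Hypothetical four-thread dependency graph: two loads feed an add, which feeds a store.
graph = {"load_a": ["add"], "load_b": ["add"], "add": ["store"], "store": []}
order = schedule(graph)
```

In this sketch a thread is never dispatched by program order; it becomes eligible only when its counter of outstanding inputs reaches zero, which is the sense in which execution is data-driven.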
