Optimizing Hierarchical Algorithms for GPGPUs

Date
2010-05-15
Author
Nere, Andrew
Department
Electrical Engineering
Advisor(s)
Lipasti, Mikko
Abstract
The performance potential of future architectures, thanks to Moore's Law, grows linearly
with the number of available devices per integrated circuit. Whether these future devices are
ultra-small CMOS transistors, nanotubes, or even individual molecules, it is clear that computer
architects will have many of them at their disposal. If current architecture trends are a good
indicator of future designs, many of these devices will likely be allocated as extra cores in
chip-multiprocessor systems. However, highly parallel processors built from ultra-small devices
bring inherent difficulties. Between the complexity of programming multiprocessor systems,
increased power consumption, and higher fault rates for these tiny devices, future architects
will have their work cut out for them. Fortunately, recent advances in neuroscientific
understanding make parallel computing devices modeled after the human neocortex a plausible,
attractive, fault-tolerant, and energy-efficient possibility.
In this paper we describe a GPGPU-accelerated extension to an intelligent model based on
the mammalian neocortex. Our cortical architecture, like the human brain, exhibits massive
processing parallelism, making today's GPGPUs a highly attractive and readily available
hardware accelerator for such a model. Using NVIDIA's CUDA framework, we have achieved up to
330x speedup over an unoptimized serial C++ implementation. We also consider two inefficiencies
inherent to our initial design: the overhead of multiple kernel launches and poor utilization
of GPGPU resources. We propose a software work-queue structure to solve the former, and
pipelining of the cortical architecture during the training phase to address the latter. We
also investigate applying these techniques to a few other CUDA applications that exhibit
structural similarities to our cortical architecture model. Additionally, building on our
success in mapping the model to the GPU, we estimate the hardware requirements for simulating
the computational abilities of mammalian brains.
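To illustrate the work-queue idea mentioned above, the sketch below shows a minimal,
hypothetical persistent-kernel software work queue in CUDA. It is not the report's
implementation: a single long-lived kernel drains a task queue through an atomic counter, so
the launch cost is paid once rather than once per unit of cortical work. The names
process_node and NUM_TASKS are illustrative placeholders.

// Minimal sketch (not from the report): a persistent CUDA kernel servicing
// a software work queue, amortizing kernel-launch overhead over many tasks.
// process_node and NUM_TASKS are hypothetical placeholders.
#include <cuda_runtime.h>

#define NUM_TASKS 4096

__device__ void process_node(int task, float *data) {
    // Stand-in for the per-node cortical computation.
    data[task] = 0.5f * data[task] + 1.0f;
}

__global__ void worker(float *data, int *next_task) {
    for (;;) {
        // Each thread atomically claims the next unprocessed task index.
        int task = atomicAdd(next_task, 1);
        if (task >= NUM_TASKS) return;   // queue drained; kernel exits
        process_node(task, data);
    }
}

int main() {
    float *d_data;
    int *d_next;
    cudaMalloc(&d_data, NUM_TASKS * sizeof(float));
    cudaMemset(d_data, 0, NUM_TASKS * sizeof(float));
    cudaMalloc(&d_next, sizeof(int));
    cudaMemset(d_next, 0, sizeof(int));

    // One launch services all NUM_TASKS items instead of NUM_TASKS launches.
    worker<<<64, 128>>>(d_data, d_next);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    cudaFree(d_next);
    return 0;
}

In a cortical-model setting, the queue entries would presumably be dependency-ordered node
work items rather than flat indices, with finished blocks pushing newly ready children onto
the queue, but the launch-amortization principle is the same.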
Permanent Link
http://digital.library.wisc.edu/1793/46170
Type
Project Report