    • MINDS@UW Home
    • MINDS@UW Madison
    • College of Engineering, University of Wisconsin--Madison
    • Department of Electrical and Computer Engineering
    • Theses--Electrical Engineering
    • View Item

    Optimizing Hierarchical Algorithms for GPGPUs

    File(s)
    Nere_Masters.pdf (718.2Kb)
    Date
    2010-05-15
    Author
    Nere, Andrew
    Department
    Electrical Engineering
    Advisor(s)
    Lipasti, Mikko
    Metadata
    Show full item record
    Abstract
    The performance potential of future architectures, thanks to Moore's Law, grows linearly with the number of available devices per integrated circuit. Whether these future devices are ultra-small CMOS transistors, nanotubes, or even individual molecules, it is clearly understood that there will be many of them available to computer architects. If current architecture trends are a good indicator of future designs, many of these devices will likely be allocated as extra cores on chip-multicore systems. However, the nature of highly parallel processors consisting of ultra-small devices brings with it some inherent difficulties. Between the complexity of programming multiprocessor systems, increased power consumption, and the higher fault rate of these tiny devices, future architects will have their work cut out for them. Fortunately, recent advances in neuroscientific understanding make parallel computing devices modeled after the human neocortex a plausible, attractive, fault-tolerant, and energy-efficient possibility. In this paper we describe a GPGPU-accelerated extension to an intelligent model based on the mammalian neocortex. Our cortical architecture, like the human brain, exhibits massive amounts of processing parallelism, making today's GPGPUs a highly attractive and readily available hardware accelerator for such a model. Using NVIDIA's CUDA framework, we have achieved up to 330x speedup over an unoptimized C++ serial implementation. We also consider two inefficiencies inherent to our initial design: multiple kernel-launch overhead and poor utilization of GPGPU resources. We propose using a software work-queue structure to solve the former, and pipelining the cortical architecture during the training phase for the latter. We also investigate applying these techniques to a few CUDA applications that exhibit some structural similarities to our cortical architecture model. Additionally, from our success in extending our model to the GPU, we estimate the hardware requirements for simulating the computational abilities of mammalian brains.
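    The software work queue mentioned in the abstract is commonly realized in CUDA as a "persistent kernel": rather than launching one kernel per unit of work, a single long-running kernel has its thread blocks claim tasks from a queue in global memory, amortizing the launch overhead. The sketch below is a minimal, hypothetical illustration of that general pattern only; the `Task` layout, names, and placeholder update are assumptions, not the thesis's actual data structures.

    ```cuda
    // Minimal sketch of a "persistent kernel" software work queue, a standard
    // way to amortize kernel-launch overhead. Task layout and the update body
    // are hypothetical placeholders, not the thesis's actual code.
    #include <cuda_runtime.h>

    struct Task {
        int level;  // level of the hierarchy this task belongs to
        int node;   // index of the cortical node to update
    };

    __global__ void persistent_worker(const Task* queue, int* head,
                                      int total_tasks, float* state) {
        __shared__ int my_task;
        while (true) {
            // One atomic per block claims the next task; no per-task launch.
            if (threadIdx.x == 0)
                my_task = atomicAdd(head, 1);
            __syncthreads();

            if (my_task >= total_tasks)
                return;  // queue drained: a single launch covered all tasks

            Task t = queue[my_task];
            // ... real per-thread work on node t.node at level t.level ...
            if (threadIdx.x == 0)
                state[t.node] += 1.0f;  // placeholder update
            __syncthreads();  // finish before thread 0 claims the next task
        }
    }
    ```

    The host launches this once, e.g. `persistent_worker<<<num_blocks, 128>>>(d_queue, d_head, n, d_state);`, with `num_blocks` small enough that all blocks are simultaneously resident. The sketch assumes tasks are independent; enforcing ordering across hierarchy levels (the concern the thesis's training-phase pipelining addresses) would require additional synchronization.
    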
    Permanent Link
    http://digital.library.wisc.edu/1793/46170
    Type
    Project Report
    Part of
    • Theses--Electrical Engineering
