• Login
    View Item 
    •   MINDS@UW Home
    • MINDS@UW Milwaukee
    • UW Milwaukee Electronic Theses and Dissertations
    • View Item
    •   MINDS@UW Home
    • MINDS@UW Milwaukee
    • UW Milwaukee Electronic Theses and Dissertations
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Efficient Computation of K-Nearest Neighbor Graphs for Large High-Dimensional Data Sets on GPU Clusters

    Thumbnail
    File(s)
    Main File (3.988Mb)
    Date
    2013-08-01
    Author
    Dashti, Ali
    Department
    Engineering
    Advisor(s)
    Roshan M. D'Souza
    Metadata
    Show full item record
    Abstract
    The k-Nearest Neighbor Graph (k-NNG) and the related k-Nearest Neighbor (k-NN) methods have a wide variety of applications in areas such as bioinformatics, machine learning, data mining, clustering analysis, and pattern recognition. Our application of interest is manifold embedding. Due to the large dimensionality of the input data ( This thesis presents the development and implementation of a distributed exact k-Nearest Neighbor Graph (k-NNG) construction method. The proposed method uses Graphics Processing Units (GPUs) and exploits multiple levels of parallelism for distributed computational systems using GPUs. It is scalable for different cluster sizes, with each compute node in the cluster containing multiple GPUs. The distance computation is formulated as a basic matrix multiplication and reduction operation. The optimized CUBLAS matrix multiplication library is used for this purpose. Various distance metrics such as Euclidian, cosine, and Pearson are supported. For k-NNG construction, two different methods are presented. The first is based on an approach called batch index sorting to build the k-NNG with three sorting operations. This method uses the optimized radix sort implementation in the Thrust library for GPU. The second is an efficient implementation using the latest GPU functionalities of a variant of the quick select algorithm. Overall, the batch index sorting based k-NNG method is approximately 13x faster than a distributed MATLAB implementation. The quick select algorithm itself has a 5x speedup over state-of-the art GPU methods. This has enabled the processing of k-NNG construction on a data set containing 20 million image vectors, each with dimension 15,000, as part of a manifold embedding technique for analyzing the conformations of biomolecules.
    Subject
    Diffusion Map
    GPU
    High Performance Computing
    K Nearest Neighbors
    Manifold Embedding
    Permanent Link
    http://digital.library.wisc.edu/1793/92781
    Type
    thesis
    Part of
    • UW Milwaukee Electronic Theses and Dissertations

    Contact Us | Send Feedback
     

     

    Browse

    All of MINDS@UWCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

    My Account

    Login

    Contact Us | Send Feedback