    ACTIVE AND PASSIVE QUERY DESIGN FOR CROWDSOURCED CLUSTERING: ALGORITHMS, ANALYSIS, AND HUMAN-CENTRIC INSIGHTS

    File(s)
    Thesis_Yi_Chen_2025_compressed.pdf (1.895 MB)
    Date
    2025-05-09
    Author
    Chen, Yi
    Department
    Electrical and Computer Engineering
    Advisor(s)
    Korlakai Vinayak, Ramya
    Abstract
    Crowdsourced clustering aims to partition n items into K clusters using noisy human input. This thesis explores both active and passive approaches to this problem. For active crowdsourced clustering using pairwise queries ("Are items i and j clustered together?"), we introduce a novel, practical, and efficient algorithm that, notably, does not require prior knowledge of crowdworker error rates. We provide theoretical guarantees for cluster recovery, along with sample complexity bounds indicating that the algorithm outperforms random querying. Experiments on a real crowdsourcing platform confirm these findings and reveal that the algorithm's efficiency advantage is most pronounced for datasets with small clusters, whereas passive methods may be preferable for datasets with large clusters. Turning to passive crowdsourced clustering, we investigate how task design, specifically the number of items per query, affects response quality. Our results show that accuracy gains diminish beyond 4 items per query. More importantly, we find strong evidence of contextual bias: worker judgments are influenced by the other items presented in the same query. This research contributes both a ready-to-deploy active clustering algorithm and key insights into task design, and it demonstrates the need for context-aware noise models in passive crowdsourcing, ultimately informing the development of more robust and efficient crowdsourced clustering systems.
    Subject
    Electrical and Computer Engineering
    Permanent Link
    http://digital.library.wisc.edu/1793/95171
    Type
    Thesis
    Part of
    • UW-Madison Open Dissertations and Theses
