ACTIVE AND PASSIVE QUERY DESIGN FOR CROWDSOURCED CLUSTERING: ALGORITHMS, ANALYSIS, AND HUMAN-CENTRIC INSIGHTS
Date
2025-05-09
Author
Chen, Yi
Department
Electrical and Computer Engineering
Advisor(s)
Korlakai Vinayak, Ramya
Abstract
Crowdsourced clustering aims to partition n items into K clusters using noisy human input. This thesis explores both active and passive approaches to this challenge.
For active crowdsourced clustering using pairwise queries ("Are items i and j clustered together?"), we introduce a novel, practical, and efficient algorithm. Notably, it operates without requiring prior knowledge of crowdworker error rates. We provide theoretical guarantees for cluster recovery and sample complexity bounds indicating superior performance over random querying. Experiments on a real crowdsourcing platform confirm these findings, revealing that the algorithm’s efficiency advantage is most pronounced for datasets with smaller clusters; passive methods may be preferable for datasets with large clusters.
Shifting to passive crowdsourced clustering, we investigate the influence of task design—specifically, the number of items per query—on response quality. Our results show diminishing accuracy gains beyond 4 items per query. More critically, we uncover strong evidence of contextual bias: worker judgments are influenced by the items within a query.
This research contributes both a ready-to-deploy active clustering algorithm and crucial insights into task design and the necessity of context-aware noise models for passive crowdsourcing, ultimately informing the development of more robust and efficient crowdsourced clustering systems.
Subject
Electrical and Computer Engineering
Permanent Link
http://digital.library.wisc.edu/1793/95171
Type
Thesis

