Exploration on Deep Drug Discovery: Representation and Learning
Virtual (computational) high-throughput screening provides a strategy for prioritizing compounds for experimental screens, but the choice of virtual screening algorithm depends on the dataset and the evaluation strategy. We start by considering a wide range of ligand-based machine learning and docking-based approaches to virtual screening, and present a strategy for choosing the best algorithm for prospective compound prioritization. In the process, we find that the input information can affect model performance. We therefore examine the impact of different levels of molecule representation and introduce N-gram graph, a novel representation for molecular graphs. Combined with traditional machine learning models, N-gram graph reaches state-of-the-art performance. We also observe that multi-task learning can degrade performance on some individual tasks. To address this, we propose a reinforced multi-task learning (RMTL) framework; preliminary results show that RMTL mitigates the issue in the two-task setting.