Abstract:
One task of software engineering is to find the practices, models, principles, and tools that help an organization reduce the cost and time of its software development projects. Microblogging is a rich resource where this information can be found. However, microblogging messages are short, rapidly changing, and diverse, so finding information in such a source is not a trivial task. In this thesis, we propose a framework and relevance-assessing metrics for classifying and retrieving microblogging messages related to the software engineering field. The Guide to the Software Engineering Body of Knowledge (SWEBOK) is used to construct term-frequency-based message classifiers. A message is classified and retrieved according to a score computed from the similarity of its content to the classifiers and from its social context, which combines a user feature and a community feature. We conduct experiments to assess the effectiveness of the proposed framework against the classic Information Retrieval approach. Classification effectiveness is measured by the harmonic mean, and retrieval effectiveness is measured by weighted r-precision and discounted cumulative gain. Statistical analysis shows that the proposed framework is more effective than the classic Information Retrieval approach in both message classification and retrieval at a significance level of 0.05. We also develop a tool based on the proposed framework that helps software engineers collect useful information from microblogging.
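As an illustration of two of the evaluation measures named above, the following is a minimal Python sketch, not the thesis implementation. It assumes that the harmonic mean refers to the usual F-measure, the harmonic mean of precision and recall, and that discounted cumulative gain uses the common log2(rank + 1) discount; the abstract states neither detail. The function names `f_measure` and `dcg` are hypothetical, and the weighted r-precision variant is omitted because its weighting is not described here.

```python
import math


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed meaning of 'harmonic mean')."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both values are zero
    return 2 * precision * recall / (precision + recall)


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of graded relevance scores.

    Assumes the common formulation: sum of rel_i / log2(i + 1) with ranks i starting at 1.
    """
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


if __name__ == "__main__":
    # Hypothetical values for illustration only, not results from the thesis.
    print(f_measure(0.8, 0.6))
    print(dcg([3, 2, 0, 1]))
```

Under these assumptions, a ranking that places highly relevant messages first receives a larger DCG than the same messages in a worse order, which is why the measure is suited to comparing retrieval rankings.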