At present, deep learning has emerged as a powerful technique for automated feature generation, since deep learning architectures can effectively capture highly complex nonlinear features.
An Approach to Semantic and Structural Features Learning for Software Defect Prediction

Shi Meilong,1 Peng He,1,2 Haitao Xiao,1 Huixin Li,1 and Cheng Zeng1

1 School of Computer Science and Information Engineering, Hubei University, Wuhan 430062, China
2 Hubei Key Laboratory of Applied Mathematics, Hubei University, Wuhan 430062, China

Academic Editor: Chunlai Chai
Received 20 Dec 2019; Revised 01 Feb 2020; Accepted 24 Feb 2020; Published 06 Apr 2020

Abstract

Research on software defect prediction has achieved great success in modeling predictors. To build more accurate predictors, a number of hand-crafted features have been proposed, such as static code features, process features, and social network features.
Few models, however, consider the semantic and structural features of programs. Understanding the context information of source code files could explain much about the causes of software defects. In this paper, we leverage representation learning to generate semantic and structural features. Specifically, we first extract token vectors from code files based on their Abstract Syntax Trees (ASTs) and then feed the token vectors into a Convolutional Neural Network (CNN) to automatically learn semantic features. Meanwhile, we construct a complex network model based on the dependencies between code files, namely, a software network (SN). To learn the structural features, we then apply a network embedding method to the resulting SN. Finally, we build a novel software defect prediction model based on the learned semantic and structural features (SDP-S2S). We evaluated our method on 6 projects collected from the public PROMISE repository. The results suggest that the structural features extracted from the software network contribute prominently, and that combining them with semantic features appears to improve the results further. In addition, compared with traditional hand-crafted features, SDP-S2S generally increases F-measure values, with a maximum growth rate of 99.5%. We also explore the parameter sensitivity of the semantic and structural feature learning processes and provide guidance for optimizing predictors.

Introduction

A software defect is an error in the code or an incorrect behavior during software execution; it is also defined as a failure to meet intended or specified requirements. Software reliability is regarded as one of the crucial problems in software engineering. Models that help ensure software quality are therefore required, and software defect prediction models are one of them.
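To make the structural-feature step summarized in the abstract more concrete (build a software network from file dependencies, then apply network embedding), here is a minimal sketch under explicit assumptions. The dependency pairs and the helper names `build_software_network` and `random_walks` are hypothetical, dependencies are treated as undirected edges, and only the random-walk corpus generation of a DeepWalk-style embedding is shown. The text here does not say which embedding method SDP-S2S uses, and the walks would still need to be fed to a skip-gram model to obtain node vectors.

```python
import random
from collections import defaultdict

def build_software_network(dependencies):
    """Build an undirected adjacency map from (source_file, target_file) pairs.

    Assumption: dependency direction is ignored for this illustration.
    """
    graph = defaultdict(set)
    for src, dst in dependencies:
        if src != dst:  # ignore self-dependencies
            graph[src].add(dst)
            graph[dst].add(src)
    return graph

def random_walks(graph, walks_per_node, walk_length, seed=0):
    """Generate uniform random walks (DeepWalk-style) starting from every node.

    Each walk is a list of file names; the collection of walks plays the role
    of a "sentence" corpus for a later skip-gram training step (not shown).
    """
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks = []
    for _ in range(walks_per_node):
        order = nodes[:]
        rng.shuffle(order)  # visit start nodes in a random order each pass
        for start in order:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = sorted(graph[walk[-1]])
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks

# Hypothetical dependency pairs between code files.
deps = [("A.java", "B.java"), ("B.java", "C.java"),
        ("A.java", "C.java"), ("C.java", "D.java")]
sn = build_software_network(deps)
walks = random_walks(sn, walks_per_node=2, walk_length=5)
print(len(walks))  # 8 walks: 2 per node x 4 nodes
```

Uniform walks are the simplest choice; biased walks such as those in node2vec change only how the next neighbour is sampled.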
Defect prediction can precisely estimate the most defect-prone software components and help developers allocate limited resources to the parts of the system that are most likely to contain defects during the testing and maintenance phases [1]. It is well known that, in the software life cycle, the earlier a defect is found, the less it costs to fix [2]. Therefore, detecting defects quickly and accurately remains an open challenge in software engineering and has attracted extensive attention from both industry and academia.

A typical defect prediction approach consists of two parts: extracting features from source files and constructing classifiers with various machine learning algorithms. Existing methods are dominated by traditional hand-crafted features, namely, source code metrics (e.g., CK, Halstead, MOOD, and McCabe's CC metrics). Unfortunately, these metrics generally overlook important information implied in the code, such as semantic and structural information. Meanwhile, many machine learning algorithms have been adopted for software defect prediction, including Support Vector Machine (SVM), Naïve Bayes (NB), and Decision Tree (DT).

For example, Figure 1 shows two Java files, both of which contain an assignment statement, a while statement, a function call, and an increment statement. If traditional features are used to represent these two files, the representations are identical, because the files share the same source code characteristics in terms of lines of code, function calls, raw programming tokens, and so on. However, the two files are actually quite different semantically. In other words, semantic information, used as new discriminative features, should also be useful for characterizing defects and improving defect prediction.
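The Figure 1 argument can be illustrated with a small sketch. The two token sequences below are hypothetical stand-ins in the spirit of Figure 1 (the actual files are not reproduced here): both contain the same four statement kinds in a different order. A count-based view cannot tell them apart, while a sequence-preserving integer encoding, of the kind that could be fed to an embedding layer and a CNN, does. The helpers `build_vocab` and `encode`, the node-type names, and the padding length are assumptions for illustration, not the paper's exact encoding.

```python
from collections import Counter

# Hypothetical AST node-type sequences: same statements, different order.
file_a = ["Assignment", "While", "MethodCall", "Increment"]
file_b = ["Assignment", "MethodCall", "While", "Increment"]

# Count-based (hand-crafted style) view: the two files look identical.
print(Counter(file_a) == Counter(file_b))  # True

def build_vocab(sequences):
    """Map every distinct token to an integer id; 0 is reserved for padding."""
    vocab = {}
    for seq in sequences:
        for tok in seq:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab

def encode(seq, vocab, max_len):
    """Turn a token sequence into a fixed-length integer vector.

    Sequences longer than max_len are truncated; shorter ones are zero-padded.
    """
    ids = [vocab[tok] for tok in seq][:max_len]
    return ids + [0] * (max_len - len(ids))

vocab = build_vocab([file_a, file_b])
vec_a = encode(file_a, vocab, max_len=6)
vec_b = encode(file_b, vocab, max_len=6)
print(vec_a)  # [1, 2, 3, 4, 0, 0]
print(vec_b)  # [1, 3, 2, 4, 0, 0]
```

Because the encoding keeps token order, a convolution sliding over these vectors sees different local patterns for the two files, which is what lets learned features separate them.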