Comparison and evaluation of code clone detection techniques and tools: A qualitative approach
- 2026-06-02
- Published: 2009-05-01
The paper that defines Type-1, Type-2, Type-3 and Type-4 clones.
Abstract
Over the last decade many techniques and tools for software clone detection have been proposed. In this paper, we provide a qualitative comparison and evaluation of the current state-of-the-art in clone detection techniques and tools, and organize the large amount of information into a coherent conceptual framework. We begin with background concepts, a generic clone detection process and an overall taxonomy of current techniques and tools. We then classify, compare and evaluate the techniques and tools in two different dimensions. First, we classify and compare approaches based on a number of facets, each of which has a set of (possibly overlapping) attributes. Second, we qualitatively evaluate the classified techniques and tools with respect to a taxonomy of editing scenarios designed to model the creation of Type-1, Type-2, Type-3 and Type-4 clones. Finally, we provide examples of how one might use the results of this study to choose the most appropriate clone detection tool or technique in the context of a particular set of goals and constraints. The primary contributions of this paper are: (1) a schema for classifying clone detection techniques and tools and a classification of current clone detectors based on this schema, and (2) a taxonomy of editing scenarios that produce different clone types and a qualitative evaluation of current clone detectors based on this taxonomy.
sciencedirect.com/science/article/pii/S0167642309000367
Classification
- Type-1: Identical code fragments except for variations in whitespace, layout and comments.
- Type-2: Syntactically identical fragments except for variations in identifiers, literals, types, whitespace, layout and comments.
- Type-3: Copied fragments with further modifications such as changed, added or removed statements, in addition to variations in identifiers, literals, types, whitespace, layout and comments.
- Type-4: Two or more code fragments that perform the same computation but are implemented by different syntactic variants.
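
The four definitions above can be made concrete with a small sketch. This is not a method from the paper: it is an illustrative approximation built on Python's standard `ast` and `difflib` modules. The names `ORIGINAL`, `TYPE1` to `TYPE4`, `classify`, and the `0.7` similarity threshold are all hypothetical choices made for this example. Type-1 is approximated by comparing syntax trees (which ignore whitespace, layout and comments), Type-2 by comparing trees after replacing identifiers and literals with placeholders, and Type-3 by a rough similarity score over node types. Type-4 clones cannot be recognized this way because they share behavior rather than syntax.

```python
import ast
import difflib

# Hypothetical sample fragments, one per clone type.
ORIGINAL = '''
def total(values):
    s = 0
    for v in values:
        s = s + v
    return s
'''

# Type-1: only comments, blank lines and layout differ.
TYPE1 = '''
def total(values):
    # sum the values
    s = 0
    for v in values:
        s = s + v   # accumulate

    return s
'''

# Type-2: identifiers and a literal are changed, structure is the same.
TYPE2 = '''
def add_all(items):
    acc = 1
    for x in items:
        acc = acc + x
    return acc
'''

# Type-3: renamed as above, plus an added statement.
TYPE3 = '''
def add_all(items):
    acc = 0
    for x in items:
        if x is None:
            continue
        acc = acc + x
    return acc
'''

# Type-4: same computation, different syntactic form.
TYPE4 = '''
def total(values):
    return sum(values)
'''


class _Normalize(ast.NodeTransformer):
    """Replace identifiers and literals with fixed placeholders."""

    def visit_FunctionDef(self, node):
        node.name = "ID"
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = "ID"
        return node

    def visit_Name(self, node):
        node.id = "ID"
        return node

    def visit_Constant(self, node):
        node.value = "LIT"
        return node


def exact_shape(src):
    # ast.dump omits line/column info by default, so layout and
    # comments do not affect the result.
    return ast.dump(ast.parse(src))


def normalized_shape(src):
    return ast.dump(_Normalize().visit(ast.parse(src)))


def similarity(a, b):
    # Rough structural similarity: compare sequences of node type names.
    seq_a = [type(n).__name__ for n in ast.walk(ast.parse(a))]
    seq_b = [type(n).__name__ for n in ast.walk(ast.parse(b))]
    return difflib.SequenceMatcher(None, seq_a, seq_b).ratio()


def classify(a, b, threshold=0.7):
    """Return a coarse label; the threshold is an arbitrary assumption."""
    if exact_shape(a) == exact_shape(b):
        return "Type-1"
    if normalized_shape(a) == normalized_shape(b):
        return "Type-2"
    if similarity(a, b) >= threshold:
        return "Type-3"
    return "no syntactic match"


if __name__ == "__main__":
    for label, other in [("TYPE1", TYPE1), ("TYPE2", TYPE2),
                         ("TYPE3", TYPE3), ("TYPE4", TYPE4)]:
        print(label, "->", classify(ORIGINAL, other))
```

With these sample fragments the first three comparisons are intended to be labeled Type-1, Type-2 and Type-3, while the `TYPE4` fragment falls through to "no syntactic match" even though it computes the same sum. That gap is the point of the Type-4 definition: detecting such clones requires reasoning about behavior, not just normalized syntax.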