Hannes Thaller, Lukas Linsbauer, Alexander Egyed,
"Semantic Clone Detection via Probabilistic Software Modeling 2022"
: Fundamental Approaches to Software Engineering - 25th International Conference (FASE) Held as Part of the European Joint Conferences on Theory and Practice of Software (ETAPS), Munich, Germany, Series Lecture Notes in Computer Science (LNCS), Vol. 13241, Number 2, Springer, Page(s) 288-309, 5-2022, ISBN: 978-1-59593-882-4
Original title:
Semantic Clone Detection via Probabilistic Software Modeling 2022
Title language:
English
Original book title:
Fundamental Approaches to Software Engineering - 25th International Conference (FASE) Held as Part of the European Joint Conferences on Theory and Practice of Software (ETAPS), Munich, Germany
Original abstract:
Software product lines (SPLs) are known for improving productivity and reducing time-to-market through the systematic reuse of assets. SPLs are adopted mainly by re-engineering existing system variants. Feature location techniques (FLTs) support the re-engineering process by mapping the variants' features to their implementation. However, such FLTs do not perform well when applied to single systems. As a result, there is a lack of FLTs to aid the re-engineering of a single system into an SPL. In this work, we present a hybrid technique that consists of two complementary types of analysis: i) a dynamic analysis of runtime monitoring traces from scenarios in which the system's features are exercised individually, and ii) a static analysis for refining overlapping traces. We evaluate our technique on three subject systems by computing the metrics commonly used in FL research: Precision, Recall, and F-Score at the line and method level of source code. In addition, one of the systems has a ground truth available, which we also used for comparing results. Results show that our FLT reached an average of 68-78% precision and 72-81% recall on two systems at the line level, and 67-65% precision and 68-48% recall at the method level. In these systems, most of the implementation can be covered by exercising the features. For the largest system, our technique reached a precision of up to 99% at the line level, 94% at the method level, and 44% when compared to traces. However, due to its size, it was difficult to reach high code coverage during execution, and thus the recall obtained averaged 28% at the line level, 25% at the method level, and 30% when compared to traces. The main contributions of this work are a hybrid FLT, its publicly available implementation, and a replication package for comparisons and future studies.
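The abstract reports Precision, Recall, and F-Score at the line and method level. The sketch below shows how such set-based metrics are commonly computed. It assumes that a feature's located code and its ground truth are each represented as a set of identifiers, such as "File.java:line" for lines or method signatures for methods. This is an illustration only, not the authors' implementation. The function name and the example identifiers are hypothetical.

```python
def precision_recall_f_score(retrieved: set, relevant: set) -> tuple:
    """Set-based Precision, Recall, and F-Score (harmonic mean of P and R).

    retrieved: code elements (lines or methods) the technique mapped to a feature.
    relevant:  code elements that truly implement the feature (ground truth).
    """
    true_positives = len(retrieved & relevant)
    # Guard against empty sets so the metrics are defined (0.0) instead of raising.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical line-level example for a single feature.
located = {"Foo.java:10", "Foo.java:11", "Foo.java:12", "Bar.java:5"}
ground_truth = {"Foo.java:10", "Foo.java:11", "Foo.java:12", "Foo.java:13", "Baz.java:7"}
p, r, f = precision_recall_f_score(located, ground_truth)
print(f"precision={p:.2f} recall={r:.2f} f-score={f:.2f}")
# prints: precision=0.75 recall=0.60 f-score=0.67
```

The same function applies at the method level if the sets hold method identifiers instead of line identifiers.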