Benchmarking Deep Learning Testing Techniques: A Methodology and Its Application
Abstract
With the adoption of Deep Learning (DL) systems in security- and safety-critical
domains, a variety of traditional testing techniques, novel techniques, and new
ideas are increasingly being implemented in DL testing tools. However, there is
currently no benchmarking method that helps practitioners compare the performance
of different DL testing tools. The primary objective of this study is to construct
a benchmarking method that supports practitioners in selecting a DL testing tool.
In this paper, we perform an exploratory study of fifteen DL testing tools and take
one of the first steps towards designing a benchmarking method for DL testing tools.
Using a requirement-scenario-task model, we propose a set of seven tasks to
benchmark DL testing tools, and we evaluate four DL testing tools against them.
The results show that the current focus in the field of DL testing is on improving
the robustness of DL systems; however, common performance metrics for evaluating
DL testing tools are difficult to establish. Our study suggests that, even though
the number of DL testing research papers is increasing, the field is still in an
early phase: it is not yet sufficiently developed to run a full benchmarking suite.
Nevertheless, the benchmarking tasks defined in the method can help DL practitioners
select a DL testing tool. For future research, we recommend a collaborative effort
among DL testing tool researchers to extend the benchmarking method.
Degree
Student essay
Date
2020-07-06
Author
Chuphal, Himanshu
Dimitrov, Kristiyan
Keywords
Deep Learning
DL
DL testing tools
testing
software engineering
design
benchmark
model
datasets
tasks
tools
Language
eng