A Study on Software Fault Detection Methods
Exploring Various Techniques and Algorithms for Software Fault Detection
by Neeta Tewari*, Dr. Alok Kumar Verma,
- Published in Journal of Advances and Scholarly Researches in Allied Education, E-ISSN: 2230-7540
Volume 11, Issue No. 22, Jul 2016, Pages 248 - 252 (5)
Published by: Ignited Minds Journals
ABSTRACT
Quality assurance activities such as testing, verification, and validation, defect tolerance, and fault prediction is some of the software engineering interests. When every organization doesn't have enough resources and money to check the whole program, a project manager may use other fault detection algorithms to classify systems components that are more vulnerable to faults. Throughout the field of information engineering, there are so many prediction methods including monitoring, health, and cost prediction. In this paper, the prediction of soft-faulting has been studied using several machines, for example, decision-making bodies, decision tables, random forest, neural networks, Naïve Bayes, and distinctive classifications of artificial immune systems, as most of them do not have a stable model.
KEYWORD
software fault detection methods, quality assurance, testing, verification, validation, defect tolerance, fault prediction, resources, money, fault detection algorithms, systems components, vulnerable, information engineering, prediction methods, monitoring, health prediction, cost prediction, soft-faulting, machines, decision-making bodies, decision tables, random forest, neural networks, Naïve Bayes, artificial immune systems, stable model
INTRODUCTION
Static and dynamic detection are the two forms of detection techniques. Statistical analysis requires tools to recognize faults and anomalies that are operating through code and data. Typically they are based on configurable laws and predefined values. Dynamic detection involves approaches like the analysis of runtime and sequence matching of trace and log data. Although static analysis may not involve the executing systems being checked, dynamic approaches require certain programmers. The identification activities take place during different life cycle periods of the project, and a schematic illustrates how they translate into project behavior. The set of approaches utilizes either one or another depending on need and criticality.[1] For standardization problems, automated tools are primarily used but Automated Static Analysis is still repeatedly used for a manual walkthrough of code. It helps in detecting flaws correlated with requirements non-compliance, potential memory leakage, variable utilization, etc. During the improvement phase, they are precariously positioned as they help to avoid extreme stress and imperfection correlated with leakage in research testing cycles. Some of Java technology's most widely employed tools are programming mistake detector, Check-Style & Find Bugs, and similarly, other technologies also have similar tools. Although this plays an important role in the growth process, it is not so commonly done in maintenance. Nonetheless, the castoff as an effective identification mechanism and hygiene consideration for methods which have an Adaptive Security Appliance tools foundation, are extremely costly like any error common in the field.[2] Adaptive Security Appliance software cannot detect multiple faults that result in faults on the repair process. Adaptive Security Appliance Open Source Software resources reveal that it includes fewer than 3% of the defects. For these situations, the greater amount of errors becomes highly useful despite the marginal commitment. Graph mining is a lively control flow that relies on the system used to recognize faults that cannot crash in the countryside. It uses call diagrams that are quickly interpreted. The graph node implies roles that are performed by boundaries and calls for certain purposes. Based on call rates, boundary weights are added. Call frequency differences and contact configuration changes are possible deficiencies. Odd events from routine events can be identified by a clustering algorithm or by a neural network. [3] Classifications are often based on the faulty runs and are recognized when a fault is detected. Naïve Bayes and Bagging are among the most popular classificatory. A guided learning methodology and an integer approach are proposed in the Bayesian system. This adopts a primary probabilistic paradigm which helps one to catch moral vagueness by defining the probability of outcomes. A bagging classifier is a meta-estimator collaborative that suits base classifiers for each arbitrary paragraph of the particular data set and then combined predictions (by voting or by averaging) to determine the outcome.[4] The secondary unmonitored model which assimilates the likelihood of propagation of the regular behavior, in-code area, to detect the events in the abnormal
of patterns is often categorized but requires special iterative methods for the classification of computer evidence in the identification of loss. A variety of biased mechanisms are initially introduced that extract recurring occurrences from the traces of system success. The collection of the functionalities is then made to choose the right grouping features.[5] For detecting errors, the classifier model is used which is qualified to use such functions. The rule-based logging method recommends the automated log entry by unique laws in an ordered way. Any anomalies in this flow can be detected and warned automatically. For problem fields with moving values and regularities, machine learning procedures are relatively effective. The output of machine learning algorithms is metric knowledge for software modules or applications with faulty details. Such methods allow the distribution likelihood and the study of errors. Decision trees, Bayesian belief networks, and Neural Network are the popular tools or techniques for software defect prediction. Data classification is done in a two-step process. Based on their characteristics attributes are classified then each record will refer to a class. Many documents are used for testing studies as datasets. During unregulated testing the items have unidentified names, so the amount of groups is typically unclear until testing. Clustering is sometimes called non-supervised learning.[6]
A. Naive Bayes Classifier
It is a simple method that uses Naïve Bayes models for classification. It helps in supervised learning. It uses maximum likelihood.
B. Decision Tree Induction
They form a tree-like structure following the basics of flowcharting, where internal nodes demonstrate the testing of the attribute; the class distribution is represented by leaf nodes. The classification of unknown samples is done by testing the sample against the decision tree, using the values of the attributes. The decision tree is used for deducing classification rules.[7]
C. Tree Pruning
It is a method of finding the wrong data. Defects in the training data result in multiple branches in the tree. Over-fitted data is identified. Statistical methods are used to do away with unstable branches.
D. Random Tree
A type of decision tree that cogitates ―K‖ randomly chosen attributes at each node and allows class
- By the bootstrapping technique, typically more than half of the dataset used for training does not exist in the bag which is called the out-of-bag data.
- Every tree is formed by random attributes. The features make nodes and leaves with standard tree-building algorithms.
- The tree is grown up to the maximum extent possible until Pruning is not done.
Failure Data
During the execution of software, the failure datasets are obtained manually. They are collected over a series of the time interval. Collecting fault data is the basic and necessary condition for model analysis, hence it is necessary to ensure the group of failure data accuracy, completeness, and timeliness. [9] Software failure data collection steps are as below: ► Take steps to determine what is the type of software failure; ► Also, determine the amount of failure data and no of people involved; ► Use automation tools for collecting failure data; ► Design the plan to implement, collect and calculate failure data, analyzing data; ► Collection and monitoring of the failure data. [10]
Failure Classification
Failures of software reliability are classified mainly into transient failure, recoverable failure, unrecoverable failure, permanent failure, and cosmetic failure.
- Random hardware failures are considered by failure rate that is either: constant, means where the constituent is in its useful life that is the effect of getting old is insignificant. Non-constant, meaning that the component is subjected to a burn-in phase or the wear-out phase.
Systematic failure: Failure linked in a deterministic way to a specific cause, which can only be eradicated by an amendment of the design or of the manufacturing process, operational procedures, documentation, or other relevant factors. Systematic failures always be repeated when triggering condition is available. Systematic faults may be introduced in any lifecycle phase and when adequately corrected, the failure, in theory, will never re-appear.
- Safe failure: Failure of an element and/or subsystem and/or system that plays a vital part in employing the safety function that:
► Results in the spurious operation of the safety function to put end-user computing (or portion thereof) into a secured state or maintain a secured state, or ► Increases the probability of the spurious operation of the safety function to put end-user computing into a secured state or keep a secured state.
- Detected - revealed by online diagnostics.
- Undetected - revealed by functional tests or upon a real demand for activation.
- Random hardware failure: Failure, taking place randomly in a period, which results from two or more possible degradation mechanisms in the hardware.
- Dangerous failure: Failure of a subsystem or system or element that plays a vital role in executing the safety function that:
► Averts a safety function from working when it is required (demand mode) or origins a safety function to fail (continuous mode) such that end-user inclines to be potentially hazardous state or ► Reduces the probability of the safety function operation [11].
Fault Classification
Design faults outcome mostly from human error in the growth process or maintenance. The possibility of the initiation of a fault in design is typically usage dependent and time-independent. By the increasing intricacy of hardware systems, design faults become an important issue for hardware reliability measurement. Defects in the hardware or software component of a system, in the hardware technology and bug fault in the software, are quite inevitable and are due to many reasons - Software systems with a significant number of states and activities, algorithms, complex formulas, and patron are often unclear in the needs and size of software and the number of people involved. Timing coordination, performance factor, recovery, hardware,
[12].
Application of Software Reliability Growth Models (SRGM)
Software Reliability Growth Models are based upon the assumption that the reliability of a program is a function of the number of faults that it contains. Such models apply statistical techniques to the determined failures during software testing and operation to predict a product's reliability. To be effective, the data utilized in the growth model is taken from where the software will be deployed. It is essential to apply the outcomes of software reliability assessment to management problems on software projects for achieving higher productivity and quality. Applying the software reliability growth models to the pragmatic software error data, the critical software reliability measures such as the number of errors lasting in the system and the software reliability function could be estimated. Software reliability analysis based on the software reliability growth model was described by non-homogeneous Poisson processes and considered an implemented software system. Software failure is an untimely departure of program operation caused by a software error remaining in order. The following usual conventions in the area of software reliability growth modeling are introduced. ► A software system is related to software failures at random times caused by software errors. ► Every time a software failure happens, the software error produced is proximately removed, and no new errors are introduced. Mobile software applications are generally a web-based system, and several web software reliability models are employed to check the time reliability and software reliability. Furthermore, there existed various techniques to characterize the workload in software [13]. However, the workload measures cannot directly apply to web software reliability study due to an environment of the complex web. Techniques utilized to estimate web software reliability depend on failure information taken from server records in addition to workload [14].
A. Optimal Software Release Problem
When the testing of the software‘s duration is extended, several software faults in a system are eliminated thereby increasing reliability. Though, it hints to raise the cost of the testing in addition to software delivery delay. In contrast, when the software testing span is short, a software system with low reliability is delivered, and it includes many software faults that have not been removed in the testing phase. Thus, the cost of maintenance at the time of the operation phase increases. So, it is essential
time. Such a decision problem is mentioned as an optimal software release problem. These decision problems have been examined in the preceding decade by several researchers. Optimal software release problems study present value besides a warranty period (in the processing stage) meanwhile the designer has to pay for fixing the faults perceived. Formerly it is needed for software growth managing to explain an optimal software testing time by incorporating the whole estimated testing cost in addition to the reliability necessity [15].
B. Maintenance Cost Model with Reliability Requirement
From the exponential software reliability development model, the function of software reliability can be described as the probability that a software failure doesn‘t consider throughout the time interval [T, T + x] subsequently the total testing time T, i.e., the release time. In reliable software development, the manager requires to spend in addition to controlling the testing resources by satisfying the reliability requirement instead of decreasing the cost [15].
CONCLUSION
We can conclude here that different kinds of feature selection and method-level metrics do not have a considerable effect on the performance of the algorithm, and the most important factor here is the type of algorithm itself; therefore, it is better to improve the algorithms to get better prediction results In this paper, we defined algorithms of fault prediction centered on various classifiers of machines and specific selection methods for apps. As the precision values are not accurate measures for success evaluation, 3 other indicators, Area under Curve, Probability of Detection, and Probability of Failures, were used that were historically not utilized together in other studies. Taking account of the high area under curve and probability of detection values along with the low probability of failure values as a successful benchmark, the random forest is best suited for large and small datasets.
REFERENCES
1. Menzies, T., Greenwald, J., & Frank, A. (2007). Data mining static code attributes to learn defect predictors. IEEE Transactions on Software Engineering, Vol. 33(1), pp. 2-13. 2. Kumar, A., and Zhang, D. (2007). Hand-geometry recognition using entropy-based discretization. IEEE Transactions on Information Forensics and Security, Vol. 2(2), pp. 181-187. In IEEE International Conference on Advances in Computing, Control, & Telecommunication Technologies, 2009. ACT'09. (pp. 837-839). 4. Singh, P., & Verma, S. (2014). An efficient software fault prediction model using cluster-based classification. International Journal of Information Systems, Vol. 7(3), pp. 35-41. 5. A. Antony, G. Singh, AE Fernando, and EJLeavline, (2016). Software Fault Detection using Honey Bee Optimization. International Journal of Applied Information Systems, Vol. 11(1), pp. 1–9. 6. Akbar, M. S., & Rochimah, S. (2016). Prediksi Cacat Perangkat Lunak Dengan Optimasi Naive Bayes Menggunakan Pemilihan Fitur Gain Ratio. Jurnal Sistem dan Informatika (JSI), Vol. 11(1), pp. 147-155. 7. Novakovic, J. (2010). The impact of feature selection on the accuracy of naïve Bayes classifier. In 18th Telecommunications forum TELFOR (Vol. 2, pp. 1113-1116). 8. Rakesh Rana, Miroslaw Staron, Christian Berger, Jorgen Hansson, Martin Nilsson & Wilhelm Meding. (2016). Analyzing Defect Inflow Distribution and Applying Bayesian Inference Method for Software Defect Prediction in Large Software Projects. Journal of Systems & Software, Vol. 117, pp. 229-244. 9. Erturk, E., & Sezer, E. A. (2015). A comparison of some soft computing methods for software fault prediction. Expert systems with applications, Vol. 42(4), pp. 1872-1879. 10. M.H. Yaacob & W.A. Adnan. (1994). An integrated neural-fuzzy system of software reliability prediction. In Proceeding of the First International Conference on Software Testing, Reliability and Quality Assurance, New Delhi, India. 11. J. H. Park, J. Y. Park and S. U. Lee. (1999). Neural network modeling for software reliability prediction from failure time data. Journal of Electrical Engineering and Information Science, Vol. 4(4), pp. 533-538. 12. Kaveh Sheibani. (2006) Fuzzy Greedy Search and Job-Shop Problem. Networks. Technical Report No. 5. 14. Sona Taheri, Musa Mammadov. (2013). Learning the naive Bayes classifier with optimization models. International Journal of Applied Mathematics and Computer Science, Vol. 23(4), pp. 787–795. 15. Zachary C. Lipton, John Berkowitz, Charles Elkan. (2015) A Critical Review of Recurrent Neural Networks for Sequence Learning. arXiv preprint arXiv:1506.00019.
Corresponding Author Neeta Tewari*
Research Scholar, Mewar University, Chittorgarh, Rajasthan