
Monday, July 29, 2013

Metrics in Automation


                                                            AUTOMATION METRICS

 



“When you can measure what you are speaking about, and can express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind.”

-- Lord Kelvin, a physicist.

 

 

As part of a successful automated testing program, it is important that goals and strategies are defined and then implemented. During implementation, progress against the goals and strategies set out at the onset of the program needs to be continuously tracked and measured. This article discusses various types of automated and general testing metrics that can be used to measure and track that progress.

Based on the outcome of these metrics, the defects remaining to be fixed in a testing cycle can be assessed, schedules can be adjusted, or goals can be reduced. For example, if a feature still has too many high-priority defects, a decision can be made to move the ship date or to ship (or even go live) without that specific feature.

Success is measured based on the goal we set out to accomplish relative to the expectations of our stakeholders and customers.

If you can measure something, then you have something you can quantify. If you can quantify something, then you can explain it in more detail and know something more about it. If you can explain it, then you have a better chance of improving it, and so on.

Metrics can provide insight into the status of automated testing efforts.

Automation efforts can provide a larger test coverage area and increase the overall quality of a product. Automation can also reduce the time of testing and the cost of delivery. This benefit is typically realized over multiple test cycles and project cycles. Automated testing metrics can aid in making assessments as to whether progress, productivity, and quality goals are being met.

What is a Metric?

The basic definition of a metric is a standard of measurement. It can also be described as a system of related measures that facilitates the quantification of some particular characteristic. For our purposes, a metric can be looked at as a measure that can be utilized to display past and present performance and/or to predict future performance.

What Are Automated Testing Metrics?

Automated testing metrics are metrics used to measure the performance (e.g. past, present, future) of the implemented automated testing process.

What Makes A Good Automated Testing Metric?

As with any metrics, automated testing metrics should be tied to clearly defined goals for the automation effort. It serves no purpose to measure something for the sake of measuring. To be meaningful, a metric should directly relate to the performance of the effort.

Prior to defining the automated testing metrics, there are metrics-setting fundamentals you may want to review. Before measuring anything, set goals. What is it you are trying to accomplish? Goals are important; if you do not have goals, what is it that you are measuring? It is also important to continuously track and measure on an ongoing basis. Based on the metrics' outcomes, you can then decide whether deadlines, feature lists, process strategies, and so on need to be adjusted. As a step toward goal setting, there may be questions that need to be asked about the current state of affairs. Decide what questions can be asked to determine whether or not you are tracking toward the defined goals. For example:

                        How much time does it take to run the test plan?

                        How is test coverage defined (KLOC, FP, etc)?

                        How much time does it take to do data analysis?

                        How long does it take to build a scenario/driver?

                        How often do we run the test(s) selected?

                        How many permutations of the test(s) selected do we run?

                        How many people do we require to run the test(s) selected?

                        How much system time/lab time is required to run the test(s) selected?

etc.

In essence, a good automated testing metric has the following characteristics:

                        is Objective

                        is Measurable

                        is Meaningful

                        has data that is easily gathered

                        can help identify areas of test automation improvement

                        is Simple

 

A good metric is clear and not subjective; it is able to be measured; it has meaning to the project; it does not take enormous effort or resources to obtain the data behind it; and it is simple to understand. A few more words about metrics being simple: Albert Einstein once said,



“Make everything as simple as possible, but not simpler.”

When applying this wisdom towards software testing, you will see that:

                        Simple reduces errors

                        Simple is more effective

                        Simple is elegant

                        Simple brings focus

 

Percent Automatable

At the beginning of an automated testing effort, the project is either automating existing manual test procedures, starting a new automation effort from scratch, or some combination of both. Whichever the case, a percent automatable metric can be determined.

Percent automatable can be defined as: of a set of given test cases, how many are automatable? This could be represented in the following equation:

PA (%) = ATC / TC = (# of test cases automatable) / (# of total test cases)

PA = Percent Automatable

ATC = # of test cases automatable

TC = # of total test cases
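As a quick illustration (the counts are hypothetical and not from the original article), a minimal Python sketch of this calculation could look like the following:

```python
def percent_automatable(automatable_test_cases: int, total_test_cases: int) -> float:
    """PA (%) = ATC / TC, expressed as a percentage."""
    if total_test_cases == 0:
        raise ValueError("total_test_cases must be greater than zero")
    return 100.0 * automatable_test_cases / total_test_cases

# Hypothetical example: 360 of 450 planned test cases are judged automatable.
print(f"Percent automatable: {percent_automatable(360, 450):.1f}%")  # prints 80.0%
```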

In evaluating the test cases to be developed, what is to be considered automatable and what is not? Given enough ingenuity and resources, one can argue that almost anything can be automated. So where do you draw the line? An example of something that could be considered ‘not automatable’ is an application area that is still under design, is not very stable, and is largely in flux. In cases such as this, we should:

“evaluate whether it makes sense to automate”

 

We would evaluate, for example, given the set of automatable test cases, which ones would provide the biggest return on investment:

“just because a test is automatable doesn’t necessarily mean it should be automated”

When going through the test case development process, determine which tests can be automated AND make sense to automate. Prioritize your automation effort based on the outcome. This metric can be used to summarize, for example, the percent automatable of various projects or components within a project, and to set the automation goal.

 

Automation Progress

Automation Progress refers to how many of the automatable test cases have been automated at a given point in time. Basically, how well are you doing against the goal of automated testing? The goal is to automate 100% of the “automatable” test cases. This metric is useful to track during the various stages of automated testing development.

AP (%) = AA / ATC = (# of test cases actually automated) / (# of test cases automatable)

AP = Automation Progress

AA = # of actual test cases automated

ATC = # of test cases automatable

The Automation Progress metric is typically tracked over time, for example, by week.
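As an illustration only, the sketch below (plain Python; the weekly counts and the ATC value are invented for the example) tracks AP by week:

```python
def automation_progress(automated: int, automatable: int) -> float:
    """AP (%) = AA / ATC, expressed as a percentage."""
    if automatable == 0:
        raise ValueError("automatable must be greater than zero")
    return 100.0 * automated / automatable

AUTOMATABLE = 360  # assumed number of automatable test cases (ATC)

# Hypothetical weekly snapshots of the number of test cases actually automated (AA).
for week, automated in enumerate([40, 95, 170, 240, 310, 355], start=1):
    print(f"Week {week}: AP = {automation_progress(automated, AUTOMATABLE):.0f}%")
```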

A common metric closely associated with progress of automation, yet not exclusive to automation is Test Progress. Test progress can simply be defined as the number of test cases attempted (or completed) over time.

TP = TC / T = (# of test cases attempted or completed) / (unit of time)

TP = Test Progress

TC = # of test cases (either attempted or completed)

T = some unit of time (days / weeks / months, etc)

The purpose of this metric is to track test progress and compare it against the plan, showing where testing stands relative to the overall project plan. Test progress over the course of a project usually follows an “S” shape, which mirrors the testing activity during the project lifecycle: little initial testing, followed by an increasing amount of testing through the various development phases and into quality assurance, prior to release or delivery.

This is a metric to show progress over time. A more detailed analysis is needed to determine pass/fail, which can be represented in other metrics.
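A small sketch with made-up numbers shows how a cumulative count of attempted test cases can be reduced to an overall Test Progress figure and a week-over-week rate (the changing rate is what produces the typical “S” shape):

```python
# Hypothetical cumulative counts of test cases attempted at the end of each week.
attempted_by_week = [5, 15, 40, 90, 160, 230, 280, 305, 315]

weeks = len(attempted_by_week)
print(f"Overall TP: {attempted_by_week[-1] / weeks:.1f} test cases per week")

# Week-over-week rate: slow start, steep middle, flattening near release.
for week in range(1, weeks):
    delta = attempted_by_week[week] - attempted_by_week[week - 1]
    print(f"Week {week + 1}: +{delta} test cases attempted")
```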

Percent of Automated Testing Test Coverage

Another automated software testing metric we want to consider is Percent of Automated Testing Test Coverage. That is a long title for a metric that determines what test coverage the automated testing is actually achieving. It indicates the completeness of the testing. This metric measures not so much how much automation is being executed, but rather how much of the product’s functionality is being covered. For example, 2,000 test cases executing the same or similar data paths may take a lot of time and effort to execute, yet they do not equate to a large percentage of test coverage. Percent of automated testing coverage does not specify anything about the effectiveness of the testing taking place; it is a metric that measures its dimension.

PTC (%) = AC / C = (automation coverage) / (total coverage)

PTC = Percent of Automatable testing coverage

AC = Automation coverage

C = Total Coverage (KLOC, FP, etc)

The size of a system is usually counted in lines of code (KLOC) or function points (FP). KLOC is a common method of sizing a system; however, FP has also gained acceptance. Some argue that FPs can be used to size software applications more accurately. Function Point Analysis was developed in an attempt to overcome difficulties associated with KLOC (or just LOC) sizing. Function Points measure software size by quantifying the functionality provided to the user, based on logical design and functional specifications. There is a wealth of material available regarding the sizing or coverage of systems. A useful resource is Stephen H. Kan’s book “Metrics and Models in Software Quality Engineering” (Addison-Wesley, 2003).
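As a hedged sketch (the KLOC figures are invented), PTC based on code size could be computed as follows:

```python
def percent_automated_test_coverage(automation_coverage_kloc: float,
                                    total_coverage_kloc: float) -> float:
    """PTC (%) = AC / C, with coverage sized here in KLOC (function points work the same way)."""
    if total_coverage_kloc <= 0:
        raise ValueError("total_coverage_kloc must be greater than zero")
    return 100.0 * automation_coverage_kloc / total_coverage_kloc

# Hypothetical sizing: automated tests exercise 120 KLOC of a 400 KLOC system.
print(f"PTC: {percent_automated_test_coverage(120, 400):.1f}%")  # prints 30.0%
```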

The Percent Automated Test Coverage metric can be used in conjunction with the standard software testing metric called Test Coverage.

TC (%) = TTP / TTR = (total # of test procedures developed) / (total # of defined test requirements)

TC = Percent of Testing Coverage

TTP = Total # of Test Procedures developed

TTR = Total # of defined Test Requirements

This measurement of test coverage divides the total number of test procedures developed by the total number of defined test requirements. It provides the test team with a barometer to gauge the depth of test coverage, which is usually based on the defined acceptance criteria. When testing a mission-critical system, such as an operational medical system, the test coverage indicator would need to be high relative to the depth of test coverage for non-mission-critical systems. The depth of test coverage for a commercial software product that will be used by millions of end users may also be high relative to a government information system with a couple of hundred end users.
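A brief sketch of this calculation, checking the result against an acceptance threshold (the counts and the 95% target are assumptions, not values from the article):

```python
def testing_coverage(test_procedures: int, test_requirements: int) -> float:
    """TC (%) = TTP / TTR."""
    if test_requirements == 0:
        raise ValueError("test_requirements must be greater than zero")
    return 100.0 * test_procedures / test_requirements

# Hypothetical: 180 test procedures developed against 200 defined test requirements,
# with an assumed 95% target for a mission-critical system.
coverage = testing_coverage(180, 200)
target = 95.0
status = "meets" if coverage >= target else "falls below"
print(f"TC = {coverage:.0f}%, which {status} the {target:.0f}% target")
```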

Defect Density

Measuring defects is a discipline to be implemented regardless of whether the testing effort is automated or not. Josh Bloch, Chief Architect at Google, stated:



“Regardless of how talented and meticulous a developer is, bugs and security vulnerabilities will be found in any body of code – open source or commercial. Given this inevitability, it’s critical that all developers take the time and measures to find and fix these errors.”

Defect density is another well-known metric, not specific to automation. It is a measure of the total known defects divided by the size of the software entity being measured. For example, if there is a high defect density in a specific functionality, it is important to conduct a causal analysis. Is this functionality very complex, so that a high defect density is to be expected? Is there a problem with the design or implementation of the functionality? Were the wrong (or not enough) resources assigned to the functionality because an inaccurate risk had been assigned to it? It could also be inferred that the developer responsible for this specific functionality needs more training.

DD = D / SS = (# of known defects) / (total size of system)

DD = Defect Density

D = # of known defects

SS = Total Size of system

One use of defect density is to map it against software component size. In our experience, a typical defect density curve shows both small and larger sized components having a higher defect density ratio. Additionally, when evaluating defect density, the priority of the defects should be considered. For example, one application requirement may have as many as 50 low-priority defects and still pass because the acceptance criteria have been satisfied. Another requirement might have only one open defect, but one that prevents the acceptance criteria from being satisfied because it is high priority. Higher-priority requirements are generally weighted more heavily.
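To illustrate the component-level use of defect density, the sketch below (component names, defect counts, and sizes are hypothetical) computes DD per component in defects per KLOC:

```python
# Hypothetical components: (name, known defects, size in KLOC).
components = [
    ("login", 12, 3.0),
    ("reporting", 45, 60.0),
    ("billing", 150, 110.0),
]

for name, defects, size_kloc in components:
    density = defects / size_kloc  # DD = D / SS, expressed here as defects per KLOC
    print(f"{name}: {density:.1f} defects/KLOC")
```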

Another way to utilize the defect density metric is to track projects over time (for example, across the stages of the development cycle).

A metric closely related to Defect Density is Defect Trend Analysis, which is calculated as:

DTA = D / TPE = (# of known defects) / (# of test procedures executed)

DTA = Defect Trend Analysis

D = # of known Defects

TPE = # of Test Procedures Executed over time

Defect Trend Analysis can help determine the trend of defects found. Is the trend improving as the testing phase winds down, or is it worsening? Defects that the test automation uncovered which manual testing did not or could not have found are an additional way to demonstrate return on investment. During the testing process, we have found defect trend analysis to be one of the more useful metrics for showing the health of a project. One approach to showing the trend is to plot the total number of defects along with the number of open Software Problem Reports over time, an approach adapted from http://www.teknologika.com/blog/SoftwareDevelopmentMetricsDefectTracking.aspx.
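As an illustrative sketch with invented weekly figures, DTA can be computed per reporting period to see whether the trend is improving or worsening:

```python
# Hypothetical weekly totals: (known defects, test procedures executed to date).
weekly_results = [(30, 100), (55, 250), (70, 450), (78, 700)]

previous = None
for week, (defects, executed) in enumerate(weekly_results, start=1):
    dta = defects / executed  # DTA = D / TPE
    trend = ""
    if previous is not None:
        trend = "improving" if dta < previous else "worsening"
    print(f"Week {week}: DTA = {dta:.2f} {trend}".rstrip())
    previous = dta
```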


Effective defect tracking analysis can present a clear view of the status of testing throughout the project. A few additional common defect-related metrics are as follows:

 

                        Cost to locate defect = cost of testing / number of defects located

                        Defects detected in testing = defects detected in testing / total system defects

                        Defects detected in production = defects detected in production / system size
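A minimal sketch of these ratios, using assumed figures for cost, defect counts, and system size:

```python
testing_cost = 50_000.0     # assumed total cost of the testing effort
defects_in_testing = 200    # assumed defects detected during testing
defects_in_production = 25  # assumed defects detected in production
system_size_kloc = 400.0    # assumed system size in KLOC

total_defects = defects_in_testing + defects_in_production

print(f"Cost to locate defect: {testing_cost / defects_in_testing:.2f} per defect")
print(f"Defects detected in testing: {100.0 * defects_in_testing / total_defects:.1f}% of total defects")
print(f"Defects detected in production: {defects_in_production / system_size_kloc:.3f} per KLOC")
```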

 

Some of these metrics can be combined and used to enhance quality measurements as shown in the next section.

Actual Impact on Quality

One of the more popular metrics for tracking quality through testing (if defect count is used as a measure of quality) is Defect Removal Efficiency (DRE). It is not specific to automation, but it is very useful when used in conjunction with automation efforts. DRE is used to determine the effectiveness of your defect removal efforts, and it is also an indirect measurement of the quality of the product. The value of DRE is calculated as a percentage; the higher the percentage, the greater the positive impact on the quality of the product, because it represents the timely identification and removal of defects at any particular phase.

DRE (%) = DT / (DT + DA) = (# of defects found during testing) / (# of defects found during testing + # of defects found after delivery)

DRE = Defect Removal Efficiency

DT = # of defects found during testing

DA = # of defects found after delivery

The highest attainable value of DRE is “1”, which equates to 100%. In practice, we have found that an efficiency rating of 100% is not likely. DRE should be measured during the different development phases. If the DRE is low during analysis and design, it may indicate that more time should be spent improving the way formal technical reviews are conducted, and so on.

This calculation can be extended for released products as a measure of the number of defects in the product that were not caught during the product development or testing phase.
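A short sketch of the DRE calculation (the defect counts are hypothetical):

```python
def defect_removal_efficiency(found_in_testing: int, found_after_delivery: int) -> float:
    """DRE (%) = DT / (DT + DA), expressed as a percentage."""
    total = found_in_testing + found_after_delivery
    if total == 0:
        raise ValueError("at least one defect must have been recorded")
    return 100.0 * found_in_testing / total

# Hypothetical: 200 defects found during testing, 25 more reported after delivery.
print(f"DRE: {defect_removal_efficiency(200, 25):.1f}%")  # prints 88.9%
```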

Other Software Testing Metrics

Along with the metrics mentioned in the previous sections, here are a few more common test metrics. These metrics do not necessarily just apply to automation, but could be, and most often are, associated with software testing in general. These metrics are broken up into three categories:

                        Coverage: Meaningful parameters for measuring test scope and success.

                        Progress: Parameters that help identify test progress to be matched against success criteria. Progress metrics are collected iteratively over time. They can be used to graph the process itself (e.g., time to fix defects, time to test, etc.).

                        Quality: Meaningful measures of excellence, worth, value, etc. of the testing product. It is difficult to measure quality directly; however, measuring the effects of quality is easier and possible.

 

The metrics listed below are adapted from “Automated Software Testing” (Addison-Wesley, 1999) by Dustin et al.



Metric Name (Category): Description

Test Coverage (Coverage): Total number of test procedures / total number of test requirements. The Test Coverage metric indicates planned test coverage.

System Coverage Analysis (Coverage): Measures the amount of coverage at the system interface level.

Test Procedure Execution Status (Progress): Executed number of test procedures / total number of test procedures. Indicates the extent of the testing effort still outstanding.

Error Discovery Rate (Progress): Number of total defects found / number of test procedures executed. Uses the same calculation as the defect density metric; used to analyze and support a rational product release decision.

Defect Aging (Progress): Date the defect was opened versus the date the defect was fixed. Provides an indication of defect turnaround time.

Defect Fix Retest (Progress): Date the defect was fixed and released in a new build versus the date the defect was retested. Indicates whether the testing team is retesting fixes quickly enough to produce an accurate progress metric.

Current Quality Ratio (Quality): Number of test procedures successfully executed (without defects) versus the total number of test procedures. Provides an indication of the amount of functionality that has been successfully demonstrated.

Quality of Fixes (Quality): Number of total defects reopened / total number of defects fixed. Provides an indication of development issues. Also tracked as the ratio of previously working functionality versus new errors introduced, to keep track of how often previously working functionality was adversely affected by software fixes.

Problem Reports (Quality): Number of Software Problem Reports, broken down by priority. Counts the number of software problems reported, listed by priority.

Test Effectiveness (Quality): Test effectiveness needs to be assessed statistically to determine how well the test data has exposed the defects contained in the product.

Test Efficiency (Quality): Number of tests required / the number of system errors.