The Real World of Software Testing

September 17, 2003

Manual or Automated?
Summary:Automated test tools are powerful aids to improving the return on the testing investment when used wisely. Some tests inherently require an automated approach to be effective, but others must be manual. In addition, automated testing projects that fail are expensive and politically dangerous. How can we recognize whether to automate a test or run it manually, and how much money should we spend on a test?

When Test Automation Makes Sense

Let’s start with the tests that ideally are automated. These include:

Regression and confirmation. Rerunning a test against a new release to ensure that behavior remains unbroken—or to confirm that a bug fix did indeed fix the underlying problem—is a perfect fit for automated testing. The business case for test automation outlined in Software Test Automation by Mark Fewster and Dorothy Graham is built around this kind of testing.
Monkey (or random). Tests that fire large amounts or long sequences of data, transactions, or other inputs at a system in a random search for errors are easily and profitably automated
Load, volume, and capacity. Sometimes, systems must support tremendous loads. On one project, we had to test how the system would respond to 50,000 simultaneous users, which ruled out manual testing! Two Linux systems running custom load-generating programs filled the bill.
Performance and reliability. With the rise of Web-based systems, more and more automated testing is aimed at looking for slow or flaky behavior on Web systems.
Structural, especially API-based unit, component, and integration. Most structural testing involves harnesses of some sort, which brings you most of the way into automation. Again, the article I wrote with Greg Kubaczkowski, "Mission Made Possible" (STQE magazine, July/Aug. 2002), provides an example.

Other tests that are well-suited for automation exist, such as the static testing of complexity and code standards compliance that I mentioned in the previous article. In general, automated tests have higher upfront costs—tools, test development, environments, and so forth—and lower costs to repeat the test.

When to Focus on Manual Testing

High per-test or maintenance costs are one indicator that a test should be done manually. Another is the need for human judgment to assess the correctness of the result or extensive, ongoing human intervention to keep the test running. For these reasons, the following tests are a good fit for manual testing:
Installation, setup, operations, and maintenance. In many cases, these tests involve loading CD-ROMs and tapes, changing hardware, and other ongoing hand-holding by the tester.
Configuration and compatibility. Like operations and maintenance testing, these tests require reconfiguring systems and networks, installing software and hardware, and so forth, all requiring human intervention.
Error handling and recovery. Again, the need to force errors—by powering off a server, for example—means that people must stay engaged during test execution.
Localization. Only a human tester with appropriate skills can decide whether a translation makes no sense, is culturally offensive, or is otherwise inappropriate. (Currency, date, and time testing can be automated, but the need to rerun these tests for regression is limited.)
Usability. As with localization, human judgment is needed to check for problems with the facility, simplicity, and elegance of the user interface and workflows.
Documentation and help. Like usability and localization, checking documentation requires human judgment.

Wildcards

In some cases, tests can be done manually, be automated, or both.

Functional. Functionality testing can often be automated, and automated functional testing is often part of an effort to create a regression test suite or smoke test. However, it makes sense to get the testing process under control manually before trying to automate functional testing. In addition, you’ll want to keep some of the testing manual.
Use cases (user scenarios). By stringing together functional tests into workflows, you can create realistic user scenarios, whether manual or automated. The trick here is to avoid automation if many workflows involve human intervention.
User interface. Basic testing of the user interface can be automated, but beware of frequent or extensive changes to the user interface that can incur high maintenance costs for your automated suite.
Date and time handling. If the test system can reset the computer’s clocks automatically, then you can automate these tests.

Higher per-test costs and needs for human skills, judgment, and interaction push towards manual testing. A need to repeat tests many times or reduce the cycle time for test execution pushes towards automated testing.

Reasons to Be Careful with Automation

Automated testing is a huge investment, one of the biggest that organizations make in testing. Tool licenses can easily hit six or seven figures. Neophytes can’t use most of these tools—regardless of what any glossy test tool brochure says—so training, consulting, and expert contractors can cost more than the tools themselves. Then there’s maintenance of the test scripts, which generally is more difficult and time consuming than maintaining manual test cases.

- posted by Siva Rama Krishna @ 9/17/2003 09:47:00 AM

September 16, 2003

Investing in Software Testing
What Does Quality Cost?

The title of Phil Crosby book says it all: Quality Is Free. Why is quality free? Like Crosby and J.M. Juran, Jim Campenella also illustrates a technique for analyzing the costs of quality in Principles of Quality Costs. Campenella breaks down those costs as follows:

Cost of Quality = Cost of conformance + Cost of nonconformance

Conformance Costs include Prevention Costs and Appraisal Costs.
Prevention costs include money spent on quality assurance tasks like training, requirements and code reviews, and other activities that promote good software. Appraisal costs include money spent on planning test activities, developing test cases and data, and executing those test cases once.

Nonconformance costs come in two flavors: Internal Failures and External Failures. The costs of internal failure include all expenses that arise when test cases fail the first time they are run, as they often do. A programmer incurs a cost of internal failure while debugging problems found during her own unit and component testing.

Once we get into formal testing in an independent test team, the costs of internal failure increase. Think through the process: The tester researches and reports the failure, the programmer finds and fixes the fault, the release engineer produces a new release, the system administration team installs that release in the test environment, and the tester retests the new release to confirm the fix and to check for regression.

The costs of external failure are those incurred when, rather than a tester finding a bug, the customer does. These costs will be even higher than those associated with either kind of internal failure, programmer-found or tester-found. In these cases, not only does the same process described for tester-found bugs occur, but you also incur the technical support overhead and the more expensive process of releasing a fix to the field rather than to the test lab. In addition, consider the intangible costs: angry customers, damage to the company image, lost business, and maybe even lawsuits.

Two observations lay the foundation for the enlightened view of testing as an investment. First, like any cost equation in business, we will want to minimize the cost of quality. Second, while it is often cheaper to prevent problems than to repair them, if we must repair problems, internal failures cost less than external failures.

The Risks to System Quality

Myriad risks - i.e., factors possibly leading to loss or injury menace software development. When these risks become realities, some projects fail. Wise project managers plan for and manage risks. In any software development project, we can group risks into four categories.
Financial risks: How might the project overrun the budget?
Schedule risks: How might the project exceed the allotted time?
Feature risks: How might we build the wrong product?
Quality risks: How might the product lack customer-satisfying behaviors or possess customer-dissatisfying behaviors?

Testing allows us to assess the system against the various risks to system quality, which allows the project team to manage and balance quality risks against the other three areas.

Classes of Quality Risks
It's important for test professionals to remember that many kinds of quality risks exist. The most obvious is functionality: Does the software provide all the intended capabilities? For example, a word processing program that does not support adding new text in an existing document is worthless.
While functionality is important, remember my self-deprecating anecdote in the last article. In that example, my test team and I focused entirely on functionality to the exclusion of important items like installation. In general, it's easy to over-emphasize a single quality risk and misalign the testing effort with customer usage. Consider the following examples of other classes of quality risks.

Use cases: working features fail when used in realistic sequences.

Robustness: common errors are handled improperly.

Performance: the system functions properly, but too slowly.

Localization: problems with supported languages, time zones, currencies, etc.

Data quality: a database becomes corrupted or accepts improper data.

Usability: the software's interface is cumbersome or inexplicable.

Volume/capacity: at peak or sustained loads, the system fails.

Reliability: too often -- especially at peak loads -- the system crashes, hangs, kills sessions, and so forth.

Tailoring Testing to Quality Risk Priority

To provide maximum return on the testing investment, we have to adjust the amount of time, resources, and attention we pay to each risk based on its priority. The priority of a risk to system quality arises from the extent to which that risk can and might affect the customers’ and users’ experiences of quality. In other words, the more likely a problem or the more serious the impact of a problem, the more testing that problem area deserves.

You can prioritize in a number of ways. One approach I like is to use a descending scale from one (most risky) to five (least risky) along three dimensions.

Severity: How dangerous is a failure of the system in this area?
Priority: How much does a failure of the system in this area compromise the value of the product to customers and users?
Likelihood: What are the odds that a user will encounter a failure in this area, either due to usage profiles or the technical risk of the problem?

Many such scales exist and can be used to quantify levels of quality risk.

Analyzing Quality Risks

A slightly more formal approach is the one described in the International Standards Organization document ISO 9126. This standard proposes that the quality of a software system can be measured along six major characteristics:

Functionality: Does the system provide the required capabilities?
Reliability: Does the system work as needed when needed?
Usability: Is the system intuitive, comprehensible, and handy to the users?
Efficiency: Is the system sparing in its use of resources?
Maintainability: Can operators, programmers, and customers upgrade the system as needed?
Performance: Does the system fulfill the users’ requests speedily?

Not every quality risk can be a high priority. When discussing risks to system quality, I don’t ask people, "Do you want us to make sure this area works?" In the absence of tradeoffs, everyone wants better quality. Setting the standard for quality higher requires more money spent on testing, pushes out the release date, and can distract from more important priorities—like focusing the team on the next release. To determine the real priority of a potential problem, ask people, "How much money, time, and attention would you be willing to give to problems in this area? Would you pay for an extra tester to look for bugs in this area, and would you delay shipping the product if that tester succeeded in finding bugs?" While achieving better quality generates a positive return on investment in the long run, as with the stock market, you get a better return on investment where the risk is higher. Happily, unlike the stock market, the risk of your test effort failing does not increase when you take on the most important risks to system quality, but rather your chances of test success increase.

- posted by Siva Rama Krishna @ 9/16/2003 04:45:00 PM

XP Testing Without XP: Taking Advantage of Agile Testing Practices
Extreme Programming is a discipline of software development based on values of simplicity, communication, feedback, and courage. It works by bringing the whole team together in the presence of simple practices, with enough feedback to enable the team to see where they are and to tune the practices to their unique situation. (www.xprogramming.com)

How XP Testing is Different
XP testing was different in many ways from ‘traditional’ testing. The biggest difference is that on an XP project, the entire development team takes responsibility for quality. This means the whole team is responsible for all testing tasks, including acceptance test automation. When testers and programmers work together, the approaches to test automation can be pretty creative!

As Ron Jeffries says, XP isn’t about ‘roles’, it’s about a tight integration of skills and behaviors. Testing is an integrated activity on an XP team. The development team needs continual feedback, with the customer expressing their needs in terms of tests, and programmers expressing design and code in terms of tests. On an XP team, the tester will play both the customer and programmer ‘roles’. She’ll focus on acceptance testing and work to transfer her testing and quality assurance skills to the rest of the team.

XP Tester Activities

Here are some activities testers perform on XP teams.

Negotiate quality with the customer (it’s not YOUR standard of quality, it’s what the customer desires and is willing to pay for!)

Clarify stories, flush out hidden assumptions

Enable accurate estimates for both programming and testing tasks

Make sure the acceptance tests verify the quality specified by the customer

Help the team automate tests

Help the team produce testable code

Form an integral part of the continuous feedback loop that keeps the team on track.

The Nature of XP Testing

The biggest difference between XP projects and most ‘traditional’ software development projects is the concept of test-driven development. With XP, every chunk of code is covered by unit tests, which must all pass all the time. The absence of unit-level and regression bugs means that testers actually get to focus on their job: making sure the code does what the customer wanted. The acceptance tests define the level of quality the customer has specified (and paid for!)

Testers who are new to XP should keep in mind the XP values: communication, simplicity, feedback and courage. Courage may be the most important. As a tester, the idea of writing, automating and executing tests in speedy two or three week iterations, without the benefit of traditional requirements documents, can be daunting.

Testers need courage to let the customers make mistakes and learn from them. They need courage to determine the minimum testing that will prove the successful completion of a story. They need courage to ask their teammates to pair for test automation. They need courage to remind the team that we are all responsible for quality and testing. To bolster this courage, testers on XP teams should remind themselves that an XP tester is never alone – your team is always there to help you!

For more information on XP Testing, Please click here "XP Testing Without XP: Taking Advantage of Agile Testing Practices"

- posted by Siva Rama Krishna @ 9/16/2003 10:28:00 AM

September 15, 2003

FAQ - CSTE - Set 1
Will automated testing tools make testing easier?
Possibly. For small projects, the time needed to learn and implement them may not be worth it. For larger projects, or on-going long-term projects they can be valuable

What makes a good test engineer?

'Test to break' attitude

An ability to take the point of view of the customer

A strong desire for quality

An attention to detail

Tact and diplomacy

An ability to communicate with both technical and non-technical people

Understanding of S/W development Process

What is a 'test case'?
A test case is a document that describes an input, action, or event and an expected response, to determine if a feature of an application is working correctly. A test case should contain particulars such as test case identifier, test case name, objective, test conditions/setup, input data requirements, steps, and expected results.

What should be done after a bug is found?
The bug needs to be communicated and assigned to developers that can fix it. After the problem is resolved, fixes should be re-tested, and determinations made regarding requirements for regression testing to check that fixes did not create problems elsewhere. If a problem-tracking system is in place, it should encapsulate these processes.

What is 'configuration management'?
Configuration management covers the processes used to control, coordinate, and track: code, requirements, documentation, problems, change requests, designs, tools/compilers/libraries/patches, changes made to them, and who makes the changes

What if the software is so buggy it can't really be tested at all?
The best bet in this situation is for the testers to go through the process of reporting whatever bugs or blocking-type problems initially show up, with the focus being on critical bugs. Since this type of problem can severely affect schedules, and indicates deeper problems in the software development process (such as insufficient unit testing or insufficient integration testing, poor design, improper build or release procedures, etc.) managers should be notified, and provided with some documentation as evidence of the problem.

How can it be known when to stop testing?
This can be difficult to determine. Many modern software applications are so complex, and run in such an interdependent environment, that complete testing can never be done. Common factors in deciding when to stop are:

Deadlines (release deadlines, testing deadlines, etc.)

Test cases completed with certain percentage passed

Test budget depleted

Coverage of code/functionality/requirements reaches a specified point

Bug rate falls below a certain level

Beta or alpha testing period ends

What if there isn't enough time for thorough testing?
Use risk analysis to determine where testing should be focused.
Since it's rarely possible to test every possible aspect of an application, every possible combination of events, every dependency, or everything that could go wrong, risk analysis is appropriate to most software development projects. This requires judgement skills, common sense, and experience. (If warranted, formal methods are also available.) Considerations can include:

Which functionality is most important to the project's intended purpose?

Which functionality is most visible to the user?

Which functionality has the largest safety impact?

Which functionality has the largest financial impact on users?

Which aspects of the application are most important to the customer?

Which aspects of the application can be tested early in the development cycle?

Which parts of the code are most complex, and thus most subject to errors?

Which parts of the application were developed in rush or panic mode?

Which aspects of similar/related previous projects caused problems?

Which aspects of similar/related previous projects had large maintenance expenses?

Which parts of the requirements and design are unclear or poorly thought out?

What do the developers think are the highest-risk aspects of the application?

What kinds of problems would cause the worst publicity?

What kinds of problems would cause the most customer service complaints?

What kinds of tests could easily cover multiple functionalities?

Which tests will have the best high-risk-coverage to time-required ratio?

What can be done if requirements are changing continuously?
A common problem and a major headache.

Work with the project's stakeholders early on to understand how requirements might change so that alternate test plans and strategies can be worked out in advance, if possible.

It's helpful if the application's initial design allows for some adaptability so that later changes do not require redoing the application from scratch.

If the code is well-commented and well-documented this makes changes easier for the developers.

Use rapid prototyping whenever possible to help customers feel sure of their requirements and minimize changes.

The project's initial schedule should allow for some extra time commensurate with the possibility of changes.

Try to move new requirements to a 'Phase 2' version of an application, while using the original requirements for the 'Phase 1' version.

Negotiate to allow only easily-implemented new requirements into the project, while moving more difficult new requirements into future versions of the application.

Be sure that customers and management understand the scheduling impacts, inherent risks, and costs of significant requirements changes. Then let management or the customers (not the developers or testers) decide if the changes are warranted - after all, that's their job.

Balance the effort put into setting up automated testing with the expected effort required to re-do them to deal with changes.

Try to design some flexibility into automated test scripts.

Focus initial automated testing on application aspects that are most likely to remain unchanged.

Devote appropriate effort to risk analysis of changes to minimize regression testing needs.

Design some flexibility into test cases (this is not easily done; the best bet might be to minimize the detail in the test cases, or set up only higher-level generic-type test plans)

Focus less on detailed test plans and test cases and more on ad hoc testing (with an understanding of the added risk that this entails).

What if the application has functionality that wasn't in the requirements?
It may take serious effort to determine if an application has significant unexpected or hidden functionality, and it would indicate deeper problems in the software development process. If the functionality isn't necessary to the purpose of the application, it should be removed, as it may have unknown impacts or dependencies that were not taken into account by the designer or the customer. If not removed, design information will be needed to determine added testing needs or regression testing needs. Management should be made aware of any significant added risks as a result of the unexpected functionality. If the functionality only effects areas such as minor improvements in the user interface, for example, it may not be a significant risk.

How can Software QA processes be implemented without stifling productivity?
By implementing QA processes slowly over time, using consensus to reach agreement on processes, and adjusting and experimenting as an organization grows and matures, productivity will be improved instead of stifled. Problem prevention will lessen the need for problem detection, panics and burn-out will decrease, and there will be improved focus and less wasted effort. At the same time, attempts should be made to keep processes simple and efficient, minimize paperwork, promote computer-based processes and automated tracking and reporting, minimize time required in meetings, and promote training as part of the QA process. However, no one - especially talented technical types - likes rules or bureacracy, and in the short run things may slow down a bit. A typical scenario would be that more days of planning and development will be needed, but less time will be required for late-night bug-fixing and calming of irate customers.

How does a client/server environment affect testing?
Client/server applications can be quite complex due to the multiple dependencies among clients, data communications, hardware, and servers. Thus testing requirements can be extensive. When time is limited (as it usually is) the focus should be on integration and system testing. Additionally, load/stress/performance testing may be useful in determining client/server application limitations and capabilities. There are commercial tools to assist with such testing

How can World Wide Web sites be tested?
Web sites are essentially client/server applications - with web servers and 'browser' clients. Consideration should be given to the interactions between html pages, TCP/IP communications, Internet connections, firewalls, applications that run in web pages (such as applets, javascript, plug-in applications), and applications that run on the server side (such as cgi scripts, database interfaces, logging applications, dynamic page generators, asp, etc.). Additionally, there are a wide variety of servers and browsers, various versions of each, small but sometimes significant differences between them, variations in connection speeds, rapidly changing technologies, and multiple standards and protocols. The end result is that testing for web sites can become a major ongoing effort. Other considerations might include:

What are the expected loads on the server (e.g., number of hits per unit time?), and what kind of performance is required under such loads (such as web server response time, database query response times). What kinds of tools will be needed for performance testing (such as web load testing tools, other tools already in house that can be adapted, web robot downloading tools, etc.)?

Who is the target audience? What kind of browsers will they be using? What kind of connection speeds will they by using? Are they intra- organization (thus with likely high connection speeds and similar browsers) or Internet-wide (thus with a wide variety of connection speeds and browser types)?

What kind of performance is expected on the client side (e.g., how fast should pages appear, how fast should animations, applets, etc. load and run)?

Will down time for server and content maintenance/upgrades be allowed? how much?

What kinds of security (firewalls, encryptions, passwords, etc.) will be required and what is it expected to do? How can it be tested?

How reliable are the site's Internet connections required to be? And how does that affect backup system or redundant connection requirements and testing?

What processes will be required to manage updates to the web site's content, and what are the requirements for maintaining, tracking, and controlling page content, graphics, links, etc.?

Which HTML specification will be adhered to? How strictly? What variations will be allowed for targeted browsers?

Will there be any standards or requirements for page appearance and/or graphics throughout a site or parts of a site??

How will internal and external links be validated and updated? how often?

Can testing be done on the production system, or will a separate test system be required? How are browser caching, variations in browser option settings, dial-up connection variabilities, and real-world internet 'traffic congestion' problems to be accounted for in testing?

How extensive or customized are the server logging and reporting requirements; are they considered an integral part of the system and do they require testing?

How are cgi programs, applets, javascripts, ActiveX components, etc. to be maintained, tracked, controlled, and tested?

How is testing affected by object-oriented designs?
Well-engineered object-oriented design can make it easier to trace from code to internal design to functional design to requirements. While there will be little affect on black box testing (where an understanding of the internal design of the application is unnecessary), white-box testing can be oriented to the application's objects. If the application was well-designed this can simplify test design.

What is Extreme Programming and what's it got to do with testing?
Extreme Programming (XP) is a software development approach for small teams on risk-prone projects with unstable requirements. It was created by Kent Beck who described the approach in his book 'Extreme Programming Explained'. Testing ('extreme testing') is a core aspect of Extreme Programming. Programmers are expected to write unit and functional test code first - before the application is developed. Test code is under source control along with the rest of the code. Customers are expected to be an integral part of the project team and to help develope scenarios for acceptance/black box testing. Acceptance tests are preferably automated, and are modified and rerun for each of the frequent development iterations. QA and test personnel are also required to be an integral part of the project team. Detailed requirements documentation is not used, and frequent re-scheduling, re-estimating, and re-prioritizing is expected.

- posted by Siva Rama Krishna @ 9/15/2003 02:09:00 PM