Testing

The Many Uses of Tests

Most developers know the fundamental function of tests: tests check that code works. But tests serve other purposes as well. Tests:

  • protect the code from future changes that unintentionally alter its behavior,

  • encourage clean code,

  • force developers to use their own APIs,

  • document how components are to be interacted with,

  • and serve as a playground for experimentation.

Types of Tests

These are the main types of tests:

  • Unit tests verify “units” of code: a single method or behavior.

  • Integration tests verify that multiple components work together.

  • System tests verify a whole system. End-to-end (e2e, for short) tests run complete workflows that simulate real user interactions in preproduction environments. Synthetic monitoring scripts run similar workflows in production, simulating user registration, browsing for and purchasing an item, and so on. Synthetic monitoring requires instrumentation that allows billing, accounting, and other systems to distinguish these production tests from real activity.

  • Performance tests, such as load and stress tests, measure system performance under different configurations. Load tests measure performance under various levels of load: for example, how a system performs when 10, 100, or 1,000 users access it concurrently. Stress tests push system load to the point of failure, exposing how far a system can go and what happens under excessive load; these tests are useful for capacity planning and defining SLOs.

  • Acceptance tests are performed by a customer, or their proxy, to validate that the delivered software meets agreed-upon acceptance criteria.

Note

Don’t get too wrapped up in getting it perfectly right. Successful projects make pragmatic, real-world testing decisions, and so should you. If you see an opportunity to improve the tests and test suites, by all means, do it! Don’t get hung up on naming and categorization, and refrain from passing judgment if the setup is not quite right; software entropy is a powerful force.

Test Tools

Test tools fall into several categories:

  • Test-writing tools like mocking libraries help you write clean and efficient tests.

  • Test frameworks help run tests by modeling a test’s lifecycle from setup to teardown.

  • Code quality tools are used to analyze code coverage and code complexity, find bugs through static analysis, and check for style errors. Analysis tools are usually set up to run as part of a build or compile step.

Every tool added to your setup comes with baggage. Everyone must understand the tool, along with all of its idiosyncrasies. The tool might depend on many other libraries, which will further increase the complexity of the system. Some tools slow tests down. Therefore, avoid outside tools until you can justify the complexity trade-offs, and make sure your team is bought in.

Mocking Libraries: Mocks replace external dependencies with stubs that mimic the interface provided by the real system. Mocks implement functionality required for the test by responding to inputs with hard-coded responses. An excessive reliance on mocks is a code smell that suggests tight code coupling.
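
As a sketch of the idea, assuming Python and the standard library’s unittest.mock (fetch_temperature and http_client are hypothetical names invented for this example):

    from unittest.mock import Mock

    def fetch_temperature(http_client, city):
        # In production, http_client.get would issue a real network request.
        response = http_client.get(f"/weather/{city}")
        return response["temp_c"]

    def test_fetch_temperature_parses_response():
        # The mock mimics the real client's interface and responds with a
        # hard-coded value, keeping the test fast and deterministic.
        http_client = Mock()
        http_client.get.return_value = {"temp_c": 21.5}

        assert fetch_temperature(http_client, "lisbon") == 21.5
        http_client.get.assert_called_once_with("/weather/lisbon")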

Test Frameworks: Frameworks do the following:

  • Manage test setup and teardown

  • Manage test execution and orchestration

  • Generate test result reports

  • Provide tooling such as extra assertion methods

  • Integrate with code coverage tools
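
A minimal sketch of that lifecycle, assuming Python’s built-in unittest framework (the scratch-file scenario is invented for illustration):

    import tempfile
    import unittest

    class ScratchFileTest(unittest.TestCase):
        def setUp(self):
            # The framework calls this before every test method.
            self.tmpdir = tempfile.TemporaryDirectory()

        def tearDown(self):
            # ...and this after every test method, pass or fail.
            self.tmpdir.cleanup()

        def test_write_and_read(self):
            path = f"{self.tmpdir.name}/data.txt"
            with open(path, "w") as f:
                f.write("hello")
            with open(path) as f:
                self.assertEqual(f.read(), "hello")

    if __name__ == "__main__":
        unittest.main()  # discovers tests, runs them, and reports results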

Code Quality Tools:

  • Static code analyzers look for common mistakes, like leaving file handles open or using unset variables.

  • Code style checkers ensure all source code is formatted the same way: max characters per line, camelCasing versus snake_casing, proper indentation, that sort of thing.

  • Code complexity tools guard against overly complex logic by calculating cyclomatic complexity: roughly, the number of independent paths through your code (see the sketch after this list).

  • Code coverage tools measure how many lines of code were exercised by the test suite.
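
To make cyclomatic complexity concrete, here is a hypothetical sketch: the function below has a complexity of 3 (the straight-line path plus one for each if branch), comfortably under the threshold of 10 that checkers commonly use.

    def shipping_cost(weight_kg, express):
        # Cyclomatic complexity 3: one base path plus two `if` branches.
        if weight_kg <= 0:
            raise ValueError("weight must be positive")
        cost = weight_kg * 2.0
        if express:
            cost *= 1.5
        return cost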

Writing Your Own Tests

You are responsible for making sure your team’s code works as expected. Write your own tests; don’t expect others to clean up after you.

Many companies have formal quality assurance (QA) teams with varying responsibilities, including the following:

  • Writing black-box or white-box tests

  • Writing performance tests

  • Performing integration, user acceptance, or system tests

  • Providing and maintaining test tools

  • Maintaining test environments and infrastructure

  • Defining formal test certification and release processes

QA teams don’t write unit tests anymore; those days are long gone.

Write Clean Tests:

  • Write tests with the same care that you write other code.

  • Hacky tests have a high maintenance cost, which slows down future development.

  • Hacky tests are also less stable and less likely to provide reliable results.

  • Avoid hard-coded values, and don’t duplicate code.

  • Use design best practices to maintain a separation of concerns and to keep tests cohesive and decoupled.

  • Focus on testing fundamental functionality rather than implementation details. Behavior-focused tests survive refactoring: as long as the observable behavior is unchanged, they keep passing.
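
A sketch of the difference, using a hypothetical Stack class: the test checks observable behavior, so it keeps passing even if the private backing store is later refactored into, say, a linked list.

    class Stack:
        def __init__(self):
            self._items = []  # private implementation detail

        def push(self, item):
            self._items.append(item)

        def pop(self):
            return self._items.pop()

    def test_pop_returns_last_pushed_item():
        stack = Stack()
        stack.push(1)
        stack.push(2)
        # Behavior-focused: asserts what pop() returns, not how items are
        # stored. A brittle alternative like `assert stack._items == [1, 2]`
        # would break under a behavior-preserving refactoring.
        assert stack.pop() == 2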

Don’t Overdo Testing:

  • Write tests that fail meaningfully.

  • Avoid chasing higher code coverage just to boost coverage metrics.

  • Testing thin database wrappers, third-party libraries, or basic variable assignments is worthless even if it boosts coverage metrics.

  • Focus on tests that have the largest effect on code risk; a risk matrix can help you prioritize. Target high-likelihood, high-impact areas of the code first. Low-risk or throwaway code, like a proof of concept, isn’t worth testing.

Determinism in Tests

Deterministic code always produces the same output for the same input. By contrast, nondeterministic code can return different results for the same inputs. A unit test that invokes a call to a remote web service over a network socket is nondeterministic: if the network or the remote service misbehaves, the test fails regardless of whether the code under test is correct.

Nondeterministic tests degrade test value. Intermittent test failures (known as flapping tests) are hard to reproduce and debug because they don’t happen every run, or even every tenth run.

Nondeterminism is often introduced by improper handling of sleep, timeouts, and random number generation. Tests that leave side effects or interact with remote systems also cause nondeterminism. Escape nondeterminism by making time and randomness deterministic, cleaning up after tests, and avoiding network calls.

Seed Random Number Generators: Random number generators (RNGs) are seeded with a value that dictates the sequence of numbers they produce. By default, RNGs often use the system clock as a seed. System clocks change over time, so two runs of a test that uses a clock-seeded RNG will yield different results: nondeterminism. Seed RNGs with a constant value in tests.
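
A sketch using Python’s standard random module: fixing the seed makes the “random” sequence reproducible run after run.

    import random

    def test_shuffle_is_reproducible():
        rng = random.Random(42)    # constant seed, not the system clock
        first = [1, 2, 3, 4, 5]
        rng.shuffle(first)

        rng = random.Random(42)    # reseed with the same constant
        second = [1, 2, 3, 4, 5]
        rng.shuffle(second)

        assert first == second     # identical results on every run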

Don’t Call Remote Systems in Unit Tests: Avoiding remote calls (which are slow) also keeps unit tests fast and portable. You can eliminate remote system calls in unit tests by using mocks or by refactoring code so remote systems are only required for integration tests.

Inject Clocks: Instead of reading the system clock directly, have code that needs the current time accept a clock as a parameter. This approach, called dependency injection, allows tests to override clock behavior by injecting a mock or frozen clock into the clock parameter.
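
A sketch in Python; make_token is a hypothetical function, and time.time is the real clock used by default:

    import time

    def make_token(user_id, clock=time.time):
        # Production callers use the real clock by default.
        return f"{user_id}:{int(clock())}"

    def test_make_token_uses_injected_clock():
        frozen_clock = lambda: 1_700_000_000  # deterministic fake time
        assert make_token("alice", clock=frozen_clock) == "alice:1700000000"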

Avoid Sleeps and Timeouts: Developers often use sleep() calls or timeouts when a test requires work in a separate thread, process, or machine to complete before the test can validate its results. The problem with this technique is that it assumes that the other thread of execution will finish in a specific amount of time, which is not something you can rely on. If the language virtual machine or interpreter garbage collects, or the operating system decides to starve the process executing the test, your tests will (sometimes) fail.
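
One way out is to wait on an explicit completion signal instead of sleeping for a guessed duration. A sketch with Python’s threading.Event, where the timeout is a generous safety bound rather than a timing assumption:

    import threading

    def test_background_work_completes():
        done = threading.Event()
        result = {}

        def worker():
            result["value"] = 42
            done.set()  # signal completion explicitly

        threading.Thread(target=worker).start()

        # Blocks until the worker signals; returns False only if the
        # generous safety bound is exceeded.
        assert done.wait(timeout=5.0)
        assert result["value"] == 42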

Close Network Sockets and File Handles: Many tests leak operating system resources because developers assume that tests are short-lived and that the operating system will clean everything up when the test terminates. However, test execution frameworks often use the same process for multiple tests, which means leaked system resources like network sockets or file handles won’t be cleaned up immediately.
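
In Python, context managers make this cleanup automatic; this sketch closes both handles even if an assertion inside the block fails:

    import socket
    import tempfile

    def test_releases_resources():
        with socket.socket() as sock, tempfile.NamedTemporaryFile() as f:
            sock.bind(("127.0.0.1", 0))
            f.write(b"scratch data")
            # ...test body...
        # Both the socket and the file are closed here, pass or fail.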

Bind to Port Zero: Don’t bind test servers to hard-coded port numbers; the port might already be in use, and parallel tests that share it will collide. Instead, bind network sockets to port zero, which makes the operating system automatically pick an open port. Tests can retrieve the port that was picked and use that value through the remainder of the test.
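
A sketch in Python: binding to port 0 lets the OS choose a free port, and getsockname reports which port it picked.

    import socket

    def test_uses_os_assigned_port():
        sock = socket.socket()
        sock.bind(("127.0.0.1", 0))   # port 0: the OS picks an open port
        port = sock.getsockname()[1]  # retrieve the assigned port
        assert port > 0
        # ...start the server under test on `port` and connect to it...
        sock.close()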

Generate Unique File and Database Paths: Constant filepaths and database locations cause tests to interfere with each other. Generate a unique path per test, for example with a temporary-file library or a random suffix, so concurrent and repeated runs never collide.
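
A sketch using Python’s tempfile and uuid modules to generate collision-free paths:

    import tempfile
    import uuid

    def test_uses_unique_database_path():
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = f"{tmpdir}/test-{uuid.uuid4()}.db"  # unique per run
            # ...create and exercise a database at db_path...
        # The directory and its contents are removed when the block exits.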

Isolate and Clean Up Leftover Test State: You must reset state whether your tests pass or not; don’t let failed tests leave debris behind. Use setup and teardown methods to delete test files, clean databases, and reset in-memory test state between each execution.
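
If your team uses pytest (an assumption here), fixtures are a natural home for this: the code after yield runs whether the test passes or fails.

    import pytest

    @pytest.fixture
    def scratch_table(tmp_path):
        table = tmp_path / "table.json"
        table.write_text("[]")         # set up fresh state
        yield table
        table.unlink(missing_ok=True)  # tear down, pass or fail

    def test_reads_scratch_table(scratch_table):
        assert scratch_table.read_text() == "[]"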

Don’t Depend on Test Order: Tests sometimes depend on one another, with an early test creating state that a later test relies on. This pattern is bad for many reasons:

  • If the first test breaks, the second will break, too.

  • It’s harder to parallelize the tests, since you can’t run the second test until the first is done.

  • Changes to the first test might accidentally break the second.

  • Changes to the test runner might cause your tests to run in a different order.

Sources

  1. The Missing README: A Guide for the New Software Engineer by Chris Riccomini and Dmitriy Ryaboy (No Starch Press, 2021), Chapter 6.