
Costs when Scaling with Multiple Instances

One of the main reasons for testing in the cloud is to reduce the test duration, which is the time from starting the test to receiving the test results. This is technically achieved by distributing the tests across multiple parallel cloud instances.

The advantage of this highly scalable approach is that, within certain limits, the overall test duration can be reduced nearly linearly, while the additional instances required for this incur only minimal extra costs.

The future of ECU validation lies not in the development vehicle or HiL, but in the cloud, where scalability and flexibility enable new dimensions of testing.

Principle and Terms Regarding Scalability

Scalability in cloud testing is fundamentally based on the ability to divide the test execution time.

Technically, there is no difference between one instance performing tests for 100 hours and two instances each running tests for 50 hours. The total runtime of the tests remains the same.

However, the total duration can be halved by doubling the number of instances, provided each instance requires the same execution time and all instances are started at the same time.

Division of the run time from one instance to two instances: the division does not change the total run time.

Cost over the Life Time of an Instance

Each instance has a Life Time, which begins when the instance is powered up and ends when it is powered down. The core business model of cloud providers is a rental model: you pay for the Life Time of each instance.

Cost of an instance = Life Time in minutes * Price per minute


This applies to every instance. Cloud providers offer other business models as well, but for the cost considerations here we focus exclusively on the typical On-Demand service, which allows you to start as many instances as needed at any time.

Phases of the Life Time

Before testing, the operating system needs to be booted, tools must be launched, and, if necessary, test frameworks updated. Additionally, the test object and test data are copied into the instance. This start-up phase takes some time.

Here, we refer to the duration of the test execution in an instance as “Run Time”. This depends on several factors:

  • Scope of the test object (e.g., architecture/design, interfaces, and lines of code)
  • Architecture/Design of the test cases
  • Performance of the instance (CPU, memory, access times, etc.)
  • Performance of the test environment
  • And others

After the test run, the test data must be backed up. Before the instance is shut down, the data has to be moved or copied to storage outside the instance.

Even though the start-up and shut-down times (referred to as Overhead, abbreviated as OH) are unproductive, they cannot be eliminated. They are essential to the overall process in the cloud.

Through clever orchestration, the unproductive times were reduced to approximately 5 to 10 minutes in total for our two use cases.
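The phases and the cost formula above can be expressed in a minimal Python sketch. It adds up the phases of one instance's Life Time and applies a per-minute rental price. The function names, the example numbers, and strict per-minute billing are assumptions made for illustration. Real providers may use other billing granularities or minimum charges.

```python
# Minimal sketch: Life Time and cost of one on-demand instance.
# Assumption: billing is strictly proportional to Life Time in minutes.

def life_time_min(startup_min: float, run_time_min: float, shutdown_min: float) -> float:
    """Life Time = start-up (overhead) + Run Time + shut-down (overhead), in minutes."""
    return startup_min + run_time_min + shutdown_min


def instance_cost(life_time: float, price_per_min: float) -> float:
    """Cost of an instance = Life Time in minutes * price per minute."""
    return life_time * price_per_min


if __name__ == "__main__":
    # Hypothetical values: 3 min start-up, 120 min of tests, 2 min shut-down,
    # and an assumed price of 0.01 currency units per minute.
    lt = life_time_min(3, 120, 2)
    print(lt, instance_cost(lt, 0.01))
```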

Additional Costs and Limits of Scalability

When considering additional costs, there are two sides to the coin:

A perceived disadvantage when scaling

Although splitting the tests reduces the run time of each instance, the total costs increase with every additional instance compared with running on a single instance.

The reason is simple: each additional instance incurs additional costs. These arise from the overhead (start-up and shut-down phases) that every additional instance requires (see figure).
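The additional cost can be written out explicitly under three assumptions: the tests split evenly, every instance has the same overhead OH, and the price per minute p is the same for all instances. In the equations below, RT is the total Run Time of all tests on one instance, n is the number of instances, and L(n) is the Life Time of each of the n instances. These symbols are introduced here for illustration.

```latex
\begin{aligned}
L(n) &= \mathrm{OH} + \frac{\mathrm{RT}}{n} \\
C(n) &= n \cdot L(n) \cdot p = \left(n \cdot \mathrm{OH} + \mathrm{RT}\right) p \\
C(n) - C(1) &= (n - 1) \cdot \mathrm{OH} \cdot p
\end{aligned}
```

The run time RT is paid for only once, regardless of n. Each additional instance adds exactly one overhead, which is why the extra costs are easy to calculate in advance.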

The positive side of scaling

The duration of a test run can be significantly reduced.

Scaling is almost always worth it for long test execution times.

The additional costs can easily be calculated in advance, and the best part is this: as long as the run time is long compared with the overhead, the overhead causes only a slight increase in expenses.

Best-case scenario for splitting run times/life time for minimal total duration

Fastest test results? For that, the split must be optimal. The total test duration can never be shorter than the life time of the longest-running instance (see illustrations).

Ideally, the tests are therefore distributed so that all instances have the same life time.

Execution duration with non-equidistant run-time division, exemplified with 2 instances

Execution duration with equidistant run-time division, exemplified with 2 instances

Assuming that unproductive times are nearly equal for all instances, the distribution of tests among instances should be chosen so that the run time across all instances is also as equal as possible for an optimal total duration.
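The text does not specify how the tests are distributed. One possible approach is the Longest-Processing-Time-first (LPT) heuristic, a swapped-in technique that is not necessarily the orchestration used for the two use cases. LPT assigns each test case, longest first, to the instance with the least accumulated run time. It is a well-known greedy heuristic for equalizing loads, but it is not guaranteed to find the optimal split. The test names and durations in the example are hypothetical.

```python
import heapq
from typing import Dict, List


def distribute_lpt(test_times: Dict[str, float], n_instances: int) -> List[List[str]]:
    """Assign test cases to instances so that run times are as equal as possible.

    LPT heuristic: sort the tests by duration (longest first) and always give
    the next test to the instance with the smallest accumulated run time.
    """
    if n_instances < 1:
        raise ValueError("need at least one instance")
    # Min-heap of (accumulated run time, instance index); a sorted list is a valid heap.
    heap = [(0.0, i) for i in range(n_instances)]
    buckets: List[List[str]] = [[] for _ in range(n_instances)]
    for name, t in sorted(test_times.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)  # least-loaded instance
        buckets[idx].append(name)
        heapq.heappush(heap, (load + t, idx))
    return buckets


def run_times(buckets: List[List[str]], test_times: Dict[str, float]) -> List[float]:
    """Run Time per instance = sum of the run times of its assigned test cases."""
    return [sum(test_times[name] for name in bucket) for bucket in buckets]


if __name__ == "__main__":
    # Hypothetical test-case run times in minutes.
    times = {"a": 50, "b": 40, "c": 30, "d": 20, "e": 10}
    buckets = distribute_lpt(times, 2)
    print(buckets, run_times(buckets, times))
```

For these hypothetical run times, the two instances end up with 80 and 70 minutes of run time. All durations are multiples of 10 minutes and sum to 150 minutes, so an exact 75/75 split is impossible in this example.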

Finiteness of Scaling

In principle, the division and the associated reduction of the Run Time can be repeated, and the test run can theoretically be divided among as many instances as needed. With the tests split evenly, the Run Time per instance is reduced according to the following formula:

Run Time per instance = Run Time with one instance / Number of instances

Important Boundary Conditions:

  • Scalability has finite limits. The maximum number of instances equals the number of test cases, because a test case is effectively atomic and cannot be split across multiple instances.
  • Unlimited scaling is not practical. Once the actual run time of an instance approaches the overhead (start-up and shut-down), costs continue to rise without a noticeable reduction in duration.

Calculation Example

Assume an overhead of five minutes that is the same for one and for two instances. Two cases are compared: a run time of 5 minutes and a run time of 72 hours.

Important to Know!

The longer the execution duration of tests on a cloud instance, the more worthwhile it is to run them with multiple cloud instances.

If the run time is already short to begin with (5-minute case), splitting it across two instances saves 25 percent of the time. However, the operating costs are 50 percent higher.

If the run time is very long (72-hour case), splitting it across two instances saves 49.94 percent of the time, while costs increase by only 0.12 percent.
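The two cases above can be reproduced with a short Python sketch. It assumes an even split (Run Time per instance = Run Time / number of instances) and the same five-minute overhead for every instance, as in the example. The function name is hypothetical.

```python
def scaling_effect(run_time_min: float, overhead_min: float, n: int):
    """Compare n instances with a single instance.

    Assumes an even split of the run time and the same overhead
    (start-up + shut-down) for every instance.
    Returns (time saving in %, cost increase in %).
    """
    life_1 = overhead_min + run_time_min       # Life Time of the single instance
    life_n = overhead_min + run_time_min / n   # Life Time of each of the n instances
    time_saving = (1 - life_n / life_1) * 100
    cost_increase = (n * life_n / life_1 - 1) * 100
    return time_saving, cost_increase


if __name__ == "__main__":
    for label, rt in [("5-minute case", 5), ("72-hour case", 72 * 60)]:
        saving, extra = scaling_effect(rt, overhead_min=5, n=2)
        print(f"{label}: time saving {saving:.2f} %, cost increase {extra:.2f} %")
```

This prints 25.00 % / 50.00 % for the 5-minute case and 49.94 % / 0.12 % for the 72-hour case, matching the figures above.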

Optimal Cost-Benefit Calculation of Scaling

Aside from the additional costs, scaling also offers the advantage of faster test results. This benefits the entire software development process. The exact financial impact of these effects is not easy to determine, partly because the total cost per product and organization is generally calculated very individually.

Thesis as an Approximation

A halving of the waiting time or duration until test results are available leads to an efficiency gain of at least 10 percent of the total system and software development efforts in automotive manufacturing.

Reasons

If a bug is fixed quickly, it does not carry over into later integration stages (Rule of Ten: the cost of fixing a defect rises roughly tenfold with each later stage in which it is found). In the worst case, a bug that has reached higher integration levels requires significant effort to fix across multiple components.

When test results are available quickly, a developer can address a detected bug right away, without having to reorient or refocus away from tasks already in progress.

Case Study

The current execution time of a test run of 3 days is to be reduced to 1 hour.

It’s clear that the costs will increase. But which value should be optimized? There are at least two options: specifying the maximum execution time or specifying the maximum costs.

Interestingly, both are directly correlated.

In order to calculate the ideal cost-benefit optimum, the following is needed:

  • Measurements of start-up and shut-down times
  • At least one measurement of the run time of all tests with one instance
  • Cost models of the desired instances
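For the case study, the smallest number of instances that meets a target duration can be estimated from these measurements. The sketch below assumes an even split of the tests. The case study does not state an overhead, so the 5-minute value is borrowed from the calculation example as an assumption. The function names are hypothetical.

```python
import math


def min_instances_for_target(run_time_min: float, overhead_min: float, target_min: float) -> int:
    """Smallest n with overhead + run_time / n <= target (even split assumed)."""
    if target_min <= overhead_min:
        raise ValueError("target duration must be longer than the overhead")
    return math.ceil(run_time_min / (target_min - overhead_min))


def billed_minutes(run_time_min: float, overhead_min: float, n: int) -> float:
    """Billed instance-minutes for n instances: n * (OH + RT / n) = n * OH + RT."""
    return n * overhead_min + run_time_min


if __name__ == "__main__":
    RT = 3 * 24 * 60  # 3 days of run time on one instance, in minutes
    OH = 5            # assumed overhead per instance, in minutes
    n = min_instances_for_target(RT, OH, target_min=60)
    extra = billed_minutes(RT, OH, n) / billed_minutes(RT, OH, 1) - 1
    print(n, f"{extra:.1%}")
```

Under these assumptions, the sketch yields 79 instances at roughly 9 percent higher cost than a single instance. It ignores the atomicity of test cases discussed under the technical limits of scaling, so long individual test cases can make such a target unreachable.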

With each additional instance, the execution time per instance can be reduced.

Depicted in orange is the time saved with multiple instances compared to running with a single instance. Costs (in blue) increase with each instance. The red hatching indicates the range where the additional time gained in % is lower than the additional costs in %.

An additional instance is no longer cost-effective once the red area is reached, at the latest.
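The red-hatching criterion can be sketched in code under one possible reading of the figure. This reading is an assumption, since the figure is not reproduced here. In it, the "additional time gained" and the "additional costs" are the changes in percent caused by adding one more instance, both relative to the single-instance Life Time. With an even split and equal overhead, adding the n-th instance shortens the duration by RT/(n(n-1)) minutes and adds exactly one overhead OH to the billed minutes. The comparison therefore reduces to RT/(n(n-1)) < OH. The function name is hypothetical.

```python
from typing import Optional


def first_unprofitable_instance(run_time_min: float, overhead_min: float,
                                n_max: int = 10_000) -> Optional[int]:
    """Return the first instance count n at which adding the n-th instance gains
    less time (in %) than it adds in cost (in %), or None if none up to n_max.

    Going from n-1 to n instances (even split, equal overhead):
      - the duration shrinks by RT/(n-1) - RT/n minutes,
      - the billed minutes grow by exactly one overhead OH.
    Both are divided by the same single-instance Life Time (OH + RT),
    so the percentages can be compared in minutes directly.
    """
    for n in range(2, n_max + 1):
        time_gain = run_time_min / (n - 1) - run_time_min / n
        extra_cost = overhead_min
        if time_gain < extra_cost:
            return n
    return None


if __name__ == "__main__":
    print(first_unprofitable_instance(3 * 24 * 60, 5))  # 3-day run time, assumed 5-min OH
    print(first_unprofitable_instance(5, 5))            # 5-minute case
```

For a 3-day run time and an assumed 5-minute overhead, this returns 30. For the 5-minute case it returns 2: the second instance already falls in the red area, which is consistent with the 25 percent time saving against 50 percent higher costs.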

Whether these values can be implemented in practice depends individually on the development, IT, and testing processes within the company.

Technical Limits of Scaling

Atomicity of Test Cases

The maximum possible scaling depends on the execution time of each individual test case. The faster a test case can be executed on a machine, the less this limit needs to be considered.

The execution time of a typical open-loop unit test with TPT is about 1 ms.
When running complex, highly compute-intensive test cases with 3D environment simulation models, such as TPT + CarMaker, a single test case can take 30 minutes or more. The reason is the simulation of, for example, multiple lane changes on the highway, high sampling rates (10 ms), and a long scenario duration (15 minutes).

Unfortunately, tests of the latter type cannot be subdivided without considerable effort and therefore cannot be split across multiple instances. In this case, one can:

  • Accept the limits
  • Use a faster EC2 instance to speed up execution time
  • Cleverly modify test design by breaking down aspects of a test case into multiple test cases to shorten the runtime of a test case

Always remember: each instance must be assigned at least one test case for execution.
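These limits can be summarized as a lower bound on the total duration D(n) with n instances. The bound assumes that all instances start together and have the same overhead OH. RT is the total run time of all tests on one instance, t_i is the run time of test case i, and N_tests is the number of test cases; these symbols are introduced for illustration. The busiest instance runs at least the average share RT/n and at least the longest single test case.

```latex
D(n) \;\ge\; \mathrm{OH} + \max\!\left(\frac{\mathrm{RT}}{n},\; \max_{i} t_i\right),
\qquad 1 \le n \le N_{\text{tests}}
```

In the TPT + CarMaker example, a single 30-minute test case keeps D(n) at or above OH + 30 minutes, no matter how many instances are added.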

Determination of Timings for Startup & Shutdown and Execution

These times vary. Measurements from previous executions serve as indications. Determining how changes to the System Under Test (SUT) and to the test design affect execution times requires significant effort.

As a first approximation, always base estimates on the last test run and allow buffers for critical timings.

Attention

Test execution timings in particular can change due to minor alterations in the code or test design. Unfortunately, the exact timings can only be measured during execution.

It is important to monitor the timings and to adjust the distribution of tests when changes are detected. The goal is a distribution that is as even as possible, i.e., one in which all instances run for the same duration.
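The estimate-and-monitor loop described above can be sketched in Python. The sketch takes the last run's measured durations as estimates, pads critical test cases with a relative buffer, and flags a redistribution when instance run times drift apart. The function names, the 10 percent buffer, and the 10 percent tolerance are all hypothetical choices for illustration.

```python
from typing import Collection, Dict, List


def estimate_times(last_run: Dict[str, float], buffer: float = 0.1,
                   critical: Collection[str] = ()) -> Dict[str, float]:
    """Use the last run's measured durations as estimates.

    `buffer` is a relative safety margin (0.1 = +10 %) applied only to the
    test cases named in `critical`; all other estimates are taken as measured.
    """
    return {name: t * (1 + buffer) if name in critical else t
            for name, t in last_run.items()}


def needs_rebalancing(instance_run_times: List[float], tolerance: float = 0.1) -> bool:
    """Return True if the instance run times of a run differ by more than
    `tolerance` relative to the longest instance, i.e. the distribution of
    tests should be recomputed before the next run."""
    longest, shortest = max(instance_run_times), min(instance_run_times)
    if longest == 0:
        return False  # nothing ran; nothing to rebalance
    return (longest - shortest) / longest > tolerance


if __name__ == "__main__":
    # Hypothetical measurements (minutes) from the previous run.
    est = estimate_times({"lane_change": 30.0, "unit_tests": 2.0}, critical={"lane_change"})
    print(est)
    print(needs_rebalancing([61.0, 52.0]))
```

The new estimates can then feed whatever distribution step is used, so that each run starts from the most recent measurements.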