
I don't understand why the discrepancy between the measured and the specified durations of a std::future::wait_for call grows as the specified duration increases.

When I tell a std::future to wait for 10ns and measure the elapsed time I get ~2000ns. Now, 10ns is a very short duration, so maybe there's too much overhead involved with the associated function calls to wait for this short amount of time. But when I tell a std::future to wait for 100000ns and measure the elapsed time I get ~150000ns. A similar effect can be seen when waiting for 10 and 100 microseconds, respectively.

#include <chrono>
#include <future>
#include <iostream>
#include <thread>

using namespace std::chrono;
using namespace std::chrono_literals;

void f() { std::this_thread::sleep_for(1s); }

int main() {
  steady_clock::time_point start, end;

  std::future<void> future = std::async(std::launch::async, f);

  start = steady_clock::now();
  future.wait_for(10ns);
  end = steady_clock::now();
  std::cout << "10 -> " << (end - start).count() << '\n';

  start = steady_clock::now();
  future.wait_for(100000ns);
  end = steady_clock::now();
  std::cout << "100000 -> " << (end - start).count() << '\n';

  return 0;
}

I compile the above code with g++ future_test.cpp -lpthread, with g++ 7.3.0 on Ubuntu 18.04.

I could explain something like

10 -> 2000
100000 -> 102000

But that's not what I get. Here's a representative result of multiple executions:

10 -> 2193
100000 -> 154723

Why does the measured duration for 100'000 ns exceed the specified duration by so much more than the ~2'000 ns of overhead seen in the first case?

  • All you are guaranteed is that the function waits for at least the specified time. You have no upper bound on how much more it may wait. – Jesper Juhl Apr 18 at 15:17

Quoting the documentation:

template< class Rep, class Period >
std::future_status wait_for( const std::chrono::duration<Rep,Period>& timeout_duration ) const;

This function may block for longer than timeout_duration due to scheduling or resource contention delays.
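
As a side note, the returned std::future_status tells you whether the call came back because the future was ready or because the (lower-bound) timeout elapsed. A minimal check might look like this (the 100 µs timeout is an arbitrary value for illustration):

#include <chrono>
#include <future>
#include <iostream>
#include <thread>

using namespace std::chrono_literals;

int main() {
  auto fut = std::async(std::launch::async, [] { std::this_thread::sleep_for(1s); });

  // wait_for may block longer than 100us; the status reports what actually happened.
  switch (fut.wait_for(100us)) {
    case std::future_status::ready:    std::cout << "ready\n";     break;
    case std::future_status::timeout:  std::cout << "timed out\n"; break;
    case std::future_status::deferred: std::cout << "deferred\n";  break;
  }
}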


Sleeps and waits only promise to wait at least as long as you ask. Because hardware and scheduling limitations introduce an unpredictable delay, there is no fixed maximum to how much longer a sleep or wait might last.

For cyclical applications (like a timer) you can get a steady sleep (on average) if you wait until a timepoint and then increment that timepoint by a fixed amount. If you keep doing this, provided the thread wakes up frequently enough, you'll get the target delay on average over a period of time.
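
A minimal sketch of that approach (the 10 ms period and the iteration count are arbitrary values picked for illustration):

#include <chrono>
#include <thread>

using namespace std::chrono;
using namespace std::chrono_literals;

int main() {
  constexpr auto period = 10ms;                 // arbitrary period for illustration
  auto next_wakeup = steady_clock::now() + period;

  for (int i = 0; i < 100; ++i) {
    // ... periodic work would go here ...
    std::this_thread::sleep_until(next_wakeup); // wait until an absolute deadline
    next_wakeup += period;                      // fixed step: per-iteration jitter does not accumulate
  }
}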

  • The problem is that you have little guarantee that the thread wakes up frequently enough. But this is probably the best that can be done. – Davide Spataro Apr 18 at 14:52
  • @DavideSpataro It's true that this is still a problem, but that usually means that your system is not up to the task. There are workarounds, for example skipping iterations when you determine that the next timepoint is already in the past, by adding multiples of the period; see the sketch below. – François Andrieux Apr 18 at 14:53
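
A sketch of that catch-up idea (the helper name and the values are mine, not from the comment): before sleeping, if the next deadline is already in the past, jump it forward by whole periods.

#include <chrono>

using namespace std::chrono;

// Hypothetical helper: advance next_wakeup past now in whole multiples of period,
// so missed iterations are skipped instead of being fired back-to-back.
steady_clock::time_point catch_up(steady_clock::time_point next_wakeup,
                                  steady_clock::time_point now,
                                  steady_clock::duration period) {
  if (next_wakeup < now) {
    auto missed = (now - next_wakeup) / period + 1; // whole periods we are late by
    next_wakeup += missed * period;
  }
  return next_wakeup;
}

int main() {
  auto period = milliseconds(10);
  auto next = steady_clock::now() - milliseconds(35); // pretend we are 35 ms behind
  next = catch_up(next, steady_clock::now(), period); // lands within one period ahead of now
}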

A 4 GHz clock is 0.25 ns per tick. 10 ns is then 40 ticks, or 40 instructions, give or take.

Asking for a 10 ns delay to mean anything is pretty ridiculous; the time it takes to calculate the current time might easily be longer than 10 ns.
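
If you want to see that on your own machine, a rough way to estimate the cost of a single steady_clock::now() call is to average it over many iterations (just a sketch; the numbers are machine dependent):

#include <chrono>
#include <iostream>

using namespace std::chrono;

int main() {
  constexpr int N = 1'000'000;
  auto start = steady_clock::now();
  for (int i = 0; i < N; ++i) {
    auto t = steady_clock::now();   // the call we are timing
    (void)t;
  }
  auto end = steady_clock::now();
  std::cout << duration_cast<nanoseconds>(end - start).count() / N
            << " ns per now() call (approx.)\n";
}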

So what you are measuring here:

start = steady_clock::now();
future.wait_for(10ns);
end = steady_clock::now();
std::cout << "10 -> " << (end - start).count() << '\n';

is the time it takes to read the current time plus the wait_for overhead (checking whether the future is ready, and so on).

In the second case:

start = steady_clock::now();
future.wait_for(100000ns);
end = steady_clock::now();
std::cout << "100000 -> " << (end - start).count() << '\n';

the difference is about 50,000 ns. That is 1/20000th of a second.

Here we might be doing something like putting the CPU into a low power mode, or even setting up a spin lock.

It is possible you are context switching, but I'd guess not; switching to another context then back again would probably cost too much to bother with here.

Time slicing on an interactive OS is usually on the order of 1/50th of a second (about 20,000,000 ns) when there is contention for the CPU.


You are asking for real-time behaviour, and that is extremely system dependent; it's the OS's job. Even if you use an RTOS, or PREEMPT_RT, or something similar, I'm not sure the C++ standard library can give you this kind of accuracy.

When you need that accuracy, and provided the system can actually deliver it, it's probably better to resort to OS-dependent calls.
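
For example, on Linux one such call is clock_nanosleep with an absolute deadline; this is just an illustration of the idea, and real-time behaviour still depends on scheduler policy and priority:

#include <time.h>

int main() {
  timespec deadline;
  clock_gettime(CLOCK_MONOTONIC, &deadline);

  deadline.tv_nsec += 100000;            // 100 us from now (example value)
  if (deadline.tv_nsec >= 1000000000L) { // keep tv_nsec in [0, 1e9)
    deadline.tv_nsec -= 1000000000L;
    deadline.tv_sec += 1;
  }

  // TIMER_ABSTIME: sleep until an absolute point on CLOCK_MONOTONIC, so the time
  // spent setting up the call is not added on top of the requested delay.
  clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, nullptr);
  return 0;
}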

Anyway, for precision in the realm of 1 ms I usually spin. But mind you: you'll still get outliers.
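
A sketch of such a spin wait (the 100 µs deadline is just an example); it burns a whole core while waiting, and outliers are still possible:

#include <chrono>

using namespace std::chrono;
using namespace std::chrono_literals;

int main() {
  auto deadline = steady_clock::now() + 100us; // example deadline

  // Busy-wait: keep polling the clock instead of asking the OS to wake us up.
  while (steady_clock::now() < deadline) {
    // intentionally empty
  }
}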
