Measuring the execution time of a function in C++

137

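I want to find out how long a certain function in my C++ program takes to execute on Linux. Afterwards, I want to make a speed comparison. I saw several timing functions but ended up with this one from Boost.Chrono: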

process_user_cpu_clock, captures user-CPU time spent by the current process

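Now, I am not clear: if I use the above function, will I get only the CPU time spent in that function?

Secondly, I could not find any example of using the above function. Can anyone help me with how to use it?

P.S.: Right now I am using std::chrono::system_clock::now() to get the time in seconds, but this gives me different results every time because the CPU load differs.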

c++ optimization profiling

— Xara
source

2

For Linux use clock_gettime. gcc defines the other clocks as: typedef system_clock steady_clock; typedef system_clock high_resolution_clock; On Windows, use QueryPerformanceCounter.

— Brandon
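As a rough illustration of the clock_gettime suggestion, the sketch below reads the POSIX CLOCK_PROCESS_CPUTIME_ID clock before and after a call. The helper names process_cpu_seconds and cpu_seconds_of are made up for this example, and the code assumes a POSIX system such as Linux.

```cpp
#include <time.h>   // clock_gettime, timespec, CLOCK_PROCESS_CPUTIME_ID (POSIX)

// Hypothetical helper: CPU time consumed so far by the whole process, in seconds.
inline double process_cpu_seconds() {
    timespec ts{};
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
    return static_cast<double>(ts.tv_sec) + static_cast<double>(ts.tv_nsec) * 1e-9;
}

// Hypothetical helper: CPU seconds the process consumed while f() was running.
// In a multi-threaded program this includes work done by other threads.
template <typename F>
double cpu_seconds_of(F&& f) {
    const double before = process_cpu_seconds();
    f();
    return process_cpu_seconds() - before;
}
```

Because this clock counts CPU time of the process rather than elapsed time, time spent waiting for other processes should not be included, which is closer to what the question asks for.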

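Is this question a duplicate of this one, or do the scenarios make the solutions different?

— Northerner

I have two implementations of a function and would like to find out which one performs better.

— Northerner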

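Very important: make sure you enable optimization. Unoptimized code has different bottlenecks than normal optimized code and does not tell you anything meaningful. See: C loop optimization help for final assignment (with compiler optimization disabled). In general, microbenchmarking has many pitfalls, especially failing to run a warm-up loop first to deal with CPU frequency scaling and page faults. See: Idiomatic way of performance evaluation?. And this answer.

— Peter Cordes

See also How would you benchmark the performance of a function? for Google Benchmark, which avoids many of the pitfalls of rolling your own microbenchmark. Also Simple for() loop benchmark takes the same time with any loop bound, for more about how optimization interacts with benchmark loops and what to do about it.

— Peter Cordes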
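To make the warm-up advice concrete, here is a minimal sketch, not taken from the thread. The helper name time_after_warmup is hypothetical, and it assumes the timed callable returns an arithmetic value that can be stored in a volatile.

```cpp
#include <chrono>

// Hypothetical helper: runs f() a few times untimed (at least once), then times
// a single call. Assumes f() returns an arithmetic value.
template <typename F>
std::chrono::nanoseconds time_after_warmup(F&& f, int warmup_runs = 3) {
    using clock = std::chrono::steady_clock;
    // Storing each result in a volatile forces the result to be produced.
    // The compiler may still simplify how it is computed, so this is only a
    // partial defence against the optimizer.
    volatile decltype(f()) sink = f();
    for (int i = 1; i < warmup_runs; ++i) sink = f();

    const auto start = clock::now();
    sink = f();
    const auto stop = clock::now();
    (void)sink;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```

Build with optimization enabled (for example -O2) when using something like this; a dedicated library such as Google Benchmark handles these issues more thoroughly.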

263

C++11 offers a very easy-to-use method: std::chrono::high_resolution_clock from the <chrono> header.

Use it like this:

#include <iostream>
#include <chrono>

void function()
{
    long long number = 0;

    for( long long i = 0; i != 2000000; ++i )
    {
       number += 5;
    }
}

int main()
{
    auto t1 = std::chrono::high_resolution_clock::now();
    function();
    auto t2 = std::chrono::high_resolution_clock::now();

    auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();

    std::cout << duration;
    return 0;
}

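This will measure the duration of the function.

NOTE: You will not always get the same timing for a function. That is because your machine's CPU can be used more or less by other processes running on your computer, just as your mind can be more or less concentrated when you solve a math exercise. A human can remember the solution to a math problem, but for a computer the same process is always something new. So, as I said, you will not always get the same result!

— Victor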
source

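When I use this function, the first run gave me 118440535 microseconds and the second run of the same function gave me 83221031 microseconds. Shouldn't the two measurements be equal when I am measuring only the duration of that function?

— Xara

1

No. Your computer's processor can be used less or more. The high_resolution_clock gives you the physical, real time that your function takes to run. So in your first run the CPU was available to your program less than in the next run. By "used" I mean CPU use by other applications.

— Victor

1

Yes, if you need an average time, that is a good way to get it. Take three runs and calculate the average.

— Victor

3

Could you please post the code without "using namespace"? It makes it easier to see what comes from where.

— Snowman

1

Shouldn't this be steady_clock? Isn't it possible that high_resolution_clock is a non-monotonic clock?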

— Gillespie

15

Here is a function that measures the execution time of any function passed as an argument:

#include <chrono>
#include <utility>

typedef std::chrono::high_resolution_clock::time_point TimeVar;

#define duration(a) std::chrono::duration_cast<std::chrono::nanoseconds>(a).count()
#define timeNow() std::chrono::high_resolution_clock::now()

template<typename F, typename... Args>
double funcTime(F func, Args&&... args){
    TimeVar t1=timeNow();
    func(std::forward<Args>(args)...);
    return duration(timeNow()-t1);
}

Example usage:

#include <iostream>
#include <algorithm>
#include <string>

typedef std::string String;

//first test function doing something
int countCharInString(String s, char delim){
    int count=0;
    String::size_type pos = s.find_first_of(delim);
    while ((pos = s.find_first_of(delim, pos)) != String::npos){
        count++;pos++;
    }
    return count;
}

//second test function doing the same thing in different way
int countWithAlgorithm(String s, char delim){
    return std::count(s.begin(),s.end(),delim);
}


int main(){
    std::cout<<"norm: "<<funcTime(countCharInString,"precision=10",'=')<<"\n";
    std::cout<<"algo: "<<funcTime(countWithAlgorithm,"precision=10",'=');
    return 0;
}

Output:

norm: 15555
algo: 2976

— Jahid
source

2

@RestlessC0bra: It is implementation-defined. high_resolution_clock may be an alias of system_clock (wall clock), of steady_clock, or a third independent clock. See the details here. For CPU time, std::clock may be used.

— Jahid
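One way to see what a given implementation provides is to inspect each clock's is_steady member, which the standard defines for every clock type. The sketch below is only illustrative; the helper name describe_clock is made up for this example.

```cpp
#include <chrono>
#include <string>

// Hypothetical helper: reports whether a clock type is monotonic ("steady")
// on the current implementation.
template <typename Clock>
std::string describe_clock(const std::string& name) {
    return name + (Clock::is_steady ? ": steady" : ": not steady");
}
```

Calling describe_clock<std::chrono::high_resolution_clock>("high_resolution_clock") and comparing it with the result for system_clock and steady_clock shows which behavior your standard library chose. Only steady_clock is required to report steady.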

2

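Two macros and a global typedef, none of which saves a single keystroke, is certainly nothing I would call elegant. Also, passing a function object and perfectly forwarding the arguments separately is a bit of overkill (and in the case of overloaded functions even inconvenient) when you can just require the timed code to be put in a lambda. But fine, as long as passing arguments is optional.

— MikeMB

2

And this is a justification for violating each and every guideline about the naming of macros? You don't prefix them, you don't use capital letters, you pick a very common name that has a high probability of colliding with some local symbol, and most of all: why are you using a macro at all (instead of a function)? And while we are at it: why are you returning the duration as a double representing nanoseconds in the first place? We should probably agree that we disagree. My original opinion stands: "This is not what I'd call elegant code".

— MikeMB '17

1

The problem is that they are unscoped. What worries me is that such macros end up in a header file that gets included in my code (possibly indirectly, as part of a library). If you want a taste of what happens when common names are used for macros, include windows.h in a non-trivial C++ project. Regarding assert, first of all: "quod licet iovi non licet bovi" ;). Second, not all decisions in the standard library (sometimes dating back decades) are actually considered a good idea by modern standards. There is a reason why the C++ modules designers try very hard not to export macros by default.

— MikeMB

2

@Jahid: Thanks. In that case, consider my comments null and void.

— MikeMB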
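For comparison, the macro-free style argued for in these comments might look like the sketch below. It takes the timed code as a single callable (typically a lambda) and returns a std::chrono duration instead of a double. The name time_call is hypothetical, and steady_clock is chosen here as an assumption rather than something the comments prescribe.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs timed_code() once and returns the elapsed time.
template <typename F>
std::chrono::nanoseconds time_call(F&& timed_code) {
    const auto t1 = std::chrono::steady_clock::now();
    std::forward<F>(timed_code)();
    const auto t2 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1);
}
```

A call would then read time_call([&] { countCharInString(text, '='); }), where text is whatever string the caller has at hand, so arguments and overload selection stay inside the lambda.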

9

A simple program to find the time taken by a function to execute.

#include <iostream>
#include <ctime> // time_t
#include <cstdio>

void function()
{
     for(long int i=0;i<1000000000;i++)
     {
        // do nothing
     }
}

int main()
{

time_t begin,end; // time_t is a datatype to store time values.

time (&begin); // note time before execution
function();
time (&end); // note time after execution

double difference = difftime (end,begin);
printf ("time taken for function() %.2lf seconds.\n", difference );

return 0;
}

— Abdullah Farweez
source

6

It is very inaccurate: it shows only seconds, not milliseconds.

— user25

6

In Scott Meyers' book I found an example of a generic lambda expression that can be used to measure function execution time. (C++14)

auto timeFuncInvocation = 
    [](auto&& func, auto&&... params) {
        // get time before function invocation
        const auto& start = std::chrono::high_resolution_clock::now();
        // function invocation using perfect forwarding
        std::forward<decltype(func)>(func)(std::forward<decltype(params)>(params)...);
        // get time after function invocation
        const auto& stop = std::chrono::high_resolution_clock::now();
        return stop - start;
     };

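The problem is that you measure only one execution, so the results can vary a lot. To get a reliable result you should measure a large number of executions. According to Andrei Alexandrescu's talk at the code::dive 2015 conference, Writing Fast Code I:

Measured time: tm = t + tq + tn + to

where:

tm - the measured (observed) time

t - the actual time of interest

tq - time added by quantization noise

tn - time added by various sources of noise

to - overhead time (measuring, looping, calling functions)

According to what he says later in the talk, you should take the minimum over this large number of executions as your result. I encourage you to watch the talk, in which he explains why.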
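Since the noise and overhead terms in the formula above can only add time, the smallest observation is the one closest to t. A minimal sketch of "take the minimum over many runs" could look like this; the name min_time_of and the default of 1000 runs are assumptions made for the example, not values from the talk.

```cpp
#include <algorithm>
#include <chrono>

// Hypothetical helper: runs f() `runs` times and returns the fastest observation.
template <typename F>
std::chrono::nanoseconds min_time_of(F&& f, int runs = 1000) {
    using clock = std::chrono::steady_clock;
    auto best = std::chrono::nanoseconds::max();
    for (int i = 0; i < runs; ++i) {
        const auto start = clock::now();
        f();
        const auto elapsed =
            std::chrono::duration_cast<std::chrono::nanoseconds>(clock::now() - start);
        best = std::min(best, elapsed);  // keep the least-disturbed run
    }
    return best;
}
```

For very short functions the clock's own resolution and call overhead dominate, so timing a batch of calls per observation may be needed.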

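Google also has a very good library: https://github.com/google/benchmark. It is simple to use and powerful. You can find some of Chandler Carruth's talks on YouTube in which he uses this library in practice, for example CppCon 2017: Chandler Carruth "Going Nowhere Faster".

Example usage: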

#include <iostream>
#include <chrono>
#include <utility>
#include <vector>
auto timeFuncInvocation = 
    [](auto&& func, auto&&... params) {
        // get time before function invocation
        const auto& start = std::chrono::high_resolution_clock::now();
        // function invocation using perfect forwarding
        for(auto i = 0; i < 100000/*largeNumber*/; ++i) {
            std::forward<decltype(func)>(func)(std::forward<decltype(params)>(params)...);
        }
        // get time after function invocation
        const auto& stop = std::chrono::high_resolution_clock::now();
        // average time per invocation
        return (stop - start)/100000/*largeNumber*/;
     };

void f(std::vector<int>& vec) {
    vec.push_back(1);
}

void f2(std::vector<int>& vec) {
    vec.emplace_back(1);
}
int main()
{
    std::vector<int> vec;
    std::vector<int> vec2;
    std::cout << timeFuncInvocation(f, vec).count() << std::endl;
    std::cout << timeFuncInvocation(f2, vec2).count() << std::endl;
    std::vector<int> vec3;
    vec3.reserve(100000);
    std::vector<int> vec4;
    vec4.reserve(100000);
    std::cout << timeFuncInvocation(f, vec3).count() << std::endl;
    std::cout << timeFuncInvocation(f2, vec4).count() << std::endl;
    return 0;
}

EDIT: Of course, you always need to remember that your compiler may or may not optimize something away. Tools like perf can be useful in such cases.

— Krzysztof Sommerfeld
source

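Interesting. What is the benefit of using a lambda here over a function template?

— user48956 '19

1

The main difference is that it is a callable object, but indeed you can get something very similar with a variadic template and std::result_of_t.

— Krzysztof Sommerfeld

@KrzysztofSommerfeld How do I do this for member functions? When I pass timing(Object.Method1), I get the error "non-standard syntax; use '&' to create a pointer to member".

— RobinAtTech

timeFuncInvocation([&objectName](auto&&... args){ objectName.methodName(std::forward<decltype(args)>(args)...); }, arg1, arg2, ...); or omit the & sign before objectName (then you will have a copy of the object).

— Krzysztof Sommerfeld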
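Spelled out as compilable code, the lambda-wrapping approach from the last comment might look like the sketch below. The Accumulator class and the time_member_call function are hypothetical and exist only for illustration. The timing lambda has the same shape as the one in the answer, with steady_clock substituted as an assumption. C++14 or later is required.

```cpp
#include <chrono>
#include <utility>

// Generic timing lambda with the same shape as the one in the answer.
auto timeFuncInvocation = [](auto&& func, auto&&... params) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<decltype(func)>(func)(std::forward<decltype(params)>(params)...);
    return std::chrono::steady_clock::now() - start;
};

// Hypothetical class used only to have a member function to time.
struct Accumulator {
    long long total = 0;
    void add(int amount) { total += amount; }
};

// Times acc.add(amount) by wrapping the member call in a lambda that
// captures the object by reference.
inline auto time_member_call(Accumulator& acc, int amount) {
    return timeFuncInvocation(
        [&acc](auto&&... args) { acc.add(std::forward<decltype(args)>(args)...); },
        amount);
}
```

Capturing by reference means the call acts on the caller's object; capturing by value would time the call on a copy instead.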

4

A simple way for older C++, or C:

#include <time.h> // includes clock_t and CLOCKS_PER_SEC

int main() {

    clock_t start, end;

    start = clock();
    // ...code to measure...
    end = clock();

    double duration_sec = double(end-start)/CLOCKS_PER_SEC;
    return 0;
}

The timing precision, in seconds, is 1.0/CLOCKS_PER_SEC.

— Chaplin
source

1

This is not portable. It measures processor time on Linux and wall-clock time on Windows.

— BugSquasher

2

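Here is a very easy-to-use method in C++11.

We can use std::chrono::high_resolution_clock from the <chrono> header.

We can write a function that prints the execution time in an easily readable form.

For example, finding all the prime numbers between 1 and 100 million takes approximately 1 minute and 40 seconds. So the execution time is printed as: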

Execution Time: 1 Minutes, 40 Seconds, 715 MicroSeconds, 715000 NanoSeconds

The code is here:

#include <iostream>
#include <chrono>
#include <string>

using namespace std;
using namespace std::chrono;

typedef high_resolution_clock Clock;
typedef Clock::time_point ClockTime;

void findPrime(long n, string file);
void printExecutionTime(ClockTime start_time, ClockTime end_time);

int main()
{
    long n = long(1E+8);  // N = 100 million

    ClockTime start_time = Clock::now();

    // Write all the prime numbers from 1 to N to the file "prime.txt"
    findPrime(n, "C:\\prime.txt"); 

    ClockTime end_time = Clock::now();

    printExecutionTime(start_time, end_time);
}

void printExecutionTime(ClockTime start_time, ClockTime end_time)
{
    auto execution_time_ns = duration_cast<nanoseconds>(end_time - start_time).count();
    auto execution_time_ms = duration_cast<microseconds>(end_time - start_time).count();
    auto execution_time_sec = duration_cast<seconds>(end_time - start_time).count();
    auto execution_time_min = duration_cast<minutes>(end_time - start_time).count();
    auto execution_time_hour = duration_cast<hours>(end_time - start_time).count();

    cout << "\nExecution Time: ";
    if(execution_time_hour > 0)
    cout << "" << execution_time_hour << " Hours, ";
    if(execution_time_min > 0)
    cout << "" << execution_time_min % 60 << " Minutes, ";
    if(execution_time_sec > 0)
    cout << "" << execution_time_sec % 60 << " Seconds, ";
    if(execution_time_ms > 0)
    cout << "" << execution_time_ms % long(1E+3) << " MicroSeconds, ";
    if(execution_time_ns > 0)
    cout << "" << execution_time_ns % long(1E+6) << " NanoSeconds, ";
}

— Pratik Patil
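The printing code above takes each unit modulo a fixed constant, which skips milliseconds and lets the microsecond and nanosecond fields overlap. An alternative sketch, under the assumption that every unit should appear exactly once, peels off one unit at a time with duration arithmetic. The name format_duration and the output format are choices made for this example.

```cpp
#include <chrono>
#include <sstream>
#include <string>

// Hypothetical helper: splits a duration into non-overlapping units.
inline std::string format_duration(std::chrono::nanoseconds d) {
    using namespace std::chrono;
    // Each step extracts the whole units and subtracts them from the remainder.
    const auto h  = duration_cast<hours>(d);        d -= h;
    const auto m  = duration_cast<minutes>(d);      d -= m;
    const auto s  = duration_cast<seconds>(d);      d -= s;
    const auto ms = duration_cast<milliseconds>(d); d -= ms;
    const auto us = duration_cast<microseconds>(d); d -= us;

    std::ostringstream out;
    out << h.count() << " h, " << m.count() << " min, " << s.count() << " s, "
        << ms.count() << " ms, " << us.count() << " us, " << d.count() << " ns";
    return out.str();
}
```

It can be called with the difference of two time points, for example format_duration(duration_cast<nanoseconds>(end_time - start_time)).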
source

0

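Here is an excellent header-only class template for measuring the elapsed time of a function or any code block:

#ifndef EXECUTION_TIMER_H
#define EXECUTION_TIMER_H

#include <chrono>       // clocks and duration_cast
#include <iostream>     // std::cout
#include <sstream>      // std::ostringstream
#include <type_traits>  // std::conditional_t

template<class Resolution = std::chrono::milliseconds>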
class ExecutionTimer {
public:
    using Clock = std::conditional_t<std::chrono::high_resolution_clock::is_steady,
                                     std::chrono::high_resolution_clock,
                                     std::chrono::steady_clock>;
private:
    const Clock::time_point mStart = Clock::now();

public:
    ExecutionTimer() = default;
    ~ExecutionTimer() {
        const auto end = Clock::now();
        std::ostringstream strStream;
        strStream << "Destructor Elapsed: "
                  << std::chrono::duration_cast<Resolution>( end - mStart ).count()
                  << std::endl;
        std::cout << strStream.str() << std::endl;
    }    

    inline void stop() {
        const auto end = Clock::now();
        std::ostringstream strStream;
        strStream << "Stop Elapsed: "
                  << std::chrono::duration_cast<Resolution>(end - mStart).count()
                  << std::endl;
        std::cout << strStream.str() << std::endl;
    }

}; // ExecutionTimer

#endif // EXECUTION_TIMER_H

Here are some uses of it:

int main() {
    { // empty scope to display ExecutionTimer's destructor's message
         // displayed in milliseconds
         ExecutionTimer<std::chrono::milliseconds> timer;

         // function or code block here

         timer.stop();

    } 

    { // same as above
        ExecutionTimer<std::chrono::microseconds> timer;

        // code block here...

        timer.stop();
    }

    {  // same as above
       ExecutionTimer<std::chrono::nanoseconds> timer;

       // code block here...

       timer.stop();

    }

    {  // same as above
       ExecutionTimer<std::chrono::seconds> timer;

       // code block here...

       timer.stop();

    }              

    return 0;
}

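Since the class is a template, we can easily specify how we want the time to be measured and displayed. This is a very handy utility class template for benchmarking and is very easy to use.

— Francis Cugler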
source

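Personally, I think the stop() member function isn't needed, because the destructor stops the timer for you.

— Casey

@Casey The design of the class doesn't necessarily need the stop function, but it is there for a specific reason. Default-constructing the object before your test code starts the timer. Then, after your test code, you explicitly use the timer object and call its stop method. You have to invoke it manually when you want to stop the timer. The class doesn't take any parameters. Also, if you use this class just as I have shown, you will see that only minimal time elapses between the call to obj.stop and the destructor.

— Francis Cugler

@Casey ... This also allows multiple timer objects within the same scope. Not that one would really need it, but it is another viable option.

— Francis Cugler

This example cannot be compiled in the form shown. The error is related to "no match for operator<< ..."!

— Celdor

@Celdor Do you have the appropriate includes, such as <chrono>?

— Francis Cugler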

0

I recommend using steady_clock, which is guaranteed to be monotonic, unlike high_resolution_clock.

#include <iostream>
#include <chrono>
#include <string>

using namespace std;

unsigned int stopwatch()
{
    static auto start_time = chrono::steady_clock::now();

    auto end_time = chrono::steady_clock::now();
    auto delta    = chrono::duration_cast<chrono::microseconds>(end_time - start_time);

    start_time = end_time;

    return delta.count();
}

int main() {
  stopwatch(); //Start stopwatch
  std::cout << "Hello World!\n";
  cout << stopwatch() << endl; //Time to execute last line
  for (int i=0; i<1000000; i++)
      string s = "ASDFAD";
  cout << stopwatch() << endl; //Time to execute for loop
}

Output:

Hello World!
62
163514

— Gillespie
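The stopwatch() function above keeps its start time in a function-local static, so all callers share one timer. If independent timers are wanted, the same idea can be sketched as a small class. The name Stopwatch and the lap() method are hypothetical.

```cpp
#include <chrono>

// Hypothetical class: each instance is an independent stopwatch.
class Stopwatch {
public:
    Stopwatch() : last_(std::chrono::steady_clock::now()) {}

    // Returns the time since construction or since the previous lap() call,
    // then restarts the measurement.
    std::chrono::microseconds lap() {
        const auto now = std::chrono::steady_clock::now();
        const auto delta =
            std::chrono::duration_cast<std::chrono::microseconds>(now - last_);
        last_ = now;
        return delta;
    }

private:
    std::chrono::steady_clock::time_point last_;
};
```

Usage mirrors the original: construct a Stopwatch before the code of interest and print lap().count() after each section.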
source

0

You can have a simple class that can be used for this kind of measurement.

#include <chrono>
#include <iostream>

class duration_printer {
public:
    duration_printer() : start_(std::chrono::high_resolution_clock::now()) {}
    // Prints the elapsed time when the object goes out of scope.
    ~duration_printer() {
        using namespace std::chrono;
        high_resolution_clock::time_point end = high_resolution_clock::now();
        duration<double> dur = duration_cast<duration<double>>(end - start_);
        std::cout << dur.count() << " seconds" << std::endl;
    }
private:
    // Names containing a double underscore are reserved, hence start_.
    std::chrono::high_resolution_clock::time_point start_;
};

The only thing you need to do is create an object at the beginning of the function:

void veryLongExecutingFunction() {
    duration_printer dc;
    for(int i = 0; i < 100000; ++i) std::cout << "Hello world" << std::endl;
}

int main() {
    veryLongExecutingFunction();
    return 0;
}

And that's it. The class can be modified to fit your requirements.

— 阿瑟德
source

0

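Since none of the provided answers is very accurate or gives reproducible results, I decided to add a link to my code, which has sub-nanosecond precision and scientific statistics.

Note that this only works for measuring code that takes a (very) short time to run (that is, a few clock cycles to a few thousand). If the code runs so long that it is likely to be interrupted by some, heh, interrupt, then it is clearly not possible to give a reproducible and accurate result. The consequence is that the measurement never finishes: it keeps measuring until it is statistically 99.9% sure it has the right answer, which never happens on a machine with other processes running when the code takes too long.

https://github.com/CarloWood/cwds/blob/master/benchmark.h#L40

— Carlo Wood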
source

0

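If you want to save time and lines of code, you can make measuring a function's execution time a one-line macro:

a) Implement a time-measuring class as already suggested above (here is my implementation for Android):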

class MeasureExecutionTime{
private:
    const std::chrono::steady_clock::time_point begin;
    const std::string caller;
public:
    // Initializers listed in member declaration order (begin, then caller).
    MeasureExecutionTime(const std::string& caller):begin(std::chrono::steady_clock::now()),caller(caller){}
    ~MeasureExecutionTime(){
        const auto duration=std::chrono::steady_clock::now()-begin;
        LOGD("ExecutionTime")<<"For "<<caller<<" is "<<std::chrono::duration_cast<std::chrono::milliseconds>(duration).count()<<"ms";
    }
};

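b) Add a convenience macro that uses the current function name as the TAG (using a macro here is important, otherwise __FUNCTION__ would evaluate to MeasureExecutionTime instead of the function you want to measure):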

#ifndef MEASURE_FUNCTION_EXECUTION_TIME
#define MEASURE_FUNCTION_EXECUTION_TIME const MeasureExecutionTime measureExecutionTime(__FUNCTION__);
#endif

c) Write the macro at the beginning of the function you want to measure. Example:

 void DecodeMJPEGtoANativeWindowBuffer(uvc_frame_t* frame_mjpeg,const ANativeWindow_Buffer& nativeWindowBuffer){
        MEASURE_FUNCTION_EXECUTION_TIME
        // Do some time-critical stuff 
}

This results in the following output:

ExecutionTime: For DecodeMJPEGtoANativeWindowBuffer is 54ms

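Note that this approach (like all the other suggested solutions) measures the time between the function being called and the function returning, which is not necessarily the time the CPU spent executing it. However, if you don't give the scheduler any chance to suspend your running code, for example by calling sleep() or similar, there is no difference between the two.

— Constantin Geier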
source