
I have a function that is executed many times in the application lifespan. In order to optimize the code, which solution is better?

Is this:

void foo() {
    static const cv::Mat zeroMat16 = cv::Mat::zeros(rows, cols, CV_16UC1);
    cv::Mat newMat = zeroMat16.clone();
    ...
}

faster than this:

void foo() {
    cv::Mat newMat = cv::Mat::zeros(rows, cols, CV_16UC1);
    ...
}

Or is the efficiency pretty much the same?

Doch88
  • Why don't you profile it? – Miki Oct 19 '20 at 09:58
  • @Miki Because it's less error-prone if someone with more experience than me can give a general, well-reasoned answer, instead of relying on a profiler, whose results depend on many factors and which gives no explanation. – Doch88 Oct 19 '20 at 10:28
  • I do not recommend the first method. Not only does the program use unnecessary memory, it also does not improve performance. – Burak Oct 19 '20 at 10:39
  • @Burak so clone() and zeros() have the same computational cost? – Doch88 Oct 19 '20 at 15:36
  • Most probably. See [the implementation of `zeros`](https://github.com/opencv/opencv/blob/5ac0712cf1f25af2224afd1776ca9476e39f85d8/modules/core/src/matrix_expressions.cpp#L1751). – Burak Oct 19 '20 at 15:44

1 Answer


`clone` calls `copyTo` without a mask in its implementation:

inline Mat Mat::clone() const
{
  Mat m;
  copyTo(m);
  return m;
}

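`copyTo`, in turn, ends up in `memcpy`:

copyTo implementation -> memcpy


`zeros`, on the other hand, builds a matrix expression:

zeros implementation -> makeExpr -> MatExpr

The `MatExpr` is then converted to a `Mat`:

MatExpr::operator Mat() -> assign

`assign` calls `m = Scalar();`, which ends up in `memset`:

Mat::operator= -> memset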
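To make the comparison concrete, here is a minimal sketch of what the two call chains boil down to once the OpenCV layers are peeled away. It uses a plain heap buffer instead of `cv::Mat`; the names `Buffer16`, `allocate`, `cloneLike` and `zerosLike` are made up for illustration and are not OpenCV functions. It also assumes contiguous data, which is the case for a freshly created matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

// Hypothetical stand-in for the pixel data of a CV_16UC1 matrix.
using Buffer16 = std::unique_ptr<std::uint16_t[]>;

// Both paths start by allocating an uninitialized destination buffer.
Buffer16 allocate(std::size_t count) {
    return Buffer16(new std::uint16_t[count]);
}

// Roughly what clone() reduces to: allocate, then memcpy from the source.
Buffer16 cloneLike(const std::uint16_t* src, std::size_t count) {
    assert(src != nullptr);
    Buffer16 dst = allocate(count);
    std::memcpy(dst.get(), src, count * sizeof(std::uint16_t));
    return dst;
}

// Roughly what zeros() reduces to: allocate, then memset to zero.
Buffer16 zerosLike(std::size_t count) {
    Buffer16 dst = allocate(count);
    std::memset(dst.get(), 0, count * sizeof(std::uint16_t));
    return dst;
}
```

Both functions allocate the same amount of memory per call. The `cloneLike` path additionally needs a source buffer that stays alive for the whole program (the `static` matrix in the question) and has to read it on every call, while `zerosLike` only writes.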


`memcpy` is only slightly slower than `memset`

(by roughly 0.2 s over 1 GB of data),

so `zeros` comes out slightly faster than `copyTo`.
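If you want to check the `memcpy`/`memset` gap on your own machine, a rough timing harness could look like the sketch below. The helper names `timeMemset` and `timeMemcpy` are hypothetical, and the numbers will vary with hardware, compiler and optimization flags, so treat this as a starting point rather than a proper benchmark.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <memory>

// Seconds taken by `reps` memset calls over a buffer of `bytes` bytes.
double timeMemset(std::size_t bytes, int reps) {
    assert(bytes > 0);
    std::unique_ptr<unsigned char[]> dst(new unsigned char[bytes]);
    std::memset(dst.get(), 1, bytes);  // warm-up: touch every page once
    volatile unsigned char sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        std::memset(dst.get(), 0, bytes);
        sink = dst[bytes - 1];  // read back so the write is not optimized away
    }
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double>(stop - start).count();
}

// Seconds taken by `reps` memcpy calls from a zeroed source of `bytes` bytes.
double timeMemcpy(std::size_t bytes, int reps) {
    assert(bytes > 0);
    std::unique_ptr<unsigned char[]> src(new unsigned char[bytes]);
    std::unique_ptr<unsigned char[]> dst(new unsigned char[bytes]);
    std::memset(src.get(), 0, bytes);  // the "template" of zeros
    std::memset(dst.get(), 1, bytes);  // warm-up: touch every page once
    volatile unsigned char sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        std::memcpy(dst.get(), src.get(), bytes);
        sink = dst[bytes - 1];  // read back so the copy is not optimized away
    }
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling, for example, `timeMemset(rows * cols * 2, 1000)` and `timeMemcpy(rows * cols * 2, 1000)` with your real matrix dimensions (2 bytes per `CV_16UC1` element) gives a figure closer to your use case than a 1 GB test.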


Conclusion:

By using `zeros` instead of `clone`,

  • the unnecessary memory allocation for the static matrix is avoided;
  • performance improves very slightly.
Burak
  • Can this depend on the device/platform/OS? – Micka Oct 19 '20 at 20:06
  • @Micka Judging from the source code, the steps leading up to the memcpy/memset calls are platform independent. The `copyTo` function has different `memcpy` implementations [for CUDA](https://github.com/opencv/opencv/blob/5ac0712cf1f25af2224afd1776ca9476e39f85d8/modules/core/src/cuda/gpu_mat.cu#L224), or [takes advantage of IPP](https://github.com/opencv/opencv/blob/5ac0712cf1f25af2224afd1776ca9476e39f85d8/modules/core/src/copy.cpp#L298) if possible. I didn't see any such thing for `zeros`. – Burak Oct 19 '20 at 20:31
  • Are memcpy and memset independent of the hardware, operating system and compiler they are used on? – Micka Oct 19 '20 at 21:33
  • @Micka That is beyond my knowledge. I can only say that these are the fastest functions for dynamically allocated memory. – Burak Oct 19 '20 at 21:39