1 |
|
---|
2 | namespace Eigen {
|
---|
3 |
|
---|
4 | /** \page TopicWritingEfficientProductExpression Writing efficient matrix product expressions
|
---|
5 |
|
---|
6 | In general achieving good performance with Eigen does no require any special effort:
|
---|
7 | simply write your expressions in the most high level way. This is especially true
|
---|
8 | for small fixed size matrices. For large matrices, however, it might be useful to
|
---|
9 | take some care when writing your expressions in order to minimize useless evaluations
|
---|
10 | and optimize the performance.
|
---|
11 | In this page we will give a brief overview of the Eigen's internal mechanism to simplify
|
---|
12 | and evaluate complex product expressions, and discuss the current limitations.
|
---|
13 | In particular we will focus on expressions matching level 2 and 3 BLAS routines, i.e,
|
---|
14 | all kind of matrix products and triangular solvers.
|
---|
15 |
|
---|
16 | Indeed, in Eigen we have implemented a set of highly optimized routines which are very similar
|
---|
17 | to BLAS's ones. Unlike BLAS, those routines are made available to user via a high level and
|
---|
18 | natural API. Each of these routines can compute in a single evaluation a wide variety of expressions.
|
---|
19 | Given an expression, the challenge is then to map it to a minimal set of routines.
|
---|
20 | As explained latter, this mechanism has some limitations, and knowing them will allow
|
---|
21 | you to write faster code by making your expressions more Eigen friendly.
|
---|
22 |
|
---|
23 | \section GEMM General Matrix-Matrix product (GEMM)
|
---|
24 |
|
---|
25 | Let's start with the most common primitive: the matrix product of general dense matrices.
|
---|
26 | In the BLAS world this corresponds to the GEMM routine. Our equivalent primitive can
|
---|
27 | perform the following operation:
|
---|
28 | \f$ C.noalias() += \alpha op1(A) op2(B) \f$
|
---|
29 | where A, B, and C are column and/or row major matrices (or sub-matrices),
|
---|
30 | alpha is a scalar value, and op1, op2 can be transpose, adjoint, conjugate, or the identity.
|
---|
31 | When Eigen detects a matrix product, it analyzes both sides of the product to extract a
|
---|
32 | unique scalar factor alpha, and for each side, its effective storage order, shape, and conjugation states.
|
---|
33 | More precisely each side is simplified by iteratively removing trivial expressions such as scalar multiple,
|
---|
34 | negation and conjugation. Transpose and Block expressions are not evaluated and they only modify the storage order
|
---|
35 | and shape. All other expressions are immediately evaluated.
|
---|
36 | For instance, the following expression:
|
---|
37 | \code m1.noalias() -= s4 * (s1 * m2.adjoint() * (-(s3*m3).conjugate()*s2)) \endcode
|
---|
38 | is automatically simplified to:
|
---|
39 | \code m1.noalias() += (s1*s2*conj(s3)*s4) * m2.adjoint() * m3.conjugate() \endcode
|
---|
40 | which exactly matches our GEMM routine.
|
---|
41 |
|
---|
42 | \subsection GEMM_Limitations Limitations
|
---|
43 | Unfortunately, this simplification mechanism is not perfect yet and not all expressions which could be
|
---|
44 | handled by a single GEMM-like call are correctly detected.
|
---|
45 | <table class="manual" style="width:100%">
|
---|
46 | <tr>
|
---|
47 | <th>Not optimal expression</th>
|
---|
48 | <th>Evaluated as</th>
|
---|
49 | <th>Optimal version (single evaluation)</th>
|
---|
50 | <th>Comments</th>
|
---|
51 | </tr>
|
---|
52 | <tr>
|
---|
53 | <td>\code
|
---|
54 | m1 += m2 * m3; \endcode</td>
|
---|
55 | <td>\code
|
---|
56 | temp = m2 * m3;
|
---|
57 | m1 += temp; \endcode</td>
|
---|
58 | <td>\code
|
---|
59 | m1.noalias() += m2 * m3; \endcode</td>
|
---|
60 | <td>Use .noalias() to tell Eigen the result and right-hand-sides do not alias.
|
---|
61 | Otherwise the product m2 * m3 is evaluated into a temporary.</td>
|
---|
62 | </tr>
|
---|
63 | <tr class="alt">
|
---|
64 | <td></td>
|
---|
65 | <td></td>
|
---|
66 | <td>\code
|
---|
67 | m1.noalias() += s1 * (m2 * m3); \endcode</td>
|
---|
68 | <td>This is a special feature of Eigen. Here the product between a scalar
|
---|
69 | and a matrix product does not evaluate the matrix product but instead it
|
---|
70 | returns a matrix product expression tracking the scalar scaling factor. <br>
|
---|
71 | Without this optimization, the matrix product would be evaluated into a
|
---|
72 | temporary as in the next example.</td>
|
---|
73 | </tr>
|
---|
74 | <tr>
|
---|
75 | <td>\code
|
---|
76 | m1.noalias() += (m2 * m3).adjoint(); \endcode</td>
|
---|
77 | <td>\code
|
---|
78 | temp = m2 * m3;
|
---|
79 | m1 += temp.adjoint(); \endcode</td>
|
---|
80 | <td>\code
|
---|
81 | m1.noalias() += m3.adjoint()
|
---|
82 | * * m2.adjoint(); \endcode</td>
|
---|
83 | <td>This is because the product expression has the EvalBeforeNesting bit which
|
---|
84 | enforces the evaluation of the product by the Tranpose expression.</td>
|
---|
85 | </tr>
|
---|
86 | <tr class="alt">
|
---|
87 | <td>\code
|
---|
88 | m1 = m1 + m2 * m3; \endcode</td>
|
---|
89 | <td>\code
|
---|
90 | temp = m2 * m3;
|
---|
91 | m1 = m1 + temp; \endcode</td>
|
---|
92 | <td>\code m1.noalias() += m2 * m3; \endcode</td>
|
---|
93 | <td>Here there is no way to detect at compile time that the two m1 are the same,
|
---|
94 | and so the matrix product will be immediately evaluated.</td>
|
---|
95 | </tr>
|
---|
96 | <tr>
|
---|
97 | <td>\code
|
---|
98 | m1.noalias() = m4 + m2 * m3; \endcode</td>
|
---|
99 | <td>\code
|
---|
100 | temp = m2 * m3;
|
---|
101 | m1 = m4 + temp; \endcode</td>
|
---|
102 | <td>\code
|
---|
103 | m1 = m4;
|
---|
104 | m1.noalias() += m2 * m3; \endcode</td>
|
---|
105 | <td>First of all, here the .noalias() in the first expression is useless because
|
---|
106 | m2*m3 will be evaluated anyway. However, note how this expression can be rewritten
|
---|
107 | so that no temporary is required. (tip: for very small fixed size matrix
|
---|
108 | it is slighlty better to rewrite it like this: m1.noalias() = m2 * m3; m1 += m4;</td>
|
---|
109 | </tr>
|
---|
110 | <tr class="alt">
|
---|
111 | <td>\code
|
---|
112 | m1.noalias() += (s1*m2).block(..) * m3; \endcode</td>
|
---|
113 | <td>\code
|
---|
114 | temp = (s1*m2).block(..);
|
---|
115 | m1 += temp * m3; \endcode</td>
|
---|
116 | <td>\code
|
---|
117 | m1.noalias() += s1 * m2.block(..) * m3; \endcode</td>
|
---|
118 | <td>This is because our expression analyzer is currently not able to extract trivial
|
---|
119 | expressions nested in a Block expression. Therefore the nested scalar
|
---|
120 | multiple cannot be properly extracted.</td>
|
---|
121 | </tr>
|
---|
122 | </table>
|
---|
123 |
|
---|
124 | Of course all these remarks hold for all other kind of products involving triangular or selfadjoint matrices.
|
---|
125 |
|
---|
126 | */
|
---|
127 |
|
---|
128 | }
|
---|