Bench Template Library

****************************************
Introduction :

The aim of this project is to compare the performance
of available numerical libraries. The code is designed
to be as generic and modular as possible. Thus, adding new
numerical libraries or new numerical tests should
require minimal effort.

*****************************************

Installation :

BTL uses cmake / ctest:

1 - create a build directory:

  $ mkdir build
  $ cd build

2 - configure:

  $ ccmake ..

3 - run the bench using ctest:

  $ ctest -V

You can restrict the benchmarks to the libraries matching a given regular expression:
  ctest -V -R <regexp>
For instance:
  ctest -V -R eigen2

You can also select a given set of actions by defining the environment variable BTL_CONFIG this way:
  BTL_CONFIG="-a action1{:action2}*" ctest -V
An example:
  BTL_CONFIG="-a axpy:vector_matrix:trisolve:ata" ctest -V -R eigen2

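The action list in BTL_CONFIG is a colon-separated list given after "-a". The sketch below shows one way such a value could be parsed and used to filter actions. All function names are hypothetical and are not BTL's own configuration code. The substring matching rule (so that "axpy" also selects a name like "axpy_eigen2") is an assumption.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split "axpy:vector_matrix:trisolve:ata" on ':'.
std::vector<std::string> split_action_list(const std::string& list) {
  std::vector<std::string> names;
  std::stringstream ss(list);
  std::string item;
  while (std::getline(ss, item, ':'))
    if (!item.empty()) names.push_back(item);
  return names;
}

// Hypothetical parser for a BTL_CONFIG value of the form "-a action1{:action2}*".
// Tokens other than "-a <list>" are ignored in this sketch.
std::vector<std::string> selected_actions(const std::string& config) {
  std::stringstream ss(config);
  std::string token;
  while (ss >> token) {
    std::string list;
    if (token == "-a" && ss >> list) return split_action_list(list);
  }
  return {};  // no "-a": every action is selected
}

// Reads the environment variable; an unset variable means no filtering.
std::vector<std::string> selected_actions_from_env() {
  const char* env = std::getenv("BTL_CONFIG");
  return env ? selected_actions(env) : std::vector<std::string>{};
}

// Assumption: an action is kept when its name contains one of the selected
// names, so "axpy" would also match a name such as "axpy_eigen2".
bool is_selected(const std::vector<std::string>& selected, const std::string& action_name) {
  if (selected.empty()) return true;
  for (const std::string& s : selected)
    if (action_name.find(s) != std::string::npos) return true;
  return false;
}
```

For example, with BTL_CONFIG="-a axpy:ata" an action named "axpy_eigen2" is run. An action named "matrix_matrix_eigen2" is skipped.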
Finally, if bench results already exist (the bench*.dat files), the new results are merged with them, keeping the best result for each matrix size. To overwrite the previous results instead, add the "--overwrite" option:
  BTL_CONFIG="-a axpy:vector_matrix:trisolve:ata --overwrite" ctest -V -R eigen2

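The merge rule described above can be sketched as follows. This assumes each bench*.dat file reduces to a table mapping size to measured MFLOPS, where "best" means the highest MFLOPS. Reading and writing the actual files is not shown, and the names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <map>

// Hypothetical in-memory form of one bench*.dat file: size -> measured MFLOPS.
typedef std::map<int, double> Results;

// Keep the best (highest) MFLOPS for every size present in either run.
// With overwrite set, the previous results are simply discarded.
Results merge_results(const Results& previous, const Results& fresh, bool overwrite) {
  if (overwrite) return fresh;
  Results merged = previous;
  for (const auto& entry : fresh) {
    auto it = merged.find(entry.first);
    if (it == merged.end())
      merged.insert(entry);                              // size not measured before
    else
      it->second = std::max(it->second, entry.second);  // keep the faster run
  }
  return merged;
}
```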
4 - analyze the results. Separate data files (.dat) are produced in each library directory (libs/*).
  If gnuplot is available, choose a directory name in the data directory to store the results and type:
    $ cd data
    $ mkdir my_directory
    $ cp ../libs/*/*.dat my_directory
  Build the data utilities in this (data) directory:
    $ make
  Then you can look at the raw data:
    $ ./go_mean my_directory
  or smooth the data first:
    $ ./smooth_all.sh my_directory
    $ ./go_mean my_directory_smooth

*************************************************

Files and directories :

 generic_bench : all the bench sources common to all libraries

 actions : sources for the different action wrappers (axpy, matrix-matrix product, ...) to be tested

 libs/* : bench sources specific to each tested library

 machine_dep : directory used to store machine-specific Makefile.in files

 data : directory used to store gnuplot scripts and data analysis utilities

**************************************************

Principles : the code modularity is achieved by defining two concepts :

****** Action concept : This is a class defining which kind
  of test must be performed (e.g. a matrix_vector_product).
  An Action should define the following methods :

*** Constructor taking the size of the problem (matrix or vector size) as an argument
    Action action(size);
*** initialize : initializes the calculation (e.g. initializes the matrix and vector arguments)
    action.initialize();
*** calculate : actually launches the calculation to be benchmarked
    action.calculate();
*** nb_op_base() : returns the number of floating-point operations performed by calculate (used to compute the MFLOPS)
*** name() : returns the name of the action (std::string)

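Here is a minimal Action following the concept above, using axpy (y = a*x + y) as the test. It is a sketch, not one of BTL's own action wrappers. SimpleInterface, the chosen initial values and the result() accessor are hypothetical additions made so the example is self-contained and checkable.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal interface for the example: plain std::vector storage.
struct SimpleInterface {
  typedef double real_type;
  typedef std::vector<real_type> gene_vector;
  static std::string name() { return "simple"; }
  static void axpy(real_type coef, const gene_vector& X, gene_vector& Y, int N) {
    for (int i = 0; i < N; ++i) Y[i] += coef * X[i];
  }
};

// Sketch of an Action templated on an interface, benchmarking y = a*x + y.
template <class Interface>
class Action_axpy_sketch {
 public:
  typedef typename Interface::real_type real_type;
  typedef typename Interface::gene_vector gene_vector;

  // Constructor: store the problem size and allocate the arguments.
  explicit Action_axpy_sketch(int size)
      : size_(size), coef_(1.0), x_(size), y_(size) {}

  // initialize: (re)set the arguments before a timed run.
  void initialize() {
    for (int i = 0; i < size_; ++i) {
      x_[i] = static_cast<real_type>(i);
      y_[i] = 1.0;
    }
  }

  // calculate: the operation that is actually timed.
  void calculate() { Interface::axpy(coef_, x_, y_, size_); }

  // nb_op_base: one multiply and one add per element.
  double nb_op_base() const { return 2.0 * size_; }

  std::string name() const { return "axpy_" + Interface::name(); }

  // Hypothetical accessor, only used to check the result.
  const gene_vector& result() const { return y_; }

 private:
  int size_;
  real_type coef_;
  gene_vector x_, y_;
};
```

Usage mirrors the list above: construct with a size, call initialize(), then calculate(). Divide nb_op_base() by the measured time to get a flop rate.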
****** Interface concept : This is a class or namespace defining how to use a given library and
  its specific containers (matrix and vector). Up to now, an interface should define the following types:

*** real_type : kind of float to be used (float or double)
*** stl_vector : must correspond to std::vector<real_type>
*** stl_matrix : must correspond to std::vector<stl_vector>
*** gene_vector : the vector type for this interface --> e.g. (real_type *) for the C_interface
*** gene_matrix : the matrix type for this interface --> e.g. (gene_vector *) for the C_interface

+ the following common methods:

*** free_matrix(gene_matrix & A, int N) : deallocation of an N sized gene_matrix A
*** free_vector(gene_vector & B) : deallocation of a gene_vector B
*** matrix_from_stl(gene_matrix & A, stl_matrix & A_stl) : copy the content of an stl_matrix A_stl into a gene_matrix A.
     The allocation of A is done in this function.
*** vector_from_stl(gene_vector & B, stl_vector & B_stl) : copy the content of an stl_vector B_stl into a gene_vector B.
     The allocation of B is done in this function.
*** matrix_to_stl(gene_matrix & A, stl_matrix & A_stl) : copy the content of a gene_matrix A into an stl_matrix A_stl.
     The size of A_stl must correspond to the size of A.
*** vector_to_stl(gene_vector & A, stl_vector & A_stl) : copy the content of a gene_vector A into an stl_vector A_stl.
     The size of A_stl must correspond to the size of A.
*** copy_matrix(gene_matrix & source, gene_matrix & cible, int N) : copy the content of source into cible (the target).
     Both source and cible must be sized NxN.
*** copy_vector(gene_vector & source, gene_vector & cible, int N) : copy the content of source into cible (the target).
     Both source and cible must be sized N.

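The types and common methods listed above can be sketched for a C-style interface. This assumes gene_vector is real_type* and gene_matrix is gene_vector*, as in the example given for the C_interface. It also assumes gene_matrix holds one pointer per column, so A[j][i] is row i of column j. The storage order, the use of new[]/delete[] and the struct name are assumptions and not taken from BTL's own interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C-style interface: gene_matrix is an array of column pointers.
template <class real>
struct c_style_interface_sketch {
  typedef real real_type;
  typedef std::vector<real_type> stl_vector;
  typedef std::vector<stl_vector> stl_matrix;
  typedef real_type* gene_vector;
  typedef gene_vector* gene_matrix;

  static void free_matrix(gene_matrix& A, int N) {
    for (int j = 0; j < N; ++j) delete[] A[j];
    delete[] A;
    A = nullptr;
  }

  static void free_vector(gene_vector& B) {
    delete[] B;
    B = nullptr;
  }

  // Allocates A, then copies A_stl into it (A_stl[j] is treated as column j).
  static void matrix_from_stl(gene_matrix& A, stl_matrix& A_stl) {
    const int N = static_cast<int>(A_stl.size());
    A = new gene_vector[N];
    for (int j = 0; j < N; ++j) {
      const int M = static_cast<int>(A_stl[j].size());
      A[j] = new real_type[M];
      for (int i = 0; i < M; ++i) A[j][i] = A_stl[j][i];
    }
  }

  // Allocates B, then copies B_stl into it.
  static void vector_from_stl(gene_vector& B, stl_vector& B_stl) {
    const int N = static_cast<int>(B_stl.size());
    B = new real_type[N];
    for (int i = 0; i < N; ++i) B[i] = B_stl[i];
  }

  // A_stl must already have the same shape as A.
  static void matrix_to_stl(gene_matrix& A, stl_matrix& A_stl) {
    for (std::size_t j = 0; j < A_stl.size(); ++j)
      for (std::size_t i = 0; i < A_stl[j].size(); ++i) A_stl[j][i] = A[j][i];
  }

  // A_stl must already have the same size as A.
  static void vector_to_stl(gene_vector& A, stl_vector& A_stl) {
    for (std::size_t i = 0; i < A_stl.size(); ++i) A_stl[i] = A[i];
  }

  static void copy_matrix(gene_matrix& source, gene_matrix& cible, int N) {
    for (int j = 0; j < N; ++j)
      for (int i = 0; i < N; ++i) cible[j][i] = source[j][i];
  }

  static void copy_vector(gene_vector& source, gene_vector& cible, int N) {
    for (int i = 0; i < N; ++i) cible[i] = source[i];
  }
};
```

The stl_* types serve as a library-neutral exchange format. The bench can build its inputs once as std::vector data, then convert them into each library's own containers.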
and the following methods, corresponding to the actions one wants to benchmark:

*** matrix_vector_product(const gene_matrix & A, const gene_vector & B, gene_vector & X, int N) : X = A * B
*** matrix_matrix_product(const gene_matrix & A, const gene_matrix & B, gene_matrix & X, int N) : X = A * B
*** ata_product(const gene_matrix & A, gene_matrix & X, int N) : X = transpose(A) * A
*** aat_product(const gene_matrix & A, gene_matrix & X, int N) : X = A * transpose(A)
*** axpy(real coef, const gene_vector & X, gene_vector & Y, int N) : Y = coef * X + Y

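These are naive reference versions of the benchmarked methods, using the same C-style types. They assume gene_matrix holds one pointer per column, so A[j][i] is the element at row i, column j. An interface for a real library would call that library (BLAS, Eigen, ...) instead of these loops. The namespace name is hypothetical.

```cpp
#include <cassert>

namespace naive_ops_sketch {

typedef double real;
typedef real* gene_vector;
typedef gene_vector* gene_matrix;  // array of N column pointers

// X = A * B
void matrix_vector_product(const gene_matrix& A, const gene_vector& B, gene_vector& X, int N) {
  for (int i = 0; i < N; ++i) {
    real s = 0;
    for (int j = 0; j < N; ++j) s += A[j][i] * B[j];
    X[i] = s;
  }
}

// X = A * B
void matrix_matrix_product(const gene_matrix& A, const gene_matrix& B, gene_matrix& X, int N) {
  for (int j = 0; j < N; ++j)
    for (int i = 0; i < N; ++i) {
      real s = 0;
      for (int k = 0; k < N; ++k) s += A[k][i] * B[j][k];
      X[j][i] = s;
    }
}

// X = transpose(A) * A : entry (i, j) is the dot product of columns i and j.
void ata_product(const gene_matrix& A, gene_matrix& X, int N) {
  for (int j = 0; j < N; ++j)
    for (int i = 0; i < N; ++i) {
      real s = 0;
      for (int k = 0; k < N; ++k) s += A[i][k] * A[j][k];
      X[j][i] = s;
    }
}

// X = A * transpose(A) : entry (i, j) is the dot product of rows i and j.
void aat_product(const gene_matrix& A, gene_matrix& X, int N) {
  for (int j = 0; j < N; ++j)
    for (int i = 0; i < N; ++i) {
      real s = 0;
      for (int k = 0; k < N; ++k) s += A[k][i] * A[k][j];
      X[j][i] = s;
    }
}

// Y = coef * X + Y
void axpy(real coef, const gene_vector& X, gene_vector& Y, int N) {
  for (int i = 0; i < N; ++i) Y[i] += coef * X[i];
}

}  // namespace naive_ops_sketch
```

Reference loops like these are also handy for checking a library's results on small sizes before timing it.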
The bench algorithm (generic_bench/bench.hh) is templated with an action, itself templated with
an interface. A typical main.cpp source stored in a given library directory libs/A_LIB
looks like :

bench< AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ;

This call produces an XY data file containing the measured MFLOPS as a function of the size, for 50
sizes between 10 and 1000.

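The text does not say how the 50 sizes are distributed between 10 and 1000, so the sketch below assumes logarithmic spacing. generic_bench/bench.hh may space them differently. The sketch also shows how one MFLOPS value follows from an action's nb_op_base() and a measured time. Both helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: nb_point sizes spread logarithmically between
// size_min and size_max, both included.
std::vector<int> log_spaced_sizes(int size_min, int size_max, int nb_point) {
  std::vector<int> sizes;
  if (nb_point == 1) {
    sizes.push_back(size_min);
    return sizes;
  }
  const double step = std::log(double(size_max) / size_min) / (nb_point - 1);
  for (int i = 0; i < nb_point; ++i)
    sizes.push_back(static_cast<int>(std::lround(size_min * std::exp(step * i))));
  return sizes;
}

// One XY point: operation count (nb_op_base) divided by time in seconds, in MFLOPS.
double mflops(double nb_op_base, double seconds) {
  return nb_op_base / seconds / 1e6;
}
```

Logarithmic spacing puts more points at small sizes, where cache effects make performance change fastest. That is one reason to prefer it over an even linear grid.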
This algorithm can be adapted by providing a given Perf_Analyzer object, which determines how the time
measurements are done. For example, the X86_Perf_Analyzer uses the rdtsc assembly instruction and provides
a very fast and accurate (but less portable) timing method. The default is the Portable_Perf_Analyzer,
so

bench< AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ;

is equivalent to

bench< Portable_Perf_Analyzer,AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ;

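One way to get this equivalence is a function template taking the analyzer as a template template parameter, plus a one-argument overload that forwards to the portable analyzer. The sketch below is written under that assumption, for a single size rather than a size range. The analyzer times one calculate() call with std::chrono::steady_clock. The names with the _sketch suffix are hypothetical, and BTL's Portable_Perf_Analyzer may time differently (for example by repeating runs).

```cpp
#include <cassert>
#include <chrono>

// Hypothetical portable analyzer: times one calculate() call with std::chrono.
template <class Action>
class Portable_Perf_Analyzer_sketch {
 public:
  // Returns the measured MFLOPS for one problem size.
  double eval_mflops(int size) {
    Action action(size);
    action.initialize();
    auto start = std::chrono::steady_clock::now();
    action.calculate();
    auto stop = std::chrono::steady_clock::now();
    double seconds = std::chrono::duration<double>(stop - start).count();
    if (seconds <= 0) seconds = 1e-9;  // guard against a zero-length timing
    return action.nb_op_base() / seconds / 1e6;
  }
};

// Analyzer chosen explicitly as a template template parameter.
template <template <class> class Perf_Analyzer, class Action>
double bench_sketch(int size) {
  Perf_Analyzer<Action> analyzer;
  return analyzer.eval_mflops(size);
}

// Default analyzer: this overload forwards to the portable one, which is
// what makes the one-argument and two-argument forms equivalent.
template <class Action>
double bench_sketch(int size) {
  return bench_sketch<Portable_Perf_Analyzer_sketch, Action>(size);
}
```

A single timed call is noisy for small sizes. A more careful analyzer would repeat calculate() until the elapsed time is measurable and keep the best timing.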
If your system supports it, we suggest using a mixed implementation (X86_Perf_Analyzer + Portable_Perf_Analyzer):
replace
     bench<Portable_Perf_Analyzer,Action>(size_min,size_max,nb_point);
with
     bench<Mixed_Perf_Analyzer,Action>(size_min,size_max,nb_point);
in generic_bench/bench.hh.