| 1 | Bench Template Library
|
|---|
| 2 |
|
|---|
| 3 | ****************************************
|
|---|
| 4 | Introduction :
|
|---|
| 5 |
|
|---|
| 6 | The aim of this project is to compare the performance
|
|---|
| 7 | of available numerical libraries. The code is designed to be
|
|---|
| 8 | as generic and modular as possible. Thus, adding new
|
|---|
| 9 | numerical libraries or new numerical tests should
|
|---|
| 10 | require minimal effort.
|
|---|
| 11 |
|
|---|
| 12 |
|
|---|
| 13 | *****************************************
|
|---|
| 14 |
|
|---|
| 15 | Installation :
|
|---|
| 16 |
|
|---|
| 17 | BTL uses cmake / ctest:
|
|---|
| 18 |
|
|---|
| 19 | 1 - create a build directory:
|
|---|
| 20 |
|
|---|
| 21 | $ mkdir build
|
|---|
| 22 | $ cd build
|
|---|
| 23 |
|
|---|
| 24 | 2 - configure:
|
|---|
| 25 |
|
|---|
| 26 | $ ccmake ..
|
|---|
| 27 |
|
|---|
| 28 | 3 - run the bench using ctest:
|
|---|
| 29 |
|
|---|
| 30 | $ ctest -V
|
|---|
| 31 |
|
|---|
| 32 | You can run the benchmarks only on libraries matching a given regular expression:
|
|---|
| 33 | ctest -V -R <regexp>
|
|---|
| 34 | For instance:
|
|---|
| 35 | ctest -V -R eigen2
|
|---|
| 36 |
|
|---|
| 37 | You can also select a given set of actions by defining the environment variable BTL_CONFIG this way:
|
|---|
| 38 | BTL_CONFIG="-a action1{:action2}*" ctest -V
|
|---|
| 39 | An example:
|
|---|
| 40 | BTL_CONFIG="-a axpy:vector_matrix:trisolve:ata" ctest -V -R eigen2
|
|---|
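The "-a" argument above is a colon-separated list of action names. As a minimal sketch of how such a list can be split (the helper name split_actions is hypothetical and is not taken from the BTL sources; how BTL itself parses BTL_CONFIG is not shown here):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split the value given to "-a" on ':'.
// Empty fields (e.g. from "axpy::ata") are skipped.
std::vector<std::string> split_actions(const std::string& spec) {
    std::vector<std::string> actions;
    std::istringstream in(spec);
    std::string name;
    while (std::getline(in, name, ':')) {
        if (!name.empty()) actions.push_back(name);
    }
    return actions;
}
```

With the example above, split_actions("axpy:vector_matrix:trisolve:ata") yields the four names axpy, vector_matrix, trisolve and ata.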
| 41 |
|
|---|
| 42 | Finally, if bench results already exist (the bench*.dat files), the new results are merged with them by keeping the best value for each matrix size. If you want to overwrite the previous results instead, simply add the "--overwrite" option:
|
|---|
| 43 | BTL_CONFIG="-a axpy:vector_matrix:trisolve:ata --overwrite" ctest -V -R eigen2
|
|---|
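The merge rule described above can be sketched as follows. This is only an illustration under two assumptions: "best" is taken to mean the highest mflops value, and "--overwrite" is taken to mean that the previous results are discarded entirely. The names bench_results and merge_results are hypothetical.

```cpp
#include <algorithm>
#include <map>

// Hypothetical in-memory form of a bench*.dat file: matrix size -> mflops.
typedef std::map<int, double> bench_results;

// Merge fresh results into previous ones.
// Assumption: "best" means the highest mflops for a given size.
bench_results merge_results(const bench_results& previous,
                            const bench_results& fresh,
                            bool overwrite) {
    if (overwrite) return fresh;  // assumed meaning of --overwrite
    bench_results merged = previous;
    for (bench_results::const_iterator it = fresh.begin(); it != fresh.end(); ++it) {
        bench_results::iterator found = merged.find(it->first);
        if (found == merged.end()) {
            merged.insert(*it);  // size not measured before
        } else {
            found->second = std::max(found->second, it->second);
        }
    }
    return merged;
}
```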
| 44 |
|
|---|
| 45 | 4 - analyze the results. Several data files (.dat) are produced in each libs/* directory.
|
|---|
| 46 | If gnuplot is available, choose a directory name in the data directory to store the results and type:
|
|---|
| 47 | $ cd data
|
|---|
| 48 | $ mkdir my_directory
|
|---|
| 49 | $ cp ../libs/*/*.dat my_directory
|
|---|
| 50 | Build the data utilities in this (data) directory
|
|---|
| 51 | make
|
|---|
| 52 | Then you can look at the raw data:
|
|---|
| 53 | go_mean my_directory
|
|---|
| 54 | or smooth the data first :
|
|---|
| 55 | smooth_all.sh my_directory
|
|---|
| 56 | go_mean my_directory_smooth
|
|---|
| 57 |
|
|---|
| 58 |
|
|---|
| 59 | *************************************************
|
|---|
| 60 |
|
|---|
| 61 | Files and directories :
|
|---|
| 62 |
|
|---|
| 63 | generic_bench : all the bench sources common to all libraries
|
|---|
| 64 |
|
|---|
| 65 | actions : sources for different action wrappers (axpy, matrix-matrix product) to be tested.
|
|---|
| 66 |
|
|---|
| 67 | libs/* : bench sources specific to each tested library.
|
|---|
| 68 |
|
|---|
| 69 | machine_dep : directory used to store machine specific Makefile.in
|
|---|
| 70 |
|
|---|
| 71 | data : directory used to store gnuplot scripts and data analysis utilities
|
|---|
| 72 |
|
|---|
| 73 | **************************************************
|
|---|
| 74 |
|
|---|
| 75 | Principles : the code modularity is achieved by defining two concepts :
|
|---|
| 76 |
|
|---|
| 77 | ****** Action concept : This is a class defining which kind
|
|---|
| 78 | of test must be performed (e.g. a matrix_vector_product).
|
|---|
| 79 | An Action should define the following methods :
|
|---|
| 80 |
|
|---|
| 81 | *** Ctor using the size of the problem (matrix or vector size) as an argument
|
|---|
| 82 | Action action(size);
|
|---|
| 83 | *** initialize : this method initializes the calculation (e.g. initializes the matrix and vector arguments)
|
|---|
| 84 | action.initialize();
|
|---|
| 85 | *** calculate : this method actually launches the calculation to be benchmarked
|
|---|
| 86 | action.calculate();
|
|---|
| 87 | *** nb_op_base() : this method returns the complexity of the calculate method (allowing the mflops evaluation)
|
|---|
| 88 | *** name() : this method returns the name of the action (std::string)
|
|---|
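As an illustration of the Action concept listed above, here is a minimal sketch of an axpy action. It is not the action shipped with BTL: toy_interface is a hypothetical stand-in reduced to what the sketch needs, and result() is an extra accessor added only so the outcome can be inspected.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an interface (assumption, not a BTL class).
struct toy_interface {
    typedef double real_type;
    typedef std::vector<real_type> gene_vector;
    static void axpy(real_type coef, const gene_vector& X, gene_vector& Y, int N) {
        for (int i = 0; i < N; ++i) Y[i] += coef * X[i];
    }
};

template <class Interface>
class action_axpy_sketch {
public:
    // Ctor using the size of the problem as an argument.
    explicit action_axpy_sketch(int size)
        : _size(size), _coef(2.0), _x(size, 1.0), _y_ref(size, 3.0), _y(size, 0.0) {}

    // Reset the operands so that every timed run starts from the same state.
    void initialize() { _y = _y_ref; }

    // The operation to be benchmarked: Y += coef * X.
    void calculate() { Interface::axpy(_coef, _x, _y, _size); }

    // One multiplication and one addition per element.
    double nb_op_base() const { return 2.0 * _size; }

    static std::string name() { return "axpy_sketch"; }

    // Extra accessor for checking the result (not part of the concept).
    const typename Interface::gene_vector& result() const { return _y; }

private:
    int _size;
    typename Interface::real_type _coef;
    typename Interface::gene_vector _x;
    typename Interface::gene_vector _y_ref;
    typename Interface::gene_vector _y;
};
```

Keeping the reference operand _y_ref separate from the working copy _y lets initialize() restore the same starting state before each measurement.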
| 89 |
|
|---|
| 90 | ****** Interface concept : This is a class or namespace defining how to use a given library and
|
|---|
| 91 | its specific containers (matrix and vector). Up to now an interface should define the following types
|
|---|
| 92 |
|
|---|
| 93 | *** real_type : kind of float to be used (float or double)
|
|---|
| 94 | *** stl_vector : must correspond to std::vector<real_type>
|
|---|
| 95 | *** stl_matrix : must correspond to std::vector<stl_vector>
|
|---|
| 96 | *** gene_vector : the vector type for this interface --> e.g. (real_type *) for the C_interface
|
|---|
| 97 | *** gene_matrix : the matrix type for this interface --> e.g. (gene_vector *) for the C_interface
|
|---|
| 98 |
|
|---|
| 99 | + the following common methods
|
|---|
| 100 |
|
|---|
| 101 | *** free_matrix(gene_matrix & A, int N) deallocation of an N sized gene_matrix A
|
|---|
| 102 | *** free_vector(gene_vector & B) deallocation of an N sized gene_vector B
|
|---|
| 103 | *** matrix_from_stl(gene_matrix & A, stl_matrix & A_stl) copy the content of an stl_matrix A_stl into a gene_matrix A.
|
|---|
| 104 | The allocation of A is done in this function.
|
|---|
| 105 | *** vector_from_stl(gene_vector & B, stl_vector & B_stl) copy the content of an stl_vector B_stl into a gene_vector B.
|
|---|
| 106 | The allocation of B is done in this function.
|
|---|
| 107 | *** matrix_to_stl(gene_matrix & A, stl_matrix & A_stl) copy the content of a gene_matrix A into an stl_matrix A_stl.
|
|---|
| 108 | The size of A_stl must correspond to the size of A.
|
|---|
| 109 | *** vector_to_stl(gene_vector & A, stl_vector & A_stl) copy the content of a gene_vector A into an stl_vector A_stl.
|
|---|
| 110 | The size of A_stl must correspond to the size of A.
|
|---|
| 111 | *** copy_matrix(gene_matrix & source, gene_matrix & cible, int N) : copy the content of source into cible. Both source
|
|---|
| 112 | and cible must be sized NxN.
|
|---|
| 113 | *** copy_vector(gene_vector & source, gene_vector & cible, int N) : copy the content of source into cible. Both source
|
|---|
| 114 | and cible must be sized N.
|
|---|
| 115 |
|
|---|
| 116 | and the following methods corresponding to the actions to be benchmarked :
|
|---|
| 117 |
|
|---|
| 118 | *** matrix_vector_product(const gene_matrix & A, const gene_vector & B, gene_vector & X, int N)
|
|---|
| 119 | *** matrix_matrix_product(const gene_matrix & A, const gene_matrix & B, gene_matrix & X, int N)
|
|---|
| 120 | *** ata_product(const gene_matrix & A, gene_matrix & X, int N)
|
|---|
| 121 | *** aat_product(const gene_matrix & A, gene_matrix & X, int N)
|
|---|
| 122 | *** axpy(real coef, const gene_vector & X, gene_vector & Y, int N)
|
|---|
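To make the Interface concept more concrete, here is a reduced sketch of an interface over raw pointers, in the spirit of the C_interface typedefs mentioned above. It is not the BTL C_interface itself: the struct name is hypothetical, only part of the listed methods is shown, and the matrix is assumed to be stored as an array of rows (A[i][j] is row i, column j).

```cpp
#include <cstddef>
#include <vector>

template <class real>
struct raw_pointer_interface_sketch {
    typedef real real_type;
    typedef std::vector<real_type> stl_vector;
    typedef std::vector<stl_vector> stl_matrix;
    typedef real_type* gene_vector;
    typedef gene_vector* gene_matrix;

    static void free_vector(gene_vector& B) {
        delete[] B;
        B = 0;
    }

    static void free_matrix(gene_matrix& A, int N) {
        for (int i = 0; i < N; ++i) delete[] A[i];
        delete[] A;
        A = 0;
    }

    // Allocates B, then copies B_stl into it.
    static void vector_from_stl(gene_vector& B, stl_vector& B_stl) {
        B = new real_type[B_stl.size()];
        for (std::size_t i = 0; i < B_stl.size(); ++i) B[i] = B_stl[i];
    }

    // A_stl must already have the size of A.
    static void vector_to_stl(gene_vector& A, stl_vector& A_stl) {
        for (std::size_t i = 0; i < A_stl.size(); ++i) A_stl[i] = A[i];
    }

    // Allocates A row by row, then copies A_stl into it.
    static void matrix_from_stl(gene_matrix& A, stl_matrix& A_stl) {
        A = new gene_vector[A_stl.size()];
        for (std::size_t i = 0; i < A_stl.size(); ++i) vector_from_stl(A[i], A_stl[i]);
    }

    // X = A * B for an NxN matrix A (row storage assumed).
    static void matrix_vector_product(const gene_matrix& A, const gene_vector& B,
                                      gene_vector& X, int N) {
        for (int i = 0; i < N; ++i) {
            real_type sum = 0;
            for (int j = 0; j < N; ++j) sum += A[i][j] * B[j];
            X[i] = sum;
        }
    }

    // Y += coef * X
    static void axpy(real coef, const gene_vector& X, gene_vector& Y, int N) {
        for (int i = 0; i < N; ++i) Y[i] += coef * X[i];
    }
};
```

Because every method is static, an action can call Interface::axpy(...) and similar functions without holding an interface object.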
| 123 |
|
|---|
| 124 | The bench algorithm (generic_bench/bench.hh) is templated with an action itself templated with
|
|---|
| 125 | an interface. A typical main.cpp source stored in a given library directory libs/A_LIB
|
|---|
| 126 | looks like :
|
|---|
| 127 |
|
|---|
| 128 | bench< AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ;
|
|---|
| 129 |
|
|---|
| 130 | This function will produce an XY data file containing the measured mflops as a function of the size for 50
|
|---|
| 131 | sizes between 10 and 1000.
|
|---|
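The two quantities behind each XY point can be sketched as follows. Both helpers are hypothetical: how bench.hh actually spaces the sizes is not stated here, so the logarithmic spacing below is an assumption, and the mflops formula simply divides the operation count returned by nb_op_base() by the elapsed time in microseconds.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: nb_point sizes from size_min to size_max,
// evenly spaced on a logarithmic scale (assumption).
std::vector<int> sample_sizes(int size_min, int size_max, int nb_point) {
    std::vector<int> sizes;
    if (nb_point < 1) return sizes;
    if (nb_point == 1) {
        sizes.push_back(size_min);
        return sizes;
    }
    const double log_min = std::log(double(size_min));
    const double step = (std::log(double(size_max)) - log_min) / (nb_point - 1);
    for (int i = 0; i < nb_point; ++i) {
        // Round to the nearest integer size.
        sizes.push_back(int(std::floor(std::exp(log_min + i * step) + 0.5)));
    }
    return sizes;
}

// Millions of floating point operations per second.
double mflops(double nb_op, double seconds) {
    return nb_op / (seconds * 1e6);
}
```

For instance, an action reporting 2e6 operations that runs in one second is rated at 2 mflops.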
| 132 |
|
|---|
| 133 | This algorithm can be adapted by providing a given Perf_Analyzer object which determines how the time
|
|---|
| 134 | measurements must be done. For example, the X86_Perf_Analyzer uses the rdtsc assembly instruction and provides
|
|---|
| 135 | a very fast and accurate (but less portable) timing method. The default is the Portable_Perf_Analyzer
|
|---|
| 136 | so
|
|---|
| 137 |
|
|---|
| 138 | bench< AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ;
|
|---|
| 139 |
|
|---|
| 140 | is equivalent to
|
|---|
| 141 |
|
|---|
| 142 | bench< Portable_Perf_Analyzer,AN_ACTION < AN_INTERFACE > >( 10 , 1000 , 50 ) ;
|
|---|
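The idea of passing the timing method as a template parameter, with a default when it is omitted, can be sketched as follows. None of these names come from BTL: chrono_perf_analyzer_sketch is a hypothetical analyzer built on std::chrono, sum_action_sketch is a toy action, and measure_mflops stands in for bench at a single size.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Toy action used only to exercise the sketch (assumption).
struct sum_action_sketch {
    explicit sum_action_sketch(int size) : calls(0), sum(0.0), _data(size, 1.0) {}
    void initialize() { calls = 0; sum = 0.0; }
    void calculate() {
        ++calls;
        for (std::size_t i = 0; i < _data.size(); ++i) sum += _data[i];
    }
    double nb_op_base() const { return double(_data.size()); }
    int calls;
    double sum;
private:
    std::vector<double> _data;
};

// Hypothetical portable timing policy based on a monotonic clock.
struct chrono_perf_analyzer_sketch {
    template <class Action>
    static double seconds_per_call(Action& action, int nb_calls) {
        action.initialize();
        const auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < nb_calls; ++i) action.calculate();
        const auto stop = std::chrono::steady_clock::now();
        return std::chrono::duration<double>(stop - start).count() / nb_calls;
    }
};

// Explicit form: the analyzer is the first template parameter.
template <class Perf_Analyzer, class Action>
double measure_mflops(int size, int nb_calls) {
    Action action(size);
    const double seconds = Perf_Analyzer::seconds_per_call(action, nb_calls);
    return action.nb_op_base() / (seconds * 1e6);
}

// Short form: falls back to the portable analyzer.
template <class Action>
double measure_mflops(int size, int nb_calls) {
    return measure_mflops<chrono_perf_analyzer_sketch, Action>(size, nb_calls);
}
```

With this layout, measure_mflops<sum_action_sketch>(8, 1000) and measure_mflops<chrono_perf_analyzer_sketch, sum_action_sketch>(8, 1000) select the same timing code, which mirrors the equivalence of the two bench calls above.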
| 143 |
|
|---|
| 144 | If your system supports it, we suggest using a mixed implementation (X86_Perf_Analyzer + Portable_Perf_Analyzer).
|
|---|
| 145 | Replace
|
|---|
| 146 | bench<Portable_Perf_Analyzer,Action>(size_min,size_max,nb_point);
|
|---|
| 147 | with
|
|---|
| 148 | bench<Mixed_Perf_Analyzer,Action>(size_min,size_max,nb_point);
|
|---|
| 149 | in generic_bench/bench.hh.
|
|---|
| 150 |
|
|---|
| 151 |
|
|---|
| 152 |
|
|---|
| 153 |
|
|---|
| 154 |
|
|---|