Sunday, April 21, 2013

R performance benchmarks with sequential vs. multi-threaded MKL libraries

Both runs use the benchmark script from http://r.research.att.com/benchmarks/R-benchmark-25.R

Sequential MKL
 Loading required package: Matrix  
 Loading required package: methods  
 Loading required package: lattice  
 Loading required package: SuppDists  
 Warning messages:  
 1: In remove("a", "b") : object 'a' not found  
 2: In remove("a", "b") : object 'b' not found  
   
   
   R Benchmark 2.5  
   ===============  
 Number of times each test is run__________________________: 3  
   
   I. Matrix calculation  
   ---------------------  
 Creation, transp., deformation of a 2500x2500 matrix (sec): 1.51233333333333   
 2400x2400 normal distributed random matrix ^1000____ (sec): 0.914333333333333   
 Sorting of 7,000,000 random values__________________ (sec): 0.814666666666666   
 2800x2800 cross-product matrix (b = a' * a)_________ (sec): 2.19166666666667   
 Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 1.20833333333333   
            --------------------------------------------  
          Trimmed geom. mean (2 extremes eliminated): 1.18662349903113   
   
   II. Matrix functions  
   --------------------  
 FFT over 2,400,000 random values____________________ (sec): 0.968666666666666   
 Eigenvalues of a 640x640 random matrix______________ (sec): 0.536333333333334   
 Determinant of a 2500x2500 random matrix____________ (sec): 1.15233333333333   
 Cholesky decomposition of a 3000x3000 matrix________ (sec): 1.08166666666667   
 Inverse of a 1600x1600 random matrix________________ (sec): 0.947666666666668   
            --------------------------------------------  
         Trimmed geom. mean (2 extremes eliminated): 0.997641413170114   
   
   III. Programmation  
   ------------------  
 3,500,000 Fibonacci numbers calculation (vector calc)(sec): 1.05033333333334   
 Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.601666666666664   
 Grand common divisors of 400,000 pairs (recursion)__ (sec): 1.48566666666666   
 Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.812666666666667   
 Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.533000000000001   
            --------------------------------------------  
         Trimmed geom. mean (2 extremes eliminated): 0.800814353816818   
   
   
 Total time for all 15 tests_________________________ (sec): 15.8113333333333   
 Overall mean (sum of I, II and III trimmed means/3)_ (sec): 0.982365467154044   
            --- End of test ---  
   


Multi-threaded MKL
 Loading required package: Matrix  
 Loading required package: methods  
 Loading required package: lattice  
 Loading required package: SuppDists  
 Warning message:  
 In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :  
  there is no package called ‘SuppDists’  
 Warning messages:  
 1: In remove("a", "b") : object 'a' not found  
 2: In remove("a", "b") : object 'b' not found  
   
   
   R Benchmark 2.5  
   ===============  
 Number of times each test is run__________________________: 3  
   
   I. Matrix calculation  
   ---------------------  
 Creation, transp., deformation of a 2500x2500 matrix (sec): 1.52433333333333   
 2400x2400 normal distributed random matrix ^1000____ (sec): 0.911666666666667   
 Sorting of 7,000,000 random values__________________ (sec): 0.814666666666665   
 2800x2800 cross-product matrix (b = a' * a)_________ (sec): 0.525000000000001   
 Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 0.393333333333333   
            --------------------------------------------  
          Trimmed geom. mean (2 extremes eliminated): 0.730564293362432   
   
   II. Matrix functions  
   --------------------  
 FFT over 2,400,000 random values____________________ (sec): 0.969   
 Eigenvalues of a 640x640 random matrix______________ (sec): 0.794666666666667   
 Determinant of a 2500x2500 random matrix____________ (sec): 0.385000000000003   
 Cholesky decomposition of a 3000x3000 matrix________ (sec): 0.349000000000001   
 Inverse of a 1600x1600 random matrix________________ (sec): 0.462999999999999   
            --------------------------------------------  
         Trimmed geom. mean (2 extremes eliminated): 0.521285412938827   
   
   III. Programmation  
   ------------------  
 3,500,000 Fibonacci numbers calculation (vector calc)(sec): 1.045   
 Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.600999999999999   
 Grand common divisors of 400,000 pairs (recursion)__ (sec): 1.55266666666667   
 Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.814666666666668   
 Escoufier's method on a 45x45 matrix (mixed)________ (sec): 1.06   
            --------------------------------------------  
         Trimmed geom. mean (2 extremes eliminated): 0.966349072573083   
   
   
 Total time for all 15 tests_________________________ (sec): 12.203   
 Overall mean (sum of I, II and III trimmed means/3)_ (sec): 0.716620701088256   
            --- End of test ---  
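The two runs are easiest to compare as ratios. The sketch below divides the sequential time by the multi-threaded time for a few tests, using the numbers copied from the output above; the `speedup` helper is a name introduced here for illustration only. Values above 1 mean the multi-threaded build was faster.

```shell
# speedup SEQ MT: print SEQ / MT rounded to two decimals.
# (Hypothetical helper; the timings are copied from the two runs above.)
speedup() {
    awk -v s="$1" -v m="$2" 'BEGIN { printf "%.2f\n", s / m }'
}

speedup 2.19166666666667  0.525000000000001   # cross-product     -> 4.17
speedup 1.20833333333333  0.393333333333333   # linear regression -> 3.07
speedup 1.08166666666667  0.349000000000001   # Cholesky          -> 3.10
speedup 0.536333333333334 0.794666666666667   # eigenvalues       -> 0.67
speedup 15.8113333333333  12.203              # total time        -> 1.30
```

The dense linear algebra tests gain roughly 3x to 4x, the eigenvalue test gets slower, and the tests in section III that do not touch BLAS are essentially unchanged, which is why the total improves by only about 1.3x.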
   

Compiling R linking to MKL using gcc

I use R extensively for research, so speeding it up with MKL is worthwhile for me. Since MKL is free for non-commercial use, I decided to compile R on my computer and link it against the fast MKL library.

Two configurations are covered here: one that uses the multi-threaded version of the library and one that uses the sequential version. Multi-threading makes certain matrix operations faster, but when I have to run many jobs (bootstrap, cross-validation, etc.), the total time drops more if I run the jobs in parallel (using the multicore package for R) than if I parallelize the linear algebra within each job. Hence the two different builds.
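As an illustration of the job-level approach, here is a minimal sketch that runs independent jobs side by side from the shell instead of from inside R with multicore. It swaps in `xargs -P` (available in GNU and BSD xargs) as the parallel driver; `run_jobs` and `bootstrap.R` are hypothetical names, and the Rscript path assumes the sequential build was installed under /some/directory1.

```shell
# run_jobs N_JOBS MAX_PARALLEL COMMAND [ARGS...]
# Runs COMMAND once per job index 1..N_JOBS, passing the index as the
# last argument, with at most MAX_PARALLEL jobs running at a time.
run_jobs() {
    n_jobs=$1
    max_parallel=$2
    shift 2
    seq 1 "$n_jobs" | xargs -P "$max_parallel" -I '{}' "$@" '{}'
}

# Usage (hypothetical script name): 100 bootstrap replicates, 4 at a time,
# each one single-threaded because it uses the sequential MKL build.
#   run_jobs 100 4 /some/directory1/bin/Rscript bootstrap.R
```

Pairing this with the sequential build avoids oversubscribing the cores, since each job then uses exactly one thread.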

I used gcc as the compiler rather than icc from Intel, since some packages on CRAN do not compile with icc. I do not know enough about compilers to say whether some flags would fix this.

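I found that the configure script in R-3.0.0 fails to detect some MKL functions. I traced the problem to the order of the libraries in the test link command; the following diff keeps the original lines as a comment and moves ${BLAS_LIBS} ahead of ${FLIBS}: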
 29818a29819,29821  
 >  # if ${CC} ${CFLAGS} ${LDFLAGS} ${MAIN_LDFLAGS} -o conftest${ac_exeext} \  
 >  #   conftest.${ac_objext} ${FLIBS} \  
 >  #   ${LIBM} ${BLAS_LIBS} 1>&5 2>&5;  
 29820,29821c29823,29824  
 <    conftest.${ac_objext} ${FLIBS} \  
 <    ${LIBM} ${BLAS_LIBS} 1>&5 2>&5;  
 ---  
 >    conftest.${ac_objext} ${BLAS_LIBS} ${FLIBS} \  
 >    ${LIBM} 1>&5 2>&5;  

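The diff above is in the classic "normal" diff format, which the patch utility accepts. A minimal sketch of applying it, assuming the diff has been saved to a file with the display indentation and trailing spaces removed (the name configure-mkl.diff is hypothetical, as is the `apply_diff` helper):

```shell
# Apply a saved diff to a file, keeping a backup copy of the original.
# Both arguments are placeholders; configure-mkl.diff is a hypothetical name.
apply_diff() {
    target=$1       # file to patch, e.g. configure
    diff_file=$2    # saved diff, e.g. configure-mkl.diff
    cp -p "$target" "$target.orig" || return 1
    patch "$target" < "$diff_file"
}

# Usage, from the top of the R-3.0.0 source tree:
#   apply_diff configure configure-mkl.diff
```

The line numbers in the diff are specific to the R-3.0.0 configure script, so for other R versions the same change would have to be made by hand.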

To compile R with the sequential MKL library:

  1. Apply the patch to the configure script
  2. Add the following lines to the config.site file
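     # Optimization flags for the C, Fortran 77, C++ and Fortran 90/95 compilers
     CFLAGS='-g -O3'
     FFLAGS='-g -O3'
     CXXFLAGS='-g -O3'
     FCFLAGS='-g -O3'

     # Adjust this path to match your MKL installation
     MKL_LIB_PATH=/opt/intel/composer_xe_2013.3.163/mkl/lib/intel64
     # Each backslash must be the last character on its line
     MKL=" -L${MKL_LIB_PATH} \
        -Wl,--start-group \
          -lmkl_gf_lp64 \
          -lmkl_sequential \
          -lmkl_core \
          -lm \
        -Wl,--end-group \
        -lpthread"
     BLAS_LIBS="$MKL"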
    
  3. Now configure, build and install with the following commands:
     ./configure --prefix=/some/directory1 --with-blas --with-lapack  
     make  
     sudo make install  
    

To compile R with the multi-threaded MKL library:

  1. Apply the patch to the configure script
  2. Add the following lines to the config.site file
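     # Optimization flags for the C, Fortran 77, C++ and Fortran 90/95 compilers
     CFLAGS='-g -O3'
     FFLAGS='-g -O3'
     CXXFLAGS='-g -O3'
     FCFLAGS='-g -O3'

     # Adjust this path to match your MKL installation
     MKL_LIB_PATH=/opt/intel/composer_xe_2013.3.163/mkl/lib/intel64
     # Each backslash must be the last character on its line
     MKL=" -L${MKL_LIB_PATH} \
        -Wl,--start-group \
          -lmkl_gf_lp64 \
          -lmkl_gnu_thread \
          -lmkl_core \
          -lm \
        -Wl,--end-group \
        -lgomp -lpthread"
     BLAS_LIBS="$MKL"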
    
  3. Now configure, build and install with the following commands:
     ./configure --prefix=/some/directory2 --with-blas --with-lapack  
     make  
     sudo make install  
    

When you run the configure command, the last portion of the output should look something like this (the installation directory will show whatever you passed to --prefix):
 R is now configured for x86_64-unknown-linux-gnu  
   
  Source directory:     .  
  Installation directory:  /usr/local  
   
  C compiler:        gcc -std=gnu99 -g -O3   
  Fortran 77 compiler:    gfortran -g -O3   
   
  C++ compiler:       g++ -g -O3   
  Fortran 90/95 compiler:  gfortran -g -O3   
  Obj-C compiler:          
   
  Interfaces supported:   X11  
  External libraries:    readline, BLAS(generic), LAPACK(in blas)  
  Additional capabilities:  JPEG, NLS  
  Options enabled:      R profiling  
   
  Recommended packages:   yes
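
To avoid scanning the summary by eye, a small check like the one below can confirm that an external BLAS was picked up. It is only a sketch: it assumes the configure output was saved to a file (configure.log is a hypothetical name, as is the `has_external_blas` helper) and that the summary lists BLAS on the "External libraries" line, as in the output above.

```shell
# Succeeds if the configure summary read from stdin lists BLAS
# on its "External libraries" line; fails otherwise.
has_external_blas() {
    grep 'External libraries:' | grep -q 'BLAS'
}

# Usage (configure.log is a hypothetical file name):
#   ./configure --prefix=/some/directory1 --with-blas --with-lapack 2>&1 | tee configure.log
#   has_external_blas < configure.log && echo "external BLAS detected"
```

If the check fails, config.log in the build directory records the link command that configure tried, which is the first place to look.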