Pittsburgh Supercomputing Center 

TAU Usage Examples

PAPI hardware counter data

First, point TAU at a Makefile whose name contains the word 'papi', either by defining TAU_MAKEFILE with setenv or by including that Makefile in your own. The TAU Makefiles are in $TAU_ROOT_DIR/ia64/lib or $TAU_ROOT_DIR/xt3/lib. Then instrument and execute the binary as described in the TAU document. Set the desired PAPI counters before executing the job. To see the list of available counters, first load the papi module (module load papi), then type 'papi_avail'.

Set the environment variables COUNTER1 through COUNTER25 on the command line or in the submission script as follows:

setenv COUNTER1 GET_TIME_OF_DAY
setenv COUNTER2 PAPI_FP_INS
setenv COUNTER3 PAPI_TOT_CYC
...................

Note that GET_TIME_OF_DAY is a system timer, not a PAPI hardware counter. Always set COUNTER1 to GET_TIME_OF_DAY so that TAU can synchronize time across tasks and provide a globally synchronized real-time clock for tracing.
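The setenv lines above are csh syntax. If your submission script runs under sh or bash instead (an assumption; this page only shows csh), a minimal sketch of the same settings would be:

```shell
# sh/bash equivalents of the csh setenv lines above (assumes an sh-style shell).
export COUNTER1=GET_TIME_OF_DAY   # system timer, kept in the first slot
export COUNTER2=PAPI_FP_INS
export COUNTER3=PAPI_TOT_CYC

# List the COUNTER variables that are set, as a quick check before the job runs.
env | grep '^COUNTER[0-9]*=' | sort
```

Running this before the job launches makes it easy to spot a misspelled counter name in the batch log.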

Loop level instrumentation

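To automatically instrument all the outer do loops in the routine 'foo', create a text file, say loop_instru.txt, containing the following lines, and pass it as a flag in your makefile, OPTS = -optTauSelectFile=loop_instru.txt. The routine name is written in upper case, as Fortran routine names appear in the profile.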

% cat loop_instru.txt

BEGIN_INSTRUMENT_SECTION
loops routine="FOO"
END_INSTRUMENT_SECTION

The following code segment instruments the outer loops of the routine multiply in loop_test.cpp. The trailing # is a wildcard that matches the rest of the routine's signature.

BEGIN_INSTRUMENT_SECTION
loops file="loop_test.cpp" routine="double multiply#"
END_INSTRUMENT_SECTION
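As a sketch of how the two loop directives above could live in one file, the commands below write both into loop_instru.txt and count them. Combining a Fortran and a C++ directive in a single file is done here purely for illustration; a real project would normally list only its own routines.

```shell
# Write a selective instrumentation file holding both loop directives shown above.
cat > loop_instru.txt << 'EOF'
BEGIN_INSTRUMENT_SECTION
loops routine="FOO"
loops file="loop_test.cpp" routine="double multiply#"
END_INSTRUMENT_SECTION
EOF

# Count the loop directives as a sanity check.
grep -c '^loops' loop_instru.txt

# The file is then passed to TAU through the makefile, for example:
#   OPTS = -optTauSelectFile=loop_instru.txt
```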

Selective instrumentation

Files can be included in or excluded from instrumentation by listing them between the keywords BEGIN_FILE_INCLUDE_LIST and END_FILE_INCLUDE_LIST, or between BEGIN_FILE_EXCLUDE_LIST and END_FILE_EXCLUDE_LIST.

The following causes only the files foo1.f, foo2.f, and foo3.c to be instrumented. Save this code segment in a text file (select.txt) and pass that file as a flag in your makefile, OPTS = -optTauSelectFile=select.txt.

BEGIN_FILE_INCLUDE_LIST
foo1.f
foo2.f
foo3.c
END_FILE_INCLUDE_LIST
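The opposite case, instrumenting everything except a few files, uses an exclude list bracketed by BEGIN_FILE_EXCLUDE_LIST and END_FILE_EXCLUDE_LIST. The sketch below writes such a file; the names util1.f and util2.c are hypothetical placeholders, not files from the TAU examples.

```shell
# Write a selective instrumentation file that skips two (hypothetical) source files.
cat > select.txt << 'EOF'
BEGIN_FILE_EXCLUDE_LIST
util1.f
util2.c
END_FILE_EXCLUDE_LIST
EOF

# Show the result; pass it to TAU with OPTS = -optTauSelectFile=select.txt
cat select.txt
```

An exclude list is usually the shorter choice when only a handful of small, frequently called utility files add instrumentation overhead.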

Sample TAU Profiling Report

Example 1

This example uses /usr/local/packages/TAU/tau-2.17.1/examples/taucompiler/c/ring.c.

% echo $TAU_MAKEFILE
/usr/local/packages/TAU/tau-2.17.1/ia64/lib/Makefile.tau-mpi-pdt

Compilation:

% tau_cc.sh -o ring ring.c

View TAU text report:

% pprof
NODE 0;CONTEXT 0;THREAD 0:
---------------------------------------------------------------------------------------
%Time    Exclusive   Inclusive       #Call   #Subrs   Inclusive     Name
         msec        total msec                       usec/call
---------------------------------------------------------------------------------------
100.0    0.172       1,010           1         5      1010725       int main(int, char **) C
 99.1    1,001       1,001           1         0      1001199       MPI_Finalize()
  0.8        7           7           1         0         7765       MPI_Init()
  0.2    0.111           1           1         8         1588       void func(int, int) C
  0.1        1           1           1         1         1224       MPI_Barrier()
  0.0    0.194       0.194           3         0           65       MPI_Recv()
  0.0    0.051       0.051           3         0           17       MPI_Send()
  0.0    0.027       0.027           1         0           27       MPI_Comm_free()
  0.0    0.008       0.008           1         0            8       MPI_Bcast()
  0.0    0.001       0.001           1         0            1       MPI_Comm_size()
  0.0        0           0           1         0            0       MPI_Comm_rank()

Example 2

This example uses /usr/local/packages/TAU/tau-2.17.1/examples/taucompiler/f90/ring.f90. Floating-point operation counts are reported instead of time.

% echo $SHELL
/usr/psc/shells/csh
% setenv TAU_MAKEFILE /usr/local/packages/TAU/tau-2.17.1/ia64/lib/Makefile.tau-multiplecounters-mpi-papi-pdt
% tau_f90.sh -o ring ring.f90

Define the following PAPI counters in the submission script:

setenv COUNTER1 GET_TIME_OF_DAY
setenv COUNTER2 PAPI_FP_OPS

View the TAU text report for the PAPI_FP_OPS counter:

% pprof -f MULTI__PAPI_FP_OPS/profile
NODE 0;CONTEXT 0;THREAD 0:
---------------------------------------------------------------------------------------
%Time   Exclusive   Inclusive          #Call     #Subrs   Count/Call    Name
        Counts      total counts
---------------------------------------------------------------------------------------
100.0     1568       2.126E+04          1           5        21262        MAIN
 68.2     1940        1.45E+04          1           4        14498        FUNC
 42.1     8588         8956             1           1         8956        MPI_Barrier()
 16.2     3436         3436             1           0         3436        MPI_Recv()
 14.2     3028         3028             1           0         3028        MPI_Init()
 10.1     2152         2152             1           0         2152        MPI_Finalize()
  1.7      368          368             1           0          368        MPI_Comm_free()
  0.4       92           92             1           0           92        MPI_Send()
  0.3       74           74             1           0           74        MPI_Bcast()
  0.0        8            8             1           0            8        MPI_Comm_rank()
  0.0        8            8             1           0            8        MPI_Comm_size()

Column definitions for the reports above:

Exclusive: the time (Example 1) or counts (Example 2) accumulated within the function itself, excluding functions called from that function.

Inclusive: the time or counts accumulated within the function, including functions called from that function.

#Call: the number of calls made to the function.

#Subrs: the number of subroutines called from the function.
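The relation between the columns can be checked against the Example 2 table. The sketch below recomputes the exclusive counts and %Time for FUNC. It assumes that FUNC's four callees are MPI_Barrier, MPI_Recv, MPI_Send, and MPI_Bcast; the flat profile does not list callees, so this grouping is inferred from the fact that the numbers add up.

```shell
# Values copied from the Example 2 table (PAPI_FP_OPS counts).
main_incl=21262
func_incl=14498

# Inclusive counts of FUNC's assumed callees:
# MPI_Barrier + MPI_Recv + MPI_Send + MPI_Bcast
callees_incl=$((8956 + 3436 + 92 + 74))

# Exclusive = inclusive minus the inclusive counts of the callees.
func_excl=$((func_incl - callees_incl))

# %Time = inclusive counts as a percentage of MAIN's inclusive total.
func_pct=$(awk -v i="$func_incl" -v t="$main_incl" 'BEGIN { printf "%.1f", 100 * i / t }')

echo "FUNC exclusive counts: $func_excl"
echo "FUNC %Time: $func_pct"
```

Both results agree with the FUNC row of the table (1940 exclusive counts, 68.2 %Time).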