Document History

Date       Version   Author              Description of Revisions
16/05/13   0.0       Prush Palanichamy   Initial revision

Summary

This document describes how to download, cross-compile, and run LMbench on the ZC702 board.

Theory


LMbench is a suite of simple, portable ANSI C microbenchmarks for UNIX/POSIX systems. It measures two key properties: latency and bandwidth. LMbench is intended to give system developers insight into the basic costs of key operations.
The LMbench suite includes the following benchmarks:
  • Bandwidth benchmarks
    • Cached file read
    • Memory copy (bcopy)
    • Memory read
    • Memory write
    • Pipe
    • TCP
  • Latency benchmarks
    • Context switching
    • Networking: connection establishment, pipe, TCP, UDP, and RPC
    • File system creates and deletes
    • Process creation
    • Signal handling
    • System call overhead
    • Memory read latency
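Each category in the list above is measured by a standalone LMbench program, and these programs can also be run on their own outside the provided scripts. The sketch below runs a few of them. It assumes the LMbench binaries are in the directory passed as its argument; the function name is hypothetical, and the buffer sizes and process count are illustrative choices, not the values used by the scripts in this tech tip.

```sh
#!/bin/sh
# Sketch: run a handful of individual LMbench microbenchmarks directly.
# Hypothetical helper. Assumption: $1 is the directory holding the
# LMbench binaries (for example the lmbench folder on the SD card).
run_core_benchmarks() {
    bindir=$1
    # Refuse to start unless every program used below is present.
    for prog in bw_mem lat_syscall lat_ctx lat_mem_rd; do
        if [ ! -x "$bindir/$prog" ]; then
            echo "missing: $bindir/$prog" >&2
            return 1
        fi
    done
    # Bandwidth: read an 8 MB buffer (memory read bandwidth).
    "$bindir/bw_mem" 8m rd
    # Latency: cost of a trivial system call.
    "$bindir/lat_syscall" null
    # Latency: context switch between 2 processes with a 0 KB working set.
    "$bindir/lat_ctx" -s 0 2
    # Latency: dependent loads over arrays of up to 1 MB, 128-byte stride.
    "$bindir/lat_mem_rd" 1 128
}
```

Several LMbench programs write their results to stderr, so redirect with 2>&1 if the output needs to be captured.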

Hardware Setup


Implementation Details
Design Type: PS
SW Type: Linux
Boards/Tools: ZC702
Xilinx Tools Version: IDE 14.5
Files Provided: Pre-built LMbench binaries and a shell script to run the benchmark application. If you use these files, you can skip the "Download LMbench" and "Build LMbench" steps and go directly to the "Run LMbench" section.
Build System: Linux

1. Download LMbench


1) Go to http://sourceforge.net/projects/LMbench/ and download the tar file

2. Build LMbench


1) Extract lmbench-3.0-a9.tgz (for example, tar xzf lmbench-3.0-a9.tgz).
2) Change to the extracted lmbench directory.
3) Build with the Xilinx ARM cross compiler: make CC=arm-xilinx-linux-gnueabi-gcc

Note: This tech tip uses version 4.7.2 of the arm-xilinx-linux-gnueabi-gcc compiler.

4) Unzip the provided lmbench.zip file and copy the resulting lmbench folder to an SD card.
5) Copy the shell scripts basic.sh, bw.sh, and lat.sh to the SD card.
6) Copy the 14.5 release kernel to the SD card. The release files are available at http://www.wiki.xilinx.com/Zynq+14.5+-+2013.1+Release
7) Power up the board in SD boot mode and log in:
a. Login: root
b. Password: root
8) Mount the SD card:
  • mount /dev/mmcblk0p1 /mnt
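Steps 1 to 5 can be scripted on the Linux build host. The sketch below is one possible way to do it, not the procedure used to produce the pre-built files. It assumes that the tarball unpacks into a directory named after the archive (lmbench-3.0-a9.tgz to lmbench-3.0-a9), that arm-xilinx-linux-gnueabi-gcc is on PATH, and that the SD card is already mounted on the host; the function names and the mount point are hypothetical.

```sh
#!/bin/sh
# Sketch of the host-side steps. Function names are hypothetical.

# Steps 1-3: unpack the sources and cross-compile them.
# Assumption: lmbench-3.0-a9.tgz unpacks into ./lmbench-3.0-a9.
build_lmbench() {
    tarball=$1
    cross_cc=${2:-arm-xilinx-linux-gnueabi-gcc}
    if [ ! -f "$tarball" ]; then
        echo "tarball not found: $tarball" >&2
        return 1
    fi
    if ! command -v "$cross_cc" >/dev/null 2>&1; then
        echo "cross compiler not on PATH: $cross_cc" >&2
        return 1
    fi
    tar xzf "$tarball" || return 1
    srcdir=$(basename "$tarball" .tgz)
    # Pass the cross compiler to make, as in step 3.
    (cd "$srcdir" && make CC="$cross_cc")
}

# Steps 4-5: copy the lmbench folder and the run scripts to the SD card.
# Assumption: $1 holds the lmbench folder plus basic.sh, bw.sh and lat.sh,
# and $2 is the directory where the SD card is mounted on the host.
stage_sd_card() {
    src=$1
    sd=$2
    if [ ! -d "$sd" ]; then
        echo "SD card mount point not found: $sd" >&2
        return 1
    fi
    cp -r "$src/lmbench" "$sd/" || return 1
    for script in basic.sh bw.sh lat.sh; do
        cp "$src/$script" "$sd/" || return 1
    done
    # Flush pending writes before the card is removed.
    sync
}
```

For example, run build_lmbench lmbench-3.0-a9.tgz and then stage_sd_card . /media/sdcard, where /media/sdcard stands in for wherever the card is mounted. Step 6 (copying the 14.5 release kernel) is left out because the release file names are not given in this document. Running arm-xilinx-linux-gnueabi-gcc --version first confirms which compiler version will be used.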

3. Run LMbench


1) Power up the ZC702 board in SD boot mode.
2) Mount the SD card:
>mount /dev/mmcblk0p1 /mnt
3) Change to the lmbench directory on the SD card:
>cd /mnt/lmbench
4) Run the full.sh script:
>./full.sh
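To keep the console output for later comparison instead of only reading it on the serial console, a small wrapper such as the one below can be used on the board. It is a sketch: the function name and the default log-file name are hypothetical, and it assumes the SD card is mounted and the current directory is the lmbench folder, as in steps 2 and 3.

```sh
#!/bin/sh
# Sketch: run a benchmark script and save its console output to a log.
# Hypothetical helper; intended usage on the ZC702 after steps 2-3:
#   run_and_log ./full.sh
run_and_log() {
    script=$1
    log=${2:-lmbench_$(date +%Y%m%d_%H%M%S).log}
    if [ ! -x "$script" ]; then
        echo "not executable: $script" >&2
        return 1
    fi
    # Many LMbench programs print their results on stderr,
    # so merge stderr into stdout before copying it to the log.
    "$script" 2>&1 | tee "$log"
}
```

Because the log is written to the SD card, it can be read back on the host after the board is powered down.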

4. Results


# ./full.sh
BANDWIDTH MEASUREMENTS
File read bandwidth with openclose
7.00 233.00
File read bandwidth with io only
7.00 230.94
Memory read
7.00 549.67
Mem write
7.00 382.05
Mem Read/write
7.00 334.61
Memory copy
7.00 238.26
Mem- file write
7.00 1640.50
Mem- File read
7.00 350.67
Mem- File cp
7.00 247.42
Mem- bzero
7.00 1281.58
Mem- bcopy
7.00 257.14
Mmap with openclose
7.00 261.18
Mmap only
7.00 350.35
pipe bandwidth
Pipe bandwidth: 165.48 MB/sec
Socket
AF_UNIX sock stream bandwidth: 289.56 MB/sec
LATENCY MEASUREMENTS
latency for ls command
lat_cmd: 2515.8333 microseconds
latency for connect
TCP/IP connection cost to localhost: 157.9634 microseconds
Latency- context switch

"size=128k ovr=318.29
2 234.13
latency DRAM
59.329605
Latency -fcntl
Fcntl lock latency: 9.8366 microseconds
Latency -FS
0k 53014 36518 28196
1k 29831 16649 37393
4k 28116 18324 23718
10k 17671 12002 17740
Latency - Mem rd
"stride=128
0.00049 7.655
0.00098 11.492
0.00195 8.542
0.00293 7.205
0.00391 7.527
0.00586 7.596
0.00781 7.430
0.01172 7.714
0.01562 7.591
0.02344 8.169
0.03125 18.561
0.04688 50.790
0.06250 47.329
0.09375 75.766
0.12500 59.946
0.18750 62.953
0.25000 63.243
0.37500 76.092
0.50000 136.610
0.75000 98.095
1.00000 91.529
Latency -Mmap
x: No such file or directory
Latency - Operations
integer bit: 2.24 nanoseconds
integer add: 2.24 nanoseconds
integer mul: 9.40 nanoseconds
integer div: 155.42 nanoseconds
integer mod: 52.76 nanoseconds
int64 bit: 1.85 nanoseconds
uint64 add: 2.77 nanoseconds
int64 mul: 20.30 nanoseconds
int64 div: 417.96 nanoseconds
int64 mod: 261.66 nanoseconds
float add: 16.71 nanoseconds
float mul: 18.28 nanoseconds
float div: 31.16 nanoseconds
double add: 10.02 nanoseconds
double mul: 16.73 nanoseconds
double div: 75.54 nanoseconds
float bogomflops: 47.64 nanoseconds
double bogomflops: 78.60 nanoseconds
Latency -Page fault
x: No such file or directory
Latency-Pipe
Pipe latency: 35.8538 microseconds
Latency - Process ops
Process fork+exit: 2075.0841 microseconds
Process fork+execve: 2818.9615 microseconds
sh: /tmp/hello: not found
sh: /tmp/hello: not found
sh: /tmp/hello: not found
sh: /tmp/hello: not found
sh: /tmp/hello: not found
sh: /tmp/hello: not found
sh: /tmp/hello: not found
sh: /tmp/hello: not found
sh: /tmp/hello: not found
sh: /tmp/hello: not found
sh: /tmp/hello: not found
sh: /tmp/hello: not found
sh: /tmp/hello: not found
Process fork+/bin/sh -c: 6247.0000 microseconds
Procedure call: 0.0203 microseconds
latency-semaphore
Semaphore latency: 5.7432 microseconds
Latency -TCP select
Select on 200 tcp fd's: 72.4160 microseconds
Latency-Signals
Signal handler installation: 2.3158 microseconds
Signal handler overhead: 6.2273 microseconds
mmap: Bad file descriptor
Latency - system calls
Simple fstat: 1.7319 microseconds
Simple stat: 9.9183 microseconds
Simple open/close: 20.6044 microseconds
Simple write: 1.6044 microseconds
Simple read: 0.8984 microseconds
Simple syscall: 0.6315 microseconds
Latency tcp/udp
TCP latency using localhost: 1.5221 microseconds
lat_udp client: recv failed: Connection refused
latency-sockets
AF_UNIX sock stream latency: 46.5033 microseconds
Latency sleep
usleep 100 microseconds: 227.3042 microseconds
nanosleep 100 microseconds: 214.0462 microseconds
select 100 microseconds: 199.9189 microseconds
itimer 100 microseconds: 170.3161 microseconds
Cache line size
32
lmdd testing
8.1920 MB in 0.0264 secs, 310.7975 MB/sec
960MB OK
960
CPU frequency
mhz: should take approximately 297 seconds
539 MHz, 1.8553 nanosec clock
Test sleep - 3 secs
Parallel memory ops
0.524288 5.12
integer bit parallelism: 2.47
integer add parallelism: 2.29
integer mul parallelism: 3.23
integer div parallelism: 1.83
integer mod parallelism: 1.05
int64 bit parallelism: 1.00
int64 add parallelism: 1.69
int64 mul parallelism: 1.00
int64 div parallelism: 1.14
int64 mod parallelism: 1.00
float add parallelism: 7.94
float mul parallelism: 3.05
float div parallelism: 1.62
double add parallelism: 5.02
double mul parallelism: 2.29
double div parallelism: 2.20
STREAM ops
STREAM copy latency: 19.86 nanoseconds
STREAM copy bandwidth: 805.64 MB/sec
STREAM scale latency: 33.26 nanoseconds
STREAM scale bandwidth: 481.07 MB/sec
STREAM add latency: 52.48 nanoseconds
STREAM add bandwidth: 457.31 MB/sec
STREAM triad latency: 33.53 nanoseconds
STREAM triad bandwidth: 715.67 MB/sec
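The output above contains several lines that are errors rather than measurements: "x: No such file or directory" under the mmap and page-fault latency tests, repeated "sh: /tmp/hello: not found" lines in the process tests, "mmap: Bad file descriptor" after the signal tests, and "lat_udp client: recv failed: Connection refused". Those tests did not run as intended, so any numbers reported next to them should be treated with caution. The sketch below assumes the console output was saved to a file and shows two hypothetical helpers. The first lists such error lines. The second reads the memory read latency table (size in MB, latency in ns) and reports the first array size at which latency exceeds a chosen multiple of the smallest-array latency. With the default factor of 2 and the values above, that size is 0.03125 MB (32 KB), where latency rises from about 8 ns to 18.561 ns, which is consistent with the 32 KB L1 data cache of the Cortex-A9 cores in the Zynq-7000.

```sh
#!/bin/sh
# Sketch: two helpers for reading a saved LMbench log (hypothetical names).

# Print, with line numbers, the log lines that report an error
# instead of a measurement.
lmbench_errors() {
    grep -n -E 'No such file or directory|not found|failed|Bad file descriptor|Connection refused' "$1"
}

# Print the first array size (MB) in the lat_mem_rd table whose load
# latency exceeds FACTOR (default 2) times the smallest-array latency.
# Assumes a "stride=... header line followed by "size_MB latency_ns"
# pairs, as in the output above.
mem_latency_knee() {
    awk -v f="${2:-2}" '
        /^"stride=/ { in_block = 1; base = ""; next }
        in_block && NF == 2 && $1 ~ /^[0-9.]+$/ {
            if (base == "") base = $2
            else if ($2 + 0 > f * base) { print $1; exit }
            next
        }
        { in_block = 0 }
    ' "$1"
}
```

For example, mem_latency_knee lmbench.log 4 uses a stricter threshold; on the table above it reports 0.04688 MB instead.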

5. Making sense of LMbench results


LMbench is a set of microbenchmarks. If you use LMbench to estimate performance for a macro-level application, such as a multimedia workload, you first need to find out which of these microbenchmarks that application exercises most. Profiling answers this question: you analyze a real-world, macro-level workload such as video streaming and determine which system-level, micro-level units of work (for example, context switching) make up a significant portion of it. Once profiling has identified the micro workloads that affect the system most, run the microbenchmarks that correspond to those key influencers.
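As an illustration of the last step, the sketch below maps the micro-level units of work found by profiling to the LMbench programs that measure them. The unit keywords and the function names are hypothetical; the program names come from the LMbench suite and cover the benchmark categories listed in the Theory section.

```sh
#!/bin/sh
# Sketch: translate profiled units of work into LMbench programs.
# Unit keywords and function names are hypothetical.
lmbench_for() {
    case $1 in
        context_switch) echo lat_ctx ;;
        syscall)        echo lat_syscall ;;
        process_create) echo lat_proc ;;
        signal)         echo lat_sig ;;
        pipe)           echo "lat_pipe bw_pipe" ;;
        tcp)            echo "lat_tcp bw_tcp lat_connect" ;;
        udp)            echo lat_udp ;;
        file_create)    echo lat_fs ;;
        file_read)      echo bw_file_rd ;;
        memory_bw)      echo bw_mem ;;
        memory_latency) echo lat_mem_rd ;;
        *) echo "unknown unit: $1" >&2; return 1 ;;
    esac
}

# Print the de-duplicated, sorted set of programs for all given units;
# unknown units are reported on stderr and skipped.
key_benchmarks() {
    for unit in "$@"; do
        lmbench_for "$unit"
    done | tr ' ' '\n' | LC_ALL=C sort -u
}
```

For a workload whose profile is dominated by context switches and pipe traffic, key_benchmarks context_switch pipe prints bw_pipe, lat_ctx and lat_pipe, which are then the programs worth running and comparing across configurations.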