Revision History

This wiki page complements the 2016.4 version of the Software Acceleration TRD. For other versions, refer to the Zynq UltraScale+ MPSoC Software Acceleration TRD overview page.

Change Log:
  • Fourth release of the TRD
  • This release includes:
  1. Design upgrade to the 2016.4 SDx and 2016.3 PetaLinux tool chains.

Introduction

This wiki page describes how to build the various components of the Zynq UltraScale+ MPSoC Software Acceleration reference design (TRD), 2016.4 version. It also explains how to set up the hardware and software platforms and run the design on the ZCU102 kit. The part used on the ZCU102 board is xczu9eg-ffvb1156-1-i-es1.

About the TRD

The Software Acceleration TRD is an embedded signal processing application designed to showcase features and capabilities of the Zynq UltraScale+ MPSoC ZU9EG device in the embedded domain. The TRD consists of two elements: the Zynq UltraScale+ MPSoC Processing System (PS) and a signal processing application (FFT) implemented in Programmable Logic (PL). The MPSoC lets the user run an FFT on samples (coming from the TPG in the PL, or from SYSMON through an external channel) either as a software program running on the PS or as a hardware accelerator in the PL. The design has three accelerator cores generated using SDx, which compute 4096-, 16384-, and 65536-point FFTs; the APU controls the data transfers of these SDx accelerators. An additional accelerator, the LogiCORE FFT IP from the Vivado IP catalog, computes 4096-point FFTs under RPU control. The TRD demonstrates how the user can seamlessly switch between a software and a hardware implementation and evaluate the cost and benefit of each. It also demonstrates the value of offloading computation-intensive tasks onto the PL, which frees CPU resources for user-specific applications.
For detailed information on complete feature set, hardware and software architecture of the design, please refer to the TRD user guide here.

Download the TRD

The TRD archive (rdf0376-zcu102-swaccel-trd-2016-4.zip) can be downloaded from here.

Note: The current design does not support ES2 silicon. This TRD has been tested on Rev B, C, and D ZCU102 boards.

TRD Directory structure and package contents

The Software Acceleration TRD package is released with the source code, the hardware platform built with Xilinx Vivado, SDK projects, and an SD card image that lets the user run the demonstration and software application. It also includes the binaries needed to configure and boot the ZCU102 board. Before running the steps on this wiki page, download the TRD package and extract its contents to a directory, referred to as 'TRD_HOME', which is the home directory.

Figure: TRD directory structure.
The list below describes the contents of each folder and file.
  • apu: Contains the software source files
    • petalinux: Contains the PetaLinux project configuration
    • Qt_gui: Contains the GUI sources
    • zcu102_fft: SDx folder containing the hardware platform, .pfm files, and FFT accelerator C sources
  • rpu
    • xsdk: Contains the SDK project for building the RPU firmware
  • sdcard: Contains ready-to-test binaries
    • BOOT.BIN: BIN file containing the FSBL, PL bitstream, U-Boot, and ARM Trusted Firmware
    • image.ub: Kernel image
    • autostart.sh: Script to launch the demo
    • bin: Contains the Qt GUI application
  • README.txt: Contains the design version history, the steps to implement the design, and the Vivado and PetaLinux versions to use when building the design
  • THIRD_PARTY_NOTICES.zip: Contains the copyright text for third-party libraries
  • IMPORTANT_NOTICE_CONCERNING_THIRD_PARTY-CONTENT.txt: Contains information about the third-party licenses


Pre-requisites
  • ZCU102 Evaluation Kit with the Xilinx Vivado Design Suite, device-locked to xczu9eg-ffvb1156-1-i-es1.
  • A Linux development PC with the following tools installed:
  1. Xilinx Vivado 2016.4
  2. Xilinx SDx 2016.4 (used to build the bitstream and FFT shared object)
  3. Xilinx SDK 2016.4
  4. PetaLinux 2016.3
  5. Git distributed version control system. For information, refer to the Xilinx Git wiki.
  6. GNU make utility version 3.81 or higher.

Known Issues


Running the demo

This section provides step-by-step instructions for bringing up the ZCU102 board to demonstrate the TRD and for running the different options in the graphical user interface (GUI).

The binaries required to run the design, including those needed to configure and boot the ZCU102 board, are in the $TRD_HOME/sdcard folder.

Things to know before running the demo:
a) Format the SD-MMC card as FAT32 using an SD-MMC card reader, then copy the entire contents of $TRD_HOME/sdcard onto the primary partition of the SD-MMC card.

b) PetaLinux console login details
User : root
Password : root

Hardware Setup Requirements

Requirements for the TRD demo setup

  • The ZCU102 Evaluation kit with the part xczu9eg-ffvb1156-1
  • AC power adapter (12 VDC)
  • Optional: A USB Type-A to USB Micro-B cable (for UART communication) and Tera Term Pro (or a similar UART terminal program).
  • USB-UART drivers from Silicon Labs
  • A USB Micro-B to female adapter with a USB hub (needed for connecting the mouse)
  • USB mouse
  • 4K monitor with DisplayPort support
  • Certified DisplayPort 1.2 cable; the TRD was tested with a 6-foot Cable Matters cable (E342987)
  • (Optional, required only for testing with external audio input): XA3 SYSMON Headphone Adapter card from Faster Technology
  • (Optional, required only for testing with external audio input) An audio source such as an MP3 player
  • (Optional, required only for testing with external audio input) An aux cable with a 3.5 mm male jack on both ends
  • An SD-MMC flash card formatted as FAT32, with the TRD binaries in its primary partition. Copy the binaries from the sdcard folder of the TRD zip file. The required binaries are:
    • BOOT.BIN
    • image.ub
    • autostart.sh
    • sw_accel_qt
    • bin/firmware/r5FFT.elf
    • fbdev.tar
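Before inserting the card into the board, it can help to confirm that every file in the list above is on the card. The following is a minimal Python sketch, not part of the TRD package; the function name `missing_sd_files` is hypothetical, and the mount point you pass in depends on your host setup.

```python
from pathlib import Path

# Files listed as required on the SD-MMC card's primary partition.
REQUIRED_FILES = [
    "BOOT.BIN",
    "image.ub",
    "autostart.sh",
    "sw_accel_qt",
    "bin/firmware/r5FFT.elf",
    "fbdev.tar",
]


def missing_sd_files(mount_point):
    """Return the required files that are not present under mount_point."""
    root = Path(mount_point)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

For example, `missing_sd_files("/media/<user>/SDCARD")` returns an empty list when every listed file was found.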

Note: The TRD supports Ultra HD (4K) and Full HD (1080p) resolutions. The binaries provided in the sdcard folder have been tested with ViewSonic (4K), ASUS (4K), Acer (4K), and Dell P2414Hb (1080p) monitors, and should work with any DisplayPort-certified monitor that lists 4K or 1080p resolution in its EDID. Use a DP 1.2-certified cable to connect the ZCU102 board to the monitor.

Board Setup

Steps for setting up the board

Connect the cables to the ZCU102 board as shown in the figures below.
Figure: ZCU102 board setup with cables connected.

Figure: ZCU102 board setup (photo).

1. Connect a 4K monitor to the DP port on the ZCU102 board using a DP 1.2 cable.
2. Connect a USB mouse, through the Micro-B adapter and USB hub, to the Micro-B USB connector (J96) on the ZCU102 board.
3. Optional: Connect the Micro-B end of a USB cable to the micro USB port (J83) labeled USB UART on the ZCU102 board, and the Type-A end to an open USB port on the host PC for UART communication.
4. Connect the power supply to the ZCU102 board. Do not switch the power ON.
5. Optional: Plug the XA3 adapter card into the SYSMON header (J3) on the ZCU102 board. Set jumpers J4 and J5 on the XA3 card as shown in the figure below.


Figure: XA3 adapter card jumper settings (J4 and J5).

6. Optional: Connect the 3.5 mm auxiliary cable between the audio source and the 3.5 mm female connector on the XA3 card.
7. Insert an SD-MMC memory card containing the TRD binaries into the SD receptacle on the ZCU102 board.
8. Make sure the DIP switches (SW6) are set as shown in the figure below, so that the ZCU102 board boots from the SD-MMC card.
Figure: SW6 DIP switch settings for SD boot.
9. Optional: Open a serial terminal program such as Tera Term and set up a new serial connection as shown in the figure below.
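Figure: Tera Term serial connection setup.
Click "New Connection", select Interface 0, and click OK, as shown in the figure below.
Figure: Tera Term "New Connection" dialog with Interface 0 selected.
Click Setup -> Serial Port and configure the port as shown in the figure below.
Figure: Tera Term serial port settings.
The serial terminal then displays output similar to the figure below.
Figure: Boot messages on the serial terminal.
After Linux boot completes, the PetaLinux login prompt appears, as shown in the figure below.
Figure: PetaLinux login prompt on the serial terminal.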

Run Qt GUI application
A Linux application with a Qt-based GUI is included in the package on the SD-MMC memory card. The application lets the user exercise the different modes of the demonstration. The user can select Test Pattern Generator (TPG) samples or an external audio source (which requires the XA3 adapter card, an aux cable, and an audio source).

The user can choose to perform the FFT computation in the APU (as software running on the PS) or in the PL (as a hardware IP core in the FPGA fabric).

The user can also apply various windowing techniques to the input samples before performing the FFT.

Powering on the Qt-based GUI application demo


  • Make sure the monitor is set for DP Ultra HD (4K) resolution.
  • Turn on the power switch (J52).
Note: The Linux image and the Qt-based GUI application are loaded from the SD-MMC memory card.
  • The Linux image loads, and the frame buffer console is displayed on the monitor.
  • The Qt-based GUI loads.
  • When the GUI starts, the demonstration begins with the FFT computed by software running on the APU, on samples coming from the TPG in the PL.

Running the Qt-based GUI application demo


Exercise different options by pressing the buttons available in the GUI to evaluate the different use cases mentioned below.

Figure: Qt GUI main window.


Test Start/Pause

The demonstration can be paused at any time by clicking the Pause button, as shown in the figure below.
Figure: Pause button in the GUI.

Input Source

There are two sources of data samples (use cases):
1. Test Pattern Generator (TPG in PL). This is the default option.
2. External audio input (through the XA3 SYSMON Headphone Adapter card)
Note: To test the external audio input (with the setup described above), play audio from the MP3 player or phone. The peak voltage of the audio source depends on the manufacturer, and the voltage levels of the samples depend on the volume. If the output voltage of the audio signal goes beyond 1 V, the waveform is clipped, so adjust the volume on the audio source to keep the samples within 1 V peak-to-peak.
Figure: Input source selection in the GUI.

FFT Computation Engine

For either input source, the user can select one of the following compute engines for the FFT computation:
  • APU (default): The FFT is computed by software running on the APU.
  • NEON: The FFT is computed by software running on the APU, using NEON intrinsics so that the instructions execute on the NEON SIMD engine.
  • APU controlled PL Accelerator: The FFT is computed by the FFT core in the Programmable Logic (PL).
  • RPU as Co-processor: The FFT is computed by software running on the RPU. The APU moves samples from the TPG in the PL to PS DDR, copies them from PS DDR to OCM, and passes that information to the RPU through an OpenAMP channel.
  • RPU controlled PL Accelerator: The FFT is computed by the PL FFT IP, with the RPU controlling the AXI DMA transfers between the PL FFT core and PS DDR. The APU moves samples from the TPG in the PL to PS DDR, copies them from PS DDR to OCM, and passes that information to the RPU through an OpenAMP channel. The PL FFT core fetches the samples from OCM, computes the FFT, and writes the results back to OCM.
  • All: Runs the FFT on each engine in turn. This mode is useful for comparing the computation times of the engines.
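To make the software path concrete, here is a rough illustration only: a textbook iterative radix-2 Cooley-Tukey FFT in Python, checked against a direct DFT, plus a hypothetical `time_engines` helper that times each implementation in turn, loosely mirroring what the "All" mode does. This is not the TRD's APU, NEON, or RPU code, and Python timings say nothing about the numbers measured on the board.

```python
import cmath
import time


def fft_radix2(x):
    """Iterative radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("FFT length must be a power of two")
    a = [complex(v) for v in x]

    # Reorder the input into bit-reversed index order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # Combine butterflies of size 2, 4, 8, ..., n.
    size = 2
    while size <= n:
        w_step = cmath.exp(-2j * cmath.pi / size)
        half = size // 2
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
                w *= w_step
        size *= 2
    return a


def dft_direct(x):
    """O(n^2) reference DFT used to check the FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def time_engines(x, engines):
    """Run each engine once on x and return elapsed wall-clock time in microseconds."""
    results = {}
    for name, fn in engines.items():
        start = time.perf_counter()
        fn(x)
        results[name] = (time.perf_counter() - start) * 1e6
    return results
```

For example, `time_engines(samples, {"fft": fft_radix2, "dft": dft_direct})` returns one timing per engine, much as the "All" mode produces one computation time per engine.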


Figure: FFT compute engine selection in the GUI.



FFT Length

The FFT length determines the number of samples on which the FFT is computed. The following FFT sizes are available:
  • 4096 (default)
  • 16384
  • 65536
Figure: FFT length selection in the GUI.


FFT Window
The user can apply one of the following window functions to the input samples before the FFT computation:
  • None (default, no windowing)
  • Hann
  • Hamming
  • Blackman
  • Blackman-Harris
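The window functions listed above are standard cosine-sum windows. The sketch below uses common textbook coefficients in the symmetric form (denominator N-1); whether the TRD uses exactly these coefficients, or the periodic form, is an assumption, so check the application sources before relying on it. The names `window` and `apply_window` are hypothetical.

```python
import math

# Cosine-sum coefficients a0, a1, a2, a3 (common textbook values; assumed, not taken from the TRD).
WINDOW_COEFFS = {
    "none": (1.0,),
    "hann": (0.5, 0.5),
    "hamming": (0.54, 0.46),
    "blackman": (0.42, 0.5, 0.08),
    "blackman_harris": (0.35875, 0.48829, 0.14128, 0.01168),
}


def window(name, n):
    """Symmetric window: w[i] = sum_k (-1)^k * a_k * cos(2*pi*k*i / (n - 1))."""
    if n < 2:
        raise ValueError("window length must be at least 2")
    coeffs = WINDOW_COEFFS[name]
    out = []
    for i in range(n):
        phase = 2 * math.pi * i / (n - 1)
        out.append(sum((-1) ** k * a * math.cos(k * phase) for k, a in enumerate(coeffs)))
    return out


def apply_window(samples, name):
    """Multiply each input sample by the matching window coefficient."""
    w = window(name, len(samples))
    return [s * c for s, c in zip(samples, w)]
```

All four tapered windows peak at 1.0 in the middle and fall toward the edges, which reduces spectral leakage at the cost of a wider main lobe; "none" leaves the samples unchanged.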

Figure: FFT window selection in the GUI.

Frequency Zoom

The following Frequency Zoom options are available:
  • ZOOM: Fixes the number of points on the frequency axis of the frequency domain plot to 512, so that users can observe the values on the frequency axis more closely. This is 5X zoom.
  • NONE (default): No zoom. All points are plotted on the frequency axis (the number of points equals half the FFT size).

Figure: Frequency Zoom selection in the GUI.

FFT Scale

The user can select different scales for the voltage (amplitude) axis. This option matters most when the external audio source is the input, because the voltage of the samples depends on the volume of the audio signal; select the scale that matches the amplitude of the audio samples. Available options are:
  • 1V (default)
  • 0.5V
  • 0.25V
  • 0.1V
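As a rough rule for choosing a scale, pick the smallest range that still contains the largest sample magnitude. The hypothetical helper below encodes that rule for the four scales listed; as far as this page describes, the GUI does not choose the scale automatically, so treat this only as an illustration of the selection criterion.

```python
# Scale options from the list above, in volts, smallest first.
SCALES_V = (0.1, 0.25, 0.5, 1.0)


def suggest_scale(samples_v):
    """Smallest scale that contains the largest sample magnitude; 1 V if none does."""
    peak = max((abs(s) for s in samples_v), default=0.0)
    for scale in SCALES_V:
        if peak <= scale:
            return scale
    return SCALES_V[-1]
```

For quiet audio peaking around 0.08 V this suggests the 0.1V scale; anything above 1 V stays on the 1V scale and, per the note on external audio input, is likely clipped.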

Figure: FFT scale selection in the GUI.



Sample Rate

The sampling rate of the SYSMON in the PL can be changed at run time. Supported sampling rates are:
  • 200 kSPS (default)
  • 100 kSPS
  • 50 kSPS
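The sampling rate and the FFT size together set the frequency resolution of the spectrum. Assuming the FFT input is sampled at the SYSMON rate selected above (as for the external audio input), the standard relations are: bin spacing Δf, highest representable frequency f_max (Nyquist), and the time T_frame needed to collect N samples. Two worked cases at the extremes of the supported settings:

```latex
\begin{gathered}
\Delta f = \frac{f_s}{N}, \qquad f_{\max} = \frac{f_s}{2}, \qquad T_{\text{frame}} = \frac{N}{f_s} \\[4pt]
f_s = 200\ \text{kSPS},\ N = 4096:\quad \Delta f = \frac{200\,000}{4096} \approx 48.83\ \text{Hz},\quad f_{\max} = 100\ \text{kHz},\quad T_{\text{frame}} = 20.48\ \text{ms} \\[4pt]
f_s = 50\ \text{kSPS},\ N = 65536:\quad \Delta f = \frac{50\,000}{65536} \approx 0.763\ \text{Hz},\quad f_{\max} = 25\ \text{kHz},\quad T_{\text{frame}} \approx 1.31\ \text{s}
\end{gathered}
```

Larger FFT sizes and lower sampling rates give finer frequency bins, at the cost of a longer acquisition per frame and a lower maximum frequency.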

Figure: Sample rate selection in the GUI.


Time and Frequency domain plots

The time domain plot shows the samples generated by either the TPG or the external audio source. The number of points in the plot depends on the FFT size.
The frequency domain plot shows the power spectral density on a linear (not logarithmic) scale, as voltage versus frequency bin. The value “Fp” at the far right of the frequency domain plot indicates the frequency bin with the highest energy. The number of frequency bins plotted is half the FFT size (because the spectrum of real-valued samples is symmetric) when NONE is selected in the Frequency Zoom control, and 512 when ZOOM is selected.
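The relationship between the FFT size, the number of plotted bins, and the Fp readout can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the GUI's code: it uses a direct DFT for clarity, computes normalized magnitude rather than the TRD's exact spectral scaling, and the function names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Normalized magnitudes of bins 0 .. N/2 - 1 (the upper half mirrors these for real input)."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) / n
            for k in range(n // 2)]


def bins_to_plot(fft_size, zoom):
    """512 bins with ZOOM selected, otherwise half the FFT size."""
    return min(512, fft_size // 2) if zoom else fft_size // 2


def peak_bin(mags):
    """Index of the bin with the largest magnitude, analogous to the GUI's Fp readout."""
    return max(range(len(mags)), key=lambda k: mags[k])
```

For a 64-sample cosine that completes 5 cycles, `peak_bin(magnitude_spectrum(x))` is 5; multiplying the bin index by the sampling rate divided by the FFT size converts it to hertz.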

FFT Computation time plot

The time taken by each engine for the FFT computation is plotted on the “FFT computation plot”. The approximate average computation times for a 4096-point FFT are listed below for reference:
  • APU: ~500 µs
  • APU with NEON as co-processor: ~350 µs
  • APU controlled PL: ~120 µs
  • RPU: ~1270 µs*
  • RPU controlled PL: ~240 µs*
* The RPU runs at 500 MHz and the APU at 1.1 GHz. These times also include the OpenAMP communication latency, which is approximately 100 µs.
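The figures above can be turned into relative speedups with a little arithmetic. The sketch below simply reuses those approximate numbers; subtracting the ~100 µs OpenAMP latency from the RPU entries is a rough adjustment, and because the APU and RPU run at different clock rates, the ratios compare the engines as configured in this design, not the cores clock-for-clock. The names are hypothetical.

```python
# Approximate 4096-point FFT times in microseconds, copied from the list above.
TIMES_US = {
    "APU": 500,
    "APU with NEON": 350,
    "APU controlled PL": 120,
    "RPU": 1270,
    "RPU controlled PL": 240,
}
OPENAMP_LATENCY_US = 100  # approximate; included in the two RPU entries


def speedup_vs(reference, times=TIMES_US):
    """How many times faster each engine is than the reference engine."""
    base = times[reference]
    return {name: base / t for name, t in times.items()}


def without_openamp(times=TIMES_US, latency_us=OPENAMP_LATENCY_US):
    """Rough compute-only estimate: subtract the OpenAMP latency from the RPU entries."""
    return {name: t - latency_us if name.startswith("RPU") else t
            for name, t in times.items()}
```

For example, the APU-controlled PL accelerator comes out roughly 4.2X faster than the APU software path (500/120), NEON about 1.4X, and the RPU-controlled PL path drops to about 140 µs once the OpenAMP latency is removed.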

CPU Utilization plot

The utilization of the APU cluster (Cortex-A53 cores) is plotted in the “CPU Utilization Plot”.
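This page does not say how the GUI measures utilization. One common approach on Linux, shown here as an assumption rather than the TRD's implementation, is to sample the per-core `cpuN` lines of `/proc/stat` twice and compare busy time against total time over the interval. The helper names are hypothetical.

```python
def parse_cpu_line(line):
    """Return (busy, total) jiffies from a /proc/stat 'cpu' line."""
    values = [int(v) for v in line.split()[1:]]
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already counted in user/nice, so only the first 8 are summed.
    total = sum(values[:8])
    idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
    return total - idle, total


def utilization_percent(prev_line, cur_line):
    """Busy percentage between two samples of the same cpu line."""
    busy0, total0 = parse_cpu_line(prev_line)
    busy1, total1 = parse_cpu_line(cur_line)
    elapsed = total1 - total0
    return 0.0 if elapsed <= 0 else 100.0 * (busy1 - busy0) / elapsed


def read_core_lines(path="/proc/stat"):
    """Map 'cpu0', 'cpu1', ... to their /proc/stat lines (the aggregate 'cpu' line is skipped)."""
    with open(path) as f:
        return {line.split()[0]: line for line in f
                if line.startswith("cpu") and line[3].isdigit()}
```

Calling `read_core_lines()` twice, some hundreds of milliseconds apart, and passing matching lines to `utilization_percent` gives one value per Cortex-A53 core.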

PS-PL Interface Performance plot

The bandwidth utilization of the full-power domain (FPD) and low-power domain (LPD) high-performance ports is plotted in the “PS-PL performance plot”, which shows both write and read throughput.
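This page does not describe how the throughput figures are obtained. As a hypothetical sketch, if a hardware counter reports the cumulative number of bytes transferred on a port, the throughput over a sampling interval follows from the counter delta. The 32-bit counter width and the single-wraparound handling are assumptions.

```python
def throughput_gbps(count_start, count_end, interval_s, counter_bits=32):
    """Throughput in Gbit/s from a cumulative byte counter sampled interval_s apart.

    The modulo handles a single wraparound of a counter_bits-wide counter
    (width assumed; adjust for the actual counter).
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta_bytes = (count_end - count_start) % (1 << counter_bits)
    return delta_bytes * 8 / interval_s / 1e9
```

For example, 125,000,000 bytes counted over one second corresponds to 1 Gbit/s.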

PL Die temperature

The PL Die temperature is read from the SYSMON and displayed on the GUI.
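On Linux, on-chip temperature sensors are commonly exposed through the IIO sysfs interface, where the temperature in millidegrees Celsius is (raw + offset) × scale. Whether the TRD reads SYSMON this way, and the device directory and channel name (`in_temp0` below), are assumptions; check `/sys/bus/iio/devices/` on the board. The function names are hypothetical.

```python
from pathlib import Path


def iio_temperature_c(raw, offset, scale):
    """IIO convention: (raw + offset) * scale is in millidegrees Celsius."""
    return (raw + offset) * scale / 1000.0


def read_iio_temperature(device_dir, channel="in_temp0"):
    """Read <channel>_raw/_offset/_scale from an IIO device directory (channel name assumed)."""
    base = Path(device_dir)

    def attr(suffix, default):
        path = base / f"{channel}_{suffix}"
        return float(path.read_text().strip()) if path.exists() else default

    raw = attr("raw", None)
    if raw is None:
        raise FileNotFoundError(f"{channel}_raw not found in {device_dir}")
    return iio_temperature_c(raw, attr("offset", 0.0), attr("scale", 1.0))
```

A missing offset or scale file falls back to 0 and 1 respectively, which matches the IIO convention of omitting attributes that have no effect.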

Block Diagram view

The top-level block diagram, and the blocks involved in the data path for each input source and FFT computation engine mode, are displayed in the bottom-right corner of the GUI.


Building the Software components


Building RPU firmware using XSDK


Note: The instructions to build the RPU firmware are the same as in 2016.1.
For instructions on building the RPU firmware using XSDK, refer to the section "Building RPU firmware using XSDK" on the linked page.

Build Bitstream and FFT Shared Object using SDSoC


Setup SDx Working Environment

Assuming SDx 2016.4 is installed under /usr/sdsoc/, source its settings script to set up the SDx environment (settings.csh is a C shell script, so run it from csh or tcsh):
$ source /usr/sdsoc/lin64/SDx/2016.4/settings.csh
Then launch the SDx tool:
$ sdx
Import source code from package into the Workspace
Figure: SDx workspace launcher.

Create a workspace. Provide a folder in the Workspace box as shown above and click OK.

Create an SDx project: click File --> New --> Project... to open the wizard shown below.

Figure: SDx New Project wizard.

Enter a name for the new project (the figure below uses "fft") and click Next.
Figure: Naming the new project "fft".

Click "Add Custom Platform" to add the hardware platform to the project.
Figure: Add Custom Platform.

Browse to $TRD_HOME/apu/zcu102_fft and click OK. This lets the user select the platform provided in the package.
Figure: Browsing to the zcu102_fft platform folder.

Select "zcu102_fft (custom)" and click Next.
Figure: Selecting the zcu102_fft (custom) platform.

Check "Shared Library" and click Next.
Figure: Shared Library option.

The next page of the wizard shows the Shared Library sample projects included in this package, which contains the FFT shared object project. Select FFT from the list.
Figure: Selecting the FFT sample project.


Click on Finish. This completes the project creation. The next steps explain the build.

Build the Bitstream and FFT Shared Object


Set the build configuration to SDRelease by right-clicking the fft project in the left pane, as shown in the figure below.
Figure: Setting the SDRelease build configuration.


Next, build the project as shown below.
The build takes approximately 90 minutes because it generates the accelerators and creates both the shared object and the bitstream.
Figure: Building the fft project.


When the build completes, it produces two output files:
  • shared object: <sdx workspace>/SDRelease/fft.so
  • bitstream: <sdx workspace>/SDRelease/fft.so.bit

These two files are used in the PetaLinux flow to build the executables, as explained in the next section.

Build Linux and Boot images using PetaLinux


Setup PetaLinux Working Environment


  • Source the PetaLinux settings script from the PetaLinux installation path:

$ source <path to PetaLinux installation>/settings.sh

Setup PetaLinux project for FFT


Steps to build the PetaLinux project:
$ cd <TRD_HOME>/zcu102_fft/apu/petalinux
 
Apply hardware configuration
$ petalinux-config --get-hw-description=./hw-description --oldconfig
 
Build the Project
$ petalinux-build

Steps to build the final application with the FFT shared object and bitstream created with SDSoC above:
$ cd <TRD_HOME>/zcu102_fft/apu/petalinux
 
Copy the shared object from the SDSoC workspace to the PetaLinux workspace
$ cp <sdsoc workspace>/SDRelease/fft.so components/libs/zynqmp_sdsoc_fft/modules/libswaccel.so
 
Build the FFT Library
$ petalinux-build -c rootfs/zynqmp_sdsoc_fft
 
Build the application
$ touch components/apps/fft_cmdline/fft_cmdline.c
$ petalinux-build -c rootfs/fft_cmdline
 
Build the rootfs (pushes the newly built FFT library and application into the rootfs)
$ petalinux-build -x package

Steps to create BOOT.BIN:
$ cd <TRD_HOME>/zcu102_fft/apu/petalinux/images/linux
 
Copy the bitstream from the SDSoC workspace to the PetaLinux workspace
$ cp <sdsoc workspace>/SDRelease/fft.so.bit .
 
Create BOOT.BIN
$ petalinux-package --force --boot --fsbl zynqmp_fsbl.elf --fpga fft.so.bit --uboot
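The PetaLinux steps above can also be scripted. The sketch below is a hypothetical Python driver that runs the documented commands in order; it assumes the PetaLinux settings script has already been sourced in the calling shell, and it defaults to a dry run that only prints each step. The `<TRD_HOME>` and `<sdsoc workspace>` placeholders used on this page are passed in as arguments, and the function names are illustrative only.

```python
import shutil
import subprocess
from pathlib import Path


def petalinux_steps(trd_home, sdx_workspace):
    """Return (kind, cwd, args) tuples mirroring the documented PetaLinux steps."""
    plnx = Path(trd_home) / "zcu102_fft/apu/petalinux"
    images = plnx / "images/linux"
    release = Path(sdx_workspace) / "SDRelease"
    return [
        ("run", plnx, ["petalinux-config", "--get-hw-description=./hw-description", "--oldconfig"]),
        ("run", plnx, ["petalinux-build"]),
        ("copy", plnx, [release / "fft.so",
                        plnx / "components/libs/zynqmp_sdsoc_fft/modules/libswaccel.so"]),
        ("run", plnx, ["petalinux-build", "-c", "rootfs/zynqmp_sdsoc_fft"]),
        ("touch", plnx, [plnx / "components/apps/fft_cmdline/fft_cmdline.c"]),
        ("run", plnx, ["petalinux-build", "-c", "rootfs/fft_cmdline"]),
        ("run", plnx, ["petalinux-build", "-x", "package"]),
        ("copy", images, [release / "fft.so.bit", images / "fft.so.bit"]),
        ("run", images, ["petalinux-package", "--force", "--boot", "--fsbl", "zynqmp_fsbl.elf",
                         "--fpga", "fft.so.bit", "--uboot"]),
    ]


def run_steps(steps, dry_run=True):
    """Print each step; execute it only when dry_run is False (stops on the first failure)."""
    for kind, cwd, args in steps:
        print(f"[{kind}] (cwd={cwd}) {' '.join(map(str, args))}")
        if dry_run:
            continue
        if kind == "run":
            subprocess.run(args, cwd=cwd, check=True)
        elif kind == "copy":
            shutil.copy(args[0], args[1])
        elif kind == "touch":
            Path(args[0]).touch()
```

Running `run_steps(petalinux_steps("/path/to/TRD_HOME", "/path/to/sdx_workspace"))` first prints the plan; pass `dry_run=False` to execute it. `check=True` makes a failing PetaLinux command stop the sequence instead of continuing with stale outputs.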

Final images to test on the board:
$ cd <TRD_HOME>/zcu102_fft/apu/petalinux/images/linux
 
Copy the following images onto the SD card:
BOOT.BIN
image.ub

Building QT GUI application

  • Verify that the PetaLinux environment is set:
$ echo $PETALINUX
This should print the path of the PetaLinux installation.


Set up the Qt build environment:
$ cd <TRD_HOME>/zcu102_fft/apu/Qt_gui
$ export SYSROOT=<TRD_HOME>/zcu102_fft/apu/petalinux/build/linux/rootfs/stage
 
Assuming the Qt libraries are installed at /opt/qt-5.7/:
$ export QT_INSTALL_PATH=/opt/qt-5.7
$ export PATH=/opt/qt-5.7/bin/:$PETALINUX/tools/yocto-sdk/sysroots/x86_64-petalinux-linux/usr/bin/qt5:$PATH
$ ./qmake_set_env.sh
$ export QT_CONF_PATH=./qt.conf
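Before running qmake, a quick check of the variables exported above can catch a missing or mistyped path. This is a hypothetical helper, not part of the TRD; it only checks that `PETALINUX`, `SYSROOT`, and `QT_INSTALL_PATH` are set and point to existing directories, and does not validate `PATH` or `QT_CONF_PATH`.

```python
import os

REQUIRED_DIRS = ("PETALINUX", "SYSROOT", "QT_INSTALL_PATH")


def check_qt_build_env(env=None):
    """Return a list of problems with the Qt cross-build variables (empty if all look fine)."""
    env = os.environ if env is None else env
    problems = []
    for var in REQUIRED_DIRS:
        value = env.get(var)
        if not value:
            problems.append(f"{var} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{var}={value} is not an existing directory")
    return problems
```

Run it from the same shell session (for example `python3 -c "..."` after importing the function) so that it sees the exported variables.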
 
Build the Qt application:
$ qmake sw_accel_qt.pro -r -spec devices/linux-zynqmp-g++/
$ make
 
This builds the executable "sw_accel_qt".
  • Copy the application onto the SD card, placing it as shown in <TRD_HOME>/zcu102_fft/sdcard/bin.

This completes the software build instructions.


You can now follow the Board Setup steps above to start the demo.

Related Links