Document History

| Date | Version | Author | Description of Revisions |
| --- | --- | --- | --- |
| 23 October 2014 | 1 | Faster Technology | Initial posting - updated to 2014.3 |











Description/Summary



Virtually all electronic systems today contain some signal processing as part of their fundamental capabilities. The Zynq-7000 AP SoC is ideally suited to handling many of these functions in a single chip solution as will be demonstrated in this Tech Tip.



In Tech Tip "Zynq-7000 AP SoC Spectrum Analyzer part 2 - Accelerating Software - Building ARM NEON Library Tech Tip 2014.3" a library of complex filtering functions was obtained and built. This Tech Tip describes a signal processing application that uses the Ne10 library built in that prior Tech Tip. The application documented in this Tech Tip performs a complex FFT on a sampled input signal, executing on either the ARM processor alone or on the NEON SIMD engine. The application is constructed so that it can be used standalone from the command line or integrated into a larger program. The subsequent Tech Tip "Zynq-7000 AP SoC Spectrum Analyzer part 7 - Building and Running a Qt Based GUI Tech Tip 2014.3" will demonstrate how to integrate this application into a larger graphical system.



In addition to demonstrating a speed-up of 1.25 to 1.85 when using the NEON SIMD engine versus the ARM processor alone, this Tech Tip shows how to use a standard library of functions in an application and how to modify that library for a specific need. All of these are facilitated by the standard implementation of the ARM processor system (PS) in the Zynq-7000 AP SoC, which opens up the vast ecosystem of available software to the Xilinx development community. We will also see the power of the debug capabilities in the Xilinx SDK.


Implementation

| Implementation Details | |
| --- | --- |
| Design Type | PS Only |
| SW Type | Linux |
| CPUs | 1 CPU - standard ZC702 Frequency |
| PS Features | ARM processor and NEON SIMD engine; standard peripherals used by PetaLinux operating system |
| PL Cores | None |
| Boards/Tools | ZC702 with standard peripherals used for Linux TRD operation and control |
| Xilinx Tools Version | Vivado / SDK 2014.3 |
| Other Details | Standard ZC702 setup for console terminal and Ethernet required |


Files Provided
- **fft-zynq.c**: FFT Application C source code
- Ne10TestBuild2014dt3.zip: Tested starting point workspace for SDK

Block Diagram


FFT_App_only.jpg

Step by Step Instructions



A library of signal processing functions was built in the Tech Tip "Zynq-7000 AP SoC Spectrum Analyzer part 2 - Accelerating Software - Building ARM NEON Library Tech Tip 2014.3" and tested in the Tech Tip "Zynq-7000 AP SoC Spectrum Analyzer part 3 Accelerating Software - Running ARM Library Tests Tech Tip 2014.3". This Tech Tip is built starting with that compiled and tested library.



The application performs a Fast Fourier Transform (FFT) on sampled data from an input waveform. The input data is held in a table in the processor memory space, and the spectrum output of the FFT is written to another table in processor memory space. A register interface controls the parameters of the FFT process, which makes it easier to integrate the FFT application with other software. An example of this integration is described in a subsequent Tech Tip, "Zynq-7000 AP SoC <name> Tech Tip". The register values are also available in the command line version, controlled by the following options:

-v --version Print program version
-h --help Print help message
-s --size Size of FFT
-t --type Type of FFT, real or complex
-i --input Input data type, int or float for 16 bit integers, or 32 bit floats
-o --output Output data type
-r --source Physical address of input data
-d --dest Physical address of output results
-a --arch Processor architecture, 0 = ARM, 1 = NEON, or 2 = CORE
-p --pipeline This is part of a continuous processing pipeline, and not a one time FFT
-g --debug Generate an impulse test pattern at location N-1 of the input table
-l --loop Loop execution of the FFT for N iterations - for timing purposes only
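As an illustration of how a subset of these options could be parsed, the following is a minimal sketch using the standard getopt_long() call. It is not the parser in fft-zynq.c: the structure, the function name parse_fft_options, the default loop count of 1 and the assumption that -p takes no argument are all hypothetical choices made for this example.

```c
#include <assert.h>
#include <getopt.h>
#include <stdlib.h>

/* Values of the -a option as listed above. */
enum fft_arch { ARCH_ARM = 0, ARCH_NEON = 1, ARCH_CORE = 2 };

/* Hypothetical container for a subset of the options. */
struct fft_options {
    unsigned long size;     /* -s, FFT size */
    enum fft_arch arch;     /* -a, execution unit */
    unsigned long impulse;  /* -g, test pattern location N (0 = none) */
    unsigned long loops;    /* -l, iterations (assumed default: 1) */
    int pipeline;           /* -p, assumed here to be a plain flag */
};

/* Returns 0 on success, -1 on an unknown option or invalid -a value. */
int parse_fft_options(int argc, char *argv[], struct fft_options *opt)
{
    static const struct option long_opts[] = {
        {"size",     required_argument, NULL, 's'},
        {"arch",     required_argument, NULL, 'a'},
        {"pipeline", no_argument,       NULL, 'p'},
        {"debug",    required_argument, NULL, 'g'},
        {"loop",     required_argument, NULL, 'l'},
        {NULL, 0, NULL, 0}
    };
    int c;

    assert(opt != NULL);

    /* Defaults: 4096 points on the ARM processor, one-time execution. */
    opt->size = 4096;
    opt->arch = ARCH_ARM;
    opt->impulse = 0;
    opt->loops = 1;
    opt->pipeline = 0;

    optind = 1; /* restart scanning so the function can be called again */
    while ((c = getopt_long(argc, argv, "s:a:pg:l:", long_opts, NULL)) != -1) {
        switch (c) {
        case 's':
            opt->size = strtoul(optarg, NULL, 0);
            break;
        case 'a': {
            unsigned long a = strtoul(optarg, NULL, 0);
            if (a > ARCH_CORE)
                return -1;
            opt->arch = (enum fft_arch)a;
            break;
        }
        case 'p':
            opt->pipeline = 1;
            break;
        case 'g':
            opt->impulse = strtoul(optarg, NULL, 0);
            break;
        case 'l':
            opt->loops = strtoul(optarg, NULL, 0);
            break;
        default:
            return -1;
        }
    }
    return 0;
}
```

A caller would pass the argc and argv received by main() and check the return value before using the fields.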

Input and output buffer sizes are calculated from the FFT size, whether the FFT is real or complex and fixed or floating point.
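The buffer size calculation can be sketched as follows. This is an assumption about the arithmetic rather than the code in fft-zynq.c: the function name and enum are hypothetical, and a complex buffer is assumed to hold interleaved real and imaginary values, with a real buffer holding one value per sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sample formats named by the -i and -o options. */
enum sample_format { SAMPLE_INT16, SAMPLE_FLOAT32 };

/*
 * Bytes needed for one table of fft_size samples.
 * Assumption: complex data stores two values (real, imaginary) per sample.
 */
size_t fft_buffer_bytes(size_t fft_size, int is_complex, enum sample_format fmt)
{
    size_t value_bytes = (fmt == SAMPLE_INT16) ? sizeof(int16_t) : sizeof(float);
    size_t values_per_sample = is_complex ? 2 : 1;

    assert(fft_size > 0);
    return fft_size * values_per_sample * value_bytes;
}
```

Under these assumptions a 4096 point complex table of 32 bit floats occupies 32768 bytes, and the same table of 16 bit integers occupies half of that.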

The -a option in the application selects between using just the ARM processor for the computations or using the NEON SIMD engine. This enables the user to see the difference in execution time between these two software approaches. In a subsequent Tech Tip a hardware unit will be added in Programmable Logic to show the performance difference between the two software approaches and execution of the FFT in hardware. The Conclusions section at the end of this Tech Tip contains a simple table of execution time differences between the two software only approaches.

The source file provided with this Tech Tip also contains code that is used only in subsequent Tech Tips:

- code to enable using in a continuous sampling and processing system - pipeline mode
- an option to lock the FFT to a specific CPU in the PS (used in a demonstration system)
- code to use PL hardware to perform the FFT

Modification of the Ne10 Library


The Ne10 Library previously built and tested uses the following process to calculate the Complex FFT:

Input real and imaginary data:

x(n) = xa + j * ya
x(n+N/4) = xb + j * yb
x(n+N/2) = xc + j * yc
x(n+3N/4) = xd + j * yd
where N is length of FFT

Output real and imaginary data:

X(4r) = xa'+ j * ya'
X(4r+1) = xb'+ j * yb'
X(4r+2) = xc'+ j * yc'
X(4r+3) = xd'+ j * yd'

Twiddle factors for radix-4 FFT:

Wn = co1 + j * (- si1)
W2n = co2 + j * (- si2)
W3n = co3 + j * (- si3)

The output of the radix-4 CFFT is in digit-reversed order. Interchanging the middle two branches of every butterfly produces bit-reversed output instead.

Butterfly CFFT equations:

xa' = xa + xb + xc + xd
ya' = ya + yb + yc + yd
xc' = (xa+yb-xc-yd)* co1 + (ya-xb-yc+xd)* (si1)
yc' = (ya-xb-yc+xd)* co1 - (xa+yb-xc-yd)* (si1)
xb' = (xa-xb+xc-xd)* co2 + (ya-yb+yc-yd)* (si2)
yb' = (ya-yb+yc-yd)* co2 - (xa-xb+xc-xd)* (si2)
xd' = (xa-yb-xc+yd)* co3 + (ya+xb-yc-xd)* (si3)
yd' = (ya+xb-yc-xd)* co3 - (xa-yb-xc+yd)* (si3)
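The butterfly equations above can be written directly as a small C function. This is a sketch for illustration only, not the Ne10 implementation: the struct and function names are hypothetical, the inputs are ordered a, b, c, d, each twiddle is stored with re = co and im = si, and the outputs are ordered a', b', c', d'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex type: re is the x term, im is the y term. */
struct cpx { float re, im; };

/*
 * One radix-4 butterfly, transcribed from the equations above.
 * in[0..3]  = a, b, c, d
 * w[0..2]   = (co1, si1), (co2, si2), (co3, si3)
 * out[0..3] = a', b', c', d'
 */
void radix4_butterfly(const struct cpx in[4], const struct cpx w[3],
                      struct cpx out[4])
{
    assert(in != NULL && w != NULL && out != NULL);

    float xa = in[0].re, ya = in[0].im;
    float xb = in[1].re, yb = in[1].im;
    float xc = in[2].re, yc = in[2].im;
    float xd = in[3].re, yd = in[3].im;

    /* a' is the plain sum and needs no twiddle factor */
    out[0].re = xa + xb + xc + xd;
    out[0].im = ya + yb + yc + yd;

    /* b' uses co2 / si2 */
    out[1].re = (xa - xb + xc - xd) * w[1].re + (ya - yb + yc - yd) * w[1].im;
    out[1].im = (ya - yb + yc - yd) * w[1].re - (xa - xb + xc - xd) * w[1].im;

    /* c' uses co1 / si1 */
    out[2].re = (xa + yb - xc - yd) * w[0].re + (ya - xb - yc + xd) * w[0].im;
    out[2].im = (ya - xb - yc + xd) * w[0].re - (xa + yb - xc - yd) * w[0].im;

    /* d' uses co3 / si3 */
    out[3].re = (xa - yb - xc + yd) * w[2].re + (ya + xb - yc - xd) * w[2].im;
    out[3].im = (ya + xb - yc - xd) * w[2].re - (xa - yb - xc + yd) * w[2].im;
}
```

With all three twiddle factors set to co = 1 and si = 0, the function reduces to a 4 point DFT with the middle two outputs interchanged, which gives a quick way to check the signs.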

The "twiddle factors" in the Ne10 library are hard coded for the FFT sizes 16, 64, 256 and 1024. To expand the size to support 4096 size FFT, code was included in the application to calculate the twiddle factors and then bypass the hard coded tables in the original library. Line 117 in the fft-zynq.c source file is the start of this additional routine. The specifics of the calculations as well as the algorithm used for the CFFT in the Ne10 library are beyond the scope of this Tech Tip. A wealth of information is available on the web such as http://en.wikipedia.org/wiki/Fast_Fourier_transform.
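For reference, the textbook definition of the radix-4 twiddle factors is given below. It follows the sign convention of the Wn, W2n and W3n terms listed earlier, with n taken as the butterfly index and N as the FFT length. It is the standard definition and is not taken from fft-zynq.c or the Ne10 sources, which may store or order the values differently.

```latex
W_N^{kn} = e^{-j 2\pi k n / N}
         = \cos\!\left(\frac{2\pi k n}{N}\right) - j \sin\!\left(\frac{2\pi k n}{N}\right),
\qquad k = 1, 2, 3

\mathrm{co}_k = \cos\!\left(\frac{2\pi k n}{N}\right), \qquad
\mathrm{si}_k = \sin\!\left(\frac{2\pi k n}{N}\right)
```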

Building the application



Download the C source file for this Tech Tip - "fft-zynq.c" and save it to a convenient location on your computer system. Note where it is saved. In our case, we saved it to G:\Projects.

This Tech Tip uses the workspace that resulted from building and testing the Ne10 library in the Tech Tips "Zynq-7000 AP SoC Spectrum Analyzer part 2 - Accelerating Software - Building ARM NEON Library Tech Tip 2014.3" and "Zynq-7000 AP SoC Spectrum Analyzer part 3 - Accelerating Software - Running ARM Library Tests Tech Tip 2014.3". If that workspace is available, skip to the instructions below to start SDK.

If the workspace is not available, or if there is any doubt that it was completed properly, the referenced file "Ne10TestBuild2014dt3.zip" can be used to create a known working starting point for this Tech Tip.

Download the Zip file from the Ne10TestBuild2014dt3.zip link.

Create an empty directory where you will be implementing this Tech Tip. To be consistent with the balance of these step by step instructions, the directory could be:

G:\Projects\ZC702_Ne10

However, these steps to import a known workspace will work with any new folder of the user's choosing.

CAUTION:
Many users have unusual problems with SDK when using different directory structures and names. If you encounter any odd behaviors with SDK, it is advised to use the suggested directory structure and names.

Start SDK

Start -> All Programs -> Xilinx Design Tools -> Vivado 2014.3 -> Xilinx SDK 2014.3

In the Workspace Launcher, browse to and select G:\Projects\ZC702_Ne10 or the empty directory that you have created.

WorkspaceLauncher.PNG

Click OK to continue.

If you are presented with a welcome tab, close it by clicking on the X on the tab.

Welcome.png

SDK will start with a blank Project Explorer pane

Select File -> Import or right click on the white space in the Project Explorer pane and select Import.

The Import dialogue box will appear. Expand the General line and select Existing Projects into Workspace

ImportWorkspace1.PNG

Click Next

Click the Select archive file button. Then click Browse to navigate to the saved workspace file that you want to import and click Open. In our case this is Ne10TestBuild2014dt3.zip.

ImportWorkspace2.PNG


Click Finish

SDK will build the workspace automatically. Because SDK is already running with the workspace in place, skip the "Old Workspace in Place" instructions for starting SDK and continue at the point where the workspace is displayed.

Old Workspace in Place

If you have the workspace from the previous Tech Tip work and have not already started SDK, do so by:

Start -> All Programs -> Xilinx Design Tools -> Vivado 2014.3 -> Xilinx SDK 2014.3

In the Workspace Launcher, browse to and select the existing workspace, such as G:\Projects\ZC702_Ne10

WorkspaceLauncher.PNG


Click OK to continue

When presented with the Welcome screen, click the X in the Welcome tab to close that screen

Welcome.png


The workspace should appear as:

FFTstart.PNG


The FFT application is a new project within SDK so we need to create it.

Select File -> New -> Project

TIP:

A new C project can also be created by right clicking in the white space of the Project Explorer pane and then selecting New -> C Project.

When the New Project dialogue box appears, expand the C/C++ line and select C Project

NewCProject.PNG

Click Next

The Project creation dialogue box opens. In the Project Name: box type fft-zynq to match the name of the source file.

Make sure the check box on the default location is checked.

In the Project type: box, select the "Xilinx ARM Linux Executable" type by clicking on it.

LinuxExecutableProject.PNG

Click Finish.

We now see the new project in the Project Explorer.

FFTZynqProject1.PNG

We can now import the source code for the FFT application into SDK.

Right click on fft-zynq in the Project Explorer column and select Import.

The import window appears.

ImportFiles.PNG

Be sure File System is highlighted (you may need to expand the General group), and click Next.

In the Import File System window, browse to the location where the C source file fft-zynq.c has been saved and select the file; check the box next to the source file.

C_file_to_import.PNG

Click Finish to add the source code to the project. The options can be left un-checked.

This source file is set up to support both software execution of the FFT application and execution in the PL hardware fabric. Use of the hardware FFT is controlled by a #define statement. Expand the fft-zynq project and double click on the fft-zynq.c file. This will open it in an editor window in SDK.

fftSrcEdit.PNG

Change the number 1 in line 28 to zero so that line reads:
#define USE_FFT_CORE 0

Save the modified source file by clicking File -> Save from the top menu line.
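The effect of a compile time switch such as USE_FFT_CORE can be sketched as follows. Only the macro name comes from fft-zynq.c; the function arch_is_supported and the way it gates the -a values are hypothetical and shown purely to illustrate conditional compilation.

```c
#include <assert.h>

/* Same setting as the edited line in fft-zynq.c: hardware FFT disabled. */
#ifndef USE_FFT_CORE
#define USE_FFT_CORE 0
#endif

/*
 * Hypothetical helper: reports whether a value of the -a option
 * (0 = ARM, 1 = NEON, 2 = CORE) is usable in this build.
 */
int arch_is_supported(int arch)
{
    assert(arch >= 0);

    switch (arch) {
    case 0: /* ARM processor only */
    case 1: /* NEON SIMD engine */
        return 1;
    case 2: /* PL hardware core: only compiled in when USE_FFT_CORE is 1 */
#if USE_FFT_CORE
        return 1;
#else
        return 0;
#endif
    default:
        return 0;
    }
}
```

With the macro set to 0 the preprocessor removes the hardware path entirely, so a software only build does not depend on the PL design being present.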

We now need to add the proper include files or paths to them that will be used in the build process.

Right click on fft-zynq in the Project Explorer column of SDK and then select "C/C++ Build Settings".

Expand the C/C++ General line in the left column and select the Paths and Symbols item.

P_and_S_before.PNG

Be sure the Configuration: option is set to [ All Configurations ] unless you have a specific reason to have the debug compiled differently than the release.

With the "Includes" tab selected, click Add (in the right column).

Add_path_blank.PNG

Check both the option boxes for "Add to all languages" and "Is a workspace path" and click the workspace button on the right side of the dialog box.

In the Folder Selection box, expand the Ne10-master item and select common.

Add_path_selected.PNG

Click OK to return to the Add directory path dialog box.

Add_path_filled.PNG

Click OK to add this path.

Repeat the process to add the following paths:
- Ne10-master / inc
- Ne10-master / modules / math
- Ne10-master / test / include

When completed you should have the following include directories set.

Paths_set.PNG

We now need to add some library paths for tools to find all of the required components.

In the left column of the Properties for fft-zynq window, expand C/C++ Build options, and select Settings. In the left portion of the right side of the window, under ARM Linux gcc linker, select Libraries.

At the top of the Libraries (-l) area, click on the add icon (looks like a sheet of paper with a plus sign on it).

Library_path_properties_blank.PNG

In the text entry box, type Ne10 and click OK. Then do the same steps for Linux libraries "m" and "rt".

In the Library search path pane, click the add icon (the paper with the plus sign as before) to get a pop up window:

Click Workspace and then select Ne10/Release.

ReleaseLibraryPath.PNG

Click OK.

You should now have the Properties for fft-zynq completed as follows:

LinkerReleasePath.PNG

Click Apply, then click OK.

We can now build the project.

To assure the fastest execution times, we want to use the Release default build settings. To enable these,

Right click on fft-zynq in the Project Explorer pane

Select Build Configurations > Set Active > Release

With the build set to use the Release options, we can now build the project.

In the Project Explorer pane, right click on the fft-zynq project and select Build Project. If Console is selected in the bottom middle pane, the progress will be displayed. The result should be a completed build although there will be some warnings.

fftProjectBuilt.PNG

In the Project Explorer pane, expand the fft-zynq project label and then expand the Binaries line under it. The fft-zynq.elf file that resulted from the build will be shown. The same binary file will be shown in the Release folder under the fft-zynq project as well. This assures us that not only did it build, but that it was built with the Release compiler options in place.

Expected Results



Testing the Application




With the application built, we can run it on the ZC702 and test that it operates as expected. Because the application is intended to be used in a larger system, testing at this point will be somewhat limited. See the subsequent Tech Tip describing integration into a graphics framework for demonstration purposes.



As noted in the Tech Tip "Zynq-7000 AP SoC Spectrum Analyzer part 3 - Accelerating Software - Running ARM Library Tests Tech Tip 2014.3" there are multiple ways to load the file into the ZC702 and execute it. For the balance of this Tech Tip, Remote System Explorer (RSE) will be used to control the ZC702 and execute various tests to demonstrate that the fft-zynq program is operating as expected.



We will access the ZC702 over an Ethernet connection from the PC where SDK is running.

CAUTION!!

The default IP address of the ZC702 is different for PetaLinux than for the OSL Linux used in prior versions of this series of Tech Tips. The default IP address is now 192.168.0.10 so be sure your computer can reach that sub-net.

If you are unable to directly reach the .0 subnet from your computer, it is possible to change the IP address of the ZC702 after PetaLinux has booted. To make this change, do the following:
- Connect the console serial-over-USB port to your PC with the supplied mini-USB adapter and appropriate cable
- Start TeraTerm or a similar terminal emulator
- Boot the ZC702 as described below
- Once PetaLinux has booted, log in using the username root and password root
- Use the ifconfig command to change the IP address, for example "ifconfig eth0 192.168.1.65", where the IP address is one that you can reach from your PC


Connect your ZC702 to your computer or network with an Ethernet cable.


The ZC702 must be set to boot from the SD-MMC card that has been updated with the 2014.2 TRD as described in the TRD Technical Article Zynq Base TRD 2014.2. As a caution before proceeding, be sure that the ZC702 will properly run the 2014.2 TRD.

Set the boot select switches as shown below, then power on your ZC702.


SD card switch settings.jpg





NOTE:


The balance of these instructions assumes that the IP address of the ZC702 has been changed to 192.168.1.65. If you have set the IP address differently, use that IP address.


With the ZC702 running, we will set up a Remote System Explorer connection to the ZC702 board to allow us to download and run our program. Right click in the Project Explorer window, select New -> Other. In the pop up window, expand Remote System Explorer, highlight Connection.

RSE Connection wizard.PNG

Click Next

In the next window, select SSH Only, and click Next

SSH connect.PNG

Use the IP address of the ZC702 (192.168.1.65) as the Host name and Connection name. Fill in the description field if you wish - it is not required.

SSH_host_name.PNG

Click Finish.

NOTE:
Although the connection has been established, there is nothing showing. To view the connection, a new perspective must be started, replacing the current perspective. To do this,

Click Window -> Open Perspective -> Other

OpenPerspective.PNG

Then select Remote System Explorer and click OK.
The Remote System Explorer connection will be displayed.

RemoteSystemConnection.PNG

Close this perspective by clicking

Window -> Close Perspective (As an alternative, click on the C/C++ tab in the upper right corner of the overall window. This returns to the build perspective with the RSE perspective still available.)

With the connection established and the standard Project Explorer perspective displayed, we can now download and run the test program directly to the ZC702.

In the Project Explorer, right click on fft-zynq, then select Run As -> Run Configurations.

A new window will open to create, manage and run configurations.

RunConfigurations1.PNG

Create a new Remote ARM Linux Application by double clicking on "Remote ARM Linux Application" in the left column. (This can also be done by following the instructions in the main window for using the "New" button.)

Select the IP address of the ZC702 in the pull down Connection menu.

RunConfigurations2.PNG

Then click on the Browse button next to the Remote Absolute File Path for C/C++ Application field.

Expand the Root directory. Under the root directory select the tmp directory. If prompted for the User ID and Password, use root for both.

PasswordPrompt.PNG

tmpSelect.PNG

Click OK

Append /fft-zynq.elf to /tmp in the Remote Absolute File Path for C/C++ Applications field.

RunConfigFileSet.PNG

Click Apply and then Run.

The fft-zynq.elf binary executable file will be downloaded to the ZC702 and executed. The console window will show that it has run successfully.

FirstRunNoOptions.PNG

To further exercise the fft-zynq program using the command line arguments while still using RSE, there are at least two ways to proceed: adding command line arguments and using a remote terminal perspective.

Adding Command line arguments:

As just previously done, right click on fft-zynq in the Project Explorer pane, scroll down to Run As and in the expanded list click on Run Configurations...

The Run Configurations dialog box will appear with all of the information for running fft-zynq already in place. Double check that it matches and then click on the Arguments tab at the top of the right pane of the dialog box.

The fft-zynq program includes a simple test pattern generator that is invoked with the -g option. This option inserts a single impulse at location N-1 of the input table. Because the input table holds a real and an imaginary component for each sample, odd N values set the real part of a complex input value while even N values set the imaginary part. For example, the option -g 1 produces a single impulse in the real part of sample 0. The spectrum of an impulse at sample 0 is flat, so the FFT output shows the same value in each of the first 16 output samples, which confirms that the transform is operating correctly. The time required to process the FFT, excluding overhead for setup, etc., is displayed after the last value of the output table.
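The mapping from the -g value to a position in the input table can be sketched as below. This is an assumption about the layout, not the generator in fft-zynq.c: the table is taken to be interleaved floats (real, imaginary, real, ...), the impulse amplitude is taken to be 1, and the function name fill_impulse is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Zero an interleaved complex table of fft_size samples and place a single
 * impulse at location n - 1. Odd n lands on a real part, even n on an
 * imaginary part. Returns 0 on success, -1 if n is out of range.
 */
int fill_impulse(float *table, size_t fft_size, size_t n)
{
    size_t count = 2 * fft_size; /* two floats per complex sample */
    size_t i;

    assert(table != NULL);
    if (n == 0 || n > count)
        return -1;

    for (i = 0; i < count; i++)
        table[i] = 0.0f;
    table[n - 1] = 1.0f;
    return 0;
}
```

Under these assumptions, n = 1 sets the real part of sample 0, n = 2 sets the imaginary part of sample 0, and n = 5 sets the real part of sample 2.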

Click inside the Program arguments box and type -g 1 as the first test to run.

RunArguments.PNG

Then click Run.

The program will run and the Console window will show the results. To see the full output listing it is convenient to expand the console to occupy the full SDK window. To do this, click the Maximize icon on the upper right corner of the window pane border just above the console display.

g1Execution.PNG

Here we see a display of the first 16 samples of the input data table used by the FFT calculation, the first 16 entries in the output table and the time required for calculation of the FFT.

The default is to have the FFT run entirely on the ARM processor. To force execution on the NEON SIMD extension, use the -a 1 option.

If you Maximized the console display window, minimize the console display by clicking on the Restore icon on the upper right corner of the window pane border. Then repeat the process of right clicking on fft-zynq in the Project Explorer pane, selecting Run As -> Run Configurations.... and then selecting the Arguments tab of the Run Configurations dialog box.

Click in the Program arguments: box and add -a 1 after the existing -g 1 argument, being sure to leave a space between the arguments. Then click Run.

The console now shows the same FFT calculation but executed on the NEON SIMD extension. Note that the execution time has been reduced from 1305 us (your mileage may vary slightly...) to 1050 us. Differences in the reported times for your operation, or between runs with the same command line arguments are due to the inaccuracies in the Linux based execution time routine being used, Linux interruptions that may happen during execution and other factors beyond the scope of this Tech Tip.

g1a1Execution.PNG
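One common way to time a routine under Linux, and to reduce run-to-run variation by averaging over several iterations as the -l option allows, is sketched below with clock_gettime(). Whether fft-zynq.c uses this call is an assumption; the function names elapsed_us and average_run_us are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime and CLOCK_MONOTONIC */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Whole microseconds between two timestamps. */
int64_t elapsed_us(const struct timespec *start, const struct timespec *end)
{
    int64_t ns;

    assert(start != NULL && end != NULL);
    ns = (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
       + (int64_t)(end->tv_nsec - start->tv_nsec);
    return ns / 1000;
}

/*
 * Run work() loops times and return the average time per run in
 * microseconds. The monotonic clock is not affected by changes to the
 * system time, but interrupts and scheduling still add jitter.
 */
double average_run_us(void (*work)(void), unsigned loops)
{
    struct timespec t0, t1;
    unsigned i;

    assert(work != NULL && loops > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < loops; i++)
        work();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)elapsed_us(&t0, &t1) / loops;
}
```

Averaging over many iterations spreads the cost of an occasional interruption across all runs, which is why looped measurements tend to be more repeatable than a single run.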

This process of adding command line arguments can be used to further explore the results of different sizes of FFT, etc. Note that this program does not currently support the hardware FFT option (-a 2 command line argument).

Remote Terminal Perspective:


An alternate way to explore running the fft-zynq program with different command line arguments is through a remote terminal perspective. This enables you to enter commands directly into the Linux system running on the ZC702 as opposed to having SDK controlling the execution as is the case with the previous method.

At the top of the SDK window, click on Window, scroll to Open Perspective and then select Other... from the list.

Click on Remote System Explorer and then click OK.

SDK replaces the Project Explorer pane with the Remote System pane and shows the Remote System details pane.

In the Remote System pane, expand the 192.168.1.65 entry, right click Ssh Terminals and click Launch Terminal. A new terminal will be displayed with a Linux prompt from the ZC702.

RemoteTerminalPrompt.PNG

For best viewing of the FFT results, Maximize the Terminal window and then enter the desired commands.

The terminal window starts in the root directory. To run the fft-zynq program, recall that it was downloaded to the tmp directory. To run it, change to the tmp directory (cd /tmp) and then run it directly using ./fft-zynq.elf with the desired options. For example:

./fft-zynq.elf -g 5 -a 1

g5a1Console.PNG

NOTE:
The previous runs with -g 1 forced the real component of sample 0 to be 1. An impulse at sample 0 has a flat spectrum, so every entry in the displayed output table has a real component of 1. In this instance, with -g 5, the real component of sample 2 is set to 1. The impulse is now delayed by two samples, so the output still contains energy at every frequency, but the real and imaginary components vary from entry to entry. This is seen clearly in the table display above as differing values in the first 16 entries of the output table.
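The behavior of the two test patterns follows from the definition of the discrete Fourier transform. The short derivation below uses the standard unnormalized DFT; whether the library applies any scaling to its output is not stated in this Tech Tip, so the amplitudes are an assumption.

```latex
X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}

x[n] = \delta[n - m] \;\Rightarrow\; X[k] = e^{-j 2\pi k m / N}

m = 0: \quad X[k] = 1 \ \text{for all } k

m = 2: \quad X[k] = \cos\!\left(\frac{4\pi k}{N}\right) - j \sin\!\left(\frac{4\pi k}{N}\right)
```

An impulse at sample 0 therefore gives the same value in every output entry, while an impulse at sample 2 gives entries of constant magnitude whose real and imaginary parts rotate with k.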

Defaults and other options


The defaults that are used by fft-zynq are the following:
- FFT size is 4096 samples
- FFT type is Complex
- Input type is assumed to be Integer 16 (the -g option uses FLOAT)
- Output type is Floating Point 32
- Source table address, output table address and the control register area are in OCM of the PS
- ARM only is the assumed processing method
- Processing is one-time (non-pipelined)

The FFT size can be set to any power of 4 from 16 through 4096: 16, 64, 256, 1024 or 4096.
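A size check matching this rule could look like the following sketch. The function name is hypothetical and the check is not taken from fft-zynq.c.

```c
#include <assert.h>

/*
 * Returns 1 if n is a power of 4 between 16 and 4096 inclusive, else 0.
 * A power of 4 is a power of 2 whose single set bit is at an even
 * bit position, which the 0x55555555 mask selects.
 */
int is_valid_fft_size(unsigned long n)
{
    if (n < 16UL || n > 4096UL)
        return 0;
    return (n & (n - 1UL)) == 0UL && (n & 0x55555555UL) != 0UL;
}
```

Rejecting sizes such as 32 or 2048 early avoids passing a length that a radix-4 only transform cannot handle.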

Conclusions


Using the -g option and various FFT sizes with execution on the ARM processor alone and then with the NEON SIMD engine provides a measure of the speed up possible using the NEON SIMD engine. The table below shows average execution times for 5 runs and the relative speed up achieved. Note that there is some variability in the execution times reported by Linux so user results will likely vary from these.

| FFT size | ARM processor (us) | NEON SIMD engine (us) | Average speed-up |
| --- | --- | --- | --- |
| 64 | 48.2 | 26.0 | 1.85 |
| 256 | 92.4 | 73.6 | 1.25 |
| 1024 | 221.0 | 158.4 | 1.39 |
| 4096 | 954.4 | 696.8 | 1.37 |

Execution time for the 4096 point FFT is disproportionately longer than the increase in FFT size alone would indicate. While the number of raw calculations grows predictably with each increase in FFT size, the move from 1024 to 4096 introduces another variable: cache size and the impact of cache misses. With 4096 samples and 32 bit complex numbers (4096 x 2 x 4 bytes = 32 KB), the input data table alone would fill the 32 KB L1 data cache of a Cortex-A9 core; the shared L2 cache is 512 KB. While the output table is smaller, the combination exceeds the L1 data cache and will cause some number of cache misses. This increases the overall execution time by more than the step from 1024 to 4096 alone would indicate.

The ease of integrating a standard library of functions targeting the ARM processor and the NEON SIMD engine can be seen. With multiple methods to control the ZC702, the ease of development and debug with the Xilinx SDK is demonstrated. These all contribute to faster system completion and time to market for complex, compute-intensive applications targeting the Zynq-7000 AP SoC.

Saving the workspace


For ease of completing subsequent Tech Tips that use this completed build, it is wise to save the workspace so it can be restored later as a known starting point. Because only the /sw portion of the base TRD was modified, only that portion needs to be saved. If you choose to do this,

Select File -> Export or right click in the Project Explorer pane white space and select Export

In the Export dialogue box, expand the General line and select Archive file.

Click Next

The Export Archive dialogue will appear.

Click the Browse button and navigate to where you want to save the workspace and then create an appropriate file in which to save the workspace. In our case we are saving this as fftApp2014dt3.zip

Be sure the "Save in zip format" box is selected unless you are on a Linux system in which case you might select the tar format.

Click Select All to export all of the items in the workspace.

Then click Finish.

The workspace will be saved in the specified archive file for later use.