1. Introduction

The SMMU (System MMU) gives AXI masters other than the CPUs a virtualized view of system memory. It is analogous to the MMU of the A53: masters behind the SMMU, such as DMAs, use virtual addresses rather than physical addresses. The SMMU is transparent to device drivers in Linux because the DMA framework knows how to handle it. The SMMU is equivalent to the IOMMU used in other system architectures.

This page is not intended to be a tutorial about the SMMU. For a more detailed understanding of MPSoC, refer to the MPSoC Technical Reference Manual and the Software Developers Guide. For a more complete understanding of SMMU operation, refer to ARM documents such as the ARM System Memory Management Unit Architecture Specification and the ARM Cortex-A Series Programmers Guide. The primary focus of this page at this time is using the SMMU with devices in the programmable logic (PL).

Note that use of the SMMU with the PL is an advanced topic with some limitations that should be clearly understood. This page attempts to clarify the limitations to allow users to do prototyping with the SMMU. The Xilinx PetaLinux 2017.4 release does not enable the SMMU in the device tree by default. This page walks through the process of building a complete prototype system to support user space DMA using the SMMU.

2. User Space DMA

A primary motive for using the SMMU is to allow a user space DMA implementation. For the purposes of this page, user space DMA is defined as allocating memory and controlling a DMA device from user space in Linux. In the past, user space DMA has had several challenges that kept it from being an easy solution, so most users tend to write kernel space drivers instead.

2.1 User Space Cache Control

The first challenge is cache control from user space. On systems that are only software coherent, not hardware coherent, software must perform cache maintenance. That is difficult from user space, so a kernel space driver is typically required.

2.2 User Space Memory Allocation

The second challenge is memory allocation from user space. On systems without an SMMU the DMA device works with physical addresses, so a transfer needs either physically contiguous memory or scatter gather across scattered memory pages. A kernel space driver is typically required to allocate contiguous memory.

2.3 MPSoC Solution

MPSoC provides solutions to both challenges described above. The SMMU allows memory allocated in user space to be used for DMA. The hardware coherency of MPSoC allows cached memory to be used for DMA from user space and removes the need for cache control. The SMMU also provides an additional level of protection, because DMAs cannot access any memory other than the memory that has been set up in the SMMU.

3. Simple AXI DMA Hardware System

The simplest prototype system for this purpose is designed with the following characteristics:
  • Does not include any AXI interconnect for the DMA data interface
  • Connects the DMA data interface to HPC0 of the MPSoC
  • Uses a single AXI data interface on the DMA
  • Is hardware coherent such that all AXI transactions from the DMA are coherent as described in MPSOC Cache Coherency
  • Uses only simple DMA rather than scatter gather
  • Enables high addresses in Vivado as described in PL Masters
  • Loops the DMA transmit stream back to the DMA receive stream for easy testing

3.1 Hardware Details

The following diagram illustrates such a system, with the AXI DMA connected to the HPC0 port of the MPSoC.

simple-axi-dma-smmu.JPG

3.2 AXI DMA Configuration

axi-dma-64-bit.JPG

3.3 AXI Interconnect Limitations

Vivado adds AXI interconnects (or SmartConnect) by default when connecting DMA IP to the MPSoC. The interconnect contains crossbar switches which are configured to pass only the physical addresses of slaves. Virtual addresses therefore cannot be used for DMA, because those transactions are blocked at the crossbar switch. A user can manually insert an AXI crossbar switch and then manually configure the address ranges that pass through it to open it up for virtual addresses. The following illustration shows an additional address range added to a crossbar switch.


addresseditor3.JPG
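
The pass-through rule described above can be sketched as a range check, which may help when choosing virtual addresses that an existing crossbar configuration will accept. This is only an illustration: the function names in_window and passes_crossbar are hypothetical, and the address ranges must be taken from the Vivado address editor of the actual design.

```shell
# in_window ADDR BASE SIZE
# Succeeds (exit status 0) when ADDR lies inside [BASE, BASE + SIZE).
# Values may be decimal or 0x-prefixed hexadecimal.
in_window() {
    local addr=$(( $1 )) base=$(( $2 )) size=$(( $3 ))
    [ "$addr" -ge "$base" ] && [ "$addr" -lt $(( base + size )) ]
}

# passes_crossbar ADDR BASE:SIZE [BASE:SIZE ...]
# Succeeds when ADDR falls in at least one configured address range.
# The ranges are hypothetical inputs, not values read from the hardware.
passes_crossbar() {
    local addr=$1 range
    shift
    for range in "$@"; do
        if in_window "$addr" "${range%%:*}" "${range##*:}"; then
            return 0
        fi
    done
    return 1
}
```

For example, with a single hypothetical range of 0x0:0x80000000, the call passes_crossbar 0x800000000 0x0:0x80000000 fails, and it succeeds once a second range such as 0x800000000:0x800000000 is added.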


4. Device Tree

4.1 SMMU Enable

The SMMU node in the device tree is disabled by default in PetaLinux 2017.4, so it must be enabled explicitly. The following device tree snippet illustrates enabling the SMMU.
&smmu {
   status = "okay";
};
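
Whether the change took effect can be checked on the running target by reading the status property of the node from the live device tree. The sketch below is a hypothetical helper, dt_node_enabled, and the node path shown in the usage note is an assumption about where the SMMU node appears on the target.

```shell
# dt_node_enabled NODE_DIR
# Succeeds when the device tree node at NODE_DIR is enabled: either the
# status property is absent, or it reads "okay" (or the older "ok").
dt_node_enabled() {
    local node=$1 status
    [ -d "$node" ] || return 1
    [ -e "$node/status" ] || return 0
    # Property files in the live device tree hold NUL terminated strings.
    status=$(tr -d '\0' < "$node/status")
    [ "$status" = "okay" ] || [ "$status" = "ok" ]
}
```

Assuming the node is exposed as /proc/device-tree/amba/smmu@fd800000, it could be used as: dt_node_enabled /proc/device-tree/amba/smmu@fd800000 && echo "SMMU enabled"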

4.2 AXI DMA Stream IDs

The device tree node for the AXI DMA must be set up with the "iommus" property to cause the SMMU to function with the device. The property must list a stream ID for each master interface of the DMA. The number of stream IDs required for a device is challenging to determine because it does not always correlate directly with the number of physical AXI master interfaces. The AXI DMA configured with the most master interfaces (scatter gather with separate data interfaces) uses three stream IDs, so listing three is the easiest starting point. Stream ID details are explained on this page: Xen and PL Masters.

The following device tree snippet illustrates adding the iommus property to the AXI DMA node with three stream IDs for the HPC0 port of the MPSoC.
&axi_dma_0 {
    iommus = <&smmu 0x200>, <&smmu 0x201>, <&smmu 0x202>;
};
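
The three values in the snippet can be reproduced with a small calculation. The sketch below assumes a stream ID layout of TBU number in bits [14:10], a PL port prefix in bits [9:6] and the AXI ID from the PL in bits [5:0], with HPC0 behind TBU 0 using prefix 8. That layout is an assumption to be confirmed against the Xen and PL Masters page and the Technical Reference Manual, and the function names are hypothetical.

```shell
# stream_id TBU MASTER AXI_ID
# Prints a stream ID under the ASSUMED layout
#   [14:10] TBU number, [9:6] PL port prefix, [5:0] AXI ID.
stream_id() {
    local tbu=$(( $1 )) master=$(( $2 )) axi_id=$(( $3 ))
    printf '0x%x\n' $(( (tbu << 10) | (master << 6) | axi_id ))
}

# iommus_property TBU MASTER COUNT
# Prints an iommus property listing COUNT consecutive AXI IDs from 0.
iommus_property() {
    local tbu=$1 master=$2 count=$3 i=0 out=""
    while [ "$i" -lt "$count" ]; do
        # Separate entries with ", " once the list is non-empty.
        out="${out:+$out, }<&smmu $(stream_id "$tbu" "$master" "$i")>"
        i=$(( i + 1 ))
    done
    printf 'iommus = %s;\n' "$out"
}
```

Under those assumptions, iommus_property 0 8 3 prints the same property line as the snippet above.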

5. VFIO

The VFIO framework in Linux is designed to use the SMMU to allow DMA from user space. VFIO is similar to the UIO framework in that it provides a method to map a device into user space memory, allowing register access to the device. VFIO also controls the SMMU so that the DMA has a virtualized view of memory similar to that of the CPUs. Unlike UIO, VFIO has minimal documentation and very few examples, and most of those examples are PCI related.

5.1 VFIO Device Driver

There are multiple VFIO drivers in the kernel. The VFIO platform driver is best suited for this solution with a device tree based Linux architecture.

5.1.1 Kernel Configuration

The following illustration shows the kernel configuration to build the driver as a kernel module.

vfio-kernel-configuration.png

5.1.2 Using the VFIO Platform Driver

The AXI DMA driver is typically built into the kernel statically by default. The driver must be unbound from the device to allow the VFIO platform driver to be used for the device. The VFIO platform driver by default requires a reset function which is not provided in the system at this time. A module parameter is used when inserting the VFIO platform driver to indicate the reset function is not required.

The following commands illustrate the steps required to load the VFIO platform driver, unbind the existing DMA driver and start the VFIO platform driver for the AXI DMA hardware connected to the HPC0 port of the MPSoC.

modprobe vfio_platform reset_required=0
echo a0000000.dma > /sys/bus/platform/drivers/xilinx-vdma/unbind
echo vfio-platform > /sys/bus/platform/devices/a0000000.dma/driver_override
echo a0000000.dma > /sys/bus/platform/drivers_probe
Without the reset_required=0 module parameter the driver does not probe successfully, and the following error is reported.
vfio: no reset function found for device a0000000.dma
vfio-platform: probe of a0000000.dma failed with error -2
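
The unbind, override and probe steps above can be wrapped in a function so they are applied in order and stop at the first failure. This is a sketch rather than a supported script: the name vfio_platform_bind and the optional third argument (an alternate sysfs root, useful for a dry run against a scratch directory) are hypothetical.

```shell
# vfio_platform_bind DEVICE OLD_DRIVER [SYSFS_ROOT]
# Hands DEVICE over from OLD_DRIVER to the vfio-platform driver.
# SYSFS_ROOT defaults to /sys.
vfio_platform_bind() {
    local dev=$1 old_driver=$2 sysfs=${3:-/sys}
    local bus="$sysfs/bus/platform"

    if [ ! -d "$bus/devices/$dev" ]; then
        echo "no such platform device: $dev" >&2
        return 1
    fi

    # Unbind only when the device is currently bound to the old driver.
    if [ -e "$bus/drivers/$old_driver/$dev" ]; then
        echo "$dev" > "$bus/drivers/$old_driver/unbind" || return 1
    fi

    # Force the next probe of this device to use vfio-platform.
    echo vfio-platform > "$bus/devices/$dev/driver_override" || return 1
    echo "$dev" > "$bus/drivers_probe" || return 1
}
```

After modprobe vfio_platform reset_required=0, the equivalent of the commands above is: vfio_platform_bind a0000000.dma xilinx-vdma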

5.2 VFIO User Space Application

The attached Linux application source code is a working example of VFIO with the previously described AXI DMA system. The application assumes the system is hardware coherent, so that memory allocated in user space is cached but does not require any software cache maintenance. Running the application on the target is the last step.
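
A VFIO application opens the container node /dev/vfio/vfio together with the group node that owns the device, so it is worth confirming which group the AXI DMA landed in before running it. The helper below is a sketch with a hypothetical name, vfio_group_path. It relies on the iommu_group symbolic link that the kernel creates for a device placed behind an IOMMU.

```shell
# vfio_group_path DEVICE [SYSFS_ROOT]
# Prints the /dev/vfio/<N> node for the IOMMU group of a platform device.
# Fails when the device has no iommu_group link, which usually means the
# SMMU is disabled or the device has no iommus property.
vfio_group_path() {
    local dev=$1 sysfs=${2:-/sys} link
    link=$(readlink "$sysfs/bus/platform/devices/$dev/iommu_group") || return 1
    # The link target ends in the group number, e.g. .../iommu_groups/0
    printf '/dev/vfio/%s\n' "${link##*/}"
}
```

For example, vfio_group_path a0000000.dma prints the group node to open. If the call fails, revisit the device tree settings before debugging the application.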

6. Debug Methods

6.1 Challenges

The devmem utility is typically used for debugging Linux issues. It might appear to no longer be useful once the DMA is using virtual memory addresses, because devmem works with physical addresses. The tracing functions described below allow the user to see the physical address behind each virtual address, so devmem is still useful.

6.2 Tracing SMMU Events in Linux

The kernel must be configured with tracing support. Enable the IOMMU trace events as shown below.
cd /sys/kernel/debug/tracing
echo 1 > events/iommu/map/enable
echo 1 > events/iommu/unmap/enable
echo 1 > events/iommu/io_page_fault/enable
After SMMU operations have occurred, dump the trace buffer as illustrated below.
cat trace
...
modprobe-2274 [001] .... 153.598323: map: IOMMU: iova=0x0000ffffffff8000 paddr=0x00000008797a2000 size=4096
modprobe-2274 [001] .... 153.598329: map: IOMMU: iova=0x0000ffffffff9000 paddr=0x00000008794c8000 size=4096
dma16chan0-dma1-2276 [003] .n.. 153.611087: unmap: IOMMU: iova=0x0000ffffffff0000 size=16384 unmapped_size=16384
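
The map lines in the trace give enough information to translate a DMA virtual address back to a physical address for use with devmem. The function below is a sketch with a hypothetical name, iova_to_paddr. It assumes the map event format shown above and returns the first mapping that covers the address, without accounting for later unmap events.

```shell
# iova_to_paddr IOVA < trace
# Reads trace output on stdin and prints the physical address behind IOVA,
# using the first "map" event whose range covers it. Fails if none does.
iova_to_paddr() {
    local want=$(( $1 )) line iova paddr size
    while IFS= read -r line; do
        # Only "map" events carry a paddr; skip "unmap" and other lines.
        case $line in
            *" map: IOMMU: iova="*) ;;
            *) continue ;;
        esac
        iova=${line##*iova=};   iova=${iova%% *}
        paddr=${line##*paddr=}; paddr=${paddr%% *}
        size=${line##*size=};   size=${size%% *}
        if [ "$want" -ge $(( iova )) ] && [ "$want" -lt $(( iova + size )) ]; then
            # Keep the offset of IOVA within the mapped range.
            printf '0x%x\n' $(( paddr + want - iova ))
            return 0
        fi
    done
    return 1
}
```

With the trace shown above, iova_to_paddr 0xffffffff8010 < trace prints 0x8797a2010, which can then be passed to devmem as the address to read.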

6.3 DMA Hangs, Kernel Crashes, Kernel Hangs

If the SMMU is not set up with the correct stream IDs in the device tree, you may see the application hang during the DMA transfer, a kernel crash or a kernel hang. When the stream IDs are wrong, a virtual address used by the DMA is not translated to the correct physical address. A System ILA in the hardware system may be required to watch the AXI transactions and verify the addresses being used. Visibility is limited once the AXI transaction enters the MPSoC, where the SMMU performs the address translation from virtual to physical.

7. References

VFIO
Virtual Open Systems VFIO Prototyping