1 Introduction

This page describes methods for using BRAM from Linux, with a focus on PetaLinux. The principles apply to the Linux kernel regardless of the distribution. The prototypes for this page were tested with PetaLinux 2017.2 and the 4.9 kernel on ARM64 (Zynq UltraScale+ MPSoC). The same principles have also been tested successfully on ARM (Zynq 7K).

1.1 Terminology

1.1.1 Memory Attributes

In Linux the MMU of the CPU is set up with memory attributes that determine how the memory is accessed (cached, non-cached, device memory, etc.).

1.1.2 Sparse Memory

The DDR memory for MPSOC is not contiguous as it includes 2 memory ranges, 0 - 0x8000_0000, and 0x8_0000_0000 - 0x8_8000_0000, when using 4 GB on the ZCU102 board.

1.2 Methods to Access Hardware in Linux

The focus of this page is on user space access of the hardware through user space drivers. Kernel drivers should also be considered when the required skills are available.

1.2.1 The /dev/mem Device Driver

The /dev/mem device driver included in the kernel by default (for Xilinx kernel configurations) provides a method to access hardware from user space. This driver allows memory mapped hardware to be mapped into user space using the mmap() function call. There are many examples of using this driver in the open source community, but there are some nuances that are not obvious and not documented that well.

1.2.1.1 Cache Control

The open() of the /dev/mem device accepts optional flags. Use O_SYNC to cause the accesses to the hardware to be non-cached. Without this flag the accesses to the hardware will be cached, which can be chaotic and difficult to debug.

1.2.1.2 Accessing Registers

The most typical use of /dev/mem is to access device registers, which are in an address range unknown to the Linux kernel (not in the memory node of the device tree). These accesses are performed as device memory or strongly ordered to ensure there are no side effects.

1.2.1.3 Accessing Memory

Another use of the /dev/mem device driver is to access a memory. In this case it is desirable to access the memory in a more efficient manner, such as normal memory that is non-cached. The Linux system should be set up such that the memory is part of the kernel's memory (as set up in the memory node of the device tree), is reserved so that the kernel does not use it, and is mapped into the kernel memory space by not using the "no-map" property in the device tree. The Linux reserved memory framework describes how to reserve memory in the device tree.

1.2.1.4 Other Important Details
  • The /dev/mem driver requires root privileges which may not be desired in all systems.
  • The O_SYNC flag in the open() of the /dev/mem driver would not be required in a cache coherent system as described at Cache Coherency.
  • Normal memory can be accessed unaligned without issues, while unaligned accesses to device or strongly ordered memory will cause exceptions.
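
The alignment point above can be illustrated with a small copy helper. This is a minimal sketch rather than part of the original prototype: the name io_copy32 is hypothetical, and it assumes the destination is a device or strongly ordered mapping where only naturally aligned 32-bit accesses are wanted. It avoids memcpy(), whose implementation may choose unaligned or wider accesses.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy 'words' 32-bit words using only naturally
 * aligned 32-bit stores to the destination.
 * Returns 0 on success, or -1 if either pointer is not 4-byte aligned.
 */
int io_copy32(volatile void *dst, const void *src, size_t words)
{
    if (((uintptr_t)dst & 0x3u) != 0 || ((uintptr_t)src & 0x3u) != 0)
        return -1;

    volatile uint32_t *d = dst;
    const uint32_t *s = src;

    /* volatile keeps each store as a separate 32-bit access */
    for (size_t i = 0; i < words; i++)
        d[i] = s[i];

    return 0;
}
```

For memory mapped as normal non-cached memory, as recommended for BRAM on this page, a plain memcpy() should be acceptable; the helper is only of interest for device or strongly ordered mappings.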

1.2.2 UIO Device Driver

The UIO device driver, uio_pdrv_genirq, in the Linux kernel is another method to access hardware from user space. This driver works well with the device tree and allows memory mapped hardware to be mapped into user space using the mmap() function call. This method is preferred over /dev/mem for accessing registers. Using the UIO device driver causes the memory attributes for the address range to be device / strongly ordered, which is good for registers but performs poorly for a memory. There are other methods for using UIO which are more complex and not covered here.
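
A minimal sketch of mapping a UIO region from user space follows. It is not taken from the prototype: the function name uio_map_region is hypothetical, and the device path (for example /dev/uio0) and region size are assumed to be supplied by the caller. The size of each region is normally available under /sys/class/uio/uioX/maps/mapN/size. UIO selects the region through the mmap() offset, which is the map index multiplied by the page size.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map region 'map_index' of a UIO device.
 * Returns the mapped address, or MAP_FAILED on error.
 */
void *uio_map_region(const char *dev_path, unsigned int map_index, size_t size)
{
    int fd = open(dev_path, O_RDWR);
    if (fd < 0)
        return MAP_FAILED;

    /* UIO convention: the offset selects the region, N * page size */
    off_t offset = (off_t)map_index * (off_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);

    close(fd);   /* the mapping stays valid after the descriptor is closed */
    return p;
}
```

A caller would typically cast the result to a volatile pointer of the register width and release the mapping with munmap() when finished.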

2 Linux Kernel Code Details


The following code snippet from the kernel helps understand how physical memory is mapped when using the /dev/mem driver.

mmu.c.png

2.1 Paths of phys_mem_access_prot()

Each of the paths through the function is explored and described. The user can easily instrument this function in the kernel to verify the intended operation of /dev/mem.

2.1.1 No Valid Page Frame Number (!pfn_valid(pfn))

In this case there is no valid page frame number, meaning the kernel has not set up the memory in its page tables, so the memory attributes are set to device or strongly ordered (non-cached). I believe it's really device memory, but getting a clear answer for Linux is not easy. This case is seen when accessing memory addresses outside the kernel memory (not RAM). It is the traditional use for device registers, which need to be device or strongly ordered to prevent unwanted side effects. The low performance observed is consistent with device or strongly ordered memory being used.

2.1.2 A Valid Page Frame Number and O_SYNC Is Specified (file->f_flags & O_SYNC)

In this case the memory attributes are changed to write combined, which is uncached buffered memory. This is typically what you want for RAM such as a frame buffer or DMA memory.

2.1.3 A Valid Page Frame Number and O_SYNC Is Not Specified

In this case the memory attributes are left unaltered, so the memory keeps whatever attributes it already had, which are likely cached. This can cause strange and unpredictable behavior unless cached memory is acceptable.
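
The three paths can be summarized as a small decision function. This is a user space model for illustration only, not the kernel code: the enum and the name devmem_attr are hypothetical, and the kernel helper names mentioned in the comments (pgprot_noncached, pgprot_writecombine) are given from my reading of the arm64 source and should be checked against the kernel version in use.

```c
#include <stdbool.h>

/* Illustrative names only; the kernel works with pgprot_t values. */
enum mem_attr {
    ATTR_DEVICE_NONCACHED,  /* 2.1.1: device / strongly ordered (pgprot_noncached) */
    ATTR_WRITE_COMBINED,    /* 2.1.2: normal non-cached (pgprot_writecombine) */
    ATTR_UNCHANGED          /* 2.1.3: protection left as is, likely cached */
};

/* Model of the decision made for a /dev/mem mapping. */
enum mem_attr devmem_attr(bool pfn_is_valid, bool o_sync)
{
    if (!pfn_is_valid)
        return ATTR_DEVICE_NONCACHED;  /* address is not kernel memory */
    if (o_sync)
        return ATTR_WRITE_COMBINED;    /* kernel memory opened with O_SYNC */
    return ATTR_UNCHANGED;             /* kernel memory without O_SYNC */
}
```

Note that O_SYNC only matters when the page frame number is valid. This is why the BRAM must be added to the kernel memory before it can be mapped as normal non-cached memory.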

3 Accessing BRAM Using /dev/mem

3.1 Linux Device Tree

By default the device tree generation process generates a node in the PL device tree (pl.dtsi) for the AXI BRAM Controller. Since there is no driver for the BRAM, this should not be an issue. This could change in the future, so the node is disabled using its status property.

The device tree is altered to add the BRAM memory range to the memory node and to add a reserved memory node so that the kernel does not use the memory, but does map the memory into the kernel memory. Note that the "no-map" property should not be used in the reserved node. The following device tree snippet (to be added to system-user.dtsi in PetaLinux) illustrates the changes for adding a 32K BRAM at address 0xA000_0000.
/include/ "system-conf.dtsi"
/ {
   #address-cells = <2>;
   #size-cells = <2>;
   memory {
       device_type = "memory";
       reg = <0x0 0x0 0x0 0x80000000>, <0x0 0xA0000000 0x0 0x8000>, <0x00000008 0x00000000 0x0 0x80000000>;
   };
   reserved-memory {
       ranges;
       reserved {
           reg = <0x0 0xa0000000 0x0 0x8000>;
       };
   };
};
&axi_bram_ctrl_0 {
    status = "disabled";
};

3.2 U-boot Memory

U-boot is configured by the device tree to some extent and by the u-boot configuration. The default u-boot configuration for the Xilinx build has 2 memory banks supported (for MPSOC). Both memory banks are required to support the sparse 4 GB of DDR for the ZCU102 board. Another bank is added to the u-boot configuration to support the existing 4 GB of DDR and a new BRAM. This step is required because u-boot must see the full amount of memory that is desired in Linux. U-boot alters the amount of memory in the memory node of a loaded device tree before passing it on to Linux such that Linux only recognizes the amount of memory that u-boot is configured for.

The platform-top.h file in the <project>/project-spec/meta-user/recipes-bsp/u-boot/files directory of the PetaLinux project is altered by adding the following line.

#define CONFIG_NR_DRAM_BANKS 3

Before u-boot has been altered, the bd command in u-boot reflects two memory banks as shown below.
   ZynqMP> bd
 
  arch_number = 0x00000000
  boot_params = 0x00000000
  DRAM bank = 0x00000000
  -> start = 0x00000000
  -> size = 0x80000000
  DRAM bank = 0x00000001
  -> start = 0x800000000
  -> size = 0x80000000

After u-boot has been altered, the board information should show 3 DRAM banks with the correct addresses and sizes.

   ZynqMP> bd
 
  arch_number = 0x00000000
  boot_params = 0x00000000
  DRAM bank = 0x00000000
  -> start = 0x00000000
  -> size = 0x80000000
  DRAM bank = 0x00000001
  -> start = 0xA0000000
  -> size = 0x00008000
  DRAM bank = 0x00000002
  -> start = 0x800000000
  -> size = 0x80000000

3.3 User Space Application

A user space application is used to access the BRAM using the /dev/mem device driver. The application is generally no different from any other /dev/mem application. The following code snippet is only a prototype to illustrate the principles and is not intended to be a robust, properly coded application.
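#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
 
// Make the SDK console work in the debugger
// (wrapped in do/while so the macro behaves as a single statement)
#define printf(...) \
 do { fprintf(stdout, __VA_ARGS__); fflush(stdout); } while (0)
 
int main(void)
{
   size_t bram_size = 0x8000;
   off_t bram_pbase = 0xA0000000; // physical base address
   volatile uint64_t *bram64_vptr;
   int fd;
 
   // O_SYNC requests a non-cached mapping
   if ((fd = open("/dev/mem", O_RDWR | O_SYNC)) == -1) {
      printf("Failed to open /dev/mem\n");
      return EXIT_FAILURE;
   }
 
   // Map the BRAM physical address into user space getting a virtual address for it
   bram64_vptr = (volatile uint64_t *)mmap(NULL, bram_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, bram_pbase);
   close(fd); // the mapping remains valid after the descriptor is closed
   if (bram64_vptr == MAP_FAILED) {
      printf("Failed to map the BRAM\n");
      return EXIT_FAILURE;
   }
 
   // Write to the memory that was mapped, use devmem from the command line of Linux to verify it worked
   // it could be read back here also
   bram64_vptr[0] = 0xDEADBEEFFACEB00CULL;
 
   munmap((void *)bram64_vptr, bram_size);
   return EXIT_SUCCESS;
}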
 

Before running the application, use devmem to verify the memory contents.

root@xilinx-zcu102-2017_2:~# devmem 0xa0000000 64
0x0000000000000000
 

After the application runs, use devmem again to verify the write was successful.

root@xilinx-zcu102-2017_2:~# devmem 0xa0000000 64
0xDEADBEEFFACEB00C
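
The write can also be verified from within the program instead of with devmem. The following is a minimal sketch under stated assumptions: the function name mem_write_verify is hypothetical, the base address is assumed to be page aligned (as 0xA0000000 is in this example), and the device path is a parameter so that the same code can be exercised against an ordinary file.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write 'pattern' to the first 64-bit word of the
 * mapped range and read it back.
 * Returns 0 on a match, 1 on a mismatch, and -1 if the mapping fails.
 */
int mem_write_verify(const char *dev_path, off_t base, size_t size,
                     uint64_t pattern)
{
    int fd = open(dev_path, O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;

    void *map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    /* volatile forces a real store followed by a real load */
    volatile uint64_t *word = map;
    word[0] = pattern;
    int result = (word[0] == pattern) ? 0 : 1;

    munmap(map, size);
    return result;
}
```

For the system on this page the call would be mem_write_verify("/dev/mem", 0xA0000000, 0x8000, 0xDEADBEEFFACEB00CULL), run as root.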
 

3.4 Kernel Page Tables Debug

It is typically challenging in Linux to verify that memory attributes are set up as expected. Kernel memory is easier to verify than user space memory. The ARM64 kernel allows the page tables to be dumped from user space, which can be somewhat helpful. These methods are still being explored.

3.4.1 Configuring the Kernel

By default the kernel is not configured to allow the page tables to be dumped. The following screenshot shows how to enable the dumping of the page tables.

pagetabledump.JPG

3.4.2 Dumping the Page Tables

In this example of adding a 32K BRAM it is easy to see the page table entry as 32K is not a common memory size.

Dump the page tables when the BRAM is not added to the Linux system to get a baseline.

root@xilinx-zcu102-2017_2:/sys/kernel/debug# cat kernel_page_tables | grep 32K
0xffffff8008fd8000-0xffffff8008fe0000 32K PTE RW NX SHD AF UXN DEVICE/nGnRE
0xffffff8008ff1000-0xffffff8008ff9000 32K PTE RW NX SHD AF UXN MEM/NORMAL

Dump the page tables after adding the BRAM into the system as kernel memory and compare against the baseline to verify that the memory is mapped as normal memory rather than device / strongly ordered. The entry shows the memory as cached because that is how the kernel maps it. The user space mapping changes it to non-cached, and that change is not visible with this method.
root@xilinx-zcu102-2017_2:/sys/kernel/debug# cat kernel_page_tables | grep 32K
0xffffff8008fd8000-0xffffff8008fe0000 32K PTE RW NX SHD AF UXN DEVICE/nGnRE
0xffffff8008ff3000-0xffffff8008ffb000 32K PTE RW NX SHD AF UXN MEM/NORMAL
0xffffffc0a0000000-0xffffffc0a0008000 32K PTE RW NX SHD AF UXN MEM/NORMAL
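
The comparison can also be automated by parsing the dump. The sketch below is an illustration only: the struct and function names are hypothetical, and the line layout (address range first, attribute string last) is assumed from the output shown on this page and may differ in other kernel versions.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one kernel_page_tables entry. */
struct pt_range {
    uint64_t start;
    uint64_t end;
    char attr[32];   /* last column, for example "MEM/NORMAL" */
};

/* Parse one line of the dump. Returns 0 on success, -1 on a parse error. */
int parse_pt_line(const char *line, struct pt_range *out)
{
    unsigned long long start, end;

    /* %llx accepts the leading 0x of each address */
    if (sscanf(line, "%llx-%llx", &start, &end) != 2)
        return -1;

    /* the attribute is the last space separated field */
    const char *last = strrchr(line, ' ');
    if (last == NULL)
        return -1;

    size_t n = strcspn(last + 1, "\r\n");
    if (n == 0 || n >= sizeof(out->attr))
        return -1;

    memcpy(out->attr, last + 1, n);
    out->attr[n] = '\0';
    out->start = start;
    out->end = end;
    return 0;
}
```

In the dump above the BRAM entry starts at 0xffffffc0a0000000, which suggests the kernel linear map places physical address 0xA0000000 at an offset of 0xffffffc000000000. A check could therefore look for an entry with that start address, a length of 0x8000, and the attribute MEM/NORMAL.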

4 Vivado Prototype System

The Vivado system is very simple with nothing but an AXI BRAM in the PL as illustrated below.
bram-system-vivado.JPG


address-editor.JPG