I am updating an older Linux driver that transfers data via DMA to userspace pages which are passed down from the application via get_user_pages().
My hardware is a new x86 Xeon based board with 12GB of RAM.
The driver gets data from a VME-to-PCIe FPGA and is supposed to write it into main memory. I call dma_map_page() for each page, check the result with dma_mapping_error(), and write the returned DMA address into the buffer descriptors of the DMA controller. Then I kick off the DMA. (We can also see the transfer starting in the FPGA tracer.)
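To make the per-page step concrete, here is a minimal user-space sketch of how a user buffer breaks down into one (offset, length) pair per page, which is the shape each dma_map_page() call and buffer descriptor would take. The names `split_into_pages`, `struct page_seg` and `SKETCH_PAGE_SIZE` are hypothetical and not taken from the driver, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages, as on x86 */

/* Hypothetical descriptor input: one entry per page touched by the buffer. */
struct page_seg {
    size_t page_index;  /* index into the array of pinned pages */
    size_t offset;      /* byte offset inside that page */
    size_t length;      /* bytes to transfer within that page */
};

/*
 * Split the user buffer [uaddr, uaddr + len) into per-page segments.
 * Returns the number of segments written, or 0 if len is 0 or if
 * `max` entries are not enough to describe the whole buffer.
 */
static size_t split_into_pages(uint64_t uaddr, size_t len,
                               struct page_seg *segs, size_t max)
{
    size_t n = 0;
    size_t offset = (size_t)(uaddr % SKETCH_PAGE_SIZE);

    assert(segs != NULL);

    while (len > 0) {
        /* Bytes left in the current page, capped by what remains. */
        size_t chunk = SKETCH_PAGE_SIZE - offset;
        if (chunk > len)
            chunk = len;
        if (n == max)
            return 0;           /* caller's array is too small */
        segs[n].page_index = n;
        segs[n].offset = offset;
        segs[n].length = chunk;
        n++;
        len -= chunk;
        offset = 0;             /* only the first page has an offset */
    }
    return n;
}
```

In a real driver each segment would be passed to dma_map_page() together with the matching pinned page, and the returned dma_addr_t would go into the descriptor. A buffer with offset 0x10 and length 0x100, as in the log further down, yields a single segment.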
However, when I get the DMA-finished IRQ, I see no data. As a control, I have the same VME address space accessible via PIO mode, and that works. I also tried writing values to page_address(page) of the user pages, and the application can see them, so that part is OK.
Digging deeper, I checked the usual documentation such as DMA-API.txt, but I could not find any other approach, either there or in other drivers.
My kernel is a self-compiled 64-bit 4.4.59 with all kinds of debug options (DMA-API debugging etc.) set to yes.
I also dug through drivers/iommu/ looking for debugging options, but there are only a few pr_debug() calls there.
The interesting thing: I have another driver, an ethernet driver, which supports a NIC connected to PCI. This one works without problems!
When dumping and comparing the dma_addr_t values returned to the two drivers, I see this:
The NIC driver allocates memory via dma_alloc_coherent() for buffer descriptors etc.; its addresses are in the "lower 4 GB":
[ 3127.800567] dma_alloc_coherent: memVirtDma = ffff88006eeab000, memPhysDma = 000000006eeab000
[ 3127.801041] dma_alloc_coherent: memVirtDma = ffff880035d9b000, memPhysDma = 0000000035d9b000
[ 3127.801373] dma_alloc_coherent: memVirtDma = ffff88006ecd4000, memPhysDma = 000000006ecd4000
In the VME driver, which calls dma_map_page() on user-space pages located above 4 GB, the DMA address looks different: 0xffffe010 (including the offset from the application).
pageAddr=ffff88026b4b1000 off=10 dmaAddr=00000000ffffe010 length=100
DMA_BIT_MASK(32) is set in both drivers; our FPGA cores are 32 bits wide.
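As a sanity check on the numbers above, the following user-space sketch decodes the logged addresses. It assumes that pageAddr and memVirtDma are direct-map kernel addresses and that the direct map starts at 0xffff880000000000, which matches the memVirtDma/memPhysDma pairs in the NIC log. `SKETCH_PAGE_OFFSET`, `dma_mask`, `direct_map_to_phys` and `fits_mask` are hypothetical helper names; only the mask value mirrors the kernel's DMA_BIT_MASK(n).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: direct-map base, inferred from the NIC log lines where
 * memVirtDma - memPhysDma is always 0xffff880000000000. */
#define SKETCH_PAGE_OFFSET 0xffff880000000000ULL

/* Same value as the kernel's DMA_BIT_MASK(bits), written as a function. */
static uint64_t dma_mask(unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? ~0ULL : (1ULL << bits) - 1;
}

/* Physical address behind a direct-map kernel virtual address. */
static uint64_t direct_map_to_phys(uint64_t kvaddr)
{
    return kvaddr - SKETCH_PAGE_OFFSET;
}

/* True if a device limited to `bits` address bits can reach `addr`. */
static bool fits_mask(uint64_t addr, unsigned bits)
{
    return (addr & ~dma_mask(bits)) == 0;
}
```

Under these assumptions, pageAddr ffff88026b4b1000 corresponds to physical 0x26b4b1000, which does not fit a 32-bit mask, while the returned dmaAddr 0xffffe010 does. That would be consistent with the mapping layer (the dmesg output shows DMAR enabled) handing the device a translated address below 4 GB rather than the raw physical address, though the post itself does not confirm this reading.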
Question: do I have to have special prerequisites in order for this DMA to work? I read that highmem memory can not be used for DMA, is this still so?
Part of dmesg:
[ 0.539839] debug: unmapping init [mem 0xffff880037576000-0xffff880037ab2fff]
[ 0.549502] DMA-API: preallocated 65536 debug entries
[ 0.549509] DMA-API: debugging enabled by kernel config
[ 0.549545] DMAR: Host address width 46
[ 0.549550] DMAR: DRHD base: 0x000000fbffc000 flags: 0x1
[ 0.549573] DMAR: dmar0: reg_base_addr fbffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[ 0.549580] DMAR: RMRR base: 0x0000007bc14000 end: 0x0000007bc23fff
[ 0.549585] DMAR: ATSR flags: 0x0
[ 0.549590] DMAR: RHSA base: 0x000000fbffc000 proximity domain: 0x0
[ 0.549779] DMAR: dmar0: Using Queued invalidation
[ 0.549784] DMAR: dmar0: Number of Domains supported <65536>
[ 0.549796] DMAR: Setting RMRR:
[ 0.549809] DMAR: Set context mapping for 00:14.0
[ 0.549812] DMAR: Setting identity map for device 0000:00:14.0 [0x7bc14000 - 0x7bc23fff]
[ 0.549820] DMAR: Mapping reserved region 7bc14000-7bc23fff
[ 0.549829] DMAR: Set context mapping for 00:1d.0
[ 0.549831] DMAR: Setting identity map for device 0000:00:1d.0 [0x7bc14000 - 0x7bc23fff]
[ 0.549838] DMAR: Mapping reserved region 7bc14000-7bc23fff
[ 0.549845] DMAR: Prepare 0-16MiB unity mapping for LPC
[ 0.549853] DMAR: Set context mapping for 00:1f.0
[ 0.549855] DMAR: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[ 0.549861] DMAR: Mapping reserved region 0-ffffff
[ 0.549892] DMAR: Intel(R) Virtualization Technology for Directed I/O
...
[ 0.551725] iommu: Adding device 0000:00:00.0 to group 10
[ 0.551753] iommu: Adding device 0000:00:01.0 to group 11
[ 0.551780] iommu: Adding device 0000:00:01.1 to group 12
[ 0.551806] iommu: Adding device 0000:00:02.0 to group 13
[ 0.551833] iommu: Adding device 0000:00:02.2 to group 14
[ 0.551860] iommu: Adding device 0000:00:03.0 to group 15
[ 0.551886] iommu: Adding device 0000:00:03.2 to group 16
[ 0.551962] iommu: Adding device 0000:00:05.0 to group 17
[ 0.551995] iommu: Adding device 0000:00:05.1 to group 17
[ 0.552027] iommu: Adding device 0000:00:05.2 to group 17
[ 0.552059] iommu: Adding device 0000:00:05.4 to group 17
[ 0.552083] iommu: Adding device 0000:00:14.0 to group 18
[ 0.552134] iommu: Adding device 0000:00:16.0 to group 19
[ 0.552166] iommu: Adding device 0000:00:16.1 to group 19
[ 0.552191] iommu: Adding device 0000:00:19.0 to group 20
[ 0.552216] iommu: Adding device 0000:00:1d.0 to group 21
[ 0.552272] iommu: Adding device 0000:00:1f.0 to group 22
[ 0.552305] iommu: Adding device 0000:00:1f.3 to group 22
[ 0.552332] iommu: Adding device 0000:01:00.0 to group 23
[ 0.552360] iommu: Adding device 0000:03:00.0 to group 24
[ 0.552437] iommu: Adding device 0000:04:00.0 to group 25
[ 0.552473] iommu: Adding device 0000:04:00.1 to group 25
[ 0.552510] iommu: Adding device 0000:04:00.2 to group 25
[ 0.552546] iommu: Adding device 0000:04:00.3 to group 25
[ 0.552575] iommu: Adding device 0000:05:00.0 to group 26
[ 0.552605] iommu: Adding device 0000:05:00.1 to group 27
1 solution
#1
2
For completeness, here is the answer: we found it. The cause was something totally different: a PCIe protocol bug in the FPGA PCIe core...