
System Device Tree Meeting Notes 2022


Table of Contents

2022-03-25

2022-02-08

Attended

  • Sonal Santan, Bill Mills, Soren Soe, Rob Herring, Mark Dapoz, Valentin Monnot, Lizhi Hou, Bruce Ashfield, Stefano Stabellini, Loic Pallardy, Arnaud Pouliquen, Christopher, Tanmay Shah, Max Zhen, Nathalie Chan King Choy

Agenda

  • Discovery of PCIe apertures based on Device Tree overlays on ARM and x86. It is another example of using Device Tree to describe "more" than what it is commonly used for. In this case, it is used to describe PCIe apertures and other details of a PCIe device at runtime.
  • (Didn't have time for) Conclude on security topic: How to express resources that have secure & non-secure addresses that are different
    • Bill had Qs, but think most of them were resolved over email
    • Stefano has an example from Grant, which may address Bill's point

Decisions/Conclusions

  • There are 3 pieces that need to be worked on separately
    • How do you describe the PCI device? There are PCI bindings that have existed for DT for 25 years. That part is a solved problem; just have to define what the device node looks like.
    • How do you describe everything downstream of the PCI device? Doable if the device has a node in DT (the first problem). Probably addressed with overlays if you want to have the same overlay applied to different nodes & translated to different addresses. Address translation for PCI is a bit trickier b/c BARs are dynamic. The assigned-addresses property would contain the BAR addresses. Some challenges to apply different overlays to different base nodes.
    • ACPI: How do you do all this in a system that's either not DT, or where the base node is not described in DT b/c it is discoverable & dynamic? This is not just a problem for PCI, e.g. an FTDI USB-serial chip w/ I2C, SPI, GPIO hanging off it & you want to plug 10 of them into your system. This issue has been on the radar for some years & it's not solved. Not a hard problem to solve, but someone needs to look at it & come up with a proposal.
  • Solution you come up with needs to work first in a DT-based system. Then, we can discuss what it looks like in a non-DT system.
  • Easier to start w/ an Arm-based platform w/ good DT support for PCI. There are examples of that. This is a stepping stone to x86 w/ DT enabled.
  • Can test on QEMU

Notes

  • Recording (expires in 180 days, download/watch before then if you need to catch up)
  • Link to slides
  • Sonal:
    • Xilinx Run Time (XRT)
      • Lizhi trying to upstream the kernel drivers for Alveo
      • Have to add some infrastructure pieces to enable the use cases for Alveo
      • What is the right architecture to get support for Alveo in the mainline Linux kernel?
  • Lizhi from XRT team
    • Have been working with FPGA Linux community for a while
    • Got a lot of suggestions for overall architecture
    • Alveo HW
      • Alveo card is PCIe EP (not a bridge, no PCIe device underneath) & contains multiple HW apertures. These independent apertures are exposed on PCIe device BARs & each aperture has its own driver.
      • Each can be programmed with different FPGA images (shells), which can have different architectures w/ different # of apertures & apertures at different locations.
      • Different cards can be plugged into 1 server
      • FPGA image could be dynamically programmed back & forth between shells, so the FPGA image must provide metadata. Currently using flattened DT format.
    • Bill: Is aperture 1:1 per FPGA dynamic reconfig slice, or just a memory region?
    • Lizhi: Actually, HW has 2 types:
      • Basic shell, which is image saved in flash, which gets programmed to FPGA. User can re-flash another shell & cold reboot.
      • Multiple partition design: FPGA image in host file system. During driver boot, driver will program image to FPGA. Dynamically switched.
      • Also have user region on card & user can download FPGA image to that part of the card & this may have apertures.
    • Sonal: PCIe BAR layout: # of BARs is fixed, comes from the thin shell from the flash. But, what appears on the BAR is dynamic. Xilinx tools have liberty to place the IP anywhere in the BAR & then generate the flattened DT.
    • Rob: So A & B in slide 2 are same IP in both cards?
    • Sonal: Yes
    • Bill: So we can look at A, B, C as discrete interfaces
  • Rob: There are 3 pieces that need to be worked on separately
    • How do you describe the PCI device? There are PCI bindings that have existed for DT for 25 years. That part is a solved problem; just have to define what the device node looks like.
    • How do you describe everything downstream of the PCI device? Doable if the device has a node in DT (the first problem). Probably addressed with overlays if you want to have the same overlay applied to different nodes & translated to different addresses. Address translation for PCI is a bit trickier b/c BARs are dynamic. The assigned-addresses property would contain the BAR addresses. Some challenges to apply different overlays to different base nodes.
    • ACPI: How do you do all this in a system that's either not DT, or where the base node is not described in DT b/c it is discoverable & dynamic? This is not just a problem for PCI, e.g. an FTDI USB-serial chip w/ I2C, SPI, GPIO hanging off it & you want to plug 10 of them into your system. This issue has been on the radar for some years & it's not solved. Not a hard problem to solve, but someone needs to look at it & come up with a proposal.
  • Sonal: PCI DT bindings - since the BAR addresses are not fixed in the PCI class of devices (unless it's a root port) - how do we intersect that with a non-DT system?
    • Rob: For non-DT system, you'd have to tell Linux to create the device nodes & hierarchy. You could start with DT system where you're not describing discoverable devices in DT. For ACPI systems, you'd probably have to manufacture root port nodes in a skeleton DT & then once those are there, you can apply your overlays to those.
  • Sonal: ACPI system w/ skeleton DT - full hierarchy from RP to device?
    • Rob: Probably easiest to create the full hierarchy. Keep in mind, the PCI DT spec was written originally for Open Firmware, not flattened DT. OF firmware went to look at what PCI devices were there & populated the DT based on that. So you got what was discovered, not the other way around like with flattened DT. Assigned-addresses came out of that - it records what the config of the system is.
  • Sonal: How to describe downstream devices w/ overlays?
    • Rob: Normally w/ flattened DT, you are describing device & function. The reg property in PCI device nodes is the device & function. Especially w/ PCIe, you have a bunch of PCI-PCI bridges at each level. Good example of hierarchy with the HiKey board: PCI root port with a PCIe switch on the board, GPIO lines downstream of the PCIe switch. (Rough sketch below.)
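      • A rough DTS sketch of that hierarchy (node names, compatibles & addresses are illustrative, not from the meeting; the first reg cell encodes bus/device/function):

            pcie@40000000 {                      /* host bridge */
                device_type = "pci";
                #address-cells = <3>;
                #size-cells = <2>;

                pci@0,0 {                        /* root port, 00:00.0 */
                    device_type = "pci";
                    reg = <0x0000 0 0 0 0>;
                    #address-cells = <3>;
                    #size-cells = <2>;

                    dev@0,0 {                    /* endpoint, 01:00.0 */
                        reg = <0x10000 0 0 0 0>; /* bus 1, dev 0, fn 0 */
                    };
                };
            };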
  • Lizhi: What is the benefit of creating device nodes for all the PCIe devices?
    • Rob: Once that is in place, the kernel already supports associating PCI devices with device nodes. Would work the same way on a DT system where you have it described already vs. an ACPI system where you have to generate it.
  • Lizhi: PCI is self-discovered & it can create the struct device by enumeration, so why create a DT node for it - do other functions & applications need the DT support?
    • Rob: Yes, think it will depend on what you need to describe in DT.
    • Stefano: You're going to need that PCI DT node, otherwise you have nothing to apply the overlay to.
  • Lizhi: Proposal: To create a node for the apertures, rather than DT nodes for all the devices. Here, just need 1 node to apply the overlay to.
    • Rob: Solution you come up with needs to work first in a DT-based system. Then, we can discuss what it looks like in a non-DT system.
  • Lizhi: Eventually, we want real device nodes (struct device) for all the apertures for all the different cards in the system. Then drivers can bind to the individual devices.
    • Requirements:
      • Need to discover apertures on PCIe BAR via metadata
      • Need to set I/O address for aperture device & eventually translate to host I/O address.
      • Driver binding: aperture node in fdt w/ compatible property
    • Think can leverage from Open Firmware infrastructure
    • Challenges we are facing (Rob mentioned them before)
      • x86_64 doesn't use OF
      • Don't have a node to overlay on & need this node to act as a bus node (not a PCI bus)
      • Minor: Overlay API doesn't support dynamic creation of target node
    • Proposed solution (see slides with detailed bullets)
      • If we already have a DT, there will be of_root & other device nodes already
      • Would introduce a new function in of_pci
      • Can create a dynamic DT node under of_root
      • Overlay will put all the DT nodes in the fdt into the newly created pcie-ep-bus DT node
      • If it is a non-DT system, then create an empty node
      • API that a driver can call explicitly to create pcie-ep-bus@addr
      • The pcie-ep-bus node is for dynamically pluggable PCIe devices; can have >1 card in the system, in any slot, which can't be statically defined in the base tree
      • Important: Creating the ranges property based on the PCIe BAR. The PCIe BAR base address is assigned & a ranges property can be generated to translate the device nodes underneath into CPU I/O addresses.
      • If the first 2 parts of the solution are addressed, then the OF infrastructure can create the platform_device nodes we need for the apertures (see the sketch after this list)
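      • A sketch of the proposed node (name, compatible & numbers hypothetical; the ranges entry would be generated at runtime from the assigned BAR base):

            pcie-ep-bus@a0000000 {
                compatible = "pci-ep-bus";               /* hypothetical */
                #address-cells = <1>;
                #size-cells = <1>;
                /* generated from the BAR: offset 0 in the BAR maps to the
                   CPU address the BAR was assigned (here 0xa0000000, 1 MB) */
                ranges = <0x0 0xa0000000 0x100000>;

                aperture@10000 {
                    /* compatible comes from the shell's fdt metadata */
                    compatible = "xlnx,example-aperture";
                    reg = <0x10000 0x1000>;
                };
            };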
  • Rob: Think this needs to be separated into different problems. Take out the dynamic part of it first and solve that as if everything is static. Then work from there to do it dynamically. Think the DT would look like a host bridge in the base DT w/ an RP bridge. If that's directly connected to your device, your device node is under there. That will be the standard PCI device node binding, which describes device & function & what bridge it's under. From there, looks like you need a simple-bus node for each BAR.
    • Lizhi: Correct
    • Rob: Since the BARs are not populated statically, that would need to be dynamic. The driver or PCI infrastructure would want to populate assigned-addresses. If that's in place & the ranges properties are in place & correct, the address translation should all just work, e.g. ISA bus under PCI bus works already. ISA is basically a platform bus type. Then the driver just has to call of_platform_populate & it will find the simple bus & create devices for all the devices on the simple bus. Drivers can do it for their child nodes. (Sketched below.)
    • Lizhi: There could be a simple bus under the PCI device instead of directly under the of_root?
    • Rob: Right, or it might be a different name from simple-bus. Simple-bus means there's nothing to program to access the bus.
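      • A sketch of that shape, under the same illustrative assumptions (the assigned-addresses entry is what would be filled in once BAR 0 is programmed; 0x82010010 encodes a 32-bit memory BAR at config offset 0x10 on bus 1, dev 0, fn 0):

            dev@0,0 {                            /* endpoint, 01:00.0 */
                reg = <0x10000 0 0 0 0>;
                assigned-addresses = <0x82010010 0x0 0xa0000000 0x0 0x100000>;
                #address-cells = <3>;
                #size-cells = <2>;

                bus@0 {                          /* one per BAR */
                    compatible = "simple-bus";
                    #address-cells = <1>;
                    #size-cells = <1>;
                    /* child offset 0 -> start of BAR 0 */
                    ranges = <0x0  0x82010010 0x0 0x0  0x100000>;

                    aperture@10000 {
                        compatible = "xlnx,example-aperture";  /* hypothetical */
                        reg = <0x10000 0x1000>;
                    };
                };
            };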
  • Max: Simple bus created by driver?
    • Rob: No, simple bus & devices would be in your overlay. Driver would apply the overlay to the PCI device node.
  • Lizhi: Think ranges at least has to be dynamic & can't be part of fdt
    • Rob: Perhaps.
    • Max: When PCIe driver attached, then we know the range of each BAR
    • Lizhi: Can be part of PCIe enumeration.
    • Rob: Think you can statically set assigned-addresses & it will set your BARs to what assigned-addresses said. For translation, do it the other way around: assign the BARs, then populate assigned-addresses. Can't recall what does/doesn't work off top of head, but there is code in the kernel that has been there a long time. PPC dynamically populates PCI devices w/ DT & removes them.
  • Lizhi: For generating DT nodes for all the PCI devices, has this topic been discussed before w/ the PCIe owner, or is it a new topic?
    • Rob: New topic. The existing topic is: if you need to describe stuff in DT, then you populate your DT w/ the PCI hierarchy.
    • Lizhi: If we want that feature, how do we convince the PCI owner that this will be helpful to a real use case?
    • Rob: Think it will not be a problem - am a PCI reviewer & my manager is a maintainer. Was recently thinking of adding something like this, so that ppl can make sure their PCIe hierarchy is correct. Was during the process of getting HiKey's DT description correct to match the actual hierarchy (ppl often forget to put in the RP node). You could do something in userspace - look at the sysfs PCI hierarchy & create DT nodes based on it.
  • Sonal: What Lizhi has proposed is a very localized change. Sounds like you're proposing full infrastructure. That's functionality for all PCIe devices that would come up in standard Linux if enabled in the global config.
    • Rob: Could be a specific driver that says to do it, instead of PCIe infrastructure doing it automatically
  • Sonal: assigned-addresses is infrastructure to do the reverse, where BAR addresses have been assigned & we need to create DT based on the PCI device self-discovery that has happened?
    • Rob: Historically, it's what the firmware did for BAR assignment, or could be what Linux did for BAR assignment.
  • Sonal: Once we have the infrastructure, then the PCIe device driver can request Linux to build up to my node, then the rest of Lizhi's proposal should work on top of that?
    • Rob: Believe so, but those are 2 problems you can work on separately. Don't try to solve it all in 1 patch series.
  • Sonal: Not sure if we have an Arm64 system?
    • Stefano: ZU+ systems
    • Rob: Have reviewed Xilinx bindings for Arm platforms
  • Stefano: There are plenty of Arm servers. Which have DT support?
    • Rob: Lots of Arm boards have PCI now, but usually an M.2 or x1 connector at most. Rockchip RockPro x1
    • Bill: ZCU102 has x4, or could use an adapter (PCIe x1-to-x16 extension cable)
    • Bill: SynQuacer developer box uses EDK2 and U-Boot, so think they do ACPI & DT
    • Stefano: This is to test the description & discovery
    • Rob: Reviewed Versal CPM5
  • Lizhi: IBM Power?
    • Rob: Those are actual Open Firmware systems, generally. Power host bridge support is its own implementation, not the common one. It uses some of the common code, so may not matter. PPC has its own history & issues.
    • Lizhi: That may skip the entire part about creating this DT node, the main feature Rob suggested.
  • Stefano: We have QEMU team at Xilinx who might be able to write an emulator for PCIe config space of Alveo card to be able to test the discovery. May be enough to get you going. You're not trying to do DMA or bitstream programming. Then you could do driver probing & test if it works.
  • Lizhi: 1st step is just to create the DT node & don't need the device?
    • Rob: They are separate threads of work, you can tackle in any order.
  • Sonal: If we cannot get the HW, then we may have to try on x86 and make the changes required. x86 currently has no DT & the DT infrastructure is not compiled in. We can come up with that patch series first.
    • Rob: It can be enabled.
    • Sonal: We can start w/ the PCIe DT changes first & test on x86 w/ DT infrastructure enabled.
    • Rob: And you can test on QEMU. The Arm virt machine model supports PCI out of the box. You can add models of PCIe switches & bridges and create any # of nodes in the hierarchy (e.g. the sketch after this list)
    • Bill: If they start on x86, don't they need DT to describe the whole x86 system?
    • Rob: Not the whole system, but that is a separate issue to solve & not PCIe-specific: how to create a root DT on non-DT systems
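      • e.g. a minimal QEMU invocation along those lines (topology & kernel arguments illustrative; "edu" is QEMU's toy PCI device standing in for a real EP):

            qemu-system-aarch64 -M virt -cpu cortex-a57 -nographic \
                -kernel Image -append "console=ttyAMA0" \
                -device pcie-root-port,id=rp0,chassis=0,slot=0 \
                -device x3130-upstream,id=up0,bus=rp0 \
                -device xio3130-downstream,id=dn0,bus=up0,chassis=1,slot=0 \
                -device edu,bus=dn0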
  • Bill: Easier to start w/ Arm-based platform w/ good DT support for PCI. There are examples of that.
    • Sonal: Challenge: All the Alveo customers we know of are on x86.
    • Bill: This is just a stepping stone.
  • Stefano: Today was not the most traditional System DT topic, but relevant: adding DT to describe things that were not described before. Next time we can go back to the traditional discussion & will follow up w/ Bill's question.

2022-02-08

Attended

  • Tammy Leino
  • Nathalie CKC
  • Marti Bolivar
  • Stefano Stabellini
  • Bruce Ashfield
  • Mark Dapoz
  • Loic Pallardy
  • Alexandre Torgue
  • Tomas Evensen
  • Bill Mills
  • Tanmay Shah
  • Sergei Korneichuk

Action items

  • Stefano: To raise Q w/ Rob in email
  • Stefano: Could write System DT part of spec just to add secure-address-map not tied to secure-bus. Secure-address-map could work for range specified through simple bus. Had always thought of the 3 features together, but this can be split out.
  • Stefano will write out proposal
    • Secure-address-map
    • What can be done simpler for simpler cases?
  • Stefano: Will continue conv w/ Rob about ranges & reg & what can be done to avoid duplication
  • Nathalie & Stefano: Set up another call w/ Rob.

Notes

  • Recording link (download recording before it expires!)
    • Passcode: 23F1qb?#
  • Stefano's slides in this attachment to mailing list
  • Scenario: Looking at the case where the secure bit is in the transaction
  • Marti has an example of such a peripheral
  • How to represent the secure & non-secure sets of addresses?
  • What is the impact on other things such as the bus or cpu cluster?
    • Bus has range property that tells us how addresses of child nodes are translated to parent address space
    • If device has 2 addresses, how will ranges property work?
  • Example of secure vs. non-secure addresses
    • Ranges is a very important property
    • Address-map: What is address map of a given CPU cluster
      • Sometimes there are multiple CPU clusters in an SoC & mapping of each cluster is a little different
      • Allows us to express what is mapped in a CPU cluster & what is not
    • Reg, ranges, address-map are all impacted if >1 address for the device (for secure & non-secure)
  • Previously discussed if we should introduce secure-reg or extend reg.
    • Example: extending reg
      • Has non-secure address & size + secure address & size
      • How to distinguish? 1 extra cell in each tuple that tells the execution mode.
      • How do we know if there is 1 more cell?
        • Need a different compatible string & bus type: e.g. compatible = "secure-bus"
        • #address-cells with multiple cells w/ address type (secure or non-secure)
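          • Sketched (cell values & node names illustrative; assume 0 = non-secure, 1 = secure in the extra cell):

                bus@f0000000 {
                    compatible = "secure-bus";
                    #address-cells = <2>;            /* <execution-mode address> */
                    #size-cells = <1>;

                    uart@100000 {
                        reg = <0x0 0x100000 0x1000>,     /* non-secure view */
                              <0x1 0x110000 0x1000>;     /* secure view */
                    };
                };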
    • Bill: Looks like very general implementation that could describe lots of interesting & complex situations. Concern: The simple cases are much more constrained than this. This could be error prone for the simple cases.
      • Simple case: Secure & non-secure address is same. Just secure bit that dictates response.
      • Each peripheral will declare if they are secure, non-secure, or both.
      • It looks like Arm says for Cortex-M you can't use same address for secure & non. So, entire bus ranges get duplicated.
      • If you could distinguish the simple cases, you could make it easier for a human to create.
    • Stefano: If only responding to secure, then it's different "address"
    • Bill M: Semantics
    • Stefano: This is a discussion where we could use Rob b/c he suggested this scheme. Maybe:
      • Could put all devices w/ the same secure & non-secure addresses under a simple bus
      • Could put devices where they are different under a secure-bus
  • Stefano: Putting 2 addresses in reg isn't sufficient to solve the problem. Need ranges.
    • Ranges: The 1st cell is new; it tells us if the mapping is for secure or non-secure addresses. (Both shapes are sketched after this list.)
    • Bill: Suggestion: Simple secure bus, Cortex-M style secure bus, and this one that can do anything (the sledgehammer that can do everything for complex cases). It's a lot to get wrong for the simple cases.
    • Stefano: Maybe could handle in ranges so that reg could be same for simple case. Would have to ask Rob.
    • Bill: Does it have to be ranges? Could it be secure-ranges?
    • Stefano: Could be, or, ranges-address-cells?
    • Bill: Xilinx has good Cortex-A examples. A Zephyr Cortex-M example w/ a bank of peripherals w/ 1 address range that appears at a non-secure address & a secure address may help you see the simple case.
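    • The two shapes discussed, sketched with illustrative addresses:

          /* option A: a leading cell in each ranges entry selects the view */
          ranges = <0x0 0x100000  0x50100000  0x1000>,    /* non-secure */
                   <0x1 0x110000  0x50110000  0x1000>;    /* secure */

          /* option B (Bill): keep ranges as-is & add a parallel property */
          ranges        = <0x100000 0x50100000 0x1000>;
          secure-ranges = <0x110000 0x50110000 0x1000>;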
  • Next problem to solve: Address map
    • Showed all 3 to see what they could look like
      • Traditional just as a reference: Useful for special address for Cortex-R
      • Secure-address-map w/ property to describe secure mappings
      • OR extending address-map with a secure/non-secure execution mode cell
      • Think both options could work. Extending address-map can be more cumbersome. Secure-address-map lets us keep the simple case separate.
    • Would it be invalid if we set address-cells = <0x1>?
    • Mark:
      • Doesn't address-cells affect both addresses in that mapping?
      • If 64-bit, then address-cells would be 0x3, but then that doesn't work.
    • Stefano: Don’t think address-cells can be lower than 2 to prevent ambiguity
      • Execution type
      • 1 or 2 for address
      • Full example would be w/ an AMBA simple bus and a secure bus w/ devices w/ different addresses
    • Mark: Confusing. We're overloading the meaning of address-cells & part of it is defined above to mean something else.
    • Bill: In traditional DT it means 1 thing & System DT another
    • Stefano: Sounds like secure-address-map is the way to go
    • Bill: Sounds good. Since that's the solution there, would suggest ranges & secure-ranges and reg & secure-reg and make it all consistent. Then the simple case becomes simple & we can assume the addresses match if the secure one is missing (sketched below).
    • Stefano: To raise Q w/ Rob in email
    • Stefano: "Secure" only applies to secure execution mode. But only secure will have __
    • Bill: Could be different for non-Arm architectures. Better to be explicit in the extension.
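    • A sketch of the secure-address-map direction (the entry layout <cluster-address &device device-address size> is assumed from the address-map discussion; labels & values illustrative):

          cluster@0 {
              compatible = "cpus,cluster";
              #ranges-address-cells = <1>;
              #ranges-size-cells = <1>;
              /* non-secure world's view */
              address-map        = <0x50100000 &uart0 0x50100000 0x1000>;
              /* secure world sees the same device at its secure address */
              secure-address-map = <0x50110000 &uart0 0x50110000 0x1000>;
          };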
  • Stefano: What about execution domains?
    • Domains are unaffected b/c from execution mode, can tell if access is secure or non-secure
  • That's the full example
  • Marti: Will have to take some time to digest the slides
    • You were saying a fully generalized example would have a secure-bus and simple bus.
    • Nordic case: We don't have simple buses in the SoCs we're interested in, but have indirect buses and secure buses, because there are 2 CPU core clusters, both running Zephyr, with unequal access to the peripherals.
    • So, using address-map & secure-address-map has made it a bit easier to do the partitioning in prototypes w/ Bruce. +1 to that direction at cluster level.
  • Bruce: Have been prototyping Lops with Marti. Will depend what we settle on for System DT.
    • If see secure address map, know it's a secure cluster
    • Can check domain too
  • Loic: Same feedback as Bill. We have info in too many locations. Would like to see the register translations defined in 1 location. Domain, mode as secure or non-secure - how could it be linked to the CPU cluster address map?
    • Stefano: Difficult answer. Yes, there is duplication. Want to highlight duplication was used to make the example more obvious. Ranges could be just an empty property (ranges;) to show a 1:1 map. Think it would work for secure-bus too. Your point still stands for the rest. Would need secure-address-map if secure. For a domain, need both address map & execution mode.
    • Loic: Using execution mode from domain to use the correct address map to apply?
    • Stefano: Yes.
    • Loic: e.g. 32-bit product requires some re-mapping, so will have to create several bases to fit with this b/c you are adding 1 offset between secure & non-secure.
    • Stefano: System DT parent address space is more conceptual than real. Each CPU cluster will have its own address map. Parent address space is everything else - can be thought of as global mapping even if no one uses it.
    • Loic: Why do you need both addresses at the peripheral level if translation is done at the top by Lopper?
    • Stefano: Good question.
      • Can't solve the entire problem w/ address-map b/c another non-CPU device (e.g. a PCIe DMA master, which doesn't have an address-map) can do DMA to the timer, which needs to use the global translation.
      • However, may be artificial to need addresses at both ranges & reg
      • Don't think we need repetition between ranges & reg.
    • Loic: For the CPU, will use the CPU view & Lopper will replace it with the right physical address.
    • Stefano: Don't have a real example. Imagine you could have a co-processor on a PCIe card
    • Loic: Other processor will have a dedicated view through the PCIe card, but you still need the address map of this processor. If you just use address-map, you have less impact on DT & simplify adoption.
    • Stefano: Could write System DT part of spec just to add secure-address-map not tied to secure-bus. Secure-address-map could work for range specified through simple bus. Had always thought of the 3 features together, but this can be split out.
    • Loic: MCU with Cortex-M33 w/ TF-M & Zephyr. Would like DT with TF-M. Would be good to have 1 definition per peripheral, giving the right address on the Zephyr side & the TF-M side
    • Stefano: Can do. For the rest, will try to sync up w/ Rob & bring the feedback to him. See Bill's point on secure-ranges & secure-reg being easier to read.
    • Loic: secure-status doesn't make sense & should be removed. Have discussed w/ Rob about that. Was pushed by Arm, but not widely/correctly used. Need to define secure b/c if you need another status w/ another mode, would like to avoid duplication of different fields in the peripheral node.
  • Bill: What are we trying to express? In domains, we are in the realm of policy (firewall, MMU, hypervisor). Outside of domains, are we expressing just HW or is some policy in here?
    • Stefano: Supposed to express the HW. Is the device responding to one address or another, based on the secure bit being set? Is there a different address mapping between secure/non-secure?
  • Loic: Ranges at bus level w/ secure & non-secure mapping: if you have several CPUs, they may not have exactly the same mapping b/c they can't see the same devices & 32-bit mappings may change. If you do ranges at bus level, you assume you have the same non-secure & secure mapping for all the processors. If we do the same at peripheral level, we make the same assumptions.
  • Stefano: True. Would be simplification for that case.
  • Loic: Having the address map at cpu-clusters: It's the view by the CPU and that should be propagated to all masters (e.g. DMA) programmed by this CPU, b/c it's important that if the DMA is configured by the secure side, it accesses the peripheral through the same register base. Think we need to do something generic; maybe assume all processors do not have exactly the same mappings.
  • Next steps
    • Stefano will write out proposal
      • Secure-address-map
      • What can be done simpler for simpler cases?
    • Will continue conv w/ Rob about ranges & reg & what can be done to avoid duplication
    • Nathalie & Stefano: Set up another call w/ Rob.