System Device Tree Meeting Notes 2022
Nathalie Chan King Choy edited this page Jun 6, 2022 · 8 revisions
- Recording link
- Passcode: +F.gK7y*
- Alexandre Torgue
- Anirudha Sarangi
- Appana Durga Kedareswara Rao (Kedar)
- Arnaud Pouliquen
- Bill Mills
- Bruce Ashfield
- Christopher
- Loic Pallardy
- Nathalie Chan King Choy
- Siva Durga Prasad Paladugu
- Stefano Stabellini
- Tanmay Shah
- Tomas Evensen
- Valentin Monot
- Joakim Bech
- Rob Herring
- ST prototyping a generic way to transform a DTS file into C structs for baremetal/RTOS environments
- Xilinx example of DTS transformation for baremetal drivers
- If time allows, automatic generation of Xen passthrough device tree config files using lopper (not enough time)
- We need to make Lopper a little easier to get started with, without directly talking to Bruce
- Baremetal: We should see if there are synergies. Xilinx use case: Trying to minimize change to the drivers, but should be similar to other RTOS & baremetal
- More to discuss: Where is the true source for all this? If it's Linux, where does CPU cluster for the other cores live? How to create a System DT out of that? What if DT is only baremetal, no Linux?
- Kedar: Document the YAML & C structure info. Discuss w/ Anirudha where to post these.
- Recording link (please download sooner rather than later if you need to catch up, before it expires)
- Passcode: DTj5u!#c
- Bruce
- Using more assists & lops
- Writing next level of assists to understand device more & do some validation
- As we find new ways to deploy Lopper, trying to abstract into libraries for reuse
- Various Xilinx & non-Xilinx use-cases
- Helped Marti with secure/non-secure split for Nordic
- Ppl can raise bugs & feature requests off of Devicetree.org
- Bill: Is there an example flow? Is it in Xilinx 2022.1 product?
- Tomas: We've been pushing it out through Yocto & engaging with customers via OSS right now. Xilinx still phasing out older flow.
- Bill: In meta-xilinx?
- Appana: Yes
- Bruce: Have examples/readmes, but so far have been helping individuals get started. Needs a bit more wrapping in getting started doc.
- Loic:
- Today we have __ where we have all the standard SW components on Cortex-A side configured thru DT & Cortex-M with baremetal. How to have 1 unique tool to configure baremetal OS that is OS-independent? DT is a good solution.
- Valentin has been working on prototype for 2 mo to generate C structure from DT in generic way, similar to how Zephyr is doing, but Zephyr-agnostic. How could this platform data be re-used by platform driver?
- Would like to share w/ you & collect feedback to ensure correct direction.
- Valentin
- Made a Lopper assist
- Input files: DTS file, dt-schemas, dt-bindings
- Made library in Python to access schemas & bindings by the compatible of the node to extract info to generate C data
- Fill in C structure w/ all the DT values
- Using STM32-MP1 as example input file. For now, clocks are void b/c WIP.
- WIP: generation for phandle nodes
- How to process optional nodes? (we don't process them yet)
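The flow Valentin described (read a node's DT values, then fill in a C structure) can be sketched roughly as below. This is a minimal illustration, not the ST assist: the helper names, the field-naming rule, and the UART node are all hypothetical, and it does not use the Lopper API or the dt-schema lookup. Like the prototype at this stage, it ignores phandles and optional properties.

```python
from typing import Dict, List, Union

PropValue = Union[int, str, List[int]]


def c_literal(value: PropValue) -> str:
    """Render one DT property value as a C initializer expression."""
    if isinstance(value, str):
        return '"' + value.replace('"', '\\"') + '"'
    if isinstance(value, list):
        return "{ " + ", ".join(f"0x{cell:x}" for cell in value) + " }"
    return f"0x{value:x}"


def c_field_name(prop_name: str) -> str:
    """Map a DT property name to a valid C identifier (hypothetical rule)."""
    return prop_name.replace("#", "num_").replace("-", "_").replace(",", "_")


def node_to_c_struct(struct_type: str, var_name: str,
                     props: Dict[str, PropValue]) -> str:
    """Emit a designated-initializer C definition for one node."""
    lines = [f"const struct {struct_type} {var_name} = {{"]
    for name, value in props.items():
        lines.append(f"    .{c_field_name(name)} = {c_literal(value)},")
    lines.append("};")
    return "\n".join(lines)


# Hypothetical UART node; the values are illustrative, not from a real board.
uart_props = {
    "compatible": "vendor,example-uart",
    "reg": [0x40010000, 0x400],
    "interrupts": [38, 4],
}
generated = node_to_c_struct("uart_config", "uart0_config", uart_props)
print(generated)
```

Because the output is plain C data, a driver could consume it at compile time, which matches the memory-constrained case raised later in the discussion.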
- Stefano: This is very similar to what we're doing for Xen: starting from host DT w/ properties like what you show & need to generate DomU DT for 1 guest. Need to transform DT accordingly b/c nodes in host DT. Compatible string, regs, interrupts, clocks. Output another DT. Similar problem: What to do w/ optional properties?
- Loic:
- We are targeting to be very generic & have C struct that could be processed at compile or runtime by baremetal driver, whatever the environment/RTOS.
- Why process at compile time? Sometimes limited in memory
- Anirudha: Xilinx has 180 legacy baremetal drivers that all had their own way of working. All have some C structures we used to populate in a different way. Now we use Lopper assist files.
- Kedar showed legacy driver & updated to use Lopper assist output
- HW persona hands off XSA to SW persona
- YAML bindings similar to Linux YAML
- Also generate xparameters.h and linker script
- Have published the baremetal assist files to devicetree-org/lopper repo
- Anirudha:
- The lopper assist files are kind of biased to Xilinx due to the legacy drivers, but the YAML DT bindings assist is similar to Linux
- Clocks property: We will add very soon
- Loic: We are mainly YAML & don't have any dedicated C-struct definition
- Kedar: Document the YAML & C structure info. Discuss w/ Anirudha where to post these.
- Loic: Valentin will clean up his proposal & provide it on a branch on his GitHub so that others can review to give feedback. There are still some future features: e.g. compilation flags for some drivers, scatter file for reserved memory. We are not at System DT level. We are using Lopper to parse DT & bindings to get to C structure. Still need to get to domain -> generic C structures for 1 RTOS.
- Bill: Both approaches generate C structures at compile time, which don't change at runtime?
- Kedar: Yes
- Bill: And Xilinx tries to push to pre-processor to eliminate data & code where possible. Would be nice if user could choose what part at compile vs. runtime.
- Tomas: Had discussed generating a blob on host that could be loaded at boot time
- Anirudha: Did prototype & it works. But we had to make the legacy drivers work first. Then we need to make drivers independent of the configuration. PLM should be as generic as possible & just rely on config data & not compile-time parts.
- Tomas: Driver doesn't know when it points to the structure if it was loaded or made at compile time.
- Bill: Sounds like we have islands of work (Xen, ST, Xilinx, Nordic, Zephyr)
- Bill: Say I'm a new person who wants to get involved & I have a Linux DT. Do I turn my Linux DT into System DT? What is onboarding flow? Have to check kernel DTs into kernel.
- Tomas: Think of Lopper as a framework.
- It can prune a System DT into Linux DT or other DT
- After that processing, you do the assists. So, you could just start with assist processing on Linux DT.
- Lopper doesn't do a bunch of things on its own in any pre-defined flow, more does what you tell it you want. Has internal representation & can use assists on them.
- Bill: If you do want benefits of System DT… Linux DT only addresses Linux & not R5. I want to start w/ what I have on Linux & then generate some stuff for R5. Is there a way for Lopper to take Linux DT & transform it into a System DT that we can prune for the R5 & generate C structures? Or, do I have to switch to System DT on a date to use both Linux & R5?
- Tomas: Are you asking can I add _ in a non-intrusive way to a Linux DT?
- Bill: Yes, can you say what the transforms look like & undo?
- Stefano: It's possible
- One part of System DT is the domains, which isn't needed for this discussion on R5
- For R5, need to add CPU cluster node
- Could have address map that is/isn't 1:1
- Some minor differences here & there to allow address transformation (simple vs indirect bus)
- Can think of System DT as approximately Linux DT + CPU cluster node
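Stefano's rule of thumb (System DT is approximately Linux DT plus a CPU cluster node) and Bill's question about undoing the transform can be sketched with plain dictionaries. This is only a toy model: the node names, the compatible strings, and both helper functions are placeholders for illustration and are not Lopper calls. A real cluster node would also describe its address map.

```python
import copy
from typing import Dict, Iterable

Node = Dict[str, object]


def add_cpu_cluster(linux_dt: Node, name: str, cluster: Node) -> Node:
    """Return a copy of the Linux DT with one extra top-level cluster node."""
    system_dt = copy.deepcopy(linux_dt)
    if name in system_dt:
        raise ValueError(f"node {name!r} already exists")
    system_dt[name] = copy.deepcopy(cluster)
    return system_dt


def prune_clusters(system_dt: Node, names: Iterable[str]) -> Node:
    """Undo the transform: drop the named cluster nodes again."""
    drop = set(names)
    return {key: copy.deepcopy(value)
            for key, value in system_dt.items() if key not in drop}


# Hypothetical trees; names and values are illustrative only.
linux_dt = {
    "cpus": {"cpu@0": {"compatible": "arm,cortex-a53"}},
    "soc": {"serial@ff000000": {"reg": [0xFF000000, 0x1000]}},
}
r5_cluster = {
    "compatible": "cpus,cluster",
    "cpu@0": {"compatible": "arm,cortex-r5"},
}

system_dt = add_cpu_cluster(linux_dt, "cpus-r5", r5_cluster)
assert prune_clusters(system_dt, ["cpus-r5"]) == linux_dt
```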
- Bill: Looking at upstream kernel today & DT that is in kernel for Xilinx today, that's pure Linux DT? Where is Xilinx's canonical System DT? If there is some of this, maybe we need to start collecting this communally at devicetree-org.
- Tomas:
- Relates to DT Evolution discussion of where to store the System DTs if Linux is THE source?
- Xilinx has other DTs if you use PL
- Yocto flow runs Lopper
- Bill:
- Not trying to get kernel to move
- For the stuff not going into kernel, we should collect those someplace so we can compare them
- Can we share more of the source of what goes in besides what goes into kernel
- Stefano:
- Agree. Maybe we can't get to 1 place, but can we at least get to 2 places (everything not Linux)
- Many projects have their DT definitions
- Linux is the largest project, which makes it the default
- Rob involved with both Linux & spec
- QEMU, Zephyr, OP-TEE, TF, U-Boot have their own bindings
- Loic:
- Have been discussing w/ Arm, Rob
- Superset repo that could be reference to populate the other
- Rob:
- It's not Linux bindings, it's DT bindings hosted in Linux tree. Used on U-Boot, BSD.
- We have stripped down tree that is filtered out from kernel tree. Would take patches against that tree, but haven't gotten any patches.
- For common bindings, we're trying to host DT schemas
- Lives on kernel.org, work started to move it to devicetree-org but it fizzled.
- Ian Campbell has a machine that runs a script, but he's not active, so would be good if we could move to someone active. Main issue is it's signed with his key.
- Stefano: If TF or Xen was introducing a binding that is not meant to be parsed by Linux?
- Rob: I will accept those, doesn't have to be for Linux.
- Loic: ST has already pushed some TF-A & _ to common bindings
- Bill: If we added some extra DT source that goes to this tree, the synchronization script wouldn't delete it?
- Rob: Would transform the patch to apply it to the Linux tree for bindings.
- Bill: If we put non-synchronized stuff in its own directory, could adjust the script to not touch that.
- Stefano: Maybe need to do better at reaching out to the different communities
- Tomas: Summary
- We need to make Lopper a little easier to get started with, without directly talking to Bruce
- Baremetal: We should see if there are synergies. Xilinx use case: Trying to minimize change to the drivers, but should be similar to other RTOS & baremetal
- More to discuss: Where is the true source for all this? If it's Linux, where does CPU cluster for the other cores live? How to create a System DT out of that? What if DT is only baremetal, no Linux?
- Ayan Kumar Halder
- Oleksii Moisieiev
- Bill Mills
- Bruce Ashfield
- Max Zhen
- Mark Dapoz
- Loic Pallardy
- Stefano Stabellini
- Arnaud Pouliquen
- Andrew Wafaa
- Nathalie Chan King Choy
- Lizhi Hou
- Rob Herring
- Christopher
- Etienne Carriere
- Ed Mooring
- Tanmay Shah
- Sonal Santan
- Stefano, Oleksii: SCMI IDs
- Bruce, Ayan: Xen passthrough device tree & Lopper (ran out of time)
- Lizhi: Follow-up questions from previous call (ran out of time)
- Stefano: Wrap up secure/non-secure topic (ran out of time)
- Continue discussion on LKML. Sounds like the more promising direction is a more traditional service provider-consumer approach
- Download the recording sooner rather than later if you need to catch up, before it expires
- Passcode: Y&1.**1B
- Problem statement regarding SCMI IDs
- SCMI spec is for power mgmt, clocks, resets, and related
- Device IDs are 32-bit integers & used to restrict access to devices on certain SCMI channels
- e.g. Might be able to change power state on one device & not another
- SCMI IDs are not used by Linux yet & not described in DT today. Will start to be used by Xen with Oleksii's series.
- Operation to restrict access is usually done by higher privilege entity than OS (e.g. firmware, hypervisor)
- Why interesting to System DT group: SCMI IDs are cross-project by nature (Xen & TF at minimum, future maybe OP-TEE, Linux, etc.)
- Can we make the DT description of device IDs generic? Generic Device ID in DT could be useful beyond SCMI.
- e.g. Xilinx EEMI IDs & SMIDs
- Questions to discuss
- 32-bit or 64-bit?
- Do we need a phandle?
- Where would the bindings live for use by TF-A, Xen, etc. if not used by Linux?
- Bill: You're not trying to define any semantics to the value used, those are specific to SoC?
- Stefano: Not trying to do that. Could be any number, as long as it's unique.
- Rob: Don't we already have device IDs e.g. clock bindings where SCMI is clock controller & clock cells is device ID or clock ID?
- Oleksii: But one device can have >1 clock, >1 power, etc. SCMI spec says have to stick to exact Device ID.
- Rob: What is device ID used for?
- Oleksii: Used for permissions - restricting access to device on certain SCMI channels.
- Rob: Sounds like what ST and others have proposed for controlling or getting status if you have access to a device (e.g. w/ Trust Zone controller)
- Oleksii: Think bunch of different mechanisms need same. I'm working on SCMI protocol.
- Stefano: Note: SCMI has 1 SCMI call to set authorization/privilege of a channel & takes device ID
- Oleksii: Takes agent ID for agent using the channel & the device ID. SCMI spec doesn't describe where agent should take the Device ID, it's implementation specific. That's why we suggest to add it to DT binding.
- Etienne: Device ID relates to client, agent, HW, back-end HW resource? Device can be lot of things.
- Oleksii: We have firmware, which includes SCMI _ which receives messages from client, which is OS. Each OS using SCMI should open channel & then send messages to the FW, which is TF-A in our case. One of the SCMI channels is privileged to give access to base commands. Privileged agent can set permissions for other agents. SCMI driver in FW has some set of pre-defined devices (clocks, power, etc.) and FW knows about this relation. What client should know? What device is related to which ID. Client needs to provide this ID to SCMI when it sets device permission.
- Stefano: In simple case, Xen passes dev-id to TF to tell it that Linux can access this device, or Xen tells TF that this device ID is not accessible by the channel used by Linux.
- Etienne: Relates to specific device? Groups HW resources?
- Stefano: Not really grouping resources. If you use same device ID for 3 devices, it means you can't isolate. That's not how meant to be used. How meant to be used: Each device has own ID & DMA master or node in DT could correspond to device. SW could say which devices are permitted or restricted on a channel.
- Etienne: So, device ID relates to specific HW resource exposed by SCMI server. SCMI i/f for privileged client to configure which device ID is accessible from which agent. This info is dedicated to the server & privileged client & it is abstracted/opaque ID.
- Stefano: Yes
- Bill: Same ID for every agent?
- Etienne: Only privileged & F/W can manipulate Device ID
- Rob: FW can make up whatever relation it wants, whether 1:1 to device or not
- Bill: Likewise, if peripheral could be successfully used by >1 agent, it could be enabled for >1 agent if SCMI server is smart enough. It's the SCMI server implementation that defines what these IDs are & what they mean.
- Rob: Sounds like what ST has proposed & looked for solution for.
- Loic: Yes & no. ST would like to check the access rights at probe level. Here, the need is more to build the access-rights table that will be included in the SCMI server according to the different users of these resources. It's for runtime: given the device ID of an IP & which SW component uses that IP, we define the list of SCMI agents that could have access to the resource.
- Rob: Whether you're setting or getting permissions, doesn't matter. Clock provider can know or set frequency, it's not encoded in binding.
- Stefano: Difference is that clock & reset is manipulated via SCMI, but before you can do that, you have to set device ID. So, it's 1 step before agent can use the clock manipulation calls.
- Rob: For what you want to do, SCMI is 1 implementation. There could be many implementations you do this with. So, device ID is purely SCMI & not generic.
- Etienne: It is SCMI specific ID. It's an abstract ID to be known between agent & server
- Rob: Way to solve in DT is to have provider & consumer (phandle + arg cells). What the cells are is defined by provider & opaque to client. Name (instead of dev-id) should be related to what you're providing.
- Stefano: That makes sense.
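The provider/consumer pattern Rob describes (a phandle followed by argument cells whose count and meaning are defined by the provider) can be sketched as a small parser. This is a minimal sketch under stated assumptions: the function name and the phandle numbers are made up, and the per-provider cell counts are passed in directly rather than read from a provider's `#...-cells` property.

```python
from typing import Dict, List, Tuple


def parse_phandle_args(cells: List[int],
                       provider_cells: Dict[int, int]
                       ) -> List[Tuple[int, List[int]]]:
    """Split a flat cell list into (phandle, args) entries.

    provider_cells maps each provider phandle to the number of argument
    cells it expects. The arguments stay opaque to the consumer.
    """
    entries = []
    index = 0
    while index < len(cells):
        phandle = cells[index]
        if phandle not in provider_cells:
            raise KeyError(f"unknown provider phandle {phandle}")
        count = provider_cells[phandle]
        args = cells[index + 1:index + 1 + count]
        if len(args) != count:
            raise ValueError("property ends in the middle of an entry")
        entries.append((phandle, args))
        index += 1 + count
    return entries


# Hypothetical providers: phandle 1 takes one cell, phandle 2 takes two.
providers = {1: 1, 2: 2}
parsed = parse_phandle_args([1, 3, 2, 7, 0], providers)
print(parsed)
```

The consumer never interprets the argument cells; it only hands them back to the provider, which is why the same pattern already serves clocks and resets.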
- Bill: Would be good to see what the reset binding looks like for a peripheral b/c
- Rob: It's the same pattern. Clocks is same pattern.
- Loic: What to do with pattern after? What is goal?
- Stefano: When you start VM w/ assigned devices @ runtime, then Xen can enforce that DomU can access SCMI, but only change state of that 1 device & not others. Xen will parse host DT, be asked to assign timer to a guest, check that there is SCMI device ID & say that the channel will only be open for that Device ID & make call to TF-A. Then DomU Linux can change power state of device on that channel. Doesn't matter if you give the dev ID to Linux b/c Linux can't do anything with it. If operation not permitted, TF-A should reject.
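The flow Stefano outlines can be modelled with a toy permission table: a privileged agent (Xen here) tells the server which device IDs another agent (the DomU channel) may act on, and the server rejects everything else. This is a simplified sketch, not the SCMI wire protocol or any TF-A code; the class, method names, and agent numbers are hypothetical, and real SCMI power or clock calls take their own domain IDs rather than the device ID directly.

```python
from typing import Set, Tuple


class ScmiServerModel:
    """Toy model of a firmware SCMI server with per-agent device permissions."""

    def __init__(self, privileged_agent: int) -> None:
        self.privileged_agent = privileged_agent
        self._allowed: Set[Tuple[int, int]] = set()  # (agent_id, device_id)

    def set_device_permission(self, caller: int, agent_id: int,
                              device_id: int, allow: bool) -> None:
        """Only the privileged agent may grant or revoke access."""
        if caller != self.privileged_agent:
            raise PermissionError("only the privileged agent may set permissions")
        if not 0 <= device_id <= 0xFFFFFFFF:
            raise ValueError("device ID must fit in 32 bits")
        if allow:
            self._allowed.add((agent_id, device_id))
        else:
            self._allowed.discard((agent_id, device_id))

    def request_power_state(self, caller: int, device_id: int) -> bool:
        """Return True if the server would accept the request."""
        return (caller == self.privileged_agent
                or (caller, device_id) in self._allowed)


XEN, DOMU = 0, 1  # hypothetical agent IDs
server = ScmiServerModel(privileged_agent=XEN)
server.set_device_permission(XEN, DOMU, device_id=19, allow=True)
print(server.request_power_state(DOMU, 19))  # permitted device
print(server.request_power_state(DOMU, 20))  # any other device
```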
- Ayan: If SCMI device ID is not used by Linux, can't the FW have a shim layer to match register space w/ some device ID, since only FW will use device ID?
- Stefano: Not just FW, hypervisor also needs device ID so it can set permissions at runtime.
- Etienne: SCMI spec says can have agents that can configure these access rights, but there is only a command that allows agent to set rights, but no command for agent to query if it has access. It's only for configuration.
- Stefano: Need 3 entities for it to be useful:
- TF-A implements the calls
- Privileged entity: Xen, OP-TEE, Linux if user space is unprivileged
- Unprivileged entity is target
- e.g. Xen + Linux
- e.g. Linux + VFIO driver
- Bill: Would be helpful to see the whole device binding w/ clock reset & power entry. How did Linux not need device ID before?
- Stefano: Because it wasn't privileged.
- Etienne: Today in Linux, there is only single SCMI client & SCP, with only 1 agent & client, no need to restrict.
- Loic: It's hard coded in the DT. We have #define, e.g. with base address or offset, up to SoC provider to decide how you want to address the different devices.
- ~30:00 in recording Oleksii showed example of usb_dmac0 with scmi-devid
- Refer to the recording for discussion of the example
- Example is in the System DT mailing list archive
- Stefano: Binding is same with service provider & consumer?
- Rob: But Device ID is not a service
- Stefano: So what should it be called?
- Rob: Go back & look at prev proposals where it's done in HW and come up w/ something that works for SCMI, or something else.
- Loic: Way we proposed at ST when wanted to modify firewall to get or check access, it's complementary functions. First we need to check that we have access. If we have access we need to request SCMI services access also. That could be linked to our initial proposal.
- Loic: After many reviews from Greg KH, it's more to have specific ST bus with specific probe functions. Reference ID is the reg (base address of device). Can do correlations to firewall to make sure we have correct access rights. Easy to add or verify during probe if there is SCMI phandle to perform the request & Device ID parameter will also be base address b/c it's the common reference in the SoC for the device. Don't know if need specific binding b/c can be automatically done at bus level.
- Bill: Then we would have new bus for each SoC?
- Loic: Was new bus for ST b/c pin_ctrl framework was rejected by lot of community developers. Greg proposed to have a bus & we were only one to have this problem. Now it looks to be a common problem, so now could look at making standard bus functions.
- Stefano: This doesn't look like a bus function, it's an ID that is understood by SCMI implementation/back-end & privileged client, so not a bus property, it's identifier for device.
- Stefano: Provider & consumer model is stretch b/c it's an identifier instead of provider or property, but could reword as scmi-access. But there isn't a bus.
- Bill: Kind of is a bus if you look at it sideways, if all these devices were on PCI, you wouldn't have this problem. Want a smart platform device that kind of has those same properties.
- Loic:
- If you don't have SCMI services in your node, you don't need to request access rights. You only ask if you need access to SCMI resources.
- Privileged SW will grant access to someone else. Should be done before execution of guest.
- Xen use case is maybe bit different, but think at the end we have a list of access rights to get before can access the device. That's what we proposed in past & we got answer to implement as bus.
- Bill: Think model here is anything described in DT & enabled you have access to when you start. But, you may want to create new agent handles w/ less privilege than you.
- Loic: Xen could allow physical access & SCMI access. SMMU level 2 should be correctly programmed. If want to distinguish between the different virtual IDs, should also program firewall. So, have same pre-probing mechanisms to perform as w/ Linux.
- Stefano: Think you're right w/ correlation. Xilinx don't have that problem b/c our bus firewall isn't programmable at runtime, but if it was, Xen could ask SCMI to restrict channel to device ID 19 and program bus firewall to restrict to ID of firewall. These are 2 different ID spaces. Don't see how 1 new bus could solve the problem for both. No guarantee that SCMI is related to bus firewall.
- Loic: Except if device ID for SCMI & bus firewall is same. You can define that it's base address of device then it's up to firewall to have translation table & SCMI server to have translation address. All addresses are memory-mapped. Base address of device is unique.
- Stefano: That's possibly the most common case, but it's not guaranteed. Bus firewall could have different IDs from SCMI, could be different widths.
- Loic: Could create abstraction layer. So, your reference is your physical address & up to drivers to convert.
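Loic's abstraction-layer idea (the device's physical base address is the common reference, and each back end converts it to its own ID) can be sketched as a pair of lookup tables. This is a hypothetical illustration: the class, the domain names, the address, and the ID values are all made up. It also shows Stefano's caveat that the ID spaces may differ in value and width.

```python
from typing import Dict


class DeviceIdMap:
    """Translate a device base address into per-domain opaque IDs."""

    def __init__(self) -> None:
        self._tables: Dict[str, Dict[int, int]] = {}

    def register(self, domain: str, base_address: int, domain_id: int,
                 width_bits: int = 32) -> None:
        """Record the ID one domain uses for the device at base_address."""
        if not 0 <= domain_id < (1 << width_bits):
            raise ValueError(f"ID does not fit in {width_bits} bits")
        self._tables.setdefault(domain, {})[base_address] = domain_id

    def lookup(self, domain: str, base_address: int) -> int:
        """Return the domain-specific ID; raises KeyError if unknown."""
        return self._tables[domain][base_address]


ids = DeviceIdMap()
# Hypothetical device: same base address, different IDs per back end.
ids.register("scmi", 0xE6590000, 19)
ids.register("firewall", 0xE6590000, 0x100000003, width_bits=64)
print(ids.lookup("scmi", 0xE6590000), hex(ids.lookup("firewall", 0xE6590000)))
```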
- Bill: There is legacy to clock, reset & power domains. So having provision for different IDs there makes sense. If going to invent new things, better to have unique device ID that's opaque & FW figures out. Firewalls could use different value but FW (whoever SCMI server delegates to)…
- Stefano: See the confusion: You are assuming bus firewall is configured via SCMI or by same entity. It could be configured directly in a separate way.
- Bill: Not saying that. Saying whoever differentiates the common ID to domain-specific IDs has knowledge.
- Bill: Sounds like there is a pattern. SCMI, Xilinx, TI FW have this pattern. We need an additional ID that is decoupled from clocks, resets & power domains. Think it should be more generic & this would be the platform bus device ID.
- Etienne: Old Benjamin Gaignard patch for domain controller - had abstracted ID that described a domain. Domain was related to agents that can access the domain. It was meaningful to the agents that can control.
- Bill: If our SoCs had, like Intel SoC, a common infrastructure of PCI even for internal things, then there is a unique PCI device ID for every device in the system. Because we rely on platform devices, we don't have that. Suggesting we create a platform-specific device ID.
- Rob: We have way to do that: address + node name
- Bill: Not great b/c address is unique for given master, but not same for every processor in system.
- Stefano: Also have to think about SCMI spec compliance which requires 32-bit ID. Others could be 64-bit IDs.
- Bill: SoC address? Not necessarily as it appears in Linux DT.
- Rob: What are the >1 things that the ID applies to? It applies to device permissions, what else?
- Stefano: Could apply (not on Xilinx HW) to bus firewall configs if ID is same.
- Rob: If.
- Loic: It's a choice. We do this choice to use physical address of device after discussing w/ Greg b/c it is a unique way to identify device. Know where we want to access b/c we have address. Bill's point is valid: If we have several masters w/ different physical memory mapping, how to converge on unique physical address for all the devices?
- Bill: Your current platforms have 32-bit base addresses, which works for SCMI. If you have 64-bit, it doesn't work.
- Loic: If you follow Arm recommendation, all your devices should be in lower part of address. But, it's only recommendation and you could do what you want.
- Stefano: Suggest to continue discussion on LKML. Sounds like the more promising direction is a more traditional service provider-consumer approach.
- Sonal Santan, Bill Mills, Soren Soe, Rob Herring, Mark Dapoz, Valentin Monnot, Lizhi Hou, Bruce Ashfield, Stefano Stabellini, Loic Pallardy, Arnaud Pouliquen, Christopher, Tanmay Shah, Max Zhen, Nathalie Chan King Choy
- Discovery of PCIe apertures based on Device Tree overlays on ARM and x86. It is another example of using Device Tree to describe "more" compared to what it is commonly used for. In this case, it is used to describe PCIe apertures and other details of a PCIe device at runtime.
- (Didn't have time for) Conclude on security topic: How to express resources that have both secure & non secure address that are different
- Bill had Qs, but think we resolved most of the Qs over email
- Stefano has example from Grant, which may address Bill's point
- There are 3 pieces that need to be worked on separately
- How do you describe the PCI device? There are PCI bindings that existed for 25 years for DT. That part is a solved problem, just have to define what the device node looks like.
- How you describe everything downstream of the PCI device. Do-able if the device has a node in DT (the first problem). Probably addressed with overlays if you want to have same overlay applied to different nodes & translated to different addresses. Address translation for PCI is bit more tricky b/c BARs are dynamic. Assigned addresses property would contain BAR addresses. Some challenges to apply different overlays to different base nodes.
- ACPI: How do you do all this in a system that's either not DT or the base node is not described in DT b/c it is discoverable & dynamic. This is not just a problem for PCI. e.g. FTDI USB-serial chip w/ I2C, SPI, GPIO hanging off & want to plug in 10 to your system. This issue has been on the radar for some years & it's not solved. Not a hard problem to solve, but someone needs to look at it & come up with a proposal.
- Solution you come up with needs to work first in a DT-based system. Then, we can discuss what it looks like in a non-DT system.
- Easier to start w/ Arm-based platform w/ good DT support for PCI. There are examples of that. This is stepping stone to X86 w/ DT enabled.
- Can test on QEMU
- Recording (expires in 180 days, download/watch before then if you need to catch up)
- Recording link
- Access Passcode: 66TV9+M1
- Link to slides
- Sonal:
- Xilinx Run Time (XRT)
- Lizhi trying to upstream the kernel drivers for Alveo
- Have to add some infrastructure pieces to enable the use cases for Alveo
- What is the right architecture to get support for Alveo in mainline Linux kernel?
- Lizhi from XRT team
- Have been working with FPGA Linux community for a while
- Got a lot of suggestions for overall architecture
- Alveo HW
- Alveo card is PCIe EP (not a bridge, no PCIe device underneath) & contains multiple HW apertures. These independent apertures are exposed on PCIe device BARs & each aperture has its own driver.
- Each can be programmed with different FPGA images (shells), which can have different architectures w/ different # of apertures & apertures at different locations.
- Different cards can be plugged into 1 server
- FPGA image could be dynamically programmed back & forth between shells, so FPGA image must provide meta-data. Currently using flattened DT format.
- Bill: Is aperture 1:1 per FPGA dynamic reconfig slice, or just a memory region?
- Lizhi: Actually, HW has 2 types:
- Basic shell, which is image saved in flash, which gets programmed to FPGA. User can re-flash another shell & cold reboot.
- Multiple partition design: FPGA image in host file system. During driver boot, driver will program image to FPGA. Dynamically switched.
- Also have user region on card & user can download FPGA image to that part of the card & this may have apertures.
- Sonal: PCIe BAR layout: # of BARs is fixed, comes from the thin shell from the flash. But, what appears on the BAR is dynamic. Xilinx tools have liberty to place the IP anywhere in the BAR & then generate the flattened DT.
- Rob: So A & B in slide 2 are same IP in both cards?
- Sonal: Yes
- Bill: So we can look at A, B, C as discrete interfaces
- Rob: There are 3 pieces that need to be worked on separately
- How do you describe the PCI device? There are PCI bindings that existed for 25 years for DT. That part is a solved problem, just have to define what the device node looks like.
- How you describe everything downstream of the PCI device. Do-able if the device has a node in DT (the first problem). Probably addressed with overlays if you want to have same overlay applied to different nodes & translated to different addresses. Address translation for PCI is bit more tricky b/c BARs are dynamic. Assigned addresses property would contain BAR addresses. Some challenges to apply different overlays to different base nodes.
- ACPI: How do you do all this in a system that's either not DT or the base node is not described in DT b/c it is discoverable & dynamic. This is not just a problem for PCI. e.g. FTDI USB-serial chip w/ I2C, SPI, GPIO hanging off & want to plug in 10 to your system. This issue has been on the radar for some years & it's not solved. Not a hard problem to solve, but someone needs to look at it & come up with a proposal.
- Sonal: PCI DT bindings - since the BAR addresses are not fixed in PCI class of devices (unless root port) - how to intersect that with non-DT system?
- Rob: For non-DT system, you'd have to tell Linux to create the device nodes & hierarchy. You could start with DT system where you're not describing discoverable devices in DT. For ACPI systems, you'd probably have to manufacture root port nodes in a skeleton DT & then once those are there, you can apply your overlays to those.
- Sonal: ACPI system w/ skeleton DT - full hierarchy from RP to device?
- Rob: Probably easiest to create the full hierarchy. Keep in mind, PCI DT spec was written originally for open firmware, not flattened DT. OF firmware went to look what PCI devices were there & populated DT based on that. So you got what was discovered, not the other way around like with the flattened DT. Assigned addresses came out of that - what config of system is.
- Sonal: How to describe downstream devices w/ overlays?
- Rob: Normally w/ flattened DT, you are describing device & function. The reg property in PCI devices is the device & function. Especially w/ PCIe, you have bunch of PCI-PCI bridges at each level. Good example of hierarchy with HiKey board: PCI root port with PCIe switch on the board, GPIO lines on downstream of PCIe switch.
- Lizhi: What is benefit to create device node for all the PCIe devices?
- Rob: Once that is in place, the kernel already supports associating PCI devices with device nodes. Would work same way if you are on DT system if you have it described already vs. if you are on ACPI system where you have to generate it.
- Lizhi: PCI is self-discovered & it can create the struct device by enumeration, so why create a DT node for it? Do other functions & applications need the DT support?
- Rob: Yes, think it will depend on what you need to describe in DT.
- Stefano: You're going to need that PCI DT node, otherwise you have nothing to apply the overlay to.
- Lizhi: Proposal: To create a node for the apertures, rather than DT nodes for all the devices. Here, just need 1 node to apply the overlay.
- Rob: Solution you come up with needs to work first in a DT-based system. Then, we can discuss what it looks like in a non-DT system.
- Lizhi: Eventually, we want real device nodes (struct device) for all the apertures for all the different cards in the system. Then drivers can bind to the individual devices.
- Requirements:
- Need to discover apertures on PCIe BAR via metadata
- Need to set I/O address for aperture device & eventually translate to host I/O address.
- Driver binding: aperture node in fdt w/ compatible property
- Think can leverage from Open Firmware infrastructure
- Challenges we are facing (Rob mentioned them before)
- x86_64 doesn't use OF
- Don't have node to overlay on & need this node to act as bus node (not pci bus)
- Minor: Overlay API doesn't support dynamic creation of target node
- Proposed solution (see slides with detailed bullets)
- If we already have a DT, there will be of_root & other device nodes already
- Would introduce new function to of_pci
- Can create dynamic DT node under of_root
- Overlay will put all the DT nodes in the fdt under the newly created pcie-ep-bus DT node
- If there is non-DT system, then create empty node
- API that driver can call explicitly to create pci-ep-bus@addr
- Pcie-ep-bus node for dynamically pluggable PCIe device, can have >1 card in system, in any slot, which can't be statically defined in base tree
- Important: Creating ranges property based on PCIe BAR. PCIe BAR base address is being assigned & can generate ranges property to translate underneath device nodes into CPU I/O address.
- If the first 2 parts of solution are addressed, then OF infrastructure can create the platform_device nodes we need for the apertures
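The ranges point in the proposed solution above can be sketched as follows: once a BAR has been assigned a base address, one ranges entry maps the aperture offsets used by the child nodes onto CPU addresses. This is a simplified model with plain integers; real PCI ranges entries use 3-cell addresses, and the names `Range`, `range_from_bar`, and `translate` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Range:
    child_base: int   # address as written in the child nodes' reg
    parent_base: int  # corresponding address in the parent (CPU) space
    size: int


def range_from_bar(bar_cpu_base: int, bar_size: int, child_base: int = 0) -> Range:
    """Build one ranges entry from a BAR assigned during PCIe enumeration."""
    return Range(child_base, bar_cpu_base, bar_size)


def translate(ranges: List[Range], child_addr: int) -> Optional[int]:
    """Translate a child address through ranges; None if nothing covers it."""
    for r in ranges:
        if r.child_base <= child_addr < r.child_base + r.size:
            return r.parent_base + (child_addr - r.child_base)
    return None


if __name__ == "__main__":
    # Assumed values: BAR assigned at 0xE0000000, 16 MiB in size
    bar = range_from_bar(0xE0000000, 0x1000000)
    print(hex(translate([bar], 0x20000)))
```

Because the BAR base is only known after enumeration, the entry has to be generated at runtime, which is why the notes treat ranges as the dynamic part.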
- Requirements:
- Rob: Think this needs to be separated into different problems. Take out the dynamic part of it first and solve that as if everything is static. Then work from there to do it dynamically. Think DT would look like host bridge in base DT w/ RP bridge. If that's directly connected to your device, your device node is under there. That will be the standard PCI device node binding, which describes device & function & what bridge it's under. From there, looks like you need a simple_bus node for each BAR.
- Lizhi: Correct
- Rob: Since the BARs are not populated statically, that would need to be dynamic. Driver or PCI infrastructure would want to populate assigned addresses. If that's in place & ranges properties are in place & correct, the address translation should all just work. e.g. ISA bus under PCI bus works already. ISA is basically a platform bus type. Then driver just has to call of_platform_populate & it will find the simple bus & create devices for all the devices on the simple bus. Drivers can do it for their child nodes.
- Lizhi: There could be a simple bus under the PCI device instead of directly under the of_root?
- Rob: Right, or might be a different name from simple bus. Simple bus means there's nothing to program to access the bus.
- Max: Simple bus created by driver?
- Rob: No, simple bus & devices would be in your overlay. Driver would apply the overlay to the PCI device node.
- Lizhi: Think ranges at least has to be dynamic & can't be part of fdt
- Rob: Perhaps.
- Max: When PCIe driver attached, then we know the range of each BAR
- Lizhi: Can be part of PCIe enumeration.
- Rob: Think you can statically set assigned addresses & will set your BARs to what assigned addresses said. For translation, do the other way around, assign the BARs then populate assigned addresses. Can't recall what does/doesn't work off top of head, but there is code in kernel that has been there a long time. PPC dynamically populates PCI devices w/ DT & removes them.
- Lizhi: For generating DT node for all the PCI devices, has this topic been discussed before w/ PCIe owner, or is it new topic?
- Rob: New topic. Existing topic is if you need to describe stuff in DT, then you populate your DT w/ PCI hierarchy.
- Lizhi: If we want that feature, how to convince PCI owner that this will be helpful to real use case.
- Rob: Think will not be a problem - am a PCI reviewer & manager is a maintainer. Was recently thinking of adding something like this, so that ppl can make sure their PCIe hierarchy is correct. Was during process of getting HiKey's DT description correct to match the actual hierarchy (ppl often forget to put in the RP node). You could do something in userspace - look at sysfs PCI hierarchy & create DT nodes based on it.
- Sonal: What Lizhi has proposed is very localized change. Sounds like you're proposing a full infrastructure. That's a functionality for all PCIe devices that would come up in standard Linux if enabled in global config.
- Rob: Could be a specific driver that says to do it, instead of PCIe infrastructure doing it automatically
- Sonal: Assigned address is infrastructure to do the reverse where BAR addresses have been assigned & need to create DT based on PCI device self-discovery that has happened?
- Rob: Historically, it's what the firmware did for BAR assignment, or could be what Linux did for BAR assignment.
- Sonal: Once we have infrastructure, then PCIe device driver can request Linux to build the DT up to my node, then rest of Lizhi's proposal should work on top of that?
- Rob: Believe so, but those are 2 problems you can work on separately. Don't try to solve it all in 1 patch series.
- Sonal: Not sure if we have an Arm64 system?
- Stefano: ZU+ systems
- Rob: Have reviewed Xilinx bindings for Arm platforms
- Stefano: There are plenty of Arm servers. Which have DT support?
- Rob: Lots of Arm boards have PCI now, but usually M.2 or x1 connector at most. Rockchip RockPro x1
- Bill: ZCU102 has x4 or could use adapter, pcie x1 to x16 extension cable
- Bill: SynQuacer developer box uses EDK and U-Boot, so think they do ACPI & DT
- Stefano: This is to test the description & discovery
- Rob: Reviewed Versal CPM5
- Lizhi: IBM Power?
- Rob: That's actual open firmware systems generally. Power host bridge support is its own implementation, not the common one. It uses some of the common code, so may not matter. PPC has its own history & issues.
- Lizhi: That may skip the entire part to create this DT node, the main feature Rob suggested.
- Stefano: We have QEMU team at Xilinx who might be able to write an emulator for PCIe config space of Alveo card to be able to test the discovery. May be enough to get you going. You're not trying to do DMA or bitstream programming. Then you could do driver probing & test if it works.
- Lizhi: 1st step is just to create DT node & don't need device?
- Rob: They are separate threads of work, you can tackle in any order.
- Sonal: If we cannot get the HW, then we may have to try on X86 and make the change required. X86 currently has no DT & DT infrastructure is not compiled in. We can come up with that patch series first.
- Rob: It can be enabled.
- Sonal: We can start w/ PCIe DT changes first & test on X86 w/ DT infrastructure enabled.
- Rob: And you can test on QEMU. Arm virt machine model supports PCI out of the box. You can add models of PCIe switches & bridges and create any # of nodes in the hierarchy
- Bill: If they start on x86, don't they need DT to describe the whole x86 system?
- Rob: Not the whole system, but that is a separate issue to solve & not PCIe-specific: How to create a root DT on non-DT systems
- Bill: Easier to start w/ Arm-based platform w/ good DT support for PCI. There are examples of that.
- Sonal: Challenge: All the Alveo customers we know of are on x86.
- Bill: This is just stepping stone.
- Stefano: Today was not most traditional System DT topic, but relevant: adding DT to describe things that are not described before. Next time we can go back to traditional discussion & will follow up w/ Bill's question.
- Tammy Leino
- Nathalie CKC
- Marti Bolivar
- Stefano Stabellini
- Bruce Ashfield
- Mark Dapoz
- Loic Pallardy
- Alexandre Torgue
- Tomas Evensen
- Bill Mills
- Tanmay Shah
- Sergei Korneichuk
- Stefano: To raise Q w/ Rob in email
- Stefano: Could write System DT part of spec just to add secure-address-map not tied to secure-bus. Secure-address-map could work for range specified through simple bus. Had always thought of the 3 features together, but this can be split out.
- Stefano will write out proposal
- Secure-address-map
- What can be done simpler for simpler cases?
- Stefano: Will continue conv w/ Rob about ranges & reg & what can be done to avoid duplication
- Nathalie & Stefano: Set up another call w/ Rob.
- Recording link (download recording before it expires!)
- Passcode: 23F1qb?#
- Stefano's slides in this attachment to mailing list
- Scenario: The secure bit is in the transaction
- Marti has an example of such peripheral
- How to represent the secure & non-secure sets of addresses?
- What is the impact on other things such as the bus or cpu cluster?
- Bus has ranges property that tells us how addresses of child nodes are translated to parent address space
- If device has 2 addresses, how will ranges property work?
- Example of secure vs. Non-secure addresses
- Ranges is a very important property
- Address-map: What is address map of a given CPU cluster
- Sometimes there are multiple CPU clusters in an SoC & mapping of each cluster is a little different
- Allows us to express what is mapped in a CPU cluster & what is not
- Reg, ranges, address-map are all impacted if >1 address for the device (for secure & non-secure)
- Previously discussed if we should introduce secure-reg or extend reg.
- Example: extending reg
- Has non-secure address & size + secure address & size
- How to distinguish? 1 extra cell in each tuple that tells the execution mode.
- How do we know if there is 1 more cell?
- Need a different compatible string & bus type: e.g. compatible = "secure-bus"
- #address-cells with multiple cells w/ address type (secure or non-secure)
- Bill: Looks like very general implementation that could describe lots of interesting & complex situations. Concern: The simple cases are much more constrained than this. This could be error prone for the simple cases.
- Simple case: Secure & non-secure address is same. Just secure bit that dictates response.
- Each peripheral will declare if they are secure, non, both.
- It looks like Arm says for Cortex-M you can't use same address for secure & non. So, entire bus ranges get duplicated.
- If you could distinguish the simple cases, you could make it easier for a human to create.
- Stefano: If only responding to secure, then it's different "address"
- Bill M: Semantics
- Stefano: This is a discussion where we could use Rob b/c he suggested this scheme. Maybe:
- Could put all devices w/ same secure & non-secure addresses under simple bus
- Could put devices where they are different w/ secure-bus
- Example: extending reg
- Stefano: Putting 2 addresses in reg isn't sufficient to solve the problem. Need ranges.
- Ranges: 1st cell is new that tells us if the mapping is for secure or non-secure addresses.
- Bill: Suggestion: Simple secure bus, cortex-M style secure bus, and this one that can do anything (the sledgehammer that can do everything for complex cases). It's a lot to get wrong for the simple cases.
- Stefano: Maybe could handle in ranges so that reg could be same for simple case. Would have to ask Rob.
- Bill: Does it have to be ranges? Could it be secure-ranges?
- Stefano: Could be, or, ranges-address-cells?
- Bill: Xilinx has good Cortex-A examples. Zephyr Cortex-M example w/ bank of peripherals w/ 1 address range that appears at a non-secure address & a secure address may help you see the simple case.
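A rough sketch of the extended-ranges idea discussed above, where the first cell of each ranges entry says whether the mapping is for secure or non-secure addresses. The cell values (0 and 1), the tuple layout, and the example addresses are assumptions for illustration; the notes do not settle on an encoding.

```python
from typing import List, Optional, Tuple

# Hypothetical values for the leading "mode" cell
NON_SECURE = 0
SECURE = 1

# (mode, child_base, parent_base, size)
RangeEntry = Tuple[int, int, int, int]


def translate_with_mode(ranges: List[RangeEntry], mode: int,
                        child_addr: int) -> Optional[int]:
    """Translate a child address using only the entries for the given mode."""
    for r_mode, child_base, parent_base, size in ranges:
        if r_mode == mode and child_base <= child_addr < child_base + size:
            return parent_base + (child_addr - child_base)
    return None


# Cortex-M style simple case: one peripheral bank that appears at a
# non-secure alias and a secure alias (illustrative addresses).
RANGES: List[RangeEntry] = [
    (NON_SECURE, 0x0, 0x40000000, 0x10000000),
    (SECURE,     0x0, 0x50000000, 0x10000000),
]

if __name__ == "__main__":
    print(hex(translate_with_mode(RANGES, NON_SECURE, 0x2000)))
    print(hex(translate_with_mode(RANGES, SECURE, 0x2000)))
```

In this simple case the child reg stays the same for both modes and only the bus-level mapping differs, which is the kind of simplification Bill and Stefano were asking about.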
- Next problem to solve: Address map
- Showed all 3 to see what they could look like
- Traditional just as a reference: Useful for special address for Cortex-R
- Secure-address-map w/ property to describe secure mappings
- OR extending address-map with a secure/non-secure execution mode cell
- Think both options could work. Extending can be more cumbersome. Secure lets us have separate for simple.
- Would it be invalid if we set address-cells = <0x1>?
- Mark:
- Doesn't address-cells affect both addresses in that mapping?
- If 64-bit, then address-cells would be 0x3, but then that doesn't work.
- Stefano: Don’t think address-cells can be lower than 2 to prevent ambiguity
- Execution type
- 1 or 2 for address
- Full example would be w/ amba simple bus and a secure bus w/ devices w/ different addresses
- Mark: Confusing. We're overloading the meaning of address-cells & part of it is defined above to mean something else.
- Bill: In traditional DT it means 1 thing & System DT another
- Stefano: Sounds like secure-address-map is the way to go
- Bill: Sounds good. Since that's the solution there, would suggest ranges & secure-ranges and reg & secure-reg and make it all consistent. Then the simple case becomes simple & can assume if the secure one is missing..
- Stefano: To raise Q w/ Rob in email
- Stefano: "Secure" only applies to secure execution mode. But only secure will have __
- Bill: Could be different for non-Arm architectures. Better to be explicit in the extension.
- Showed all 3 to see what they could look like
- Stefano: What about execution domains?
- Domains are unaffected b/c from execution mode, can tell if access is secure or non-secure
- That's the full example
- Marti: Will have to take some time to digest the slides
- You were saying a fully generalized example would have a secure-bus and simple bus.
- Nordic case: We don't have simple buses in SoCs we’re interested in, but have indirect buses and secure buses because there are 2 CPU core clusters, both of which are running Zephyr, where they have unequal access to the peripherals.
- So, using address-map & secure-address-map has made it a bit easier to do the partitioning in prototypes w/ Bruce. +1 to that direction at cluster level.
- Bruce: Have been prototyping Lops with Marti. Will depend what we settle on for System DT.
- If see secure address map, know it's a secure cluster
- Can check domain too
- Loic: Same feedback as Bill. We have info in too many locations. Would like to see the register translations defined at 1 location. Domain, mode as secure or non-secure - how it could be linked to the CPU cluster address map?
- Stefano: Difficult answer. Yes, there is duplication. Want to highlight duplication was used to make example more obvious. Ranges could be just w/ semicolon if want to show 1:1 map. Think it would work for secure-bus too. Your point still stands for the rest. Would need secure address-map if secure. For domain, need both address map & execution mode.
- Loic: Using execution mode from domain to use the correct address map to apply?
- Stefano: Yes.
- Loic: e.g. 32-bit product requires some re-mapping, so will have to create several bases to fit with this b/c you are adding 1 offset between secure & non-secure.
- Stefano: System DT parent address space is more conceptual than real. Each CPU cluster will have its own address map. Parent address space is everything else - can be thought of as global mapping even if no one uses it.
- Loic: Why do you need both address level at peripheral level if translation done at top by Lopper?
- Stefano: Good question.
- Can't solve entire problem w/ address map b/c another non-CPU device (e.g. a PCIe DMA master, which doesn't have an address-map) can do DMA to timer, which needs to use the global translation.
- However, may be artificial to need addresses at both ranges & reg
- Don't think we need repetition between ranges & reg.
- Loic: For CPU, will use CPU view & Lopper will replace right physical address.
- Stefano: Don’t have real example. Imagine could have co-processor on PCIe card
- Loic: Other processor will have dedicated view through PCIe card, but you still need address map of this processor. If you just use address map, you have less impact on DT & simplifying the adoption.
- Stefano: Could write System DT part of spec just to add secure-address-map not tied to secure-bus. Secure-address-map could work for range specified through simple bus. Had always thought of the 3 features together, but this can be split out.
- Loic: MCU with cortex-M33 w/ TF-M & Zephyr. Would like DT with TF-M. Would be good to have 1 definition for peripheral, giving the right address on Zephyr side & TF-M side
- Stefano: Can do. For rest, will try to sync up w/ Rob & bring the feedback to him. See Bill's point on secure-ranges & secure-reg being easier to read.
- Loic: Secure status doesn't make sense & should remove it. Have discussed w/ Rob about that. Was pushed by Arm, but not widely/correctly used. Need to define secure b/c if you need another status w/ another mode, would like to avoid duplication of different fields in peripheral node.
- Bill: What are we trying to express? In domains, we are in realm of policy (firewall, MMU, hypervisor). Outside of domains, are we expressing just HW or is some policy in here?
- Stefano: Supposed to express the HW. Is device responding to address or another address, based on secure bit set. Is there a different address mapping between secure/non?
- Loic: Ranges at bus level w/ secure & non-secure mapping. If have several CPUs, they may not have exactly the same mapping b/c they can't see the same device & 32-bit mappings may change. If you do ranges at bus level, you assume you have same non-secure & secure mapping for all the processors. If we do same at peripheral level, we take same assumptions.
- Stefano: True. Would be simplification for that case.
- Loic: Having address map at cpu-clusters: It's the view by CPU and that should be propagated to all masters (e.g. DMA) programmed by this CPU. This is important b/c if DMA is configured by the secure side, it accesses the peripheral by the same register base. Think we need to do something generic, maybe assume all processors do not have exactly the same mappings.
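The exchange between Loic and Stefano about using the domain's execution mode to pick the right address map can be sketched as below. This is a toy model: the dictionary layout, the `"secure"` / `"non-secure"` mode strings, and the function names are hypothetical, and only the property names address-map and secure-address-map come from the discussion.

```python
from typing import Dict, List, Optional, Tuple

# (cluster_base, global_base, size): how a CPU cluster sees the global space
MapEntry = Tuple[int, int, int]


def select_address_map(cluster: Dict[str, List[MapEntry]],
                       execution_mode: str) -> List[MapEntry]:
    """Pick the cluster's map based on the domain's execution mode."""
    key = "secure-address-map" if execution_mode == "secure" else "address-map"
    if key not in cluster:
        # Per the discussion, a secure domain needs a secure-address-map,
        # so this sketch does not fall back silently.
        raise KeyError(f"cluster has no {key}")
    return cluster[key]


def cluster_to_global(cluster: Dict[str, List[MapEntry]], execution_mode: str,
                      addr: int) -> Optional[int]:
    """Translate an address in the cluster's view to the global address space."""
    for cluster_base, global_base, size in select_address_map(cluster, execution_mode):
        if cluster_base <= addr < cluster_base + size:
            return global_base + (addr - cluster_base)
    return None


if __name__ == "__main__":
    # Illustrative cluster with different secure and non-secure views
    cluster = {
        "address-map": [(0x40000000, 0x40000000, 0x1000)],
        "secure-address-map": [(0x50000000, 0x40000000, 0x1000)],
    }
    print(hex(cluster_to_global(cluster, "secure", 0x50000010)))
```

Here both views resolve to the same global address, matching Stefano's description of the parent address space as a conceptual global mapping.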
- Next steps
- Stefano will write out proposal
- Secure-address-map
- What can be done simpler for simpler cases?
- Will continue conv w/ Rob about ranges & reg & what can be done to avoid duplication
- Nathalie & Stefano: Set up another call w/ Rob.