Add an extension for UEFI references

Add a Python extension script to supplement our references to the UEFI
specification with the corresponding section title and web-page hyperlink.
Hook the new extension into the `conf.py` Sphinx configuration.

We keep the index under version control for caching, and we have a Python
script to re-generate it.

Convert all the UEFI references to use the new extension, adapting the text
for readability where necessary.
Mention the extension in the README.

While at it, tell git to ignore the folders produced by Python when running
the extension.

Signed-off-by: Vincent Stehlé <[email protected]>

ARM-software · Feb 22, 2024 · commit 68e98bc
1 parent 4cea232

Showing 10 changed files with 2,387 additions and 28 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1 +1,2 @@
 /build
+__pycache__/
diff --git a/README.rst b/README.rst
@@ -146,6 +146,32 @@ tag. Generally this means each ``.rst`` file should include the line
 .. _reStructuredText: http://docutils.sourceforge.net/docs/user/rst/quickref.html
 .. _Sphinx: http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html
 
+Extensions
+^^^^^^^^^^
+
+Extension files are kept under ``source/extensions/``.
+
+We have an extension for referencing UEFI specification chapters.
+
+To reference UEFI section 6.1 for example, write::
+
+ :UEFI:`6.1`
+
+This will be expanded to the following reference, with a link to the UEFI
+webpage::
+
+ UEFI § 6.1 Block Translation Table (BTT) Background
+
+Debugging the extension is easier when running Sphinx with debug messages::
+
+  $ make singlehtml SPHINXOPTS=-vv
+
+We keep the UEFI index ``.csv`` file under version control for caching, and we
+have a python script to re-generate it from the UEFI specification webpage.
+To re-generate the index file, do::
+
+  $ ./scripts/update_uefi_index.py
+
 Original Document
 =================
 Prior to being relicensed to CC-BY-SA 4.0, this specification was
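
The expansion described in the README hunk above can be pictured as a lookup in the index ``.csv`` file. The following is only a minimal sketch, not the extension from this commit: it assumes the `<chapter number>,<chapter title>,<url>` row format documented in `update_index()`, and the names `load_index` and `expand_uefi_ref`, the sample row's URL, and the fallback for unknown sections are all hypothetical.

```python
import csv
import io

# Hypothetical sample of the index, in the
# <chapter number>,<chapter title>,<url> format that update_index() writes.
# The URL is a placeholder, not a real index entry.
SAMPLE_INDEX_CSV = (
    "6.1,Block Translation Table (BTT) Background,"
    "https://example.org/uefi/section-6-1.html\n"
)


def load_index(f):
    """Map each section number to its (title, url) pair."""
    return {row[0]: (row[1], row[2]) for row in csv.reader(f) if len(row) >= 3}


def expand_uefi_ref(num, index):
    """Return (link text, url) for a section number.

    Unknown sections fall back to the bare number with no link. This is an
    assumption; the real extension may behave differently.
    """
    if num not in index:
        return f"UEFI § {num}", None
    title, url = index[num]
    return f"UEFI § {num} {title}", url


index = load_index(io.StringIO(SAMPLE_INDEX_CSV))
text, url = expand_uefi_ref("6.1", index)
print(text)
```

With the sample row, the link text matches the expanded form quoted in the README. A real Sphinx role would wrap the text and URL in a reference node rather than return a tuple.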

diff --git a/scripts/update_uefi_index.py b/scripts/update_uefi_index.py
@@ -0,0 +1,190 @@
+#!/usr/bin/env python3
+
+from html.parser import HTMLParser
+import re
+import os
+import csv
+from typing import Optional, TypedDict
+import enum
+import logging
+import requests
+
+UEFI_INDEX_URL = 'https://uefi.org/specs/UEFI/2.10/index.html'
+uefi_csv = os.path.dirname(__file__) + '/../source/extensions/uefi_index.csv'
+logger = logging.getLogger(__name__)
+
+AttrsType = list[tuple[str, Optional[str]]]
+
+
+class ParsedEntry(TypedDict, total=False):
+    num: str
+    title: str
+    href: Optional[str]
+
+
+# State machine:
+#
+#         AWAIT_DIV
+#             v
+#         AWAIT_LI <----+
+#             v         |
+#          AWAIT_A -----+
+#             v         |
+# +-- AWAIT_FIRST_DATA -+
+# |           v         |
+# |   AWAIT_MORE_DATA --+
+# |           v
+# +-------> DONE
+class State(enum.Enum):
+    AWAIT_DIV = enum.auto()
+    AWAIT_LI = enum.auto()
+    AWAIT_A = enum.auto()
+    AWAIT_FIRST_DATA = enum.auto()
+    AWAIT_MORE_DATA = enum.auto()
+    DONE = enum.auto()
+
+
+class IndexHtmlParser(HTMLParser):
+    """A class to parse an HTML index and extract what we need from there.
+    """
+    def reset(self) -> None:
+        self.index: list[ParsedEntry] = []  # The index we have captured.
+        self.state = State.AWAIT_DIV        # Our state-machine current state.
+        self.current: ParsedEntry = {}      # The current data.
+        self.nums: set[str] = set()         # To detect duplicates.
+        HTMLParser.reset(self)
+
+    def set_state(self, s: State) -> None:
+        if self.state != s:
+            logger.debug(f"-> {s}")
+            self.state = s
+
+    def has_class(self, pat: str, attrs: AttrsType) -> bool:
+        for a in attrs:
+            if a[0] == 'class' and a[1] is not None and pat in a[1]:
+                return True
+
+        return False
+
+    def handle_starttag(self, tag: str, attrs: AttrsType) -> None:
+        logger.debug(f"Encountered a start tag: {tag}, {attrs}")
+
+        if self.state == State.AWAIT_DIV and tag == 'div':
+            # We look for a div with toctree* class.
+            if self.has_class('toctree', attrs):
+                self.set_state(State.AWAIT_LI)
+                return
+
+        elif self.state == State.AWAIT_LI and tag == 'li':
+            # We look for an li with toctree* class.
+            if self.has_class('toctree', attrs):
+                self.set_state(State.AWAIT_A)
+                return
+
+        elif self.state == State.AWAIT_A:
+            # We expect an a with a reference internal class and a href.
+            if tag == 'a' and self.has_class('reference internal', attrs):
+                for a in attrs:
+                    if a[0] == 'href':
+                        self.current['href'] = a[1]
+                        self.set_state(State.AWAIT_FIRST_DATA)
+                        return
+
+            self.set_state(State.AWAIT_LI)
+
+        elif self.state == State.AWAIT_FIRST_DATA:
+            self.set_state(State.AWAIT_LI)
+
+        elif self.state == State.AWAIT_MORE_DATA:
+            # Ignore most of the tags when inside the data.
+            if tag != 'a':
+                return
+
+            self.set_state(State.AWAIT_LI)
+
+    def handle_endtag(self, tag: str) -> None:
+        logger.debug(f"Encountered an end tag : {tag}")
+
+        if self.state in (State.AWAIT_A, State.AWAIT_FIRST_DATA):
+            self.set_state(State.AWAIT_LI)
+
+        elif self.state == State.AWAIT_MORE_DATA:
+            if tag != 'a':
+                # Ignore most of the tags when inside the data.
+                return
+
+            # else: tag == 'a'
+            # When we have all the data, store the index entry.
+
+            # We need to filter the section titles a bit because they sometimes
+            # contain a few remaining unicode characters.
+            self.current['title'] = re.sub(
+                r'[\x80-\xff]+', '-', self.current['title'])
+
+            logger.debug(f"Index entry: {self.current}")
+            self.index.append(self.current)
+            self.current = {}
+            self.set_state(State.AWAIT_LI)
+
+    def handle_data(self, data: str) -> None:
+        logger.debug(f"Encountered some data  : {data}")
+
+        if self.state == State.AWAIT_A:
+            self.set_state(State.AWAIT_LI)
+
+        elif self.state == State.AWAIT_FIRST_DATA:
+            # Inside the li, and the a, we look for a first data in the right
+            # format.
+            m = re.match(r'([A-Z0-9\.]*[0-9])\. (.*)', data)
+            if m:
+                num = m[1]
+
+                # Bail out at first duplicate.
+                if num in self.nums:
+                    self.set_state(State.DONE)
+                    return
+
+                self.nums.add(num)
+                self.current['num'] = num
+                self.current['title'] = m[2]
+                self.set_state(State.AWAIT_MORE_DATA)
+                return
+
+        elif self.state == State.AWAIT_MORE_DATA:
+            # We might have more data.
+            self.current['title'] += data
+            return
+
+
+def update_index(index_url: str, csv_filename: str) -> None:
+    """Update index database.
+    We download the index and create a csv containing lines in the following
+    format:
+    <chapter number>,<chapter title>,<url>
+    """
+    # Download index
+    logger.info(f"Downloading {index_url}")
+    req = requests.get(index_url, allow_redirects=True, timeout=60.0)
+    # logger.debug(req)
+
+    # Parse HTML
+    logger.debug('Parsing')
+    parser = IndexHtmlParser()
+    parser.feed(req.text)
+    # logger.debug(parser.index)
+
+    # Save csv
+    logger.info(f"Saving {csv_filename}")
+    url_prefix = os.path.dirname(index_url)
+
+    with open(csv_filename, 'w', encoding='utf-8', newline='') as f:
+        writer = csv.writer(f, lineterminator='\n')
+
+        for e in parser.index:
+            writer.writerow(
+                [e['num'], e['title'], f"{url_prefix}/{e['href']}"])
+
+
+if __name__ == '__main__':
+    logging.basicConfig(level=logging.INFO)
+    update_index(UEFI_INDEX_URL, uefi_csv)
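
The text handling in the script above is easier to follow in isolation. The sketch below reuses the two regular expressions from `handle_data()` and `handle_endtag()`; the helper names `split_entry` and `clean_title`, the sample strings, and the `example.html#sec` href are hypothetical. The script builds each URL with `os.path.dirname()` and an f-string; `urllib.parse.urljoin()` is shown here only as an alternative and is not what the commit uses.

```python
import re
from urllib.parse import urljoin

# Same patterns as in update_uefi_index.py.
ENTRY_RE = re.compile(r'([A-Z0-9\.]*[0-9])\. (.*)')
HIGH_CHARS_RE = re.compile(r'[\x80-\xff]+')


def split_entry(data):
    """Split link text such as '6.1. Title' into (num, title), or return None."""
    m = ENTRY_RE.match(data)
    return (m[1], m[2]) if m else None


def clean_title(title):
    """Replace each run of code points U+0080..U+00FF with one hyphen."""
    return HIGH_CHARS_RE.sub('-', title)


print(split_entry('6.1. Block Translation Table (BTT) Background'))
print(split_entry('A.1. Hypothetical appendix entry'))
print(split_entry('Index'))  # No section number, so no match.

# An en dash whose UTF-8 bytes were decoded as Latin-1 becomes three
# characters in the U+0080..U+00FF range, which collapse to one hyphen.
print(clean_title('Media \xe2\x80\x93 Access'))
# A correctly decoded en dash (U+2013) is outside that range and is kept.
print(clean_title('Media \u2013 Access'))

# Hypothetical href; urljoin resolves it against the index page URL.
print(urljoin('https://uefi.org/specs/UEFI/2.10/index.html', 'example.html#sec'))
```

Note that the character class only covers code points up to U+00FF, so it cleans titles whose non-ASCII bytes were decoded one byte per character, but leaves correctly decoded characters above that range untouched.
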
diff --git a/source/chapter1-about.rst b/source/chapter1-about.rst
@@ -188,7 +188,7 @@ section by using the section sign §.
 
 Examples:
 
-UEFI § 6.1 - Reference to the UEFI specification [UEFI]_ section 6.1
+:UEFI:`6.1` - Reference to the UEFI specification [UEFI]_ section 6.1
 
 Terms and abbreviations
 =======================

diff --git a/source/chapter2-uefi.rst b/source/chapter2-uefi.rst
@@ -18,14 +18,14 @@ UEFI Compliance
 EBBR compliant platform shall conform to a subset of the [UEFI]_ spec as listed
 in this section.
 Normally, UEFI compliance would require full compliance with all items listed
-in UEFI § 2.6.
+in :UEFI:`2.6`.
 However, the EBBR target market has a reduced set of requirements,
 and so some UEFI features are omitted as unnecessary.
 
 Required Elements
 -----------------
 
-This section replaces the list of required elements in [UEFI]_ § 2.6.1.
+This section replaces the list of required elements in :UEFI:`2.6.1`.
 All of the following UEFI elements are required for EBBR compliance.
 
 .. list-table:: UEFI Required Elements
@@ -58,7 +58,7 @@ All of the following UEFI elements are required for EBBR compliance.
    * - `EFI_DEVICE_PATH_UTILITIES_PROTOCOL`
      - Interface for creating and manipulating UEFI device paths.
 
-.. list-table:: Notable omissions from UEFI § 2.6.1
+.. list-table:: Notable omissions from :UEFI:`2.6.1`
    :widths: 50 50
    :header-rows: 1
 
@@ -70,7 +70,7 @@ All of the following UEFI elements are required for EBBR compliance.
 Required Platform Specific Elements
 -----------------------------------
 
-This section replaces the list of required elements in [UEFI]_ § 2.6.2.
+This section replaces the list of required elements in :UEFI:`2.6.2`.
 All of the following UEFI elements are required for EBBR compliance.
 
 .. list-table:: UEFI Platform-Specific Required Elements
@@ -104,15 +104,15 @@ All of the following UEFI elements are required for EBBR compliance.
    * - `EFI_SIMPLE_NETWORK_PROTOCOL`
      - Required if the platform has a network device.
    * - HTTP Boot
-     - Required if the platform supports network booting. (UEFI § 24.7)
+     - Required if the platform supports network booting. (:UEFI:`24.7`)
    * - `RISCV_EFI_BOOT_PROTOCOL`
-     - Required on RISC-V platforms. (UEFI § 2.3.7.1 and [RVUEFI]_)
+     - Required on RISC-V platforms. (:UEFI:`2.3.7.1` and [RVUEFI]_)
 
-The following table is a list of notable deviations from UEFI § 2.6.2.
+The following table is a list of notable deviations from :UEFI:`2.6.2`.
 Many of these deviations are because the EBBR use cases do not require
 interface specific UEFI protocols, and so they have been made optional.
 
-.. list-table:: Notable Deviations from UEFI § 2.6.2
+.. list-table:: Notable Deviations from :UEFI:`2.6.2`
    :widths: 50 50
    :header-rows: 1
 
@@ -171,7 +171,7 @@ Required Global Variables
 -------------------------
 
 EBBR compliant platforms are required to support the following Global
-Variables as found in [UEFI]_ § 3.3.
+Variables as found in :UEFI:`3.3`.
 
 .. list-table:: Required UEFI Variables
    :widths: 50 50
@@ -201,7 +201,7 @@ Required Variables for capsule update "on disk"
 
 When the firmware implements in-band firmware update with `UpdateCapsule()` it
 must support the following Variables to report the status of capsule "on disk"
-processing after restart as found in [UEFI]_ § 8.5.6. [#FWUpNote]_
+processing after restart as found in :UEFI:`8.5.6`. [#FWUpNote]_
 
 .. list-table:: UEFI Variables required for capsule update "on disk"
    :widths: 50 50
@@ -244,7 +244,7 @@ AArch64 Exception Levels
 ------------------------
 
 On AArch64 UEFI shall execute as 64-bit code at either EL1 or EL2, as defined in
-[UEFI]_ § 2.3.6, depending on whether or not virtualization is available at OS
+:UEFI:`2.3.6`, depending on whether or not virtualization is available at OS
 load time.
 
 UEFI Boot at EL2
@@ -530,7 +530,7 @@ If a platform does not implement modifying non-volatile variables with
 then firmware shall return `EFI_UNSUPPORTED` for any call to `SetVariable()`,
 and must advertise that `SetVariable()` isn't available during runtime services
 via the `RuntimeServicesSupported` value in the `EFI_RT_PROPERTIES_TABLE`
-as defined in [UEFI]_ § 4.6.2.
+as defined in :UEFI:`4.6.2`.
 EFI applications can read `RuntimeServicesSupported` to determine if calls
 to `SetVariable()` need to be performed before calling `ExitBootServices()`.
 
@@ -559,17 +559,18 @@ EBBR platforms are required to implement either an in-band or an out-of-band fir
 
 If firmware update is performed in-band (firmware on the application processor updates itself),
 then the firmware shall implement the `UpdateCapsule()` runtime service and accept updates in the
-"Firmware Management Protocol Data Capsule Structure" format as described in [UEFI]_ § 23.3,
-"Delivering Capsules Containing Updates to Firmware Management Protocol".  [#FMPNote]_
-Firmware is also required to provide an EFI System Resource Table (ESRT). [UEFI]_ § 23.4
+"Firmware Management Protocol Data Capsule Structure" format as described in
+:UEFI:`23.3`. [#FMPNote]_
+Firmware is also required to provide an EFI System Resource Table (ESRT) as
+described in :UEFI:`23.4`.
 Every firmware image that can be updated in-band must be described in the ESRT.
 Firmware must support the delivery of capsules via file on mass storage device
-("on disk") as described in [UEFI]_ § 8.5.5. [#VarNote]_
+("on disk") as described in :UEFI:`8.5.5`. [#VarNote]_
 
 .. note::
    It is recommended that firmware implementing the `UpdateCapsule()` runtime
    service and an ESRT also implement the `EFI_FIRMWARE_MANAGEMENT_PROTOCOL`
-   described in [UEFI]_ § 23.1. [#FMProtoNote]_
+   described in :UEFI:`23.1`. [#FMProtoNote]_
 
 If firmware update is performed out-of-band (e.g., by an independent Baseboard
 Management Controller (BMC), or firmware is provided by a hypervisor),
@@ -592,7 +593,7 @@ service and it is not required to provide an ESRT.
 .. [#FMProtoNote] At the time of writing, both Tianocore/EDK2 and U-Boot are
    using the `EFI_FIRMWARE_MANAGEMENT_PROTOCOL` internally to support their
    implementation of the `UpdateCapsule()` runtime service and of the ESRT,
-   as detailed in [UEFI]_ § 23.3 and 23.4 respectively.
+   as detailed in :UEFI:`23.3` and :UEFI:`23.4` respectively.
 
 Miscellaneous Runtime Services
 ------------------------------

diff --git a/source/chapter4-firmware-media.rst b/source/chapter4-firmware-media.rst
@@ -48,7 +48,7 @@ Partitioning of Shared Storage
 ==============================
 
 The shared storage device must use the GUID Partition Table (GPT) disk
-layout as defined in [UEFI]_ § 5.3, unless the platform boot sequence is
+layout as defined in :UEFI:`5.3`, unless the platform boot sequence is
 fundamentally incompatible with the GPT disk layout.
 In which case, a legacy Master Boot Record (MBR) must be used.
 [#MBRReqExample]_
@@ -101,7 +101,7 @@ GPT partitioning
 ----------------
 
 The partition table must strictly conform to the UEFI specification and include
-a protective MBR authored exactly as described in [UEFI]_ § 5.3 (hybrid
+a protective MBR authored exactly as described in :UEFI:`5.3` (hybrid
 partitioning schemes are not permitted).
 
 Fixed-location firmware images must be protected by creating protective
@@ -123,7 +123,7 @@ adjusting the GUID Partition Entry array location
 and `SizeOfPartitionEntry`),
 or by specifying the usable LBAs (Choosing `FirstUsableLBA`/`LastUsableLBA`
 to not overlap the fixed firmware location).
-See [UEFI]_ § 5.3.2.
+See :UEFI:`5.3.2`.
 
 Given the choice, platforms should use protective partitions over
 adjusting the placement of GPT data structures because protective partitions

diff --git a/source/chapter5-variable-storage.rst b/source/chapter5-variable-storage.rst
@@ -101,7 +101,7 @@ DataSize
 
 Attributes
     This field is a bitmap with the variable attributes as defined in
-    [UEFI]_ § 8.2.1.
+    :UEFI:`8.2.1`.
 
 TimeStamp
     For time-based authenticated variables this field contains the timestamp