XF86keysym.h: reserve a range for Linux kernel keysyms

The Linux kernel adds a few evdev keycodes roughly every other release. These aren't available as keysyms through XKB until they have been added as keycodes in xkeyboard-config and mapped there to a newly defined keysym in the X11 proto headers.

In the past, this was done manually: a suitable keysym was picked at random and the mapping updated accordingly. This doesn't scale very well and, given we have a large reserved range for XF86 keysyms anyway, can be done more easily.

Let's reserve the 0x10081XXX range for a 1:1 mapping of Linux kernel codes. That's 4096 values; the kernel currently uses only 767 anyway. The lower three hex digits of keysyms within that range have to match the kernel value to make them easy to add and search for. Nothing in X should depend on the actual keysym value anyway.

Since we expect this to be parsed by other scripts for automatic updating, the format of those #defines is quite strict. Add a script to generate keycodes as well as verify that the existing ones match the current expected format. The script is integrated into the CI and meson test, so we will fail if an update breaks the expectations.

Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
author: Peter Hutterer <peter.hutterer@who-t.net> 2021-01-18 11:37:39 +1000
committer: Peter Hutterer <peter.hutterer@who-t.net> 2021-02-08 14:52:02 +1000
commit: 5dbb5b76597f434ec91cfcde0750de8157c0bbf5 (patch)
tree: 568e8056be50e0b096c70191fd8d5df3f0f9a94b
parent: 70e990f09c54033097ed21caebf0dc73ec738aaf (diff)
5 files changed, 520 insertions, 3 deletions
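
The 1:1 mapping described in the commit message can be illustrated with a small standalone sketch. It does not use libevdev or the script from this patch. The helper names `evdev_keysym` and `keysym_name` are hypothetical and exist only for this example. The naming rule mirrors the default CamelCasing in `generate_keysym_line()`; as the header comment notes, a human may still override the result (e.g. VOD instead of Vod).

```python
# Minimal sketch of the reserved-range mapping, assuming the base value
# 0x10081000 from the _EVDEVK macro in XF86keysym.h.
# evdev_keysym() and keysym_name() are hypothetical helper names.

EVDEV_KEYSYM_BASE = 0x10081000  # start of the reserved 0x10081XXX range


def evdev_keysym(code: int) -> int:
    """Return the keysym value for a kernel keycode (same as _EVDEVK(code))."""
    if not 0 <= code <= 0xFFF:
        raise ValueError(f"keycode 0x{code:X} does not fit the reserved range")
    return EVDEV_KEYSYM_BASE + code


def keysym_name(kernel_name: str) -> str:
    """Return the default CamelCase keysym name for a kernel KEY_* define."""
    if not kernel_name.startswith("KEY_"):
        raise ValueError(f"{kernel_name} is not a KEY_* define")
    words = kernel_name[4:].lower().split("_")
    return "XF86XK_" + "".join(word.capitalize() for word in words)


if __name__ == "__main__":
    # KEY_MACRO_RECORD_START is 0x2b0 in the kernel (example from the header comment)
    print(keysym_name("KEY_MACRO_RECORD_START"), hex(evdev_keysym(0x2B0)))
```

Because the lower three hex digits are the kernel value, a keysym in this range can be traced back to its keycode by masking with 0xFFF.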
diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 4c648cf..3700fbd 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -12,10 +12,11 @@ variables:
   FDO_UPSTREAM_REPO: 'xorg/proto/xorgproto'
   # Changing the tag will rebuild the container images. The value is just a
   # string, but we use the date for human benefits.
-  FDO_DISTRIBUTION_TAG: '2021-01-20.0'
+  FDO_DISTRIBUTION_TAG: '2021-01-21.1'
 
 stages:
   - prep
+  - check
   - build
   - test
 
@@ -28,7 +29,7 @@ container-prep:
     # minimal set of packages required to build and install
     BASE_PACKAGES: 'meson ninja gcc'
     # extra packages we need for various tests
-    EXTRA_PACKAGES: 'libevdev python python-libevdev'
+    EXTRA_PACKAGES: 'git libevdev python python-libevdev python-black'
     FDO_DISTRIBUTION_PACKAGES: $BASE_PACKAGES $EXTRA_PACKAGES
 
 meson:
@@ -39,8 +40,26 @@ meson:
     matrix:
       - MESON_OPTIONS: ['', '-Dlegacy=true']
   script:
-    - mkdir ../_inst
+    - mkdir -p ../_inst
     - meson builddir --prefix="$PWD/../_inst" $MESON_OPTIONS
     - meson configure builddir
     - ninja -C builddir test
     - ninja -C builddir install
+
+check evdev keysyms:
+  extends:
+    - .fdo.distribution-image@arch
+  stage: test
+  script:
+    - ./scripts/keysym-generator.py --header=include/X11/XF86keysym.h verify
+
+check formatting:
+  extends:
+    - .fdo.distribution-image@arch
+  stage: check
+  script:
+    - black scripts/keysym-generator.py
+    - git diff --exit-code || (echo "Please run Black against the Python script" && false)
+  only:
+    changes:
+      - scripts/keysym-generator.py
diff --git a/README.md b/README.md
index a5af6b6..ba0a2f9 100644
--- a/README.md
+++ b/README.md
@@ -32,3 +32,27 @@ For patch submission instructions, see:
 
   https://www.x.org/wiki/Development/Documentation/SubmittingPatches
 
+
+Updating for new Linux kernel releases
+--------------------------------------
+
+The XF86keysym.h header file needs updating whenever the Linux kernel
+adds a new keycode to linux/input-event-codes.h. See the comment in
+include/X11/XF86keysym.h for details on the format.
+
+The steps to update the file are:
+
+- if the kernel release did not add new `KEY_FOO` defines, no work is
+  required
+- ensure that libevdev has been updated to the new kernel headers. This may
+  require installing libevdev from git.
+- run `scripts/keysym-generator.py` to add new keysyms. See the `--help`
+  output for the correct invocation.
+- verify that the format for any keys added by this script is correct and
+  that the keys need to be mapped. Where a key code should not get a new
+  define or is already defined otherwise, comment the line.
+- file a merge request with the new changes
+- notify the xkeyboard-config maintainers that updates are needed
+
+Note that any #define added immediately becomes API. Due diligence is
+recommended.
diff --git a/include/X11/XF86keysym.h b/include/X11/XF86keysym.h
index 8310fe3..26f9c39 100644
--- a/include/X11/XF86keysym.h
+++ b/include/X11/XF86keysym.h
@@ -232,3 +232,43 @@
 #define XF86XK_Prev_VMode	0x1008FE23   /* prev. video mode available */
 #define XF86XK_LogWindowTree	0x1008FE24   /* print window tree to log   */
 #define XF86XK_LogGrabInfo	0x1008FE25   /* print all active grabs to log */
+
+
+/*
+ * Reserved range for evdev symbols: 0x10081000-0x10081FFF
+ *
+ * Key syms within this range must match the Linux kernel
+ * input-event-codes.h file in the format:
+ *     XF86XK_CamelCaseKernelName	_EVDEVK(kernel value)
+ * For example, the kernel
+ *   #define KEY_MACRO_RECORD_START	0x2b0
+ * effectively ends up as:
+ *   #define XF86XK_MacroRecordStart	0x100812b0
+ *
+ * For historical reasons, some keysyms within the reserved range will be
+ * missing, most notably all "normal" keys that are mapped through default
+ * XKB layouts (e.g. KEY_Q).
+ *
+ * CamelCasing is done with a human control as last authority, e.g. see VOD
+ * instead of Vod for the Video on Demand key.
+ *
+ * The format for #defines is strict:
+ *
+ * #define XF86XK_FOO<tab...>_EVDEVK(0xABC)<tab><tab> |* kver KEY_FOO *|
+ *
+ * Where
+ * - alignment by tabs
+ * - the _EVDEVK macro must be used
+ * - the hex code must be in uppercase hex
+ * - the kernel version (kver) is in the form v5.10
+ * - kver and key name are within a slash-star comment (a pipe is used in
+ *   this example for technical reasons)
+ * These #defines are parsed by scripts. Do not stray from the given format.
+ *
+ * Where the evdev keycode is mapped to a different symbol, please add a
+ * comment line starting with Use: but otherwise the same format, e.g.
+ *  Use: XF86XK_RotationLockToggle	_EVDEVK(0x231)		   v4.16 KEY_ROTATE_LOCK_TOGGLE
+ *
+ */
+#define _EVDEVK(_v) (0x10081000 + _v)
+#undef _EVDEVK
diff --git a/meson.build b/meson.build
index 8da8337..fa44c38 100644
--- a/meson.build
+++ b/meson.build
@@ -95,3 +95,8 @@ ext_xorgproto = declare_dependency(
 )
 
 subdir('include')
+
+keysymfile = join_paths(meson.source_root(), 'include', 'X11', 'XF86keysym.h')
+test('evdev-keysym-check',
+     find_program('scripts/keysym-generator.py'),
+     args: ['-v', '--header', keysymfile, 'verify'])
diff --git a/scripts/keysym-generator.py b/scripts/keysym-generator.py
new file mode 100755
index 0000000..6bcde61
--- /dev/null
+++ b/scripts/keysym-generator.py
@@ -0,0 +1,429 @@
+#!/usr/bin/env python3
+#
+# SPDX-License-Identifier: MIT
+#
+# This script checks XF86keysym.h for the reserved evdev keysym range and/or
+# appends new keysym to that range. An up-to-date libevdev must be
+# available to guarantee the correct keycode ranges and names.
+#
+# Run with --help for usage information.
+#
+#
+# File is formatted with Python Black
+
+import argparse
+import logging
+import sys
+import re
+import libevdev
+import subprocess
+from pathlib import Path
+
+logging.basicConfig(level=logging.DEBUG, format="%(levelname)s: %(message)s")
+logger = logging.getLogger("ksgen")
+
+start_token = re.compile(r"#define _EVDEVK.*")
+end_token = re.compile(r"#undef _EVDEVK\n")
+
+
+def die(msg):
+    logger.critical(msg)
+    sys.exit(1)
+
+
+class Kernel(object):
+    """
+    Wrapper around the kernel git tree to simplify searching for when a
+    particular keycode was introduced.
+    """
+
+    def __init__(self, repo):
+        self.repo = repo
+
+        exitcode, stdout, stderr = self.git_command("git branch --show-current")
+        if exitcode != 0:
+            die(f"{stderr}")
+        if stdout.strip() != "master":
+            die(f"Kernel repo must be on the master branch (current: {stdout.strip()})")
+
+        exitcode, stdout, stderr = self.git_command("git tag --sort=version:refname")
+        tags = stdout.split("\n")
+        self.versions = list(
+            filter(lambda v: re.match(r"^v[2-6]\.[0-9]+(\.[0-9]+)?$", v), tags)
+        )
+        logger.debug(f"Kernel versions: {', '.join(self.versions)}")
+
+    def git_command(self, cmd):
+        """
+        Takes a single-string git command and runs it in the repo.
+
+        Returns the tuple (exitcode, stdout, stderr)
+        """
+        # logger.debug(f"git command: {cmd}")
+        try:
+            result = subprocess.run(
+                cmd.split(" "), cwd=self.repo, capture_output=True, encoding="utf8"
+            )
+            if result.returncode == 128:
+                die(f"{result.stderr}")
+
+            return result.returncode, result.stdout, result.stderr
+        except FileNotFoundError:
+            die(f"{self.repo} is not a git repository")
+
+    def introduced_in_version(self, string):
+        """
+        Search this repo for the first version with string in the headers.
+
+        Returns the kernel version number (e.g. "v5.10") or None
+        """
+
+        # The fastest approach is to git grep every version for the string
+        # and return the first. Using git log -G and then git tag --contains
+        # is an order of magnitude slower.
+        def found_in_version(v):
+            cmd = f"git grep -E \\<{string}\\> {v} -- include/"
+            exitcode, _, _ = self.git_command(cmd)
+            return exitcode == 0
+
+        def bisect(iterable, func):
+            """
+            Return the first element in iterable for which func
+            returns True.
+            """
+            # bias to speed things up: most keycodes will be in the first
+            # kernel version
+            if func(iterable[0]):
+                return iterable[0]
+
+            lo, hi = 0, len(iterable)
+            while lo < hi:
+                mid = (lo + hi) // 2
+                if func(iterable[mid]):
+                    hi = mid
+                else:
+                    lo = mid + 1
+            return iterable[hi]
+
+        version = bisect(self.versions, found_in_version)
+        logger.debug(f"Bisected {string} to {version}")
+        # 2.6.11 doesn't count, that's the start of git
+        return version if version != self.versions[0] else None
+
+
+def generate_keysym_line(code, kernel, kver_list=[]):
+    """
+    Generate the line to append to the keysym file.
+
+    This format is semi-ABI, scripts rely on the format of this line (e.g. in
+    xkeyboard-config).
+    """
+    evcode = libevdev.evbit(libevdev.EV_KEY.value, code)
+    if not evcode.is_defined:  # codes without a #define in the kernel
+        return None
+    if evcode.name.startswith("BTN_"):
+        return None
+
+    name = "".join([s.capitalize() for s in evcode.name[4:].lower().split("_")])
+    keysym = f"XF86XK_{name}"
+    tabs = 4 - len(keysym) // 8
+    kver = kernel.introduced_in_version(evcode.name) or " "
+    if kver_list:
+        from fnmatch import fnmatch
+
+        allowed_kvers = [v.strip() for v in kver_list.split(",")]
+        for allowed in allowed_kvers:
+            if fnmatch(kver, allowed):
+                break
+        else:  # no match
+            return None
+
+    return f"#define {keysym}{'	' * tabs}_EVDEVK(0x{code:03X})		/* {kver:5s} {evcode.name} */"
+
+
+def verify(ns):
+    """
+    Verify that the XF86keysym.h file follows the requirements. Since we expect
+    the header file to be parsed by outside scripts, the requirements for the format
+    are quite strict, including things like correct-case hex codes.
+    """
+
+    # No other keysym must use this range
+    reserved_range = re.compile(r"#define.*0x10081.*")
+    normal_range = re.compile(r"#define.*0x1008.*")
+
+    # This is the full pattern we expect.
+    expected_pattern = re.compile(
+        r"#define XF86XK_\w+\t+_EVDEVK\(0x([0-9A-F]{3})\)\t+/\* (v[2-6]\.[0-9]+(\.[0-9]+)?)? +KEY_\w+ \*/"
+    )
+    # This is the comment pattern we expect
+    expected_comment_pattern = re.compile(
+        r"/\* Use: \w+\t+_EVDEVK\(0x([0-9A-F]{3})\)\t+   (v[2-6]\.[0-9]+(\.[0-9]+)?)? +KEY_\w+ \*/"
+    )
+
+    # Some patterns to spot specific errors, just so we can print useful errors
+    define = re.compile(r"^#define .*")
+    name_pattern = re.compile(r"#define (XF86XK_[^\s]*)")
+    tab_check = re.compile(r"#define \w+(\s+)[^\s]+(\s+)")
+    hex_pattern = re.compile(r".*0x([a-f0-9]+).*", re.I)
+    comment_format = re.compile(r".*/\* ([^\s]+)?\s+(\w+)")
+    kver_format = re.compile(r"v[2-6]\.[0-9]+(\.[0-9]+)?")
+
+    in_evdev_codes_section = False
+    had_evdev_codes_section = False
+    success = True
+
+    all_defines = []
+
+    class ParserError(Exception):
+        pass
+
+    def error(msg, line):
+        raise ParserError(f"{msg} in '{line.strip()}'")
+
+    last_keycode = 0
+    for line in open(ns.header):
+        try:
+            if not in_evdev_codes_section:
+                if re.match(start_token, line):
+                    in_evdev_codes_section = True
+                    had_evdev_codes_section = True
+                    continue
+
+                if re.match(reserved_range, line):
+                    error("Using reserved range", line)
+                match = re.match(name_pattern, line)
+                if match:
+                    all_defines.append(match.group(1))
+            else:
+                # Within the evdev defines section
+                if re.match(end_token, line):
+                    in_evdev_codes_section = False
+                    continue
+
+                # Comments we only search for a hex pattern and where there is one present
+                # we only check for uppercase format, ordering and update our last_keycode.
+                if not re.match(define, line):
+                    match = re.match(expected_comment_pattern, line)
+                    if match:
+                        if match.group(1) != match.group(1).upper():
+                            error(
+                                f"Hex code 0x{match.group(1)} must be uppercase", line
+                            )
+                        if match.group(1):
+                            keycode = int(match.group(1), 16)
+                            if keycode < last_keycode:
+                                error("Keycode must be ascending", line)
+                            if keycode == last_keycode:
+                                error("Duplicate keycode", line)
+                            last_keycode = keycode
+                    elif re.match(hex_pattern, line):
+                        logger.warning(f"Unexpected hex code in {line}")
+                    continue
+
+                # Anything below here is a #define line
+                # Let's check for specific errors
+                if re.match(normal_range, line):
+                    error("Define must use _EVDEVK", line)
+
+                match = re.match(name_pattern, line)
+                if match:
+                    if match.group(1) in all_defines:
+                        error("Duplicate define", line)
+                    all_defines.append(match.group(1))
+                else:
+                    error("Typo", line)
+
+                match = re.match(hex_pattern, line)
+                if not match:
+                    error("No hex code", line)
+                if match.group(1) != match.group(1).upper():
+                    error(f"Hex code 0x{match.group(1)} must be uppercase", line)
+
+                tabs = re.match(tab_check, line)
+                if not tabs:  # bug
+                    error("Matching error", line)
+                if " " in tabs.group(1) or " " in tabs.group(2):
+                    error("Use tabs, not spaces", line)
+
+                comment = re.match(comment_format, line)
+                if not comment:
+                    error("Invalid comment format", line)
+                kver = comment.group(1)
+                if kver and not re.match(kver_format, kver):
+                    error("Invalid kernel version format", line)
+
+                keyname = comment.group(2)
+                if not keyname.startswith("KEY_") or keyname.upper() != keyname:
+                    error("Kernel keycode name invalid", line)
+
+                # This could be an old libevdev
+                if keyname not in [c.name for c in libevdev.EV_KEY.codes]:
+                    logger.warning(f"Unknown kernel keycode name {keyname}")
+
+                # Check the full expected format, no better error messages
+                # available if this fails
+                match = re.match(expected_pattern, line)
+                if not match:
+                    error("Failed match", line)
+
+                keycode = int(match.group(1), 16)
+                if keycode < last_keycode:
+                    error("Keycode must be ascending", line)
+                if keycode == last_keycode:
+                    error("Duplicate keycode", line)
+
+                # May cause a false positive for old libevdev if KEY_MAX is bumped
+                if keycode < 0x0A0 or keycode > libevdev.EV_KEY.KEY_MAX.value:
+                    error("Keycode outside range", line)
+
+                last_keycode = keycode
+        except ParserError as e:
+            logger.error(e)
+            success = False
+
+    if not had_evdev_codes_section:
+        logger.error("Unable to locate EVDEVK section")
+        success = False
+    elif in_evdev_codes_section:
+        logger.error("Unterminated EVDEVK section")
+        success = False
+
+    if success:
+        logger.info("Verification succeeded")
+
+    return 0 if success else 1
+
+
+def add_keysyms(ns):
+    """
+    Print a new XF86keysym.h file, adding any *missing* keycodes to the existing file.
+    """
+    if verify(ns) != 0:
+        die("Header file verification failed")
+
+    # If verification succeeds, we can be a bit more lenient here because we already know
+    # what the format of the field is. Specifically, we're searching for
+    # 3-digit hexcode in brackets and use that as keycode.
+    pattern = re.compile(r".*_EVDEVK\((0x[a-fA-F0-9]{3})\).*")
+    max_code = max(
+        [
+            c.value
+            for c in libevdev.EV_KEY.codes
+            if c.is_defined
+            and c != libevdev.EV_KEY.KEY_MAX
+            and not c.name.startswith("BTN")
+        ]
+    )
+
+    def defined_keycodes(path):
+        """
+        Returns an iterator to the next #defined (or otherwise mentioned)
+        keycode, all other lines (including the returned one) are passed
+        through to printf.
+        """
+        with open(path) as fd:
+            in_evdev_codes_section = False
+
+            for line in fd:
+                if not in_evdev_codes_section:
+                    if re.match(start_token, line):
+                        in_evdev_codes_section = True
+                    # passthrough for all other lines
+                    print(line, end="")
+                else:
+                    if re.match(r"#undef _EVDEVK\n", line):
+                        in_evdev_codes_section = False
+                        yield max_code
+                    else:
+                        match = re.match(pattern, line)
+                        if match:
+                            logger.debug(f"Found keycode in {line.strip()}")
+                            yield int(match.group(1), 16)
+                    print(line, end="")
+
+    kernel = Kernel(ns.kernel_git_tree)
+    prev_code = 255 - 8  # the last keycode we can map directly in X
+    for code in defined_keycodes(ns.header):
+        for missing in range(prev_code + 1, code):
+            newline = generate_keysym_line(
+                missing, kernel, kver_list=ns.kernel_versions
+            )
+            if newline:
+                print(newline)
+        prev_code = code
+
+    return 0
+
+
+def find_xf86keysym_header():
+    """
+    Search for the XF86keysym.h file in the current tree or use the system one
+    as last resort. This is a convenience function for running the script
+    locally, it should not be relied on in the CI.
+    """
+    paths = tuple(Path.cwd().glob("**/XF86keysym.h"))
+    if not paths:
+        path = Path("/usr/include/X11/XF86keysym.h")
+        if not path.exists():
+            die("Unable to find XF86keysym.h in CWD or /usr")
+    else:
+        if len(paths) > 1:
+            die("Multiple XF86keysym.h in CWD, please use --header")
+        path = paths[0]
+
+    logger.info(f"Using header file {path}")
+    return path
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Keysym parser script")
+    parser.add_argument("--verbose", "-v", action="count", default=0)
+    parser.add_argument(
+        "--header",
+        type=str,
+        default=None,
+        help="Path to the XF86Keysym.h header file (default: search $CWD)",
+    )
+
+    subparsers = parser.add_subparsers(help="command-specific help", dest="command")
+    parser_verify = subparsers.add_parser(
+        "verify", help="Verify the XF86keysym.h matches requirements"
+    )
+    parser_verify.set_defaults(func=verify)
+
+    parser_generate = subparsers.add_parser(
+        "add-keysyms", help="Add missing keysyms to the existing ones"
+    )
+    parser_generate.add_argument(
+        "--kernel-git-tree",
+        type=str,
+        default=None,
+        required=True,
+        help="Path to a kernel git repo, required to find git tags",
+    )
+    parser_generate.add_argument(
+        "--kernel-versions",
+        type=str,
+        default=[],
+        required=False,
+        help="Comma-separated list of kernel versions to limit ourselves to (e.g. 'v5.10,v5.9'). Supports fnmatch.",
+    )
+    parser_generate.set_defaults(func=add_keysyms)
+    ns = parser.parse_args()
+
+    logger.setLevel(
+        {2: logging.DEBUG, 1: logging.INFO, 0: logging.WARNING}.get(ns.verbose, 2)
+    )
+
+    if not ns.header:
+        ns.header = find_xf86keysym_header()
+
+    if ns.command is None:
+        parser.error("Invalid or missing command")
+
+    sys.exit(ns.func(ns))
+
+
+if __name__ == "__main__":
+    main()
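
As a rough illustration of how strict the #define format is, the sketch below reuses the `expected_pattern` regular expression from `verify()` on a few sample lines. The function name `check_line` and the sample lines are assumptions made for this example; in particular, the kernel version shown in the sample comment is illustrative only and is not taken from this patch.

```python
import re

# Same expression as expected_pattern in verify(), split over two lines.
EXPECTED_PATTERN = re.compile(
    r"#define XF86XK_\w+\t+_EVDEVK\(0x([0-9A-F]{3})\)\t+"
    r"/\* (v[2-6]\.[0-9]+(\.[0-9]+)?)? +KEY_\w+ \*/"
)


def check_line(line: str):
    """Return the keycode if the line follows the strict format, else None.

    check_line is a hypothetical helper, not part of keysym-generator.py.
    """
    match = EXPECTED_PATTERN.match(line)
    return int(match.group(1), 16) if match else None


# Sample line; the kernel version in the comment is illustrative only.
good = (
    "#define XF86XK_MacroRecordStart\t_EVDEVK(0x2B0)\t\t"
    "/* v5.5   KEY_MACRO_RECORD_START */"
)
lowercase_hex = good.replace("0x2B0", "0x2b0")  # hex must be uppercase
spaces = good.replace("\t", " ")  # alignment must use tabs

if __name__ == "__main__":
    for sample in (good, lowercase_hex, spaces):
        print(check_line(sample))
```

Only the first sample matches. The script itself runs several narrower checks before this one so that it can report a more specific error message for each kind of mistake.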