Skip to content

shegeley/hatis

Repository files navigation

This is a very early-stage project (alpha-version) + a set of experiments of building HAckable Text Input System (HATIS)

! Moving from Guile Scheme to Common Lisp !

Friendship is over with Guile Scheme. Now Common Lisp is my new friend.

This project was originated in Guile Scheme using realatively new library guile-wayland. I think Guile is a great language and the leverage from the Guix ecosystem is awesome too.

But later on I’ve realised that the library is still pretty raw and buggy in some places. I tried to figure out the internals of it and send fixes, but it was too hard.

Then I’ve looked on other lisps: Racket and Common Lisp.

And in the last one I’ve found Wayflan which is just great and well-tested. I have almost 0 knowledge of CL before (only guile scheme & clojure), but hey, how bad can it be? Mature and stable language after all. Old Guile attempt can be found at guile-dev branch.

Description

Text-editors are dead as a concept. What’s needed is a text-input system. Mobile phones got it right more than 10 years ago. Both Android and iOS can catch the text-input context: «ah here we can input text, let’s show the virtual keyboard!».

This project is inspired by very same idea: catch the text-input context globally (across all system, not just one process) and do what’s needed: change the UI, keybindings, etc. Emacs got some part of text-input right with modes. But modes should be global, on Window Manager level (or even deeper).

In GUI it’s possible to “catch input context” using Wayland::InputMethod

It should also be possible on pure-tty with readline or something.

The system should be very hackable. That’s why it’s written in Guile Scheme Common Lisp.

Core idea

Core idea for the project of this repository comes from the naming: “Hackable Text Input System”.

Hackable := easily extensible and possible to experiment with is; that’s why hackable language with hackable ecosystem and REPL is chosen

Text Input System := system that deals with Text input (multimedia comes afterwards, maybe in another module)

So, at the first sight it’s just should be able to “catch the current context” of text input [on GUI (Wayland via zwp_input_method_v2)]

context := text input “event”. “context” example:

  • it’s started and “called by” wl_server/wl_client [not sure yet who calls it] wl-server/wl-client
  • on wl_seat wl-seat
  • with wl_popup_input_surface wpis1
  • by “window with PID X”, “title Y”, dimensions “A” and “B”
  • and other `metadata/context`

File system context

It’s very important that even with those setup if user edits some real project with code it still won’t be able to get the context of it inside text-input system:

  • “I’m editing the file ending with [.scm] extension”
  • “inside the repo ~/home/Projects/hatis”
  • “that has [this] filetree”
  • etc

From now on let’s call all that above “file system (fs) context” and it needs to be provided customly.

Window manager context

There is also “wm-specific-context”. As far as I’ve got wayland itself has no unified way to tell windows/surfaces’es metadata:

  • pid
  • title
  • size
  • etc

Let’s call that above “window manager (wm) context”.

Now let’s formulate the requirements/definitions

Requirements/Definitions

Stability

  • pre-alpha and alpha APIS could break at any point if it’s reasonable enough
  • can’t yet garantee something on betas

pre-alpha

  • [ ] “hatis” shoule be able to collect as much context/meta (without file-system-context and wm-specific-context) as possible and stream it to ChanL channel; context system should support different “context providers” (be extendable)
  • [ ] collect it inside the “good REPL’able constuct” := well-written wayland-event-loop-alike entity that is easily extenadble, playfull and connectable in runtime with REPL

— there is even no “user” yet. btw “user” should also be defined in the context of this system

alpha-1

  • [ ] user should be able to catch text-input context with customizable channels [just cond predicate] (“came from pid X and app title satisfies [some-regexp]) in Sway WM;
  • [ ] and pass it forward as a file-desriptor or port wherever he/she would like (emacs/vim)
  • [ ] should have basic tests (hard question is how to do it properly)

alpha-2

  • [ ] user has access to the text-input (real utf8 string) of the context and change it however he wants (regexps, call http-api services and do some custom stuff, etc. just calling any procs (gexps?) with that string [but they should also return string])
  • [ ] user can write&call custom interceptors. Like: “I’m in text editing” (default rule) “AND user pressed X” or “moved mouse that much” etc.

beta-1

  • [ ] user should not even need to call any wayland (wayflan) code to be able to manipulate with text-input as he/she/etc wish
  • [ ] user get’s fs-context from it’s (default, it can be replaced with custom) provider

beta-2

  • [ ] it has some basic UI; ui’s at it’s best should be described with lisp data strucrutes (be xml/json/yaml/somehing serializable) and have custom “resolvers”
  • [ ] it has it’s own REPL UI panel (not too ugly): user can see and interact with system in realtime (change keybingings in the input field of that very system and see the current “events” happening [keypresses, maybe buffers and contextes])

Dev Setup(s)

Run the project with: make sway+mrepl or ~~make sway+tm/mrepl~~ to launch it with “windowed” sway or make mrepl to launch in on your’s current wayland compositor.

Notes

On wl_seat and it’s capabilities

(wl-seat-get-...) (touch/pointer/keyboard) can be called even when there is no relatable capability (when there is no touch devices, for example)

it will cause “sneaky error”: it will be written in a console, that there are no capability, but no exception will be thrown, so it’s not possible to catch it with (with-exception-handler ...). and it will, in fact “return” wl-touch object. althought it will be “broken”.

workaround is to always (extract-capabilities ...) and only “do things” after getting the exact capabilities. also know: capabilities might change directly (disconnect the keyboard/mouse), so the should also be updated and refered each time. never assume constance.

Might be the key to distingushing text-inputs (getting + saving text-input context). At least it “knows” it’s surface.

Context of input method protocol

What can I access with input method manager and input method itself is just wl_seat.

Having wl_seat I can access current focused surface (window), wl_pointer, wl_touch and wl_keyboard. (keyboard can be accessed via keyboard_grab directly anyway).

On distinguishing text input context and getting the text that’s already into text field

By default wayland doesn’t provide an ability to explicitly “distingush” one input context from another and also access what’s already commited into text-input.

So, imagine the simplest usecase: redirecting the input to some socket and editing it via emacs. As soon as emacs in closed and commit event is done there won’t be option to retrieve the commited text and focusing there again and sending to emacs will cause only appending new text.

It would be nice to save “unfinished” text-input. But it’s not possible in the current implementation of wayland (wlroots) input-method protocol.

At max I can have the history of inserts and their identifiers (this/or that window/app). — Althought it might be possible using some hardcore clever memory tricks or later in new wayland protocols & versions. See the todo.

input-method keypress event keycode

The scancode from this event is the Linux evdev scancode. To translate this to an XKB scancode, you must add 8 to the evdev scancode.

Scheme Code:

(define (keycode:evdev->xkb keycode)
  "Translates evdev keycode to xkb keycode"
  (+ keycode 8))

On XOrg+XWayland Input Method possibilities

Xorg has it’s own input-method protocol (standartized in 1993/4!) https://www.x.org/releases/X11R7.6/doc/libX11/specs/XIM/xim.html

For now X support is not a priority. XWayland also has keyboard grab support. See [[id:\[\[id:8793f30e-76d8-4443-a048-fc760da8918e\]\]][the task]].

On input-popup-surface vs surface vs xdg-surface

Input-popup-surface is another breed. Won’t cast to any other.

  • wl_display_roundtrip - Block until all pending request are processed by the server

    Returns: The number of dispatched events on success or -1 on failure This function blocks until the server has processed all currently issued requests by sending a request to the display server and waiting for a reply before returning.

    This function blocks until the server has processed all currently issued requests by sending a request to the display server and waiting for a reply before returning.

    This function uses wl_display_dispatch_queue() internally. It is not allowed to call this function while the thread is being prepared for reading events, and doing so will cause a dead lock.

    Note: This function may dispatch other events being received on the default queue.

  • wl_display_dispatch - Dispatch events on the default event queue.

    If the default event queue is empty, this function blocks until there are events to be read from the display fd. Events are read and queued on the appropriate event queues. Finally, events on the default event queue are dispatched. On failure -1 is returned and errno set appropriately.

    In a multi threaded environment, do not manually wait using poll() (or equivalent) before calling this function, as doing so might cause a dead lock. If external reliance on poll() (or equivalent) is required, see wl_display_prepare_read_queue() of how to do so.

    This function is thread safe as long as it dispatches the right queue on the right thread. It is also compatible with the multi thread event reading preparation API (see wl_display_prepare_read_queue()), and uses the equivalent functionality internally. It is not allowed to call this function while the thread is being prepared for reading events, and doing so will cause a dead lock.

    Note: It is not possible to check if there are events on the queue or not. For dispatching default queue events without blocking, see wl_display_dispatch_pending(). See also: wl_display_dispatch_pending(), wl_display_dispatch_queue(), wl_display_read_events()

  • wl_display_flush - Send all buffered requests on the display to the server.

    Returns: The number of bytes sent on success or -1 on failure

    Send all buffered data on the client side to the server. Clients should always call this function before blocking on input from the display fd. On success, the number of bytes sent to the server is returned. On failure, this function returns -1 and errno is set appropriately.

    wl_display_flush() never blocks. It will write as much data as possible, but if all data could not be written, errno will be set to EAGAIN and -1 returned. In that case, use poll on the display file descriptor to wait for it to become writable again.

  • wl_display_sync - asynchronous roundtrip

    The sync request asks the server to emit the ‘done’ event on the returned wl_callback object. Since requests are handled in-order and events are delivered in-order, this can be used as a barrier to ensure all previous requests and the resulting events have been handled.

    The object returned by this request will be destroyed by the compositor after the callback is fired and as such the client must not attempt to use it after that point.

    The callback_data passed in the callback is the event serial.

«Instead wl_display_roundtrip is similar to wl_display_dispatch, but use a sync request to receive an event and prevent blocking» - Axel Davy (@davyaxel)

— dispatch is used in wlroots examples, guile-wayland’s source code examples and in the Wayland Book — seems like dispatch is the way

XKB character signature → utf-8 symbol

See

TODOS

Security policy (maybe DBUS?) [far future, low priority]

Ask @avp (@artyom-poptsov) once again for clarifications. Pairing?

Draft a simple GTK-based UI to show interactive “input-repl” (events in real-time)

Figure out testing [high priority]

How do I test it from non-personal PC? Simulate wayland when being ssh-access only. Will be needed for automations like github actions or something. — It seems to be possible on very basic level. See guile-wayland/tests. — Also learn how the sway, mutter and etc themselfs are tested

Catch clipboard (wl_data_control/source + wl_primary_selection or something) [easy, high priority]

For now just create default listener that will simply print events — There is on wl_clipboard, only “wl_data_source”. See:

  1. https://emersion.fr/blog/2020/wayland-clipboard-drag-and-drop/
  2. https://wayland.app/protocols/primary-selection-unstable-v1
  3. https://wayland.app/protocols/wlr-data-control-unstable-v1

Get “file system context” (see definitions)

Parse xkb keymap format [low priority]

Having a keypress uid I can translate it to any character with any keyboard mapping basically.

I can parse what’s came from grabbed keyword:

xkb_keymap {
    xkb_keycodes "(unnamed)" {
            minimum = 8;
            maximum = 708;
            <ESC>                = 9;
            <AE01>               = 10;
            <AE02>               = 11;
            <AE03>               = 12;
            <AE04>               = 13;
            <AE05>               = 14;
            <AE06>               = 15;
            <AE07>               = 16;
            <AE08>               = 17;
            <AE09>               = 18;
            <AE10>               = 19;
            ...
     };

    xkb_types "(unnamed)" {
            virtual_modifiers NumLock,Alt,LevelThree,LevelFive,Meta,Super,Hyper,ScrollLock;

            type "ONE_LEVEL" {
                    modifiers= none;
                    level_name[1]= "Any";
            };
            ...
    };

    xkb_compatibility "(unnamed)" {
    ...
    };

    xkb_symbols "(unnamed)" {
            name[Group1]="English (US)";
            name[Group2]="Russian";

            key <ESC>                {	[          Escape ] };
            key <AE01>               {
                    symbols[Group1]= [               1,          exclam ],
                    symbols[Group2]= [               1,          exclam ]
            };
            key <AE02>               {
                    symbols[Group1]= [               2,              at ],
                    symbols[Group2]= [               2,        quotedbl ]
            };
            key <AE03>               {
                    symbols[Group1]= [               3,      numbersign ],
                    symbols[Group2]= [               3,      numerosign ]
            };
            key <AE04>               {
                    symbols[Group1]= [               4,          dollar ],
                    symbols[Group2]= [               4,       semicolon ]
            };
         ...
    };
};

Into multiple hash-maps:

  1. keycodes to “keynames” [number - <AB10>]
  2. keyname -> group symbol

— See: guile-wl-play/kdb-parse by @mwette

Figure out if (how?) it’s possible to distinguish one “text-input-context” from another [HARD]

Figure out proper event loop/event flow & handling using guile fibers [very high priority] (with @abcdw help)

Also try XWayland keyboard grabbing [very low priority]

Alas it seems broken at it’s very core:

The protocol:

  • does not guarantee that the grab itself is applied for a surface, the grab request may be silently ignored by the compositor,
  • does not guarantee that any events are sent to this client even if the grab is applied to a surface,
  • does not guarantee that events sent to this client are exhaustive, a compositor may filter some events for its own consumption,
  • does not guarantee that events sent to this client are continuous, a compositor may change and reroute keyboard events while the grab is nominally active.

Try distinguish text-input context via it’s popup surface

Is popup surface something that just appears and dies right away or it’s “saved” and can be compared on equality? — Answer: no. Popup surface won’t allow it.

Figure out basic log (output) management

Not to mess the arei/ares buffer I’m redirecting all the output to some sample file (for now it’s just “./output.txt”) and write all to that using open-port (in append-mode) + with-output-to-port.

It can be monitored with tail -f output.txt | nl (last one is number lines, optional)

Questions

Do I need to create sepate virtual keyboard when grabbing the input?

For what? Kinda hotswap?

To grab wl_pointer+touch on active input-method?

By-default input-method allows only keyboard-grab, not pointer or touch grabs. Is it realiable to grab them globall (from the given wl_seat) and use during input to build interfaces?

Resources

  1. Writing Wayland Clients (@bugaevc)
  2. The Wayland Book (Drew DeVault)
  3. Wayland.app
  4. guille-wl-play (@mwette)