I’ve recently enhanced my emacsclient(1) wrapper script to make it possible to have my primary Emacs daemon always running under gdb. That way, if there’s a seemingly-random crash, I might be able to learn something about what happened. The tricky thing is that I want gdb running inside an instance of Emacs too, because Emacs has a nice interface to gdb, and gdb’s Emacs daemon – hereafter “gdbmacs” – needs to be the installed, optimised build of Emacs, such that it’s not likely to suffer the same crash. And the whole thing should be transparent: I shouldn’t have to do anything special to launch the primary session under gdb. That is, if right after booting up my machine I execute

% emacsclient foo.txt

then gdbmacs should start, it should then start the primary sesion under gdb, and finally the real emacsclient(1) should connect to the primary session and request editing foo.txt. I’ve got that all working now, and there are some nice additional features. If the primary session hits a breakpoint, for example, then emacsclient requests will be redirected to gdbmacs, so that I can still edit files etc. without losing the information in the gdb session. I’ve given gdbmacs a different background colour, so that if I request a new graphical frame and it pops up with that colour, I know that the main session is wedged and I might like to investigate.

First attempt: remote attaching

My first attempt, which was running for several weeks, had a different architecture. Instead of having gdbmacs start up the primary session, the primary session would start up gdbmacs, if necessary, and then send over its own PID, and ask gdbmacs to use gdb’s functionality for attaching to existing processes:

(defun spw/remote-gdbmacs-attach ()
  (interactive)
  (call-process "emacsclient" nil "*gdbmacs-emacsclient*" nil
                "--socket-name=gdbmacs" "--spw/installed" "--eval"
                (prin1-to-string `(spw/gdbmacs-attach ,(emacs-pid)))))

(defun spw/maybe-remote-gdbmacs-attach ()
  (when (and (eq (daemonp) t) ; `daemonp' can return a string
             (file-in-directory-p invocation-directory "~/src/emacs/"))
    (spw/remote-gdbmacs-attach)))
(add-hook 'after-init-hook #'spw/maybe-remote-gdbmacs-attach)

So, if we are an Emacs that just started up out of my clone of Emacs’s upstream git repository in ~/src/emacs, invoke the emacsclient(1) wrapper script to have gdbmacs attach to us. The --spw/installed option asks the wrapper script to start up gdbmacs using the Emacs binary on PATH, not the one in ~/src/emacs. (We can’t use the server-eval-at function because we need the wrapper script to start up gdbmacs if it’s not already running.)

Over in gdbmacs, the spw/gdbmacs-attach function then did something like this:

(let ((default-directory (expand-file-name "~/src/emacs/")))
  (gdb (format "gdb -i=mi --pid=%d src/emacs" pid))
  (gdb-wait-for-pending (lambda () (gud-basic-call "continue"))))

Having gdbmacs attach to the existing process is more robust than having gdbmacs start up Emacs under gdb. If anything goes wrong with attaching, or with gdbmacs more generally, you’ve still got the primary session running normally – it just won’t be under a debugger. More significantly, the wrapper script doesn’t need to know anything about the relationship between the two daemons. It just needs to be able to start up both in-tree and installed daemons, via its --spw/installed option. The complexity is all in Lisp, not shell script (the wrapper is a shell script because it needs to be fast).

The disadvantage of this scheme is that the primary session’s stdout and stderr are not directly accessible to gdbmacs. There is a function redirect-debugging-output to deal with this situation, and I experimented with having spw/remote-gdbmacs-attach call this and send the new output filename to gdbmacs, but it’s much less smooth than having gdbmacs start up the primary session itself.

I think most people would probably prefer this scheme. It’s definitely cleaner to have the two daemons start up independently, and then have one attach to the other. But I decided that I was willing to complexify my wrapper script in order to have the primary session’s stdout and stderr attached to gdbmacs in the normal way.

Second attempt: daemons starting daemons

In this version, the relevant logic is shifted out of Lisp into the wrapper script. When we execute emacsclient foo.txt, the script first determines whether the primary session is already running, using something like this:

[ -e /run/user/1000/emac/server \
    -a -n "$(ss -Hplx src /run/user/1000/emacs/server)" ]

The ss(8) tool is used to determine if anything is listening on the socket. The script also uses flock(1) to have other instances of the wrapper script wait, in case they are going to cause the daemon to exit, or something. If the daemon is running, then we can just exec ~/src/emacs/lib-src/emacsclient to handle the request. If not, we first have to start up gdbmacs:

installed_emacsclient=$(PATH=$(echo "$PATH" \
                   | sed -e "s#/directory/containing/wrapper/script##") \
                command -v emacsclient)
"$installed_emacsclient" -a '' -sgdbmacs --eval '(spw/gdbmacs-attach)'

spw/gdbmacs-attach now does something like this:

(let ((default-directory (expand-file-name "~/src/emacs/")))
  (gdb "gdb -i=mi --args src/emacs --fg-daemon")
  (gdb-wait-for-pending
   (lambda ()
     (gud-basic-call "set cwd ~")
     (gdb-wait-for-pending
      (lambda ()
        (gud-basic-call "run"))))))

"$installed_emacsclient" exits as soon as spw/gdbmacs-attach returns, which is before the primary session has started listening on the socket, so the wrapper script uses inotifywait(1) to wait until /run/user/1000/server appears. Then it is finally able to exec ~/src/emacs/lib-src/emacsclient to handle the request.

A particular kind of complexity

The wrapper script must be highly reliable. I use my primary Emacs session for everything, on the same laptop that I do my academic work. The main way I get at it is via a window manager shortcut that executes emacsclient -nc to request a new frame, such that if there is a problem, I won’t see any error output until I open an xterm and tail ~/.swayerr/~/.xsession-errors. And as starting gdbmacs and only then starting up less optimised, debug in-tree builds of Emacs is not fast, I would have to wait at least ten seconds without any Emacs frame popping up before I could suppose that something was wrong.

This is where the first scheme, where the complexity is all in Lisp, really seems attractive. My emacsclient(1) wrapper script has several other facilities and convenience features, some of which are general and some of which are only for my personal usage patterns, and the code for all those is now interleaved with the special cases for gdbmacs and the primary session that I’ve described in this post. There’s a lot that could go wrong, and it’s all in shell, and its output isn’t readily visible to the user. I’ve done a lot of testing, and I’m pretty confident in the script in its current form, but if I need to change or add features, I’ll have to do a lot of testing again before I can deploy to my usual laptop.

Single-threaded, readily interactively-debuggable Emacs Lisp really shines for this sort of “do exactly what I mean, as often as possible” code, and you find a lot of it in Emacs itself, third party packages, and peoples’ init.el files. You can add all sorts of special cases to your interactive commands to make Emacs do just what is most useful, and have confidence that you can manage the resulting complexity. In this case, though, I’ve got piles of just this sort of complexity out in an opaque shell script. The ultimate goal, though, is debugging Emacs, such that one can run yet more DJWIM Emacs Lisp, which perhaps justifies it.

Posted Thu 03 Nov 2022 18:50:28 UTC