Wednesday, 4 July 2012

In Which We Experience Despair And Clarity


Well, this cursor undraw bug has been a real struggle to fix - I'm absolutely stunned that so trivial an issue could take so long to trace and remedy. I'd isolated the problem as a mistake in the setting of the bit that indicates whether the cursor is visible or not (i.e. the bit was clear, implying that the cursor was not visible, when the cursor was very definitely in the 'on' phase of the blink cycle) and so I methodically went back through the code looking for the place where the problem occurred. To no avail, as it turned out, because I just couldn't find anything that was hitting the bit when it shouldn't have been.

I focussed on the logic that handles inverting the bit and the corresponding routine that draws/undraws the cursor in response to that, and did actually find a couple of places where errors could occur, but after reworking the code (and actually making it more efficient) the error persisted. I placed breakpoints at strategic positions and carefully traced the code path through them, checking the bit at each juncture to see if the problem had arisen yet, but it all checked out OK. I went back to the keyboard routine that handles the keypress which generates the CRLF in case I'd just done something silly there, but that was doing exactly what it was supposed to. I began to despair, because this stupid bug was obviously something utterly trivial and annoying but I just couldn't find the damned thing.

In desperation, I put a breakpoint right back at the point where system initialisation has just finished and the IRQ routine is about to kick in and start doing the background stuff that makes the screen, cursor, and keyboard work. I checked the value of the bit right there, knowing that it should show the cursor as active and visible - and it did. I then watched the first cycle of IRQ processing run, and the bit was correctly inverted and the cursor draw/undraw routine was called. And then I experienced a lengthy ohnosecond.

You might be asking yourself what an ohnosecond is, and how long it is. Well, it's a variable-length measure of time: the interval between doing something silly and the moment of clarity when you comprehend the awful truth and say 'Oh no' to yourself. It's most commonly associated with stupendous screw-ups such as switching to your database backup directory and accidentally typing 'del *.bak' instead of 'dir *.bak' - you hit the key and the ohnosecond counter starts, stopping only when you wonder why the directory listing isn't appearing and then realise what you've typed.

So let's just recap on that last debug session: system initialisation has finished, and the cursor display flag has been correctly set to show the cursor as active and visible. The IRQ handler starts up and checks the cursor-active bit - it's set, so it invokes the cursor handler. The cursor handler inverts the visibility bit (so it's now off) and drops into the draw/undraw routine, which simply inverts the cursor pixels. So where, exactly, after setting the visibility bit during initialisation, did we call the cursor draw/undraw code to actually draw the cursor? Nowhere, that's where - so that first iteration through the IRQ logic inverts the bit (correctly) and inverts the cursor pixels (correctly), except that we hadn't drawn the cursor yet, so inverting the pixels doesn't undraw the cursor in line with the bit, it draws it.

In other words, the bit and the actual draw-state of the cursor are out of phase; although the bit is being inverted each IRQ, and the cursor is being drawn/undrawn on each alternating cycle, they're not synchronised with each other because way back at the beginning we set the bit but forgot to actually draw the cursor.
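
For anyone who wants to see the phase problem in miniature, here is a rough Python model of it. To be clear, this is not the actual code: the class, the method names, and the `draw_at_init` switch are all made up for illustration, and the real thing lives in the IRQ handler rather than in a tidy little class. It only assumes what is described above: the flag starts out set, the screen starts out with no cursor drawn, and each blink inverts both the flag and the pixels.

```python
class Cursor:
    """Toy model of the blink logic; all names here are hypothetical."""

    def __init__(self, draw_at_init):
        self.visible_bit = True   # flag set during initialisation
        self.pixels_on = False    # nothing has been drawn on screen yet
        if draw_at_init:
            # The step that was forgotten: draw the cursor once so the
            # screen agrees with the flag before the first IRQ arrives.
            self.toggle_pixels()

    def toggle_pixels(self):
        # Stand-in for the draw/undraw routine, which just inverts pixels.
        self.pixels_on = not self.pixels_on

    def irq_tick(self):
        # One blink: invert the flag, then invert the pixels.
        self.visible_bit = not self.visible_bit
        self.toggle_pixels()


def run(draw_at_init, ticks=4):
    """Return (visible_bit, pixels_on) after each of `ticks` blinks."""
    cursor = Cursor(draw_at_init)
    history = []
    for _ in range(ticks):
        cursor.irq_tick()
        history.append((cursor.visible_bit, cursor.pixels_on))
    return history


buggy = run(draw_at_init=False)   # flag and pixels disagree on every tick
fixed = run(draw_at_init=True)    # flag and pixels agree on every tick
```

In the `buggy` run the two values in each pair never match, and no amount of further blinking brings them back together, because every tick flips both. The single extra draw at initialisation in the `fixed` run is all it takes to line them up.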

Oh, no.
