?Device Not Present: In Which We Push The Stack Around

I'm still hammering away at the keyboard handler, getting it organised to my liking in the way it handles control keys, feeds printable characters to the screen, and so on. In the course of these fun but fairly unexciting endeavours I've had occasion to do a bit of messing about with the Stack. I haven't really needed to do much with it so far: a round trip needs 7 cycles (3 to push with PHA, 4 to pull with PLA), and as almost everything I've written so far has been speed-critical it's been more efficient to use Zero Page for temporary storage. I think there might be one place I use the Stack as a temporary store, and that's only because it's a single byte I need to keep handy and it's not needed anywhere else, so committing a ZP location to it would be a bit wasteful.

But I'm writing a bit of code right now that needs to splat something to the screen and issue a 'change position' command to the cursor, and it passes three parameters to a subroutine in the .A, .X and .Y registers. However, before that subroutine makes use of those register values, it has to call a subroutine of its own first - which will use at least two registers during execution and therefore nuke the original parameter values before they can be used. The obvious thing to do of course is stash the values before making the subroutine call, but again I don't fancy chewing into precious ZP storage just to hold them - and since this particular code isn't speed-critical and the Stack is designed for just this sort of thing, it's a no-brainer.

Bearing in mind the fact that the requirement to stash register-value parameters on the Stack is going to become more important as I climb higher up the OS structure towards userland, the logical thing to do at this point is to write a nice standardised pair of routines to handle the pushing of registers on to the Stack and the reciprocal action of pulling them off again. Then I can call these whenever anything needs to push or pull stuff, rather than having replicated chunks of code to do it scattered across the codebase - seems sensible, right? Sounds simple too, no doubt. Hah.

Let's take the simplest requirement first - we want to stash .A, .X and .Y to the Stack so they're safe and can be retrieved later; the code is short and sweet (ooh look, new formatting template!):

   PHA       ; [3] stash .A on the Stack
   TXA       ; [2] move .X to .A
   PHA       ; [3] stash .X on the Stack
   TYA       ; [2] move .Y to .A
   PHA       ; [3] stash .Y on the Stack

   ...       ; [x] some code that uses registers executes here

   PLA       ; [4] get .Y from the Stack
   TAY       ; [2] move to .Y
   PLA       ; [4] get .X from the Stack
   TAX       ; [2] move to .X
   PLA       ; [4] get .A from the Stack

Dead simple - push each register to the Stack, .A first, then .X and .Y (which have to go through .A because there are no direct Push .X or Push .Y instructions on the original 6502) and then pull them back in reverse order to recover their original values. Hardly rocket science, and you'll see code very like this in a good percentage of any moderately complex software. But there's a little snag with it: although the original .A is saved to the Stack, by the time the pushes are done .A itself has been trashed, since we had to put .X and .Y into .A to push them (it ends up holding a copy of .Y). So what if we want to be able to push all three registers but leave their values intact? Now things get a little more complicated:

   PHA               ; [3] stash .A on the Stack
   TXA               ; [2] move .X to .A
   TSX               ; [2] move .SP to .X
   PHA               ; [3] stash original .X on the Stack
   TYA               ; [2] move .Y to .A
   PHA               ; [3] stash .Y on the Stack
   LDA _STACK+1,X    ; [4] peek original .A from Stack (.X contains .SP after first PHA)
   PHA               ; [3] stash .A on the Stack
   LDA _STACK,X      ; [4] peek original .X from Stack
   TAX               ; [2] restore original .X
   PLA               ; [4] restore original .A

That looks a bit gnarly, but it's still reasonably straightforward - we push .A as before, and shift .X ready to push it too, but grab the Stack Pointer (.SP) first. This tells us where we are about to push .X into the Stack, or to put it another way, it indirectly tells us where we just pushed .A. So we then push .X and .Y as before, but we can get the original value of .A back by 'sniffing' the Stack directly using .SP which is in .X - and having got it, we then push it onto the Stack a second time before sniffing once more to get the original value of .X back, and finally pulling .A. Hey presto, all three registers on the Stack, and their original values still intact. We use the same 'pull' code as before to get them all back in reverse order when we need them.
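
To make the sniffing concrete, here's a rough trace of that code with made-up starting values (.SP=$FF, .A=$11, .X=$22, .Y=$33), assuming _STACK is the base of page one at $0100. The values are purely illustrative; only the offsets matter:

   PHA               ; $01FF=$11                 .SP=$FE
   TXA               ; .A=$22
   TSX               ; .X=$FE
   PHA               ; $01FE=$22                 .SP=$FD
   TYA               ; .A=$33
   PHA               ; $01FD=$33                 .SP=$FC
   LDA _STACK+1,X    ; reads $0101+$FE = $01FF   .A=$11
   PHA               ; $01FC=$11 (scratch copy)  .SP=$FB
   LDA _STACK,X      ; reads $0100+$FE = $01FE   .A=$22
   TAX               ; .X=$22
   PLA               ; pulls scratch $01FC       .A=$11  .SP=$FC

At the end .A, .X and .Y hold $11, $22 and $33 again, and the Stack holds the same three values at $01FF, $01FE and $01FD.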

But that's not the end of the story, because this code then fails abysmally if you make a subroutine out of it - we don't want it repeated everywhere we need to use it, so it makes sense to make it a subroutine and just call it whenever we want, but it doesn't work:

saveregs SUBROUTINE  ; stash registers to Stack preserving contents
   PHA               ; [3] stash .A on the Stack
   TXA               ; [2] move .X to .A
   TSX               ; [2] move .SP to .X
   PHA               ; [3] stash original .X on the Stack
   TYA               ; [2] move .Y to .A
   PHA               ; [3] stash .Y on the Stack
   LDA _STACK+1,X    ; [4] peek original .A from Stack (.X contains .SP after first PHA)
   PHA               ; [3] stash .A on the Stack
   LDA _STACK,X      ; [4] peek original .X from Stack
   TAX               ; [2] restore original .X
   PLA               ; [4] restore original .A
   RTS               ; [6] pull 2-byte return address from Stack

The reason is that calling a subroutine with JSR causes the 6502 to push the two-byte return-address-minus-one to the Stack before jumping, which it then pops off (and increments) to reset the Program Counter and come back when it hits RTS. But our subroutine has now added three items to the Stack (our register values), so the return instruction pops the values of .Y and .X instead of the address it expects. It doesn't know that the values it's pulled aren't the return address, so it happily loads .PC with them and suddenly we're running Xod-knows-where through memory, executing all sorts of excitingly-fatal bits of whatever data happens to be there.
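
As a worked example with made-up numbers: suppose the JSR saveregs sits at $C000 (a hypothetical address), .SP starts at $FF, the registers hold .A=$11, .X=$22, .Y=$33, and _STACK is assumed to be $0100. By the time the RTS is reached, the Stack should look roughly like this:

   ; $01FF  $C0   return address high byte (pushed by JSR)
   ; $01FE  $02   return address low byte  (pushed by JSR; $C002 is return-minus-one)
   ; $01FD  $11   saved .A
   ; $01FC  $22   saved .X
   ; $01FB  $33   saved .Y      <- .SP=$FA, so RTS pulls from here first
   RTS               ; [6] pulls $33 as .PCL and $22 as .PCH, adds one, resumes at $2234

The return address it actually wanted, $C002, is still sitting untouched at $01FE/$01FF.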

So what we have to do in the new subroutine is somehow contrive to push the registers as expected, leave their values intact, and simultaneously adjust the Stack so that the first two items to be pulled are actually the proper return address for the RTS instruction. Take a look:

saveregs SUBROUTINE  ; stash registers to Stack preserving contents
   PHA               ; [3] stash .A on the Stack
   TXA               ; [2] move .X to .A
   TSX               ; [2] move .SP to .X
   PHA               ; [3] stash original .X on the Stack
   TYA               ; [2] move .Y to .A
   PHA               ; [3] stash .Y on the Stack
   LDA _STACK+3,X    ; [4] sniff return .PCH from Stack (.X contains .SP after first PHA)
   PHA               ; [3] stash .PCH on the Stack again
   LDA _STACK+2,X    ; [4] sniff return .PCL from Stack
   PHA               ; [3] stash .PCL on the Stack again
   LDA _STACK+1,X    ; [4] peek original .A from Stack 
   PHA               ; [3] stash .A on the Stack
   LDA _STACK,X      ; [4] peek original .X from Stack
   TAX               ; [2] restore original .X
   PLA               ; [4] restore original .A
   RTS               ; [6]

Oh-kaaaay. Deep breath. This is just a slightly more complicated variant of the previous version, with the addition of two extra LDA / PHA steps which sniff the .PC (high and low bytes) from the Stack and push them back on again so that they're the first thing to be pulled when the RTS executes - thus curing the 'random return' effect of the earlier version.
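
Using illustrative numbers (JSR saveregs at a hypothetical $C000, .SP=$FF before the JSR, .A=$11, .X=$22, .Y=$33, and _STACK assumed to be $0100), the Stack just before the RTS in this version should look something like:

   ; $01FF  $C0   original return address high byte (pushed by JSR)
   ; $01FE  $02   original return address low byte  (pushed by JSR)
   ; $01FD  $11   saved .A
   ; $01FC  $22   saved .X
   ; $01FB  $33   saved .Y
   ; $01FA  $C0   copy of return address high byte
   ; $01F9  $02   copy of return address low byte   <- .SP=$F8
   RTS               ; [6] pulls $02 then $C0, adds one, resumes at $C003 with .SP=$FA

After the return, the next three pulls would yield .Y, .X and .A in that order, with the JSR's original two bytes still sitting beyond them at $01FE/$01FF.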

The snag with this is that we now have two orphaned bytes on the Stack, those being the original return address that the JSR pushed - so we have to tweak the 'pull' routine a bit so that it tidies up and dumps those bytes as well as retrieving the register values, while of course dealing with the same return-address issue itself. Remember, because JSR pushes the return address (minus one), if we just pull the first three values and assign them to the registers, we've actually pulled the two-byte return address for the subroutine into .Y and .X, and .A contains what should be in .Y - the registers are all wrong and the subroutine RTS will return to an incorrect location. Here's a version that copes:

loadregs SUBROUTINE  ; retrieve registers from Stack
   TSX               ; [2] move .SP to .X
   LDA _STACK+2,X    ; [4] sniff return .PCH from Stack
   STA _STACK+7,X    ; [5] sneak .PCH on the Stack
   PLA               ; [4] pull return .PCL from stack
   STA _STACK+6,X    ; [5] sneak .PCL on the Stack
   PLA               ; [4] pull old .PCH from stack
   PLA               ; [4] pull .Y from stack
   TAY               ; [2] move to .Y
   PLA               ; [4] pull .X from stack
   TAX               ; [2] move to .X
   PLA               ; [4] pull .A from the Stack
   RTS               ; [6]

Since .A, .X and .Y are in the right order on the Stack, and it's just that we have this routine's own return address in the way on top of them and an orphaned return address sitting behind them, we can simply grab .SP and copy the new return address over the orphaned one, pull and discard the now-redundant original, pop the registers off as usual, and leave the Stack properly set up to return from this routine normally. But it's a much more complex arrangement than we began with - it works, but pushing the three registers without disrupting their contents takes 58 cycles (including the 12 for the JSR / RTS) and pulling them back takes 52. In addition we're carrying two orphaned return-address bytes around on the Stack during the process, which is inefficient in itself and means we have to be extremely vigilant that every time we call saveregs we subsequently retrieve the registers through loadregs (and not with 'unsupervised' PLA instructions) because we've got to stop those orphaned bytes gradually filling up the Stack.
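
Put together, the situation described at the start might be handled along these lines. The routine names putat and hidecursor are hypothetical stand-ins for illustration, not routines from the actual OS:

putat SUBROUTINE     ; hypothetical: takes its parameters in .A, .X and .Y
   JSR saveregs      ; [58] park .A, .X and .Y (values also stay in the registers)
   JSR hidecursor    ; [x]  hypothetical helper, free to trash the registers
   JSR loadregs      ; [52] recover .A, .X and .Y and drop the orphaned bytes
   ...               ; [x]  use the parameters here
   RTS               ; [6]

That's 110 cycles of overhead for the pair, with five bytes of Stack (three registers plus the two orphans) held between the two calls. Both calls have to be made at the same Stack depth, otherwise loadregs will be looking at the wrong bytes.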

I wasn't satisfied with this - it felt wrong, and clunky - and I wondered if there might be a smarter way. I'll show you what I came up with next time...

Wednesday, 13 March 2013
