I'd discovered, way back when writing the RAMTAS logic, that neither DASM nor XVIC liked the undocumented HLT instruction. DASM rejected it as an unrecognised opcode and refused both of the alternate mnemonics for it (KIL and JAM), but would of course quite happily assemble DC.B #$02, the first of the raw hex codes for the instruction. XVIC, however, pops up a message-box when it hits any HLT opcode, and its monitor disassembles it as the JAM mnemonic (which is good) - but what I wanted the emulator to do is what a real 6502 does with this instruction and just stop dead; that is, freeze the emulation without showing that message-box. I tried all the other HLT opcodes, but XVIC does the same thing for all of them. I left the HLT instruction in there anyway, because it's a one-byte opcode that breaks the T-state register inside the CPU and renders it unable to process any more instructions, so it's a great way to halt the CPU without using a Bxx * or JMP * spinloop - neither of which prevents the CPU from responding to IRQ or NMI signals, and they take two or three bytes respectively.
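For reference, here's a minimal sketch of those three ways of parking the CPU side by side (the labels are just illustrative):

FREEZE DC.B #$02 ; HLT/JAM - 1 byte; jams the CPU's instruction sequencing, so IRQ and NMI are ignored
SPIN1 JMP SPIN1 ; 3 bytes; loops forever, but the CPU still services IRQ and NMI
SPIN2 BNE SPIN2 ; 2 bytes; likewise, and only spins while the branch condition holds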
There are a few other undocumented opcodes (here's just one of a number of sites that talk about them), some of which can be quite useful in certain circumstances - others are rather esoteric and finding a use for them is a challenge, and a few are simply too unstable to rely on at all. One I do like, though, is LAX, which DASM assembles correctly and XVIC emulates accurately. It loads the Accumulator and the X-register with the same value simultaneously; it has an immediate addressing mode (i.e. LAX #$xx), but sadly that form is too unstable and unpredictable across 6502 variants to be properly useful. It does have other, stable addressing modes, however, and I spotted an opportunity to use one of them in the row-redraw logic.
Part of the refresh routine requires that the glyph data for a given ASCII character be copied to the screen bitmap. In fact, we copy two glyphs at a time, merging them so that we only have to update each bitmap byte once - an earlier version of the code copied each glyph separately, masking off whichever one was already there when applying the data for the second. This worked fine, but meant that we hit every byte on a row twice - and I am now writing the code which eliminates this double-hit. Pretty much the first thing the logic has to do is determine where in the Character Generator ROM (running from $8000 to $87FF) the pixel data for each ASCII character starts. This is a straightforward function of the ASCII code multiplied by eight, and there are probably half a dozen ways to do it - but I specifically wanted to avoid any mechanism that needed 16-bit address increments from a base, because that tends to look a bit clunky and gets slower the farther you want to get from the base address.
Instead, I looked at the binary patterns for ASCII codes and noticed that the upper three bits of a given value (bits 7-5) act as a de facto page offset when doing lookups in multiples of 8. So if we move those three bits to positions 2-0 and add them to #$80, we get the ROM address hi-byte. The remainder, the lower five bits of the ASCII value, can then just be multiplied by 8 to form the address lo-byte. Here's what I initially wrote to do that:
LDA #$10 ; [2] glyph address hi-byte (#$80, right-shifted 3 bits)
STA _CHARADD2+1 ; [3] ZP set glyph address hi-byte
LDA (_TBUFADDR),Y ; [5] get ASCII char from text buffer
ASL ; [2] shift upper 3 bits out via Carry
ROL _CHARADD2+1 ; [5] rotate bits into address hi-byte
ASL ; [2]
ROL _CHARADD2+1 ; [5]
ASL ; [2]
ROL _CHARADD2+1 ; [5]
STA _CHARADD2 ; [3] ZP set glyph address lo-byte
With .Y holding an index into the ASCII string we're drawing into the row, it simply initialises a value to #$10 (which is #$80 right-shifted three bits) and then shift/rotates the upper three bits of the ASCII code into it, yielding a value between #$80 and #$87 - the address hi-byte. The remainder, the lower five bits, is shifted left three times (or, to put it another way, multiplied by 8), and this becomes the address lo-byte. The eight bytes at that location are the glyph bit-pattern data. The code occupies 17 bytes and takes 34 cycles to execute, but of course we have to do it twice as we're dealing with two glyphs at the same time, so that makes 34 bytes and 68 cycles.
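As a quick sanity check on the mapping, take ASCII $41 ('A'): the top three bits are %010, so the hi-byte comes out as $80 + $02 = $82; the lower five bits are %00001, which multiplied by 8 gives a lo-byte of $08. That puts the glyph data at $8208, which is exactly $8000 + ($41 * 8).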
I wondered if doing a simple masking operation might be quicker and/or smaller. After a couple of iterations I ended up with this:
LAX (_TBUFADDR),Y ; [5] get ASCII char from text buffer (undocumented opcode LDA+TAX)
AND #%11100000 ; [2] mask upper three bits of ASCII char
ROL ; [2] rotate bits 7-5 round to positions 2-0 (four ROLs; relies on the carry being clear here)
ROL ; [2]
ROL ; [2]
ROL ; [2]
ORA #$80 ; [2] set bit 7 for glyph address hi-byte
STA _CHARADD2+1 ; [3] ZP set address hi-byte
TXA ; [2] get character back
ASL ; [2] multiply by 8 for glyph size
ASL ; [2]
ASL ; [2]
STA _CHARADD2 ; [3] ZP set address lo-byte
So here we get the character's ASCII code as before, but now mask off the upper three bits and set bit 7 to yield the address hi-byte. We then grab the ASCII code again (from .X, thanks to LAX) and multiply it by 8 for the lo-byte. This occupies 18 bytes (so it's one byte larger than the other version) but takes only 31 cycles - and when we double it for the other glyph, that sums to 36 bytes and 62 cycles, so we've traded two bytes for six cycles. As we execute this code 20 times per row, we're saving 120 cycles per row - so this is the version I'm using.
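To close the loop on the double-glyph idea from earlier: once both glyph addresses are set up, the merge itself boils down to something like the following. This is purely an illustrative sketch, not the real routine - the _CHARADD1 and _BMPROW pointers are hypothetical, and I'm assuming each glyph row is four pixels wide, with the left-hand character's data sitting in the high nybble and the right-hand one's in the low nybble:

LDY #0 ; hypothetical: .Y indexes the eight rows of the glyph pair
MERGEROW LDA (_CHARADD1),Y ; hypothetical pointer to the left-hand glyph's row data (bits 7-4)
ORA (_CHARADD2),Y ; merge in the right-hand glyph's row data (bits 3-0)
STA (_BMPROW),Y ; hypothetical pointer into the screen bitmap - each byte written exactly once
INY
CPY #8 ; eight bytes per glyph
BNE MERGEROW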
I was also using the XAA opcode in this routine at one point, although the code has since changed and it's no longer there. It transfers the X-register to the Accumulator and then performs an immediate-mode AND (useful anywhere you might otherwise do TXA followed by AND #$xx) - although it's worth pointing out that DASM doesn't recognise the XAA mnemonic and insists on the rather more arcane (and now little-used) ANE instead. XVIC emulates it correctly, however.
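Purely as an illustration (the #$0F mask is just an example value), the saving looks like this:

; the documented two-instruction sequence:
TXA ; [2] transfer X into A...
AND #$0F ; [2] ...then mask off whatever bits you need
; versus the single undocumented instruction (DASM wants the ANE mnemonic, not XAA):
ANE #$0F ; [2] A = X AND #$0F

That's a byte and two cycles saved each time it crops up.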
I've got the code working that copies ASCII data to the screen text buffer and lets you set control bits for each character - as well as automatically setting the dirty-row bit when a row is updated. I'm checking for dirty-row bits during each IRQ and invoking the row-refresh when one is found (the routine tracks where it got to, so it can carry on from that point on the next IRQ). The basic logic for copying glyph data to the screen bitmap is also in and working, but it still lacks the additional options for masking the bit-patterns as they're written to the bitmap, to apply underline, strike-through and inverse-mode effects.
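The actual routine isn't worth reproducing yet, but the shape of the per-IRQ check is roughly this - a sketch only, with the labels, the byte-per-row flag table and the ROWCOUNT constant all being illustrative (the real code packs the dirty flags into bits):

CHKDIRTY LDX #0 ; start the scan at row zero
CHKLOOP LDA _ROWDIRTY,X ; hypothetical per-row dirty flag
BNE CHKHIT ; non-zero means this row needs redrawing
INX
CPX #ROWCOUNT ; hypothetical constant: total number of text rows
BNE CHKLOOP
RTS ; nothing dirty - done until the next IRQ
CHKHIT JSR ROWREFRESH ; hypothetical entry point - it tracks its own progress and carries on next IRQ
RTS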
However, I'm relieved to be moving forward again - a week of deep thought and experimentation is now starting to pay dividends.