A slight diversion into string-copying, partly because I'm fine-tuning the logic to put strings onto the screen (as a result of all the tuning and tweaking I've done to the renderer) and partly because - like the NTSC settings I adjusted recently - this issue has had a 'FIX ME' comment next to it for a while. I wrote some code ages ago to copy strings around in memory, principally to get stuff into the text buffer so the earliest versions of the renderer would have something to work with. It was perfectly good code, but it had a critical limitation that I knew I'd have to come back to and remedy later. Well, it's later.
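For context, a copy loop of that older, index-only kind might have looked roughly like the sketch below. This is a guess at its general shape, not the original code: the label oldcopy is hypothetical, and it assumes _COPYSRCE and _COPYDEST are Zero Page pointers as they are in the listings that follow.

```
oldcopy SUBROUTINE ; hypothetical index-only copy, for illustration
 LDY #$00 ; [2] start at byte offset zero
.loop LDA (_COPYSRCE),Y ; [5/6] get string byte from source + .Y
 STA (_COPYDEST),Y ; [6] set string byte at destination + .Y
 BEQ .done ; [2/3] quit once the zero-terminator is copied
 INY ; [2] step to the next byte offset
 BNE .loop ; [3/2] .Y wraps to zero after 255, ending the copy
.done RTS ; [6]
```

Because the pointers themselves never move, the only thing selecting a byte is .Y, so a sketch like this can never reach past offset 255.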
The limitation was a simple one - using (Indirect),Y indexing to copy data around had an inherent cap on the length of stuff that could be moved, namely 256 bytes (as the Y-register is of course only 8 bits wide). I wanted VIC++ to have no imposed limit on string lengths, other than available memory, so that meant I needed logic that could manage and copy data of arbitrary lengths, and importantly of lengths greater than 256 bytes. Here's the first working version of an unlimited-length string copying routine:
copystr SUBROUTINE ; copy zero-terminated string (_COPYSRCE, _COPYDEST) [.A, .X, .Y]
LDA _COPYSRCE ; [3] ZP get string start address lo-byte
STA _TMPSRCE ; [3] ZP set temporary address lo-byte
STA _BIN16L ; [3] ZP set subtraction lo-byte
LDA _COPYSRCE+1 ; [3] ZP get string start address hi-byte
STA _TMPSRCE+1 ; [3] ZP set temporary address hi-byte
STA _BIN16H ; [3] ZP set subtraction hi-byte
LDA _COPYDEST ; [3] ZP get target start address lo-byte
STA _TMPDEST ; [3] ZP set temporary address lo-byte
LDA _COPYDEST+1 ; [3] ZP get target start address hi-byte
STA _TMPDEST+1 ; [3] ZP set temporary address hi-byte
LDA #$00 ; [2] initialise string length count
STA _COPYLEN ; [3] ZP clear count hi-byte
STA _COPYLEN+1 ; [3] ZP clear count lo-byte
LDY #$00 ; [2] set string byte offset
.getbyte LDA (_TMPSRCE),Y ; [5/6] get string byte from source
STA (_TMPDEST),Y ; [6] set string byte at destination
BEQ .calclen ; [2/3] quit when we hit zero-terminator
INC _TMPSRCE ; [5] ZP increment source address lo-byte
BNE .incdest ; [3/2] skip hi-byte increment unless lo-byte wrapped
INC _TMPSRCE+1 ; [5] ZP increment source address hi-byte
BEQ .calclen ; [2/3] quit if we hit end of memory
.incdest INC _TMPDEST ; [5] ZP increment target address lo-byte
BNE .getbyte ; [3/2] skip hi-byte increment unless lo-byte wrapped
INC _TMPDEST+1 ; [5] ZP increment target address hi-byte
BNE .getbyte ; [2/3] quit if we hit end of memory
.calclen LDA _TMPSRCE ; [3] ZP get string end address lo-byte
LDX _TMPSRCE+1 ; [3] ZP get string end address hi-byte
JSR sub16bit ; [6] calculate length of copied string
LDA _BIN16H ; [3] ZP get subtraction result hi-byte
STA _COPYLEN ; [4] save copy length hi-byte
LDA _BIN16L ; [3] ZP get subtraction result lo-byte
STA _COPYLEN+1 ; [4] save copy length lo-byte
RTS ; [6]
We start with two pointers, one to the source string and one to the destination, and copy both into temporary locations in Zero Page so we can do indirect addressing on them without disturbing the originals. The code sets the indirection offset to zero, reads the first byte and stores it, then checks whether that byte was the zero-terminator (VIC++ strings are zero-terminated), which signals the end of the copy phase. If it's not the end of the string, we increment both temporary addresses, taking care to increment the hi-byte too if the lo-byte wraps, and go around again - the indirection index in .Y never changes, because we're altering the actual indirection address. When we hit either a zero-terminator or the end of memory we stop copying, then calculate the length of what we copied with a call to sub16bit, passing it the original source address and the post-copy incremented source address (the difference is a 16-bit value telling us how long the string was), and we're done. No 256-byte limit. Yay!
This routine occupies 70 bytes and takes 15,019 cycles to copy a 513-byte string (one that is twice the size, plus a byte, of the biggest string that an index-limited copy could handle). But even as I was finishing this bit of code, I'd had an idea for a faster version... ;)
copystr SUBROUTINE ; copy zero-terminated string (_COPYSRCE, _COPYDEST) [.A, .X, .Y]
LDA _COPYSRCE ; [3] ZP get string start address lo-byte
STA _TMPSRCE ; [3] ZP set temporary address lo-byte
STA _BIN16L ; [3] ZP set subtraction lo-byte
LDA _COPYSRCE+1 ; [3] ZP get string start address hi-byte
STA _TMPSRCE+1 ; [3] ZP set temporary address hi-byte
STA _BIN16H ; [3] ZP set subtraction hi-byte
LDA _COPYDEST ; [3] ZP get target start address lo-byte
STA _TMPDEST ; [3] ZP set temporary address lo-byte
LDA _COPYDEST+1 ; [3] ZP get target start address hi-byte
STA _TMPDEST+1 ; [3] ZP set temporary address hi-byte
LDA #$00 ; [2] initialise string length count
STA _COPYLEN ; [3] ZP clear count hi-byte
STA _COPYLEN+1 ; [3] ZP clear count lo-byte
LDY #$00 ; [2] set string byte offset
.getbyte LDA (_TMPSRCE),Y ; [5/6] get string byte from source
STA (_TMPDEST),Y ; [6] set string byte at destination
BEQ .calclen ; [2/3] quit when we hit zero-terminator
INY ; [2] increment byte offset
BNE .getbyte ; [3/2] loop for next byte until .Y = 0
INC _TMPSRCE+1 ; [5] ZP increment source address hi-byte
INC _TMPDEST+1 ; [5] ZP increment target address hi-byte
BNE .getbyte ; [3/2] loop for next page unless we hit end of memory
.calclen TYA ; [2] move index to .A
CLC ; [2] clear Carry
ADC _TMPSRCE ; [3] ZP add temporary address lo-byte
TAY ; [2] stash lo-byte in .Y for later
LDA _TMPSRCE+1 ; [3] ZP get temporary address hi-byte
ADC #$00 ; [2] add Carry if lo-byte wrapped
TAX ; [2] move hi-byte to .X for subtract
TYA ; [2] get lo-byte back for subtract
JSR sub16bit ; [6] calculate length of copied string
LDA _BIN16H ; [3] ZP get subtraction result hi-byte
STA _COPYLEN ; [4] save copy length hi-byte
LDA _BIN16L ; [3] ZP get subtraction result lo-byte
STA _COPYLEN+1 ; [4] save copy length lo-byte
RTS ; [6]
The beginning and end of the routine are (broadly) the same - copy the source and destination addresses to ZP for indirect addressing, and compute the length of what we copied from the original start address and wherever the temporary address ended up. The difference in the length calculation is important - we now add the value of .Y to the final address because of the way the middle part works; instead of incrementing the indirection addresses for every byte, we process the data in 256-byte chunks and only increment the address hi-bytes when .Y wraps back to zero. The routine is exactly the same size at 70 bytes, but crucially copies a 513-byte string in 9,682 cycles. That's 5,337 fewer cycles than the first version, a saving of 35.5%. Nice.
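Both versions lean on sub16bit, which isn't shown here. As a rough idea of what a helper with that calling pattern could look like, here is a minimal sketch. It assumes the end address arrives in .A (lo-byte) and .X (hi-byte), the start address is already in _BIN16L and _BIN16H, the result overwrites those two locations, and decimal mode is off. The label sub16sketch is hypothetical, and the real sub16bit may well differ.

```
sub16sketch SUBROUTINE ; hypothetical stand-in, not the real sub16bit
 SEC ; [2] set Carry so SBC starts with no borrow
 SBC _BIN16L ; [3] ZP end lo-byte minus start lo-byte
 STA _BIN16L ; [3] ZP save result lo-byte
 TXA ; [2] move end hi-byte to .A
 SBC _BIN16H ; [3] ZP subtract start hi-byte and any borrow
 STA _BIN16H ; [3] ZP save result hi-byte
 RTS ; [6]
```

STA and TXA leave Carry untouched, so the borrow from the lo-byte subtraction carries straight into the hi-byte one.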