?Device Not Present: In Which We Turn Strings Up To 11

A slight diversion into string-copying, partly because I'm fine-tuning the logic to put strings onto the screen (as a result of all the tuning and tweaking I've done to the renderer) and partly because - like the NTSC settings I adjusted recently - this issue has had a 'FIX ME' comment next to it for a while. I wrote some code ages ago to copy strings around in memory, principally to get stuff into the text buffer so the earliest versions of the renderer would have something to work with. It was perfectly good code, but it had a critical limitation that I knew I'd have to come back to and remedy later. Well, it's later.

The limitation was a simple one - using (Indirect),Y indexing to copy data around had an inherent cap on the length of stuff that could be moved, namely 256 bytes (as the Y-register is of course only 8 bits wide). I wanted VIC++ to have no imposed limit on string lengths, other than available memory, so that meant I needed logic that could manage and copy data of arbitrary lengths, and importantly of lengths greater than 256 bytes. Here's the first working version of an unlimited-length string copying routine:

copystr  SUBROUTINE  copy zero-terminated string (_COPYSRCE, _COPYDEST) [.A, .X, .Y]
         LDA _COPYSRCE       ; [3]   ZP get string start address lo-byte
         STA _TMPSRCE        ; [3]   ZP set temporary address lo-byte
         STA _BIN16L         ; [3]   ZP set subtraction lo-byte
         LDA _COPYSRCE+1     ; [3]   ZP get string start address hi-byte
         STA _TMPSRCE+1      ; [3]   ZP set temporary address hi-byte
         STA _BIN16H         ; [3]   ZP set subtraction hi-byte
         LDA _COPYDEST       ; [3]   ZP get target start address lo-byte
         STA _TMPDEST        ; [3]   ZP set temporary address lo-byte
         LDA _COPYDEST+1     ; [3]   ZP get target start address hi-byte
         STA _TMPDEST+1      ; [3]   ZP set temporary address hi-byte
         LDA #$00            ; [2]   initialise string length count
         STA _COPYLEN        ; [3]   ZP clear count hi-byte
         STA _COPYLEN+1      ; [3]   ZP clear count lo-byte
         LDY #$00            ; [2]   set string byte offset
.getbyte LDA (_TMPSRCE),Y    ; [5/6] get string byte from source
         STA (_TMPDEST),Y    ; [6]   set string byte at destination
         BEQ .calclen        ; [2/3] quit when we hit zero-terminator
         INC _TMPSRCE        ; [5]   ZP increment source address lo-byte
         BNE .incdest        ; [3/2] skip hi-byte increment unless lo-byte wrapped
         INC _TMPSRCE+1      ; [5]   ZP increment source address hi-byte
         BEQ .calclen        ; [2/3] quit if we hit end of memory
.incdest INC _TMPDEST        ; [5]   ZP increment target address lo-byte
         BNE .getbyte        ; [3/2] skip hi-byte increment unless lo-byte wrapped
         INC _TMPDEST+1      ; [5]   ZP increment target address hi-byte
         BNE .getbyte        ; [3/2] loop for next byte unless we hit end of memory
.calclen LDA _TMPSRCE        ; [4]   get string end address lo-byte
         LDX _TMPSRCE+1      ; [4]   get string end address hi-byte
         JSR sub16bit        ; [6]   calculate length of copied string
         LDA _BIN16H         ; [3]   ZP get subtraction result hi-byte
         STA _COPYLEN        ; [4]   save copy length hi-byte
         LDA _BIN16L         ; [3]   ZP get subtraction result lo-byte
         STA _COPYLEN+1      ; [4]   save copy length lo-byte
         RTS                 ; [6]

We're using two pointers, holding the start addresses of the source and destination strings, which we copy into temporary locations in Zero Page so we can use indirect addressing on them. The code sets the indirection offset in .Y to zero, reads the first byte and stores it, then checks whether it was the zero-terminator (VIC++ strings are zero-terminated), which signals the end of the copy phase. The terminator itself gets copied too, because the store happens before the test. If it's not the end of the string, we increment both temporary addresses, taking care to increment the hi-byte too if the lo-byte wraps, and go around again. The index in .Y never changes, because we're altering the indirection addresses themselves. When we hit either a zero-terminator or the end of memory we stop copying and calculate the length of what we copied with a call to sub16bit, using the original source address and the post-copy incremented source address (their difference is a 16-bit value telling us how long the string was), and we're done. No 256-byte limit. Yay!
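If you'd rather poke at the logic than at a listing, here's a rough Python model of the pointer-bumping approach. For contrast, it also includes a hypothetical index-limited copy; the original index-limited code isn't shown here, so that one is a guess that simply gives up when .Y wraps. Memory is modelled as a 64 KB bytearray. sub16 is a made-up stand-in for sub16bit, whose register conventions aren't shown. It does the usual two-byte subtract with borrow, lo-bytes first.

```python
# Rough Python models (hypothetical names, not VIC++ code). Memory is a
# 64 KB bytearray and addresses wrap at $FFFF, as they do on the 6502.


def copy_indexed(mem: bytearray, src: int, dst: int) -> int:
    """Guess at an index-limited copy: fixed base addresses, 8-bit .Y offset.

    Returns bytes copied before the terminator, or 256 if .Y wrapped first.
    """
    y = 0
    while True:
        byte = mem[(src + y) & 0xFFFF]            # LDA (src),Y
        mem[(dst + y) & 0xFFFF] = byte            # STA (dst),Y
        if byte == 0:                             # zero-terminator: done
            return y
        y = (y + 1) & 0xFF                        # INY: .Y is only 8 bits wide
        if y == 0:                                # wrapped: string truncated
            return 256


def sub16(minuend: int, subtrahend: int) -> int:
    """Two-byte subtract with borrow, lo-bytes first (stand-in for sub16bit)."""
    lo = (minuend & 0xFF) - (subtrahend & 0xFF)
    borrow = 1 if lo < 0 else 0                   # 6502 Carry clear = borrow
    hi = ((minuend >> 8) - (subtrahend >> 8) - borrow) & 0xFF
    return (hi << 8) | (lo & 0xFF)


def copy_str_ptr(mem: bytearray, src: int, dst: int) -> int:
    """First copystr: .Y stays 0 and both 16-bit pointers advance per byte."""
    tmp_src, tmp_dst = src, dst
    while True:
        byte = mem[tmp_src]                       # LDA (_TMPSRCE),Y with Y = 0
        mem[tmp_dst] = byte                       # STA (_TMPDEST),Y
        if byte == 0:                             # zero-terminator: stop
            break
        tmp_src = (tmp_src + 1) & 0xFFFF          # INC lo, INC hi on wrap
        if tmp_src == 0:                          # source ran off end of memory
            break
        tmp_dst = (tmp_dst + 1) & 0xFFFF
        if tmp_dst == 0:                          # target ran off end of memory
            break
    return sub16(tmp_src, src)                    # end address - start address


mem = bytearray(0x10000)
mem[0x1234:0x1234 + 513] = b"x" * 513             # the zero after it terminates it
print(copy_indexed(mem, 0x1234, 0x4000))          # 256
print(copy_str_ptr(mem, 0x1234, 0x4000))          # 513
```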

This routine occupies 70 bytes and takes 15,019 cycles to copy a 513-byte string (one byte longer than twice the biggest string that an index-limited copy could handle). But even as I was finishing this bit of code, I'd already had an idea for a faster version... ;)

copystr  SUBROUTINE  copy zero-terminated string (_COPYSRCE, _COPYDEST) [.A, .X, .Y]
         LDA _COPYSRCE       ; [3]   ZP get string start address lo-byte
         STA _TMPSRCE        ; [3]   ZP set temporary address lo-byte
         STA _BIN16L         ; [3]   ZP set subtraction lo-byte
         LDA _COPYSRCE+1     ; [3]   ZP get string start address hi-byte
         STA _TMPSRCE+1      ; [3]   ZP set temporary address hi-byte
         STA _BIN16H         ; [3]   ZP set subtraction hi-byte
         LDA _COPYDEST       ; [3]   ZP get target start address lo-byte
         STA _TMPDEST        ; [3]   ZP set temporary address lo-byte
         LDA _COPYDEST+1     ; [3]   ZP get target start address hi-byte
         STA _TMPDEST+1      ; [3]   ZP set temporary address hi-byte
         LDA #$00            ; [2]   initialise string length count
         STA _COPYLEN        ; [3]   ZP clear count hi-byte
         STA _COPYLEN+1      ; [3]   ZP clear count lo-byte
         LDY #$00            ; [2]   set string byte offset
.getbyte LDA (_TMPSRCE),Y    ; [5/6] get string byte from source
         STA (_TMPDEST),Y    ; [6]   set string byte at destination
         BEQ .calclen        ; [2/3] quit when we hit zero-terminator
         INY                 ; [2]   increment byte offset
         BNE .getbyte        ; [3/2] loop for next byte until .Y = 0
         INC _TMPSRCE+1      ; [5]   ZP increment source address hi-byte
         INC _TMPDEST+1      ; [5]   ZP increment target address hi-byte
         BNE .getbyte        ; [3/2] loop for next page unless target hit end of memory
.calclen TYA                 ; [2]   move index to .A
         CLC                 ; [2]   clear Carry
         ADC _TMPSRCE        ; [3]   ZP add temporary address lo-byte
         TAY                 ; [2]   stash lo-byte in .Y for later
         LDA _TMPSRCE+1      ; [3]   ZP get temporary address hi-byte
         ADC #$00            ; [2]   add Carry if lo-byte wrapped
         TAX                 ; [2]   move hi-byte to .X for subtract
         TYA                 ; [2]   get lo-byte back for subtract
         JSR sub16bit        ; [6]   calculate length of copied string
         LDA _BIN16H         ; [3]   ZP get subtraction result hi-byte
         STA _COPYLEN        ; [4]   save copy length hi-byte
         LDA _BIN16L         ; [3]   ZP get subtraction result lo-byte
         STA _COPYLEN+1      ; [4]   save copy length lo-byte
         RTS                 ; [6]

The beginning and end of the routine are (broadly) the same - copy the source and destination addresses to ZP for indirect addressing, then compute the length of what we copied from the original start address and the address the temporary pointer ended up at. The difference in the length calculation is important - we now add the value of .Y to the final address, because of the way the middle part works. Instead of incrementing the indirection addresses for every byte, we process the data in 256-byte chunks, stepping through each one with .Y, and only increment the address hi-bytes when .Y wraps round to zero. The routine is exactly the same size at 70 bytes, but crucially copies a 513-byte string in 9,682 cycles. That's 35.5% fewer cycles than the first version. Nice.
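Here's the same sort of rough Python model for the page-at-a-time version, under the same assumptions (64 KB bytearray, made-up function names, not the VIC++ code itself). add16_8 mirrors the TYA/CLC/ADC/TAY/LDA/ADC #$00 sequence at .calclen, adding the final .Y to the source pointer before the subtraction. As in the listing, only the target hi-byte is tested for the end-of-memory wrap.

```python
# Rough model of the page-at-a-time copystr (hypothetical names, not VIC++ code).
# Memory is a 64 KB bytearray; all addresses wrap at $FFFF like the 6502's.


def add16_8(addr: int, y: int) -> int:
    """Add an 8-bit index to a 16-bit address, carrying into the hi-byte."""
    lo = (addr & 0xFF) + y                      # TYA / CLC / ADC _TMPSRCE
    carry = lo >> 8                             # 1 if the lo-byte add overflowed
    hi = ((addr >> 8) + carry) & 0xFF           # LDA _TMPSRCE+1 / ADC #$00
    return (hi << 8) | (lo & 0xFF)


def copy_str_paged(mem: bytearray, src: int, dst: int) -> int:
    """Copy a zero-terminated string a page at a time; return its length."""
    tmp_src, tmp_dst, y = src, dst, 0
    while True:
        byte = mem[(tmp_src + y) & 0xFFFF]      # LDA (_TMPSRCE),Y
        mem[(tmp_dst + y) & 0xFFFF] = byte      # STA (_TMPDEST),Y
        if byte == 0:                           # BEQ .calclen
            break
        y = (y + 1) & 0xFF                      # INY (8-bit)
        if y != 0:                              # BNE .getbyte
            continue
        tmp_src = (tmp_src + 0x100) & 0xFFFF    # INC _TMPSRCE+1
        tmp_dst = (tmp_dst + 0x100) & 0xFFFF    # INC _TMPDEST+1
        if (tmp_dst >> 8) == 0:                 # target hi-byte wrapped: stop
            break
    end = add16_8(tmp_src, y)                   # address the copy stopped at
    return (end - src) & 0xFFFF                 # 16-bit difference = length


mem = bytearray(0x10000)
mem[0x1234:0x1234 + 513] = b"x" * 513           # the zero after it terminates it
print(copy_str_paged(mem, 0x1234, 0x40F0))      # 513
```

From the cycle annotations, the common path through the first loop costs 5+6+2+5+3+5+3 = 29 cycles per byte. The second loop costs 5+6+2+2+3 = 18, plus a cycle whenever LDA (_TMPSRCE),Y crosses a page. That is broadly in line with the measured difference.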

Sunday, 17 February 2013
