?Device Not Present: In Which Memory Testing Is Revisited And Improved

We're in real-time mode now - the first batch of posts was catching up on what I'd already done, so the post rate will drop slightly because I'll be writing these after I've completed some actual work on the code. However, I'm still anticipating at least two or three posts per week, because the ROM design is quite modular and so I can write about something new reasonably regularly. What I've just finished is a fairly major rewrite of the memory test logic, as I had a couple of ideas that I thought would improve the structure of the code as well as increase its speed somewhat. I also wanted to extend it so that it included the colour memory nybble area ($9400 - $97FF) in the test coverage, something that was completely missing from the first version (and from the Commodore ROM, by the way).

The first thing I did was redesign the way the core test logic worked. Version one essentially mimicked the original Commodore design (albeit in a more efficient way) by performing a 'discovery' process - that is, it started at the beginning of the memory map and worked its way up, testing every location until it got a failure and then figuring out whether that failure was critical (i.e. in the on-board 1K or 4K sections) or not (i.e. in one of the designated expansion areas). The bitmap in _EXPBITS then got bits knocked out as non-critical failure sections were identified, until we reached the upper limit at which no more RAM could possibly exist, at $C000. This was reasonable and certainly quicker than the original ROM, but lacked a proper address-line test and completely ignored the slightly peculiar colour RAM area.

The routine now works in a different way. I'd already created a table of page-progression values for the first version so that it knew how to skip to successive areas after a failure, so I modified this to become a more streamlined list of key pages which had to be tested (for address line verification) as well as representing area 'end-points'. The code now processes this 9-byte table, treating each adjacent pair of entries as the start and end page of a block and working from the top of the table downwards. It still does all the same things it did before, but now implicitly confirms that all the memory address lines are working as well as checking every individual byte. Part of the streamlining effort was also to remove the _EXPBITS bit-mask bytes which got EORed into the bitmap byte - we now start with an empty bitmap and, by a nifty bit of Carry flag manipulation and use of the ROL instruction, build the bitmap dynamically as each expansion area is checked: Carry is set for a block that passes and cleared for one that fails, then rotated in.
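For anyone who'd rather see the table walk in a high-level language first, here is a minimal Python sketch of the block loop and the ROL-based bitmap. It is a model rather than a translation: `block_has_ram` is a hypothetical callback standing in for the byte-by-byte test, and the bit layout it produces is simply what falls out of rotating one bit per expansion block in the order the listing visits them.

```python
# Key pages copied from _MEMTAB; each adjacent pair is (start page, end page).
MEMTAB = [0x01, 0x04, 0x10, 0x20, 0x40, 0x60, 0x80, 0xA0, 0xC0]
ONBOARD_ENDS = {0x04, 0x20}  # end pages of the on-board 1K and 4K blocks


def rol(value, carry):
    """8-bit rotate-left through carry, like the 6502 ROL instruction."""
    shifted = (value << 1) | carry
    return shifted & 0xFF, (shifted >> 8) & 1


def build_expbits(block_has_ram):
    """Walk the table from the top down and build the expansion bitmap.

    block_has_ram(start_page, end_page) -> bool is a hypothetical stand-in
    for the real per-byte test of that block.
    """
    expbits = 0
    x = 7
    while x >= 0:
        start, end = MEMTAB[x], MEMTAB[x + 1]
        ok = block_has_ram(start, end)
        if end in ONBOARD_ENDS:
            if not ok:
                raise RuntimeError("critical failure in on-board RAM")
        else:
            # SEC on a pass, CLC on a failure, then ROL into the bitmap.
            expbits, _ = rol(expbits, 1 if ok else 0)
            if end == 0xC0:
                x -= 1  # extra decrement: skip the $8000-$9FFF entry
        x -= 1
    return expbits


if __name__ == "__main__":
    fully_expanded = build_expbits(lambda start, end: True)
    print(f"fully expanded bitmap: %{fully_expanded:08b}")
```

With every block populated, five bits get rotated in ($A000 first, the 3K block last), so the $A000 block ends up in bit 4 and the 3K block in bit 0.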

As a result, the core of the routine is now easier to read, smaller, and faster - whilst actually doing more than the first version. That said, I still do the Zero Page test in a separate precursor loop, and the colour memory is also checked in a separate loop - but that's because it's actually a kilobyte of 4-bit nybbles rather than 8-bit bytes and has to be tested differently from the rest of RAM. Specifically, the maximum value each colour nybble can hold is 15 rather than 255, so the bit-toggle logic has to take that into account when storing and testing each location. Additionally, because the upper four data lines are 'floating', it's possible that when the read of the colour memory byte occurs, the upper nybble might have 'garbage' in it, so we have to mask that off before checking the returned value. The entire block of startup code now looks like this:

; memory test table
_MEMTAB
    DC.B $01,$04,$10,$20,$40,$60,$80,$A0,$C0
; 9 bytes

; CPU RESET handler - system initialisation
cpureset SUBROUTINE
    SEI                 ; [2]   disable CPU interrupts
    CLD                 ; [2]   clear decimal flag
    LDX #$FF            ; [2]
    TXS                 ; [2]   set stack pointer
    LDX #$01            ; [2]
    STX _SCRNCOL        ; [4]   set VIC register for white border
    DEX                 ; [2]   ZP memory test location index (.X = #$00)
    LDY #$FF            ; [2]

; zero-page RAM memory test
.nextzp
    TYA                 ; [2]   first test bit-pattern (.Y = %11111111)
.zptest
    STA $00,X           ; [4]   ZP store pattern at ZP location with X offset
    CMP $00,X           ; [4]   check it
    BNE .critfail       ; [2/3] store failed, game over
    ADC #$00            ; [2]   add carry flag for second test bit-pattern (%00000000)
    BEQ .zptest         ; [3/2] loop around to test next pattern
    INX                 ; [2]   increment location index
    BNE .nextzp         ; [3/2] loop around to test next location

; main RAM address line / byte memory test
    INY                 ; [2]   test address lo-byte index (.Y = $00)
    LDX #$07            ; [2]   memory table index
.newblock
    LDA _MEMTAB,X       ; [4]   get test address start page
    STA _TESTHI         ; [3]   ZP store test address start page (hi-byte)
.nextbyte
    LDA #$FF            ; [2]   first test bit-pattern (%11111111)
.bytetest
    STA (_TESTLO),Y     ; [6]   store pattern at indirect test address with .Y offset
    CMP (_TESTLO),Y     ; [5]   check it
    BEQ .testpass       ; [2/3] store worked
    LDA _MEMTAB+1,X     ; [4]   get test address end page
    CMP #$04            ; [2]   failure in 1K onboard?
    BEQ .critfail       ; [2/3] yep, critical failure
    CMP #$20            ; [2]   failure in 4K onboard?
    BEQ .critfail       ; [2/3] yep, critical failure
    CLC                 ; [2]   clear carry
    BCC .setbit         ; [3/3] set expansion bit
.critfail
    INC _SCRNCOL        ; [6]   red border (critical failure)
    BNE *               ; [3/3] spinloop (VICE spits a JAM exception if we use HLT)
.testpass
    ADC #$00            ; [2]   add carry flag for second test bit-pattern (%00000000)
    BEQ .bytetest       ; [3/2] loop around to test next pattern
    INY                 ; [2]   next test address on current page
    BNE .nextbyte       ; [3/2] loop back to first pattern
    LDA _MEMTAB+1,X     ; [4]   get test address end page
    INC _TESTHI         ; [5]   ZP increment memory test address page (hi-byte)
    CMP _TESTHI         ; [3]   ZP check for end of block
    BNE .nextbyte       ; [3/2] continue test at next page
    CMP #$04            ; [2]   1K onboard?
    BEQ .skipbit        ; [2/3] yep, no expansion bit
    CMP #$20            ; [2]   4K onboard?
    BEQ .skipbit        ; [2/3] yep, no expansion bit
    SEC                 ; [2]   set carry
.setbit
    ROL _EXPBITS        ; [6]   rotate carry into bitmap
    CMP #$C0            ; [2]   did we just test 8K BLK5?
    BNE .skipbit        ; [3/2] no, carry on
    DEX                 ; [2]   double-decrement for the $A000 special case
.skipbit
    DEX                 ; [2]   decrement memory table index for next block
    BPL .newblock       ; [3/2] loop back if not finished

; colour RAM memory test
    LDX #$94            ; [2]   colour memory start
    STX _TESTHI         ; [3]   ZP store test address start page (hi-byte)
    LDX #$98            ; [2]   colour memory end
.nextnybl
    LDA #$0F            ; [2]   first test bit-pattern (%00001111)
.nybltest
    STA _TESTDATA       ; [3]   ZP store pattern at test data location
    STA (_TESTLO),Y     ; [6]   store pattern at indirect test address with .Y offset
    LDA (_TESTLO),Y     ; [5]   get pattern
    AND #$0F            ; [2]   mask-off upper nybble
    CMP _TESTDATA       ; [3]   ZP compare with test data
    BNE .critfail       ; [3/2] store failed
    LDA #$00            ; [2]   second test bit-pattern (%00000000)
    CMP _TESTDATA       ; [3]   ZP compare with test data
    BNE .nybltest       ; [3/2] loop back for second test
    INY                 ; [2]   next test address on current page
    BNE .nextnybl       ; [3/2] loop back to first pattern
    INC _TESTHI         ; [5]   ZP increment memory test address page (hi-byte)
    CPX _TESTHI         ; [3]   ZP check for end of block
    BNE .nextnybl       ; [3/2] continue test at next page
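To show why the AND #$0F matters in the colour RAM loop, here is a small Python model of the nybble test. Everything in it is an assumption made for illustration: the `ColourRam` class, the `stuck_low_mask` fault and the way the floating upper nybble is faked (a fixed function of the address, standing in for whatever garbage the real data bus returns).

```python
class ColourRam:
    """Toy model of the 1K x 4-bit colour RAM at $9400-$97FF."""

    def __init__(self, stuck_low_mask=0x00):
        self.cells = [0] * 0x400
        self.stuck_low_mask = stuck_low_mask  # hypothetical fault: bits forced to 0

    def write(self, addr, value):
        # Only the low four bits are stored; faulty bits never latch a 1.
        self.cells[addr - 0x9400] = value & 0x0F & ~self.stuck_low_mask

    def read(self, addr):
        # Arbitrary stand-in for garbage on the floating upper data lines.
        floating = ((addr * 7) & 0x0F) << 4
        return floating | self.cells[addr - 0x9400]


def check_colour_ram(ram, mask=0x0F):
    """Store %1111 then %0000 in every nybble and compare the masked read-back."""
    for addr in range(0x9400, 0x9800):
        for pattern in (0x0F, 0x00):
            ram.write(addr, pattern)
            if ram.read(addr) & mask != pattern:
                return False
    return True


if __name__ == "__main__":
    print("healthy, masked:  ", check_colour_ram(ColourRam()))
    print("healthy, unmasked:", check_colour_ram(ColourRam(), mask=0xFF))
    print("bit 2 stuck low:  ", check_colour_ram(ColourRam(stuck_low_mask=0x04)))
```

In this model a healthy chip passes only when the upper nybble is masked off before the compare; without the mask the garbage bits make a good nybble look bad.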

The test process is still virtually instantaneous for an unexpanded configuration, and has dropped to under two seconds (by a whisker) for the fully-expanded setup where all 8K blocks - including the one at $A000 - plus the 3K block have RAM in them. I'm on the hunt for a relatively straightforward way to profile both the original ROM and VIC++ so that I can compare the cycle-counts for both - I know VIC++ is faster, but I'd quite like to see by how much in terms of actual CPU workload. I have an alpha-release 6502 emulator written in C# (coded about four years ago) which does accurate cycle-counting, so I might dig it out and let it chew both ROMs over when I get a moment.
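Until a proper profiler is to hand, a back-of-envelope estimate can be had by adding up the bracketed cycle counts for the main byte-test loop. The Python sketch below does that for the fully-expanded case. It is only an approximation: it ignores the zero-page and colour RAM loops, the per-page and per-block overhead and any page-crossing branch penalties, and the clock rates are the commonly quoted VIC-20 figures rather than anything taken from this post.

```python
# Cycle counts copied from the listing's annotations, for one byte that
# passes both patterns in the .nextbyte/.bytetest/.testpass loop.
INNER_LOOP = [
    ("LDA #$FF", 2),
    ("STA (_TESTLO),Y", 6),
    ("CMP (_TESTLO),Y", 5),
    ("BEQ .testpass (taken)", 3),
    ("ADC #$00", 2),
    ("BEQ .bytetest (taken)", 3),
    ("STA (_TESTLO),Y", 6),
    ("CMP (_TESTLO),Y", 5),
    ("BEQ .testpass (taken)", 3),
    ("ADC #$00", 2),
    ("BEQ .bytetest (not taken)", 2),
    ("INY", 2),
    ("BNE .nextbyte (taken)", 3),
]

MEMTAB = [0x01, 0x04, 0x10, 0x20, 0x40, 0x60, 0x80, 0xA0, 0xC0]
SKIPPED_INDEX = 6  # the $80-$A0 entry is never tested (the double DEX)

# Assumption: commonly quoted VIC-20 CPU clock rates in Hz.
CLOCK_HZ = {"PAL": 1_108_405, "NTSC": 1_022_727}


def cycles_per_byte():
    return sum(cycles for _, cycles in INNER_LOOP)


def pages_tested():
    return sum(MEMTAB[i + 1] - MEMTAB[i] for i in range(8) if i != SKIPPED_INDEX)


def estimate_seconds(clock_hz):
    return pages_tested() * 256 * cycles_per_byte() / clock_hz


if __name__ == "__main__":
    print(f"{cycles_per_byte()} cycles per byte over {pages_tested()} pages")
    for name, hz in CLOCK_HZ.items():
        print(f"{name}: about {estimate_seconds(hz):.2f} s")
```

Under those assumptions the main loop alone comes out at somewhere between 1.6 and 1.8 seconds depending on the clock, which is at least consistent with the under-two-seconds figure.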

Having got this out of my system (it's been bugging me for days) I can now return to the screen display task, where I'll be experimenting with a few ideas for making 40-column mode work the way I want it to.

Monday, 12 March 2012
