We're in real-time mode now. The first batch of posts was catching up on what I'd already done, so the post rate will drop slightly from here on because I'll be writing these after I've completed some actual work on the code. However, I'm still anticipating at least two or three posts per week, because the ROM design is quite modular and so I can write about something new reasonably regularly. What I've just finished is a fairly major rewrite of the memory test logic, as I had a couple of ideas that I thought would improve the structure of the code as well as increase its speed somewhat. I also wanted to extend it to include the colour memory nybble area ($9400-$97FF) in the test coverage, something that was completely missing from the first version (and from the Commodore ROM, by the way).
The first thing I did was redesign the way the core test logic worked. Version one essentially mimicked the original Commodore design (albeit in a more efficient way) by performing a 'discovery' process - that is, it started at the beginning of the memory map and worked its way up, testing every location until it got a failure and then figuring out whether that failure was critical (i.e. in the on-board 1K or 4K sections) or not (i.e. in one of the designated expansion areas). The bitmap in _EXPBITS then had bits knocked out as non-critical failure sections were identified, until we reached $C000, the upper limit beyond which no more RAM could possibly exist. This was reasonable and certainly quicker than the original ROM, but it lacked a proper address-line test and completely ignored the slightly peculiar colour RAM area.
The routine now works in a different way. I'd already created a table of page-progression values for the first version so that it knew how to skip to successive areas after a failure, so I modified this to become a more streamlined list of key pages which have to be tested (for address line verification) and which also mark the area 'end-points'. The code now processes this 9-byte table, still doing all the same things it did before, but now implicitly confirming that all the memory address lines are working as well as checking every individual byte. Part of the streamlining effort was also to remove the _EXPBITS bit-mask bytes which got EORed into the bitmap byte - we now start with an empty bitmap and, by a nifty bit of Carry flag manipulation and use of the ROL instruction, build the bitmap dynamically as each expansion area is checked.
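To make the table walk and the Carry/ROL trick easier to follow, here is a small Python model of the control flow. It is only a sketch and not part of the ROM: `page_ok` is a hypothetical callback standing in for the real byte-by-byte test, `CriticalFailure` stands in for the red-border spin loop, and the bit layout it produces is simply what falls out of reading the listing top-down.

```python
# Python model of the table walk; NOT the ROM code, just its control flow.
MEMTAB = [0x01, 0x04, 0x10, 0x20, 0x40, 0x60, 0x80, 0xA0, 0xC0]
ONBOARD_END_PAGES = (0x04, 0x20)   # end pages of the on-board 1K and 4K areas


class CriticalFailure(Exception):
    """Stands in for the red-border spin loop."""


def build_expbits(page_ok):
    """Walk the table from the top down and return the expansion bitmap.

    page_ok(page) is a hypothetical callback that replaces the real
    byte-by-byte test: it returns True if that 256-byte page holds RAM.
    """
    expbits = 0                      # the bitmap starts out empty
    x = 7                            # index of the last (start, end) pair
    while x >= 0:
        start, end = MEMTAB[x], MEMTAB[x + 1]
        passed = all(page_ok(page) for page in range(start, end))
        if end in ONBOARD_END_PAGES:
            if not passed:
                raise CriticalFailure(f"on-board RAM below page ${end:02X}")
        else:
            carry = 1 if passed else 0                  # SEC on pass, CLC on fail
            expbits = ((expbits << 1) | carry) & 0xFF   # ROL _EXPBITS
            if end == 0xC0:
                x -= 1               # extra step skips the $80-$A0 pair
        x -= 1
    return expbits


def unexpanded(page):
    return page < 0x04 or 0x10 <= page < 0x20


def fully_expanded(page):
    return page < 0x80 or 0xA0 <= page < 0xC0


print(f"{build_expbits(unexpanded):05b}")       # 00000
print(f"{build_expbits(fully_expanded):05b}")   # 11111
```

Because the blocks are visited from the top of memory downwards, this model leaves the 3K block in bit 0, the 8K blocks at $2000, $4000 and $6000 in bits 1 to 3, and the $A000 block in bit 4.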
As a result, the core of the routine is now easier to read, smaller, and faster - whilst actually doing more than the first version. That said, I still do the Zero Page test in a separate precursor loop, and the colour memory is also checked in a separate loop - but that's because it's actually a kilobyte of 4-bit nybbles rather than 8-bit bytes and has to be tested differently from the rest of RAM. Specifically, the maximum value each colour nybble can hold is 15 rather than 255, so the bit-toggle logic has to take that into account when storing and testing each location. Additionally, because the upper four data lines are 'floating', it's possible that when the read of the colour memory byte occurs, the upper nybble might have 'garbage' in it, so we have to mask that off before checking the returned value. The entire block of startup code now looks like this:
; memory test table
_MEMTAB
DC.B $01,$04,$10,$20,$40,$60,$80,$A0,$C0
; 9 bytes
; CPU RESET handler - system initialisation
cpureset SUBROUTINE
SEI ; [2] disable CPU interrupts
CLD ; [2] clear decimal flag
LDX #$FF ; [2]
TXS ; [2] set stack pointer
LDX #$01 ; [2]
STX _SCRNCOL ; [4] set VIC register for white border
DEX ; [2] ZP memory test location index (.X = #$00)
LDY #$FF ; [2]
; zero-page RAM memory test
.nextzp
TYA ; [2] first test bit-pattern (.Y = %11111111)
.zptest
STA $00,X ; [4] ZP store pattern at ZP location with X offset
CMP $00,X ; [4] check it
BNE .critfail ; [2/3] store failed, game over
ADC #$00 ; [2] add carry flag for second test bit-pattern (%00000000)
BEQ .zptest ; [3/2] loop around to test next pattern
INX ; [2] increment location index
BNE .nextzp ; [3/2] loop around to test next location
; main RAM address line / byte memory test
INY ; [2] test address lo-byte index (.Y = $00)
LDX #$07 ; [2] memory table index
.newblock
LDA _MEMTAB,X ; [4] get test address start page
STA _TESTHI ; [3] ZP store test address start page (hi-byte)
.nextbyte
LDA #$FF ; [2] first test bit-pattern (%11111111)
.bytetest
STA (_TESTLO),Y ; [6] store pattern at indirect test address with .Y offset
CMP (_TESTLO),Y ; [5] check it
BEQ .testpass ; [2/3] store worked
LDA _MEMTAB+1,X ; [4] get test address end page
CMP #$04 ; [2] failure in 1K onboard?
BEQ .critfail ; [2/3] yep, critical failure
CMP #$20 ; [2] failure in 4K onboard?
BEQ .critfail ; [2/3] yep, critical failure
CLC ; [2] clear carry
BCC .setbit ; [3/3] set expansion bit
.critfail
INC _SCRNCOL ; [6] red border (critical failure)
BNE * ; [3/3] spinloop (VICE spits a JAM exception if we use HLT)
.testpass
ADC #$00 ; [2] add carry flag for second test bit-pattern (%00000000)
BEQ .bytetest ; [3/2] loop around to test next pattern
INY ; [2] next test address on current page
BNE .nextbyte ; [3/2] loop back to first pattern
LDA _MEMTAB+1,X ; [4] get test address end page
INC _TESTHI ; [5] ZP increment memory test address page (hi-byte)
CMP _TESTHI ; [3] ZP check for end of block
BNE .nextbyte ; [3/2] continue test at next page
CMP #$04 ; [2] 1K onboard?
BEQ .skipbit ; [2/3] yep, no expansion bit
CMP #$20 ; [2] 4K onboard?
BEQ .skipbit ; [2/3] yep, no expansion bit
SEC ; [2] set carry
.setbit
ROL _EXPBITS ; [6] rotate carry into bitmap
CMP #$C0 ; [2] did we just test 8K BLK5?
BNE .skipbit ; [3/2] no, carry on
DEX ; [2] double-decrement for the $A000 special case
.skipbit
DEX ; [2] decrement memory table index for next block
BPL .newblock ; [3/2] loop back if not finished
; colour RAM memory test
LDX #$94 ; [2] colour memory start
STX _TESTHI ; [3] ZP store test address start page (hi-byte)
LDX #$98 ; [2] colour memory end
.nextnybl
LDA #$0F ; [2] first test bit-pattern (%00001111)
.nybltest
STA _TESTDATA ; [3] ZP store pattern at test data location
STA (_TESTLO),Y ; [6] store pattern at indirect test address with .Y offset
LDA (_TESTLO),Y ; [5] get pattern
AND #$0F ; [2] mask-off upper nybble
CMP _TESTDATA ; [3] ZP compare with test data
BNE .critfail ; [3/2] store failed
LDA #$00 ; [2] second test bit-pattern (%00000000)
CMP _TESTDATA ; [3] ZP compare with test data
BNE .nybltest ; [3/2] loop back for second test
INY ; [2] next test address on current page
BNE .nextnybl ; [3/2] loop back to first pattern
INC _TESTHI ; [5] ZP increment memory test address page (hi-byte)
CPX _TESTHI ; [3] ZP check for end of block
BNE .nextnybl ; [3/2] continue test at next page
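The reason for the AND #$0F in the colour RAM loop is perhaps easiest to see in a toy model. The Python sketch below is illustrative only: the `ColourRam` class and its `floating` parameter are assumptions invented for the example (real floating lines would not return a fixed value), but it shows why an unmasked compare can fail even when every nybble is perfectly good.

```python
class ColourRam:
    """Hypothetical model of the 1K x 4-bit colour RAM at $9400-$97FF."""

    BASE, SIZE = 0x9400, 0x0400

    def __init__(self, floating=0xA0):
        self.cells = [0] * self.SIZE
        self.floating = floating & 0xF0   # assumed junk on the upper data lines

    def write(self, addr, value):
        self.cells[addr - self.BASE] = value & 0x0F   # only four bits are kept

    def read(self, addr):
        return self.floating | self.cells[addr - self.BASE]


def colour_ram_ok(ram, mask=0x0F):
    """Store $0F then $00 in every nybble, masking each read before comparing."""
    for addr in range(ColourRam.BASE, ColourRam.BASE + ColourRam.SIZE):
        for pattern in (0x0F, 0x00):
            ram.write(addr, pattern)
            if (ram.read(addr) & mask) != pattern:
                return False
    return True


print(colour_ram_ok(ColourRam()))              # True
print(colour_ram_ok(ColourRam(), mask=0xFF))   # False
```

With the mask in place the test only looks at the four bits the chip actually stores, so whatever happens to be on the upper data lines cannot cause a false critical failure.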
The test process is still virtually instantaneous for an unexpanded configuration, and has dropped to under two seconds (by a whisker) for the fully-expanded setup where all 8K blocks - including the one at $A000 - plus the 3K block have RAM in them. I'm on the hunt for a relatively straightforward way to profile both the original ROM and VIC++ so that I can compare the cycle counts for both - I know VIC++ is faster, but I'd quite like to see by how much in terms of actual CPU workload. I have an alpha-release 6502 emulator written in C# (coded about four years ago) which does accurate cycle counting, so I might dig it out and let it chew both ROMs over when I get a moment.
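Until a proper profile exists, a back-of-envelope figure can be had from the cycle annotations in the listing. The Python sketch below adds up the passing path of the main byte loop only, so it is an estimate rather than a measurement: it ignores the zero-page and colour RAM loops, the per-page overhead and any branch page-crossing penalties, and the two clock frequencies are approximate values assumed for NTSC and PAL machines.

```python
# Cycle counts are taken from the annotations in the listing (passing path).
FIRST_PATTERN = 6 + 5 + 3 + 2 + 3    # STA, CMP, BEQ taken, ADC, BEQ taken
SECOND_PATTERN = 6 + 5 + 3 + 2 + 2   # same, but the final BEQ falls through
PER_BYTE = 2 + FIRST_PATTERN + SECOND_PATTERN + 2 + 3   # LDA ... INY, BNE

# (start page, end page) pairs tested on a fully-expanded machine
BLOCKS = [(0x01, 0x04), (0x04, 0x10), (0x10, 0x20), (0x20, 0x40),
          (0x40, 0x60), (0x60, 0x80), (0xA0, 0xC0)]
PAGES = sum(end - start for start, end in BLOCKS)
CYCLES = PAGES * 256 * PER_BYTE

# Assumed approximate CPU clocks in Hz
CLOCKS = {"NTSC": 1_022_727, "PAL": 1_108_405}

print(PER_BYTE, PAGES, CYCLES)        # 44 159 1790976
for name, hz in CLOCKS.items():
    print(f"{name}: {CYCLES / hz:.2f} s")
```

That works out at roughly 1.75 s (NTSC) or 1.62 s (PAL) for the main loop alone, which is at least in the same ballpark as the observed "just under two seconds".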
Having got this out of my system (it's been bugging me for days) I can now return to the screen display task, where I'll be experimenting with a few ideas for making 40-column mode work the way I want it to.