Wednesday, 29 February 2012

In Which We Discuss RAMTAS And Make A Start


During VIC-20 initialisation, the Commodore ROM executes a moderately labyrinthine chunk of code known as RAMTAS - Random Access Memory Test And Set. This routine has the following goals:
  1. Test almost all memory present in the system (except the first 1K, and the 8K block at $A000)
  2. Initialise Pages 0, 2 and 3 to zero
  3. Find the end of contiguous RAM
  4. Set variables according to which RAM expansion(s) is/are present
You'll have spotted some obvious peculiarities with these goals, no doubt - RAMTAS doesn't test Pages 0-3 of RAM, despite Page 0 being critical to CPU Zero Page addressing, Page 1 being the CPU Stack, and Pages 2 and 3 being used for a variety of KERNAL and BASIC variables and other storage. It only initialises Pages 0, 2 and 3 to zero, ignoring Page 1 again and all other memory - which could present problems because the nature of these memory chips means they might have random junk in them at power-up; and although (most of) the rest of memory is tested, the test value is left behind afterwards instead of clearing each byte.

Furthermore, the OS can only manage contiguous memory, so if there's a 'hole' anywhere above $2000 where memory isn't present, or where something else exists like the 4K Character Generator ROM at $8000, then anything above that is ignored. Look at this diagram showing how Commodore lay out the VIC-20 memory map - note that the 8K block at $A000 is labelled 'Expansion ROM', which is misleading because you can actually put RAM in there, but due to the 4K Character Generator ROM and VIC/VIA chips mapping into $8000-$9FFF this RAM can't be used directly by the Commodore OS (you have to write code specifically to use it).

As a last kick in the teeth, the Commodore design can't handle both the 3K block at $0400 and the 8K block(s) from $2000 being populated simultaneously - if the 8K block has memory in it, the 3K block is 'lost'. This is also specifically due to the need to have contiguous memory and because the OS can't work around the existence of Screen RAM in the middle of the memory area...

RAMTAS is also a bit slow. We'll compare timings both in CPU cycles and wall-clock time later, but you can get a feel for how long things take just by firing up VICE with a variety of memory configurations and measuring how long it takes to go from a blank screen to seeing the 'Ready' prompt. Now admittedly the BASIC initialisation code takes a finite amount of time after RAMTAS and the other initialisation routines have finished, but the lion's share of the 'boot' delay is due to what RAMTAS is doing, and how it's doing it. It's not bad code by any means, but it was probably written with the aim of minimising ROM space occupation rather than the test duration - it's quite compact, if a little spaghetti-like in places, but not as quick as it could be.

The other thing RAMTAS is a bit light on is error reporting - in fact, it hasn't got any. The two chunks of RAM that are critical for VIC-20 operations are the 1K and 4K built-in blocks running from $0000-$03FF and $1000-$1FFF. RAMTAS doesn't test the first 1K at all, and drops into an unhelpful spinloop if a test fails in the higher 4K, neither of which is at all useful if you have a flaky or dead RAM chip in those ranges. If the first 1K has a problem, you won't know about it until the machine starts behaving oddly as various bytes stop working properly; and if anything in the 4K block is dodgy, the machine just hangs during boot and offers no clue as to what the problem is (you can observe the exact same 'symptom' if any one of the CPU, VIC, VIAs, RAM or PSU is failing, amongst other things).

So let's start by defining our new RAM test-and-set goals:
  1. Be as small and fast as possible
  2. Test all memory, reporting any critical error (i.e. in the 1K and 4K blocks)
  3. Initialise all memory to zero
  4. Map which memory blocks have RAM in them
  5. Define system variables such that the new OS can use as much memory as is present
That's a good set of objectives, so let's begin by looking at the very first few instructions of the new VIC++ ROM:

.reset
    SEI                 ; [2]   disable interrupts
    CLD                 ; [2]   clear decimal flag
    LDX #$FF            ; [2]
    TXS                 ; [2]   set stack pointer
    LDA #$05            ; [2]
    STA .SCRNCOL        ; [4]   set VIC register for green border
    INX                 ; [2]   ZP memory test location index (X = #$00)
; 11 bytes 16 cycles

Yep, there it is - the first 11 bytes of spanking-new code. It's not especially exciting, but there are a couple of things you'll notice are different from the original: firstly, we turn off interrupts and decimal mode and initialise the Stack Pointer right away (always good practice); secondly, we set the screen border to green so the user knows the CPU and VIC chips are working - we're already making sure there's some feedback coming from the system for diagnostic purposes. The X register is then set up for the next bit of code.

I'm always striving for the fastest and smallest 6502 code I can get, which is why I put a comment after the code block tallying what I used - those numbers in square brackets on each line state the cycle count for the instruction. Oh and that .SCRNCOL label is defined in a separate file as an EQU (equate) which references $900F, the VIC screen-colour register - until we tell the VIC what to display, it just draws border-colour over the whole display, so we get a green screen.
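The tally in that trailing comment is easy to check mechanically. Here is a minimal Python sketch, used purely as a calculator rather than as part of the ROM; the table name and helper are hypothetical, and the sizes and cycle counts are the standard 6502 figures for these instructions, assuming STA uses absolute addressing to reach $900F.

```python
# Hypothetical table: (instruction, size in bytes, cycles) for the .reset block.
RESET_BLOCK = [
    ("SEI",       1, 2),
    ("CLD",       1, 2),
    ("LDX #$FF",  2, 2),
    ("TXS",       1, 2),
    ("LDA #$05",  2, 2),
    ("STA $900F", 3, 4),   # absolute addressing (assumed for .SCRNCOL)
    ("INX",       1, 2),
]


def tally(block):
    """Return (total bytes, total cycles) for a straight-line block."""
    total_bytes = sum(size for _, size, _ in block)
    total_cycles = sum(cycles for _, _, cycles in block)
    return total_bytes, total_cycles


print(tally(RESET_BLOCK))  # (11, 16)
```

This only works as a simple sum because the block has no branches or loops; anything with a loop needs the iteration count factored in.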

Wanna see the next bit? Oh go on then:

.nextzp
    LDA #$FF            ; [2]    first test bit-pattern (%11111111)
.zptest
    STA $00,X           ; [4]    ZP store pattern at ZP location with X offset
    CMP $00,X           ; [4]    check it
    BNE .critfail       ; [2/3]  store failed, game over
    ADC #$00            ; [2]    add 1 in carry flag for second test bit-pattern (%00000000)
    BEQ .zptest         ; [3/2]  loop around to test next pattern
    INX                 ; [2]    increment location index
    BNE .nextzp         ; [3/2]  loop around to test next location
; 15 bytes 9215 cycles

What this does is test-and-set Zero Page (Page 0) in a loop of its own. We're going to test all the rest of the memory in the system using just one other loop, but we need to use a couple of Zero Page (ZP) locations for that, so we have to make sure they're working before we do anything else. And since we're testing a couple of ZP locations in a loop, we might as well extend the loop and test the whole 256 bytes - we're using the fast ZP-addressing-mode to do the set and test, so it's as efficient as it can be.

The original Commodore ROM uses a reduced variant of the standard Walking Bit Pattern test, where each byte is alternately loaded with binary 10101010 and 01010101 in order to check that each bit doesn't have any 'stickiness' (i.e. can't switch from '1' to '0' or vice-versa). But I'm using a slightly more elegant technique in which the value 11111111 is loaded first, and then replaced with 00000000 - which achieves the same result in terms of checking individual addresses for bit-stickiness, and gives us a reasonable (if simple) test of the byte's ability to hold values whilst simultaneously finishing with it initialised to zero.
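To see why a single ADC #$00 is enough to step from the first pattern to the second and then out of the loop, here is a small Python model of the test on one byte. It is only a sketch: the FaultyRam class, its stuck_high/stuck_low masks and the check_byte name are all invented for illustration, and the model relies on the 6502 behaviour that a CMP which finds the values equal leaves the carry flag set.

```python
class FaultyRam:
    """Hypothetical RAM model: bits in stuck_high always read 1,
    bits in stuck_low always read 0."""

    def __init__(self, size, stuck_high=0x00, stuck_low=0x00):
        self.cells = [0xA5] * size          # arbitrary power-up junk
        self.stuck_high = stuck_high
        self.stuck_low = stuck_low

    def __setitem__(self, addr, value):
        self.cells[addr] = ((value | self.stuck_high) & ~self.stuck_low) & 0xFF

    def __getitem__(self, addr):
        return self.cells[addr]


def check_byte(ram, addr):
    """Model of the $FF-then-$00 test. Returns True if both patterns stick."""
    a = 0xFF                                # LDA #$FF
    while True:
        ram[addr] = a                       # STA
        if ram[addr] != a:                  # CMP / BNE: read-back differs
            return False
        carry = 1                           # a matching CMP leaves carry set
        a = (a + 0x00 + carry) & 0xFF       # ADC #$00: $FF -> $00, then $00 -> $01
        if a != 0x00:                       # BEQ only loops back while A is zero
            return True


good = FaultyRam(256)
print(check_byte(good, 0x10), good[0x10])                  # True 0
print(check_byte(FaultyRam(256, stuck_high=0x08), 0x10))   # False
print(check_byte(FaultyRam(256, stuck_low=0x80), 0x10))    # False
```

A bit stuck low is caught on the $FF pass and a bit stuck high on the $00 pass, and a byte that passes is left holding zero.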

You might be thinking that more than 9,000 cycles to test just the first page (256 bytes) of memory seems a lot, but it's not really; on a 1MHz 6502, that equates to a fraction under a 100th of a second and is really about the fastest way to do it - as you'll see in a moment, using ZP addressing makes quite a difference compared to the indirect addressing mode used for all the rest of the installed memory. Let's see:
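For anyone who wants to check the 9,215 figure, the arithmetic can be written out as a short Python sketch. The function name and the nominal 1MHz constant are assumptions for illustration, and the count assumes no branch crosses a page boundary (which would add a cycle).

```python
CPU_HZ = 1_000_000  # nominal 1MHz, as in the text; a real VIC-20 clock is slightly faster


def zp_test_cycles(locations=256):
    """Cycle count for the ZP test loop, from the bracketed per-instruction counts."""
    first_pass = 4 + 4 + 2 + 2 + 3     # STA, CMP, BNE (not taken), ADC, BEQ (taken)
    second_pass = 4 + 4 + 2 + 2 + 2    # same again, but BEQ falls through
    per_location = 2 + first_pass + second_pass + 2 + 3   # LDA, both passes, INX, BNE (taken)
    return locations * per_location - 1   # the final BNE is not taken: one cycle less


cycles = zp_test_cycles()
print(cycles)                      # 9215
print(cycles / CPU_HZ * 1000)      # roughly 9.2 (milliseconds)
```

Each location costs 36 cycles, so the whole page comes in comfortably under 10 milliseconds at 1MHz.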

    LDA #$1F            ; [2]
    STA .EXPBITS        ; [3]    ZP set expansion RAM bitmap (%00011111)
    LDY #$00            ; [2]    memory test location lo-byte index
; 6 bytes 7 cycles

.nextpage
    INC .TESTHI         ; [5]    increment memory test address (first address is $0100)
.nextloc
    LDA #$FF            ; [2]    first test bit-pattern (%11111111)
.loctest
    STA (.TESTLO),Y     ; [6]    store pattern at indirect test address with Y offset
    CMP (.TESTLO),Y     ; [5]    check it
    BNE .testfail       ; [2/3]  store failed (loop exit, always fails at $8000 and $C000)
    ADC #$00            ; [2]    add 1 in carry flag for second test bit-pattern (%00000000)
    BEQ .loctest        ; [3/2]  loop around to test next pattern
    INY                 ; [2]    next location on current page
    BNE .nextloc        ; [3/2]  loop back to first pattern
    BEQ .nextpage       ; [3/3]  resume testing at next page
; 19 bytes, 10759 cycles per page

That first little 6-byte block sets up something new - an expansion RAM bitmap that indicates which memory expansion blocks have actually got RAM in them. It contains 5 bits, arranged from right to left, where each bit represents an expansion block. There are 5 blocks, beginning with the 3K block at $0400, running through the three 8K blocks at $2000, $4000 and $6000, and finishing with the extra block most commonly filled with cartridge ROM (but able to accept RAM as well) at $A000:

[ unused / unused / unused / $A000 / $6000 / $4000 / $2000 / $0400 ]

We begin by assuming all expansion blocks have RAM in them and set the bitmap value to $1F, which corresponds to a bit-pattern of '00011111'. As we chug through the memory map doing the tests, this bitmap has bits switched off when a block of memory contains a byte which fails the test. This might be because the RAM has a fault, or, more likely, because there isn't RAM in the block (by definition, if the block is empty it can't have any addresses with bytes which can pass the test).
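Here is a rough Python sketch of how such a bitmap can be read and updated. The bit assignments and block sizes come from the layout above; the table and function names are hypothetical, and how the ROM itself clears a bit is not shown here, so clear_block is just the obvious mask operation.

```python
# (bit number, base address, size in KB), following the bitmap layout above
EXPANSION_BLOCKS = [
    (0, 0x0400, 3),
    (1, 0x2000, 8),
    (2, 0x4000, 8),
    (3, 0x6000, 8),
    (4, 0xA000, 8),
]


def clear_block(bitmap, bit):
    """Switch off one block's bit, as would happen after a failed test."""
    return bitmap & ~(1 << bit) & 0xFF


def populated(bitmap):
    """List (base address, size in KB) for every block whose bit is still set."""
    return [(base, kb) for bit, base, kb in EXPANSION_BLOCKS if bitmap & (1 << bit)]


def total_kb(bitmap):
    return sum(kb for _, kb in populated(bitmap))


bitmap = 0x1F                              # assume everything is present
print(total_kb(bitmap))                    # 35
bitmap = clear_block(bitmap, 0)            # e.g. nothing fitted in the 3K block
print(f"{bitmap:08b}", total_kb(bitmap))   # 00011110 32
```

With all five bits set the blocks add up to 35K of expansion RAM; clearing a bit simply removes that block's contribution.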

The Y register is set to zero because it's the main index variable used in the test loop - we use a 2-byte ZP pointer at .TESTLO to hold the current memory address under test, but actually never increment the low-byte because we use the Indirect Indexed addressing mode where Y is an offset from the base address. So, having tested ZP and left all bytes there with a zero in them, we increment the .TESTHI byte of the test address so it holds $0100 and then commence looping through all addresses from there to $01FF in a similar vein to the first ZP test loop. There are two key differences though - the first being that if we get a failure it's not necessarily a disaster - we anticipate the probability that some addresses will fail because they don't contain RAM, and in fact we know that the tests must fail at $8000 and $C000 because that's where the Character Generator and OS ROMs live.

The second difference is that we aren't only testing one page in this loop, so when Y wraps from $FF to $00 we know we've finished the current page and can therefore increment .TESTHI and start again from zero. After testing $01FF, we restart the loop at $0200 and continue doing this until we reach the end of testable memory - the .testfail routine, as you'll see, knows that when we fail at $C000 there won't be any more RAM in the system after that, and does not return to the main test loop here.
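The order in which addresses get visited can be modelled in a few lines of Python. This is only a sketch of the happy path: the generator name and its page-range parameters are made up for illustration, and the exit through .testfail is not modelled at all.

```python
def scan_addresses(first_page, last_page):
    """Yield addresses in the order the loop would visit them, ignoring failures."""
    test_lo = 0x00                                       # low byte of the pointer never changes
    for test_hi in range(first_page, last_page + 1):     # INC .TESTHI once per page
        base = (test_hi << 8) | test_lo
        for y in range(256):                             # INY until Y wraps to zero
            yield base + y                               # effective address of (.TESTLO),Y


addrs = list(scan_addresses(0x01, 0x02))
print(hex(addrs[0]), hex(addrs[255]), hex(addrs[256]))   # 0x100 0x1ff 0x200
```

Because the pointer's low byte stays at zero, Y alone selects the byte within the page and no indexed access ever crosses a page boundary.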

This loop takes a minimum of 10,759 cycles to test each page, unless a failure occurs - in which case some more cycles will be consumed in the .testfail routine while we figure out whether the failed byte is critical or not, and which bit in the expansion bitmap should be turned off. Remember what I said about ZP addressing being faster? We test all 256 ZP bytes in just over 9,000 cycles, but every other 256-byte page takes close to 11,000 - this is why ZP addressing is prized as a 6502 programming technique.
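The per-page figure can be reproduced from the bracketed cycle counts with a short Python sketch. The function name is hypothetical, the 9,215 constant is the ZP loop total quoted earlier, and the count assumes no failure and no branch crossing a page boundary.

```python
ZP_CYCLES = 9215   # total for the ZP test loop, as quoted in its listing


def page_test_cycles():
    """Cycles to test one full page with the indirect indexed loop."""
    first_pass = 6 + 5 + 2 + 2 + 3     # STA (zp),Y, CMP (zp),Y, BNE (not taken), ADC, BEQ (taken)
    second_pass = 6 + 5 + 2 + 2 + 2    # same again, but BEQ falls through
    per_location = 2 + first_pass + second_pass + 2 + 3   # LDA, both passes, INY, BNE (taken)
    inner = 256 * per_location - 1     # the final BNE is not taken
    return 5 + inner + 3               # INC .TESTHI before, BEQ .nextpage after


page = page_test_cycles()
print(page)               # 10759
print(page - ZP_CYCLES)   # 1544
```

The indirect indexed store and compare cost 3 extra cycles per pass, which is where nearly all of the extra 1,544 cycles per page comes from.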

Having said that, even though this routine actually does a little more than the original Commodore code, it's still notably faster - a simple wall-clock test in which I start VICE with all blocks completely unpopulated takes about half a second with the Commodore ROM, and is virtually instantaneous under VIC++. With all blocks populated, it takes approximately 3 seconds under the Commodore ROM, and a shade under two with VIC++ (and, of course, the VIC++ expansion bitmap says there's 35K of additional RAM available, whereas the Commodore OS says there's only 24K).

All we really have to do now is handle the failure scenarios.
