I've been wanting a way to profile bits of code for a while, so that I could judge the efficiency of one version of a code module against another in terms of the number of CPU cycles consumed, as well as bytes of storage needed. I manually count cycles as I'm coding, but sometimes it gets a bit fraught when there are loops to contend with that have flag dependencies - in the end, the only way to be certain is to run the code and tally cycles. I wrote a 6502 emulator in C# about four years back which did on-the-fly cycle counting as it executed instructions, but then an inopportune disk crash set me back a couple of versions and I lost heart in trying to reconstruct it - so I've been thinking of another way of generating the profiling data.
Just the other day, it occurred to me that I could use the so-far-untouched VIA #1 in the VIC-20 as a rudimentary profiler. The Timer A counter runs in step with the system clock, which is a feature we already make good use of by synchronising VIA #2 with the VIC raster counter to generate the extra display line at row 25 via IRQ. So it suddenly seemed obvious that if we used the timer on VIA #1 and set it to its' maximum value when we wanted to start counting cycles, and then read the value it had counted-down to when we wanted to stop profiling, then a simple subtraction of that value from the maximum would yield the number of cycles that it had counted - and thus, by further subtracting the number of cycles we know it takes to start and stop the counter, whatever is left must be the number of cycles that were executed by the CPU.
So we first need a routine to start the profile timer - it's absurdly simple, as we just have to set the Timer A countdown start value in order to set the timer running:
strtprof SUBROUTINE
PHP ; [3] push .SR to Stack
PHA ; [3] push .A to Stack
LDA #%01000000 ; [2] timer #1 free run, timer #2 clock Ø2, sr disabled, a/b latches disabled (#$40)
STA _V1ACR ; [4] set VIA #1 auxiliary control register
LDA #$FF ; [2] timer #1 frequency lo/hi-byte
STA _PROFTIML ; [3] ZP set profiler counter lo-byte
STA _PROFTIMH ; [3] ZP set profiler counter hi-byte
STA _V1T1LL ; [4] set VIA #1 timer #1 lo-byte latch
STA _V1T1LH ; [4] set VIA #1 timer #1 hi-byte latch
PLA ; [4] pull .A from Stack
PLP ; [4] pull .SR from Stack
RTS ; [6]
We also initialise a pair of Zero Page bytes which will hold the result when we stop counting. You might notice the Stack instructions to push and pull the Accumulator and Status Register as we enter and exit the routine - this just makes sure we don't disrupt whatever code is running when we start the profiler. Stopping the profiler has its own routine:
stopprof SUBROUTINE
PHP ; [3] push .SR to Stack
PHA ; [3] push .A to Stack
LDA #$FF ; [2] timer start value
SEC ; [2] set Carry
SBC _V1T1LL ; [4] subtract VIA #1 timer #1 lo-byte latch
SBC #$21 ; [2] subtract #33 for cycles used by profiler
STA _PROFTIML ; [3] ZP set profiler cycle count lo-byte
LDA #$FF ; [2] timer start value
SBC _V1T1LH ; [4] subtract VIA #1 timer #1 hi-byte latch
STA _PROFTIMH ; [3] ZP set profiler cycle count hi-byte
PLA ; [4] pull .A from Stack
PLP ; [4] pull .SR from Stack
RTS ; [6]
So this really just reads the current Timer A value and stashes it in the ZP bytes, after subtracting the number of cycles we expended in returning from strtprof and then in getting to the point in stopprof where we query the timer. Subtracting that from the timer start-value ($FFFF, or 65535) gives us the number of cycles that the CPU executed in between - and thus, the cycle-count of whatever code we executed between the calls to the profile start and stop routines. Once again we stash .A and.SR on the Stack as we enter, since we change both inside the routine and therefore want to restore them nicely when we leave.
It's a one-shot mechanism - if you invoke it multiple times, you only count from the most recent invocation and there's no accumulator facility - but it does the job. As an example, it's told me that invoking drawdrty with a dirty row at line 25 takes 11720 cycles to refresh - well within the IRQ frame time, which is exactly as I'd planned.
No comments:
Post a Comment