So I'm tinkering with some little snippets of code, playing with a couple of ideas and trying to gauge which is faster and/or smaller - I've got 'fast' with a 50-byte lookup table, 'faster' with a 25-byte table (but bigger code) and 'really fast', but with a table I'm not entirely sure about yet and actually I'm not completely certain that the routine works properly anyway... There's some breakpoint-setting and tracing going-on, and a good bit of playing with the profiler routine I wrote ages ago to assess cycle-counts, and it was whilst I was faffing-around in the XVIC monitor measuring stuff that I noticed a New Cool Thing that arrived in VICE 2.4 - the Stopwatch.
Now this is very, very cool. Way back in, ooh, 2009 or 2010 I think, I dropped a line to some email address or other relating to the VICE team, saying that it would be really useful if there was some way for the emulator to report elapsed cycles in the monitor as code was executing, as well as the scanline counter it already showed. I never got a response, but then it's entirely possible that I emailed an address that had been disused since Babbage gave up on the Difference Engine, so I didn't really give it another thought. And I doubt very much that it was my plaintive little plea that has resulted, four years later, in this cool new feature appearing in the emulator, but I don't care - it's there!
Now this is a good thing, and a bad thing. It's good because as of this release, VICE tells you exactly how many cycles any given piece of code has consumed during its execution - either since the emulator started, or since you last reset the Stopwatch (e.g. at a breakpoint). Actually, I can't stress this enough: it's not just a good thing, it's a great thing. But it's also a bad thing, because at a single stroke it renders my carefully-crafted VIA profiler code utterly redundant. Well, maybe not entirely, because it'll doubtless come in handy sometime when I'm running code on a machine outside of the emulation environment - but as of right now it is entirely superfluous. Which, thinking about it, is also a good thing, because that means I can rip the profiler logic out of the build and also release the two ZP bytes it uses. Hurrah!
Remember back in, maybe, Post #3 or thereabouts I talked about comparing my RAMTAS times to the original Commodore ROM timings? No excuse not to now:
- Commodore ROM - RAMTAS entry point $FD8D: tests RAM from $0400 to $7FFF, initialises pages 0, 2, and 3 to zero, tracks the end of contiguous memory (i.e. the first empty block after $2000), ignores BLK5 ($A000) if it's RAM. Total cycles if RAM fully populated: 2,386,249
- VIC++ ROM - RAMTAS entry point $C005: tests all memory from $0000 to $BFFF, initialises all memory to zero, tracks populated memory zones (doesn't care if there are 'holes'), includes BLK5 if populated with RAM, tests Colour RAM. Total cycles if RAM fully populated: 1,868,208
That's 'new toy' number one; the second is also something I stumbled across completely by chance over the last weekend, during a spot of rummaging around on the 'net when I was looking for the opcode for LAX Immediate. I have no recollection of the chain of sites I went through, but I remember browsing a thread on AtariAge which mentioned in passing something called the WUDSN IDE. Curious, I searched for it and discovered that it's a really neat plug-in for the Eclipse IDE which gives you a 6502-compatible set of tools, syntax formatting and colouring, and hooks to just about every cross-assembler that exists on the PC - originally designed for the modern Atari 2600 Developer community, it has been extended to support a few other environments including the C64 (and, by extension, the VIC-20). Incidentally, I also found a potentially good-looking plug-in for Visual Studio 2010 called Vintage Studio - it purports to do much the same, but it doesn't seem quite as polished as WUDSN and appeared to need a fair bit of work to actually get it up-and-running - so even though VS2010 is my IDE of choice and I've never used Eclipse, I decided to see whether WUDSN was as good as the YouTube videos implied. If anyone has managed to get Vintage Studio installed and working, please do let me know.
Half-an-hour later I'd installed Eclipse and the Java runtime (not especially appealing, but Eclipse won't run without it) and figured-out how to get the WUDSN plug-in installed - Eclipse has changed a bit since the installation instructions were written. I got DASM connected so that I could assemble the rudimentary snippet of code I'd bashed into a new document, and to my delight it Just Worked. I spent another half-hour playing, and then got stuck - I needed a way to do pre- and post-build tasks either side of the actual assembly stage, and it didn't seem obvious how to do that. I've since found some information on how to make it happen (it looks like my lack of familiarity with Eclipse is the problem rather than an issue with the plug-in) so when I get a moment I'm going to dive back in and see if I can get it working as I'd like. If I do, I may very well switch from Notepad++ and a couple of batch makefiles and shift the toolchain over to Eclipse/WUDSN completely.
However, I'm not there yet, so the work I'm doing on ZP optimisation continues to forge ahead using the tried-and-trusted tools I've been happy with so far. As I mentioned at the beginning of this post, I'm hip-deep in various bits of test code, looking for that particular configuration which behaves how I want, looks good, and feels right. So far I've managed to increase free ZP space from 16 bytes to 44 - which is most gratifying - and I'm not done yet. Next time, I'll show you whichever bits of code make it through the selection process and replace the existing dirty-row set/clear logic.
TTFN.
EDIT: in case you're interested, here's a complete table of cycle times for the Commodore and VIC++ RAMTAS passes for each memory population showing the speed increase of the latter:
| Memory | CBM | VIC++ | Cycles Faster | % Faster |
|---|---|---|---|---|
| Unexpanded | 502,774 | 289,401 | 213,373 | 42.4% |
| BLK0 (3K @ $0400) | 542,691 | 424,714 | 117,977 | 21.7% |
| BLK1 (8K @ $2000) | 1,157,229 | 785,587 | 371,642 | 32.1% |
| BLK2 (8K @ $4000) | 1,771,757 | 1,146,460 | 625,297 | 35.3% |
| BLK3 (8K @ $6000) | 2,386,249 | 1,507,331 | 878,918 | 36.8% |
| BLK5 (8K @ $A000) | n/a | 1,868,208 | n/a | n/a |
Note: The Commodore RAMTAS routine is oblivious to RAM in BLK5, so there's no difference in the cycle time between BLK3 and BLK5 configurations. Also, technically, VIC++ is non-functional without BLK0 populated, but I can still show the RAMTAS time even though the OS will halt afterwards.
That BLK0 result for VIC++ is interesting, isn't it? It's not actually as anomalous as it looks. The Commodore ROM is smoothing-out the performance hit between unexpanded and +3K because it's doing a memory-increment test all the way which is fairly consistent - but VIC++ uses an indexed test, which is very quick over Zero Page and then slows down to a constant rate over the rest of memory. So there's a statistical dip when it starts counting expansion-block RAM (although it's still over 20% faster for BLK0) but that is then quickly turned into an ever-increasing improvement over the Commodore routine as memory population size increases.
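If you'd like to check my arithmetic, the two derived columns are easy to reproduce from the raw Stopwatch readings. This is just a rough Python sketch rather than anything from the VIC++ build: the names `RAMTAS_CYCLES` and `speedup` are made up for illustration, and I'm assuming "% Faster" means cycles saved as a percentage of the Commodore figure, rounded to one decimal place.

```python
# Cycle counts copied from the table above: (Commodore ROM, VIC++ ROM).
# BLK5 is left out because the Commodore routine has no figure for it.
RAMTAS_CYCLES = {
    "Unexpanded": (502_774, 289_401),
    "BLK0": (542_691, 424_714),
    "BLK1": (1_157_229, 785_587),
    "BLK2": (1_771_757, 1_146_460),
    "BLK3": (2_386_249, 1_507_331),
}


def speedup(cbm, vicpp):
    """Return (cycles saved, percentage saved relative to the CBM time)."""
    saved = cbm - vicpp
    return saved, round(100.0 * saved / cbm, 1)


if __name__ == "__main__":
    for name, (cbm, vicpp) in RAMTAS_CYCLES.items():
        saved, pct = speedup(cbm, vicpp)
        print(f"{name:<11}{cbm:>10,}{vicpp:>10,}{saved:>9,}{pct:>6.1f}%")
```

Running it for each memory configuration should give back the "Cycles Faster" and "% Faster" figures shown in the table.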
I'm a big DASM fan, but of course it's not really being maintained and I don't have the time to pick it up myself. I might have a look at Kick, just to see whether it's got anything that really makes me think it's worth switching and doing the inevitable source tweaks to fit its syntax checker.