Would a cache miss be slower than the 17 compares for the r/w and address, plus figuring out what to do with it?
If you're using a switch statement, the compiler should generate at most 5 compares (binary search), or two (table lookup). The real question is, how would this affect the available cache for the cases where it really helps? If you push out all the often-accessed data with these huge tables, the CPU can really slow down.
Can't the compiler or a profiler do something to optimize for a CPU with a humble cache?
If a CPU with a humble cache could work just as well, then why would a CPU have anything more?
I can see how WRAM, which allows both r/w, can be decoded very quickly, but $4000 onwards is trouble. Is the pointer idea really as bad as you make it out to be?
Only way to really find out is to profile the two. I tried changing the page size from 2K to 1 byte in my emulator and it uses 460% more CPU time (5.6x slower). This is on a PowerPC G3 with 32K data cache and 1MB secondary cache. The table size goes from 128 bytes to 256K, and I only use it for memory reads.