All times are UTC - 7 hours





PostPosted: Thu May 18, 2017 9:21 am 
Joined: Sat Jul 12, 2014 3:04 pm
Posts: 846
I've been trying to work out in my head how to debug and tracelog solely from cart-edge signals. Obviously, one could simply fully-replicate the 6502 state. That would take too many FPGA resources, I believe, so I'm trying to pare it down first. This will also save me having to implement an entire 6502.
What seems key is maintaining a working knowledge of T₁, that is, when an instruction fetch happens: this allows you to know it's an opcode fetch, and what, if you must; it certainly seems helpful to know BRAnch, JMP, JSR, BRK for tracing.

My first inclination was "count argument bytes, next fetch is opcode, too". It didn't take long to come up with some problem cases: BRA [-1,0,+1], or
Code:
   LDA next,y  ;where y = 1
next:
as the dummy fetch will seem to be at the right address, but won't be an opcode fetch. Then, of course, the knowledge of "is-opcode-fetch" is lost. (The page-boundary fixup cycles are moderately easy to detect on loads or branches: on loads, they're less-than; on branches, out-of-range, though care will need to be taken to spot a -128 branch.)

NMIs may be another problem, at least if one isn't allowing an assumption that $FFFA-FFFF is never executed. For a debugger, you don't want assumptions like that. However, if one is logging stack accesses, a [surprise] triple-push of PCH, PCL, P (which is a triple-write, if I'm understanding things, something that the processor otherwise never does) would seem to be a good tripwire for "now interrupting", and it is followed by reading the destination vector. (BRK and IRQ are less problematic; IRQ will be asserted by mapper hardware and is thus interceptable; if we maintain opcode fetch parity knowledge, BRK is just another known.)
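A minimal sketch of that tripwire, assuming the logger hands us one `(address, is_write)` tuple per CPU cycle (that tuple layout and the function names are made up for illustration, not anything from real hardware). It looks for three consecutive descending writes in the stack page followed immediately by a read of an interrupt vector.

```python
from typing import Iterable, List, Tuple

# One bus cycle as seen at the cart edge: (address, is_write).
# This tuple layout is an assumption made for the sketch.
Cycle = Tuple[int, bool]

# Low byte of each vector that is reached through a triple push.
VECTORS = {0xFFFA: "NMI", 0xFFFE: "IRQ/BRK"}


def descending_stack(addrs: List[int]) -> bool:
    """True if all writes are in page 1 and S decrements (with wrap) each time."""
    in_stack = all(0x0100 <= a <= 0x01FF for a in addrs)
    steps = all(((a - b) & 0xFF) == 1 for a, b in zip(addrs, addrs[1:]))
    return in_stack and steps


def find_interrupt_entries(trace: Iterable[Cycle]) -> List[Tuple[int, str]]:
    """Return (index_of_vector_read, vector_name) for every triple push
    that is immediately followed by a vector read."""
    hits = []
    run: List[int] = []  # addresses of the current run of consecutive writes
    for i, (addr, is_write) in enumerate(trace):
        if is_write:
            run.append(addr)
            continue
        # First read after a run of writes: check the tripwire condition.
        if len(run) >= 3 and addr in VECTORS and descending_stack(run[-3:]):
            hits.append((i, VECTORS[addr]))
        run = []
    return hits
```

Note that this only says "an interrupt sequence started here"; it does not by itself tell BRK from IRQ, since both read $FFFE.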

[hr]
So perhaps I should attack it from a different angle than keeping track. Log the last four reads, keeping track of where, particularly when they're sequential.
Hypothesis: Three sequential reads must include at least one opcode/operand fetch.
(I'm trying to think of ("fuzz") how to set up a degenerate ["pathological" was the word that timed my post out trying to remember] case for executing through the stack page… hmm… I think a JMP (label) label: would confuse, but there's a simpler triple-read: RTI. An RTS precisely into the stack will also trip this.)
ldy #1 / .org $4A / lda ($4C),y / .db $4E, $00, $any, $any provides 6 non-opcode fetches. An RTI in-stack will hit 5.

(These are all problems for "keep memory of what's the opcode" too.)

One needs really only trap branches-taken(hard), interrupts/BRK (easy), JSR (easy-ish), JMP (harder, no stack-write evidence)…argh.

I'm pretty sure there's a pathological case for every simple comparison I've come up with so far…
Catching a JMP seems easy prima facie (read #$4C, read #$lo, read #$hi, read $hilo), but the above case would trip it incorrectly.
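To make the false positive concrete, here is a rough sketch of that naive JMP matcher, assuming a list of `(address, data)` pairs for consecutive read cycles (the representation and the name `naive_jmp_hits` are illustrative assumptions).

```python
def naive_jmp_hits(reads):
    """reads: list of (address, data) for consecutive read cycles.

    Flags index i when reads[i..i+3] look like JMP absolute:
    $4C at A, lo at A+1, hi at A+2, then a read at hi:lo.
    """
    hits = []
    for i in range(len(reads) - 3):
        (a0, d0), (a1, lo), (a2, hi), (a3, _) = reads[i:i + 4]
        looks_sequential = a1 == a0 + 1 and a2 == a0 + 2
        lands_on_target = a3 == ((hi << 8) | lo)
        if d0 == 0x4C and looks_sequential and lands_on_target:
            hits.append(i)
    return hits
```

A real `JMP $9234` at $8000 is flagged at index 0, as intended. But an indirect-indexed load through zero page pointer $4C, placed at $4A with Y = 0 (a variation on the case above, chosen here for the example), produces the reads $4A, $4B (data $4C), $4C (data $4E), $4D (data $00), $004E, and the matcher flags index 1 even though no JMP was executed.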

JSR should be catchable by any double-descending-write that is not followed by the P push an interrupt does…should. Does an indexed y/x=FF pagebreak trigger? No, that's an extra descending read, not a write.
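A hedged sketch of that idea: group consecutive writes into runs and label them by length. It assumes a trace of `(address, is_write)` tuples, one per cycle; the labels and function names are invented for the example. Read-modify-write instructions also write twice, but to the same address, so they fail the "descending" test.

```python
def is_descending_stack(addrs):
    """True for two or more page-1 writes whose low byte decrements each time."""
    if len(addrs) < 2:
        return False
    in_stack = all(0x0100 <= a <= 0x01FF for a in addrs)
    steps = all(((a - b) & 0xFF) == 1 for a, b in zip(addrs, addrs[1:]))
    return in_stack and steps


def classify_write_runs(trace):
    """trace: list of (address, is_write).

    Returns (start_index, label) for each run of descending stack writes:
    a run of two is a JSR candidate, a run of three an interrupt/BRK candidate.
    """
    out = []
    i, n = 0, len(trace)
    while i < n:
        if not trace[i][1]:
            i += 1
            continue
        j = i
        while j < n and trace[j][1]:  # extend over consecutive writes
            j += 1
        addrs = [addr for addr, _ in trace[i:j]]
        if is_descending_stack(addrs):
            if len(addrs) == 2:
                out.append((i, "JSR?"))
            elif len(addrs) == 3:
                out.append((i, "interrupt/BRK?"))
        i = j
    return out
```

The question marks in the labels are deliberate: this is only evidence, and code running out of the stack page could still produce confusing traces.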

---
ed2:
Pathological cases for a Branch-detector:
Code:
.org $100
ldx #$06
txs
clc
bcc 0
pha ;gets read twice or three times, instead of 1-2 for a branch to +1


Last edited by Myask on Thu May 18, 2017 10:28 am, edited 2 times in total.

PostPosted: Thu May 18, 2017 9:43 am 

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 18510
Location: NE Indiana, USA (NTSC)
I know the Missile Command hardware special cases the (d,X) addressing mode for frame buffer access and turns it into a read-modify-write sequence on the underlying memory. How does it do that?


PostPosted: Thu May 18, 2017 10:29 am 

Joined: Sat Jul 12, 2014 3:04 pm
Posts: 846
I, uh, haven't implemented one yet.

And if I implemented one on the PowerPak it would not help in understanding any mapper hardware, technically, as I'd have to implement the mapper myself! Only on a Game Genie-like thing (which, I suspect, Memblers already has managed) would one be able to do it with an arbitrary mapper… though that would also deny you simple access to mapper regs.


PostPosted: Thu May 18, 2017 10:36 am 

Joined: Sun Apr 13, 2008 11:12 am
Posts: 5835
Location: Seattle
The plain-jane 6502 includes a SYNC output that lets an external device know when it's fetching an opcode vs data.

From the Missile Command Service Manual:
SYNC - Signal generated directly from the microprocessor that occurs at the beginning of an instruction read cycle and lasts one cycle of microprocessor φ2 clock. Five φ0 cycles later, MADSEL goes high.


PostPosted: Thu May 18, 2017 11:54 pm 
Site Admin

Joined: Mon Sep 20, 2004 6:04 am
Posts: 3438
Location: Indianapolis
Not too long ago I also wanted to make a similar code/data discriminator fit in a CPLD, but I also hit that same brick wall of branches requiring that it know the status flags, which pretty much leads to emulating the whole 6502. Might as well! Most recently I've been experimenting with the iCE40HX4K. I tried Arlet Otten's 6502, only synthesizing it by itself though, and it used about 1400 of 3500 logic cells. This post shows that core should be under 500 LUTs + some block RAM on a Spartan 2. Seems pretty small, maybe it could fit in the PowerPak?

Your suspicion was correct, I am working on a game-genie-like device. :) As usual, I've redesigned the thing 5 different ways before starting it, but what I have managed so far is to settle on a design and make progress on a board layout.


PostPosted: Mon May 22, 2017 12:14 am 

Joined: Tue Feb 07, 2017 2:03 am
Posts: 174
Question: why does the cart need to know so much detail? Surely it is connected to a PC of some sort to display said info, and said PC could also have its own copy of the ROM, probably with a labels file for better display?

At which point the PC, R/W and data bus are all you need, where the PC could be encoded so that it only sends the full 16 bits if it is not a +1.

Then you can count out access cycles to find the opcodes.
You can use the "switched from Write to Read" transition, which says you are on an opcode fetch.
Check the vector pulls for the IRQ/NMI and pushes in the middle of an opcode read cycle. Knowing that the IRQ flag has gone high is not enough; it will let you know it is coming, but not when the CPU actually reacts to it.
For branches, the next opcode might be loaded a couple of times: if you see PC, PC+1 and the address keeps incrementing, the branch was not taken; if you see PC, PC, PC +/- n, the branch was taken. So basically you need to keep the last 2 PCs and see if they change. Either way, when the PC changes again you are on your opcode.

However can you trap AEC on the cart, to know when a DMA process kicks in to steal cycles?
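A small sketch of the "only send the full 16 bits if it is not a +1" encoding, assuming a per-cycle record of `(addr, data, is_write)` and a made-up record layout: a flags byte (bit 0 = write, bit 1 = full address follows), the data byte, and then the address only when it is not the previous address plus one.

```python
def encode(cycles):
    """Pack (addr, data, is_write) cycles into a byte stream.

    Short record (2 bytes): flags, data          when addr == previous + 1
    Long record  (4 bytes): flags, data, lo, hi  otherwise
    """
    out = bytearray()
    prev = None
    for addr, data, is_write in cycles:
        sequential = prev is not None and addr == ((prev + 1) & 0xFFFF)
        flags = (1 if is_write else 0) | (0 if sequential else 2)
        out += bytes([flags, data])
        if not sequential:
            out += bytes([addr & 0xFF, addr >> 8])
        prev = addr
    return bytes(out)


def decode(blob):
    """Inverse of encode(): rebuild the list of (addr, data, is_write)."""
    cycles = []
    prev = None
    i = 0
    while i < len(blob):
        flags, data = blob[i], blob[i + 1]
        i += 2
        if flags & 2:  # long record carries the full address
            addr = blob[i] | (blob[i + 1] << 8)
            i += 2
        else:          # short record: address is previous + 1
            addr = (prev + 1) & 0xFFFF
        cycles.append((addr, data, bool(flags & 1)))
        prev = addr
    return cycles
```

For straight-line code this roughly halves the log, since most cycles are sequential fetches; data-heavy loops gain much less.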


PostPosted: Mon May 22, 2017 2:10 am 
Site Admin

Joined: Mon Sep 20, 2004 6:04 am
Posts: 3438
Location: Indianapolis
If the cartridge was able to know which fetch is an opcode, assuming it's not running from internal RAM, then you could instead feed it a JMP or something into your own single-stepping code.

If you tracelog one frame that would be about 117kB (if simply logging 4 bytes per frame) and that's reasonable, but can't be done on anything that exists (yet..!). I tend to think of this in terms of frames, because the NMI seems like an obvious entry/exit point. Seems fairly workable if you can tell the FPGA to begin logging once it hits a certain condition or breakpoint, and only upload data to the PC for the interesting part.

2A03 does not have an AEC signal, but it doesn't seem like it would be too hard to detect a DMA access. When it writes $4014, then you can count the next 513/514 cycles as DMA. Memory locations for the DPCM samples could be determined by the most recent address and length register writes. There is virtually no reason that anyone would read DPCM data manually with the CPU, so that seems safe enough.
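As a rough sketch of the $4014 idea: instead of hard-coding 513 or 514, this variant (my substitution, not necessarily what was meant above) marks cycles as DMA from the cycle after the $4014 write until the 256th write to $2004 has gone by, so the alignment cycle takes care of itself. It assumes a trace of `(address, is_write)` tuples and ignores DPCM fetches landing in the middle.

```python
OAMDMA = 0x4014   # writing here starts sprite DMA
OAMDATA = 0x2004  # the DMA unit writes each fetched byte here


def mark_oam_dma(trace):
    """trace: list of (address, is_write), one entry per CPU cycle.

    Returns a list of booleans, True for cycles attributed to OAM DMA.
    """
    flags = [False] * len(trace)
    remaining = 0  # $2004 writes still expected in the current DMA
    for i, (addr, is_write) in enumerate(trace):
        if remaining:
            flags[i] = True
            if is_write and addr == OAMDATA:
                remaining -= 1
        elif is_write and addr == OAMDMA:
            remaining = 256  # DMA starts on the next cycle
    return flags
```

With one idle cycle before the 256 read/write pairs this marks 513 cycles, with two it marks 514, matching the counts above.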


PostPosted: Mon May 22, 2017 5:00 am 

Joined: Tue Feb 07, 2017 2:03 am
Posts: 174
Ah, the DMA logic is all internal on the NES.

If you have a breakpoint, though, you can assume that point is a valid opcode. You can then take it that your step code exits when the opcode is complete, and hence the next byte will be a valid opcode?


PostPosted: Mon May 22, 2017 6:44 am 

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 18510
Location: NE Indiana, USA (NTSC)
Oziphantom wrote:
If you have a break point though, you can assume that point is a valid opcode.

Not necessarily. Setting a breakpoint against a previous build, adding 1 byte of code before the breakpoint, rebuilding the ROM, and then running the code again will likely cause the breakpoint to land on an operand instead of an opcode.


PostPosted: Mon May 22, 2017 2:24 pm 

Joined: Sat Jul 12, 2014 3:04 pm
Posts: 846
Oziphantom wrote:
Surely it is connected to a PC of some sort to display said info
The idea was to not do that, but to create PowerPak firmware [etc.] that would allow the same sort of features as a debugging emulator without actually having to have any more hardware than that and your NES.

That the design would also work for an interposed GG-like device is a bonus.
Quote:
at which point the PC, R/W and data bus are all you need, where the PC could be encoded so that it only sends the full 16 bits if it is not a +1
Program Counter is not a signal that leaves the CPU, AFAIK, only CPU_addr, which is a thing that can be loaded with the Program Counter.

Also, you're using two different "PC" acronyms in the same post. Hazardous!
Quote:
You can use the "switched from Write to Read" transition, which says you are on an opcode fetch
I thought of that, but nope. IRQ/NMI/BRK/JSR are all counterexamples. (Someone have an actual IRQ cycle timing? 'cause obviously it's not going to advance past BRK and its "argument"…)
Quote:
However can you trap AEC on the cart,
trap what?
You could detect DMA with just the RWRW alternation, as no 2-cycle instructions have a write. If you weren't just watching for $4014 writes, which is cheap enough if you're watching for $4016 write/reads to have a controller status to hook out to debugger from. (One probably wants to record all register accesses, anyway.)
Memblers wrote:
Not too long ago I also wanted to make a similar code/data discriminator fit in a CPLD, but I also hit that same brick wall of branches requiring that it know the status flags, which pretty much leads to emulating the whole 6502.
You don't have to drive R/W, don't have to drive CPU_data, and don't have to compute the high address byte (just a carry-in)… is Otten's core a complete, properly-functioning one? That post says many of the cores are still WIPs.
Quote:
If the cartridge was able to know which fetch is an opcode, assuming it's not running from internal RAM, then you could instead feed it a JMP or something into your own single-stepping code.
This is one of the best reasons for pre-knowing T₁ (is_opcode_fetch), as it lets you do that. To explicitly point out the benefit, it means branching off without producing a stack write.

Unfortunately it can be running from internal RAM, making that a different hazard, so one might just be stuck with "preemptively keep a copy of the stack page to restore to".

Quote:
If you tracelog one frame that would be about 117kB (if simply logging 4 bytes per frame) and that's reasonable, but can't be done on anything that exists (yet..!). I tend to think of this in terms of frames, because the NMI seems like an obvious entry/exit point.
4 bytes per frame? do you mean, per branch? per PC-change? number seems to indicate per PC-change, including basic incrementation, including argument reads, which…why? It seems an exceptionally redundant and
Quote:
There is virtually no reason that anyone would read DPCM data manually with the CPU, so that seems safe enough.
A debugger has the issue that one often is debugging something that isn't working correctly, so you don't get to make assumptions like "won't execute $FFFA-$01FF" or "won't execute a KIL/STP" or "won't invoke PPU master mode".

I'm pretty sure manual PCM loaders exist; for e.g. speech-synth.


PostPosted: Mon May 22, 2017 11:36 pm 

Joined: Tue Feb 07, 2017 2:03 am
Posts: 174
Myask wrote:
Oziphantom wrote:
Surely it is connected to a PC of some sort to display said info
The idea was to not do that, but to create PowerPak firmware [etc.] that would allow the same sort of features as a debugging emulator without actually having to have any more hardware than that and your NES.

That the design would also work for an interposed GG-like device is a bonus.
Quote:
at which point the PC, R/W and data bus are all you need, where the PC could be encoded so that it only sends the full 16 bits if it is not a +1
Program Counter is not a signal that leaves the CPU, AFAIK, only CPU_addr, which is a thing that can be loaded with the Program Counter.

Also, you're using two different "PC" acronyms in the same post. Hazardous!

Yes, I am using PC to mean the same as ADDR, my bad. Sure, but spend $5 US and put a Raspberry Pi Zero on it; it would probably be a lot cheaper than any CPLD large enough to handle it, although it will need a power pack upgrade ;) Or keep a pet 6502 on board; if you drop the need for illegals the 65C02 would do, and you can buy them new. Just use buffers to block its writes, although you would need to delay it properly during DMA... It really is damned if you do, damned if you don't. It's always so annoying when you just need signal Y and it would be so easy to get, but they just don't give it to you, isn't it...
Myask wrote:
Quote:
You can use the "switched from Write to Read" transition, which says you are on an opcode fetch
I thought of that, but nope. IRQ/NMI/BRK/JSR are all counterexamples. (Someone have an actual IRQ cycle timing? 'cause obviously it's not going to advance past BRK and its "argument"…)
As far as I know, the IRQ/NMI is identical to BRK, they also take 7 clocks, but don't set B flag and don't increment the PC during the first 2 cycles, and thus it is
Code:
 BRK

        #  address R/W description
       --- ------- --- -----------------------------------------------
        1    PC     R  fetch opcode, increment PC
        2    PC     R  read next instruction byte (and throw it away),
                       increment PC
        3  $0100,S  W  push PCH on stack, decrement S
        4  $0100,S  W  push PCL on stack, decrement S
        5  $0100,S  W  push P on stack (with B flag set), decrement S
        6   $FFFE   R  fetch PCL
        7   $FFFF   R  fetch PCH

 IRQ/NMI

        #  address R/W description
       --- ------- --- -----------------------------------------------
        1    PC     R  fetch opcode (discarded, PC not incremented)
        2    PC     R  fetch opcode again (discarded, PC not incremented)
        3  $0100,S  W  push PCH on stack, decrement S
        4  $0100,S  W  push PCL on stack, decrement S
        5  $0100,S  W  push P on stack (with B flag clear), decrement S
        6   $FFFE   R  fetch PCL ($FFFA for NMI)
        7   $FFFF   R  fetch PCH ($FFFB for NMI)
For this case I was thinking the "trap vector reads" rule would kick in, but damn that JSR! Another horror case for the FSM is when the 6502 loses the NMI due to IRQ...
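Going by the two tables, a seven-cycle window can be told apart roughly like this. It is only a sketch: it assumes `(address, is_write)` tuples already aligned to the start of the sequence, the function name is made up, and it does not try to handle the hijack case where the vector and the PC behaviour disagree (an NMI vector read is simply reported as NMI).

```python
def classify_entry(cycles):
    """cycles: exactly seven (address, is_write) tuples.

    Returns "BRK", "IRQ", "NMI", or None if the window does not match
    the R R W W W R R interrupt shape.
    """
    if len(cycles) != 7:
        return None
    shape = "".join("W" if is_write else "R" for _, is_write in cycles)
    if shape != "RRWWWRR":
        return None
    a = [addr for addr, _ in cycles]
    # The three pushes must land in the stack page.
    if not all(0x0100 <= x <= 0x01FF for x in a[2:5]):
        return None
    vector = a[5]
    if a[6] != vector + 1:
        return None
    if vector == 0xFFFA:
        return "NMI"
    if vector == 0xFFFE:
        if a[1] == a[0]:
            return "IRQ"  # PC was not incremented during cycles 1-2
        if a[1] == ((a[0] + 1) & 0xFFFF):
            return "BRK"  # padding byte after the BRK opcode was read
    return None
```

The B flag in the byte pushed on cycle 5 would be a second way to separate BRK from IRQ if the data bus is being logged too.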
Myask wrote:
Quote:
However can you trap AEC on the cart,
trap what?
You could detect DMA with just the RWRW alternation, as no 2-cycle instructions have a write. If you weren't just watching for $4014 writes, which is cheap enough if you're watching for $4016 write/reads to have a controller status to hook out to debugger from. (One probably wants to record all register accesses, anyway.)
Memblers wrote:
Not too long ago I also wanted to make a similar code/data discriminator fit in a CPLD, but I also hit that same brick wall of branches requiring that it know the status flags, which pretty much leads to emulating the whole 6502.
You don't have to drive R/W, don't have to drive CPU_data, can not compute high address byte (just a carry-into)…is Otten's core a complete, properly-functioning one? That post says many of the cores are still WIPs.
Address Enable Control - it's the line you use to tri-state the CPU, thus enabling you to perform DMA. Which I now see is not exposed on the NES.


PostPosted: Tue May 23, 2017 12:13 am 

Joined: Sun Apr 13, 2008 11:12 am
Posts: 5835
Location: Seattle
AEC isn't on the 6502 either, that's a 6510 addition. That's what I was talking about in the other thread about the difference between the VIC-20 and the C64. (The VIC-20 has the array of 74'245s to handle the equivalent functionality)

I was recently introduced to the C64's Final Cartridge 3 which has a few neat tricks that might be useful? Maybe? Although I guess it basically boils down to "trigger an IRQ or NMI, detect when the vector read happens, and intercept the handler"


PostPosted: Tue May 23, 2017 1:39 am 

Joined: Tue Feb 07, 2017 2:03 am
Posts: 174
Yeah, I thought that since they put the audio and video in one chip they would have added an `AEC` (or some other name), as they have DMA ;) And the NES is after the MAX Machine. But no, they put it all inside and didn't put the signal on the cart so others could add extra DMA. Disappointing, although there is very little use for DMA on the NES CPU side I guess, except to do this... and if they hadn't put the audio on die we could drop a 6510 into the NES to get it GAHHHHHHHHHHHHH
Sorry, I'm still stuck in the world of Commodore, where they wanted you to be able to do things with their machines...

Which part of the EF3? The Action replay/Retro replay/Final Cart 3 emulation AKA Freezer Cartridges? Or the Ultimax¹ mode KERNAL² swap?
The 64 is lucky as it has the VIC which will do a "bad line" and thus force BA/AEC lo for 40 clocks, which also handles the 3 clock delay. So you wait for the VIC to pull BA/AEC lo, then you pull NMI lo, switch to Ultimax Mode which lets you replace the KERNAL ROM, which gives you control of the vectors. When the VIC releases BA/AEC the 6510 has to be in a read cycle and thus the NMI will be immediately taken.

The 1541-11U+ has a lot more features, and does better takeovers, DMA injection, etc. Gideon (the designer of the 1541-11U) made a PDF on his better takeover findings here: http://rr.c64.org/rrwiki/images/6/67/Sa ... he_c64.pdf Sadly they use the BA/AEC/DMA line, which is missing on the NES :(

There is an idea: if you want to step you could just force NMIs and take over with a ROM swap, taking care to note if there is an NMI on the bus that is not yours... ::looks up nes cart edge connector:: OH COME ON

¹ Ultimax mode is based upon the Commodore MAX Machine; it places the C64 into a memory map of:
0000-07FF RAM
0800-7FFF Open Bus
8000-9FFF 8K ROM
A000-CFFF Open Bus
D000-DFFF I/O
E000-FFFF 8K ROM

² Somebody at Commodore marked a manual by mistake with KERNAL instead of KERNEL, and, well, it stuck and was printed and became a "standard".


PostPosted: Tue May 23, 2017 4:12 am 
Site Admin

Joined: Mon Sep 20, 2004 6:04 am
Posts: 3438
Location: Indianapolis
Myask wrote:
You don't have to drive R/W, don't have to drive CPU_data, can not compute high address byte (just a carry-into)…is Otten's core a complete, properly-functioning one? That post says many of the cores are still WIPs.


I don't know if it's been 100% audited, but I've noticed some people on 6502.org are using it in their projects.

Quote:
Quote:
If you tracelog one frame that would be about 117kB (if simply logging 4 bytes per frame) and that's reasonable, but can't be done on anything that exists (yet..!). I tend to think of this in terms of frames, because the NMI seems like an obvious entry/exit point.
4 bytes per frame? do you mean, per branch? per PC-change? number seems to indicate per PC-change, including basic incrementation, including argument reads, which…why? It seems an exceptionally redundant


I meant to write 4 bytes per CPU cycle. That would just be a log of everything like Oziphantom mentioned, CPU data, address, and R/W totals to 25 bits per cycle, padded into 4 bytes. Then it would be up to the remote system to disassemble it. This has the advantage of freeing up the FPGA for a bigger mapper, but I still prefer the cart to do more. Plus a 6502-aware mapper could allow stuff like separate program and data banks, like the '816.
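For what it's worth, the arithmetic and the 25-bit record can be sketched like this. The bit layout (address in bits 0-15, data in bits 16-23, R/W in bit 24, little-endian) and the rounded NTSC figure of 29780 CPU cycles per frame are assumptions for illustration, not a defined format.

```python
import struct

# Roughly 29780.5 CPU cycles per NTSC frame; rounded down for the estimate.
CPU_CYCLES_PER_NTSC_FRAME = 29780


def pack_cycle(addr, data, is_write):
    """Pack one bus cycle (16 + 8 + 1 = 25 bits) into a 4-byte record."""
    word = (addr & 0xFFFF) | ((data & 0xFF) << 16) | (int(bool(is_write)) << 24)
    return struct.pack("<I", word)


def unpack_cycle(record):
    """Inverse of pack_cycle(): returns (addr, data, is_write)."""
    (word,) = struct.unpack("<I", record)
    return word & 0xFFFF, (word >> 16) & 0xFF, bool((word >> 24) & 1)


# Bytes needed to log every cycle of one frame at 4 bytes per cycle.
frame_bytes = 4 * CPU_CYCLES_PER_NTSC_FRAME
frame_kib = frame_bytes / 1024
```

That comes to 119120 bytes, a bit over 116 KiB, which is in line with the "about 117kB" figure.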


PostPosted: Tue May 23, 2017 10:14 am 

Joined: Sun Apr 13, 2008 11:12 am
Posts: 5835
Location: Seattle
Oziphantom wrote:
Which part of the EF3? The Action replay/Retro replay/Final Cart 3 emulation AKA Freezer Cartridges?
Yeah, the freezer. At the cost of eating a pile of cycles, it should let you force the instruction phase to a known state from just the card edge. On the NES it does require that you use /IRQ and that the game permits IRQs, but.
Quote:
::looks up nes cart edge connector:: OH COME ON
<giggle>

