What exactly are T-states doing?

Post by Dwedit »

If I had to make a wild guess, I'd guess that it would finish executing the CB instruction instead of handling the interrupt. This is just a wild guess.

Post by gekkio »

If someone cared to test (and had a flash cart to do it with), it should be simple enough to set up a test that would have an interrupt trigger just after the CB prefix is read, and then watch the address bus to see if the next memory access is to the stack (to push PC) or to fetch the next instruction, and have your program look at the pushed F register to check for unused bits being set.
Dwedit wrote:If I had to make a wild guess, I'd guess that it would finish executing the CB instruction instead of handling the interrupt. This is just a wild guess.
I did a quick test with something like this:

Code: Select all

  ld a, $F0
  ei
  swap a
The result is: it runs swap a normally, then handles the interrupt. The PC pushed to the stack points to the instruction following swap a.
On a Z80, this technique can be used to reveal prefix opcodes, but here it just handles swap a as usual.

I'll add this test ROM to the mooneye test suite once I tidy up things a bit and verify it on all devices.
If CB flag is in the status flags (and POP itself doesn't clear it), the NOP should be interpreted as RLC B, so B will be 3 now.
Mooneye GB test acceptance/bits/reg_f confirms that the low bits of F are not usable. Also, the test I mentioned above would show something special in F if CB were a status flag.

Post by AWJ »

Just some comments about the questions on your "emulation accuracy" page.
What happens if the CPU accesses memory during OAM DMA?
I believe someone tested this and found that on the DMG, reads (presumably including opcode fetches) return the byte currently being DMAed. On the GBC, external WRAM (i.e. $C000-$DFFF) is on a separate physical bus from the cartridge slot, probably because of the WRAM bankswitching. If you read WRAM during DMA it has the same effect as on the DMG, but you can apparently run code from ROM normally while DMAing from WRAM (the Wizardry Famicom remakes do this--they don't bother copying their OAM DMA routine to $FF80) There are probably some limitations to executing code in parallel with DMA; the Wizardry games still do a 160-cycle delay loop after triggering DMA.

I've never seen any test results for writes during OAM DMA, or whether OAM DMA automatically suppresses interrupts.
What is the exact behaviour of EI?
On a real Z80 (and I believe an 8080 as well), "EI's effect is delayed one cycle" is not true so much as "EI actually disables interrupts until after the next instruction" The reason is to ensure that the sequence "EI; RET" is atomic. If you put a hundred EIs in a row, no interrupts can occur between any of them. You should test whether this is true on the GB as well.

On a real Z80, prefix instructions are the same: no interrupt can occur between the prefix and the instruction it's modifying (the 8080 doesn't have any prefix instructions). This is almost certainly true on the GB as well; otherwise chaos would ensue (you could never safely use a CB instruction any time an interrupt could possibly happen).
What is the exact timing of PUSH rr?
It's not surprising that PUSH has an extra internal delay that POP doesn't. Remember that the GB, like other 8080-family CPUs, has a "full" stack: SP points to the last item pushed. So PUSH has to decrement SP first to generate the address for the write, whereas POP can immediately read memory while incrementing SP in parallel. The 6502 family has an "empty" stack, and pops take one more cycle than pushes do--exactly the opposite of the 8080 family.
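To make the full-stack argument concrete, here is a minimal Python sketch (not from any emulator; the function names and the exact placement of the internal cycle are assumptions for illustration). Each list entry stands for one machine cycle, so PUSH comes out one cycle longer than POP:

```python
def push16(mem, sp, value):
    """PUSH on a full stack: SP has to move before the first write."""
    cycles = ["fetch opcode"]
    cycles.append("internal: SP - 1")      # extra cycle, no bus access (assumed placement)
    sp = (sp - 1) & 0xFFFF
    mem[sp] = (value >> 8) & 0xFF
    cycles.append("write high byte")
    sp = (sp - 1) & 0xFFFF
    mem[sp] = value & 0xFF
    cycles.append("write low byte")
    return sp, cycles


def pop16(mem, sp):
    """POP can read at SP right away and increment SP in parallel."""
    cycles = ["fetch opcode"]
    lo = mem[sp]
    sp = (sp + 1) & 0xFFFF
    cycles.append("read low byte")
    hi = mem[sp]
    sp = (sp + 1) & 0xFFFF
    cycles.append("read high byte")
    return sp, (hi << 8) | lo, cycles
```

With a 64 KiB `bytearray` as `mem`, pushing $1234 at SP=$FFFE leaves SP at $FFFC after four cycles, and popping it back takes three.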
What does MBC1 do if you request a ROM bank number higher than what the cartridge supports?
MBC1 only has five data pins; it can't see the top three bits of the data bus at all. That's why ROMs bigger than 4 Mbit need a second register to select the upper bank bits. So a data value of 32 will mirror to 0 in the MBC and trigger the "0 actually selects 1" behaviour, but a data value of 16 on a 2 Mbit ROM, or 8 on a 1 Mbit ROM will mirror to 0 in the ROM and won't be converted to 1. The MBC doesn't "know" how big the ROM is; smaller ROMs just leave the upper address lines of the MBC unconnected.
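A small Python sketch of the two mirroring steps described above. The function name is made up, `rom_banks` is assumed to be a power of two (16 KiB banks), and the second register for the upper bank bits is ignored:

```python
def mbc1_selected_bank(written_value, rom_banks):
    """Return the ROM bank that ends up mapped at $4000-$7FFF."""
    reg = written_value & 0x1F       # only D0-D4 reach the MBC
    if reg == 0:
        reg = 1                      # "0 actually selects 1" happens inside the MBC
    # Upper MBC address lines are unconnected on small ROMs,
    # so the mirroring to the ROM size happens after the 0 -> 1 fixup.
    return reg & (rom_banks - 1)
```

For example, writing 32 on a 4 Mbit (32-bank) ROM gives bank 1, while writing 16 on a 2 Mbit (16-bank) ROM or 8 on a 1 Mbit (8-bank) ROM gives bank 0.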

Likewise MBC2 only has four data pins; that's why its internal battery RAM is arranged in nybbles, and why it only supports up to 2 Mbit ROMs.

Also, MBC1 is only connected to A15, A14 and A13 of the cartridge bus, and MBC2 is only connected to A15, A14 and A8-A0. So MBC1 registers are mirrored over spans of $2000 bytes, and MBC2 registers are mirrored over spans of $100 bytes (you can select a ROM bank by writing to $0100-01FF, $0300-03FF, $0500-05FF, etc.) The reason most monochrome GB games write to $2100 to switch ROM banks is to be compatible with either MBC1 or MBC2.
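The partial address decoding can be sketched like this in Python. The function names and register labels are hypothetical, and the "A8 low selects RAM enable" half of the MBC2 decode is taken from commonly documented MBC2 behaviour rather than from this post:

```python
def mbc1_register(addr):
    """MBC1 sees only A15, A14 and A13, so each register spans $2000 bytes."""
    if addr & 0x8000:
        return None                  # A15 high: not a ROM-area write
    return ("ram_enable", "rom_bank", "upper_bits", "mode")[(addr >> 13) & 3]


def mbc2_register(addr):
    """MBC2 sees A15, A14 and A8-A0; A8 picks between its two registers."""
    if addr & 0xC000:
        return None                  # A15 and A14 must both be low
    return "rom_bank" if addr & 0x0100 else "ram_enable"
```

Under these assumptions a write to $2100 lands on the ROM bank register of both chips, whereas $2000 would hit the ROM bank register on MBC1 but the RAM enable register on MBC2.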

Re those bus timing diagrams: in case it isn't obvious, the reason why accesses to $8000-9FFF don't show any bus activity is that VRAM is on a separate bus on the GB (on the 'Pocket and everything afterwards it's built right into the CPU).

DMG:
13-bit address bus and 8-bit data bus to VRAM
16-bit address bus and 8-bit data bus to WRAM and the cartridge slot
$FF80-FFFE internal to the CPU

GBP:
Unconnected external VRAM address and data bus (maybe it can be enabled and the internal VRAM disabled somehow?)
16-bit address bus and 8-bit data bus to WRAM and the cartridge slot
VRAM and $FF80-FFFE internal to the CPU

GBC:
15-bit address bus and 8-bit data bus to WRAM (the upper 3 bits come from the bank select register)
16-bit address bus and 8-bit data bus to the cartridge slot only
VRAM and $FF80-FFFE internal to the CPU

GBA:
WRAM moved inside the CPU as well. The only external RAM on the GBA is the big slow work RAM (which isn't usable in GBC mode)
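As an illustration of the 15-bit GBC WRAM address, here is a rough Python sketch. The function name is made up, and the rule that a bank value of 0 behaves like 1 comes from commonly documented behaviour of the bank select register (SVBK, $FF70), not from this thread:

```python
def gbc_wram_address(cpu_addr, svbk):
    """Map a CPU address in $C000-$DFFF to a 15-bit address in the 32 KiB WRAM."""
    if not 0xC000 <= cpu_addr <= 0xDFFF:
        raise ValueError("not a WRAM address")
    if cpu_addr < 0xD000:
        bank = 0                     # $C000-$CFFF is always bank 0
    else:
        bank = (svbk & 7) or 1       # $D000-$DFFF is switchable; 0 acts as 1
    return (bank << 12) | (cpu_addr & 0x0FFF)   # upper 3 bits + 12 CPU address bits
```

So $DFFF with bank 7 selected maps to $7FFF, the last byte of the 32 KiB chip.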

Schematics showing pinouts of the GB CPU, MBC1 and MBC2 are at http://fms.komkon.org/GameBoy/Tech/Hardware.html

Post by Rena »

Nice, that answers a lot of questions!

One thing I've never been sure of is what actually is at FEA0-FEFF. It's only marked as "unusable" in documents, but what happens if you try to use it? Some have suggested it partially mirrors OAM.

Also, is FFFF actually within HRAM or is it separate? Can it be accessed during DMA?

Post by AWJ »

Rena wrote:Nice, that answers a lot of questions!

One thing I've never been sure of is what actually is at FEA0-FEFF. It's only marked as "unusable" in documents, but what happens if you try to use it? Some have suggested it partially mirrors OAM.

Also, is FFFF actually within HRAM or is it separate? Can it be accessed during DMA?
No idea about FEA0-FEFF, I doubt it's an OAM mirror though.

FFFF is a memory-mapped register, like everything from FF00 to FF7F. I don't see why it wouldn't be accessible during OAM DMA (it's been discovered that accessing FF46 during DMA restarts the DMA) though some of the registers might have weird side effects.

You know about the OAM corruption hardware bug with 16-bit inc/dec instructions, right? Instructions that don't even perform memory accesses can corrupt OAM, suggesting that the CPU and PPU parts of the die aren't as well segregated as one might expect.

Post by gekkio »

AWJ wrote:Just some comments about the questions on your "emulation accuracy" page.
Thanks! The page is very out of date so I already have answers to many of the questions, but I don't mind discussing these things and sharing knowledge :)
AWJ wrote:I believe someone tested this and found that on the DMG, reads (presumably including opcode fetches) return the byte currently being DMAed. On the GBC, external WRAM (i.e. $C000-$DFFF) is on a separate physical bus from the cartridge slot, probably because of the WRAM bankswitching. If you read WRAM during DMA it has the same effect as on the DMG, but you can apparently run code from ROM normally while DMAing from WRAM (the Wizardry Famicom remakes do this--they don't bother copying their OAM DMA routine to $FF80) There are probably some limitations to executing code in parallel with DMA; the Wizardry games still do a 160-cycle delay loop after triggering DMA.
Yes, I've confirmed this as well. Basically, if you DMA stuff from the cartridge bus and the CPU wants to read stuff at the same time, the DMA wins and both the DMA and CPU see a byte from the current DMA source address. This applies regardless of whether it's an opcode fetch or a load.
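Roughly, in emulator terms, this could look like the following Python sketch. All names are hypothetical; the bus grouping follows AWJ's description (cartridge and WRAM share one external bus on the DMG, separate buses on the GBC). Treating VRAM as its own conflicting bus is an assumption, and OAM and writes are not modelled at all:

```python
def bus_of(addr, model):
    """Return which physical bus an address lives on ("dmg" or "gbc" model)."""
    if 0x8000 <= addr <= 0x9FFF:
        return "vram"
    if addr <= 0x7FFF or 0xA000 <= addr <= 0xBFFF:
        return "external"            # cartridge ROM / cartridge RAM
    if 0xC000 <= addr <= 0xFDFF:     # WRAM and its echo
        return "wram" if model == "gbc" else "external"
    return "internal"                # I/O, HRAM, etc.


def cpu_read(mem, addr, model, dma_source=None):
    """dma_source is the address OAM DMA is reading right now, or None if idle."""
    if dma_source is not None:
        bus = bus_of(addr, model)
        if bus != "internal" and bus == bus_of(dma_source, model):
            return mem[dma_source]   # DMA wins: the CPU sees the DMA byte
    return mem[addr]
```

With DMA running from $C100, a ROM fetch at $0150 returns the DMA byte on the DMG model but the real ROM byte on the GBC model, which matches the Wizardry observation.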
AWJ wrote:I've never seen any test results for writes during OAM DMA, or whether OAM DMA automatically suppresses interrupts.
I'm pretty sure I've checked that interrupts are not suppressed in any way. But I don't think I've tried writes...And I need to publish test ROMs for all these things. I've got sooo many unpublished test ROMs named oam_hell1, oam_hell2, oam_hell3, etc. :)
AWJ wrote:On a real Z80 (and I believe an 8080 as well), "EI's effect is delayed one cycle" is not true so much as "EI actually disables interrupts until after the next instruction" The reason is to ensure that the sequence "EI; RET" is atomic. If you put a hundred EIs in a row, no interrupts can occur between any of them. You should test whether this is true on the GB as well.
Unfortunately, this seems to be untrue on the GB based on a test ROM I wrote. EI doesn't disable interrupts, so given an EI sequence, the interrupt happens between the second and third EIs.
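To compare the two behaviours side by side, here is a toy Python model. It is only a sketch: the function name and the "gb"/"z80" labels are made up, an interrupt is assumed to be pending the whole time, and a CB-prefixed instruction is treated as a single list entry. It returns the index of the instruction before which the interrupt gets dispatched:

```python
def dispatch_index(program, model):
    """program is a list of mnemonics; model is "gb" or "z80"."""
    ime = False
    enable_after_next = False        # "gb": an earlier EI is waiting to take effect
    prev_was_ei = False              # "z80": no acceptance right after an EI

    def can_dispatch():
        return ime and not (model == "z80" and prev_was_ei)

    for i, op in enumerate(program):
        if can_dispatch():
            return i
        if model == "gb":
            if enable_after_next:
                ime = True           # takes effect once this instruction is done
            enable_after_next = (op == "ei")
        elif op == "ei":
            ime = True               # enabled at once, but masked for one instruction
        prev_was_ei = (op == "ei")
    return len(program) if can_dispatch() else None
```

For `["ei", "ei", "ei", "nop"]` the "gb" model dispatches before index 2 (between the second and third EI), while the "z80" model waits until after the `nop`. For `["ld", "ei", "swap"]` the "gb" model dispatches after `swap`, as in the earlier test.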
AWJ wrote:It's not surprising that PUSH has an extra internal delay that POP doesn't. Remember that the GB, like other 8080-family CPUs, has a "full" stack: SP points to the last item pushed. So PUSH has to decrement SP first to generate the address for the write, whereas POP can immediately read memory while incrementing SP in parallel. The 6502 family has an "empty" stack, and pops take one more cycle than pushes do--exactly the opposite of the 8080 family.
Aha! I never thought about this, but it makes perfect sense.
AWJ wrote:FFFF is a memory-mapped register, like everything from FF00 to FF7F. I don't see why it wouldn't be accessible during OAM DMA (it's been discovered that accessing FF46 during DMA restarts the DMA) though some of the registers might have weird side effects.
I've confirmed that it's accessible as usual. What exactly do you mean by "accessing FF46 during DMA restarts the DMA"? Do you mean writing or also reading? Writing indeed restarts the DMA, although the behaviour during the first DMA cycle is slightly different.

I've also thought about $FFFF, and I don't see any reason why it would have to be a separate register. It doesn't really make a difference in emulation, but I find it completely plausible that it's just the last byte of high RAM. After all, all the bits are accessible unlike in the IF register.

Post by AWJ »

gekkio wrote:Did you notice the logic analysis directory under tests in the mooneye-gb repository? I've done some logic analysis on the Game Boy hardware, and you might be interested in things like the write and read timings in the external bus.
Is there any possibility you can do something like this with the SNES? Or has someone already done it (maybe nocash?)

The SNES CPU is a lot closer to a standard 65816 than the GB is to a standard Z80, but it has a rather different bus from a 65816 (two address buses, separate /RD and /WR strobes instead of RD/WR) and I'm curious what the timings are, especially for DMA (which uses both address buses and the single data bus)

Post by lidnariq »

We have some SNES logic analyzer traces from the repair effort with Poot36. It's not all 58-ish signals, but it is 32 of them.

Post by AWJ »

lidnariq wrote:We have some SNES logic analyzer traces from the repair effort with Poot36. It's not all 58-ish signals, but it is 32 of them.
Those are very interesting, thanks for pointing them out. It looks like "FastROM"/3.58MHz cycles have a 50% duty cycle (/RD or /WR is asserted 3 master clocks after the address is put on the bus), and "SlowROM"/2.68MHz cycles stretch the phase in which /RD or /WR is asserted by 2 master clocks (and, for read cycles, presumably the CPU delays latching the data by the same amount).

Unfortunately those traces are missing /PARD and /PAWR (the signals for the "B-bus" or $21xx address range), and as far as I can tell none of them shows any DMA operations (not that you could tell what was going on in DMA without both sets of signals...)

Also there's the little fact that they're traces from a defective CPU...

Post by lidnariq »

There's 25 different traces; Numbers 8 and up do have PARD and PAWR. Listings 12, 13, and 15 seem to show DMA.

The defect only appeared to be that the PLB and PLD instructions corrupted the stack pointer; I don't think there's any reason to believe that would affect timing.

Post by AWJ »

lidnariq wrote:There's 25 different traces; Numbers 8 and up do have PARD and PAWR. Listings 12, 13, and 15 seem to show DMA.

The defect only appeared to be that the PLB and PLD instructions corrupted the stack pointer; I don't think there's any reason to believe that would affect timing.
Yeah, I see the DMA in listing 15 now. Looks like DMA cycles aren't quite the same as SlowROM cycles. In a SlowROM cycle /RD or /WR is asserted for 5 cycles out of 8, but in a DMA cycle /RD and /PAWR are asserted for only 4 cycles out of 8 (and it looks like they're asserted simultaneously, which was the main thing I was curious about. I wonder how byuu came up with that 'two stage pipeline' nonsense...)
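For anyone who wants to check those counts, here is a tiny Python sketch that tallies the active-low samples. The sample values are transcribed by hand from the listing 15 excerpt in this post (one analyzer sample is assumed to equal one master clock); the variable and function names are made up:

```python
# /RD samples 2452-2459: first SlowROM fetch cycle of STA $420B
SLOW_ROM_RD = [1, 1, 1, 0, 0, 0, 0, 0]
# /WR samples 2476-2481: fast write cycle to $420B
FAST_WR = [1, 1, 1, 0, 0, 0]
# /RD samples 2504-2511: first byte of the DMA transfer
DMA_RD = [1, 1, 1, 0, 0, 0, 0, 1]


def asserted_clocks(samples):
    """Count samples where an active-low strobe is asserted (reads as 0)."""
    return samples.count(0)
```

This gives 5 of 8 for the SlowROM cycle, 3 of 6 for the fast cycle, and 4 of 8 for the DMA cycle.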

Here's the relevant section of listing 15 annotated with what's going on during an 8-byte general-purpose DMA transfer:

Code: Select all

Label    > D  CA   CPURD CPUWR PA PARD PAWR RAMSEL REFRSH 
Base     > He Hex   Hex   Hex  He Hex  Hex   Hex    Hex   
__________ __ ____ _____ _____ __ ____ ____ ______ ______ 
----- fetch STA $420B (three slow cycles)
   2452    01 E31F     1     1 05    1    1      1      0
   2453    01 E318     1     1 06    1    1      1      0
   2454    01 E318     1     1 06    1    1      1      0
   2455    FF E318     0     1 06    1    1      1      0
   2456    8D E318     0     1 06    1    1      1      0
   2457    8D E318     0     1 06    1    1      1      0
   2458    8D E318     0     1 06    1    1      1      0
   2459    8D E318     0     1 06    1    1      1      0
   2460    8D E319     1     1 06    1    1      1      0
   2461    8D E319     1     1 06    1    1      1      0
   2462    8D E319     1     1 06    1    1      1      0
   2463    FF E319     0     1 06    1    1      1      0
   2464    0B E319     0     1 06    1    1      1      0
   2465    0B E319     0     1 06    1    1      1      0
   2466    0B E319     0     1 06    1    1      1      0
   2467    0B E319     0     1 06    1    1      1      0
   2468    0B E31B     1     1 06    1    1      1      0
   2469    0B E31A     1     1 06    1    1      1      0
   2470    0B E31A     1     1 06    1    1      1      0
   2471    FF E31A     0     1 06    1    1      1      0
   2472    42 E31A     0     1 06    1    1      1      0
   2473    42 E31A     0     1 06    1    1      1      0
   2474    42 E31A     0     1 06    1    1      1      0
   2475    42 E31A     0     1 06    1    1      1      0
----- write to $420B (fast cycle, /WR asserted even though it's an internal CPU register!)
   2476    42 E31B     1     1 06    1    1      1      0
   2477    42 420B     1     1 02    1    1      1      0
   2478    42 420B     1     1 02    1    1      1      0
   2479    01 420B     1     0 02    1    1      1      0
   2480    01 420B     1     0 02    1    1      1      0
   2481    01 420B     1     0 02    1    1      1      0
----- fetch RTL (slow cycle)
   2482    01 E31B     1     1 06    1    1      1      0
   2483    01 E31B     1     1 06    1    1      1      0
   2484    01 E31B     1     1 06    1    1      1      0
   2485    FF E31B     0     1 06    1    1      1      0
   2486    6B E31B     0     1 06    1    1      1      0
   2487    6B E31B     0     1 06    1    1      1      0
   2488    6B E31B     0     1 06    1    1      1      0
   2489    6B E31B     0     1 06    1    1      1      0
----- DMA pre-sync: align to a multiple of 8 clocks since power-on
   2490    6B E31F     1     1 06    1    1      1      0
   2491    6B E31C     1     1 07    1    1      1      0
   2492    6B E31C     1     1 07    1    1      1      0
   2493    6B E31C     1     1 07    1    1      1      0
   2494    6B E31C     1     1 07    1    1      1      0
   2495    6B FFFF     1     1 CF    1    1      1      0
----- DMA setup: 8 clocks
   2496    6B FFFF     1     1 CF    1    1      1      0
   2497    6B FFFF     1     1 CF    1    1      1      0
   2498    6B FFFF     1     1 CF    1    1      1      0
   2499    6B FFFF     1     1 CF    1    1      1      0
   2500    6B FFFF     1     1 CF    1    1      1      0
   2501    6B FFFF     1     1 CF    1    1      1      0
   2502    6B FFFF     1     1 CF    1    1      1      0
   2503    6B FFFF     1     1 CF    1    1      1      1
----- DMA transfer: 8 clocks x 8 bytes
   2504    6B F400     1     1 08    1    1      1      0
   2505    6B F400     1     1 08    1    1      1      0
   2506    6B F400     1     1 08    1    1      1      0
   2507    FF F400     0     1 08    1    0      1      0
   2508    00 F400     0     1 08    1    0      1      0
   2509    00 F400     0     1 08    1    0      1      0
   2510    00 F400     0     1 08    1    0      1      0
   2511    00 F401     1     1 08    1    1      1      0
   2512    00 F401     1     1 08    1    1      1      0
   2513    00 F401     1     1 08    1    1      1      0
   2514    00 F401     1     1 08    1    1      1      0
   2515    FF F401     0     1 08    1    0      1      0
   2516    00 F401     0     1 08    1    0      1      0
   2517    00 F401     0     1 08    1    0      1      0
   2518    00 F401     0     1 08    1    0      1      0
   2519    00 F403     1     1 08    1    1      1      0
   2520    00 F402     1     1 08    1    1      1      0
   2521    00 F402     1     1 08    1    1      1      0
   2522    00 F402     1     1 08    1    1      1      0
   2523    FF F402     0     1 08    1    0      1      0
   2524    CE F402     0     1 08    1    0      1      0
   2525    CE F402     0     1 08    1    0      1      0
   2526    CE F402     0     1 08    1    0      1      0
   2527    CE F403     1     1 08    1    1      1      0
   2528    CE F403     1     1 08    1    1      1      0
   2529    CE F403     1     1 08    1    1      1      0
   2530    CE F403     1     1 08    1    1      1      0
   2531    FF F403     0     1 08    1    0      1      0
   2532    39 F403     0     1 08    1    0      1      0
   2533    39 F403     0     1 08    1    0      1      0
   2534    39 F403     0     1 08    1    0      1      0
   2535    39 F407     1     1 08    1    1      1      0
   2536    39 F404     1     1 08    1    1      1      0
   2537    39 F404     1     1 08    1    1      1      0
   2538    39 F404     1     1 08    1    1      1      0
   2539    FF F404     0     1 08    1    0      1      0
   2540    18 F404     0     1 08    1    0      1      0
   2541    18 F404     0     1 08    1    0      1      0
   2542    18 F404     0     1 08    1    0      1      0
   2543    18 F405     1     1 08    1    1      1      0
   2544    18 F405     1     1 08    1    1      1      0
   2545    18 F405     1     1 08    1    1      1      0
   2546    18 F405     1     1 08    1    1      1      0
   2547    FF F405     0     1 08    1    0      1      0
   2548    63 F405     0     1 08    1    0      1      0
   2549    63 F405     0     1 08    1    0      1      0
   2550    63 F405     0     1 08    1    0      1      0
   2551    63 F407     1     1 08    1    1      1      0
   2552    63 F406     1     1 08    1    1      1      0
   2553    63 F406     1     1 08    1    1      1      0
   2554    63 F406     1     1 08    1    1      1      0
   2555    FF F406     0     1 08    1    0      1      0
   2556    10 F406     0     1 08    1    0      1      0
   2557    10 F406     0     1 08    1    0      1      0
   2558    10 F406     0     1 08    1    0      1      0
   2559    10 F407     1     1 08    1    1      1      0
   2560    10 F407     1     1 08    1    1      1      0
   2561    10 F407     1     1 08    1    1      1      0
   2562    10 F407     1     1 08    1    1      1      0
   2563    FF F407     0     1 08    1    0      1      0
   2564    7C F407     0     1 08    1    0      1      0
   2565    7C F407     0     1 08    1    0      1      0
   2566    7C F407     0     1 08    1    0      1      0
   2567    7C E2FF     1     1 08    1    1      1      0
----- DMA teardown: 8 clocks
   2568    7C 22FF     1     1 08    1    1      1      0
   2569    7C 22FF     1     1 08    1    1      1      0
   2570    7C 22FF     1     1 08    1    1      1      0
   2571    7C 22FF     1     1 08    1    1      1      0
   2572    7C 22FF     1     1 08    1    1      1      0
   2573    7C 22FF     1     1 08    1    1      1      0
   2574    7C 22FF     1     1 08    1    1      1      0
   2575    7C 22FF     1     1 CF    1    1      1      0
----- DMA post-sync: align to a multiple of 6 clocks since start of pre-sync
   2576    7C 22FF     1     1 CF    1    1      1      0
   2577    7C 22FF     1     1 CF    1    1      1      0
   2578    7C 22FF     1     1 CF    1    1      1      0
   2579    7C 22FF     1     1 CF    1    1      1      0
----- opcode execution resumes: two internal operation cycles for RTL
   2580    7C E31C     1     1 07    1    1      1      0
   2581    7C E31C     1     1 07    1    1      1      0
   2582    7C E31C     1     1 07    1    1      1      0
   2583    7C E31C     1     1 07    1    1      1      0
   2584    7C E31C     1     1 07    1    1      1      0
   2585    7C E31C     1     1 07    1    1      1      0
   2586    7C E31C     1     1 07    1    1      1      0
   2587    7C E31C     1     1 07    1    1      1      0
   2588    7C E31C     1     1 07    1    1      1      0
   2589    7C E31C     1     1 07    1    1      1      0
   2590    7C E31C     1     1 07    1    1      1      0
   2591    7C E31C     1     1 07    1    1      1      0
----- fetch return address from stack
   2592    7C E1FC     1     1 0F    1    1      1      0
   2593    7C 01FA     1     1 CE    1    1      0      0
   2594    7C 01FA     1     1 CE    1    1      0      0
   2595    7C 01FA     0     1 CE    1    1      0      0
   2596    3F 01FA     0     1 CE    1    1      0      0
   2597    3F 01FA     0     1 CE    1    1      0      0
   2598    3F 01FA     0     1 CE    1    1      0      0
   2599    3F 01FA     0     1 CE    1    1      0      0
(snip)
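The alignment arithmetic in the annotations can be replayed with a short Python sketch. This is only an illustration: the function name is made up, sample numbers are treated as master clocks, and using the sample number itself as "clocks since power-on" is an assumption that happens to fit this trace (2496 is a multiple of 8). What happens when the start is already aligned is not shown by the trace; the sketch adds no extra clocks in that case.

```python
def dma_resume_clock(start, nbytes, cpu_cycle=6):
    """Return the clock at which opcode execution resumes after a DMA."""
    t = start
    t += -t % 8                      # pre-sync: up to the next multiple of 8
    t += 8                           # DMA setup
    t += 8 * nbytes                  # 8 clocks per transferred byte
    t += 8                           # DMA teardown
    t += -(t - start) % cpu_cycle    # post-sync: multiple of 6 since start of pre-sync
    return t
```

Starting the pre-sync at sample 2490 with an 8-byte transfer gives 2580, which is where the RTL internal cycles begin in the listing.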