
All times are UTC - 7 hours



Post new topic Reply to topic  [ 138 posts ]  Go to page Previous  1 ... 6, 7, 8, 9, 10  Next
PostPosted: Sun Jun 26, 2016 8:36 pm 
Offline

Joined: Sun Apr 13, 2008 11:12 am
Posts: 6273
Location: Seattle
AWJ wrote:
I want to cram as many tests into one ROM as I can to minimize the amount of soldering and EPROM burning some poor sap has to do.
I was given a DIP-socketed SuperFX (GSU-1) cart, and I have EEPROMs, and I'm willing to run tests. In case that helps.


PostPosted: Sun Jun 26, 2016 10:15 pm 
Offline

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1338
> Unfortunately it comes at a cost of about 5 FPS in all games with coprocessors.

Yeah ... truth be told, we're playing it fast and loose with coprocessor synchronization. But I'm running out of overhead to sacrifice to improve it further.

SA-1 memory conflict stalling is going to be the thing that totally destroys us. We're chasing our tails over minor SuperFX timing issues, but the SA-1 is probably running 30% faster than it should.

And forget weird usages of XOR, instruction code in char arrays, or anything else ... the SNES CPU inheriting PPUcounter is far and away the grossest hack in all of higan. It has absolutely nothing to do with a real SNES, and is likely a whole lot less clear than driving the IRQ logic from the PPU's H/Vblank signals would be (I can't say for sure; I've never actually tried it. It would definitely lose the crazy "long-dot calculation" shifter.)

> bsnes already fails to handle multiple coprocessors with IRQ outputs correctly--each coprocessor directly sets the IRQ pin as if it had exclusive control over it, there's no attempt to handle "I'm no longer asserting /IRQ but the other coprocessor still is"

This is important for more than just the SNES, though. The Famicom is full of nonsense like irqLine (used by the PPU), irqLineAPU (obviously used by the APU), and weird handling by cartridges.

The best idea I have is for the IRQ line to be a uint, with each source setting/clearing its own bit. It's not hardware-accurate, but how do we simulate open-collector logic in C, anyway? By doing this, testing if(irqLine) will work if any source is driving the IRQ line. [and it's supposed to be /IRQ anyway ... we sacrifice some hardware purity for sane design.]
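A minimal sketch of that bitmask idea (source names here are made up for illustration, this isn't higan code):

```cpp
#include <cstdint>

// Each IRQ source owns one bit of the shared line.
// Bit set = that source is pulling the (conceptual) /IRQ line low.
enum IrqSource : uint32_t {
  IrqAPU  = 1u << 0,
  IrqCart = 1u << 1,
};

struct IrqLine {
  uint32_t bits = 0;

  void set(uint32_t source)   { bits |= source;  }  // source asserts IRQ
  void clear(uint32_t source) { bits &= ~source; }  // source releases IRQ
  bool asserted() const { return bits != 0; }       // wired-OR across sources
};
```

The key property is that one source releasing its bit doesn't mask another source that's still asserting.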

> the idea of a cartridge with both SuperFX and SA-1 coprocessors is so utterly impossible and ridiculous on a hardware level that I consider attempting to support such a thing in an emulator to be actively harmful to the SNES scene

... is now a bad time to tell you that I just added support for 128KiB SNES PPU VRAM? :D

Don't worry, it's mostly just a joke. I'm only doing it because the bits are actually there in the registers and throughout the calculations. higan will *never* ship with this option enabled, nor will it be possible to enable from the GUI. Even with the MSU1, a key goal was to ensure it actually worked with stock hardware.

That said, the change that brought this about was implementing VRAM as a uint16[32768] array (you could also do uint8[2][32768], but the uint16 version is faster.) This simplifies tons of computations and removes excess shifting (it matches the way the real PPU used two separate VRAM chips in parallel much better), and it's faster than uint8[65536] was.
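For the sake of illustration, a word-organized VRAM could look roughly like this (the struct and method names are hypothetical, not higan's):

```cpp
#include <cstdint>

// VRAM stored as 32K 16-bit words, mirroring the two parallel 8-bit chips.
// Byte-wise MMIO access picks the low or high chip out of the same word.
struct Vram {
  uint16_t data[32768] = {};

  // Tile fetches and similar word operations need no shifting at all.
  uint16_t readWord(uint16_t addr) const { return data[addr & 0x7fff]; }

  // Byte access: low chip = bits 0-7, high chip = bits 8-15.
  uint8_t readByte(uint16_t addr, bool high) const {
    uint16_t word = data[addr & 0x7fff];
    return high ? uint8_t(word >> 8) : uint8_t(word & 0xff);
  }
};
```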

> you might want to reconsider those r14/r15 changes you rejected. Particularly since accurate emulation of the icache and other things will probably require differentiating between r15-modified-by-MMIO and r15-modified-by-instruction anyway.

We'll see how the cards land. If it turns out there is indeed an MMIO/instruction difference, and I can't think of something simpler (less code), then I'll relent and accept your changes.


PostPosted: Sun Jun 26, 2016 11:15 pm 
Offline

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 429
byuu wrote:
SA-1 memory conflict stalling is going to be the thing that totally destroys us. We're chasing our tails over a bit of SFX timing issues, but the SA1 is probably running 30% faster than it should.


It's not just the memory conflicts, bsnes' SA1 is full of things that make my inner voice scream "THIS IS BLATANTLY WRONG, HARDWARE CANNOT POSSIBLY WORK THIS WAY":

-DMA completes instantaneously
-masked IRQs fail to pull the SA-1 out of WAI (that this bug is even possible is the fault of pushing far too much interrupt logic out of the 65816 core and into the derived classes, because it started as "an S-CPU emulator" rather than "a 65816 emulator")
-Interrupt vectors are fetched instantaneously (even if they're fetched from internal registers it should still take a cycle per byte)
-the readable IRQ flags are set when each particular interrupt is taken rather than when it is requested (this would require an unrealistic degree of integration between the SA1's 65816 core and its on-chip peripherals)

But I'm not going to touch it until someone can do hardware tests, because I'm just as unwilling as you are to replace guesses with different guesses (example: what happens when the SA1 does a normal data read from $00FFEA? Does it read its own NMI vector, the S-CPU's NMI vector, or the actual ROM data? We only know what the answer is for the S-CPU because there isn't any signal on the cartridge slot to distinguish vector reads from data reads)

Quote:
And forget weird usages of XOR, instruction code in char arrays, or anything else ... the SNES CPU inheriting PPUcounter is far and away the grossest hack in all of higan. Has absolutely nothing to do with a real SNES; and is likely a whole lot less clear than driving the IRQ logic from the H/Vblank signals from the PPU would be (can't say for sure, I've never actually tried it. It would definitely lose the crazy "long-dot calculation" shifter.)


I don't understand why you consider that a gross hack. The S-CPU really does have its own internal counters that it uses for timer IRQs. Otherwise it would need 18 pins connected to the PPU just to read the counters.

Likewise, the long-dot calculation is completely accurate to hardware and I don't understand what you propose to replace it with.

Quote:
The Famicom is full of nonsense like irqLine (used by the PPU), irqLineAPU (obviously used by the APU), and weird handling by cartridges.


What? The Famicom PPU doesn't generate IRQs, it only generates NMIs. The two onboard IRQ sources are both part of the APU: the DMC channel (triggers when sample DMA finishes) and the misnamed "frame counter" (a dumb 60Hz timer that's not actually related to video "frames" at all--it's 60Hz even on a PAL unit)

Quote:
That said, the change that brought this about was implementing VRAM as a uint16[32768] array (you could also do uint8[2][32768], but the uint16 one is faster.) This leads to simplifying and removing excess shifting from tons of computations (it matches the way the real PPU used two separate VRAM chips in parallel much better), and is faster than uint8[65536] was.


I've been meaning to do this in bsnes-classic for a long time but haven't gotten around to it; having three PPU implementations plus a UI and a debugger that are full of assumptions that "memory" is synonymous with arrays of bytes is a bit of a drag on my enthusiasm.

Did you do the same for CGRAM yet? Treating CGRAM as an array of bytes is even more nonsensical than VRAM, because it is never accessed bytewise by the hardware (MMIO writes go through a latch and don't actually hit the CGRAM until the S-CPU has written a full word)
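The write latch being described works roughly like this sketch (names and the 7-bit high-byte mask are my assumptions about the behavior, not code from either emulator):

```cpp
#include <cstdint>

// CGRAM as 256 15-bit words. Byte writes via the data port are buffered in a
// latch; the word is only committed once both halves have been written.
struct Cgram {
  uint16_t data[256] = {};
  uint8_t addr = 0;     // word address (set via the address port, omitted here)
  bool high = false;    // does the next byte write target the high half?
  uint8_t latch = 0;

  void write(uint8_t byte) {
    if (!high) {
      latch = byte;     // low byte only buffered, CGRAM untouched
      high = true;
    } else {            // commit the full word; bit 15 doesn't exist in CGRAM
      data[addr++] = uint16_t(byte & 0x7f) << 8 | latch;
      high = false;
    }
  }
};
```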


Last edited by AWJ on Mon Jun 27, 2016 1:34 pm, edited 1 time in total.

PostPosted: Mon Jun 27, 2016 3:18 am 
Offline

Joined: Thu Aug 12, 2010 3:43 am
Posts: 1589
byuu wrote:
... is now a bad time to tell you that I just added support for 128KiB SNES PPU VRAM? :D

Don't worry, it's mostly just a joke. I'm only doing it because the bits are actually there in the registers and through the calculations. higan will *never* ship with this option enabled, nor possible to enable from the GUI. Even with the MSU1, a key goal was to ensure it actually worked with stock hardware.

Eh, it's not as crazy as it sounds, people have added VRAM to the Mega Drive, doing it to the SNES is not far-fetched honestly. If anything, it's probably more likely to happen than people trying to cram multiple coprocessors into a single cartridge.


PostPosted: Mon Jun 27, 2016 3:52 am 
Offline
User avatar

Joined: Mon Jan 03, 2005 10:36 am
Posts: 2961
Location: Tampere, Finland
AWJ wrote:
...and the misnamed "frame counter" (a dumb 60Hz timer that's not actually related to video "frames" at all--it's 60Hz even on a PAL unit)

It would be far more useful if it was indeed 60 Hz on PAL also, but it's actually 50 Hz: viewtopic.php?p=160349#p160349

_________________
Download STREEMERZ for NES from fauxgame.com! — Some other stuff I've done: kkfos.aspekt.fi


PostPosted: Mon Jun 27, 2016 12:38 pm 
Offline

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1338
> It's not just the memory conflicts, bsnes' SA1 is full of things that make my inner voice scream "THIS IS BLATANTLY WRONG, HARDWARE CANNOT POSSIBLY WORK THIS WAY":

You and me both.

Now what's really scary is how much worse everyone else's is. Probably 30% of the entire chip's feature set is completely unemulated in Snes9X. Even more in ZSNES. Why? In Snes9X's case, games don't use those parts (like H/V IRQs); and in ZSNES' case, unpopular games don't use those parts (like the golf and horse racing games.) They both have horrendous lists of ROM patches, keyed on game ROM title, that hack out wait loops to try to speed up SA-1 games.

To add to your list, a huge issue is that a lot of the I/O registers are supposed to be readable only from either the SA-1 or the S-CPU. Yet in most cases, the emulation lets both freely read/write all the registers. This is poison to SA-1 ROM hackers, and is the exact kind of thing that leads to emulator-only hacks.

higan is basically ZSNES-level on the SA-1. And everything else is much worse than even that. The SA-1 is basically half the complexity of an SNES emulator in a single coprocessor, used by ~10 US games (and some boring horse racing/Japanese chess/fishing games in Japan.)

> I don't understand why you consider that a gross hack? The S-CPU really does have its own internal counters that it uses for timer IRQs. It would need 18 pins connected to the PPU just to read the counters from otherwise.

Exactly right.

But the SNES CPU doesn't know anything about interlace mode, or scanlines being 1364 clocks (usually, excluding color burst lines), nor about frames being 262/263/312/313 scanlines long. These are all hacks.

The SNES CPU only goes off the PPU Vblank and Hblank pins. When it sees Hblank go 1->0, it resets the Hcounter and clocks the Vcounter. And when it sees Vblank go 1->0, it resets the Vcounter. That's it. The PPU otherwise just runs dumb incrementing counters each clock tick.

This is why the SNES CPU doesn't perceive long-dots. And given the delay between these logic units clocking and IRQs/NMIs polling them, this kind of design starts to reveal why interrupts don't fire on certain H/VTIME positions without the need for hard-coded lists.
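The edge-driven counters described above could be sketched like this (a hypothetical design, explicitly not what higan currently does):

```cpp
#include <cstdint>

// CPU-side H/V counters driven purely by the PPU's Hblank/Vblank pins.
// The CPU never needs to know how long a scanline or frame actually is.
struct CpuCounters {
  uint32_t hcounter = 0;
  uint32_t vcounter = 0;
  bool prevHblank = false;
  bool prevVblank = false;

  // Called every tick with the current PPU pin states.
  void tick(bool hblank, bool vblank) {
    hcounter++;
    if (prevHblank && !hblank) {  // Hblank 1->0: reset H, clock V
      hcounter = 0;
      vcounter++;
    }
    if (prevVblank && !vblank) {  // Vblank 1->0: reset V
      vcounter = 0;
    }
    prevHblank = hblank;
    prevVblank = vblank;
  }
};
```

As the post notes, this design is agnostic: if the PPU took 1834 clocks per scanline and 512 scanlines per frame, nothing here would need to change.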

A properly designed CPU emulator wouldn't care at all about the PPU. You could decide to make the PPU take 1834 clocks per scanline, and 512 scanlines per frame, and the CPU would happily go along with it and work.

The problem is, this proper design would require us to single-step the PPU and the CPU. Every single clock tick of the PPU would potentially change the H/Vblank pins, and the CPU would have to poll those pins and act on the results every clock tick. (well, every other tick. So ~10,000,000 context switches each way per emulated second.)

Right now, our solution is to embed PPU-specific video timing information inside PPUcounter inside the CPU core.

> Likewise, the long-dot calculation is completely accurate to hardware and I don't understand what you propose to replace it with.

The long-dots are what the PPU does, and you see it with the PPU's counter latches. But they don't exist in the CPU core ... the CPU has its own separate H/V counters as described above. That's why H/VTIME from the CPU core is unaffected by them.

The long-dots are also terrifying, by the way. They actually move around between resets on the same console. I've seen them trigger on {321,322,323} and {325,326,327}, respectively. Most likely, there's a PPU phase situation like with the NES PPU. But then I've also had SNES decks where I was unable to get the dots to move at all. So some appear to have a fixed phase.

> What? The Famicom PPU doesn't generate IRQs, it only generates NMIs.

Then that's what I meant to say, sorry. I haven't really worked on the Famicom code for over four years, so my knowledge is getting rusty.

> I've been meaning to do this in bsnes-classic for a long time but haven't gotten around to it; having three PPU implementations plus a UI and a debugger that are full of assumptions that "memory" is synonymous with arrays of bytes is a bit of a drag on my enthusiasm.

Yes!! I am glad it's not just me.

Believe me, I'm devastated at the loss of the other PPU cores more than any users would be. I could actually play Contra 3 and Super Mario World on my tiny little $80 Intel NUC Atom CPU system that's sipping 8W of power.

But now, I have to use my i7-2600K system. I can't even run higan on my laptop anymore.

Non-developers often miss this, but already-written code isn't free. Every single time I'd change something to improve the accuracy profile of higan, both the balanced and performance cores would break. Then I'd spend lots of time fixing them, then fighting through regressions from said changes. And every release would take me hours to build and put together instead of ten minutes (and an entire day back when I had 32-bit x 3 + 64-bit x 3 builds, all run through a profile-guided optimizer. Have fun training 20 games to get all the codepaths hit when the profiling build runs at 3fps. Then do the whole thing five more times.)

> Did you do the same for CGRAM yet? Treating CGRAM as an array of bytes is even more nonsensical than VRAM, because it is never accessed bytewise by the hardware (MMIO writes go through a latch and don't actually hit the CGRAM until the S-CPU has written a full word)

No, but great idea, thanks. I'll get right on this.

> Eh, it's not as crazy as it sounds, people have added VRAM to the Mega Drive, doing it to the SNES is not far-fetched honestly.

We believe it's entirely possible. The unused pins exist on the S-PPU1, and we have the diagram. Someone just needs to desolder the SNES VRAM, and put new chips on the boards, and then run one wire per chip by hand to the PPU.

It might not be possible with the later hardware revisions that integrate the PPUs into the same die as the CPU, though.

But anyway ... it's unreasonable to expect people to do this. It was really, again, just to more faithfully document the PPU registers' extra bits. Before, we were masking things in an unrealistic way.

I don't want a situation where there's a bunch of "higan-only" ROM hacks floating around. That was the whole reason I started on bsnes in the first place.

(The MSU1 isn't the same; there's at least several hundred owners of the sd2snes board.)


PostPosted: Mon Jun 27, 2016 1:24 pm 
Offline
User avatar

Joined: Sun Jul 01, 2012 6:44 am
Posts: 337
Location: Lion's den :3
byuu wrote:
Someone just needs to desolder the SNES VRAM, and put new chips on the boards, and then run one wire per chip by hand to the PPU.

Let me know what type of SRAM chips I'd need as a replacement, and where exactly to run the wires. Would love to mod one of my SNES consoles for 128K of VRAM. :D (Yes, I am proficient in SMD de-/soldering.)

_________________
Some of my projects:
Furry RPG!
Unofficial SNES PowerPak firmware
(See my GitHub profile for more)


PostPosted: Mon Jun 27, 2016 2:49 pm 
Offline

Joined: Sun Apr 13, 2008 11:12 am
Posts: 6273
Location: Seattle
They're ordinary 32 KiB 28-pin SRAMs, aren't they?

The tricky part is that there's no pin-compatible 64 KiB SRAM.
28-pin SOP SRAMs are (apparently) standardized at a package width of 330 thou and a lead-tip-to-lead-tip span of 465 thou, but 32-pin SOP SRAMs are instead 450 thou and 550 thou. You can get 32-pin SOJ SRAMs with a package width of 300 thou... but their legs (330 thou) wouldn't reach the 400 thou footprint on my SHVC-CPU-01. And nothing else has the right leg spacing.

Wait, no, the 400 thou SOJs should have a (marginally) compatible footprint, e.g. the IS6*C1024AL-*K*. Other than the four extra contacts off the top.

edit: If you still want to try, the rewiring would be simple (because RAM address lines are interchangeable). Connect pins 31,32 to +5V and pins 1,2 to PPU1 pin 46 ("VA15"). Align the bottom (128KiB SRAM pins 16,17) with the original footprint bottom (silkscreen 14,15)


Last edited by lidnariq on Mon Jun 27, 2016 8:51 pm, edited 1 time in total.

PostPosted: Mon Jun 27, 2016 4:19 pm 
Offline

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1338
Also keep in mind that an unknown number of retail games will break.

Right now, I only know of Yoshi's Island. It sets A15 differently when transferring tiles versus when setting the tiledata address. It works only because of data mirroring (specifically, because A15 is ignored.)

I imagine that if even Nintendo themselves couldn't keep this straight, lots of other games will suffer issues as well. Most likely, if Nintendo ever -did- release an SNES+, the 128KiB VRAM would have to be gated behind a mode-setting bit to preserve backward compatibility.


PostPosted: Mon Jun 27, 2016 11:45 pm 
Offline

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1338
Quote:
Did you do the same for CGRAM yet? Treating CGRAM as an array of bytes is even more nonsensical than VRAM, because it is never accessed bytewise by the hardware


This is not actually true ... CGDATAREAD will fetch one byte at a time from CGRAM.

If this were not the case, then the low-byte read would have to fetch the word and stuff the result into a latch for the later high-byte read. Which would be the inverse of how CGRAM writes work. The same goes for both OAM and VRAM accesses as well.

Still, not a big deal to compensate for. I've added bool vramAccessible() to tell whether to block reads/writes or not; and with that, eliminated all but oamWrite, which I use to call object.update() for the pre-decoded sprite attributes.

OAM is still an ugly uint8[544] array, but more realistically ... it should probably be either:
uint32 oamLo[128]; uint8 oamHi[32]; or:
uint16 oamLo[256]; uint8 oamHi[32];

And we'll have to be careful to update all the nasty address shuffling I'm doing then:
Code:
r.oamBaseAddress = ((data & 0x01) << 9) | (r.oamBaseAddress & 0x01fe);

Which is basically hiding the latch inside the address value.
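Making that latch explicit instead of hiding it in the address value might look like this (a hypothetical restructuring, assuming the usual $2102/$2103 split where $2102 holds the low eight bits of the OAM word address and $2103 d0 holds bit 8):

```cpp
#include <cstdint>

// Keep the two register pieces separate and combine them only on demand,
// rather than packing bit 9 of the byte address into oamBaseAddress.
struct OamAddress {
  uint8_t low = 0;    // $2102: word address bits 0-7
  bool high = false;  // $2103 d0: word address bit 8

  // Byte address into the 544-byte OAM, equivalent to the packed form.
  uint16_t byteAddress() const {
    return uint16_t(high) << 9 | uint16_t(low) << 1;
  }
};
```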


PostPosted: Wed Jun 29, 2016 5:15 pm 
Offline

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 429
byuu wrote:
Quote:
Did you do the same for CGRAM yet? Treating CGRAM as an array of bytes is even more nonsensical than VRAM, because it is never accessed bytewise by the hardware


This is not actually true ... CGDATAREAD will fetch one byte at a time from CGRAM.

If this were not the case, then the low-byte would have to fetch the word, and stuff the result into a latch for the high byte read later. Which would be the inverse of how CGRAM writes work. The same goes for both OAM and VRAM accesses as well.


If you read CGDATAREAD twice during rendering (when the address read depends on the current pixel being rendered), do you always get the lower and upper halves of the same palette entry, or is it possible to read the lower half of one palette entry and the upper half of a different one? The former would indicate that there's a latch, the latter that the PPU directly reads the CGRAM every time and throws away one half of the bits or the other half.

Quote:
Still, not a big deal to compensate for. I've added bool vramAccessible() to tell whether to block reads/writes or not; and with that, eliminated all but oamWrite, which I use to call object.update() for the pre-decoded sprite attributes.


How are you going to handle debugger watchpoints on VRAM/CGRAM addresses? (OAM watchpoints aren't especially useful)

Also, the balanced PPU which you've thrown away had a fairly complicated test for whether VRAM was accessible, that differed between reads and writes, and attempted (unsuccessfully) to handle an edge case where writing during the very last cycle of vblank would write the previous value on the data bus instead of the intended data. I assume you've simply forgotten about it and haven't suddenly decided that purity of code style is more important than emulation accuracy :mrgreen:

Quote:
OAM is still an ugly uint8[544] array, but more realistically ... it should probably be either:
uint32 oamLo[128]; uint8 oamHi[32]; or:
uint16 oamLo[256]; uint8 oamHi[32];


I would suggest handling the OAM the same way you handle the CPU DMA registers: store it as an array of sprite structs, and reconstruct the bytes when a MMIO read happens. Writing to "high OAM" would set d8 of xpos and the size flag for four consecutive sprites, and reading from "high OAM" would extract the same bits.
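That reconstruction could be sketched roughly as follows (struct and field names are illustrative, not from either codebase):

```cpp
#include <cstdint>

// OAM as 128 decoded sprite structs; "high OAM" bytes are rebuilt on access.
struct Sprite {
  uint16_t x = 0;          // 9-bit X position; d8 lives in high OAM
  uint8_t y = 0, tile = 0, attr = 0;
  bool large = false;      // size flag from high OAM
};

struct Oam {
  Sprite sprite[128];

  // One high-OAM byte packs d8-of-X and the size flag for four sprites.
  uint8_t readHigh(uint8_t index) const {  // index: 0-31
    uint8_t data = 0;
    for (int i = 0; i < 4; i++) {
      const Sprite& s = sprite[index * 4 + i];
      data |= uint8_t((s.x >> 8) & 1) << (i * 2);
      data |= uint8_t(s.large) << (i * 2 + 1);
    }
    return data;
  }

  void writeHigh(uint8_t index, uint8_t data) {
    for (int i = 0; i < 4; i++) {
      Sprite& s = sprite[index * 4 + i];
      s.x = (s.x & 0xff) | uint16_t((data >> (i * 2)) & 1) << 8;
      s.large = (data >> (i * 2 + 1)) & 1;
    }
  }
};
```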


PostPosted: Wed Jun 29, 2016 10:00 pm 
Offline

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1338
> If you read CGDATAREAD twice during rendering (when the address read depends on the current pixel being rendered), do you always get the lower and upper halves of the same palette entry, or is it possible to read the lower half of one palette entry and the upper half of a different one?

Right now, VRAM, low OAM and CGRAM all return one byte at a time, whereas writes latch into 16-bit pairs. anomie's docs don't address this issue at all, so I guess we'd have to do more hardware tests to find out which is the case.

I understand why you'd buffer writes to a 16-bit interface like that; but it doesn't accomplish much with reads versus just throwing half the data away.

> How are you going to handle debugger watchpoints on VRAM/CGRAM addresses? (OAM watchpoints aren't especially useful)

At the rate loki is going, it'll probably never come up =(

But I would probably have to use vram.get(), vram.set() [with hooks in each] for things that should alert the debugger, and vram.operator[]& for things that should not.

> I assume you've simply forgotten about it and haven't suddenly decided that purity of code style is more important than emulation accuracy

I lost the test to verify that value. I don't even know if it's accurate anymore given the bus hold delays now.

It's somewhere around item #617 on the list of things to do. The last write turning into open-bus is indeed tricky, and would also necessitate a vram.set() scenario.

> I would suggest handling the OAM the same way you handle the CPU DMA registers: store it as an array of sprite structs, and reconstruct the bytes when a MMIO read happens. Writing to "high OAM" would set d8 of xpos and the size flag for four consecutive sprites, and reading from "high OAM" would extract the same bits.

Almost what I do.

I have the array of decoded sprite structs, but I also have a packed byte version for faster OAMDATAREADs. But, it's probably exceedingly rare to read from OAM.

The high OAM thing definitely wrecks treating it as raw memory with bit-fields. I get they couldn't fit things in 32-bits, but what an ugly solution for everyone. Developers must have hated that. The OBC1 designers the most of all.


PostPosted: Thu Jun 30, 2016 4:09 pm 
Offline

Joined: Mon Jul 02, 2012 7:46 am
Posts: 759
byuu wrote:
> bsnes already fails to handle multiple coprocessors with IRQ outputs correctly--each coprocessor directly sets the IRQ pin as if it had exclusive control over it, there's no attempt to handle "I'm no longer asserting /IRQ but the other coprocessor still is"

This is important for more than just the SNES, though. The Famicom is full of nonsense like irqLine (used by the PPU), irqLineAPU (obviously used by the APU), and weird handling by cartridges.

The best idea I have is for the IRQ line to be a uint, and all the sources set/clear their individual bits. It's not hardware-accurate, but how do we simulate open-collector logic in C, anyway? By doing this, testing if(irqLine) will work if any of the other processors drive the IRQ line. [and it's supposed to be /IRQ anyway ... we sacrifice some hardware purity for sane design.]


The technically correct way would be for each chip to keep its own publicly accessible IRQ variable; when you want to read the line, you AND them all together and treat a result of 0 as IRQ asserted (for active-low), or OR them and treat nonzero as asserted (for active-high). Your idea of having a single global IRQ uint (with each chip controlling a single bit) is basically the same thing, but given your goal of clarity over performance, I'd go with the first one.
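The active-low (wired-AND) variant of this, as a rough sketch with made-up names:

```cpp
#include <cstdint>

// Each chip owns its piece of the line; nobody sets a shared variable.
struct Chip {
  uint8_t irqLine = 1;  // active-low: 1 = released (pulled high), 0 = asserting
};

// The shared /IRQ is computed on read: low (asserted) if ANY chip drives it low.
inline bool irqAsserted(const Chip& ppu, const Chip& apu, const Chip& cart) {
  return (ppu.irqLine & apu.irqLine & cart.irqLine) == 0;
}
```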


PostPosted: Thu Jun 30, 2016 5:56 pm 
Offline

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1338
That's a good idea too, thank you.

Either way, we get to a point where one chip has to know about the others.

Either the CPU is asking for APU.irq | Cart.irq; or the APU and Cart are asking for CPU.irqBit(n).

I think I like your method better. It's not worth over-engineering some kind of abstraction layer over this when there's only two sources anyway. But your method will still likely need to call back to e.g. cpu.updateIRQ() in case the CPU needs to know about the state change immediately. I'll have to look over the code to see.


PostPosted: Thu Jun 30, 2016 10:26 pm 
Offline

Joined: Mon Jul 02, 2012 7:46 am
Posts: 759
byuu wrote:
Either way, we get to a point where one chip has to know about the others.


Not necessarily. Treat the bus as an object, and all of the chips are aware of it, and it handles all of the intricacies of what happens when multiple things try to write simultaneously.
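One way to sketch that bus-as-object idea (hypothetical names; the registration scheme is an assumption, not a proposal from the thread):

```cpp
#include <cstdint>
#include <vector>

// Chips only know about the bus. Each source registers for a slot and
// drives only its own slot; the bus resolves the wired-AND on read.
struct IrqBus {
  std::vector<uint8_t> lines;  // one active-low slot per attached source

  int attach() { lines.push_back(1); return int(lines.size()) - 1; }

  void drive(int slot, bool low) { lines[slot] = low ? 0 : 1; }

  bool asserted() const {
    for (uint8_t l : lines)
      if (l == 0) return true;  // any source pulling low asserts /IRQ
    return false;
  }
};
```

The CPU polls bus.asserted() (or the bus calls it back) without ever naming the APU or the cartridge.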


Powered by phpBB® Forum Software © phpBB Group