All times are UTC - 7 hours

PostPosted: Sat Jul 20, 2019 10:03 am 

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1518
Code:
if(_forcedVblank || _scanline >= _vblankStartScanline) {
   return _internalOamAddress;
} else {
   if(_memoryManager->GetHClock() <= 255 * 4) {
      return _oamEvaluationIndex << 2;
   } else {
      return _oamTimeIndex << 2;
   }
}


So then there's not a way to write to the high two bytes of the lower OAM table with this, huh?
(A0 would come from the latch behavior if $2104 writes, but A1 isn't being kept with <<2)

Also, if writes go to both low and high tables, does that mean reads become an OR/AND bus conflict result of low and high tables? ^-^

Actually it doesn't seem that the write in Uniracers goes to both tables, or that messes up the sprites. Maybe it's based on A9 from OAMADDR?
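
To make the A0/A1 point concrete, here is a minimal sketch of what `index << 2` does to the low address bits. The helper names are hypothetical and this is not Mesen-S or higan code. It assumes A0 comes from the $2104 low/high byte latch and A1 is never set, which is only the guess discussed above.

```cpp
#include <cstdint>

// Hypothetical helper, not emulator code: mirrors the `index << 2` expression.
// The shift leaves address bits A1 and A0 cleared.
uint16_t spriteBaseAddress(uint8_t spriteIndex) {
    return static_cast<uint16_t>((spriteIndex & 0x7F) << 2);
}

// Assumption: A0 is supplied by the $2104 low/high byte latch, A1 stays 0.
uint16_t writeTargetAddress(uint8_t spriteIndex, bool a0) {
    return spriteBaseAddress(spriteIndex) | (a0 ? 1 : 0);
}

// Offset inside the 4-byte entry: 0 = X, 1 = Y, 2 = tile, 3 = attributes.
uint8_t byteWithinSprite(uint16_t oamAddress) {
    return static_cast<uint8_t>(oamAddress & 0x03);
}
```

Under those assumptions only offsets 0 and 1 (X and Y) are reachable, so the tile and attribute bytes could not be written this way.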


PostPosted: Sat Jul 20, 2019 10:42 am

Joined: Sun Feb 07, 2016 6:16 pm
Posts: 724
byuu wrote:
We should probably use Hclocks (0-1363) instead of Hdots (0-340) when the values are known, if that's okay ^-^;
Sorry! Coming from the NES, I'm used to only seeing the PPU in terms of dots rather than master clocks :p
Though in this case, using hclocks can be tricky since you need to adjust for the long dots after 1292/1310, heh.
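
For reference, a rough sketch of the hclock-to-dot adjustment. It assumes the commonly documented layout of a normal 1364-clock scanline: 340 dots (0-339), with dots 323 and 327 lasting 6 master clocks instead of 4, which is where the 1292 and 1310 figures come from. The function name is made up, and short/long scanlines are ignored.

```cpp
// Assumption: normal scanline, 1364 master clocks, dots 323 and 327 are 6 clocks long.
int hclockToDot(int hclock) {
    if (hclock < 1292) return hclock / 4;                 // dots 0-322, 4 clocks each
    if (hclock < 1298) return 323;                        // long dot: hclocks 1292-1297
    if (hclock < 1310) return 324 + (hclock - 1298) / 4;  // dots 324-326
    if (hclock < 1316) return 327;                        // long dot: hclocks 1310-1315
    return 328 + (hclock - 1316) / 4;                     // dots 328-339
}
```

If you count 341 dots of 4 clocks each instead (0-340), the conversion is just `hclock / 4` and no adjustment is needed.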

byuu wrote:
So then there's not a way to write to the high two bytes of the lower OAM table with this, huh?
I'm unsure - the OAM address might be (_oamTimeIndex << 2) + 2 on odd cycles, when the PPU is fetching the attribute/palette word.
Internally the PPU is loading both bytes in a single dot, so I'm unsure how the latch behavior for writes works here.
The way I have it implemented atm will actually never write to the low table (hadn't noticed until now) - but I'm fairly certain paulb_nl's posts/tests said the writes can go to both tables at once.

Unsure about the behavior on reads, too - but writing a test rom for this shouldn't be overly hard I think? Just need to fill the low table with something like $77 and the high table with $88, keep reading from the register at scanline ~119+ and see what values come out of it. (presumably would be $77, $88, $00 or $FF)
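
The four expected values fall out of the fill bytes: $77 AND $88 is $00, and $77 OR $88 is $FF. A small sketch of how a test harness might label each value it reads back (the function name and labels are hypothetical, nothing here has been run on hardware):

```cpp
#include <cstdint>
#include <string>

// Fill values proposed for the test ROM.
constexpr uint8_t kLowTableFill = 0x77;
constexpr uint8_t kHighTableFill = 0x88;

// Label a value read back from the OAM data register during active display.
std::string classifyOamRead(uint8_t value) {
    if (value == kLowTableFill) return "low table";
    if (value == kHighTableFill) return "high table";
    if (value == (kLowTableFill & kHighTableFill)) return "AND of both tables"; // 0x00
    if (value == (kLowTableFill | kHighTableFill)) return "OR of both tables";  // 0xFF
    return "unexpected";
}
```

Anything landing in "unexpected" would mean open bus or some other source is involved.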


PostPosted: Sat Jul 20, 2019 12:18 pm

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1518
Well just for fun ...

So to display "correct behavior" on HblankEmuTest/NTSC, any amount of waiting during sprite tile fetching while the display is force-blanked does it:

https://github.com/byuu/higan/blob/mast ... ct.cpp#L98

If we don't do that, but we block tile reads when the display is disabled, then only the first row of text properly splits between incorrect/correct:

https://github.com/byuu/higan/blob/mast ... t.cpp#L149

If we don't do either, then it shows "incorrect behavior".

None of these match the screenshots shown thus far, ah well :P

Here's my current guess for OAM reads/writes during active display. Good enough for Uniracers, but probably very wrong.

https://github.com/byuu/higan/blob/mast ... io.cpp#L31

On the bright side, very easy to change it once someone makes a test ROM to verify correct behavior :D

Quote:
Though in this case, using hclocks can be tricky since you need to adjust for the long dots after 1292/1310, heh.


Oh god I hope those don't actually affect the PPU fetching patterns for sprite tiles.

...

Looking into it again, it would be nice to merge most of the redundant main/subscreen BG rendering:

https://github.com/byuu/higan/blob/mast ... d.cpp#L149

The reason I have it split up is because it hides the lores vs hires differences without code repetition.

...

Another sore point:

https://github.com/byuu/higan/blob/mast ... in.cpp#L24

This would speed up the PPU quite significantly in NTSC games if we could short-circuit at vcounter() >= vdisp(), but of course it's possible to toggle overscan on and off between scanlines 225 and 240 during active display, so if we do, ... stuff's gonna go wrong if we just skipped over the entire line.


Last edited by byuu on Sat Jul 20, 2019 12:24 pm, edited 1 time in total.

PostPosted: Sat Jul 20, 2019 12:22 pm

Joined: Fri Nov 18, 2016 7:57 am
Posts: 21
Sour wrote:
The way I have it implemented atm will actually never write to the low table (hadn't noticed until now) - but I'm fairly certain paulb_nl's posts/tests said the writes can go to both tables at once.


During evaluation I was able to write to the X,Y position bytes in OAM by writing to $2104 multiple times in a row. I think it was about 16 writes in a row and only one sprite changed.

I don't know if you can write to low OAM after evaluation (PPU dot > 255).


PostPosted: Mon Jul 22, 2019 2:35 am

Joined: Mon Apr 01, 2019 12:23 am
Posts: 6
Great work on Mesen-S Sour.

I have some GSU tests for when you want to do Super-FX here:
https://github.com/PeterLemon/SNES/tree/master/CHIP/GSU/GSUTest

These tests helped redguy make his FPGA implementation on the SD2SNES.

Good luck with everything =D


PostPosted: Mon Jul 22, 2019 5:50 pm

Joined: Sun Feb 07, 2016 6:16 pm
Posts: 724
byuu wrote:
So to display "correct behavior" on HblankEmuTest/NTSC
Yea, I haven't quite figured that one out yet, it displays some pretty "corrupted" looking sprites on Mesen-S atm :p
But at the very least, emulating the cycle fetching/evaluation for bg & sprites, even if it's not 100% perfect is enough to make it harder for a homebrew dev or romhacker to do something that the SNES wouldn't allow, which is good. Will probably revisit this whole thing later on after I'm done with the enhancement chips and the like, for now I'm fairly satisfied with the PPU's timings. There are probably still a few IRQ/DMA/CPU-timing related things I need to fix, though.

krom wrote:
I have some GSU tests for when you want to do Super-FX here
Oh, that's awesome, I didn't realize these existed at all! This should make coding the super FX core way easier than I anticipated, thank you!


PostPosted: Tue Jul 23, 2019 10:55 am

Joined: Mon Apr 01, 2019 12:23 am
Posts: 6
Sour wrote:
Oh, that's awesome, I didn't realize these existed at all! This should make coding the super FX core way easier than I anticipated, thank you!
Nice one, very glad to help out =D


PostPosted: Thu Jul 25, 2019 8:25 pm

Joined: Sun Feb 07, 2016 6:16 pm
Posts: 724
Took a few days, but finally added SA-1 support (including a debugger window for it). There are still a few SA-1 features that aren't implemented, so 3 or so games will most likely still have some issues - beyond that it should be working relatively well.

Next up on the list will probably be Super FX support.


PostPosted: Mon Jul 29, 2019 11:33 pm

Joined: Mon Jul 29, 2019 10:42 pm
Posts: 1
First off, thank you for making Mesen-S, and thank you all for this awesome thread which was a lot of fun to read even if I didn't understand it all.

I have found an issue where the "CPU Debugger" window is showing the incorrect state due to some stack fiddling by Dragon Quest 3.
Image
Note that the code is different and the PC's are different. (And the JMP is taken if you continue stepping through)
Here's the stack fiddle
Image

So you have a "COP #$4C" and then the value on the stack is decremented twice and when returning it now uses the #$4C as the opcode (JMP).

So the debugger display shows different code on the left and a different PC (CDC59D) than the trace logger (CDC59C).

I know you've mostly been working on accuracy fixes in this thread so is this where you want debugging talk as well? And how do you feel about feature requests? (Such as selective trace logging options like log on certain addresses or address ranges, or the trace mask option bsnes-plus has)

Thanks again!

Edit: Oh, and this is the Mesen-S 0.2.0 release.


PostPosted: Tue Jul 30, 2019 4:06 pm

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1518
Gave the overclocking a try, noticed a few things.

If you run the extra scanlines before / at the start of NMI, it doesn't really have much effect on speed. Most games seem to have all the heavy lifting that lags out in NMI. But, it doesn't hurt to just offer to add extra scanlines to both/either.

If you run the extra scanlines very shortly into NMI, it screws with things like input polling.

Best default spot seems to be on the last scanline of a frame.

If you don't advance time with coprocessors (eg the SA1), those games break almost instantly with any overclocks. This is a real pain for me because my scheduler has one time clock per thread, so either I advance the CPU on all of them, or on none of them, during the overclocked time. I guess I can revert to my older style of thread syncing where each CPU<>CPU relationship had its own separate counter. That turned out to not really work for a system like the Sega CD, but it's fine for the SNES ...

We should probably find a good 'max' overclock value. The SuperFX is fine with up to 800% overclocks, but the CPU definitely doesn't want such heroic increments. Users would probably try to max out any overclocking slider, so we probably want to make clear that hey, even a 30% overclock is pretty heroic for the SNES CPU, without limiting them to only a minor overclock.


PostPosted: Tue Jul 30, 2019 8:10 pm

Joined: Sun Feb 07, 2016 6:16 pm
Posts: 724
ansarya wrote:
First off, thank you for making Mesen-S, and thank you all for this awesome thread which was a lot of fun to read even if I didn't understand it all.
I have found an issue where the "CPU Debugger" window is showing the incorrect state due to some stack fiddling by Dragon Quest 3.
You're welcome! Thanks for the bug report - reporting debugger issues here is fine, too :p
What's screwing up the debugger is that the game is essentially reusing the 2nd byte of the COP instruction for the JMP instruction (unsure why a SNES game would want to save a byte so much, but maybe the developers might not have known about COP being a 2-byte op?)
Either way, shouldn't be too hard to fix - I'll take a look soon.

byuu wrote:
If you run the extra scanlines before / at the start of NMI, it doesn't really have much effect on speed.
As far as I remember, in at least 1 or 2 of the 3 games I've tested, the "before NMI" scanlines fixed the slowdowns. In terms of NES games, "before NMI" has always been the most "compatible" option, but this might not be true for the SNES.

For the "after NMI" scanlines, I put them at the end of vblank, too, rather than right after the NMI scanline, otherwise it tends to break some things, like you said. I'm running all the CPUs and coprocessors during the overclock on my end and only suspending the SPC, since that's actually the simpler solution in my case :p
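
A rough model of where the extra lines end up in the frame, for anyone following along. This is a hypothetical sketch, not Mesen-S code: it assumes an NTSC frame of 262 lines with NMI at line 225 (no overscan), "before NMI" lines inserted right before the NMI line, and "after NMI" lines appended at the end of vblank.

```cpp
// Hypothetical model of the extra "overclock" scanlines in one frame.
struct OverclockConfig {
    int baseLines = 262;     // assumption: NTSC frame
    int nmiLine = 225;       // assumption: NMI/vblank start, overscan off
    int extraBeforeNmi = 0;  // inserted right before the NMI line
    int extraAfterNmi = 0;   // appended at the end of vblank
};

// Total scanlines the CPU and coprocessors run per frame.
int totalLines(const OverclockConfig& c) {
    return c.baseLines + c.extraBeforeNmi + c.extraAfterNmi;
}

// True when the extended line index is an extra line, i.e. one during which
// the SPC would stay suspended.
bool isExtraLine(const OverclockConfig& c, int line) {
    bool before = line >= c.nmiLine && line < c.nmiLine + c.extraBeforeNmi;
    int endOfVblank = c.baseLines + c.extraBeforeNmi;
    bool after = line >= endOfVblank && line < endOfVblank + c.extraAfterNmi;
    return before || after;
}
```

With 10 lines before and 5 after, the frame is 277 lines long; indexes 225-234 and 272-276 are the extra ones.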

In terms of max overclock values, on the NES, FCEUX, Mesen and puNES all support up to 1000 lines of either type, iirc, so that's what I've implemented at the moment (some games on the NES actually do need nearly 1000 extra lines to get rid of the slowdown completely, believe it or not...). FYI, AxlRocks has been testing some SNES titles on Mesen-S to see how they behave with different values of the before/after NMI settings (he's done this on the NES for like 100+ games on FCEUX/Mesen in the past, too).

---

In other news, just committed Super FX support (including the %-based overclocking for it, though mine only supports multiples of 100%, up to 1000% atm). Like the SA-1, it also has its own debugger window, breakpoints and the like.

It seems to be working in all the games I've tested. In terms of implementation, I'm cheating with how the Super FX pauses itself when it tries to access ROM/RAM while the CPU has access to them. Trying to make the Super FX CPU into a state machine quickly turned into a nightmare, so I gave up on that fairly early on - this is definitely one of the scenarios where using something like libco can really simplify the code when that much accuracy is needed.

Will probably do S-DD1 next since it seems like that'll be fairly straightforward to add using the existing public domain implementation. And it won't require any debug tools, too, which certainly helps a lot!


PostPosted: Tue Jul 30, 2019 10:22 pm

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1518
Quote:
In terms of max overclock values, on the NES, FCEUX, Mesen and puNES all support up to 1000 lines of either type, iirc, so that's what I've implemented at the moment


Hmm ... I'm doing my overclocking as a % value. For NTSC, each frame gets:

Code:
uint clocks = 262*1364;
uint extraclocks = (clocks*overclock)-clocks; //overclock = 1.0 - 4.0

So 786 extra scanlines per frame max currently.
I guess I'll make the upper limit 500%, but ... damn that's a lot of overhead to the emulation, heh.
400% already drops max framerate from ~370fps to ~190fps.
Now throw in a 500% SA-1 overclock to go with it, or better yet, a 500% overclock of the ARM6 ... 105MHz 32-bit CPU, anyone? MSU2? :P
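
To compare the percentage scheme with a scanline count, here is the same arithmetic as a small sketch using integer percentages (100 = stock speed). The function names are made up for illustration.

```cpp
#include <cstdint>

constexpr uint32_t kLinesPerFrame = 262;   // NTSC
constexpr uint32_t kClocksPerLine = 1364;  // master clocks per scanline

// Extra master clocks per frame for an overclock in percent (100 = no overclock).
constexpr uint32_t extraClocks(uint32_t overclockPercent) {
    return kLinesPerFrame * kClocksPerLine * overclockPercent / 100
         - kLinesPerFrame * kClocksPerLine;
}

// The same amount expressed as whole extra scanlines.
constexpr uint32_t extraScanlines(uint32_t overclockPercent) {
    return extraClocks(overclockPercent) / kClocksPerLine;
}
```

`extraScanlines(400)` gives the 786 lines mentioned above, and a 500% limit works out to 1048 lines, so it lands close to the 1000-line cap.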

Quote:
FYI, AxlRocks has been testing some SNES titles on Mesen-S to see how they behave with different values of the before/after NMI settings (he's done this on the NES for like 100+ games on FCEUX/Mesen in the past, too).


Something I've been doing with bsnes' speed hacks is detecting games that don't like them to selectively disable them (eg pixel renderer for Air Strike Patrol, cycle DSP for Koushien 2, etc.)

I like the idea of us building out a database of 'metadata' for games like this, and we could include information like "maximum stable overclock%", etc. Stuff like this is great for the end-user experience, instead of them having to guess and change settings per-game.

Quote:
In other news, just committed Super FX support (including the %-based overclocking for it, though mine only supports multiples of 100%, up to 1000% atm).


The SuperFX overclocks so much better than other CPUs. And the way it sleeps when done means you aren't completely murdering performance like with overclocking the main CPU. You can really go to town and games don't care. The one exception is the Stunt Race FX menus will fail after about 400%, but in-game benefits so much that I'd rather not cap it.

I haven't personally noticed any gains past 800%, but I guess if we're allowing 500% on the CPU, 1000% on the SFX seems reasonable.

Quote:
this is definitely one of the scenarios where using something like libco can really simplify the code when that much accuracy is needed.


The areas where libco was a lifesaver:

1. I prefer to not enslave the other processors to the main CPU. This is because the main CPU can start a DMA that is ten frames in length. I know, no game ever will. But it can. Supporting the bus hold delays on CPU reads plus the DMA/HDMA sync handling to exit the CPU resulted in a nightmarishly complex state machine. If someone wants to be a troll, a test ROM that verifies proper DMA/HDMA sync and then does consecutive 10-frame DMAs would be a good one :P

2. building on 1, the CPU and SMP talk so infrequently, and only over a limited 4-byte range, that you can run them way out of order of each other. It ends up being a decent speed *boost* to use libco to be able to treat the SMP like a regular opcode-based interpreter and just context switch when it's (rarely) needed.

3. as you mentioned, the SuperFX ROM/RAM buffering is very pesky otherwise.

And the areas where libco has proven a hindrance:

1. the SA1 shares all of ROM and RAM, so you can effectively never run one CPU ahead of the other. All of that context switching is unbelievably painful.

2. the CPU and PPU have a lot of trouble, too. Even though there's a limited 64-byte window for communication, there's also the H/Vblank signals. I ended up implementing a PPUcounter class for the CPU to inherit which basically predicts what the PPU H/Vblank statuses would be for any given cycle, because otherwise the CPU wouldn't know if it could run more before the PPU blanking signals would change, and the PPU couldn't run ahead because it wouldn't know if the CPU would write to one of its registers.
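
The prediction idea in point 2 can be sketched roughly like this. It is a simplified illustration, not higan's PPUcounter: it assumes every scanline is 1364 clocks and every frame is 262 lines, ignores the short/long line special cases and interlace, and only covers vblank. All names are hypothetical.

```cpp
#include <cstdint>

// Simplified beam position; hcounter is in master clocks (0-1363).
struct BeamPosition {
    uint32_t vcounter;
    uint32_t hcounter;
};

// Where the beam will be `clocksAhead` master clocks from now.
BeamPosition predict(BeamPosition now, uint32_t clocksAhead) {
    uint64_t total = uint64_t(now.vcounter) * 1364 + now.hcounter + clocksAhead;
    total %= 262ull * 1364;  // wrap at the end of the frame
    return { uint32_t(total / 1364), uint32_t(total % 1364) };
}

bool inVblank(BeamPosition p, uint32_t vdisp = 225) {
    return p.vcounter >= vdisp;
}

// Clocks the CPU may run before the vblank signal next changes.
uint32_t clocksUntilVblankChange(BeamPosition now, uint32_t vdisp = 225) {
    uint32_t pos = now.vcounter * 1364 + now.hcounter;
    uint32_t target = inVblank(now, vdisp) ? 262 * 1364 : vdisp * 1364;
    return target - pos;
}
```

The CPU side can then run up to `clocksUntilVblankChange` clocks without asking the PPU thread anything.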

libco works amazingly when either thread can run well ahead of the other. Only one is enough. But it falls apart when neither thread can, because you just end up context switching every cycle of each thread.

Probably the best idea, if someone were willing, would be to use cooperative threading where it excels, and state machines where it does not.
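
As an illustration of the "run ahead until you actually need to sync" case, here is a minimal sketch of a per-pair relative clock, along the lines of the older CPU<>CPU counter style mentioned earlier. It does not use libco; the catch-up loop stands in for a context switch. The struct and function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical per-pair relative clock between the CPU and the SMP.
// cpuLead > 0 means the CPU is ahead of the SMP.
struct RelativeClock {
    int64_t cpuLead = 0;
    uint64_t cpuFrequency = 1;  // Hz
    uint64_t smpFrequency = 1;  // Hz

    // Scale each side by the other's frequency so both share one time unit.
    void cpuStep(uint32_t clocks) { cpuLead += int64_t(clocks) * int64_t(smpFrequency); }
    void smpStep(uint32_t clocks) { cpuLead -= int64_t(clocks) * int64_t(cpuFrequency); }

    // The CPU must let the SMP catch up before touching the shared ports.
    bool smpMustCatchUp() const { return cpuLead > 0; }
};

// Run the SMP in fixed-size steps until it is no longer behind; returns steps taken.
// A real core would execute SMP instructions here (or switch to the SMP cothread).
uint32_t catchUpSmp(RelativeClock& rc, uint32_t clocksPerStep) {
    uint32_t steps = 0;
    while (rc.smpMustCatchUp()) {
        rc.smpStep(clocksPerStep);
        ++steps;
    }
    return steps;
}
```

The CPU only calls `catchUpSmp` when it touches the 4-byte port range, which is why this pairing is cheap. For the SA1 the same check would fire on nearly every access.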

Further, libco doesn't solve the problem of processors that do multiple things in parallel. Eg the CPU ALU, the PPU running backgrounds and sprites separately, etc. It would if we ran a thread for each of those things, but there is no fricking way we can afford the overhead of that on modern CPUs =(

Quote:
Will probably do S-DD1 next since it seems like that'll be fairly straightforward to add using the existing public domain implementation. And it won't require any debug tools, too, which certainly helps a lot!


One of these days I'd like to simplify Andreas Naive's SDD1 decompression code. Talarubi did that to neviksti's SPC7110 decompression code, and it's probably my favorite code in higan to look at. It's just been a low priority.

The one tricky thing about the SDD1 is that it spies on $4300-437f in order to operate the decompression. If you're not ideologically opposed to crude hacks, this is no problem at all in practice to have the SDD1 core peek inside the CPU core's internal state, but otherwise it's a bit pesky.

(Eg for the NES, mappers are a hundred times easier when you can just steal the NES PPU H/Vcounter.)


PostPosted: Wed Jul 31, 2019 6:15 am

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21564
Location: NE Indiana, USA (NTSC)
byuu wrote:
Something I've been doing with bsnes' speed hacks is detecting games that don't like them to selectively disable them (eg pixel renderer for Air Strike Patrol, cycle DSP for Koushien 2, etc.)

I like the idea of us building out a database of 'metadata' for games like this, and we could include information like "maximum stable overclock%", etc. Stuff like this is great for the end-user experience, instead of them having to guess and change settings per-game.

That technique of turning speed hacks on or off based on a ROM hash reminds me of what's described in a patent that has been discussed before.

byuu wrote:
I ended up implementing a PPUcounter class for the CPU to inherit which basically predicts what the PPU H/Vblank statuses would be for any given cycle

If others are interested in this particular tech, see "Prediction" on the wiki.

PostPosted: Wed Jul 31, 2019 6:59 am

Joined: Mon May 02, 2016 5:55 am
Posts: 29
Being a happy Mesen user I decided to finally take Mesen-S for a spin.

Used latest dev build.

Threw ASP at it.


Nice PPU :)


SO excited for this emulator now.


PostPosted: Wed Jul 31, 2019 2:18 pm

Joined: Sun Feb 07, 2016 6:16 pm
Posts: 724
Yea, I'm unsure how necessary having up to 1k extra lines on the SNES is - as people test out more games, time will tell I suppose. Regarding having a DB, it's definitely something I've wanted to do before on the NES, too (that and a DB of "ideal" overscan settings to hide the garbage at the edges on the NES). In theory, if AxlRocks continues with his tests, could probably turn his spreadsheet into a database pretty easily - having settings for every single game on the SNES is going to take a long time, though.

I essentially made the Super FX overclock go up to 1000% since it's on the same settings page as the extra scanlines, which also go up to 1000 :p I didn't actually test it too much beyond validating that it sped up Star Fox's gameplay (and thus was working "as intended")

RE: syncing processors, in my case everything is being "synced" by the master clock counter, pretty much. Every CPU cycle increments it, every DMA read/write increments it, and every time it's incremented, anything that "needs" high accuracy is updated too (e.g. Super FX, SA1, IRQ checks, etc.) The SPC/DSP/PPU catch up when their registers are read or written (or once per frame at minimum). The PPU only generates the picture at the end of each scanline, unless registers are read/written during the scanline, in which case it'll split up the scanline into as many batches as it needs. So it's a fairly simple system really, nothing fancy, but it does let me do more or less whatever I want with each piece relatively easily. (keeping in mind that everything except the main CPU/SPC runs with instruction-level granularity because the only thing I turned into a state machine is the SPC)
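
In rough terms the catch-up scheme looks like the sketch below. This is a simplified illustration with hypothetical names, not the actual Mesen-S classes: a component remembers the master clock value it was last synced to and only runs when someone touches its registers.

```cpp
#include <cstdint>

// A component that only advances when its registers are accessed
// (or when the frame ends and the caller forces a catch-up).
class LazyComponent {
public:
    // Advance up to the given master clock; returns the clocks executed.
    uint64_t catchUp(uint64_t masterClock) {
        uint64_t elapsed = masterClock - _lastSyncedClock;
        _lastSyncedClock = masterClock;
        _clocksRun += elapsed;  // stand-in for actually emulating `elapsed` clocks
        return elapsed;
    }

    uint8_t readRegister(uint64_t masterClock) {
        catchUp(masterClock);   // state must be current before it is observed
        return _register;
    }

    void writeRegister(uint64_t masterClock, uint8_t value) {
        catchUp(masterClock);   // run everything that happened before the write
        _register = value;
    }

    uint64_t clocksRun() const { return _clocksRun; }

private:
    uint64_t _lastSyncedClock = 0;
    uint64_t _clocksRun = 0;
    uint8_t _register = 0;
};
```

The scanline batching for the PPU is the same idea: each register access mid-scanline forces a catch-up, which is what splits the line into batches.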

Though now you've given me an idea w/ regards to the PPU counters & h/v irqs checks that I might be able to use to reduce the overhead it takes to check the irq flags every 4 master clocks, will have to try that out.

byuu wrote:
The one tricky thing about the SDD1 is that it spies on $4300-437f in order to operate the decompression.
Ah, so that's what the DMA code in the S-DD1 implementation is - hadn't really looked at it in detail yet. I'll have to see what makes it the simplest on my end, might just replace the read/write handler for the $4xxx registers with the S-DD1's handler and then forward it to the original handler after the S-DD1 is done processing it.
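
Something along these lines is what I mean by wrapping the handler. It is only a sketch with a made-up `WriteHandler` type, not the real Mesen-S interface: the S-DD1 side sees writes to $4300-$437F first, then every write is forwarded unchanged to the original handler.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical handler type; a real emulator would have its own interface.
using WriteHandler = std::function<void(uint16_t address, uint8_t value)>;

// Wrap the original handler so a "spy" observes DMA register writes
// before they are forwarded as usual.
WriteHandler makeSpyingHandler(WriteHandler spy, WriteHandler original) {
    return [spy = std::move(spy), original = std::move(original)](uint16_t address, uint8_t value) {
        if (address >= 0x4300 && address <= 0x437F) {
            spy(address, value);   // S-DD1 records the DMA parameters
        }
        original(address, value);  // the CPU's registers still get the write
    };
}
```

Reads could be wrapped the same way if the chip needs to see them too.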

In general though, while I do try to keep my code as clean as I can, I've found that often times abstractions that make the code cleaner unfortunately end up also making it slower, esp. since everything in a console tends to be interconnected, heh. Most of the time I tend to favor speed over perfectly isolating the code for each piece of hardware (esp. since I end up with fairly slow code even if I do that :p)

007 wrote:
SO excited for this emulator now.
Glad to hear you like it! Let me know if you happen to find problems.

