Visual Nes - C++/C# port of Visual 2A03 + 2C02

Sour
Posts: 891
Joined: Sun Feb 07, 2016 6:16 pm

Re: Visual Nes - C++/C# port of Visual 2A03 + 2C02

Post by Sour »

Thanks!
Yeah, the 16 KB loading is a bug I realized after posting, but haven't fixed yet. It copies the 16 KB bank to $8000-$BFFF only, so that obviously doesn't work: the CPU reads its reset/NMI/IRQ vectors from $FFFA-$FFFF, so the bank also has to show up at $C000-$FFFF. Also, I forgot to mention it, but at the moment it's only meant to run with mapper 0 stuff - though I suppose very simple mappers could be added easily with HLE without really having too much impact on the accuracy.
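A minimal sketch of how a 16 KB mapper 0 image could be mirrored so the vectors land at $FFFA-$FFFF. The function name `loadNromPrg` and the flat 64 KB `cpuMem` array are assumptions for illustration, not the actual Visual Nes loader.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: fills a flat 64 KB CPU address space image from a
// mapper 0 PRG ROM. Names and layout are illustrative only.
constexpr std::size_t kBankSize = 0x4000;  // 16 KB

bool loadNromPrg(const std::vector<std::uint8_t>& prg,
                 std::array<std::uint8_t, 0x10000>& cpuMem) {
    if (prg.size() == kBankSize) {
        // 16 KB image: the single bank appears at both $8000 and $C000.
        std::copy(prg.begin(), prg.end(), cpuMem.begin() + 0x8000);
        std::copy(prg.begin(), prg.end(), cpuMem.begin() + 0xC000);
        return true;
    }
    if (prg.size() == 2 * kBankSize) {
        // 32 KB image: maps linearly onto $8000-$FFFF.
        std::copy(prg.begin(), prg.end(), cpuMem.begin() + 0x8000);
        return true;
    }
    return false;  // not a plain mapper 0 PRG size
}
```

With a 16 KB bank whose last bytes hold the vectors, the reset vector is then readable at $FFFC/$FFFD as well as at $BFFC/$BFFD.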

For the alignment, I just tried changing the soft reset logic to not alter the state of the chip other than putting the reset signal low for a given number of cycles, and it seems to yield 6 (out of a possible 8) different alignments (on a half-master clock level). I was under the impression there were only 4 possible alignments, so maybe I'm doing this wrong.
calima wrote:This is exactly the kind of project that would benefit greatly from PGO. Perhaps even 2x or more.
A quick test with PGO seems to yield roughly 15% faster code (4900 Hz -> 5500 Hz on my machine), which is pretty similar to what I get on Mesen with PGO, too.
At the moment ~50-60% of the time is spent in this recursive function. I haven't been able to find any way to make it faster though. Converting "group" from a vector to a hashset makes it slower (presumably because "group" is usually very small), and the way it works makes it pretty hard/impossible to split the work across multiple threads without a ton of lock contention.
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Post by calima »

What is the range of c1/c2? Perhaps you can use that to optimize it.
Sour
Posts: 891
Joined: Sun Feb 07, 2016 6:16 pm

Post by Sour »

C1/C2 correspond to node numbers - between both chips, they range from 0 to 33000 (though some numbers are unused).

In other news, I was trying to get the simulator to display the PPU's actual output (based on the VRAM's content) and found out the writes to the PPU don't seem to be working as expected. Writing to $2000 to enable NMIs doesn't appear to work, and a simple program like:

LDA #$77
STA $2007
JMP $0000
ends up writing garbage to VRAM instead of $77. Writing to CPU RAM works as expected, though, so the problem seems to be the communication between both chips. It's probably a silly mistake, but I've been looking at this for a few hours already and I haven't been able to figure it out.

If anyone's willing to check if they see something that's obviously wrong, it'd probably be around here: halfstep()
clk0 is the master clock, cpu_clk0 is the CPU's clock (i.e. clk0 / 12) and io_ce is the chip enable input on the PPU (which is based on the CPU's address bus & phi2).
I'm unsure if the logic I'm using to replicate the 74139's behavior (io_ce) is correct, among other things.
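For reference, a sketch of the address decode as it is commonly described for the NES: the PPU is selected for CPU addresses $2000-$3FFF (A15 = 0, A14 = 0, A13 = 1) while the CPU's M2/phi2 output is high. The function name `ppuChipEnableN` is hypothetical, and treating io_ce as an active-low input is an assumption; this is not taken from the simulator's halfstep() code.

```cpp
#include <cstdint>

// Returns the level to drive on the PPU's chip enable input, assuming it is
// active low: false (low) means the PPU is selected.
// addr is the CPU address bus, m2 is the CPU's M2/phi2 output.
bool ppuChipEnableN(std::uint16_t addr, bool m2) {
    // Select only when M2 is high and A15..A13 == 0b001 ($2000-$3FFF).
    const bool selected = m2 && ((addr & 0xE000) == 0x2000);
    return !selected;
}
```

If the enable is asserted for the whole CPU cycle instead of only while M2 is high, the PPU could latch the data bus before the CPU drives the written value, which would be one way to end up with garbage in VRAM.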

Everything else is pretty much copy/pasted from the original javascript simulators, though. The major difference being that the original Visual 2C02 uses the function handleIoBus() to emulate a CPU (this used to work in the C# version too, before I integrated the 2A03 into it).
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

Sour wrote:I was trying to get the simulator to display the PPU's actual output (based on the VRAM's content)
I don't see much use for a high-level interpretation of the VRAM contents... Wouldn't it be MUCH better to decode the composite signal generated by the PPU? Now that would be awesome!
Sour wrote:and found out the writes to the PPU don't seem to be working as expected.
Doesn't this have to do with the fact that the PPU needs time to "warm up"? Games are required to wait 1 or 2 frames before using the PPU for this reason. I don't know anything about this type of low-level simulation, so this is the only thing I can think of!
Quietust
Posts: 1920
Joined: Sun Sep 19, 2004 10:59 pm

Post by Quietust »

Sour wrote:At the moment ~50-60% of the time is spent in this recursive function. I haven't been able to find any way to make it faster though. Converting "group" from a vector to a hashset makes it slower (presumably because "group" is usually very small), and the way it works makes it pretty hard/impossible to split the work across multiple threads without a ton of lock contention.
My first suggestion would be to add an array of booleans big enough to cover every single node (e.g. "vector<bool> groupbool"), then use it to keep track of whether any node is in the list or not ("if (groupbool[i]) return; groupbool[i] = true; group.push_back(i);", and the opposite when removing an element from "group") in order to avoid the delay of searching through the vector each time.

For small sets of node updates, it probably won't help that much, but for a very high-use function, every little bit helps.
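The presence-flag idea above could look roughly like this. All names (`GroupBuilder`, `touching`, `inGroup`) are made up for the sketch, the special handling of the power and ground nodes is left out, and `std::vector<std::uint8_t>` is used in place of `vector<bool>` as a matter of taste.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative types only; the real simulator's structs may differ.
struct Transistor {
    std::uint16_t c1;
    std::uint16_t c2;
    bool on;
};

struct GroupBuilder {
    std::vector<Transistor> transistors;
    std::vector<std::vector<std::size_t>> touching;  // transistor indices per node
    std::vector<std::uint16_t> group;                // nodes in the current group
    std::vector<std::uint8_t> inGroup;               // 1 if the node is in "group"

    explicit GroupBuilder(std::size_t nodeCount)
        : touching(nodeCount), inGroup(nodeCount, 0) {}

    void connect(std::uint16_t c1, std::uint16_t c2, bool on) {
        transistors.push_back({c1, c2, on});
        touching[c1].push_back(transistors.size() - 1);
        touching[c2].push_back(transistors.size() - 1);
    }

    void addNodeToGroup(std::uint16_t i) {
        if (inGroup[i]) return;  // O(1) membership test instead of a scan
        inGroup[i] = 1;
        group.push_back(i);
        for (std::size_t t : touching[i]) {
            const Transistor& tr = transistors[t];
            if (!tr.on) continue;  // only conducting transistors join nodes
            addNodeToGroup(tr.c1 == i ? tr.c2 : tr.c1);
        }
    }

    void clearGroup() {
        // Reset only the flags that were set, not the whole array.
        for (std::uint16_t n : group) inGroup[n] = 0;
        group.clear();
    }
};
```

Clearing via the group list keeps the reset cost proportional to the group size rather than to the total node count.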
Quietust, QMT Productions
P.S. If you don't get this note, let me know and I'll write you another.
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Post by calima »

For such a small range, the first thing is to move to unsigned short/uint16_t instead of int, including in your data struct. You're jumping all over memory in that function, so shrinking the data will increase cache hits.

You may also consider dividing the data into two containers/arrays, one just for the hot function and one with the rest. Again for improved cache hits.

The range then enables other things, like using a fixed-size presence array like Quietust said above.
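A rough sketch of both suggestions together: 16-bit node ids, plus a split between the fields the hot function reads and everything else. Which fields are actually "cold" in the simulator is a guess here, and all type and field names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Assumes every node number fits in 16 bits (the thread mentions 0 to 33000).
using NodeId = std::uint16_t;

// Fields read on every group walk, kept as small as possible.
struct HotTransistor {
    NodeId c1;
    NodeId c2;
    std::uint8_t on;
};

// Hypothetical rarely used fields, stored separately at the same index.
struct ColdTransistor {
    std::string name;
    NodeId gate;
};

struct TransistorTable {
    std::vector<HotTransistor> hot;
    std::vector<ColdTransistor> cold;

    // Adds one transistor to both arrays and returns its shared index.
    std::size_t add(std::string name, NodeId gate, NodeId c1, NodeId c2) {
        hot.push_back({c1, c2, 0});
        cold.push_back({std::move(name), gate});
        return hot.size() - 1;
    }
};
```

Packing only the hot fields means many more transistors fit in each cache line during the recursive walk, at the cost of a second lookup when the cold data is needed.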
thefox
Posts: 3134
Joined: Mon Jan 03, 2005 10:36 am
Location: 🇫🇮

Post by thefox »

tokumaru wrote:
Sour wrote:I was trying to get the simulator to display the PPU's actual output (based on the VRAM's content)
I don't see much use for a high-level interpretation of the VRAM contents... Wouldn't it be MUCH better to decode the composite signal generated by the PPU? Now that would be awesome!
And/or tap into the intermediate signals (after palette look up, before composite signal generation) to generate a pixel-perfect output.
Download STREEMERZ for NES from fauxgame.com! — Some other stuff I've done: fo.aspekt.fi
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

thefox wrote:And/or tap into the intermediate signals (after palette look up, before composite signal generation) to generate a pixel-perfect output.
Yeah, that'd be pretty useful too!
Sour
Posts: 891
Joined: Sun Feb 07, 2016 6:16 pm

Post by Sour »

Thanks for the suggestions - I've changed the ints to shorts, removed anything that wasn't actually required from the structs, and made a few other changes. Adding an array of bool to avoid scanning "group" did not help, though (it seemed to be 1-2% slower).
Between these and PGO, it is roughly 50% faster than before (~7500 Hz instead of ~5000 Hz). I'm using a pretty old i5, so I'd imagine more recent CPUs should be able to get above 10 kHz.
thefox wrote:And/or tap into the intermediate signals (after palette look up, before composite signal generation) to generate a pixel-perfect output.
This was the first option I wanted to do, but couldn't find any node in the list that seemed to match. I was looking for things along the lines of "pixel" though, not palette. I just took another look at the node list and it seems like this might be what pal_d0_out to pal_d5_out are for - if so, I'll use those to generate the picture.
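If those six nodes do carry the palette output, turning them into a color index could be as simple as the sketch below. It assumes pal_d0_out is the least significant bit and that a high node means a 1 bit; neither is confirmed in the thread, and the function name is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// bits[0] is the sampled state of pal_d0_out, bits[5] that of pal_d5_out.
// Returns a 6-bit NES color index (0-63).
std::uint8_t paletteIndexFromNodes(const std::array<bool, 6>& bits) {
    std::uint8_t index = 0;
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i]) {
            index = static_cast<std::uint8_t>(index | (1u << i));
        }
    }
    return index;
}
```

The resulting index would then go through whatever 64-entry RGB table the front end already uses, sampled once per pixel clock.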
tokumaru wrote:Doesn't this have to do with the fact that the PPU needs time to "warm up"?
Unfortunately, no. The writes to the registers are ignored during the first frame (due to the warm up period), but once they do start actually having an effect, they aren't working properly. I'm pretty sure it has to do with io_ce not being timed properly, but still haven't figured it out completely.
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Post by calima »

If the profiler still points to the group search, a bloom filter in front could be useful. One of my favorite speedup techniques.
Sour
Posts: 891
Joined: Sun Feb 07, 2016 6:16 pm

Post by Sour »

calima wrote:If the profiler still points to the group search, a bloom filter in front could be useful. One of my favorite speedup techniques.
I read up a bit on Bloom filters, but I'm not quite sure I see how I could apply them here. The nodes that are in a particular group change constantly as transistors turn on and off, and this is recursive, so a single transistor changing state could make a group go from 2 nodes to 50 nodes.
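One possible reading of the suggestion, sketched under assumptions: a single 64-bit mask acts as a one-hash Bloom filter in front of the linear scan and is reset whenever the group is rebuilt, so the constantly changing membership is not a problem. The class name and the hash (low 6 bits of the node number) are arbitrary choices for illustration, and whether this beats a plain scan on such small groups is untested.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

class FilteredGroup {
public:
    bool contains(std::uint16_t node) const {
        // A clear bit means the node is definitely not in the group.
        if ((mask_ & bit(node)) == 0) return false;
        // A set bit may be a false positive, so fall back to the scan.
        return std::find(nodes_.begin(), nodes_.end(), node) != nodes_.end();
    }

    // Returns true if the node was newly added.
    bool insert(std::uint16_t node) {
        if (contains(node)) return false;
        nodes_.push_back(node);
        mask_ |= bit(node);
        return true;
    }

    // Called each time a new group is started.
    void clear() {
        nodes_.clear();
        mask_ = 0;
    }

    std::size_t size() const { return nodes_.size(); }

private:
    static std::uint64_t bit(std::uint16_t node) {
        return std::uint64_t{1} << (node & 63u);
    }

    std::vector<std::uint16_t> nodes_;
    std::uint64_t mask_ = 0;
};
```

The filter only saves work on lookups that miss, so it pays off mainly when most membership tests come back negative.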

And I've fixed up some of the issues, it seems, but some remain (e.g. there's an incorrectly displayed sprite at the top left).
Also, DK is the only game I found that boots in a reasonable number of frames (SMB surprisingly takes about 30 frames).
[Attachment: visualnes.png]
For fun, here's what it looks like with Quietust's scanline test rom - bg color is wrong, but it actually displays exactly like what Mesen does (and like what Eugene posted a while ago from a real Famicom). Maybe caused by PPU-CPU alignment? I'd have to try to change the alignment and run it some more to see.
[Attachment: scanline.png]
Myask
Posts: 965
Joined: Sat Jul 12, 2014 3:04 pm

Post by Myask »

Sour wrote:(e.g: there's an incorrectly displayed sprite at the top left)
Donkey Kong, as [mostly-]good practice, initializes the Y-values of unused sprites to FF. Anything from F0-FF should be invisible. I wonder if a Y-wrap got introduced somewhere, and where.
For (J)'s .nes, 31bd:FF is the initial value used in the OAM-page initializing loop, if you want to change it to see if that's your issue.
For (U) it should be at 31ae.
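To illustrate the Y-wrap guess: with a non-wrapping comparison, a sprite whose Y is F0-FF can never be in range on visible scanlines 0-239, whereas a difference truncated to 8 bits would put Y = FF on the first few scanlines. Both functions are hypothetical high-level models written for this comparison; neither is taken from the simulator, and the wrapped variant is not known to be the actual bug.

```cpp
#include <cstdint>

// Non-wrapping check: the sprite covers scanlines y .. y + height - 1.
bool spriteInRange(int scanline, std::uint8_t y, int height = 8) {
    const int row = scanline - static_cast<int>(y);
    return row >= 0 && row < height;
}

// Hypothetical faulty variant: the difference is truncated to 8 bits and
// the borrow is ignored, so large Y values wrap around to the top.
bool spriteInRangeWrapped(int scanline, std::uint8_t y, int height = 8) {
    const std::uint8_t row = static_cast<std::uint8_t>(scanline - y);
    return row < height;
}
```

For example, scanline 0 with Y = FF is rejected by the first function but accepted by the second, since the truncated difference is 1.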
thefox
Posts: 3134
Joined: Mon Jan 03, 2005 10:36 am
Location: 🇫🇮

Post by thefox »

There's a bug in the Visual 2C02 OAM DMA: viewtopic.php?p=169373#p169373 (it does not actually seem to always corrupt the source address to 0, unlike what I said in that post; instead it seems to depend on the value written and the hibyte of "ab": spr_addr = value_written AND hibyte(ab)).
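A small model of the formula above, to show why the source page often comes out as 0. It assumes "ab" holds $4014 at the time of the write, which the post does not state; the function name is hypothetical.

```cpp
#include <cstdint>

// Models the observed corruption: spr_addr = value_written AND hibyte(ab).
std::uint8_t corruptedDmaPage(std::uint8_t valueWritten, std::uint16_t ab) {
    return static_cast<std::uint8_t>(valueWritten & (ab >> 8));
}
```

Under that assumption the high byte is $40, so the usual OAM pages such as $02 or $07 are masked to $00, while a value with bit 6 set (e.g. $42) would come out as $40.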
Eugene.S
Posts: 317
Joined: Sat Apr 18, 2009 4:36 am
Location: UTC+3

Post by Eugene.S »

There are different versions of scanline.nes:
1) black: http://nesdev.com/scanline.zip
2) gray: https://github.com/christopherpow/nes-test-roms (see \scanline\scanline.nes)

I used gray:
viewtopic.php?f=3&t=14833
Quietust
Posts: 1920
Joined: Sun Sep 19, 2004 10:59 pm

Post by Quietust »

Eugene.S wrote:There are different versions of scanline.nes:
1) black: http://nesdev.com/scanline.zip
2) gray: https://github.com/christopherpow/nes-test-roms (see \scanline\scanline.nes)

I used gray:
viewtopic.php?f=3&t=14833
The "black" one is, in fact, broken - it's the original version I wrote back in 2003 before I actually had it tested on a real NES.

The "gray" version is the correct one (and is the one taken from my website, as evidenced by the extra build-script input files in the directory).