All times are UTC - 7 hours

Posted: Sun Jan 08, 2017 9:31 am
Joined: Sun Feb 07, 2016 6:16 pm
Posts: 285
Thanks!
Yeah, the 16 KB loading is a bug I realized after posting, but didn't fix yet. It copies the 16 KB bank to $8000-$BFFF only, so that obviously doesn't work (nothing gets loaded where the vectors at $FFFA-$FFFF live). Also, I forgot to mention it, but at the moment it's only meant to run with mapper 0 stuff - though I suppose very simple mappers could be added easily with HLE without really having too much impact on the accuracy.
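
For reference, a minimal sketch of the usual mapper 0 fix: a 16 KB PRG image is mirrored into both halves of $8000-$FFFF, while a 32 KB image is copied straight through. The function name `mapNromPrg` and the flat 32 KB window are assumptions made for illustration, not the simulator's actual loader.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds the CPU-visible $8000-$FFFF window for an
// NROM (mapper 0) cartridge. Index 0 of the result corresponds to $8000.
std::array<uint8_t, 0x8000> mapNromPrg(const std::vector<uint8_t>& prg) {
    // NROM carries either 16 KB or 32 KB of PRG ROM.
    assert(prg.size() == 0x4000 || prg.size() == 0x8000);

    std::array<uint8_t, 0x8000> window{};
    for (std::size_t i = 0; i < window.size(); ++i) {
        // For a 16 KB image the modulo wraps, so $C000-$FFFF mirrors
        // $8000-$BFFF and the vectors at $FFFA-$FFFF become readable.
        window[i] = prg[i % prg.size()];
    }
    return window;
}
```

With a 16 KB image, the byte at offset $3FFC of the file then shows up at both $BFFC and $FFFC.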

For the alignment, I just tried changing the soft reset logic to not alter the state of the chip other than putting the reset signal low for a given number of cycles, and it seems to yield 6 (out of a possible 8) different alignments (on a half-master clock level). I was under the impression there were only 4 possible alignments, so maybe I'm doing this wrong.

calima wrote:
This is exactly the kind of project that would benefit greatly from PGO. Perhaps even 2x or more.
A quick test with PGO seems to yield roughly 15% faster code (4900 Hz -> 5500 Hz on my machine), which is pretty similar to what I get on Mesen with PGO, too.
At the moment ~50-60% of the time is spent in this recursive function. I haven't been able to find any way to make it faster though. Converting "group" from a vector to a hashset makes it slower (presumably because "group" is usually very small), and the way it works makes it pretty hard/impossible to split the work across multiple threads without a ton of lock contention.
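
The linked function is not reproduced in this thread, so the following is only a sketch of the general shape such a group walk takes in transistor-level simulators: starting from one node, follow every conducting transistor to the node on its other side. The `Circuit`, `Node` and `Transistor` types and their field names are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical data layout; the real simulator's structs may differ.
struct Transistor {
    bool on;  // true when the transistor currently conducts
    int c1;   // node on one side of the channel
    int c2;   // node on the other side
};

struct Node {
    std::vector<int> transistors;  // indices of transistors with c1 or c2 == this node
};

struct Circuit {
    std::vector<Node> nodes;
    std::vector<Transistor> transistors;
    int gnd = 0;  // assumed node number of the ground rail
    int pwr = 1;  // assumed node number of the power rail
    std::vector<int> group;

    // Recursively collects every node electrically connected to n.
    void addNodeToGroup(int n) {
        // Linear scan: cheap while the group is small, which is the common case.
        if (std::find(group.begin(), group.end(), n) != group.end()) return;
        group.push_back(n);
        if (n == gnd || n == pwr) return;  // do not walk through the rails

        for (int t : nodes[static_cast<std::size_t>(n)].transistors) {
            const Transistor& tr = transistors[static_cast<std::size_t>(t)];
            if (!tr.on) continue;
            addNodeToGroup(tr.c1 == n ? tr.c2 : tr.c1);
        }
    }
};
```

Because one transistor switching on can merge two groups, the size of `group` can change a lot between calls, which is part of what makes the work hard to partition across threads.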


Posted: Sun Jan 08, 2017 12:05 pm
Joined: Tue Oct 06, 2015 10:16 am
Posts: 558
What is the range of c1/c2? Perhaps you can use that to optimize it.


Posted: Sun Jan 08, 2017 7:19 pm
Joined: Sun Feb 07, 2016 6:16 pm
Posts: 285
C1/C2 corresponds to node numbers - between both chips, they range from 0 to 33000 (though some numbers are unused).

In other news, I was trying to get the simulator to display the PPU's actual output (based on the VRAM's content) and found out the writes to the PPU don't seem to be working as expected. Writing to $2000 to enable NMIs doesn't appear to work, and a simple program like:
Code:
LDA #$77
STA $2007
JMP $0000
ends up writing garbage to VRAM instead of $77. Writing to CPU RAM works as expected, though, so the problem seems to be the communication between both chips. It's probably a silly mistake, but I've been looking at this for a few hours already and I haven't been able to figure it out.

If anyone's willing to check if they see something that's obviously wrong, it'd probably be around here: halfstep()
clk0 is the master clock, cpu_clk0 is the CPU's clock (i.e. clk0 / 12), and io_ce is the chip enable input on the PPU (which is based on the CPU's address bus & phi2).
I'm unsure if the logic I'm using to replicate the 74139's behavior (io_ce) is correct, among other things.
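
As a point of comparison, here is a minimal sketch of the decode as it is usually described for the NES board: the PPU is selected only while phi2 (M2) is high and the CPU address is in $2000-$3FFF (A15 = 0, A14 = 0, A13 = 1). That description, the function names and the active-low pin convention are assumptions here, not something taken from the simulator's source.

```cpp
#include <cassert>
#include <cstdint>

// True when the PPU registers are being addressed: phi2 high and the top
// three address bits equal to 001 (i.e. $2000-$3FFF).
bool ppuSelected(uint16_t cpuAddr, bool phi2) {
    return phi2 && (cpuAddr & 0xE000) == 0x2000;
}

// Level to drive on an active-low chip enable pin: low (false) when selected.
bool ioCeLevel(uint16_t cpuAddr, bool phi2) {
    return !ppuSelected(cpuAddr, phi2);
}

// Usage example: $2007 is only selected during the phi2-high half of the cycle.
void demoPpuSelect() {
    assert(ppuSelected(0x2007, true));
    assert(!ppuSelected(0x2007, false));
    assert(ioCeLevel(0x0007, true));  // CPU RAM access leaves the pin high
}
```

If io_ce is asserted for the whole CPU cycle rather than only while phi2 is high, the PPU could latch the data bus before the CPU is actually driving it, which would be one way to end up with garbage writes.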

Everything else is pretty much copy/pasted from the original javascript simulators, though. The major difference being that the original Visual 2C02 uses the function handleIoBus() to emulate a CPU (this used to work in the C# version too, before I integrated the 2A03 into it).


Posted: Sun Jan 08, 2017 7:26 pm
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10066
Location: Rio de Janeiro - Brazil
Sour wrote:
I was trying to get the simulator to display the PPU's actual output (based on the VRAM's content)

I don't see much use for a high-level interpretation of the VRAM contents... Wouldn't it be MUCH better to decode the composite signal generated by the PPU? Now that would be awesome!

Quote:
and found out the writes to the PPU don't seem to be working as expected.

Doesn't this have to do with the fact that the PPU needs time to "warm up"? Games are required to wait 1 or 2 frames before using the PPU for this reason. I don't know anything about this type of low-level simulation, so this is the only thing I can think of!


Posted: Sun Jan 08, 2017 8:15 pm
Joined: Sun Sep 19, 2004 10:59 pm
Posts: 1390
Sour wrote:
At the moment ~50-60% of the time is spent in this recursive function. I haven't been able to find any way to make it faster though. Converting "group" from a vector to a hashset makes it slower (presumably because "group" is usually very small), and the way it works makes it pretty hard/impossible to split the work across multiple threads without a ton of lock contention.

My first suggestion would be to add an array of booleans big enough to cover every single node (e.g. "vector<bool> groupbool"), then use it to keep track of whether any node is in the list or not ("if (groupbool[i]) return; groupbool[i] = true; group.push_back(i);", and the opposite when removing an element from "group") in order to avoid the delay of searching through the vector each time.

For small sets of node updates, it probably won't help that much, but for a very high-use function, every little bit helps.
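
A minimal sketch of that idea, with the presence flags and the member list kept side by side. The class name `NodeGroup` is hypothetical, and it uses one byte per flag instead of `vector<bool>` purely to keep the indexing simple; either would fit the suggestion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical wrapper around the "group" list plus a per-node presence flag.
class NodeGroup {
public:
    explicit NodeGroup(std::size_t nodeCount) : present_(nodeCount, 0) {}

    // Adds node n unless it is already a member; returns true if it was added.
    bool add(int n) {
        assert(n >= 0 && static_cast<std::size_t>(n) < present_.size());
        uint8_t& flag = present_[static_cast<std::size_t>(n)];
        if (flag) return false;  // O(1) membership test instead of a scan
        flag = 1;
        members_.push_back(n);
        return true;
    }

    // Resets only the flags that were set, so clearing costs O(group size).
    void clear() {
        for (int n : members_) present_[static_cast<std::size_t>(n)] = 0;
        members_.clear();
    }

    const std::vector<int>& members() const { return members_; }

private:
    std::vector<uint8_t> present_;  // one flag per node number
    std::vector<int> members_;      // the group itself, in insertion order
};
```

Clearing through the member list matters: zeroing the whole flag array on every group rebuild would cost far more than the scans it replaces when groups are only a handful of nodes.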

_________________
Quietust, QMT Productions
P.S. If you don't get this note, let me know and I'll write you another.


Posted: Mon Jan 09, 2017 4:48 am
Joined: Tue Oct 06, 2015 10:16 am
Posts: 558
For such a small range, the first thing is to move to unsigned short/uint16_t instead of int, including in your data struct. You're jumping all over memory in that function, so shrinking the data will increase cache hits.

You may also consider dividing the data into two containers/arrays, one just for the hot function and one with the rest. Again for improved cache hits.

The range then enables other things, like using a fixed-size presence array like Quietust said above.
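
A rough sketch of both suggestions combined: 16-bit node numbers in a small "hot" record, with everything the inner loop never reads moved to a parallel "cold" array. All struct and field names here are hypothetical; the simulator's real structs are not shown in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Fields touched by the hot function. Node numbers top out around 33000,
// which fits in uint16_t (but not in a signed 16-bit short).
struct TransistorHot {
    uint16_t gate;
    uint16_t c1;
    uint16_t c2;
    bool on;
};
static_assert(sizeof(TransistorHot) <= 8, "hot record should stay small");

// Fields only needed for debugging or display (assumed examples).
struct TransistorCold {
    std::string name;
    int x;
    int y;
};

// Two parallel arrays sharing the same index.
struct TransistorTable {
    std::vector<TransistorHot> hot;
    std::vector<TransistorCold> cold;

    // Appends one transistor to both arrays and returns its index.
    std::size_t add(const TransistorHot& h, TransistorCold c) {
        assert(hot.size() == cold.size());
        hot.push_back(h);
        cold.push_back(std::move(c));
        return hot.size() - 1;
    }
};
```

With 8-byte hot records, eight transistors fit in a typical 64-byte cache line, versus far fewer when names and coordinates are interleaved with them.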


Posted: Mon Jan 09, 2017 7:57 am
Joined: Mon Jan 03, 2005 10:36 am
Posts: 2963
Location: Tampere, Finland
tokumaru wrote:
Sour wrote:
I was trying to get the simulator to display the PPU's actual output (based on the VRAM's content)

I don't see much use for a high-level interpretation of the VRAM contents... Wouldn't it be MUCH better to decode the composite signal generated by the PPU? Now that would be awesome!

And/or tap into the intermediate signals (after palette look up, before composite signal generation) to generate a pixel-perfect output.

_________________
Download STREEMERZ for NES from fauxgame.com! — Some other stuff I've done: kkfos.aspekt.fi


Posted: Mon Jan 09, 2017 7:59 am
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10066
Location: Rio de Janeiro - Brazil
thefox wrote:
And/or tap into the intermediate signals (after palette look up, before composite signal generation) to generate a pixel-perfect output.

Yeah, that'd be pretty useful too!


Posted: Mon Jan 09, 2017 6:04 pm
Joined: Sun Feb 07, 2016 6:16 pm
Posts: 285
Thanks for the suggestions - I've changed the ints to shorts, removed anything that wasn't actually required from the structs, and a few other things. Adding an array of bool to avoid scanning "group" did not make any difference, though (it seemed to be 1-2% slower).
Between these and PGO, it is roughly 50% faster than before (~7500 Hz instead of ~5000 Hz). I'm using a pretty old i5, so I'd imagine more recent CPUs should be able to get above 10 kHz.

thefox wrote:
And/or tap into the intermediate signals (after palette look up, before composite signal generation) to generate a pixel-perfect output.
This was the first option I wanted to do, but couldn't find any node in the list that seemed to match. I was looking for things along the lines of "pixel" though, not palette. I just took another look at the node list and it seems like this might be what pal_d0_out to pal_d5_out are for - if so, I'll use those to generate the picture.
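
If those six nodes do carry the palette output, turning them into a color index could look like the sketch below. It assumes pal_d0_out is the least significant bit, that the nodes are active high, and that the caller has already sampled the six node states into an array; none of that is confirmed in this thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// levels[i] is the sampled state of node pal_d<i>_out (assumed active high).
// Returns a 6-bit color index in the range 0-63.
uint8_t paletteIndex(const std::array<bool, 6>& levels) {
    uint8_t index = 0;
    for (std::size_t bit = 0; bit < levels.size(); ++bit) {
        if (levels[bit]) index |= static_cast<uint8_t>(1u << bit);
    }
    return index;
}

// Usage example: bits 1, 4 and 5 set give 2 + 16 + 32 = 0x32.
void demoPaletteIndex() {
    const std::array<bool, 6> levels{{false, true, false, false, true, true}};
    assert(paletteIndex(levels) == 0x32);
}
```

The resulting index would still need a lookup into an RGB palette table to produce a displayable pixel.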

tokumaru wrote:
Doesn't this have to do with the fact that the PPU needs time to "warm up"?
Unfortunately, no. The writes to the registers are ignored during the first frame (due to the warm up period), but once they do start actually having an effect, they aren't working properly. I'm pretty sure it has to do with io_ce not being timed properly, but still haven't figured it out completely.


Posted: Tue Jan 10, 2017 4:39 am
Joined: Tue Oct 06, 2015 10:16 am
Posts: 558
If the profiler still points to the group search, a bloom filter in front could be useful. One of my favorite speedup techniques.
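
For readers unfamiliar with the technique, a minimal sketch of a Bloom filter placed in front of an exact search: a "no" answer is definitive and skips the search, while a "yes" may be a false positive and still needs the exact check. The class name, the 1024-bit size and both hash functions are arbitrary choices for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical two-hash Bloom filter over node numbers.
class TinyBloom {
public:
    void insert(uint32_t key) {
        bits_.set(h1(key));
        bits_.set(h2(key));
    }

    // false: key was definitely never inserted. true: key may be present.
    bool mayContain(uint32_t key) const {
        return bits_.test(h1(key)) && bits_.test(h2(key));
    }

    // Individual keys cannot be removed; the whole filter is reset instead.
    void clear() { bits_.reset(); }

private:
    static constexpr std::size_t kBits = 1024;

    // Multiplicative hash keeping the top 10 bits (0-1023).
    static std::size_t h1(uint32_t k) { return (k * 2654435761u) >> 22; }

    // Second, independent-looking mix masked to the low 10 bits.
    static std::size_t h2(uint32_t k) {
        return ((k ^ (k >> 7)) * 40503u) & (kBits - 1);
    }

    std::bitset<kBits> bits_;
};
```

Since node numbers only go up to about 33000, an exact one-bit-per-node bitmap is only around 4 KB, so a Bloom filter may not buy much over the presence array suggested earlier in the thread.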


Posted: Tue Jan 10, 2017 5:38 pm
Joined: Sun Feb 07, 2016 6:16 pm
Posts: 285
calima wrote:
If the profiler still points to the group search, a bloom filter in front could be useful. One of my favorite speedup techniques.
I read up a bit on bloom filters, but I'm not quite sure I see how I could apply them here. The nodes that are in a particular group change constantly as transistors turn on and off, and this is recursive, so a single transistor changing state could make a group go from 2 nodes to 50 nodes.

And I've fixed up some of the issues, it seems, but some remain (e.g. there's an incorrectly displayed sprite at the top left).
Also, DK is the only game I found that boots in a reasonable number of frames (SMB surprisingly takes about 30 frames).
Attachment: visualnes.png [ 176.64 KiB ]

For fun, here's what it looks like with Quietust's scanline test rom - bg color is wrong, but it actually displays exactly like what Mesen does (and like what Eugene posted a while ago from a real Famicom). Maybe caused by PPU-CPU alignment? I'd have to try to change the alignment and run it some more to see.
Attachment: scanline.png [ 18.8 KiB ]


Posted: Tue Jan 10, 2017 8:26 pm
Joined: Sat Jul 12, 2014 3:04 pm
Posts: 936
Quote:
(e.g: there's an incorrectly displayed sprite at the top left)

Donkey Kong, as [mostly-]good practice, initializes the Y-values of unused sprites to FF. Anything from F0-FF should be invisible. I wonder if you got a Y-wrap introduced, and where.
For (J)'s .nes, 31bd:FF is the initial value used in the OAM-page initializing loop, if you want to change it to see if that's your issue.
For (U) it should be at 31ae.
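
To illustrate what a Y-wrap would look like, here is a simplified model of the sprite in-range test, once with a full-width subtraction and once with the difference truncated to 8 bits. This is only a sketch of the hypothesis, not of how the simulated PPU actually evaluates sprites, and it ignores the one-scanline offset between OAM Y and the displayed row.

```cpp
#include <cassert>
#include <cstdint>

// Expected behavior: a sprite covers rows oamY .. oamY + height - 1, with no
// wraparound, so Y values of $F0-$FF never match a visible scanline (0-239).
bool spriteInRange(int scanline, uint8_t oamY, int height) {
    assert(height == 8 || height == 16);
    const int row = scanline - oamY;
    return row >= 0 && row < height;
}

// Hypothetical faulty variant: the difference wraps modulo 256, so a sprite
// parked at Y = $FF leaks back onto the first scanlines of the frame.
bool spriteInRangeWrapped(int scanline, uint8_t oamY, int height) {
    assert(height == 8 || height == 16);
    const uint8_t row = static_cast<uint8_t>(scanline - oamY);
    return row < height;
}
```

With Y = $FF and 8-pixel sprites, the wrapped version reports a hit on scanlines 0 through 6, which would put a stray sprite at the top of the screen.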


Top
 Profile  
 
PostPosted: Tue Jan 10, 2017 11:37 pm 
Offline
User avatar

Joined: Mon Jan 03, 2005 10:36 am
Posts: 2963
Location: Tampere, Finland
There's a bug in the Visual 2C02 OAM DMA: viewtopic.php?p=169373#p169373 (it does not actually seem to always corrupt the source address to 0, unlike what I said in that post; instead it seems to depend on the value written and the hibyte of "ab": spr_addr = value_written AND hibyte(ab)).
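
The formula can be written out as a small function. This assumes "ab" is the 16-bit address bus value at the time of the write; the function name is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// spr_addr = value_written AND hibyte(ab), as described in the post.
uint8_t corruptedSprAddr(uint8_t valueWritten, uint16_t ab) {
    const uint8_t hiByte = static_cast<uint8_t>(ab >> 8);
    return valueWritten & hiByte;
}

// Usage example: with ab = $4014, only bit 6 of the written value survives.
void demoCorruptedSprAddr() {
    assert(corruptedSprAddr(0x02, 0x4014) == 0x00);
    assert(corruptedSprAddr(0x42, 0x4014) == 0x40);
}
```

That would explain why the address often ends up as 0: typical page numbers such as $02 have no bits in common with a high byte like $40.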

Posted: Wed Jan 11, 2017 7:24 am
Joined: Sat Apr 18, 2009 4:36 am
Posts: 257
Location: Russia
There are different versions of scanline.nes:
1) black: http://nesdev.com/scanline.zip
2) gray: https://github.com/christopherpow/nes-test-roms (see \scanline\scanline.nes)

I used gray:
viewtopic.php?f=3&t=14833


Posted: Wed Jan 11, 2017 5:59 pm
Joined: Sun Sep 19, 2004 10:59 pm
Posts: 1390
Eugene.S wrote:
There are different versions of scanline.nes:
1) black: http://nesdev.com/scanline.zip
2) gray: https://github.com/christopherpow/nes-test-roms (see \scanline\scanline.nes)

I used gray:
viewtopic.php?f=3&t=14833

The "black" one is, in fact, broken - it's the original version I wrote back in 2003 before I actually had it tested on a real NES.

The "gray" version is the correct one (and is the one taken from my website, as evidenced by the extra build-script input files in the directory).
