All times are UTC - 7 hours



PostPosted: Tue May 24, 2016 2:33 am 
Offline

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 429
https://github.com/awjackson/bsnes-clas ... a6100569b8

For byuu: the functions that need changing (based on the datasheet) are op_pei, op_read_ildp_[bw], op_read_ildpy_[bw], op_sta_ildp_[bw], and op_sta_ildpy_[bw].

Actually, I just realized that as a micro-optimization you could use readdpn() for the _w versions of all the addressing modes. Those are never going to be called in emulation mode, after all.

Along with fixing the bug, I got rid of all the separate _e versions of opcode handlers. Nearly all of them only differed in that they fix up the high byte of the SP after doing readstackn() or writestackn(), or they force the M and X flags to remain set when popping or otherwise modifying the flags register. The only instruction that's different enough between emulation and native mode to make the unified handler a tiny bit ugly is RTI.
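
A rough sketch of what such a unified handler might look like, not the actual bsnes code: the `Cpu` struct, the flat `ram` array and `fixStack()` are hypothetical, and only the names readstackn()/writestackn() come from the post above. It assumes the emulation-mode behaviour described here: the newer stack instructions use a full 16-bit S and the high byte is pinned to page 1 afterwards, while PLP keeps M and X set.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified CPU state; not the bsnes layout.
struct Cpu {
  uint16_t s = 0x01ff;  // stack pointer
  uint16_t d = 0x0000;  // direct page register
  uint8_t  p = 0x30;    // status flags (0x20 = M, 0x10 = X)
  bool     e = true;    // emulation mode
  std::array<uint8_t, 0x10000> ram{};  // flat bank-0 memory for the sketch

  // Emulation mode pins the stack pointer to page 1.
  void fixStack() { if(e) s = uint16_t(0x0100 | (s & 0x00ff)); }

  // "n" accessor: always treats S as a full 16-bit pointer.
  void writestackn(uint8_t data) { ram[s--] = data; }

  // Ordinary pull: wraps within page 1 in emulation mode.
  uint8_t readstack() { s++; fixStack(); return ram[s]; }

  // PHD: one handler for both modes, with the fix-up at the tail.
  void op_phd() {
    writestackn(uint8_t(d >> 8));
    writestackn(uint8_t(d & 0xff));
    fixStack();
  }

  // PLP: emulation mode forces M and X to stay set.
  void op_plp() {
    p = readstack();
    if(e) p |= 0x30;
  }
};
```

The only mode-dependent parts are the two `if(e)` checks, which is why separate _e handlers add little beyond duplicated code.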


PostPosted: Tue May 24, 2016 4:55 am 
Offline

Joined: Thu Aug 12, 2010 3:43 am
Posts: 1589
byuu wrote:
I don't know of a single official game in the entire SNES library that runs in emulation mode, so there's bound to be more issues lurking in there than just this.

Doubt any official software does, but I wouldn't be surprised if the firmware of some copier or whatever does. I mean, copiers on the Mega Drive usually run in Master System mode, so I wouldn't be surprised if copiers on SNES run in emulation mode, especially seeing as you can access all the SNES registers from the first bank anyway (the biggest limit being the ROM size).


PostPosted: Tue May 24, 2016 11:33 am 
Offline

Joined: Sat Apr 25, 2015 1:47 pm
Posts: 327
Location: FL
Is there something to gain from running in Master System mode?

Conversely, there's really no apparent benefit from intentionally running SNES code in emulation mode unless you really need to make your init two bytes smaller.


PostPosted: Tue May 24, 2016 11:38 am 
Offline

Joined: Thu Aug 12, 2010 3:43 am
Posts: 1589
Revenant wrote:
Is there something to gain from running in Master System mode?

I'm taking a guess that they had more information about the Master System than the Mega Drive available at the time (and possibly the gain from using a single 8-bit ROM, which would have been cheaper). I can't think of any other reason.


PostPosted: Tue May 24, 2016 1:55 pm 
Online

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19085
Location: NE Indiana, USA (NTSC)
Revenant wrote:
there's really no apparent benefit from intentionally running SNES code in emulation mode unless you really need to make your init two bytes smaller.

(suppresses urge to compare Street Fighter II emulated in higan or MAME to the decidedly inferior PC-native port of Street Fighter II to make an "emulation mode" joke)

But thanks for the Super NES port of the test. I wonder whether more Super NES consoles or Apple IIGS computers are still in operation.


PostPosted: Tue May 24, 2016 2:46 pm 
Offline

Joined: Sun Jul 01, 2012 6:44 am
Posts: 337
Location: Lion's den :3
tepples wrote:
I wonder whether

This is it. I hereby officially start a "tepples wonders whether" counter. :lol:

PostPosted: Tue May 24, 2016 5:30 pm 
Offline

Joined: Sat Jan 03, 2015 5:58 pm
Posts: 367
Location: ...
Ramsis wrote:
tepples wrote:
I wonder whether

This is it. I hereby officially start a "tepples wonders whether" counter. :lol:

Search feature says 3 pages of posts by him with "wonder whether". Let's make it 4!


PostPosted: Tue May 24, 2016 9:24 pm 
Offline

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1338
Alright, your fixes are in place for v098r11. Thanks again everyone, especially AWJ.
I also took part of your advice and merged the pe?_(e,n) functions, and simplified the ridiculous paranoia-masking in memory.hpp. There are a lot more cleanups needed on this core, but these alone get us a nice speed boost from 127fps to 131fps. I then gave up a bit of speed (down to 130fps) in order to eliminate all op_*_(e,n) variants and just merge them into single opcodes. The only opcode that became really ugly as a result was op_rti, so I hard-coded a split for the function tail in order to preserve the seamless flow of instructions and the L prefixes, but it keeps us from duplicating the first 80% of the instruction at least.

I added "#define E if(regs.e)" and "#define N if(regs.n)", which I realize is kind of evil, but it's no worse than "#define L lastCycle();" was.

If we could come up with some alternative to remove the need for L, then we could add M/X for if(!regs.p.m) and if(!regs.p.x) and merge every single _b/_w split instruction together, which would probably be a big boost for cache locality (less code.)

If you weren't aware, Screwtape is now hosting WIP releases publicly via Gitlab, so there's full version history information now. There's a small delay in my WIPs making it there, so you won't see it if you look right now.

https://gitlab.com/higan/higan/tree/master/

Quote:
For anyone who cares, here's a quick and dirty SNES version of koitsu's test. As expected(?), it shows BBBB on hardware, but BBAA in bsnes-plus (not sure about current higan).


It shows nothing at all in higan because you aren't initializing the system correctly.


PostPosted: Wed May 25, 2016 7:46 am 
Offline

Joined: Sat Apr 25, 2015 1:47 pm
Posts: 327
Location: FL
Whoops, fixed it. (See what I mean by "quick and dirty"? :P)


PostPosted: Wed May 25, 2016 9:19 am 
Offline

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 429
byuu wrote:
Alright, your fixes are in place for v098r11. Thanks again everyone, especially AWJ.
I also took part of your advice and merged the pe?_(e,n) functions, and simplified the ridiculous paranoia-masking in memory.hpp. There are a lot more cleanups needed on this core, but these alone get us a nice speed boost from 127fps to 131fps. I then gave up a bit of speed (down to 130fps) in order to eliminate all op_*_(e,n) variants and just merge them into single opcodes. The only opcode that became really ugly as a result was op_rti, so I hard-coded a split for the function tail in order to preserve the seamless flow of instructions and the L prefixes, but it keeps us from duplicating the first 80% of the instruction at least.

I added "#define E if(regs.e)" and "#define N if(regs.n)", which I realize is kind of evil, but it's no worse than "#define L lastCycle();" was.

If we could come up with some alternative to remove the need for L, then we could add M/X for if(!regs.p.m) and if(!regs.p.x) and merge every single _b/_w split instruction together, which would probably be a big boost for cache locality (less code.)

If you weren't aware, Screwtape is now hosting WIP releases publicly via Gitlab, so there's full version history information now. There's a small delay in my WIPs making it there, so you won't see it if you look right now.

https://gitlab.com/higan/higan/tree/master/


I was aware of that Gitlab repository, but thanks all the same.

I don't think there's anything you can do about L; it's fundamental to how the 65816 checks interrupts. At least the ugliness would be confined to the tails of the addressing mode methods.

Here's what I had in mind for unifying the _b and _w and reducing code duplication. Define a few helper functions:

Code:
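// Mask covering the active width of a register: 8 or 16 bits.
alwaysinline uint16_t regmask(bool eight)
{
  return eight ? 0xff : 0xffff;
}

// Sign bit for the active width.
alwaysinline uint16_t signbit(bool eight)
{
  return eight ? 0x80 : 0x8000;
}

// Replace only the active bits of reg; in 8-bit mode the high byte is kept.
alwaysinline void setreg(uint16_t &reg, uint16_t value, bool eight)
{
  reg = (reg & ~regmask(eight)) | (value & regmask(eight));
}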


Then use these helpers everywhere when modifying the a/x/y/s registers and setting the c/v/n/z flags, passing either r.p.m, r.p.x, or r.e (for the stack pointer) as the "eight" argument as appropriate. With the helper functions you can make the registers plain uint16_t and not compiler-dependent unions or your auto-bitmasking integer classes (which will perform extremely poorly when the range of bits to access isn't a compile-time constant--trust me, don't even try it).

The idea behind these helper functions is to minimize branches by encouraging the compiler to use conditional moves instead. For even stronger protection against gratuitous branches, put the mask and signbit lookup values into static const arrays inside the respective functions. (I have no idea whether this would actually be smaller/faster or not.)
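
For illustration, here is a minimal sketch of the table-lookup variant together with one way the helpers could be used to load a register and set n/z. Plain `inline` stands in for alwaysinline, and the `Flags` struct and `load()` function are hypothetical names, not part of bsnes. Whether this is faster than the ternary version is untested, as noted above.

```cpp
#include <cassert>
#include <cstdint>

// Table-lookup variants: index 0 = 16-bit, index 1 = 8-bit.
inline uint16_t regmask(bool eight) {
  static const uint16_t table[2] = {0xffff, 0x00ff};
  return table[eight];
}

inline uint16_t signbit(bool eight) {
  static const uint16_t table[2] = {0x8000, 0x0080};
  return table[eight];
}

// Replace only the active bits of reg.
inline void setreg(uint16_t& reg, uint16_t value, bool eight) {
  const uint16_t mask = regmask(eight);
  reg = static_cast<uint16_t>((reg & ~mask) | (value & mask));
}

// Hypothetical flag storage for the sketch.
struct Flags {
  bool n = false;
  bool z = false;
};

// Load-style operation: one body serves both the 8-bit and 16-bit case.
inline void load(uint16_t& reg, uint16_t value, bool eight, Flags& p) {
  setreg(reg, value, eight);
  p.n = (value & signbit(eight)) != 0;
  p.z = (value & regmask(eight)) == 0;
}
```

With `a = 0x1234`, calling `load(a, 0x0080, true, p)` leaves the high byte alone and sets n, while the same call with `false` overwrites all 16 bits.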

Another idea I had was making p (the flags) incorporate e, and make the class itself enforce e/m/x consistency. But then you'll have to use getters and setters to access the individual flags and I know you hate that style (I trust you have the good judgement not to consider attaching an operator=() callback pointer to each flag, like you did with the SuperFX registers!) One other advantage to making p an opaque class is that you can experiment with things like byte-packing the less volatile flags, or storing n/z lazily (store the last result that affected them and convert it to a bool on demand) and see if they have any effect on performance.
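
A minimal sketch of the "opaque p that incorporates e" idea, assuming the usual nvmxdizc bit layout. The class name `StatusFlags` and its methods are hypothetical. It only shows the consistency rule (e set implies m and x set); lazy n/z storage is left out.

```cpp
#include <cassert>
#include <cstdint>

class StatusFlags {
public:
  enum : uint8_t { C = 0x01, Z = 0x02, I = 0x04, D = 0x08,
                   X = 0x10, M = 0x20, V = 0x40, N = 0x80 };

  bool e() const { return e_; }
  bool m() const { return (p_ & M) != 0; }
  bool x() const { return (p_ & X) != 0; }
  bool test(uint8_t flag) const { return (p_ & flag) != 0; }
  uint8_t pack() const { return p_; }

  // Every write funnels through here, so callers cannot break the
  // invariant "e set implies m and x set".
  void unpack(uint8_t value) {
    p_ = e_ ? uint8_t(value | M | X) : value;
  }

  void set(uint8_t flag, bool on) {
    unpack(on ? uint8_t(p_ | flag) : uint8_t(p_ & ~flag));
  }

  // Intended for an XCE-style handler; re-applies the invariant.
  void setE(bool on) {
    e_ = on;
    unpack(p_);
  }

private:
  uint8_t p_ = M | X | I;
  bool e_ = true;
};
```

The cost is exactly the getter/setter style mentioned above: handlers call `p.m()` and `p.set(StatusFlags::C, carry)` instead of touching fields directly.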

Off topic, by any chance have you read this article? http://blog.codef00.com/2014/12/06/port ... using-c11/ It reminded me a lot of your new integer classes, though the specifics are a bit different (the templates described in the article don't support accessing arbitrary numeric bit ranges, only fields defined and named at compile time).


PostPosted: Wed May 25, 2016 9:39 am 
Online

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19085
Location: NE Indiana, USA (NTSC)
AWJ wrote:
Another idea I had was making p (the flags) incorporate e

Google Search results for "envmxdizc" show that this idea may be viable.


PostPosted: Wed May 25, 2016 11:44 am 
Offline

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1338
> I don't think there's anything you can do about L; it's fundamental to how the 65816 checks interrupts

The basic idea is obviously that it's a two-stage pipeline. So if we could manipulate the read/write/io functions to have a one-slot delay, and have the "execute new instruction" trigger the delay slot, it would work. But obviously it's not nearly that simple, since by the time we get to the next opcode execute, the last opcode work cycle has executed already.

I don't even want to imagine how we would emulate more than two stages of pipeline accurately. L is a crude hack.

> alwaysinline uint16_t regmask(bool eight) { return eight ? 0xff : 0xffff; }

That's not going to address the difference between A (leaves upper 8 bits alone) and X/Y (upper 8 bits always zero).

You'd probably want regmaskM / regmaskX or something.

I kind of use what you're saying here though in my other cores, like V30MZ and its 8/16/32-bit modes (32-bit for the long multiply/division). But it's much easier to merge the byte/word/dword instructions there because there's no lastCycle nonsense to worry about (or at least, we don't bother trying to).

> you can make the registers plain uint16_t and not compiler-dependent unions

I wanted to build them around Natural<16>, and then a.l == a.byte(0), a.h == a.byte(1), but that's too much extra typing.

> your auto-bitmasking integer classes (which will perform extremely poorly when the range of bits to access isn't a compile-time constant--trust me, don't even try it)

They only experience a very minor slowdown due to poor compiler code generation. The logic to handle dynamic bit-masking would be identical if you write it yourself in your own code (in your case above, you're optimizing because there are only two possibilities, of course.)

> I trust you have the good judgement not to consider attaching an operator=() callback pointer to each flag, like you did with the SuperFX registers!

Say what you will, that avoids the need to have manual checks for r15 (pc) modifications in every single instruction handler.

> One other advantage to making p an opaque class is that you can experiment with things like byte-packing the less volatile flags, or storing n/z lazily

That will definitely speed things up (not by a lot, probably 1-2% of the total emulation time.)

This should be obvious by now, but I am intentionally writing a lot of code in higan sub-optimally. The goal is to make the code easier to read, so I don't employ 100% safe tricks like delayed computations of register flags. Even when they're expensive and rarely used, like V30MZ's ridiculous parity flag.

My goal (I don't always achieve it) is always "the least possible amount of code with the least amount of caches/copies."

I'm all for you applying them to your fork, but it's unfortunate that the two codebases have become so different that there's no hope of you merging upstream changes anymore. You're gonna have to add any of my changes/fixes by hand to your code. That was already the case though, since you're still working off a five-year-old fork.

> Off topic, by any chance have you read this article?

No, but I'm very familiar with bitfields.

Whereas the order_[ml]sb macros can result in union/structs on bytes that work on 100% of platforms I've tested, bit-packing is vastly more volatile in doing what you want. C doesn't make any guarantees for bit-ordering, when things will actually compact versus pad, etc.

The code gen for them is also worse than for my flag classes. (Yes, I had a dumb issue where I was recomputing all of P everywhere. That's been fixed for a while now.) I'm referring to both when I used to pack the eight bits of P into one uint8_t, and the current method where I split them into eight booleans.

> the templates described in the article don't support accessing arbitrary numeric bit ranges, only fields defined and named at compile time

What I really wanted to see was C++ supporting unified function call syntax, including on native types.

So I could say:
Code:
uint bits(uint16_t&, uint lo, uint hi);
uint16_t x;
x.bits(2, 3);


Because I am never going to be happy with "bits(x, 2, 3);" for many obvious reasons.

If I could get that, then uint8/16/32/64 could continue to be their native types in higan, and I'd only need Natural<T> for non-power-of-two integers, which are a whole lot less common in the codebase.

Unfortunately, some dipshit on the Jacksonville panel torpedoed the idea. It would have been the most revolutionary feature added to C++ since at least 1998 had they done it. We could have started implementing truly encapsulated data structures without resorting to "useless" (std) or "kitchen sink" (nall) style classes.


PostPosted: Wed May 25, 2016 1:09 pm 
Offline

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 429
Quote:
> alwaysinline uint16_t regmask(bool eight) { return eight ? 0xff : 0xffff; }

That's not going to address the difference between A (leaves upper 8 bits alone) and X/Y (upper 8 bits always zero).


As long as you make sure you clear the upper bits of X/Y every time the x flag changes from 0 to 1, that doesn't matter.
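
A small sketch of that argument, with a hypothetical `Regs` struct and `setIndexFlag()` function: if the high bytes of X/Y are cleared at the one place where the x flag goes from 0 to 1, the same masked write works for A and for X/Y.

```cpp
#include <cassert>
#include <cstdint>

inline uint16_t regmask(bool eight) { return eight ? 0x00ff : 0xffff; }

inline void setreg(uint16_t& reg, uint16_t value, bool eight) {
  reg = uint16_t((reg & ~regmask(eight)) | (value & regmask(eight)));
}

// Hypothetical register file for the sketch.
struct Regs {
  uint16_t a = 0, x = 0, y = 0;
  bool m = true;      // accumulator width flag
  bool xflag = true;  // index width flag
};

// Single choke point for changing the x flag: going from 16-bit to 8-bit
// index registers clears the high bytes once, so later masked writes
// through setreg() never see stale upper bits in X or Y.
inline void setIndexFlag(Regs& r, bool eight) {
  if(eight && !r.xflag) {
    r.x &= 0x00ff;
    r.y &= 0x00ff;
  }
  r.xflag = eight;
}
```

After `setIndexFlag(r, true)`, a call like `setreg(r.x, value, r.xflag)` preserves a high byte that is already zero, whereas `setreg(r.a, value, r.m)` preserves whatever was in the high byte of A.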

Quote:
> your auto-bitmasking integer classes (which will perform extremely poorly when the range of bits to access isn't a compile-time constant--trust me, don't even try it)

They only experience a very minor slowdown due to poor compiler code generation. The logic to handle dynamic bit-masking would be identical if you write it yourself in your own code (in your case above, you're optimizing because there are only two possibilities, of course.)


The reason the slowdown is only "minor" is because the compiler can optimize for only one possibility when it knows what the bit range is at compile time. Remember what I said once about how "inline does more than chopping off call and ret"?
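
To make the distinction concrete, here is a sketch of the two cases. `bitsStatic` and `bitsDynamic` are hypothetical names and are not how Natural<> is implemented. In the first the bounds are template parameters, so the mask is a compile-time constant; in the second the mask has to be built from runtime values on every call.

```cpp
#include <cassert>
#include <cstdint>

// Bounds known at compile time: the mask is a constant, so the call can
// reduce to one shift and one AND with an immediate.
template<unsigned Lo, unsigned Hi>
inline uint32_t bitsStatic(uint32_t value) {
  static_assert(Lo <= Hi && Hi < 32, "bit range must fit in 32 bits");
  constexpr unsigned width = Hi - Lo + 1;
  constexpr uint32_t mask =
    width == 32 ? 0xffffffffu : ((uint32_t(1) << (width % 32)) - 1u);
  return (value >> Lo) & mask;
}

// Bounds only known at run time: the width, the mask and the full-width
// special case all have to be computed on each call.
inline uint32_t bitsDynamic(uint32_t value, unsigned lo, unsigned hi) {
  const unsigned width = hi - lo + 1;
  const uint32_t mask =
    width >= 32 ? 0xffffffffu : ((uint32_t(1) << width) - 1u);
  return (value >> lo) & mask;
}
```

Both return the same values; the difference is only in how much work is left for run time once the call has been inlined.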

Quote:
> I trust you have the good judgement not to consider attaching an operator=() callback pointer to each flag, like you did with the SuperFX registers!

Say what you will, that avoids the need to have manual checks for r15 (pc) modifications in every single instruction handler.


I thought of a way to improve on that, but haven't tested it for performance yet (mainly because I know it will break Revenant's debugger enhancements). My idea is to have each opcode return the index of the register it modified. (SuperFX instructions never modify multiple arbitrary registers; if they modify two registers, one of them is always one with no side effects.) Then put the r14/r15 check after the opcode dispatch:

Code:
auto changedreg = op_exec();
if(changedreg == 14) {
  // handle data caching
} else if(changedreg == 15) {
  // handle pipeline flushing
}


The other places where registers can change are in the MMIO and debugger, and you can simply duplicate the checks there (well, that assumes you use an actual interface to modify emulated state from your debugger, and not just the equivalent of #define private public...)
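
A toy sketch of that dispatch shape, under the assumptions stated above. The `GSU` struct, the two opcodes, the `NoSideEffect` sentinel and the counters are all hypothetical stand-ins, and the counters merely mark where the real data caching and pipeline flushing would go.

```cpp
#include <cassert>
#include <cstdint>

constexpr int NoSideEffect = -1;  // hypothetical "nothing special changed"

struct GSU {
  uint16_t r[16] = {};
  unsigned romFetches = 0;       // stands in for "handle data caching"
  unsigned pipelineFlushes = 0;  // stands in for "handle pipeline flushing"

  // Hypothetical opcode: writes a register and reports which one changed.
  int op_load(unsigned n, uint16_t value) {
    r[n & 15] = value;
    return int(n & 15);
  }

  // Hypothetical opcode with no register side effects.
  int op_nop() { return NoSideEffect; }

  // The single place where r14/r15 side effects are handled.
  void afterOpcode(int changedreg) {
    if(changedreg == 14) {
      romFetches++;
    } else if(changedreg == 15) {
      pipelineFlushes++;
    }
  }
};
```

A main loop would then read `gsu.afterOpcode(gsu.op_load(14, addr));`, and the MMIO and debugger write paths could call `afterOpcode()` with the register index they touched.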

(the previous parenthesized snark was targeted at Revenant as much as at byuu)

Quote:
What I really wanted to see was C++ supporting unified function call syntax, including on native types.

So I could say:
Code:
uint bits(uint16_t&, uint lo, uint hi);
uint16_t x;
x.bits(2, 3);


Because I am never going to be happy with "bits(x, 2, 3);" for many obvious reasons.

If I could get that, then uint8/16/32/64 could continue to be their native types in higan, and I'd only need Natural<T> for non-power-of-two integers, which are a whole lot less common in the codebase.

Unfortunately, some dipshit on the Jacksonville panel torpedoed the idea. It would have been the most revolutionary feature added to C++ since at least 1998 had they done it. We could have started implementing truly encapsulated data structures without resorting to "useless" (std) or "kitchen sink" (nall) style classes.


Your Natural class supports writing to arbitrary ranges of bits, not just reading from them. I don't see how you could achieve that with unified function call syntax even if it was in the language.

ETA: Completely offtopic, but while we're communicating on more-or-less civil terms I'm going to take this opportunity to call out something you said a while back about how hypothetical Java-style inner classes could be a magic fast box for C++:

Quote:
In this case, PPU::vblank knows exactly where the cpu. object is. It doesn't have to do any pointer lookups. When it comes to accessing properties, we can compute it as a single displacement against the object's address, as if the property existed inside the PPU class itself.


Even in Java, instances of inner classes contain hidden pointers to their parent outer-class instance. That's why Java programming guides advise you to make inner classes "static" unless they really need access to non-static members of their parent outer class ("static" in the context of Java inner classes means "omit the implicit pointer-to-parent-instance")

Why can't the compiler just apply a displacement to the address of the inner object to get the address of its parent? Think about what happens if you have multiple instances of an inner class inside the outer class, as in your own hypothetical example:

Code:
class SuperFamicom {
  subclass CPU cpu;
  subclass PPU {
    subclass BG { ... } bg1, bg2, bg3, bg4;
    subclass OBJ { ... } obj;
  } ppu;
  subclass SMP smp;
  subclass DSP dsp;
};


What displacement does a member function of BG need to apply to its "this" pointer to get the address of PPU (or CPU, or any of the other classes)? That depends on whether the BG is a bg1, a bg2, a bg3 or a bg4! It's going to need a per-instance "offset to parent" hidden member, which takes up just as much space and imposes just as much runtime overhead as a straight pointer to the parent object would.
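
The point can be checked in today's C++ with offsetof. The sketch below uses ordinary nested members rather than the hypothetical "subclass" syntax, and `parentOf()` is a made-up helper in the style of container_of.

```cpp
#include <cassert>
#include <cstddef>

struct BG { int hscroll = 0; };

struct PPU {
  int displayDisable = 0;
  BG bg1, bg2, bg3, bg4;
};

// Each BG member sits at a different distance from the start of its PPU,
// so no single compile-time displacement works for all four instances.
static_assert(offsetof(PPU, bg1) != offsetof(PPU, bg2), "distinct offsets");
static_assert(offsetof(PPU, bg3) != offsetof(PPU, bg4), "distinct offsets");

// Recover the parent from a BG pointer plus that instance's own offset.
// The offset has to be supplied per instance, which is the hidden
// "offset to parent" member the post describes.
inline PPU* parentOf(BG* bg, std::size_t offsetInParent) {
  return reinterpret_cast<PPU*>(reinterpret_cast<char*>(bg) - offsetInParent);
}
```

Passing bg3 with bg3's offset yields the enclosing PPU; passing bg3 with bg1's offset does not, so a BG member function cannot know the right displacement from its "this" pointer alone.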


Last edited by AWJ on Wed May 25, 2016 2:48 pm, edited 1 time in total.

PostPosted: Wed May 25, 2016 2:23 pm 
Offline

Joined: Sat Apr 25, 2015 1:47 pm
Posts: 327
Location: FL
I've got no problem reworking my register editor (or any other feature I'm responsible for) to support / allow for a cleaner approach to modifying state. I remember we briefly discussed it a while ago, but I haven't really looked into it at all since then.


PostPosted: Wed May 25, 2016 5:40 pm 
Offline

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1338
> (the previous parenthesized snark was targeted at Revenant as much as at byuu)

You can always write your own emulator from scratch if mine is really so bad.

We really do need someone to write an Snes9X replacement, given said project is dead in all but name. And PCs aren't getting any faster, so bsnes is unlikely to run on cheap portable hardware any time soon.

> Your Natural class supports writing to arbitrary ranges of bits, not just reading from them. I don't see how you could achieve that with unified function call syntax even if it was in the language.

The same way it does it now:
Code:
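// Proxy that refers to bits lo..hi of an existing integer.
template<typename T> struct bitrange_t {
  T& source;
  const uint lo;
  const uint hi;
  operator T() const;             // read the selected bits
  bitrange_t& operator=(T);       // write the selected bits
};

bitrange_t<uint16_t> bits(uint16_t& source, uint lo, uint hi);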


The key advantage here would be that uint(8,16,32,64) would be native types for when .bits is not being used on them. This would be huge. Something like vector<Natural<8>> is not compatible with a function that takes a vector<uint8_t>.
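
For reference, a proxy of this shape can be fleshed out in standard C++ today and called as a free function, `bits(x, lo, hi)`. The member bodies below are one possible implementation, not the code from nall, and `unsigned` stands in for uint.

```cpp
#include <cassert>
#include <cstdint>

template<typename T> struct bitrange_t {
  T& source;
  const unsigned lo;
  const unsigned hi;

  // Mask with ones in bit positions lo..hi.
  T mask() const {
    const unsigned width = hi - lo + 1;
    const T ones = width >= sizeof(T) * 8 ? T(~T(0)) : T((T(1) << width) - 1);
    return T(ones << lo);
  }

  // Read: extract the range and shift it down to bit 0.
  operator T() const { return T((source & mask()) >> lo); }

  // Write: replace only the selected bits of the underlying integer.
  bitrange_t& operator=(T value) {
    source = T((source & ~mask()) | ((T(value) << lo) & mask()));
    return *this;
  }
};

inline bitrange_t<uint16_t> bits(uint16_t& source, unsigned lo, unsigned hi) {
  return {source, lo, hi};
}
```

Usage is `bits(x, 4, 7) = 0xa;` to write and `uint16_t v = bits(x, 4, 7);` to read, with `x` remaining a plain uint16_t throughout.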

> hypothetical Java-style inner classes could be a magic fast box for C++:

Speaking for the PPU, I never said it would be faster. I said the code would be nicer to look at. We'd lose a bunch of excess "ppu." prefixes all over the PPU core. Even if it ends up slower than capturing PPU& inside PPU::BG1, etc. (and I can't see how), I'd still do it.

As for the general case of a "class SuperFamicom", it would be the same performance as I have now (if things were static), or it would incur the same 10% performance penalty from when I used to have global CPU*, PPU*, etc. handles.

The magic speedup I speak of happens from having "CPU cpu;" in the global namespace. Which makes instantiating multiple SNES cores with bsnes impossible, but allows the PPU's reference to cpu.foo to be turned into a static address without the need for displacement.

> while we're communicating on more-or-less civil terms

I never made things personal. I asked you several times not to. I very much didn't want to block you on my own forums, and I gave you more chances than anyone.

I am extremely grateful for your help in addressing legitimate bugs, and always will be no matter what transpires in the future between us.

I wish you wouldn't keep attacking every last detail of my programming style. I hate that, but I can mostly take it. I know I'm weird. What do you expect from a high school dropout who taught himself everything? At the end of the day, crazy as my software is, it works. Right now, no one can point to a game bug that isn't possible on real hardware in the entire SNES library. Something no one else can claim. And yes, that was thanks to the help of probably two dozen other people, including yourself. I'm not claiming all the credit, but I am the one that took everything and put it together into a real product. I'm saying that the horrible codebase that's so heaped with scorn ... works. So surely it can't be all bad. (And I mean seriously, have you seen other SNES emulator codebases? One of them has a CRC32 table inside the memmap.c file for the SNES CPU address bus handling, and has code like the block below. Another one is pure commentless x86 assembler. Why are you being so critical of my work? >_<)

Code:
//x is a uint16_t type
if(x == 0xffff)
{
    x = 0;
}
else
{
    x = x + 1;
}


We have our differences on how we'd design things. You can make any changes you like in your fork. You don't have to agree with my choices, but please respect that we're each entitled to our own opinions on how software should be designed.

Here's the thing I'll admit to: I am not as smart as you! Or as smart as most of bsnes' contributors. The codebase is simple not just because I believe in simplicity, but because anything more and I can't comprehend it. I keep it this simple, this slow, this in my own personal style, because that's what I need to do to keep it all straight in my head. And when you laugh at many of my ideas, you're attacking code that's 12+ years old. I'm still not a perfect programmer, but I like to think I'm better than some of the code in there from 2004. Singling out a line like a XOR swap in a 370,000-line codebase is kind of unfair.

When it comes to stupid design choices: you can achieve the same results by politely pointing out, "have you tried X instead of Y here?", and I may or may not agree. But if I don't, being rude about it is only going to make me less likely to make the changes you want (I'm somewhat petty in that regard), and will just damage our relationship. I know it can be frustrating when someone won't agree with you. We both experience that about each other on our disagreements.

It's when you go around insulting my integrity, competence, and anyone who doesn't take your side that it gets to be a problem. As important as bsnes is to me, I'm not going to put up with ad hominem attacks against my character. If you do that here, then I'll stop reading your messages and stop responding to you for good. I really don't want to do that. I don't want to miss out on bugfixes like from this thread. But if that's what I have to do, then I will.

If you can avoid insulting me as a person, then I'd like us to consider the past water under the bridge. If there's something I can do to help facilitate that, please let me know. I'd like to make an equal effort here.

If you have to attack me, please at least do it in places where it won't get back to me. Admittedly not easy, I seem to unintentionally have more little birds than Lord Varys >_> I've even had to threaten to ban people on my board to try and get them to stop relaying stuff like that to me.

