All times are UTC - 7 hours



PostPosted: Thu Aug 31, 2017 6:33 am

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
I noticed via Twitter that byuu acquired additional upd7725 documentation and changed the implementation of the overflow flags in higan. The upd7725 overflow flags are something I had puzzled over and discussed with Lord Nightmare once (quite a long time ago, probably when I was initially backporting the DSP LLE into bsnes-classic) because I'd noticed that he'd changed the way MAME calculated the flags from how bsnes did it. Neither the original bsnes implementation nor LN's modified MAME implementation looked quite right to me, but although I understood how the flags were meant to be used, I couldn't figure out how to calculate them so that they could be used in that way.

Thanks to the new documentation, specifically the explanation of how the S1 flag is calculated and the "1, 0, 1" (overflow, no overflow, overflow) case, I think I've figured out how everything works, and it's a good bit simpler than byuu's new implementation. In particular, I believe that there is no need for a flag "history buffer" and that the chip contains no such thing.

First of all, we have to understand what the OV0 and OV1 flags mean arithmetically. Basically, whereas OV0 indicates whether the most recent operation produced a signed overflow, OV1 indicates whether the value in the accumulator is in bounds (between -32768 and +32767) or whether it is overflowed. Let's think about how to calculate that, and build up a truth table.

First, the easy cases. If the accumulator previously contained an in-bounds value, and no overflow occurred in the last operation, then the accumulator must still contain an in-bounds value. Likewise, if the accumulator previously contained an in-bounds value and an overflow occurred, the value in the accumulator is now out of bounds.

Code:
OV1in  OV0 | OV1out
-----------+--------
  0     0  |   0
  0     1  |   1


Next, if the accumulator was previously out of bounds, and no overflow occurred in the last operation, then the accumulator is still out of bounds. This is perhaps not quite as easy to intuit as the first two cases, but think about it: the only way the accumulator can go from out of bounds back to in bounds is if a second overflow occurs, in the opposite direction of the original overflow.

Code:
OV1in  OV0 | OV1out
-----------+--------
  1     0  |   1


Finally, the hard case: what happens if the accumulator was out of bounds and another overflow occurs? Let's look at a couple of examples:

Code:
32767 + 1 + (-2) (hex: $7FFF + $0001 + $FFFE)

$7FFF + $0001 = $8000 + overflow
$8000 + $FFFE = $7FFE + overflow


Adding $7FFF to $0001 gives a result of $8000 with an overflow (positive + positive = negative). Adding $FFFE to the result gives a result of $7FFE and a second overflow (negative + negative = positive). However, despite the two overflows, the final result is correct and in bounds: 32767 + 1 + (-2) equals 32766. The two overflows have cancelled each other out. Now let's look at another example:

Code:
32767 + 32767 + 32767 + 32767 (hex: $7FFF + $7FFF + $7FFF + $7FFF)

$7FFF + $7FFF = $FFFE + overflow
$FFFE + $7FFF = $7FFD
$7FFD + $7FFF = $FFFC + overflow


On the first addition, an overflow occurs (positive + positive = negative). No overflow occurs on the second addition, but on the third addition another positive + positive = negative overflow occurs. This time, the final result is not in bounds: 32767 + 32767 + 32767 + 32767 isn't -4 or even 65532, it's 131068 (hex $1FFFC).

The difference between these two cases is that in the first case the two overflows were in opposite directions, and in the second case both overflows were in the same direction. The purpose of the S1 flag is to distinguish between these two cases. According to the datasheet, the S1 flag contains the sign of the result of the last operation that took place with the incoming OV1 flag clear; in other words, the last operation that took place while the incoming accumulator was in bounds. If the S1 flag is the same as the S0 flag produced by the current overflowing operation, then two overflows in the same direction have occurred and the accumulator is still out of bounds. If the S1 flag and the S0 flag are different, then two overflows in opposite directions have occurred, meaning the accumulator went out of bounds and then back in bounds.
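
The distinction between the two examples can be sketched in Python by tracking the direction of every overflow and comparing the 16-bit result with the exact sum. This is only an illustration of the arithmetic, not of how the chip stores anything; `wrap16` and `chain` are hypothetical helper names.

```python
def wrap16(n):
    """Reduce an unbounded integer to a signed 16-bit value."""
    return (n & 0x7FFF) - (n & 0x8000)


def chain(values):
    """Add values left to right in 16 bits.

    Returns the wrapped accumulator and, for each addition, +1 for a
    positive overflow, -1 for a negative overflow, or 0 for none.
    """
    acc = values[0]
    directions = []
    for v in values[1:]:
        exact = acc + v
        if exact > 0x7FFF:
            directions.append(+1)   # positive + positive = negative
        elif exact < -0x8000:
            directions.append(-1)   # negative + negative = positive
        else:
            directions.append(0)
        acc = wrap16(exact)
    return acc, directions


for values in ([32767, 1, -2], [32767, 32767, 32767, 32767]):
    acc, directions = chain(values)
    # Each overflow shifts the accumulator by 65536 relative to the exact
    # sum, so it is in bounds exactly when the directions cancel out.
    state = "in bounds" if sum(directions) == 0 else "out of bounds"
    print(values, "exact:", sum(values), "acc:", acc, directions, state)
```

In the first example the directions are +1 then -1 and the accumulator equals the exact sum; in the second they are +1, 0, +1 and the accumulator ends up at -4 while the exact sum is 131068.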

So here are the complete truth tables for S1 and OV1:

Code:
OV1in | S1
------+----
  0   | S0
  1   | unchanged

OV1in  OV0 | OV1out
-----------+--------
  0     0  |   0
  0     1  |   1
  1     0  |   1
  1     1  | (S0 == S1)


byuu was wondering whether the OV1 test should use the new or old value of S1, but you can see from these truth tables that it doesn't matter. The value of S1 only changes if the previous OV1 was clear, while OV1 only depends on S1 if the previous OV1 was set.
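
That claim is easy to check by brute force, since there are only sixteen combinations of the four flag inputs. The sketch below restates the two truth tables as functions (`ov1_next` and `s1_next` are names invented for this example) and compares OV1 computed from the old S1 against OV1 computed from the freshly updated S1.

```python
from itertools import product


def ov1_next(ov1_in, ov0, s0, s1):
    """OV1 truth table: only the OV1in=1, OV0=1 row looks at S1."""
    if ov1_in and ov0:
        return int(s0 == s1)
    return int(ov1_in or ov0)


def s1_next(ov1_in, s0, s1):
    """S1 truth table: S1 follows S0 only while OV1in is clear."""
    return s1 if ov1_in else s0


mismatches = []
for ov1_in, ov0, s0, s1 in product((0, 1), repeat=4):
    with_old_s1 = ov1_next(ov1_in, ov0, s0, s1)
    with_new_s1 = ov1_next(ov1_in, ov0, s0, s1_next(ov1_in, s0, s1))
    if with_old_s1 != with_new_s1:
        mismatches.append((ov1_in, ov0, s0, s1))

print("mismatches:", mismatches)
```

The list stays empty: whenever S1 can change (OV1in clear), the OV1 result ignores S1, and whenever OV1 consults S1 (OV1in set), S1 is unchanged.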

The datasheet implies that "overflow, no overflow, overflow" is some kind of special case that the chip explicitly checks for, but in fact it's just a consequence of the math. Two consecutive operations can't both overflow in the same direction; just look at the results from adding $7FFF (the largest possible positive number) to itself. If you work out the results of repeatedly adding $8000 (the smallest possible negative number) to itself, it's the same. You can only have two overflows in the same direction if there is at least one non-overflowing operation between them.
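
The "no two consecutive overflows in the same direction" property can also be checked exhaustively. The sketch below assumes a 6-bit word instead of 16 bits purely so that every combination can be tried quickly; the argument does not depend on the width, but that is an assumption of this example, not something measured on the chip. `BITS`, `wrap` and `direction` are hypothetical names.

```python
BITS = 6  # assumption: a narrow word keeps the search exhaustive and fast
LO = -(1 << (BITS - 1))
HI = (1 << (BITS - 1)) - 1


def wrap(n):
    """Reduce an unbounded integer to a signed BITS-wide value."""
    return (n - LO) % (1 << BITS) + LO


def direction(exact):
    """+1 for overflow past HI, -1 for overflow past LO, 0 for none."""
    return int(exact > HI) - int(exact < LO)


counterexamples = []
for a in range(LO, HI + 1):
    for b in range(LO, HI + 1):
        first = direction(a + b)
        if first == 0:
            continue  # only interested in additions that overflowed
        acc = wrap(a + b)
        for c in range(LO, HI + 1):
            if direction(acc + c) == first:
                counterexamples.append((a, b, c))

print("same-direction back-to-back overflows:", len(counterexamples))
```

No counterexamples turn up. After a positive overflow the accumulator is negative, so adding even the largest positive value cannot overflow upward again, and the mirror image holds for negative overflows.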

Note that in order to make use of this overflow mechanism, it is essential that the OV1 flag be cleared before you start doing your additions, or the S1 flag won't be updated when it should be. According to the datasheet, any ALU operation other than an addition or subtraction clears the OV1 flag, and if you look at a disassembly of the DSP1 program or Lord Nightmare's prose2k DSP program, you can see that they in fact do xor a,a or and a,a prior to any sequence of calculations that use the OV1 flag or the SGN pseudo-register.

One more thing: why is this overflow mechanism only good for three operations? Let's look at what happens if you do four additions in a row with the following values:

Code:
32767 + 32767 + 32767 + 32767 + -32768 (hex: $7FFF + $7FFF + $7FFF + $7FFF + $8000)

$7FFF + $7FFF = $FFFE  S0 = 1 S1 = 1 OV0 = 1 OV1 = 1
$FFFE + $7FFF = $7FFD  S0 = 0 S1 = 1 OV0 = 0 OV1 = 1
$7FFD + $7FFF = $FFFC  S0 = 1 S1 = 1 OV0 = 1 OV1 = 1
$FFFC + $8000 = $7FFC  S0 = 0 S1 = 1 OV0 = 1 OV1 = 0


We've already gone over the first three additions, so just look at what happens with the fourth. An overflow occurs (OV0 = 1) and S1 and S0 have opposite values, so the OV1 flag is cleared. Which means the accumulator is considered to be in bounds. But this is wrong--32767 + 32767 + 32767 + 32767 + -32768 is 98300, not 32764! If you do four additions in a row, it becomes possible for two overflows in one direction to occur followed by an overflow in the opposite direction, resulting in a false negative.


PostPosted: Thu Aug 31, 2017 6:39 am

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
Here's a Python program that implements the overflow flags as described in the previous post. It verifies that, for a representative range of input values, the flags produce arithmetically correct results for three additions (OV1 is set if and only if the final result is out of bounds), but that they don't produce correct results for four additions.

Code:
#!/usr/bin/python3

def int16(n):
    return (n & 0x7fff) - (n & 0x8000)

class ALU(object):
    def __init__(self):
        self.a = 0
        self.s0 = 0
        self.s1 = 0
        self.ov0 = 0
        self.ov1 = 0

    def add(self, right):
        left = self.a
        result = left + right

        # s0 (sign of this result); s1 follows it only while ov1 is clear
        self.s0 = result & 0x8000
        if not self.ov1:
            self.s1 = self.s0

        # ov0 (result overflow)
        self.ov0 = (left ^ result) & (right ^ result) & 0x8000

        # ov1 (accumulator overflow)
        if self.ov1 and self.ov0:
            self.ov1 = (self.s1 == self.s0)
        else:
            self.ov1 = self.ov1 or self.ov0

        self.a = int16(result)


def main():
    alu = ALU()

    testrange = (-0x8000, -0x4000, -1, 0, 1, 0x3fff, 0x4000, 0x7fff)
    for a in testrange:
        for b in testrange:
            for c in testrange:
                for d in testrange:
                    alu.a = a
                    alu.ov1 = 0
                    alu.add(b)
                    alu.add(c)
                    alu.add(d)
                    overflow = (alu.a != a+b+c+d)
                    if overflow != bool(alu.ov1):
                        print("Uh oh! %d %d %d %d" % (a, b, c, d))

    for a in testrange:
        for b in testrange:
            for c in testrange:
                for d in testrange:
                    for e in testrange:
                        alu.a = a
                        alu.ov1 = 0
                        alu.add(b)
                        alu.add(c)
                        alu.add(d)
                        alu.add(e)
                        overflow = (alu.a != a+b+c+d+e)
                        if overflow != bool(alu.ov1):
                            print("4 is too many! %d %d %d %d %d OV1=%d" % (a, b, c, d, e, bool(alu.ov1)))

if __name__ == "__main__":
    main()


PostPosted: Thu Aug 31, 2017 9:22 am

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1339
... wow, hat tip to you! This also solves a third question of mine, which was whether the three OV history values were cleared on other ALU operations (strongly leaning toward yes.) If they don't exist, then the question is void.

So we can essentially boil this down to:
Code:
s0 = result&0x8000;
ov0 = (the usual xor-and overflow magic);
if(!ov1) s1 = s0;
ov1 = ov0&ov1 ? s0==s1 : ov0|ov1;


However, I do want to say ... there's no absolute guarantee the original designers of this chip were so clever, short of studying die scans. It's quite possible they implemented this with two extra boolean latches. The implementation details were simply beyond the scope of the programmer's manuals. And though I'll definitely add this, it does lose a bit of clarity in the process. Not that my article's code example was all that clear to begin with, but still. If there were ever a time for source code comments, this would be it.

By the way, my DSP LLE had a nasty flaw with SGN:
Code:
case  7: idb = 0x8000 - flags.a.s1; break;


I'm not sure why I chose to hard-code this to SA1. Should be:
Code:
case  7: idb = 0x8000 - (!asl ? flags.a.s1 : flags.b.s1); break;

=> mov a,sgn
=> mov b,sgn

Quote:
if you look at a disassembly of the DSP1 program or Lord Nightmare's prose2k DSP program, you can see that they in fact do xor a,a or and a,a prior to any sequence of calculations that use the OV1 flag or the SGN pseudo-register.


So I did hear that on the SNES coprocessors, only the DSP1 uses SGN once. Does it actually do anything significant with the result where proper emulation of S1 would make an observable difference? If you're not sure, no need to look into it. I've been operating under the assumption that proper S1/OV1 support was mostly busywork, but ... perfectionism and all, finally got around to it thanks to much help from Cydrak.

Quote:
Thanks to the new documentation


If you'd like, send me your e-mail or let me know if you'd rather a mega link, and I can get the new documents to you. They're very low-quality scans, but they're quite thorough and explain a lot of the operations the SNES lacks in more detail: serial transfers, interrupts, etc.

Probably not very useful unless you're also a big fan of the prose2k hardware.

Quote:
The upd7725 overflow flags are something I had puzzled over and discussed with Lord Nightmare once


Heh, he went after you too, huh? ^-^
I'm sure he will be absolutely thrilled at your findings here :)


PostPosted: Thu Aug 31, 2017 10:25 am

Joined: Wed Apr 05, 2006 10:12 am
Posts: 126
Location: PA, USA
All of the docs came from https://web.archive.org/web/20170313202 ... -dsps/nec/
I ended up hand-feeding that entire directory of the site to archive.org manually page by page, because the site is hosted on a toaster or something and will go down hard for a few days if you download more than a dozen MB of data or so.

LN

_________________
"When life gives you zombies... *CHA-CHIK!* ...you make zombie-ade!"


PostPosted: Thu Aug 31, 2017 10:53 am

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
byuu wrote:
By the way, my DSP LLE had a nasty flaw with SGN:


Actually, that's apparently not a flaw in your emulation. The upd7725 docs indicate that SGN is only affected by operations on accumulator A and not accumulator B. The upd7720 datasheet says that SGN is affected by either S1 flag, but the 1984-07 memo by Ted Knowlton points out that this is an error in the datasheet.


PostPosted: Thu Aug 31, 2017 11:47 am

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1339
> https://web.archive.org/web/20170313202 ... -dsps/nec/

Oh, forgot you did that. Awesome! Much easier than e-mail.

> The upd7720 datasheet says that SGN is affected by either S1 flag

... which is the one I was reading for these new updates. God dammit >_>

Okay, I'm definitely adding a source code comment on this one. Good catch.

> this is an error in the datasheet.

... as well as in the processor. Quite the hardware design oversight. Ruins using the three chained add/sub operations and then saturating the result if you use the B accumulator. Have to use J(N)SB1 now.
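
For reference, the "three chained add/sub operations and then saturating" idiom can be sketched in Python as below. It models accumulator A only, takes the flag rules from the truth tables earlier in the thread, and takes the SGN value from the `0x8000 - s1` expression quoted above (with S1 as 0 or 1). The class `AccA` and the helpers `saturated` and `clamp16` are hypothetical, and how the DSP1 program actually consumes SGN is not shown here.

```python
from itertools import product


def wrap16(n):
    """Reduce an unbounded integer to a signed 16-bit value."""
    return (n & 0x7FFF) - (n & 0x8000)


def clamp16(n):
    """Reference result: the exact sum clamped to the signed 16-bit range."""
    return max(-0x8000, min(0x7FFF, n))


class AccA:
    """Accumulator A with S0/S1/OV0/OV1 updated per the truth tables."""

    def __init__(self, value=0):
        self.a = value
        self.s0 = self.s1 = self.ov0 = self.ov1 = 0  # OV1 starts clear

    def add(self, operand):
        exact = self.a + operand
        self.s0 = int((exact & 0x8000) != 0)
        self.ov0 = int(not (-0x8000 <= exact <= 0x7FFF))
        if not self.ov1:              # S1 tracks S0 while OV1in is clear
            self.s1 = self.s0
        if self.ov1 and self.ov0:     # second overflow: compare directions
            self.ov1 = int(self.s0 == self.s1)
        else:
            self.ov1 = int(self.ov1 or self.ov0)
        self.a = wrap16(exact)

    def sgn(self):
        """SGN as quoted above: 0x7FFF when S1 is set, 0x8000 otherwise."""
        return 0x8000 - self.s1

    def saturated(self):
        """Hypothetical helper: substitute SGN when OV1 says out of bounds."""
        return wrap16(self.sgn()) if self.ov1 else self.a


samples = (-0x8000, -0x4000, -1, 0, 1, 0x3FFF, 0x4000, 0x7FFF)
failures = []
for a, b, c, d in product(samples, repeat=4):
    acc = AccA(a)
    for operand in (b, c, d):
        acc.add(operand)
    if acc.saturated() != clamp16(a + b + c + d):
        failures.append((a, b, c, d))

print("saturation failures over three additions:", len(failures))
```

Over this sample set, substituting SGN whenever OV1 is set gives the same value as clamping the exact sum, which is consistent with the three-operation limit discussed earlier in the thread.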


PostPosted: Thu Aug 31, 2017 2:28 pm

Joined: Wed Apr 05, 2006 10:12 am
Posts: 126
Location: PA, USA
byuu wrote:
> this is an error in the datasheet.

... as well as in the processor. Quite the hardware design oversight. Ruins using the three chained add/sub operations and then saturating the result if you use the B accumulator. Have to use J(N)SB1 now.


Based on Ted's note on that site, I don't believe it affects the upd7720 itself, it is just a datasheet error, which was corrected on the upd7725 datasheet.

LN

_________________
"When life gives you zombies... *CHA-CHIK!* ...you make zombie-ade!"


PostPosted: Thu Aug 31, 2017 3:06 pm

Joined: Wed Apr 01, 2009 12:03 pm
Posts: 13
Location: Langara
SGN is used twice by the DSP1, for op02 (Projection Parameter Setting) and for op06 (Object Projection Calculation). The result doesn't seem to be discarded.


PostPosted: Thu Aug 31, 2017 4:21 pm

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1339
> Based on Ted's note on that site, I don't believe it affects the upd7720 itself, it is just a datasheet error, which was corrected on the upd7725 datasheet.

I meant the CPU should honor the ASL bit to select between SA1 and SB1. Seems like an oversight in the design.

> SGN is used twice by the DSP1, for op02 (Projection Parameter Setting) and for op06 (Object Projection Calculation). The result doesn't seem to be discarded.

Interesting. Would think that would cause some observable emulation bugs given how wrong our S1 emulation was before ... :|


PostPosted: Fri Sep 01, 2017 11:46 am

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1339
Quote:
byuu was wondering whether the OV1 test should use the new or old value of S1, but you can see from these truth tables that it doesn't matter. The value of S1 only changes if the previous OV1 was clear, while OV1 only depends on S1 if the previous OV1 was set.


Well, on that note ...

Code:
if(!ov1) s1 = s0;
ov1 = ov0&ov1 ? s0==s1 : ov0|ov1;


Here, we can see the s0==s1 test doesn't get hit if ov1==0.

But what if we reverse this?

Code:
ov1 = ov0&ov1 ? s0==s1 : ov0|ov1;
if(!ov1) s1 = s0;


Let's say at the start, ov0 was set (from this ALU operation), but ov1 was clear (from the previous ALU operation.) ov1 will be set to ov0|ov1, or 1. And now the if(!ov1)s1=s0; test will fail, whereas if we did the s1 test before the ov1 assignment, it would have transferred s0 into s1. ov1 will be set correctly either way, but the order of operations will affect the s1 output when s0!=s1.

It seems pretty clear (the manual basically says as much), and your truth table seems to confirm, we should do the if(!ov1) s1=s0; test first, but it's always good to clarify these things in documentation. Flag assignments are not usually dependent upon the results of other flag assignments in CPU emulators.
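
The effect of the ordering can be enumerated directly. The following sketch is a Python transcription of the two orderings above, with all flags reduced to 0 or 1; the function names `s1_first` and `ov1_first` are invented for this example.

```python
from itertools import product


def s1_first(ov1, ov0, s0, s1):
    """Update S1 from the incoming OV1, then update OV1."""
    if not ov1:
        s1 = s0
    ov1 = int(s0 == s1) if (ov0 and ov1) else int(ov0 or ov1)
    return s1, ov1


def ov1_first(ov1, ov0, s0, s1):
    """Reversed order: update OV1, then gate the S1 update on the new OV1."""
    ov1 = int(s0 == s1) if (ov0 and ov1) else int(ov0 or ov1)
    if not ov1:
        s1 = s0
    return s1, ov1


# Every input state (OV1in, OV0, S0, S1in) where the two orders disagree.
differing = [
    state
    for state in product((0, 1), repeat=4)
    if s1_first(*state) != ov1_first(*state)
]

for state in differing:
    print("OV1in=%d OV0=%d S0=%d S1in=%d" % state,
          "| S1-first (S1, OV1):", s1_first(*state),
          "| OV1-first (S1, OV1):", ov1_first(*state))
```

The two orders always agree on OV1 and disagree on S1 only when OV0 is set and S0 differs from the incoming S1. That covers the OV1in = 0 case described above and, in this model, also the OV1in = 1 case where opposite overflows clear OV1.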


PostPosted: Fri Sep 01, 2017 1:28 pm

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19353
Location: NE Indiana, USA (NTSC)
byuu wrote:
Flag assignments are not usually dependent upon the results of other flag assignments in CPU emulators.

Except carry, where ROR A or its counterpart in other instruction sets moves carry to A (and thus to sign).


PostPosted: Fri Sep 01, 2017 6:40 pm

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
byuu wrote:
Quote:
byuu was wondering whether the OV1 test should use the new or old value of S1, but you can see from these truth tables that it doesn't matter. The value of S1 only changes if the previous OV1 was clear, while OV1 only depends on S1 if the previous OV1 was set.


Well, on that note ...

Code:
if(!ov1) s1 = s0;
ov1 = ov0&ov1 ? s0==s1 : ov0|ov1;


Here, we can see the s0==s1 test doesn't get hit if ov1==0.

But what if we reverse this?

Code:
ov1 = ov0&ov1 ? s0==s1 : ov0|ov1;
if(!ov1) s1 = s0;


Let's say at the start, ov0 was set (from this ALU operation), but ov1 was clear (from the previous ALU operation.) ov1 will be set to ov0|ov1, or 1. And now the if(!ov1)s1=s0; test will fail, whereas if we did the s1 test before the ov1 assignment, it would have transferred s0 into s1. ov1 will be set correctly either way, but the order of operations will affect the s1 output when s0!=s1.

It seems pretty clear (the manual basically says as much), and your truth table seems to confirm, we should do the if(!ov1) s1=s0; test first, but it's always good to clarify these things in documentation. Flag assignments are not usually dependent upon the results of other flag assignments in CPU emulators.


I said that it doesn't matter whether you use the new or old S1. It does matter whether you use the new or old OV1, which is why both truth tables specify "OV1in".

Quote:
I meant the CPU should honor the ASL bit to select between SA1 and SB1. Seems like an oversight in the design.


It seems that accumulator A is meant to be used as the primary accumulator, and accumulator B to hold either temporary values or the low 16 bits of a 32-bit calculation (notice that each accumulator uses the opposite one's carry flag as its incoming carry).


PostPosted: Sun Sep 03, 2017 11:57 am

Joined: Fri Feb 24, 2012 12:09 pm
Posts: 538
This is the flags description from fullsnes.htm; I think it looks a bit different from your code.
Code:
  S0  Sign Flag     (set if result.bit15)
  Z   Zero Flag     (set if result=0000h)
  C   Carry Flag    (set if carry or borrow)
  OV0 Overflow Flag (set if result>+7FFFh or result<-8000h)
  S1  Direction of Last Overflow (if OV0 then S1=S0, else S1=unchanged)
  OV1 Number of Overflows (0=even, 1=odd) (inverted when OV0 gets set)

On an 80x86 processor, this mainly needs the LAHF and JO opcodes to fetch the 80x86 flags (plus some more involved handling when the JO jump on overflow is taken):
Code:
 lahf    ;ah=flags (bit0=cy, bit6=zf, bit7=sf)
 jo   short @@overflow
 and  ah,($upd_flg_s0+$upd_flg_c+$upd_flg_z)/100h  ;isolate new flags (and clear OV0)
 mov  byte ptr [upd7725_reg_flg&acc&+1],ah         ;apply 8bit (keep LSB = OV1,S1)
 jmp  upd7725_do_alu_done
;---
@@overflow:
 mov  al,byte ptr [upd7725_reg_flg&acc&+0]             ;get old OV1
 and  ax,$upd_flg_s0+$upd_flg_c+$upd_flg_z+$upd_flg_o1 ;new S0,C,Z + old OV1
 or   ah,($upd_flg_o0)/100h                            ;set OV0=1
 xor  al,ah                                            ;set S1=S0, toggle OV1 (by XORing it with OV0=1)
 mov  word ptr [upd7725_reg_flg&acc&],ax               ;apply whole 16bit
 jmp  upd7725_do_alu_done


PostPosted: Sun Sep 03, 2017 12:17 pm

Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1339
> S1 Direction of Last Overflow (if OV0 then S1=S0, else S1=unchanged)

This is wrong. It's set based on OV1, not OV0.

> OV1 Number of Overflows (0=even, 1=odd) (inverted when OV0 gets set)

Also wrong, see AWJ's truth tables above.

If you based your notes off my initial implementation, then my apologies.
Those notes were based off the only uPD7725 manual I had at the time, which did not explain S1/OV1 nearly as well as the newly discovered uPD7720 documents do.

The code you want is:
Code:
if(!ov1) s1 = s0;
ov1 = ov0&ov1 ? s0==s1 : ov0|ov1;
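
To make the difference between the two descriptions concrete, here is a small Python sketch that runs both update rules over the 32767 + 32767 + 32767 + 32767 example from the first post. It only compares each rule against the arithmetic; it says nothing about which one matches real hardware. The names `run`, `truth_table` and `parity` are hypothetical.

```python
def wrap16(n):
    """Reduce an unbounded integer to a signed 16-bit value."""
    return (n & 0x7FFF) - (n & 0x8000)


def run(values, update):
    """Chain 16-bit additions, applying `update` to (S1, OV1) after each."""
    acc, s1, ov1 = values[0], 0, 0      # OV1 starts clear
    for v in values[1:]:
        exact = acc + v
        s0 = int((exact & 0x8000) != 0)
        ov0 = int(not (-0x8000 <= exact <= 0x7FFF))
        s1, ov1 = update(s0, ov0, s1, ov1)
        acc = wrap16(exact)
    return acc, ov1


def truth_table(s0, ov0, s1, ov1):
    """S1 gated on OV1in; OV1 compares S0 with S1 on a second overflow."""
    if not ov1:
        s1 = s0
    ov1 = int(s0 == s1) if (ov0 and ov1) else int(ov0 or ov1)
    return s1, ov1


def parity(s0, ov0, s1, ov1):
    """Quoted summary: on OV0, S1 takes S0 and OV1 is inverted."""
    if ov0:
        s1, ov1 = s0, ov1 ^ 1
    return s1, ov1


values = [32767, 32767, 32767, 32767]
print("exact sum:           ", sum(values))
print("truth table (a, OV1):", run(values, truth_table))
print("parity      (a, OV1):", run(values, parity))
```

Both rules leave -4 in the accumulator. The truth-table rule ends with OV1 set, while the parity rule ends with OV1 clear even though the exact sum, 131068, is far out of bounds.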


PostPosted: Sun Sep 03, 2017 12:51 pm

Joined: Fri Feb 24, 2012 12:09 pm
Posts: 538
There are tons of uPD77C2xxx scans on datasheetarchive.com.
I have uPD77C20 and uPD77C25 datasheets on my harddisk... downloaded back in March 2011, going by the file timestamps.
My description/code is different than AWJ's table/code, I don't know which is closer to real hardware.

PS. I think my implementation might have opposite S1 values (i.e. 0=positive vs 1=positive); this works as long as the SGN opcode processes S1 accordingly.

