PostPosted: Sun Aug 21, 2016 11:45 pm 
Joined: Mon Nov 10, 2008 3:09 pm
Posts: 413
I figure you might be interested in this, given what you're currently working on:

http://www.devic.us/hacks/zilog-z80-und ... -behavior/

ETA: There's a slightly confusing thing about those tables: the timing for conditional instructions doesn't indicate whether the condition was met or not. Hence JR Z is shown as taking fewer cycles than the other JR cc instructions, but that's because it's showing an untaken branch (7 clocks) whereas the others are showing taken branches (12 clocks). It gets especially confusing with RET, because unconditional RET (10 clocks) is 1 t-state faster than taken RET cc (11 clocks; untaken RET cc is 5).


PostPosted: Mon Aug 22, 2016 4:24 pm 
Joined: Mon Nov 10, 2008 3:09 pm
Posts: 413
If you look at the Zilog manual and the bus traces, you can see that Z80 cycles break down into just a few categories:

Opcode fetch cycles aka M1 cycles. 2 clocks to read memory + 2 clocks to decode the opcode while refreshing DRAM.

Read/write cycles, 3 clocks. Operand fetches (including the displacements of IX/IY indexed instructions) are normal read cycles, but the second byte of a prefixed instruction is an opcode fetch.

Port in/out cycles, 4 clocks.

Internal operations, which can come after any other cycle type, up to 5 in a row depending on the operation. Technically 1- or 2-clock internal operations are part of the memory cycle they come after, and 5-clock internal operations comprise an entire machine cycle in their own right, but I don't think the difference is visible externally or matters for emulation.

Here's something very weird: normally, the second byte of a $CB or $ED prefixed instruction is an opcode fetch (i.e. it takes 4 clocks, asserts M1, and refreshes DRAM). However, in an instruction with both a $DD/$FD prefix and a $CB prefix (i.e. an indexed bitwise instruction), the encoding is $DD/$FD, $CB, displacement, subop, and the subop fetch turns into a normal read with 2 internal operation clocks after it. I assume the instructions are encoded that way so that the effective address calculation can be overlapped with the sub-opcode fetch, and the mutation of the sub-opcode fetch into a non-M1 cycle is a side effect of the out-of-order encoding. One case where this quirk is important is arcade machines with encrypted opcodes, since those schemes typically encrypt only bytes fetched during M1 cycles.

Since opcode fetches require the memory to respond faster than normal read/write cycles do, some machines (like the MSX) have an externally-inserted wait state only on M1 cycles. I don't think that applies to any of the Sega consoles though.
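In code, a cycle-stepped core might expose those four cycle types as timed helpers, something like this (purely illustrative; the Bus stub, wait(), and all the names here are made up, not taken from any real emulator):

Code:
#include <cstdint>

struct Bus {  // stub memory/IO bus, just for illustration
  uint8_t ram[0x10000] = {};
  uint8_t read(uint16_t a) { return ram[a]; }
  void write(uint16_t a, uint8_t d) { ram[a] = d; }
  uint8_t in(uint16_t) { return 0xff; }
  void out(uint16_t, uint8_t) {}
};

struct Z80 {
  Bus bus;
  uint64_t clock = 0;
  uint8_t r = 0;  // refresh register

  void wait(int t) { clock += t; }  // advance t T-states

  // M1 cycle: 2 clocks to read + 2 clocks to decode/refresh DRAM = 4 total
  uint8_t fetchOpcode(uint16_t pc) {
    wait(2);
    uint8_t opcode = bus.read(pc);
    r = (r & 0x80) | ((r + 1) & 0x7f);  // only the low 7 bits of R increment
    wait(2);
    return opcode;
  }

  uint8_t readByte(uint16_t a) { wait(3); return bus.read(a); }        // 3 clocks
  void writeByte(uint16_t a, uint8_t d) { wait(3); bus.write(a, d); }  // 3 clocks
  uint8_t portIn(uint16_t a) { wait(4); return bus.in(a); }            // 4 clocks
  void portOut(uint16_t a, uint8_t d) { wait(4); bus.out(a, d); }      // 4 clocks
};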

Examples:

PUSH rr:
Fetch opcode
1 internal operation (decrementing SP)
Write high register byte
Write low register byte
Total: 11 clocks

POP rr:
Fetch opcode
Read low register byte
Read high register byte
Total: 10 clocks

ADD A,(HL):
Fetch opcode
Read memory
Total: 7 clocks

ADD A,(IX+d):
Fetch $DD prefix
Fetch opcode
Read displacement
5 internal operations (calculating IX+d)
Read memory
Total: 19 clocks (remember when I said Z80 indexed instructions were slow?)

RES/SET/BIT/RLC/etc (HL):
Fetch $CB prefix
Fetch sub-opcode
Read memory
1 internal operation
Write memory (except for BIT)
Total: 15 clocks (12 for BIT)

RES/SET/BIT/RLC/etc (IX+d):
Fetch $DD prefix
Fetch $CB prefix
Read displacement
Read (not fetch!) sub-opcode
2 internal operations (calculating IX+d; partly overlapped with sub-opcode read)
Read memory
1 internal operation
Write memory (except for BIT)
Total: 23 clocks (20 for BIT)
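Expressed with the hypothetical helpers from the sketch above, two of those breakdowns read like this (register/flag plumbing elided):

Code:
// PUSH BC: 4 + 1 + 3 + 3 = 11 clocks
void pushBC(Z80& z, uint16_t& pc, uint16_t& sp, uint8_t b, uint8_t c) {
  z.fetchOpcode(pc++);   // opcode fetch, 4 clocks
  z.wait(1);             // 1 internal op: decrement SP
  z.writeByte(--sp, b);  // write high byte, 3 clocks
  z.writeByte(--sp, c);  // write low byte, 3 clocks
}

// ADD A,(IX+d): 4 + 4 + 3 + 5 + 3 = 19 clocks
void addAIXd(Z80& z, uint16_t& pc, uint16_t ix, uint8_t& a) {
  z.fetchOpcode(pc++);                  // $DD prefix, 4 clocks
  z.fetchOpcode(pc++);                  // opcode, 4 clocks
  int8_t d = (int8_t)z.readByte(pc++);  // displacement, 3 clocks
  z.wait(5);                            // 5 internal ops: compute IX+d
  a += z.readByte(ix + d);              // read, 3 clocks (flags omitted)
}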


PostPosted: Sat Sep 03, 2016 2:35 am 
Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1329
Sorry, didn't notice this sooner. Don't usually frequent this subforum.
(That and I lost a week learning about X509 certificates. Those things are horrifically complex.)

Greatly appreciate the info!! Glad to have gotten it now before I went too far into writing the core.

I scrapped what I had and started over to support the T-cycles properly (including the extra clock for opcode fetches), to handle the way you can stack DD/FD opcode prefixes, and to roll them into the regular tables, so that there are only three now (main, CB, ED).
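For the record, the prefix stacking boils down to something like this (reusing the hypothetical helpers sketched earlier in the thread; each stacked prefix costs a full 4-clock fetch, and only the last one wins):

Code:
enum class Index { HL, IX, IY };

// one instruction step: consume any run of $DD/$FD prefixes, then dispatch
void step(Z80& z, uint16_t& pc) {
  Index index = Index::HL;
  uint8_t opcode = z.fetchOpcode(pc++);
  while (opcode == 0xdd || opcode == 0xfd) {
    index = (opcode == 0xdd) ? Index::IX : Index::IY;
    opcode = z.fetchOpcode(pc++);  // 4 clocks per stacked prefix
  }
  // ...dispatch opcode through the main/CB/ED tables using index...
}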

This CPU is certainly a lot less awful to emulate than the 68K, but it's still not very fun >_>

Just out of curiosity, does anyone know the bus hold delays for the various read/write/in/out operations on the Z80?

E.g. is it:
* wait 4 clocks
* read from IN
* return IN value

Or more like:
* wait 2 clocks
* read from IN
* wait 2 clocks
* return IN value

If we have no idea, then I'll just guess something for the time being.
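In code terms, the two candidates look like this (again with the made-up helper names from above; only the placement of the bus sample differs):

Code:
// candidate A: sample the port at the end of the 4-clock cycle
uint8_t portInLate(Z80& z, uint16_t a) {
  z.wait(4);
  return z.bus.in(a);
}

// candidate B: sample in the middle, then hold for 2 more clocks
uint8_t portInMid(Z80& z, uint16_t a) {
  z.wait(2);
  uint8_t d = z.bus.in(a);
  z.wait(2);
  return d;
}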


PostPosted: Tue Sep 06, 2016 9:17 am 
Joined: Mon Nov 10, 2008 3:09 pm
Posts: 413
byuu wrote:
Just out of curiosity, does anyone know the bus hold delays for the various read/write/in/out operations on the Z80? [...] If we have no idea, then I'll just guess something for the time being.


You know that memory accesses aren't instantaneous but consist of a sequence of operations, right? The timing for every signal for every type of cycle (fetch, read, write, in, out) is shown starting on page 13 of the Zilog manual (page 33 of the PDF).

The important takeaway is that opcode fetches are compressed into just 2 clocks; the second 2 clocks of an M1 cycle are DRAM refresh, in which the Z80 puts the contents of the R register on the address bus and then increments the lower 7 bits of R. (You probably don't have to emulate the refresh itself, but you do need to emulate the R register, because software can read it; it's sometimes used by games as a PRNG seed.)
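As a sketch, reusing the made-up helpers from earlier (whose fetchOpcode() already bumps the low 7 bits of R once per M1 cycle, so prefixed instructions bump it once per prefix):

Code:
// LD A,R ($ED $5F): 4 + 5 = 9 clocks; bit 7 of R only changes via LD R,A
uint8_t ldAR(Z80& z, uint16_t& pc) {
  z.fetchOpcode(pc++);  // $ED prefix (M1, bumps R)
  z.fetchOpcode(pc++);  // $5F opcode (M1, bumps R again)
  z.wait(1);            // 1 internal clock
  return z.r;           // also sets S/Z and copies IFF2 to P/V (omitted)
}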

I think what you really want to know is "if the Z80 does a read/write that triggers an interrupt from some device, does the device respond fast enough to interrupt the Z80 before it starts the next instruction?" And that depends on the hardware responding to the write (e.g. the VDP), so you'll have to consult Sega-specific documentation.

byuu, on twitter wrote:
Why does [inc (hl)] take 11 cycles?


Memory RMW operations on the Z80 have one internal operation between the read and the write for the same reason they do on the 6502: it takes time to actually do the inc/dec/shift/whatever. Like I said, the Z80 manual shows most internal operations as part of the preceding memory cycle (which they are from the perspective of the chip's microcode, I guess). The exact breakdown of that instruction for bus timing purposes is:

fetch/decode opcode (2+2 clocks)
read memory (3 clocks)
internal operation (1 clock)
write memory (3 clocks)

Every place the manual shows a memory read taking more than 3 clocks, or an opcode fetch/decode taking more than 4 clocks, it should be interpreted as "standard read or fetch cycle with (n - 3) or (n - 4) internal operations after".
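Using the made-up helpers from earlier in the thread, that breakdown is literally:

Code:
// INC (HL): 4 + 3 + 1 + 3 = 11 clocks (flag update omitted)
void incHL(Z80& z, uint16_t& pc, uint16_t hl) {
  z.fetchOpcode(pc++);         // fetch/decode, 4 clocks
  uint8_t m = z.readByte(hl);  // read, 3 clocks
  z.wait(1);                   // internal op: the actual increment
  z.writeByte(hl, m + 1);      // write, 3 clocks
}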


PostPosted: Tue Sep 06, 2016 12:48 pm 
Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1329
> You know that memory accesses aren't instantaneous but consist of a sequence of operations, right?

Considering I emulate that on the SNES and am asking about it now, I'd go with yes ;)

The problem is that I can't really determine how best to simulate these things, even on platforms where I can run my own code.

It's not really practical to emulate the entire bus propagation delay, especially when we don't even know when other things are supposed to respond.

> And that depends on the hardware responding to the write (e.g. the VDP), so you'll have to consult Sega-specific documentation.

They never document things at that fine a granularity :/

I only got where I did on the SNES because there were two ways to latch H/V counters. One for read, one for write.

> the Z80 manual shows most IOs as part of the preceding memory cycle

Ah, cool. Then my guess was correct. Thank you for confirming!


PostPosted: Tue Sep 06, 2016 3:40 pm 
Joined: Mon Nov 10, 2008 3:09 pm
Posts: 413
Quote:
there's no reason for the bitwise and logical operators to be at different precedence levels


No, just no. There's a very good reason why AND has higher precedence than OR or XOR in literally every programming language in the world. AND is the Boolean analogue of multiplication and OR is the Boolean analogue of addition (XOR is addition modulo 2). Fighting against the fundamentals of algebra is... not a good start for your programming language.


PostPosted: Tue Sep 06, 2016 7:44 pm 
Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1329
It's impressive that I've been programming for 20 years and have never heard AND referred to as Boolean multiplication, and OR as Boolean addition. Further, I've never come across code that relied on AND having higher precedence than OR. Why does XOR get precedence between AND and OR, then? And just out of curiosity, what about the rest of the operations: NOT, NAND, NOR, XNOR, etc.? Is one of them division, subtraction, ordinary modulo, etc.?

Ah well, at any rate, thanks for helping me dodge a bullet prior to any kind of formalization. A shame you weren't around when the PHP devs started out, to explain to them why the ternary operator should have right-to-left associativity, heh.

But ... https://en.wikipedia.org/wiki/Logical_c ... precedence

Why isn't XOR (exclusive disjunction) listed in that table? Is there a more thorough table that includes it?

> not a good start for your programming language.

I didn't expect to knock it out of the park with only a month's worth of practice writing programming languages. Still, I hope to do the best I can, and take input from others that know more than I do here.

I'm pretty lost right now with a billion possibilities and trying to find the best compromises for my values.


PostPosted: Tue Sep 06, 2016 9:00 pm 
Joined: Mon Jan 03, 2005 10:36 am
Posts: 2924
Location: Tampere, Finland
AWJ wrote:
There's a very good reason why AND has higher precedence than OR or XOR in literally every programming language in the world. [...] Fighting against the fundamentals of algebra is... not a good start for your programming language.

Ada (and VHDL) take a different approach: and/or/xor have equal precedence, and it's an error to mix them without disambiguating. For example, "true and false or true" is an error, while "true and (false or true)" and "(true and false) or true" are both OK.



PostPosted: Wed Sep 07, 2016 4:14 am 
Joined: Mon Nov 10, 2008 3:09 pm
Posts: 413
byuu wrote:
I've never heard AND referred to as Boolean multiplication, and OR as Boolean addition. [...] Why does XOR get precedence between AND and OR, then?


One of the important ways conjunction and disjunction are analogous to multiplication and addition is that the same distributive law applies: A & (B | C) equals A & B | A & C, just like A(B + C) equals AB + AC. Older types of programmable logic hardware such as PALs consisted of planes of AND gates linked by OR gates, and therefore implemented logic expressed in disjunctive normal form. Reverse engineering PAL dumps is one-half figuring out what the inputs and outputs mean and one-half algebraic factoring.

XOR isn't a fundamental operation in Boolean algebra because it can be expressed in terms of AND, OR, and NOT. I'm actually not sure why XOR has higher precedence than OR in C-derived languages. It doesn't in Ruby, but it does in Python. Both of those languages, incidentally, fix the C brain damage of bitwise operators having lower precedence than comparison operators.

A practical programming benefit to the algebraic AND/OR precedence rules is that bit-mixing operations like "a & amask | b & bmask | c & cmask" don't need parentheses.
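A quick brute-force check of both claims (the distributive law, and the parenthesis-free masking idiom), just as a standalone snippet:

Code:
#include <cassert>
#include <cstdint>

int main() {
  // distributive law: A & (B | C) == (A & B) | (A & C), like A(B+C) == AB+AC
  for (int a = 0; a < 256; a++)
    for (int b = 0; b < 256; b++)
      for (int c = 0; c < 256; c++)
        assert((a & (b | c)) == ((a & b) | (a & c)));

  // & binds tighter than |, so the mixing idiom needs no parentheses
  uint8_t a = 0x12, b = 0x34, c = 0x56;
  assert((a & 0xf0 | b & 0x0c | c & 0x03) ==
         ((a & 0xf0) | (b & 0x0c) | (c & 0x03)));
  return 0;
}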

