All times are UTC - 7 hours
 Post subject: Schedulers (attn: AWJ)
PostPosted: Fri Jul 29, 2016 12:59 am 
Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1339
You mentioned that MAME uses a scheduler that's accurate to the attosecond, but for that you needed to use 128-bit numbers. May I ask why?

Code:
 1,000,000,000,000,000,000 => the number of attoseconds in one second
18,446,744,073,709,551,615 => 2^64-1


As long as we subtract the smallest value from all counters every ~9 seconds, there's no chance of an overflow. Doing it once per frame on an emulated system should be far more than enough.
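The rebasing described here is only a few lines. A hypothetical sketch (the function name and container are mine, not from any actual emulator): subtract the minimum counter from all counters, which preserves their relative order while resetting the overflow headroom.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

//hypothetical sketch of per-frame rebasing: subtracting the smallest
//counter from every counter preserves relative ordering and keeps 64-bit
//attosecond counters far away from wrapping
inline void rebase(std::vector<uint64_t>& clocks) {
  uint64_t base = *std::min_element(clocks.begin(), clocks.end());
  for(auto& clock : clocks) clock -= base;
}
```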

If you're going to use 128-bit counters, then why not up your accuracy to Planck time? :P

...

For what it's worth, I am using this currently:

Code:
auto create(... long double frequency) {
  /*uint64*/ scalar = 1.0 / frequency * 1'000'000'000'000'000'000.0 + 0.5;  //round to attoseconds
}
auto step(uint clocks) {
  clock += clocks * scalar;
}
auto synchronize(Thread& thread) {
  if(clock > thread.clock) co_switch(thread.handle);
}


Seems to be working just fine for me. Here's my before and after results:

Code:
[FC]
Mega Man 2 = 169fps (stage select) -> 186fps
Zelda = 172fps (in-game) -> 193fps

[SFC]
Zelda 3 = 131fps (save select) -> 140fps
Kirby 3 = 99fps (stage select) -> 106fps

[GB]
Mega Man II = 228fps (stage select) -> 246fps
Zelda DX = 211fps (save select) -> 214fps

[GBA]
Riviera = 180fps (opening) -> 180fps
Minish Cap = 154fps (save select) -> 154fps

[WSC]
Riviera = 177fps (title screen) -> 198fps
Final Fantasy = 158fps (name select) -> 163fps


Not bad for a few hours of work. It's a really great idea to use attoseconds instead of relative frequency counters.


Last edited by byuu on Fri Jul 29, 2016 2:55 am, edited 1 time in total.

PostPosted: Fri Jul 29, 2016 1:50 am 
Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
MAME never rebases the counters; it uses them to log the exact emulated time that events occur (e.g. inputs in a movie file). Also, the counters aren't simply 128-bit integers; they consist of a 64-bit attoseconds part and a 64-bit seconds part. Great for friendly logging, not so great for efficiency (attotime calculations are a significant source of overhead in MAME drivers with many CPUs and other devices that use the scheduler).

Choosing a suitable base unit of time for an arbitrary set of clock domains is indeed the hard part. The problem MAME has is that many common crystal values have a repeating decimal when you take their reciprocal. Even using attoseconds, you end up with component A that's supposed to run at exactly 3 times the speed of component B, but gains an extra cycle once every 20 minutes due to truncation. I believe there are some demos that fail on MAME for this exact reason.

If you want perfect accuracy you have to do something like this:

1. Calculate the greatest common divisor of all the clock domains in the emulated system, and divide all the clocks by it (this step is only needed to reduce the likelihood of overflow in the next step).

2. Calculate the scaled reciprocal of each clock. The scaled reciprocal is the product of all the clocks except that one (or equivalently, the product of all the clocks divided by that clock).

3. Again calculate the greatest common divisor of all the scaled reciprocals, and divide them all by it.

The result is an exact-integer cycle length for each clock, in some arbitrary unit. If you care what that unit is for some reason, multiply the cycle length of any one of the devices by its frequency in Hz. The result is the number of "some arbitrary units" in a second.

(Also, if you want to make sure you've done all the math right, assert that every device's calculated cycle length times its frequency is the same number)
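My reading of those steps, as a sketch in code (the function name is invented, and std::gcd requires C++17):

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

//given every clock domain's frequency, produce an exact integer cycle
//length per domain in a shared (arbitrary) time unit
inline std::vector<uint64_t> cycleLengths(std::vector<uint64_t> clocks) {
  //step 1: divide all clocks by their collective GCD (overflow mitigation)
  uint64_t g = 0;
  for(auto clock : clocks) g = std::gcd(g, clock);
  for(auto& clock : clocks) clock /= g;

  //step 2: scaled reciprocal = product of all clocks except this one
  std::vector<uint64_t> lengths(clocks.size(), 1);
  for(std::size_t i = 0; i < clocks.size(); i++)
    for(std::size_t j = 0; j < clocks.size(); j++)
      if(i != j) lengths[i] *= clocks[j];

  //step 3: reduce by the collective GCD of the scaled reciprocals
  g = 0;
  for(auto length : lengths) g = std::gcd(g, length);
  for(auto& length : lengths) length /= g;
  return lengths;
}
```

The cross-check from the parenthetical holds by construction: each cycle length times its frequency yields the same count of "arbitrary units" per second.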

If you need to add a new clock domain at runtime... Sorry, you're fucked. Hope you don't need to support hot plugging devices with their own crystal in them.

ETA: I said "something like" for a reason. I literally came up with the above algorithm at 5 a.m. right before falling asleep, and I'm sure someone can improve on my method or point out some awful flaw in it...


PostPosted: Fri Jul 29, 2016 3:08 am 
Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1339
Posted timing results to the first post before I saw you replied. They're pretty interesting.

> MAME never rebases the counters; it uses them to log the exact emulated time that events occur

Oh, I see. Well, someone out there is going to be pissed when they run MAME for 31.71 billion years and then suddenly the program deadlocks because not all of the counters wrapped. So much for MAME caring about accuracy /s

> Great for friendly logging, not so great for efficiency (attotime calculations are a significant source of overhead in MAME drivers with many CPUs and other devices that use the scheduler)

Yeah, I can totally see that. Especially given the above timing results on my end.

You could speed that up by using an unsigned __int128 where available. And don't separate it into {top 64 bits = seconds; bottom 64 bits = attoseconds}; just divide by 1 quintillion when you need the number of seconds, and take the modulus when you need the fraction. For systems without 128-bit integers, simply add the clocks to the low 64-bit value, and if the new value is less than the old value, increment the upper 64-bit value by one. Or even better, do it in assembly and use the processor's carry flag: add-with-carry zero into the upper 64 bits. Extracting time would be a bit more painful on systems without 128-bit types, but you could write a software divide for that. I'd imagine extracting timestamps is far less common than stepping time during emulation.
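The no-128-bit fallback described above, as a sketch (the struct and names are illustrative, not actual MAME or higan code):

```cpp
#include <cstdint>

//a flat 128-bit attosecond counter kept as two 64-bit halves, with a
//manual carry; extracting seconds would need a software divide by 10^18,
//which is rare compared to stepping
struct Attotime {
  uint64_t lo = 0;  //low 64 bits of the attosecond count
  uint64_t hi = 0;  //high 64 bits

  auto step(uint64_t attoseconds) -> void {
    uint64_t previous = lo;
    lo += attoseconds;
    if(lo < previous) hi++;  //unsigned wraparound implies a carry out
  }
};
```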

> Even using attoseconds, you end up with component A that's supposed to run at exactly 3 times the speed of component B, but gains an extra cycle once every 20 minutes due to truncation. I believe there are some demos that fail on MAME for this exact reason.

Well ... the way I'm doing it is: if there's a single oscillator, then I'll use that. The GBA, WS/C, MD all share this. In this case, my step() function just adds the clock cycle count to the clock counter and that's it.

But if there is more than one oscillator, then I add clocks * reciprocal_attoseconds instead. In the case of the SNES, even in the worst case of an extra clock every 20 minutes ... nothing is going to break because of that. The real oscillators on real consoles vary by larger amounts than that due to aging, temperature changes, etc.

My old system did handle this case perfectly, but being perfect beyond what's even possible for real hardware really isn't all that critical, and it didn't scale to the Genesis scenario anyway, so it had to go.

> Calculate the greatest common divisor of all the clock domains in the emulated system

Completely impractical. The SNES can have 6+ oscillators thanks to controllers, cartridges and expansion port devices. And they're mostly repeating fractions thanks to color burst timings being psychotic values for both NTSC and PAL.

> If you need to add a new clock domain at runtime... Sorry, you're fucked. Hope you don't need to support hot plugging devices with their own crystal in them.

Yep, there are several cases where I have to adjust the frequency dynamically. And this is another area where this attosecond-based approach works better than my old "weighted scale" approach: there's no longer any need to do any adjustments to the clock to represent the clock skew difference anymore. Just change the scalar and you're done.

This comes up with: the SNES SGB's speed settings (with the Horii controller), the GBC toggling double-speed CPU mode, and hot-plugging controller and expansion port devices (the latter is, of course, psychotic to attempt.) I'm sure more cases will come up in the future, too.


PostPosted: Fri Jul 29, 2016 3:38 am 
Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
I thought the GBC only changed its clock divider. Can't you emulate it the same way as the SuperFX, by adding 1 clock per instruction cycle in double speed mode and 2 clocks in single speed mode?

Yeah, my suggested approach is complete overkill. You don't care about exact ratios between components on different clock domains, only ones that are on the same clock with different dividers.

Here's take 2, which is similar to yours but eschews floating point math (which is a whole kettle of worms...)

Code:
scaler = (HUGE_NUMBERull / frequency()) * divider();


Where frequency() and divider() are virtual functions that return integers (not data members, so as not to bloat every object layout with data that's only used once at startup). Chips that run off of the same xtal should return the same frequency as each other; this will ensure that their scalers are exact integer ratios. Chips that change dividers on the fly (like the GBC CPU) should report a divider of 1 and multiply their cycles themselves at runtime. On the other hand, chips with a fixed divider can take advantage of precalculating it (integer multiplication is not that expensive on modern CPUs, but why do unneeded extra work every emulated cycle? And weren't you hoping to make your emulator more ARM-friendly in the future?)


PostPosted: Fri Jul 29, 2016 6:49 am 
Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1339
> Can't you emulate it the same way as the SuperFX, by adding 1 clock per instruction cycle in double speed mode and 2 clocks in single speed mode?

I probably can. But that'd have to go on the to-do list. I don't want to keep getting side-tracked from the Genesis core while it's so young.

> Here's take 2, which is similar to yours but eschews floating point math (which is a whole kettle of worms...)

scaler is a uint64 type. I just do the initial computation via long double. That math only happens once at power-up, so I'm not too worried about its performance. All my audio resampling / anti-aliasing is done in floating point, which is probably a whole lot worse. Especially at the obscene 2-3MHz sampling ratios of some of these systems.

> And weren't you hoping to make your emulator more ARM-friendly in the future?

I certainly was before I killed the balanced and performance profiles.

But amazingly, this change seems to have boosted framerates from 29fps (v100) to 45fps on my 1.4GHz Celeron box. I can't explain how this would be such a large speedup. Hopefully it lasts.


PostPosted: Fri Jul 29, 2016 8:35 am 
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19322
Location: NE Indiana, USA (NTSC)
In this post, byuu wrote:
[The assumption of one crystal is] trickier though with the SNES where there are multiple independent clock rates.

You can come close by running the audio unit at 1/7 of the master clock.

APU ceramic resonator: 24576000 Hz
Divided by 8 to form audio master clock: 3072000 Hz

Cost-reduced APU shares CPU/PPU crystal: 945/44 MHz = 21477272 Hz
Divided by 7 to form audio master clock: 3068181 Hz

Would being 1 part in 804.6 off improve frame rates? And would it be within tolerance of the APU's ceramic resonator? If not, you could instead divide by 945/44/3.072 = 39375/5632 = 6.9913, and everything would still be within a clock domain.
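Checking the arithmetic above in code (values taken straight from this post; the constant names are mine):

```cpp
//the two candidate audio master clocks discussed above
constexpr double resonatorClock = 24576000.0 / 8;        //3072000 Hz
constexpr double costReducedClock = (945.0e6 / 44) / 7;  //~3068181.8 Hz

//relative error between them: about 1 part in 804.6, as stated
constexpr double partsOff = resonatorClock / (resonatorClock - costReducedClock);
```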


Based on the consensus from a discussion two months ago about overuse of moderator powers, I moved this post here from a previously derailed topic by cut and paste, as an ordinary user would, rather than using my moderator powers. I apologize for the bump.


PostPosted: Fri Jul 29, 2016 9:36 am 
Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
byuu wrote:
scaler is a uint64 type. I just do the initial computation via long double. That math only happens once at power-up, so I'm not too worried about its performance. All my audio resampling / anti-aliasing is done in floating point, which is probably a whole lot worse. Especially at the obscene 2-3MHz sampling ratios of some of these systems.


Again, it's precision I'm worried about, not performance. After all the work you've done on accuracy you don't want to start letting the S-PPU gain a cycle on the S-CPU every 20 minutes.

Here's another problem I've just spotted:

Code:
auto synchronize(Thread& thread) {
  if(clock > thread.clock) co_switch(thread.handle);
}


Before, you had one side testing for clock < 0 and the other side testing for clock >= 0: exact opposite conditions. Now you've got both sides testing for inequality. This introduces a wobble between two devices running at the same frequency: sometimes device A will run 1 cycle ahead of device B and sometimes device B will run 1 cycle ahead of device A, depending on which is currently executing when their clocks become equal.

Unfortunately I suspect that this wobble is responsible for a significant amount of the performance gain you're seeing. Devices that should be running in perfect lockstep (e.g. the SMP and the DSP) are now leapfrogging each other, each running 2 cycles at a time.

The solution is to initialize each device's clock with a unique bias, so that two devices' clocks never become exactly equal. If you initialize the SMP's clock to 0 and the DSP's to 1, the SMP will always lead the DSP instead of leapfrogging, or vice-versa (I'd look at which way the relationships went before and choose biases that preserve the same priority)
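The tie at the heart of the wobble is easy to see in miniature. A hypothetical sketch (names mine): with equal clocks, neither side's strict > test fires, so whichever thread happens to be executing keeps running; a one-unit bias makes an exact tie impossible.

```cpp
#include <cstdint>

struct Thread { uint64_t clock; };

//byuu's synchronize() test: switch away only when strictly ahead
inline bool wantsSwitch(const Thread& self, const Thread& peer) {
  return self.clock > peer.clock;
}
```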


PostPosted: Fri Jul 29, 2016 12:28 pm 
Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1339
> Again, it's precision I'm worried about, not performance. After all the work you've done on accuracy you don't want to start letting the S-PPU gain a cycle on the S-CPU every 20 minutes.

Will that happen when the two chips are running at the same frequency? They would both exhibit rounding errors at the same time.

But, I don't see a solution, short of moving to Planck time units or something.

> Unfortunately I suspect that this wobble is responsible for a significant amount of the performance gain you're seeing.

... damn. It was such a nice performance boost, too.

What about "if(clock >= thread.clock) co_switch(thread.handle);" instead? This test will never be called until a thread adds to the clock value, obviously, so there's not going to be some pathological non-stop switching event going on. It's starting to get sloppy, introducing virtual offset values into each clock.


PostPosted: Fri Jul 29, 2016 12:53 pm 
Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
byuu wrote:
> Again, it's precision I'm worried about, not performance. After all the work you've done on accuracy you don't want to start letting the S-PPU gain a cycle on the S-CPU every 20 minutes.

Will that happen when the two chips are running at the same frequency? They would both exhibit rounding errors at the same time.


No, it's not a problem if both of the chips are created with exactly the same frequency. Just make sure that clock dividers are applied via integer multiplication (either at creation time or at runtime), not via division nor any floating-point operation. Example:

Code:
auto create(... long double frequency, uint divider) {
  /*uint64*/ scalar = 1.0 / frequency * 1'000'000'000'000'000'000.0 + 0.5;  //round to attoseconds
  scalar *= divider;
}


Don't divide frequency by divider, and don't multiply scalar by divider until after scalar has been cast to an integer.

For the MD, you'd create all the chips with the master oscillator as frequency and their respective dividers as divider (only for chips where it doesn't change at runtime, i.e. not the VDP).

Quote:
> Unfortunately I suspect that this wobble is responsible for a significant amount of the performance gain you're seeing.

... damn. It was such a nice performance boost, too.

What about "if(clock >= thread.clock) co_switch(thread.handle);" instead? This test will never be called until a thread adds to the clock value, obviously, so there's not going to be some pathological non-stop switching event going on. It's starting to get sloppy, introducing virtual offset values into each clock.


That's going to have exactly the same problem, except that the non-currently-executing device will win when the clocks become tied. You're still going to have the two devices leapfrogging each other. The bias values aren't arbitrary or "sloppy"; they indicate exactly what order you want lockstep devices to execute in (and much more clearly than the difference between >= and > does). The device that you want to execute first gets a bias of 0, the one you want to execute second gets a bias of 1, etc.


PostPosted: Fri Jul 29, 2016 1:07 pm 
Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1339
> That's going to have exactly the same problem, except that the non-currently-executing device will win when the clocks become tied.

But how is that ever going to be a problem in practice? The chip that runs first will be the one that gets entered first after a reset.

SMP = 0; DSP = 0
CPU runs first; synchronizes to SMP; so SMP is always first
SMP step 1; sync if 1 >= 0; switch to DSP
DSP step 1; sync if 1 >= 1; switch to SMP
SMP step 1; sync if 2 >= 1; switch to DSP
DSP step 1; sync if 2 >= 2; switch to SMP
The only way things would get nasty is if we did the sync if test before the step (would cause wobbling), or possibly a sync call without any step (infinite loop if both sides did it.) But I never do that anywhere in my code. It's always "step(n); sync();" everywhere.

> The bias values aren't arbitrary or "sloppy"

Well, I have one decent idea for them, at least. In Thread::create, I have this:
Code:
handle = co_create(entrypoint, stacksize);
scheduler.append(*this);


So inside scheduler.append, I can do:
Code:
thread.clock += scheduler.threads.size();

So that CPU=1, SMP=2, PPU=3, DSP=4, coprocessors=5-30, peripherals=31-33. It'll get a bit screwy with hot-swapping, but the biases are bounded by threads.size(), so they won't go off the rails.

I definitely don't want to build a giant enum that lists the "priority" of every possible Thread source. There's so damned many in the SNES.


PostPosted: Fri Jul 29, 2016 1:26 pm 
Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
byuu wrote:
> That's going to have exactly the same problem, except that the non-currently-executing device will win when the clocks become tied.

But how is that ever going to be a problem in practice? The chip that runs first will be the one that gets entered first after a reset.

SMP = 0; DSP = 0
CPU runs first; synchronizes to SMP; so SMP is always first
SMP step 1; sync if 1 >= 0; switch to DSP
DSP step 1; sync if 1 >= 1; switch to SMP
SMP step 1; sync if 2 >= 1; switch to DSP
DSP step 1; sync if 2 >= 2; switch to SMP
The only way things would get nasty is if we did the sync if test before the step (would cause wobbling), or possibly a sync call without any step (infinite loop if both sides did it.) But I never do that anywhere in my code. It's always "step(n); sync();" everywhere.


That's fine for the SMP and DSP, but what about devices that have different dividers and/or that don't always sync after every cycle, like the CPU and PPU? Every time one leapfrogs the other due to adding more cycles than the other's divider or due to not syncing for a bit, their priority will flip until the next time they happen to leapfrog. With the CPU/PPU, you'll have the h/v counters sporadically becoming off by one for a while.

Quote:
> The bias values aren't arbitrary or "sloppy"

Well, I have one decent idea for them, at least. In Thread::create, I have this:
Code:
handle = co_create(entrypoint, stacksize);
scheduler.append(*this);


So inside scheduler.append, I can do:
Code:
thread.clock += scheduler.threads.size();

So that CPU=1, SMP=2, PPU=3, DSP=4, coprocessors=5-30, peripherals=31-33. It'll get a bit screwy with hot-swapping, but the biases are bounded by threads.size(), so they won't go off the rails.


How are you ending up with 25 coprocessors on the SNES? Are you creating a thread for every possible coprocessor, and not just the one in the currently-loaded cartridge?

You don't actually need a unique bias for every chip on the SNES, because you never have coprocessors syncing to the PPU etc. You can just use 0 for the CPU and SMP and 1 for everything else. The only chips that need biases to prevent leapfrogging are those that both (1) run off of the same clock (possibly with dividers) and (2) directly sync to each other.


PostPosted: Fri Jul 29, 2016 1:42 pm 
Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1339
> but what about devices that have different dividers and/or that don't always sync after every cycle, like the CPU and PPU?

The CPU and PPU have the exact same scalar value (1/21MHz, scaled to attoseconds). I don't see why it matters how many clocks pass before syncing to another, as long as the value is greater than zero.

I'm likely going to need an actual demonstrable simulation showing a problem. I'll toy around with a tiny mock-up on my end to look for problems.

> How are you ending up with 25 coprocessors on the SNES?

Theoretical worst case with the manifest from hell :P

> You don't actually need a unique bias for every chip on the SNES

The goal, if I decide to use a bias, is to do it in one place rather than in ten places or more.

Does MAME actually have this bias system in place? Being honest, I wish I had known about this before I started on this redesign. But, I'd still have to solve the MD sync issue, so ... maybe I'd still have chosen this anyway.


PostPosted: Fri Jul 29, 2016 3:11 pm 
Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
byuu wrote:
> but what about devices that have different dividers and/or that don't always sync after every cycle, like the CPU and PPU?

The CPU and PPU have the exact same scalar value (1/21MHz, scaled to attoseconds). I don't see why it matters how many clocks pass before syncing to another, as long as the value is greater than zero.

I'm likely going to need an actual demonstrable simulation showing a problem. I'll toy around with a tiny mock-up on my end to look for problems.


Okay, imagine you have two devices that normally both step 1 clock at a time but sometimes step 2 clocks at a time (because of memory wait states or whatever):

Code:
dev1 step 1 (total 1); sync if 1 >= 0; switch to dev2
dev2 step 1 (total 1); sync if 1 >= 1; switch to dev1
dev1 step 1 (total 2); sync if 2 >= 1; switch to dev2
dev2 step 1 (total 2); sync if 2 >= 2; switch to dev1
; so far everything is okay... let's see what happens when dev2 steps 2 clocks at a time
dev1 step 1 (total 3); sync if 3 >= 2; switch to dev2
dev2 step 2 (total 4); sync if 4 >= 3; switch to dev1
dev1 step 1 (total 4); sync if 4 >= 4; switch to dev2
dev2 step 1 (total 5); sync if 5 >= 4; switch to dev1
dev1 step 1 (total 5); sync if 5 >= 5; switch to dev2
; this isn't good... dev2 is now running ahead of dev1 instead of behind it
; dev1 should've gotten an extra cycle when dev2 took 2 at a time, but didn't
dev2 step 2 (total 7); sync if 7 >= 5; switch to dev1
dev1 step 1 (total 6); sync if 6 >= 7; no switch
dev1 step 1 (total 7); sync if 7 >= 7; switch to dev2
dev2 step 1 (total 8); sync if 8 >= 7; switch to dev1
dev1 step 1 (total 8); sync if 8 >= 8; switch to dev2
; well, at least they can't get more than 1 cycle out of step. what happens when dev1 takes 2 cycles?
dev2 step 1 (total 9); sync if 9 >= 8; switch to dev1
dev1 step 2 (total 10); sync if 10 >= 9; switch to dev2
dev2 step 1 (total 10); sync if 10 >= 10; switch to dev1
dev1 step 1 (total 11); sync if 11 >= 10; switch to dev2
dev2 step 1 (total 11); sync if 11 >= 11; switch to dev1
; and now dev1 is running ahead of dev2 again


Each time the trailing device gets ahead of the leading device rather than merely catching up to it, their order of execution swaps. You might object that the SNES doesn't have any pair of devices with those exact clock characteristics, but the point is that the order of execution is fundamentally unstable, and your core shared emulation functionality shouldn't rely on accidental characteristics of the emulated hardware to avoid the consequences of its bugs.
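The trace above can be reproduced mechanically. This sketch (all names invented) feeds two scripted step sequences through the "step, then switch if clock >= other.clock" rule and counts how often the leading device changes; with the step pattern from the trace, the leader flips instead of staying fixed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Trace { int leaderFlips = 0; };

//coroutine-style round trip: the current device steps its next scripted
//amount, then yields once its clock is >= the other device's clock;
//we track which device is strictly ahead and count changes of leader
inline Trace simulate(const std::vector<int>& steps1, const std::vector<int>& steps2) {
  uint64_t c1 = 0, c2 = 0;
  std::size_t i1 = 0, i2 = 0;
  int current = 1;  //device currently executing
  int leader = 0;   //device that was last strictly ahead (0 = none yet)
  Trace trace;
  while(i1 < steps1.size() && i2 < steps2.size()) {
    if(current == 1) {
      c1 += steps1[i1++];
      if(c1 >= c2) current = 2;
    } else {
      c2 += steps2[i2++];
      if(c2 >= c1) current = 1;
    }
    int ahead = c1 > c2 ? 1 : c2 > c1 ? 2 : leader;  //ties keep the old leader
    if(leader != 0 && ahead != leader) trace.leaderFlips++;
    leader = ahead;
  }
  return trace;
}
```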

Quote:
Does MAME actually have this bias system in place? Being honest, I wish I had known about this before I started on this redesign. But, I'd still have to solve the MD sync issue, so ... maybe I'd still have chosen this anyway.


MAME uses a very different type of scheduler; the only thing it's got in common with your new scheduler is how the time bases are handled. The MAME scheduler runs CPUs (and other "executable" devices) in a round-robin sequence in the order they were declared in. A CPU in MAME can't yield to another specific CPU; it can only give up its timeslice, causing the scheduler to pass it to the next CPU in the sequence. MAME can't handle things like context switching in the middle of a read handler to let the device being read from catch up. To achieve bsnes-level accuracy in MAME, you have to run at "perfect interleave", meaning the round robin timeslices are equal to the frequency of the fastest chip in the emulated machine. The 8-bit Commodore drivers do this, and the C64 and C128 barely run at full speed on a top-end PC (and for other reasons those drivers still aren't as accurate as dedicated emulators)

Basically, MAME is amazing for being able to emulate everything from 1970s 6502 boards to Pentium-class PCs using a single infrastructure, but it's not something you should try to imitate in all its particulars in a project of much more limited scope like higan.


PostPosted: Fri Jul 29, 2016 4:11 pm 
Joined: Mon Mar 27, 2006 5:23 pm
Posts: 1339
Testing reveals some nasty results. You'll need the latest nall/libco from Gitlab to use this:

http://hastebin.com/raw/ukoyikabez

But this is a test framework for the old and new synchronization methods.

When I run the CPU and SMP at their specification rates, and step the CPU by 4, the SMP by 5; the attosecond-based method desyncs at 15,151 context switches. That number is horrifyingly low, and completely unacceptable.

I wasn't gaining any precision beyond femtoseconds, so I moved from long double reciprocal division to pre-scaled integer division and scaling with 128-bit types. Even yoctoseconds aren't enough; I had to go 1000x more precise than that. Since I had the headroom, I went ahead and increased this to 100000x the precision of a yoctosecond. Can't fit Planck time * a 32-bit frequency into a 128-bit integer, so ... boo.
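A sketch of what a pre-scaled integer reciprocal at that precision could look like (this is my guess at the shape, not byuu's actual code; unsigned __int128 is a GCC/Clang extension on 64-bit targets):

```cpp
#include <cstdint>

using uint128 = unsigned __int128;

//one second in units of 10^-29 s: 100000x the precision of a yoctosecond
constexpr uint128 second() {
  uint128 n = 1;
  for(int i = 0; i < 29; i++) n *= 10;
  return n;
}

//truncating integer reciprocal; no floating point anywhere
inline uint128 scalar(uint64_t frequency) {
  return second() / frequency;
}
```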

Still, this is really shitty. I trusted the "one clock desync every 20 minutes" thing, but it's not even close to that in practice with attoseconds.


PostPosted: Fri Jul 29, 2016 4:56 pm 
Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
The bias thing only helps when the two clocks are dividers of the same base frequency (which is also the only time that level of perfection should matter). When you have two completely unrelated frequencies, there's nothing you can do to stop the streams from crossing eventually. Except for keeping a separate "priority" field and testing "if((clock > other.clock) || (clock == other.clock && priority < other.priority))".
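Spelled out as a sketch (the struct is my reading of the condition above; which numeric direction of priority means "runs first" is a convention to pick):

```cpp
#include <cstdint>

struct Thread { uint64_t clock; uint32_t priority; };

//the tie-break test quoted above: clocks decide as usual, and an exact
//tie is resolved by a fixed per-thread priority rather than by whichever
//thread happens to be executing
inline bool shouldSwitch(const Thread& self, const Thread& peer) {
  return self.clock > peer.clock
      || (self.clock == peer.clock && self.priority < peer.priority);
}
```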

Does one of the threads actually permanently gain a cycle on the other, or do they just execute one cycle out of order and then fall back into synchronization? It's one thing if one of the chips is actually running 0.01% too fast (although I don't see how that large an error is remotely possible) but I don't think it's a big deal if one chip executes a cycle early or late every now and then compared to the previous method (relative to a chip on a different oscillator, I mean).

