All times are UTC - 7 hours

PostPosted: Thu Aug 08, 2019 4:11 am 
Joined: Sun Sep 19, 2004 9:28 pm
Posts: 4210
Location: A world gone mad
What do I think of FPGAs? I think they're neat
Do you think they are important in any meaningful way? Yes
Do you think they can "preserve" history as some people or magazines put it? In many/most regards, yes
Are they important or just optional/non-essential tools for homebrew developers? Both (but skill set and skill level matters tremendously for the latter)
Is this going to be a temporary trend that will fade away? Probably not, that is, until hardware folks create some alternate technology (e.g. CPLD) that provides similar functionality


PostPosted: Thu Aug 08, 2019 8:28 am
Joined: Tue Feb 07, 2017 2:03 am
Posts: 764
gauauu wrote:
Oziphantom wrote:
most of the arguments come down to cargo cult garbage.

Oh, but that's just code running on a CPU, whereas this is almost like real hardware; it's a purer, truer source of emulation... Whether it runs on a computer or an FPGA, or is physical bits of silicon arranged on a die, if the output is the same it doesn't matter; none is better or more pure than the rest.


I've beaten Mike Tyson's Punch Out on real hardware with a CRT. Never managed it on an emulator, ever. Maybe that's just anecdotal, but it convinced me that in some cases, there IS a difference in emulation. (And I mostly DO play on emulators these days, so I'm not saying that as a hardware purist.)

And I could have an FPGA that has 20 frames of latency, and an emulator that has none. Being an FPGA or an emulator makes no difference; what matters is the quality of the emulation rather than how it is built. Run a bare-metal emulator on a Raspberry Pi and use the composite output; there is a custom version of VICE that does this to get a true 50 Hz display. It's more accurate than the C64 ASIC from the DTV ;)

93143 wrote:
Like I said, my understanding is pretty much Wikipedia-level. I've seen what an FPGA can do, and I've read descriptions of what they are, but I haven't engaged with the concept at a technical level.
https://www.youtube.com/watch?v=gUsHwi4M4xE is a good introduction to them.

There is software you use when developing FPGA/CPLD designs that simulates the logic on the PC, so run the FPGA implementation in software on your pc, where are your gods now? The problem is that it runs in pure theoretical "logic" rather than modelling exactly how the chip behaves, with the chip's latency issues, so you get the case where a design works fine in the "simulator" but fails on the actual chip :?

An FPGA doesn't let you emulate at the transistor level; it has broad blocks that get combined in odd ways to achieve this or that. For example, a 6502 holds its register values, stack pointer, PC and flags "on the line"; if you stop the CPU or slow it down too much, the 6502 loses the PC and register values, which is why it has a minimum clock frequency spec. To emulate this you would need traces inside the chip of exactly the same length and width. This can cause compatibility issues between the NMOS and HMOS versions of the 6502, as they use different processes and hence have different sizes of "lines", which don't hold their values for the same amount of time.


PostPosted: Thu Aug 08, 2019 10:41 am
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21635
Location: NE Indiana, USA (NTSC)
Oziphantom wrote:
And I could have an FPGA that has 20 frames of latency, and an emulator that has none.

Let me try to express in a more rigorous way what I believe was intended:
The obvious way to architect an emulator produces more latency than the obvious way to architect an FPGA. This goes double when considering the latency to and from authentic cartridges.

Oziphantom wrote:
if you stop the CPU or slow it down too much, the 6502 loses the PC and register values, which is why it has a minimum clock frequency spec.

But it's still straightforward for an FPGA to simulate "6502 including stable unofficial opcodes, run in spec" or "6502 without decimal mode including stable unofficial opcodes, run in spec" or the like.

_________________
Pin Eight | Twitter | GitHub | Patreon


PostPosted: Thu Aug 08, 2019 1:29 pm
Joined: Thu Apr 18, 2019 9:13 am
Posts: 161
rainwarrior wrote:
Well, there's another missing option here which is CPLD, which is kind of like a cheaper one-time-programmable FPGA.


There are three general approaches to making a programmable logic device configurable:

1. Use one-time-programmable fuses or antifuses which are selectively destroyed in the process of programming (an antifuse is an insulator which can be induced to permanently fail shorted).

2. Use an addressable storage mechanism near each switching element to enable or disable it.

3. Use a latch near each switching element to enable or disable it.

Approach #1 used to be common for FPGAs even after it was largely abandoned for logic-array devices like PLDs; I wouldn't be surprised if it's still used for some specialized applications. Approach #2 is common in CPLDs (which have very regular configuration layouts), and approach #3 is more common in FPGAs. It's generally cheaper to store an FPGA's configuration in an external chip than to have the entire chip include all the layers necessary for non-volatile storage, but CPLDs typically include non-volatile storage elements nestled among the switching circuitry.


PostPosted: Thu Aug 08, 2019 3:57 pm
Joined: Fri Jul 04, 2014 9:31 pm
Posts: 1090
Oziphantom wrote:
And I could have an FPGA that has 20 frames of latency

Seems to me you'd pretty much have to do that deliberately, unless you were incredibly bad at FPGAs.

Quote:
and an emulator that has none.

Good luck emulating a fully accurate Super Accelerator System with sub-pixel latency (ie: "none"), or even a few scanlines of latency, on any modern system. A Raspberry Pi isn't powerful enough, and a full-scale PC isn't real-time enough (I'm not sure it's powerful enough either). Just using a framebuffer adds a whole frame of latency by itself. You'd basically have to completely take over a powerful PC at the hardware level, bypassing the OS and drivers, and probably write parts of the emulator in shader language, to get anywhere near what an FPGA could do if properly programmed.

Quote:
so run the FPGA implementation in software on your pc, where are your gods now?

I get the feeling you might be fighting a straw man.


PostPosted: Thu Aug 08, 2019 5:28 pm
Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7598
Location: Canada
93143 wrote:
Good luck emulating a fully accurate Super Accelerator System with sub-pixel latency (ie: "none"), or even a few scanlines of latency, on any modern system. A Raspberry Pi isn't powerful enough, and a full-scale PC isn't real-time enough (I'm not sure it's powerful enough either). Just using a framebuffer adds a whole frame of latency by itself. You'd basically have to completely take over a powerful PC at the hardware level, bypassing the OS and drivers, and probably write parts of the emulator in shader language, to get anywhere near what an FPGA could do if properly programmed.

This is kind of a bizarre statement. What is "sub-pixel latency" supposed to mean?

If you want 2 simulated chips to correlate with cycle-accurate timing, that's entirely doable on PCs and RPis with emulators. This is not a performance problem, and I don't know why you think it must be. Many emulators are already doing this kind of thing just fine.

Otherwise I have no idea what the relevance is, like you want the video output (i.e. light emitted from your screen) to change at some level smaller than a pixel? What are you even talking about?

Even outputting video scanline by scanline has very little relevance at the FPGA or PC emulator level, that's a video device problem, kind of a different domain. Sure, an FPGA could be used to generate a composite video signal, but so could a suitable generator attached to a PC-- but it wouldn't make any difference unless you were connected to a CRT. Otherwise your monitor will process and display the video signal with its own process that has very little to do with that. Incidentally, you could probably do scanline-by-scanline output on many PCs' built-in video hardware in VGA mode while connected to a CRT, if you really wanted to go down this road. (Shader language would not help with this in any way, IMO.)

But even if you did this, this concept of zero latency vs 1 frame of latency is almost meaningless. The main place it matters is if you want to use an original light gun device that needs to sense light in a very immediate way, much faster than human perception. Like I said in my summary though, emulators don't work with original hardware anyway, all aspects of that have to be simulated, and there are plenty of emulators that simulate a light gun in a very accurate way. (There are even ways to go the other way and use modern pointing devices to connect with the original light gun hardware.)

5 frames of lag on video output is something to care about. <=1 frame of lag? Hardly significant. Scanline by scanline only matters if you're trying to interface with some original hardware that can see light changes that fast (not relevant when you're simulating the hardware). Pixel by pixel, I don't think many things are meaningfully sensitive to this. Sub-pixel? No, that's not a thing.


So the light gun thing is an important point for FPGA clones. That was in my list of features in my earlier post: if you want to interoperate with original hardware (carts, peripherals, etc.) then these are capable of that. If you want to replace a system but otherwise keep all the same hardware connected, this is the solution.

If you're simulating the hardware (ROM file instead of cartridge, USB joystick instead of original gamepad, mouse instead of lightgun, window on a desktop instead of a dedicated television, etc.), the simulation can certainly be as accurate as any FPGA clone could, and plenty of emulators do accomplish this.


Latency on a scale relevant to human interaction is definitely a persistent problem on PCs, but not unsolvable. It's generally down to operating systems, video hardware interfaces, monitors, and the parts of the software that interact with those. Much of what affects that can't be solved at the level of the emulator's software (e.g. video drivers, processing in the monitor itself, operating system settings, USB drivers, etc.) and unfortunately needs work from the user to address. (Fullscreen mode instead of windowed can often help. A 120Hz monitor may address the issue too. This is a whole system configuration topic of its own though.)

CPU power also doesn't really help here. Very little of the latency problem has to do with CPU speed, generally it's all upstream/downstream from the emulator itself.

...but all of that is unrelated to accuracy of emulation. Latency is its own separate issue. If you don't have a CRT and original peripherals to hook up to your FPGA system then it's going to have the same monitor latency issue anyway, at the very least.


The FPGA clones I've seen are good stuff, but you shouldn't conflate output (or input) latency with emulation timing accuracy. You can have one and not the other, and that goes both ways. There are low latency emulator setups, but FPGA systems will have that by default. Cycle accuracy, on the other hand, good emulators already have. FPGAs have no monopoly on that.

The main advantage of emulators is that they're free and run on a computer you already have, which is something FPGA clones can't compete with. The converse is also true, your PC is never going to run an NES cartridge, it needs ROM dumps. That's the main tradeoff. Accuracy is not an issue, a good emulator is just as good as an FPGA for this.

Latency is an issue, and an FPGA clone might be worth its cost to just not have to try and figure out your PC latency problems. It's solvable, though. Many people do manage to get good low-latency setups using emulators.


PostPosted: Thu Aug 08, 2019 6:32 pm
Joined: Fri Nov 19, 2004 7:35 pm
Posts: 4222
Emulators are able to use tricks involving savestates (RunAhead) to skip the game's internal lag frames, and show frames from the future, and thus reduce input lag. When combined with a lag-free CRT, you even beat the original hardware. See this post on twitter for a slow-motion video: https://twitter.com/TylerLoch/status/980278954786467842 (youtube link: https://www.youtube.com/watch?v=_qys9sdzJKI )

==Internal Lag==
Super Mario Bros 1 is known to have one frame of internal lag. Here's the rough frame timing of SMB1, assuming Mario is at the bottom of the screen.

Timing of SMB1:

- Vblank -
Copy to OAM
Read Joypad (line 11 within vblank)
- Frame Start -
Run game logic reacting to input (around line 32 within frame)
Generate sprite data (line 76 within frame)
Display Raster reaches location of Mario's Sprite (Y = 177)
- Vblank -
Copy to OAM
Read Joypad
...

Now you want to jump, you hit A.
The minimum possible lag for this game happens when you press the button right before Line 11 on vblank time. Any later, and your input won't be seen until the next frame.
While it is rendering the frame, it is reacting to your joypad input, and generating the sprites that it will display on the next frame.
Eventually the screen raster reaches the point where Mario is, and Mario is drawn there.
Next frame:
Now it's actually displaying the graphics that resulted from the game's state. The screen raster reaches Y=177 and Mario's new state appears on the screen.

So it took about 1.7 frames (best case) for Mario to appear on the TV, and 2.7 frames (worst case) if you missed the deadline for sampling the joypad.
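
The 1.7 and 2.7 figures can be reproduced with a little scanline arithmetic. Here is a minimal sketch in Python, assuming an NTSC NES frame of 262 scanlines, of which about 21 (20 vblank lines plus the pre-render line) come before the visible picture. Those two constants and the function name are assumptions for illustration, not something stated above; the line numbers 11 and 177 are from the timing list.

```python
LINES_PER_FRAME = 262   # assumption: NTSC NES scanlines per frame
VBLANK_LINES = 21       # assumption: 20 vblank lines + 1 pre-render line
JOYPAD_READ_LINE = 11   # from the timing list: joypad read, line 11 within vblank
MARIO_Y = 177           # from the timing list: raster line of Mario's sprite


def smb1_latency_frames(missed_deadline=False):
    """Frames between the button press and Mario's new state reaching the screen."""
    lines = VBLANK_LINES - JOYPAD_READ_LINE  # rest of vblank after the joypad read
    lines += LINES_PER_FRAME                 # frame in which the game logic reacts
    lines += MARIO_Y                         # next frame, down to Mario's scanline
    if missed_deadline:
        lines += LINES_PER_FRAME             # press came just after the joypad read
    return lines / LINES_PER_FRAME


print(f"best case:  {smb1_latency_frames():.1f} frames")
print(f"worst case: {smb1_latency_frames(missed_deadline=True):.1f} frames")
```

With these assumed constants the best case works out to 449 scanlines and the worst case to 711, which round to the 1.7 and 2.7 frames quoted above.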

==Emulator on PC==
You can do tricks with savestates (RunAhead) to outright remove such input lag.

Example of RunAhead 1:

Run a frame (discarding audio and video)
Save State
Run a frame (presenting audio and video)
Load State

You have just displayed a future frame, which removes 1 frame of input lag. If you have a CRT plugged in, you have beaten the original hardware at latency.

For different values of RunAhead (such as 2) it looks like this:
Run a frame (discarding audio and video)
Save State
Run a frame (discarding audio and video)
Run a frame (presenting audio and video)
Load State

It's RunAhead 2 because you ran one frame, then ran two more frames, displaying the last one.
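
The two listings above generalize to any RunAhead value. Below is a rough Python sketch of that loop. `ToyCore` is a made-up stand-in for an emulator core with exactly one frame of internal lag, and `run_ahead_step` is a hypothetical name; neither is RetroArch's actual code or API.

```python
from dataclasses import dataclass


@dataclass
class ToyCore:
    """Hypothetical core: input latched this frame is only drawn next frame."""
    frame: int = 0
    pending: int = 0  # input latched during the previous frame
    shown: int = 0    # what the current frame displays

    def run_frame(self, joypad, present):
        self.shown = self.pending  # draw the result of last frame's input
        self.pending = joypad      # latch this frame's input
        self.frame += 1
        return self.shown if present else None

    def save_state(self):
        return (self.frame, self.pending, self.shown)

    def load_state(self, state):
        self.frame, self.pending, self.shown = state


def run_ahead_step(core, joypad, run_ahead):
    """Advance one real frame, but present the frame run_ahead frames later."""
    if run_ahead < 1:
        raise ValueError("run_ahead must be at least 1")
    core.run_frame(joypad, present=False)      # the real frame, discarded
    state = core.save_state()
    for _ in range(run_ahead - 1):             # extra hidden frames
        core.run_frame(joypad, present=False)
    video = core.run_frame(joypad, present=True)  # the future frame we show
    core.load_state(state)                     # rewind to the real timeline
    return video


core = ToyCore()
run_ahead_step(core, joypad=0, run_ahead=1)
print(run_ahead_step(core, joypad=1, run_ahead=1))
```

In this toy model a button press shows up in the same step it is made, whereas calling `run_frame` directly would show it one frame later. The cost is running `run_ahead + 1` emulated frames per displayed frame, which is why solid savestates and spare performance are needed.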

==My Own Testing==
Meanwhile, Super Mario World on the SNES is known to have 2 frames of internal lag.

I had done some latency testing on my laptop, which has about 1 frame of display lag on the screen itself. Since you need to wait for vblank before presenting frames, it's more like 2 frames of lag.
Using RetroArch with RunAhead 2, and Hard GPU sync, I got a total input lag of 3 frames. Really nice for modern hardware.
Meanwhile, on a true SNES and true CRT, you have the 2 frames of internal lag.


The RunAhead feature did not originate with RetroArch. Gens 11 re-recording had it first, and that inspired me to add it into RetroArch.
Anyone who wants to reduce input lag in their emulator will need to run in Exclusive Fullscreen mode in Windows, and have a feature similar to RetroArch's Hard GPU sync.
RunAhead is just a few lines of code, it just requires that your emulator has rock-solid savestates, and good enough performance to run multiple frames at once.

_________________
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!


PostPosted: Fri Aug 09, 2019 12:03 am
Joined: Tue Feb 07, 2017 2:03 am
Posts: 764
93143 wrote:
Super Accelerator System
This is an SA-1 cart? Adding a 10 MHz 65816 is not going to be a problem at all; even if we write the code to run at the minimal step and emulate the various bus levels, I don't see that a current CPU would have any issue. Is there somewhere that documents the issues faced? From my limited knowledge of the SA-1, it runs its clock at a strict multiple of the base clock? We got the Amiga emulated, and we have the PS2 mostly emulated; the SA-1 is trivial in comparison. I think VICE even supports the trick of detecting the Shift Lock key vs. the Shift key on a C64 (they are wired to the exact same pin on the motherboard).
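
To show why a clock derived by a fixed ratio from a shared base clock is easy to schedule in software, here is a minimal Python sketch that steps two chips in lockstep off one master tick counter. The divisors 6 and 2 and the names `lockstep`, `cpu` and `sa1` are illustrative assumptions, not figures documented in this thread, and real chips have variable-length cycles that this ignores.

```python
def lockstep(master_ticks, cpu_divisor=6, sa1_divisor=2):
    """Return the order in which two chips step over master_ticks master clocks."""
    schedule = []
    for tick in range(1, master_ticks + 1):
        # A chip steps whenever the master tick count is a multiple of its divisor.
        if tick % sa1_divisor == 0:
            schedule.append("sa1")
        if tick % cpu_divisor == 0:
            schedule.append("cpu")
    return schedule


order = lockstep(12)
print(order.count("cpu"), order.count("sa1"))
```

Because both chips are driven from the same counter, their relative ordering is deterministic and repeats exactly, so the emulator never has to guess which chip touched the bus first.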


PostPosted: Fri Aug 09, 2019 12:40 am
Joined: Thu Aug 20, 2015 3:09 am
Posts: 466
I know nothing about FPGAs, but on the topic of the discussion/flame war I do know this: getting low latency on a 'modern' PC is a stone bitch. Your keyboard alone probably has more latency than a NES+CRT end-to-end, never mind the rest of your computer.

It's possible, sure, but you've really got to do your research... and since manufacturers (intentionally) don't publish hard specs on their hardware, that's going to involve either owning a logic analyzer or knowing someone who does.


PostPosted: Fri Aug 09, 2019 1:27 am
Joined: Tue Oct 06, 2015 10:16 am
Posts: 975
PCSX2 is a big pile of hacks that still has glitches in almost every game, and doesn't run at full speed in software. GPU renderers are faster but bring their own issues. Then there's the "only 32-bit" part. /siderant


PostPosted: Fri Aug 09, 2019 2:20 am
Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7598
Location: Canada
PS2 is wayyyy out of scope for the current generation of FPGAs anyway, no?


PostPosted: Fri Aug 09, 2019 4:53 am
Joined: Tue Feb 07, 2017 2:03 am
Posts: 764
A single FPGA sure, no chance, but I don't see why you have to just use 1.


PostPosted: Fri Aug 09, 2019 4:58 am
Joined: Tue Feb 07, 2017 2:03 am
Posts: 764
Rahsennor wrote:
I know nothing about FPGAs, but on the topic of the discussion/flame war I do know this: getting low latency on a 'modern' PC is a stone bitch. Your keyboard alone probably has more latency than a NES+CRT end-to-end, never mind the rest of your computer.

It's possible, sure, but you've really got to do your research... and since manufacturers (intentionally) don't publish hard specs on their hardware, that's going to involve either owning a logic analyzer or knowing someone who does.
This isn't an emulation vs. FPGA argument; this is a "custom-designed thing to do task X" vs. "giant general-purpose machine that multitasks and runs lots of different software" argument.


PostPosted: Fri Aug 09, 2019 3:49 pm
Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7598
Location: Canada
Oziphantom wrote:
A single FPGA sure, no chance, but I don't see why you have to just use 1.

I don't believe that's feasible either. The FPGAs have to be able to handle not just the complexity but the speed as well, and combining multiple FPGAs has diminishing returns, especially dealing with all the connections between them.

For a rough comparison, here's 3 generations of just the CPU:

SNES: 22K transistors, 4MHz CPU
PS1: 1M transistors, 34MHz CPU
PS2: 13M transistors, 300MHz CPU

SNES is currently proven to be commercially viable to reproduce in an FPGA. PS1... not yet. I've heard of an FPGA PS1 project, but not a complete one. PS2 isn't even on the table. (Please correct me if I'm wrong.)

I'm sure at a high enough price there are suitable FPGAs to reproduce a PS2, but I'm also pretty sure they're so expensive ($$$$?) it would be ridiculous to try to use them for this purpose.


General purpose CPUs and software emulators, on the other hand, are already overcoming these problems at a much more reasonable price. There's been like a 20 year lag between a system viable as a PC emulator, and one viable as an FPGA clone.


PostPosted: Fri Aug 09, 2019 4:05 pm
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21635
Location: NE Indiana, USA (NTSC)
You don't have to put the whole console in one FPGA. If a single chip from the original console fills an FPGA, a reproduction can in theory use an FPGA for just that chip. For example, one could implement a Super NES in one 65816, a few RAMs, and six FPGAs: memory (incl. DMA and input) controller, audio CPU, audio DSP, sprite compositor, background reader and priority/color math compositor, and scaler. These would correspond to the S-CPU, S-SMP, S-DSP, the two S-PPUs, and S-ENC/S-RGB. How many chips make up a PlayStation console, model SCPH-100x?

_________________
Pin Eight | Twitter | GitHub | Patreon

