All times are UTC - 7 hours





PostPosted: Sun Jun 11, 2017 4:15 pm 
Offline

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 788
tepples wrote:
Super FX homebrew on Super NES.

You mean like the game I'm actually working on?

lidnariq wrote:
As far as I can tell, it should be trivial to get ≈16MB/sec out of the cartridge interface. Getting all the way up to the 25MHz might start being a tricky engineering problem. But one can certainly get SQI-capable NOR flash now that's capable of 104MHz, so the external engineering challenges should be doable...

Sounds reasonable. Interesting that this hypothetical "FastROM mode" is in a completely different league from what everyone actually used... Does cartridge DMA stall any of the chips, and if so, which ones?

I'm not sure how one would go about estimating the required data rate. One might have to do experiments with an actual implementation to see what combination of geometry, number of layers and update speed for each layer produces an acceptable result while leaving enough time for near-field rendering...

Let's see... If WDC and Naboo really did better than 180,000 textured polygons per second, that's over 9000 per scene at 20 fps. More than twice what you might see in a reasonably well-made game late in the system's life. If that's the case, whatever geometry you can fit into the near field can be more or less standard for later games, and if each backdrop layer runs at half the rate of the previous one, each one can have just as many visible polygons as the near field ad infinitum, or at least until the cost of compositing all the backdrops at 20 fps exceeds the cost of rendering the most distant one at 20/(2^n) fps. This is probably more draw distance than necessary even for BotW, and almost certainly exceeds the available RAM by a large fraction. Going to just Gouraud shading instead of texturing for more distant terrain exacerbates this by jacking up the potential poly count, though of course you could still use it with lower-fidelity geometry to buy performance for more detailed near-field work.
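The arithmetic above can be sketched as follows. This is only a back-of-envelope model: the 180,000 polygons per second figure and the halving update rates come from the post, while the function names and the choice of 4500 visible polygons per layer are illustrative assumptions.

```python
def polys_per_scene(polys_per_second, fps):
    """Polygon budget for one frame at a given frame rate."""
    return polys_per_second / fps


def layer_schedule(near_fps, backdrop_layers):
    """Update rate of the near field (index 0) and each backdrop layer,
    where every layer runs at half the rate of the previous one."""
    return [near_fps / (2 ** n) for n in range(backdrop_layers + 1)]


def total_render_load(visible_polys, near_fps, backdrop_layers):
    """Polygons rendered per second if every layer shows the same
    number of visible polygons."""
    rates = layer_schedule(near_fps, backdrop_layers)
    return sum(visible_polys * rate for rate in rates)


budget = polys_per_scene(180_000, 20)      # 9000 polygons per scene
rates = layer_schedule(20, 3)              # 20, 10, 5 and 2.5 fps
load = total_render_load(4500, 20, 3)      # 168750 polygons per second
print(budget, rates, load)
```

Because the rates form a geometric series, the total stays below twice the near-field load no matter how many layers are added, which is the "ad infinitum" point. Compositing cost and RAM are not modelled here.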

I guess the questions are:
1) how many bytes per triangle in a typical N64 mesh?
2) how many triangles would a Zelda-like game use in an environment that peaked at 4500 visible polygons per scene?
3) how much RAM can plausibly be dedicated to mesh data in a game like this?
Given estimates of those things, it should be possible to determine whether streaming geometry for distant scenery is potentially feasible.

Espozo wrote:
The leaves on the trees are just large, randomly intersecting planes

Everybody does that. Foliage is hard. Even on PS4, developers don't model individual leaves. In this case the grass seems to have gotten higher priority. There are some pretty lush fields in this game. Some of them have a bunch of trees, which cast realistic shadows on the grass... just like everything else...

Attachment:
WVW69kIkUdEHiGcY1N.jpg [ 74.84 KiB ]

Quote:
it experienced framerate drops at 30fps, 900p

There have been a couple of performance patches. It's apparently much better now. (It seems the Great Plateau was the worst area for framedrops for some reason, despite being nowhere near the most graphically impressive...) They probably did a quick port of the Wii U version; I imagine the game is fighting the architecture to some degree.

But I don't really have a problem with it myself; I'm used to games having to take shortcuts. I think it's really impressive for a Wii U game, and the art style is great. I particularly like the reflections on surfaces like water and mud. Though I admit it does kinda bug me that the shadows disappear in the distance...

Go ahead and tell me this is ugly:

Attachment:
img_20170313_231335ltyz2.jpg [ 127.79 KiB ]

Or this:

Attachment:
C6-iQXUVAAAWQ11.jpg [ 148.92 KiB ]

tokumaru wrote:
That shading on Link(?) is atrocious.

It's cel shading, and I'm not sure the image compression is doing it justice. It's part of the art style. One of the reasons people compare it to a Ghibli film.

calima wrote:
Xenoblade looks better than that, and it was an open-world game on the Wii.

That's frankly silly. Xenoblade just looks like a Gamecube game with a large draw distance. The ground cover is noninteractive billboarding, and the lighting is extremely primitive. Never mind that it's not even HD...

Have you actually seen any of BotW in motion, with the dynamic lighting and weather? Have you seen what a tree looks like with the sun behind it? Or the grass for that matter? Have you seen the grass react to being stepped on, or cut, or set on fire? And IIRC the haze and possibly even the light rays seem to be volumetric according to Digital Foundry...

Slight digression: I can't stand modern games that look nearly photorealistic but have no interactivity or dynamism. I remember a Godzilla game I watched a short video of - the water of Tokyo Bay looked realistic, but it didn't react to Godzilla's presence at all except for a pasted-on splash effect that wouldn't have been out of place on the N64. A game like Horizon Zero Dawn looks great if you just stand there and look at stuff, but somebody compared it with Zelda in a couple of videos, and the comparison is rather stark (and surprisingly direct - these games have a lot in common it seems):

https://www.youtube.com/watch?v=aVPXKdSEGNQ
https://www.youtube.com/watch?v=qEGWtyJAkO0

...

Nobody has really commented on my backdrop rendering idea yet. Does it seem like the sort of thing that might work? Has it been tried before?


PostPosted: Sun Jun 11, 2017 4:40 pm 
Offline

Joined: Sun Apr 13, 2008 11:12 am
Posts: 6293
Location: Seattle
93143 wrote:
Does cartridge DMA stall any of the chips, and if so, which ones?
Not directly, I think. Obviously it consumes RDRAM bandwidth, and some of the RDRAM bandwidth is always being consumed by the video and audio output devices, and they necessarily have to take precedence. But I don't know whether (e.g.) fulfilling a CPU cache miss would take priority over cart↔RDRAM DMA or vice versa.


PostPosted: Sun Jun 11, 2017 11:40 pm 
Offline

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 788
That's okay, probably. Right? Even at maximum speed, cartridge DMA is still only about 10% of total RDRAM bandwidth, so if you can do everything in large enough chunks it seems like there shouldn't be a huge hit to performance. Then again, wasn't doing everything in large enough chunks the problem with N64 development?

Espozo wrote:
Nothing would be compatible with this, but it would still be cool to see someone expand the expansion pack ram from 4MB to 12MB, if that's all the N64 can address.

That's an interesting idea. Games have had packed-in RAM expansions before. With my Zeno's Racecourse rendering scheme above (or do I dare call it "Zenoblade" - maybe I'd better find out if it works first), you could possibly handle another couple of layers with the extra RAM, bringing the remaining layers within range of cartridge streaming.

...

I went for a walk late this afternoon, and I came back with the impression that while Horizon Zero Dawn may look more like a (touched-up) photograph, or perhaps a movie, Zelda feels more like real life. The sunlight is really well handled.


PostPosted: Mon Jun 12, 2017 2:46 am 
Offline

Joined: Tue Oct 06, 2015 10:16 am
Posts: 558
Quote:
How many bytes per triangle

Not on N64, but a typical vertex is 3 floats for position, 3 floats for normal, 4 u8 for color, and two floats for the texcoord. That's 36 bytes, and a tri has three, 108 bytes plus three indices, which are u16 usually, so 108 + 6 = 114 bytes for a triangle. Depending on how flexible the gpu is, and what precision you can get away with, you may be able to use fixed-point u16 or u8 integers instead of floats, etc.
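A quick way to check that byte count is Python's `struct` module. The layout below is the generic one described in the post (not an N64 format), and the constant names are made up for illustration.

```python
import struct

# "<" disables padding so the sizes are exactly the sum of the fields.
# 3 floats position, 3 floats normal, 4 x u8 colour, 2 floats texcoord.
VERTEX_FORMAT = "<3f3f4B2f"
# Three u16 indices per triangle.
INDEX_FORMAT = "<3H"

vertex_bytes = struct.calcsize(VERTEX_FORMAT)
index_bytes = struct.calcsize(INDEX_FORMAT)
triangle_bytes = 3 * vertex_bytes + index_bytes

print(vertex_bytes, index_bytes, triangle_bytes)
```

Swapping a field code (for example `3h` in place of `3f` for the normal) shows immediately how much a reduced-precision layout would save.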

Quote:
That's frankly silly. Xenoblade just looks like a Gamecube game with a large draw distance. The ground cover is noninteractive billboarding, and the lighting is extremely primitive. Never mind that it's not even HD...

Go play it. It has extremely good scenery; they made good use of the resources.

Quote:
Have you actually seen any of BotW in motion, with the dynamic lighting and weather?

No, I won't bother spoiling anything until I get to play it myself.

edit:
Quote:
2) how many triangles would a Zelda-like game use in an environment that peaked at 4500 visible polygons per scene?

4500 :P

That question doesn't make sense. Every game-targeting artist models with tris or quads, and always measures in tris.


PostPosted: Mon Jun 12, 2017 2:53 am 
Offline

Joined: Tue Oct 06, 2015 10:16 am
Posts: 558
Quote:
Nobody has really commented on my backdrop rendering idea yet. Does it seem like the sort of thing that might work? Has it been tried before?

It will probably work, but maybe not at the scale you expect on a N64. For reference, an Intel GMA is put to its knees when alpha-blending 6 layers at a resolution of 1920x1080.


PostPosted: Mon Jun 12, 2017 3:46 am 
Offline

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 788
calima wrote:
Depending on how flexible the gpu is, and what precision you can get away with, you may be able to use fixed-point u16 or u8 integers instead of floats, etc.

~100 bytes is about what I'd estimated, even though I did it wrong. If that can be shaved down a bit, great, but you probably still need 32-bit positions at least. I'm not sure what sort of precision the "texcoord" would need. The only obvious saving I can see is the normal, which gets you down to 96 bytes if you use 16-bit. So probably on the order of 10,000 tris per megabyte at best...
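As a rough check of that estimate, here is the same sum in code. It assumes the 114-byte triangle quoted above, 16-bit normal components, and a binary megabyte; the helper name is hypothetical.

```python
FULL_TRIANGLE_BYTES = 114          # three 36-byte vertices plus three u16 indices
NORMAL_COMPONENTS_PER_TRI = 3 * 3  # three vertices, three normal components each
BYTES_SAVED_PER_COMPONENT = 4 - 2  # 32-bit float down to a 16-bit value

reduced_triangle_bytes = (
    FULL_TRIANGLE_BYTES - NORMAL_COMPONENTS_PER_TRI * BYTES_SAVED_PER_COMPONENT
)


def tris_per_megabyte(bytes_per_tri, megabyte=1024 * 1024):
    """Whole triangles that fit in one megabyte of unshared vertex data."""
    return megabyte // bytes_per_tri


print(reduced_triangle_bytes)
print(tris_per_megabyte(FULL_TRIANGLE_BYTES))
print(tris_per_megabyte(reduced_triangle_bytes))
```

Both layouts land near the 10,000 tris per megabyte ballpark, so the choice of normal precision does not change the order of magnitude.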

Quote:
I won't bother spoiling anything until I get to play it myself.

Probably a good idea. Best to avoid pumping up expectations, then...

Quote:
Quote:
2) how many triangles would a Zelda-like game use in an environment that peaked at 4500 visible polygons per scene?

That question doesn't make sense. Every game-targeting artist models with tris or quads, and always measures in tris.

No, I mean how much geometry would have to be stored in RAM, before culling. 4500 tris is after culling.

Or have I seriously misunderstood what the "polygons per second" number means?

calima wrote:
It will probably work, but maybe not at the scale you expect on a N64. For reference, an Intel GMA is put to its knees when alpha-blending 6 layers at a resolution of 1920x1080.

Super Mario 64 doesn't seem to have much trouble with alpha textures. Maybe the GMA in question just isn't designed for fast alpha blending?

...seriously, that's pathetic. Even if it is in HD. Isn't the N64 supposed to have a fill rate on the order of about 40 screens worth of pixels in 240p20 with most effects turned on? Including alpha?

I admit I forgot about fill rate in my enthusiasm for the polygons per scene metric, which is embarrassing as fill rate is well known as a major bottleneck on N64. That does limit things a bit...


PostPosted: Mon Jun 12, 2017 5:39 am 
Offline

Joined: Tue Oct 06, 2015 10:16 am
Posts: 558
93143 wrote:
I'm not sure what sort of precision the "texcoord" would need. The only obvious saving I can see is the normal, which gets you down to 96 bytes if you use 16-bit.

Texcoord precision depends a lot on your art. A quad floor with a single texture, no tiling or wrapping, gets away with 1 bit. A beautiful heroine? That depends on the skill of your artist, but 8 or 16-bit is probably enough at N64 resolutions. But of course, if there's wrapping or tiling, then you need a sign and possibly scaling, reducing the precision. I doubt the N64 supports 16-bit floats.

16-bit normals? Do you mean a 555x configuration, or a 768kb lookup table (12 bytes * 65536)? The table would have decent precision, but use too much RAM; a 5-bit signed value means only 16 steps per direction, which would produce visible quantization errors. With normalized 2-component normals, using a 88 config, you'd have 128 steps per direction, which is still too little for flawless rendering, I estimate. Even the third component would have a lot of error because of the low precision.

Three 16-bit signed ints? That would work for normals, as would two. Two wouldn't be enough at desktop resolutions, but 640x480 would let you get away with it.
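To put rough numbers on the options above, here is a small sketch. It assumes plain symmetric signed fixed-point for each component and rebuilds the third component of a two-component normal from the unit-length constraint; the helper names and the sample normal are illustrative only.

```python
import math


def steps_per_direction(bits):
    """Magnitude steps on each side of zero for a signed value of this width."""
    return 2 ** (bits - 1)


def quantize(value, bits):
    """Snap a value in [-1, 1] to a symmetric signed fixed-point grid.
    The scale is 2**(bits-1) - 1 so that +1 and -1 stay representable."""
    scale = 2 ** (bits - 1) - 1
    return round(value * scale) / scale


def reconstruct_z(x, y):
    """Third component of a unit normal from the two stored ones (z >= 0)."""
    return math.sqrt(max(0.0, 1.0 - x * x - y * y))


table_kib = 12 * 65536 // 1024  # lookup table with 65536 float3 entries

# A normal that points almost sideways, where z is small.
x, y = 0.8, 0.599
true_z = reconstruct_z(x, y)
qx, qy = quantize(x, 8), quantize(y, 8)
x_error = abs(qx - x)
z_error = abs(reconstruct_z(qx, qy) - true_z)

print(steps_per_direction(5), steps_per_direction(8), table_kib)
print(x_error, z_error)
```

For this sample the reconstructed component is off by far more than the stored ones, which illustrates why the third component suffers most at low precision.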

Quote:
No, I mean how much geometry would have to be stored in RAM, before culling. 4500 tris is after culling.

Or have I seriously misunderstood what the "polygons per second" number means?
Those benchmark numbers are made with no culling at all. If you have numbers from a game, then it depends.

Quote:
Super Mario 64 doesn't seem to have much trouble with alpha textures. Maybe the GMA in question just isn't designed for fast alpha blending?

...seriously, that's pathetic. Even if it is in HD. Isn't the N64 supposed to have a fill rate on the order of about 40 screens worth of pixels in 240p20 with most effects turned on? Including alpha?

I admit I forgot about fill rate in my enthusiasm for the polygons per scene metric, which is embarrassing as fill rate is well known as a major bottleneck on N64. That does limit things a bit...
Alpha blending a full screen texture is the most fill-rate heavy operation you can do.

Reading and writing the entire size two times, plus Z buffer. 320 * 240 * 10, 750kb per such operation. (4 bytes source, 4 bytes dest, 2 bytes Z)
Quoting wikipedia, 62.5MP/s with no mipmapping means 250MB/s. At 20 fps, that would mean 17 layers of blending, and that's with nothing else going on.
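The estimate above works out as follows. This sketch takes the figures in the post at face value (10 bytes per pixel, 62.5 MP/s at 4 bytes per pixel) and ignores everything else competing for the bus; the variable names are illustrative.

```python
WIDTH, HEIGHT, FPS = 320, 240, 20

# 4 bytes source + 4 bytes destination + 2 bytes Z per blended pixel.
BYTES_PER_PIXEL = 4 + 4 + 2
bytes_per_layer = WIDTH * HEIGHT * BYTES_PER_PIXEL
kib_per_layer = bytes_per_layer / 1024

# 62.5 million pixels per second at 4 bytes each.
bandwidth_decimal = 62_500_000 * 4           # 250 MB/s, decimal megabytes
bandwidth_binary = 250 * 1024 * 1024         # 250 MiB/s, binary megabytes

layers_decimal = bandwidth_decimal / FPS / bytes_per_layer
layers_binary = bandwidth_binary / FPS / bytes_per_layer

print(kib_per_layer, layers_decimal, layers_binary)
```

The result is roughly 16 full-screen blends per frame with decimal megabytes and just over 17 with binary ones, so "17 layers" is the right ballpark either way.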


PostPosted: Mon Jun 12, 2017 7:33 am 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19115
Location: NE Indiana, USA (NTSC)
calima wrote:
16-bit normals? Do you mean a 555x configuration, or a 768kb lookup table (12 bytes * 65536)? The table would have decent precision, but use too much RAM; a 5-bit signed value means only 16 steps per direction, which would produce visible quantization errors. With normalized 2-component normals, using a 88 config, you'd have 128 steps per direction, which is still too little for flawless rendering, I estimate. Even the third component would have a lot of error because of the low precision.

The first Quake used normals quantized to 8-bit points on a tessellated icosahedron, and it looked OK. That method of quantizing points requires a lookup table for reconstruction. It's also possible to use a tessellated octahedron, whose lookup table isn't nearly as complex to represent because it maps easily onto the unit square using an octahedral mapping (source 1; source 2; source 3; source 4).

Take a square of stretchy material and draw lines connecting the midpoint of each side to the midpoint of each other side.
Code:
+---------------+---------------+
|             - | -             |
|           -   |   -           |
|         -     |     -         |
|       -       |       -       |
|     -         |         -     |
|   -           |           -   |
| -             |             - |
+---------------+---------------+
| -             |             - |
|   -           |           -   |
|     -         |         -     |
|       -       |       -       |
|         -     |     -         |
|           -   |   -           |
|             - | -             |
+---------------+---------------+


Fold inward along the diagonals.
Code:
                +
              - | -
            -   |   -
          -     |     -
        -       |       -
      -         |         -
    -           |           -
  -             |             -
+---------------+---------------+
  -             |             -
    -           |           -
      -         |         -
        -       |       -
          -     |     -
            -   |   -
              - | -
                +

Stretch the center of this upward (toward you), forming a pyramid. Then fold the flaps under, forming an octahedron. Inflate the octahedron until all points on its surface have the same distance from the center. You now have an octahedral geodesic sphere.
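The folding procedure above corresponds to the usual octahedral encoding of a unit vector into a square. The following is a minimal sketch of that mapping with an optional 8-bit-per-axis quantization step; the function names and the choice of 8 bits are assumptions for illustration, not taken from any particular engine.

```python
import math


def _sign(v):
    return 1.0 if v >= 0.0 else -1.0


def oct_encode(normal):
    """Project a unit vector onto the octahedron |x|+|y|+|z| = 1, then
    unfold the lower half outward so the result fills the square [-1, 1]^2."""
    x, y, z = normal
    s = abs(x) + abs(y) + abs(z)
    u, v = x / s, y / s
    if z < 0.0:
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    return u, v


def oct_decode(u, v):
    """Inverse of oct_encode: fold the outer triangles back under, then
    'inflate' to the sphere by normalizing."""
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    length = math.sqrt(u * u + v * v + z * z)
    return u / length, v / length, z / length


def quantize_oct(normal, bits=8):
    """Store the square coordinates as two signed integers."""
    scale = 2 ** (bits - 1) - 1
    u, v = oct_encode(normal)
    return round(u * scale), round(v * scale)


def dequantize_oct(code, bits=8):
    scale = 2 ** (bits - 1) - 1
    return oct_decode(code[0] / scale, code[1] / scale)


sample = (0.6, 0.0, -0.8)
restored = dequantize_oct(quantize_oct(sample))
print(restored)
```

Decoding needs no lookup table, only a few absolute values and one normalization, which is the advantage over the icosahedron table mentioned above.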

On the Nintendo DS, you count quads because each scene can hold up to about 6,000 vertices. A quad is 4 vertices, while two triangles count as 6.


PostPosted: Mon Jun 12, 2017 11:57 am 
Offline

Joined: Tue Oct 06, 2015 10:16 am
Posts: 558
Quake's resolution was below 320x240, and its environments were quite blocky. That's why it could get away with a table of 162 (!) normals.
http://www.gamers.org/dEngine/quake/spe ... spec_b.htm


PostPosted: Mon Jun 12, 2017 12:07 pm 
Offline

Joined: Sun Mar 27, 2011 10:49 am
Posts: 192
Quake could run at lots of resolutions higher than 320x240 if you wanted it to...and conversely don't most N64 games run at something like 320x240?


PostPosted: Mon Jun 12, 2017 12:11 pm 
Offline
Formerly WheelInventor

Joined: Thu Apr 14, 2016 2:55 am
Posts: 908
Location: Gothenburg, Sweden
Yes, there were top-tier graphics cards when Quake was released that could display the game in all its lo-fi polygon glory. And it didn't take too long before those resolutions became reasonably priced for consumers. You could argue Quake was the seed of the explosion in consumer-market graphics cards with 3D capabilities.

_________________
http://www.frankengraphics.com - personal NES blog


PostPosted: Mon Jun 12, 2017 4:53 pm 
Offline

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 788
calima wrote:
Three 16-bit signed ints? That would work for normals, as would two. Two wouldn't be enough at desktop resolutions, but 640x480 would let you get away with it.

I wasn't too worried about format. Since you suggested it might be possible to drop precision on some elements, I just identified an element I didn't think needed 32 bits of precision. But yeah, I meant for each component, not the whole vector. Which you could have figured out by counting bytes...

Quote:
Those benchmark numbers are made with no culling at all. If you have numbers from a game, then it depends.

Not even CPU culling of triangles that are completely behind the camera?

There were racing games that loaded something like 9000 triangles into RAM at one time, and since they were racing games, the frame rate target was presumably 30 fps. That's far in excess of what anybody's claimed the RCP can pull off, and these aren't even the games that people usually claim the highest numbers for.

Interesting post here: http://www.neogaf.com/forum/showpost.ph ... count=2014
Incomplete chart here: http://www.oocities.org/zeldaadungeon20 ... chart.html

Also, why do the numbers in that post show more than three times as many vertices as triangles in some cases? I can see there being exactly three times as many if common mesh points weren't shared, but... perhaps they designate special point objects?

Speaking of sharing common vertices, are there any disadvantages to it? It seems like it would significantly reduce the memory requirements...

Quote:
Alpha blending a full screen texture is the most fill-rate heavy operation you can do.

Reading and writing the entire size two times, plus Z buffer. 320 * 240 * 10, 750kb per such operation. (4 bytes source, 4 bytes dest, 2 bytes Z)
Quoting wikipedia, 62.5MP/s with no mipmapping means 250MB/s. At 20 fps, that would mean 17 layers of blending, and that's with nothing else going on.

Why would you need to write twice?

According to this, in 1CYCLE mode (62.5 MP/s) "you can generate pixels that are perspective corrected, bilinear filtered, modulate/decal textured, transparent, and z-buffered at a maximum bandwidth of one pixel per cycle". There's no need to use mipmapping, and fogging would be baked in during layer rendering, so the more complex processing associated with 2CYCLE mode is unnecessary. In any case the RAM can certainly do 500 MB/s in sequential access (copy mode can do four two-byte RGBA pixels per cycle), and the source pixels would be preloaded in 4 KB chunks.

With Z-buffering and antialiasing turned off for performance (why would you need Z-buffering for this?) and 15-bit RGB texels with one alpha bit, I'm seeing two bytes read and two bytes written per pixel, and if the microcode is written specifically to handle this case, it should be possible to load and draw multiple full scanlines in one contiguous chunk to minimize bus delays (is it really 640 ns for random access?!). That's about 80 screens worth of pixels per frame, though bus delays could materially impact that - 640 ns is 320 bytes, and if that has to happen once for every 4 KB read or written, we're already down at 75 screens worth with nothing else interfering. Adding zoom and warp transforms would decrease the chunk size and make this worse, even assuming the internal processing doesn't add dead bus cycles...
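The numbers in that paragraph can be reproduced with a short sketch. It assumes the 500 MB/s sequential rate, a 320x240 screen at 20 fps, and one 640 ns stall per 4 KB chunk, all as stated above; the function and variable names are illustrative.

```python
BANDWIDTH = 500e6        # bytes per second, sequential access
STALL_SECONDS = 640e-9   # assumed cost of one random access
CHUNK_BYTES = 4096       # one preloaded texture chunk


def screens_per_frame(bytes_per_pixel, fps=20, width=320, height=240):
    """Full screens of pixels that fit in one frame's worth of bandwidth."""
    return BANDWIDTH / fps / (width * height * bytes_per_pixel)


# Two bytes read plus two bytes written per pixel.
ideal = screens_per_frame(2 + 2)

# A stall wastes as many bytes as could have been transferred meanwhile.
stall_bytes = BANDWIDTH * STALL_SECONDS
efficiency = CHUNK_BYTES / (CHUNK_BYTES + stall_bytes)
with_stalls = ideal * efficiency

print(ideal, stall_bytes, with_stalls)
```

Smaller chunks lower `efficiency` directly, which is why zoom and warp transforms that break up contiguous access would make things worse.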

Antialiasing without Z-buffering is a thing, and presumably improves fill rate over using both, but it seems turning off antialiasing too can roughly double the fill rate on top of that, which implies to me that in this case antialiasing would add two or three bytes of RAM access to the above, for around 50 screens per frame neglecting bus delays.

It's possible that reduced-aliasing mode could look good enough for this application (I think it should work on alpha edges), and it doesn't involve a fill rate penalty (other than the increased load from the other end due to VI activity), which brings us back up to 80 minus bus delays.

Even with full 32-bit RGBA source textures and alpha blending/antialiasing but no Z-buffer, I think it'd be four bytes per pixel to load, two to read and two to write, for a total bandwidth requirement equal to that of 1CYCLE mode using a static texture - so, 40 screens worth (minus latency) with realistic smoke and mist. Then again, rendering the layers to 32-bit RGBA in the first place might be unsupported behaviour (though there are mentions of coverage bits in conjunction with "32-bit mode"; maybe I need to read up on this), not to mention a giant RAM hog even if it did work. The plume over Death Mountain might require special handling...

It's cute how the video system stores extra data in the parity bits... According to this and this, a framebuffer value will be read as RGBC5551, with the top coverage bit showing up at the bottom of the data. The 16-bit texel format is RGBA5551, with the bottom bit being alpha. Sounds to me like the framebuffer can be used directly as 16-bit texture data...
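For reference, here is a sketch of the 16-bit layout being described: five bits each for red, green and blue, with the lowest bit used as alpha in a texel (or as the visible coverage bit in a framebuffer read). The helper names are made up, and the bit positions follow the description above rather than any verified hardware documentation.

```python
def pack_rgba5551(r, g, b, a):
    """Pack 5-bit colour channels and a 1-bit alpha into one 16-bit texel."""
    return (r & 0x1F) << 11 | (g & 0x1F) << 6 | (b & 0x1F) << 1 | (a & 0x1)


def unpack_rgba5551(texel):
    """Split a 16-bit texel back into (r, g, b, a)."""
    return (
        (texel >> 11) & 0x1F,
        (texel >> 6) & 0x1F,
        (texel >> 1) & 0x1F,
        texel & 0x1,
    )


opaque_red = pack_rgba5551(31, 0, 0, 1)
print(hex(opaque_red), unpack_rgba5551(opaque_red))
```

If the framebuffer really does read back in this shape, the low bit of each pixel would act as a one-bit alpha when the buffer is reused as a texture.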

...

...wait, the Z-buffer is supposed to be in a different bank so you can access both it and the framebuffer quickly with high granularity. Doesn't this imply that if you weren't using a Z-buffer, you could do the same trick with the source texture and just pipe it straight through with no recurring bus latency?

...and now I've started finding references that claim far lower RAM latency than 640 ns, and one reference to a CPU cache miss being measured at 600+ ns. What's the actual deal?


PostPosted: Tue Jun 13, 2017 2:16 am 
Offline

Joined: Tue Oct 06, 2015 10:16 am
Posts: 558
@tepples
A 16-bit uniform projection might just get away with it. The table I had in mind would weight its entries according to the data; a calculable projection can't really do that, so you'd have to test whether a projection looks good enough.

93143 wrote:
But yeah, I meant for each component, not the whole vector. Which you could have figured out by counting bytes...

I did count your bytes. You said 16-bit would drop it from 108 to 96, a loss of 12 bytes, which would mean no normals at all (3 floats = 12 bytes) :P
Or maybe I misunderstood and you also reduced some other parts.

Quote:
Also, why do the numbers in that post show more than three times as many vertices as triangles in some cases? I can see there being exactly three times as many if common mesh points weren't shared, but... perhaps they designate special point objects?

Some models are badly made and include unrenderable vertices that aren't part of any triangle. Point primitives or quads are another possibility; I don't remember whether Blender counts tris correctly in a quad model without the manual triangulate step.

Quote:
Speaking of sharing common vertices, are there any disadvantages to it? It seems like it would significantly reduce the memory requirements...

In most cases it's the way to go, but it does cause cache misses. In special cases you can render tri strips or fans faster than indexed meshes; both are ways to share some verts without indices. Another old optimization was to reorder the vertices to minimize cache misses.
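To give a feel for the memory side of that trade-off, here is a small sketch comparing unshared and indexed storage for a regular grid of terrain. It assumes the 36-byte vertex and u16 indices from earlier in the thread; the grid size and function names are arbitrary illustrations.

```python
VERTEX_BYTES = 36   # position, normal, colour, texcoord
INDEX_BYTES = 2     # u16


def grid_counts(cols, rows):
    """Vertices and triangles in a grid of quads, two triangles per quad."""
    vertices = (cols + 1) * (rows + 1)
    triangles = 2 * cols * rows
    return vertices, triangles


def unindexed_bytes(triangles):
    """Every triangle carries its own three vertices."""
    return triangles * 3 * VERTEX_BYTES


def indexed_bytes(vertices, triangles):
    """Shared vertex pool plus three indices per triangle."""
    return vertices * VERTEX_BYTES + triangles * 3 * INDEX_BYTES


def strip_vertex_count(triangles):
    """A single triangle strip reuses the previous two vertices each step."""
    return triangles + 2


vertices, triangles = grid_counts(32, 32)
print(unindexed_bytes(triangles), indexed_bytes(vertices, triangles))
print(strip_vertex_count(triangles))
```

For a grid like this, sharing cuts the footprint to roughly a quarter; real meshes with seams in normals or texcoords will share less.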

Quote:
Why would you need to write twice?

Read-modify-write cycle.

Quote:
(why would you need Z-buffering for this?)

"Draw the gun first", if you drew the HUD and player and maybe some other objects first, those cover significant screen area, meaning those pixels can be rejected on the backdrop layers and often achieve significant fillrate savings. (It's fun to remember all these old-style optimizations that hardly apply in the modern world anymore ;))

Quote:
15-bit RGB texels with one alpha bit

Bad choice IMHO, that results in visible banding in gradients, like the sky or around a flashlight. You're right it would allow more layers.


PostPosted: Tue Jun 13, 2017 3:11 pm 
Offline

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 788
calima wrote:
I did count your bytes. You said 16-bit would drop it from 108 to 96, a loss of 12 bytes, which would mean no normals at all (3 floats = 12 bytes)

No, I said it would drop it from 114 bytes to 96, which is 18 bytes or nine half-floats. You said there was a normal on each point.

In any case it's not really all that important; the ballpark doesn't change all that much. There's not much point in micro-optimizing this game unless somebody's actually going to make it...

Quote:
Quote:
Speaking of sharing common vertices, are there any disadvantages to it?
it does cause cache misses.

Oh. Yeah, I can see that being an issue on N64... If the model can be preprocessed to minimize that, great.

Quote:
Quote:
Why would you need to write twice?
Read-modify-write cycle.

Okay, then why do you need to read twice? Sounds like double-counting to me.

Quote:
Quote:
(why would you need Z-buffering for this?)
fillrate savings.

Oh yeah, that.

I was disregarding Z-buffering for the near field on the grounds that it's reputed to have itself been a major hit to fill rate, and turning it off is said to have been step 1 for any developer looking to optimize the microcode. Under those circumstances, drawing the backdrops could be done back-to-front and there'd be no extra ordering load because the arrangement is so simple.

I don't know if software ordering/clipping is sufficient to make up for the lack of a Z-buffer in an open-world exploration game, or if the fill rate benefit is comparable across genres. Obviously one would want to look into this a bit if one were to actually attempt a project like this...

Quote:
Quote:
15-bit RGB texels with one alpha bit
Bad choice IMHO, that results in visible banding in gradients, like the sky or around a flashlight.

I hadn't thought about that. I guess it's a question of whether banding or dither looks worse at reduced frame rates (the N64 has a hardware dither feature that I think doesn't affect fill rate).

Then again, I wouldn't render the sky on one of these backdrops, because the sun/moon and clouds need to move smoothly, and I don't think they're computationally expensive. Other than that, most light sources in this game that would be susceptible to banding are in the near field, and those that aren't would probably need special handling anyway.

...

Okay, according to this, the RDP does in fact have a 32-bit framebuffer mode, and according to this it has exactly the same format as normal 32-bit RGBA, using the top bits of the alpha channel as coverage and ignoring the parity bits. So it is (probably) technically feasible to use 32-bit mode, presumably at the cost of fill rate when rendering the layers.

Time for a bit of handwaving figuring:

Worst case: 32-bit RGBA texture, Z-buffered drawing to a 32-bit final output buffer: 16 bytes RAM access, 20 full screens per frame. Two backdrop layers at most unless you strongly depart from the layer complexity weighting and/or frame rate scaling in the algorithm I sketched. Double buffering should suffice, so RAM usage is 1.2 MB plus whatever margin is necessary for scroll/zoom/warp (which could be substantial), not counting the actual output framebuffer or any of the game data.

Same as above but with a dithered 16-bit output buffer: 12 bytes RAM access, 27 full screens per frame. Two backdrop layers, or three if you stretch it. RAM usage for three layers is 1.8 MB plus margin. You'll also want extra mesh data for the third layer.

Best case: 16-bit RGBA texture, custom Copy+ mode drawing in reduced-aliasing mode and no Z-buffer, assuming no RAM latency issues or internal processing delays: 4 bytes RAM access, 81 full screens per frame. Three or four backdrop layers, RAM usage comparable to the worst case. Except that in this case, you're storing a fair bit of extra mesh data, both because rendering is faster in 16-bit (or is it? You'd want fog at least, and that slows down internal processing) and because there are more layers, so there could be as much as a couple of megabytes worth of extra geometry.
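The three cases can be tabulated with a few lines of code. This sketch assumes the 500 MB/s sequential bandwidth figure from earlier in the thread, a 320x240 screen at 20 fps, and double-buffered 32-bit backdrop layers with no scroll margin; the names are illustrative.

```python
BANDWIDTH = 500e6   # bytes per second, assumed sequential rate
FPS = 20
PIXELS = 320 * 240


def screens_per_frame(bytes_per_pixel):
    """Full-screen passes per frame for a given RAM traffic per pixel."""
    return BANDWIDTH / FPS / (PIXELS * bytes_per_pixel)


def layer_buffer_bytes(layers, bytes_per_pixel=4, buffers=2):
    """RAM for double-buffered backdrop layers, excluding any margin."""
    return layers * buffers * PIXELS * bytes_per_pixel


cases = {
    "32-bit texture, Z-buffer, 32-bit output": 16,
    "32-bit texture, Z-buffer, 16-bit output": 12,
    "16-bit texture, no Z-buffer, copy-style": 4,
}

for label, bytes_per_pixel in cases.items():
    print(label, int(screens_per_frame(bytes_per_pixel)))

print(layer_buffer_bytes(2), layer_buffer_bytes(3))
```

The screen counts are ceilings with no latency or internal processing delays, so real figures would be lower in every case.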

Definitely looking at an Expansion Pak for this...

...I was talking up geometry streaming upthread, but I'm not so sure it would be such a great idea for actual rendering. The RSP would have to access the cartridge every time it wanted more vertex data, and the throughput and latency would probably be horrible. Dynamic environment loading/mesh upgrading, on the other hand, is probably a non-issue with a 50 MB/s ceiling...

...

I just discovered that while the N64 does have an additive-transparency feature, it doesn't clamp the output. Which explains a lot...
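A tiny sketch of why unclamped additive blending causes trouble. It assumes that "doesn't clamp" means an 8-bit channel simply wraps around on overflow, which is a guess at the behaviour rather than something confirmed here; the function names are hypothetical.

```python
def add_wrapping(dst, src):
    """Additive blend on one 8-bit channel with overflow wrapping around
    (assumed behaviour when the output is not clamped)."""
    return (dst + src) & 0xFF


def add_clamped(dst, src):
    """Additive blend that saturates at full brightness."""
    return min(dst + src, 255)


bright, glow = 200, 100
print(add_wrapping(bright, glow), add_clamped(bright, glow))
```

Under that assumption, a glow drawn over an already bright pixel would come out dark instead of white, so additive effects would only be safe where the sum is known to stay in range.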


PostPosted: Tue Jun 13, 2017 3:39 pm 
Offline

Joined: Sun Apr 13, 2008 11:12 am
Posts: 6293
Location: Seattle
93143 wrote:
Quote:
Quote:
Why would you need to write twice?
Read-modify-write cycle.
Okay, then why do you need to read twice? Sounds like double-counting to me.
Plain blitting without transparency is just "read from texture, write to memory"; with alpha it's "read from texture, read from memory, do math, write to memory". "Read from texture" should be pretty cheap...

If the memory controller is specifically designed to support a RMW cycle, it's not particularly more expensive. I know plain old FPM/EDO DRAMs do support this (and the time to do a RMW cycle is roughly 1.5x the time to do just a read or write), but I don't know anything about newer technologies.
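The difference between the two cases can be made concrete by counting memory accesses. This is a software model only, with a single 8-bit channel and one constant alpha value for the whole blit; the function names and counters are illustrative.

```python
def blend_channel(src, dst, alpha):
    """Integer alpha blend of one 8-bit channel; alpha ranges from 0 to 255."""
    return (src * alpha + dst * (255 - alpha)) // 255


def blit(framebuffer, texture, alpha=None):
    """Copy texture into framebuffer in place and count the accesses.
    With alpha=None this is a plain blit; otherwise each pixel needs a
    read-modify-write of the framebuffer."""
    accesses = {"texture_reads": 0, "fb_reads": 0, "fb_writes": 0}
    for i, src in enumerate(texture):
        accesses["texture_reads"] += 1
        if alpha is None:
            out = src
        else:
            dst = framebuffer[i]
            accesses["fb_reads"] += 1
            out = blend_channel(src, dst, alpha)
        framebuffer[i] = out
        accesses["fb_writes"] += 1
    return accesses


plain = blit([0, 100, 200], [255, 255, 255])
fb = [0, 100, 200]
blended = blit(fb, [255, 255, 255], alpha=128)
print(plain, blended, fb)
```

The only extra traffic in the blended case is one framebuffer read per pixel, so the count is three accesses per pixel instead of two, not four.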

