All times are UTC - 7 hours





PostPosted: Wed Jan 04, 2012 9:52 am

Joined: Mon Nov 27, 2006 11:34 pm
Posts: 31
Location: NYC
tepples wrote:
The secret is that the tank demo's tiles are double buffered, and any blank tile is never written to VRAM.

Yeah, I think that's the big one. Another is that the tank demo buffers each frame as much as possible in main RAM before sending it to CHRRAM. Mine doesn't attempt that, so it has to do a lot of work reading data out of CHRRAM and immediately putting it back whenever it updates a tile again. Since I fully double buffer the whole pattern table, I also have to waste time copying anything still dirty between frames (though I can often pair this with a frame's first update).

I'm theoretically able to push 32 tiles per frame (60 cycles per tile: 12 to set the VRAM address, 6 for each line). Bell's is actually somewhat slower than that (93 cycles per tile: 29-ish + 8 for each of the 8 lines), and then it still needs to update the name table. All of those updates are managed by nice families of subroutines in ROM, though, whereas my code spends a lot of time generating opcodes in RAM that it then runs straight through during vblank.

Where I lose is in making updates: if I have to update every line of a tile, it costs 160 cycles (12 adr + 4 dummy read + (4 read + 2 immed op + 3 ZP write)*8 + 12 adr + 6 write*8 + 12 for a JSR/RTS). I could improve that by getting rid of the JSR, but then I wind up using almost twice as much buffer for it (the ZP writes store into immediate ops in a routine that we then jump to for the write phase), and both buffer and vblank cycles end up being limiting factors at different times.
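The cycle figures quoted in this post can be checked with a few lines of arithmetic. This is only a sketch of the sums as stated above; the function and parameter names are made up for illustration and do not come from either demo's source.

```python
# Cycle costs as quoted in the post (6502 CPU cycles).
LINES_PER_TILE = 8  # each update touches 8 lines of a tile


def write_only_cost(set_addr=12, per_line=6):
    """Unrolled write of one tile: set the VRAM address, then one write per line."""
    return set_addr + per_line * LINES_PER_TILE


def bell_cost(overhead=29, per_line=8):
    """The estimate given for Bell's routine: roughly 29 cycles plus 8 per line."""
    return overhead + per_line * LINES_PER_TILE


def read_modify_write_cost():
    """Full update of a tile that already holds pixels in CHR RAM."""
    # Read phase: set address, dummy read, then read + immediate op + ZP write per line.
    read_phase = 12 + 4 + (4 + 2 + 3) * LINES_PER_TILE
    # Write phase: set address again, then one write per line.
    write_phase = 12 + 6 * LINES_PER_TILE
    jsr_rts = 12
    return read_phase + write_phase + jsr_rts


print(write_only_cost(), bell_cost(), read_modify_write_cost())  # 60 93 160
```

The read phase alone (88 cycles) costs more than a whole write-only tile, which is why avoiding reads from CHR RAM matters so much here.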

Besides the massive difference in skills, it's also a difference in goals. I wanted to get as close as possible to a fully bitmapped display. The tank demo wanted to do 3D wireframes (and a nice BG) and does 'em best. Whenever my framerate starts to slow to a crawl, it's drawing something Bell's engine simply couldn't do with its tile budget. I still haven't read through the source, though, so this is based on watching it in FCEUX's debugger.


PostPosted: Wed Jan 04, 2012 10:31 am

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19115
Location: NE Indiana, USA (NTSC)
Blargg and I wrote a fully bitmapped display driver once, but it was designed primarily for text (an e-book reader), not wireframes. If I were to clone Mystify, I'd use an engine designed for wireframes.


PostPosted: Wed Jan 04, 2012 12:26 pm

Joined: Mon Nov 27, 2006 11:34 pm
Posts: 31
Location: NYC
Mystify isn't a well-behaved wireframe, though. There are a lot more line intersections at hard-to-predict places, very long lines running close to each other, and you need to do clears as well as sets.

I'd be interested to see that ebook reader, but I wasn't able to find a link on your site. Is it released?


PostPosted: Wed Jan 04, 2012 2:25 pm

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19115
Location: NE Indiana, USA (NTSC)
hcs wrote:
Mystify isn't a well-behaved wireframe, though. There are a lot more line intersections at hard-to-predict places, very long lines running close to each other, and you need to do clears as well as sets.

A full clear is implicit in each frame with an engine like Bell's.

hcs wrote:
I'd be interested to see that ebook reader, but I wasn't able to find a link on your site. Is it released?

No, but it was discussed in this topic.


PostPosted: Wed Jan 04, 2012 4:39 pm

Joined: Mon Nov 27, 2006 11:34 pm
Posts: 31
Location: NYC
tepples wrote:
A full clear is implicit in each frame with an engine like Bell's.

But that's why it isn't applicable:
1) It messes up the interference effect I remember.
2) It'd be even slower, drawing 16 huge (potentially screen-diagonal) lines each frame. I suppose we could take each set (those tracking the evolution of a side of the polygon) and draw them together to save a lot of duplicated effort combining them later. I'm still not sure what would be a good way to handle the other intersection cases.

The vblank actually isn't the limiting factor right now; it's the line drawing code itself. I'm thinking of using Bresenham's other "Run Length Slice" algorithm on longer lines, where the cost of the division can be amortized. Even my naive implementation could probably be improved with some thought, and I've still got a lot of ROM. I should really take a look at Bell's method for line plotting; that much at least should be completely applicable to the full bitmap model.
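For readers unfamiliar with the run-length slice idea, here is a minimal sketch in Python. It follows the commonly published form of the algorithm for an x-major line and is not the code used in Nestify; the function name and the restriction to dx >= dy >= 0 are assumptions made to keep the example short. One division per line yields the base run length, and an error term decides which runs get one extra pixel.

```python
def run_lengths(dx, dy):
    """Horizontal run lengths, one per scanline, for an x-major line (dx >= dy >= 0)."""
    if dy < 0 or dy > dx:
        raise ValueError("sketch only handles x-major lines with dx >= dy >= 0")
    if dy == 0:
        return [dx + 1]                      # purely horizontal: a single run
    whole_step, remainder = divmod(dx, dy)   # the only division for this line
    adj_up = 2 * remainder                   # error added after every run
    adj_down = 2 * dy                        # error removed when a run gets an extra pixel
    error = remainder - adj_down
    first = last = whole_step // 2 + 1       # the end runs are about half length
    if adj_up == 0 and whole_step % 2 == 0:
        first -= 1                           # exact even slope: shorten the first run
    if whole_step % 2 == 1:
        error += dy                          # odd base length: bias the error term
    runs = [first]
    for _ in range(dy - 1):                  # full-length runs in the middle
        run = whole_step
        error += adj_up
        if error > 0:
            run += 1
            error -= adj_down
        runs.append(run)
    runs.append(last)
    return runs


print(run_lengths(10, 3))  # [2, 3, 4, 2]
```

Each run is a horizontal strip of set bits on one scanline, so a long shallow line can be written a run at a time instead of a pixel at a time, which is where the cost of the division gets amortized.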

tepples wrote:
hcs wrote:
I'd be interested to see that ebook reader, but I wasn't able to find a link on your site. Is it released?

No, but it was discussed in this topic.

Cool, thanks.


PostPosted: Tue Jan 17, 2012 5:14 pm

Joined: Mon Nov 27, 2006 11:34 pm
Posts: 31
Location: NYC
Got the non-vblank processing to catch up with some optimizations in how dlists are generated, about 17% faster than v5. Now the limitation is vblank time again.
http://hcs64.com/files/nestify5.nes


PostPosted: Tue Jan 17, 2012 7:44 pm

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19115
Location: NE Indiana, USA (NTSC)
Looking great.


PostPosted: Tue Jan 24, 2012 4:17 am 
thefox wrote:
Not bad for not using WRAM (pretty sure Tank Demo does).


Yap, not bad... All good :D


PostPosted: Wed Jan 25, 2012 2:27 pm

Joined: Mon Nov 27, 2006 11:34 pm
Posts: 31
Location: NYC
Thanks spam guy!

I've been working hard on this for the past week, but only made small progress until the past two days.

1. I've ripped out the old 6502-generating code in favor of a threaded code implementation, where I keep a ring buffer of subroutine addresses-1 on the stack page, and the NMI routine executes these directly with each RTS. Arguments for the routines (VRAM addresses and bitmap data) are kept in a separate array. I put the cycle cost of each subroutine in the byte before it, so the address-1 tables allow that to be easily looked up. There is a whole lot of fairly uninteresting code to support all the variant routines; in some cases I have 32 versions of each so I don't need to spend another byte specifying the high 8 bits of the address. The ROM has expanded greatly as a result, from just under 8KB in v5 to 30KB in v6. I have Python generating some code (as well as lookup tables), though I don't really use it as more than a powerful macro preprocessor. I'm becoming less and less enthused with NESHLA as I deal with its limitations and faults.

2. The biggest improvement is an aggressive caching system, which dynamically allocates 64 tiles that are kept in RAM until they become empty. Any more must still go out to CHRRAM. Cache is interleaved with other memory in order to make the best use of space.
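The threaded-code dispatch from item 1 can be modeled roughly as follows. This is a Python stand-in, not the 6502 implementation: the `rom` and `routines` dictionaries, the addresses, and the idea of checking the budget inside the dispatcher are all assumptions for illustration (the real engine may well do its cycle accounting when the list is built). What it does show is why storing address-1 is convenient: that same value is what RTS needs, and it also points at the cost byte placed just before the routine.

```python
from collections import deque

rom = {}       # hypothetical ROM image: address -> cost byte
routines = {}  # entry address -> Python callable standing in for 6502 code


def install(entry, cost, func):
    """Place a routine at `entry` with its cycle cost in the byte before it."""
    rom[entry - 1] = cost
    routines[entry] = func
    return entry - 1          # the value that gets queued; RTS adds 1 when it pops


def run_vblank(queue, budget, log):
    """Run queued routines in order until the next one would exceed the budget."""
    spent = 0
    while queue:
        addr_minus_1 = queue[0]
        cost = rom[addr_minus_1]          # cost lookup uses the queued value directly
        if spent + cost > budget:
            break                         # leave the rest for the next vblank
        queue.popleft()
        routines[addr_minus_1 + 1](log)   # "RTS" lands on address + 1
        spent += cost
    return spent


log = []
set_addr = install(0x8001, 12, lambda out: out.append("set VRAM address"))
write8 = install(0x8101, 48, lambda out: out.append("write 8 lines"))
queue = deque([set_addr, write8, set_addr, write8])
spent = run_vblank(queue, budget=100, log=log)
print(spent, len(queue), log)
```

With a budget of 100 cycles, the first three commands run (72 cycles) and the last 48-cycle write stays queued.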

Overall it is now about 25% faster than last time. The slow parts aren't much faster, but the fast parts are speedy.
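The caching scheme from item 2 above might be modeled along these lines. It is a guess at the behavior from the description only: the class, its method names, and the assumption that a tile enters the cache only while it is still blank in CHRRAM are all hypothetical.

```python
CACHE_SLOTS = 64  # number of tiles that can live in RAM at once, per the post


class TileCache:
    def __init__(self, slots=CACHE_SLOTS):
        self.slots = slots
        self.tiles = {}  # tile index -> bytearray(8), one byte per line

    def set_pixels(self, tile, line, mask):
        """OR `mask` into a cached line; False means the update must go via CHR RAM."""
        rows = self.tiles.get(tile)
        if rows is None:
            if len(self.tiles) >= self.slots:
                return False               # cache full: fall back to read-modify-write
            rows = self.tiles[tile] = bytearray(8)
        rows[line] |= mask
        return True

    def clear_pixels(self, tile, line, mask):
        """Clear bits in a cached line and free the slot once the tile is empty."""
        rows = self.tiles.get(tile)
        if rows is None:
            return False                   # not cached: caller handles it in CHR RAM
        rows[line] &= ~mask & 0xFF
        if not any(rows):
            del self.tiles[tile]           # empty tiles give their slot back
        return True
```

For example, with `TileCache(slots=2)` a third distinct tile is refused until one of the first two has been erased completely.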

http://hcs64.com/files/nestify6.nes


PostPosted: Wed Jan 25, 2012 2:51 pm

Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7233
Location: Chexbres, VD, Switzerland
Tank demo does not use WRAM, it says "32k PRG and nothing else".

I guess the big difference is that Tank Demo (and Elite) rely on the fact that there are many completely empty tiles on the screen, so they only upload the necessary characters and the name table. This demo, however, only updates characters, and has a completely free "bitmap surface" in the middle of the screen that is hardwired and never changed.
This implies more character updating, but no name table updating.

_________________
Life is complex: it has both real and imaginary components.


PostPosted: Wed Jan 25, 2012 2:54 pm

Joined: Mon Nov 27, 2006 11:34 pm
Posts: 31
Location: NYC
Yeah, I noted that in my response, spam guy just picked a line at random.
---
Another day, another 10%: http://hcs64.com/files/nestify7.nes

I realized that I could use the stack pointer directly to index the data that the routines would deal with. That, and putting one byte of the 9 on the zero page (interleaved with single byte values), saved 5 cycles per command and simplified writing considerably (this version uses 2KB less ROM than v6).

I also reworked caching and double buffering again; cache writes are now limited to only the lines that need to be written to a frame. I had implemented this before, but due to various issues (worked out by drawing a lot of state machines) it wasn't providing measurable benefit.

Also caught a bunch of cycle counting bugs that were causing v6 to bounce occasionally when it took too long for vblank. That took away some small bit of speed for the sake of caution, but the other changes much more than made up for it.

I tried doing color cycling, but most colors look awful with vertically-oriented diagonals on a black background, on my old CRT anyway, so I've kept that on its own branch for now.


PostPosted: Mon Feb 13, 2012 3:08 pm

Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7233
Location: Chexbres, VD, Switzerland
Well, this inspired me to do my own vector graphics code.
And guess what: I did it all in an afternoon (well, an afternoon + an evening to be exact).

It works quite differently from hcs' and from Ian Bell's. I think I took the advantages of both and tried to do my best on it. If someone is interested, just say so.

I can currently draw arbitrary pixels and almost* arbitrary lines on a 256x128 pixel monochrome surface.
As for speed, it seems I can draw approximately a dozen vectors per frame; I have no idea if this is supposed to be good or bad.

(* I still have bugs with lines longer than 64 pixels. I guess it's due to some kind of overflow in 8-bit signed numbers, since I need twice the length in the line drawing algorithm.)
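The suspected overflow is easy to reproduce on paper. Bresenham-style line routines typically work with twice the line's delta in their error term; assuming this engine does the same and keeps that value in a signed 8-bit variable (both are assumptions, the source has not been posted), the doubled value stops fitting exactly at a delta of 64. The helper names below are illustrative only.

```python
def to_int8(value):
    """Wrap an integer the way an 8-bit two's-complement variable would."""
    value &= 0xFF
    return value - 256 if value >= 128 else value


def doubled_delta_int8(delta):
    """Twice the line delta, as it would look after being stored in a signed byte."""
    return to_int8(2 * delta)


for delta in (63, 64, 100):
    print(delta, doubled_delta_int8(delta))
# 63 126
# 64 -128
# 100 -56
```

A delta of 63 still doubles to 126, but 64 wraps to -128, so any sign-based comparison on the error term goes wrong from that length onward. Treating the doubled value as unsigned, or widening it to 16 bits, would be the usual ways out.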

The real problem is that I lack any source of vector graphics to play with my engine. I can draw static images but it gets boring very quickly, and I don't want to code crazy matrix rotations or anything of the like.

The goal would be to have some vertex "video streaming" where I could just have a list of 2D coordinates and render it to the screen. Does anyone have an idea how I could do that other than manually entering all the coordinates (which quickly becomes annoying and tedious)?



PostPosted: Mon Feb 13, 2012 3:45 pm

Joined: Mon Nov 27, 2006 11:34 pm
Posts: 31
Location: NYC
Cool, I'd like to see it. For some point of comparison, it takes 12 frames for Nestify to draw its initial set of 16 (long, overlapping) lines. The next update (erasing 4 and drawing 4 lines) takes 9 frames, and once it gets going it's usually around 7 (though with notable slowdowns). I haven't thoroughly measured (or counted lines) in the Tank Demo but it usually completes an update in one or two frames.

Something like a few rotating regular polygons is easy to do with a sine table, if you want something to test with that takes no artistry. With N entries you just pick x, x+N/4, x+2N/4, x+3N/4 and you have the angles of the corners of a square, just march x along to rotate it.
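A small sketch of that sine-table trick, written in Python for readability. The table size, amplitude, center, and radius are arbitrary choices for illustration, sized so the square fits a 256x128 surface; on the NES the table would simply be baked into ROM.

```python
import math

N = 256  # table entries per full turn (assumed; any multiple of 4 works)
SINE = [round(127 * math.sin(2 * math.pi * i / N)) for i in range(N)]


def sin_t(angle):
    return SINE[angle % N]


def cos_t(angle):
    return SINE[(angle + N // 4) % N]   # cosine is sine shifted by a quarter turn


def square_corners(angle, cx=128, cy=64, radius=48):
    """Corners of a square rotated by `angle` table steps around (cx, cy)."""
    corners = []
    for k in range(4):
        a = angle + k * N // 4          # x, x+N/4, x+2N/4, x+3N/4
        x = cx + radius * cos_t(a) // 127
        y = cy + radius * sin_t(a) // 127
        corners.append((x, y))
    return corners


print(square_corners(0))   # [(176, 64), (128, 112), (80, 64), (128, 16)]
```

Stepping `angle` by one each frame and drawing lines between consecutive corners gives a rotating square; other regular polygons only change the number of corners and the step between them.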

The cool thing about using Mystify as a test is that it is really easy to set up, and it exercises pretty much every possible scenario of combined writes, erasures, reuse of tiles, line lengths, etc. It's a decent stress test, which makes it an interesting challenge to a rasterizer.


PostPosted: Tue Feb 14, 2012 1:27 pm

Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7233
Location: Chexbres, VD, Switzerland
Thanks for the idea of doing rotating polygons; it's not very exciting but it gets the job done.

I think I can now share a working version of my engine here:
http://dl.dropbox.com/u/23465629/NES_junk/vector.nes
This ROM is optimized for NTSC (since it's what most emus default to) but I can make the engine work just as well on a PAL console. Oh, and I tested on both of my real NESes, and both work perfectly.

It takes 3 frames to plot 10 vectors (in fact it takes about 2.5 frames, but this rounds up to 3). Since half of a frame is reserved for updates anyway, I can draw approx. 5 vectors per frame, but it really depends on the length of the segments: since each pixel is plotted individually, the number of white pixels in a frame directly determines how long it takes to render that frame.



PostPosted: Tue Feb 14, 2012 5:07 pm

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10066
Location: Rio de Janeiro - Brazil
Pretty smooth demo. You know what I would just love? To see the NES doing animated sequences like the ones in Another World... I'm sure that a LOT of optimization would be necessary for that!

