Line plotting engine

A place where you can keep others updated about your NES-related projects through screenshots, videos or information in general.

Moderator: Moderators

hcs
Posts: 31
Joined: Mon Nov 27, 2006 11:34 pm
Location: NYC
Contact:

Post by hcs »

tepples wrote:The secret is that the tank demo's tiles are double buffered, and any blank tile is never written to VRAM.
Yeah, I think that's the big one. Another is that each frame is buffered as much as possible in main RAM before sending to CHRRAM, mine doesn't attempt to do that and has to do a lot of work reading stuff out of CHRRAM and immediately putting it back whenever it updates a tile again. As I fully double buffer the whole pattern table I also have to waste time copying anything still dirty between frames (though I can often pair this with a frame's first update).

I'm theoretically able to push 32 tiles per frame (60 cycles per tile, 12 to set VRAM addr, 6 for each line), Bell's actually is somehwhat slower than that (93 cycles per tile, 29-ish+8*8 for each line), and then it still needs to update the name table. All of those updates are managed by nice families of subroutines in ROM, though, whereas my code spends a lot of time generating opcodes in RAM that it then runs straight through during vblank.

Where I lose is in making updates: if I had to update every line of a tile it costs 160 cycles (12 adr, 4 dummy read, 4 read+2 immed op+3 ZP write*8, 12 adr 6 write*8 + 12 for a JSR/RTS). I could improve that by getting rid of the JSR, but then I wind up using almost twice as much buffer for it (the ZP writes store into immediate ops in a routine that we then jump to for the write phase), and both buffer and vblank cycles end up being limiting factors at different times.

Besides the massive difference in skills, it's also a difference in goal. I wanted to get as close as possible to a fully bitmapped display. The tank demo wanted to do 3D wireframes (and a nice BG) and does 'em best. Whenever the framerate starts to slow to a crawl, that is something Bell's engine simply couldn't do with its tile budget. I still haven't read through the source, though, so this is based on watching it in FCEUX's debugger.
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Post by tepples »

Blargg and I wrote a fully bitmapped display driver once, but it was designed primarily for text (an e-book reader), not wireframes. If I were to clone Mystify, I'd use an engine designed for wireframes.
hcs
Posts: 31
Joined: Mon Nov 27, 2006 11:34 pm
Location: NYC
Contact:

Post by hcs »

Mystify isn't a well-behaved wireframe, though. There's a lot more line intersections at hard-to-predict places, very long lines running close to each other, and you need to do clears as well as sets.

I'd be interested to see that ebook reader, but I wasn't able to find a link on your site, is it released?
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Post by tepples »

hcs wrote:Mystify isn't a well-behaved wireframe, though. There's a lot more line intersections at hard-to-predict places, very long lines running close to each other, and you need to do clears as well as sets.
A full clear is implicit in each frame with an engine like Bell's.
hcs wrote:I'd be interested to see that ebook reader, but I wasn't able to find a link on your site, is it released?
No, but it was discussed in this topic.
hcs
Posts: 31
Joined: Mon Nov 27, 2006 11:34 pm
Location: NYC
Contact:

Post by hcs »

tepples wrote:A full clear is implicit in each frame with an engine like Bell's.
But that's why it isn't applicable,
1) Since it messes up the interference effect I remember
2) It'd be even slower drawing 16 huge (potentially screen-diagonal) lines each frame. I suppose we could take each set (those tracking the evolution of a side of the polygon) and draw them together to save a lot of duplicated effort combining them later. Still not sure what would be a good way to handle the other intersection cases.

The vblank actually isn't the limiting factor right now, it's the line drawing code itself. I'm thinking of using Bresenham's other "Run Length Slice" algorithm on longer lines, where the cost of the division can be amortized. Even my naive implementation could probably be improved with some thought, I've still got a lot of ROM. I should really take a look at Bell's method for line plotting, that much at least should be completely applicable to the full bitmap model.
tepples wrote:
hcs wrote:I'd be interested to see that ebook reader, but I wasn't able to find a link on your site, is it released?
No, but it was discussed in this topic.
Cool, thanks.
hcs
Posts: 31
Joined: Mon Nov 27, 2006 11:34 pm
Location: NYC
Contact:

Post by hcs »

Got the non-vblank processing to catch up by some optimizations in how dlists are generated, about 17% faster that v5. Now the limitation is vblank time again.
http://hcs64.com/files/nestify5.nes
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Post by tepples »

Looking great.
Ambit_Energy

Post by Ambit_Energy »

thefox wrote:Not bad for not using WRAM (pretty sure Tank Demo does).
Yap, not bad... All good :D
hcs
Posts: 31
Joined: Mon Nov 27, 2006 11:34 pm
Location: NYC
Contact:

Post by hcs »

Thanks spam guy!

I've been working hard on this for the past week, but only made small progress until the past two days.

1. I've ripped out the old 6502 generating code in favor of a threaded code implementation, where I keep a ring buffer of subroutine addresses-1 on the stack page, and the NMI routine executes these directly with each RTS. Arguments for the routines (VRAM addresses and bitmap data) are kept in a separate array. I put the cycle cost of each subroutine in the byte before it, so the address-1 tables allow that to be easily looked up. There is a whole lot of fairly uninterestingly code to support all the variant routines, in some cases I have 32 versions of each so I don't need to spend another byte specifying the high 8 bits of the address. It has expanded greatly as a result, from just under 8KB in v5 to 30KB in v6. I have Python generating some code (as well as lookup tables), though I don't really use it as more than a powerful macro preprocessor. I'm becoming less and less enthused with NESHLA as I deal with its limitations and faults.

2. The biggest improvement is an aggressive caching system, which dynamically allocates 64 tiles that are kept in RAM until they become empty. Any more must still go out to CHRRAM. Cache is interleaved with other memory in order to make the best use of space.

Overall it is now about 25% faster than last time. The slow parts aren't much faster, but the fast parts are speedy.

http://hcs64.com/files/nestify6.nes
User avatar
Bregalad
Posts: 8055
Joined: Fri Nov 12, 2004 2:49 pm
Location: Divonne-les-bains, France

Post by Bregalad »

Tank demo does not use WRAM, it says "32k PRG and nothing else".

I guess the big difference is that Tank Demo (and Elite) relies on the fact there is many completely empty tiles on the screen so they only upload the necessary characters and the name table. This demo however does only update characters, and have a completely free "bitmap surface" in the middle of the screen that is hardwired and never changed.
This implies more character updating, but no tame table updating.
Useless, lumbering half-wits don't scare us.
hcs
Posts: 31
Joined: Mon Nov 27, 2006 11:34 pm
Location: NYC
Contact:

Post by hcs »

Yeah, I noted that in my response, spam guy just picked a line at random.
---
Another day, another 10%: http://hcs64.com/files/nestify7.nes

I realized that I could use the stack pointer directly to index the data that the routines would deal with, that and putting one byte of the 9 on the zero page (interleaved with single byte values) saved 5 cycles per command, and simplified writing considerably (this version uses 2KB less ROM than v6).

I also reworked caching and double buffering again, cache writes are now limited to only the lines that need to be written to a frame. I had implemented this before but due to various issues (worked out by drawing a lot of state machines) it wasn't providing measurable benefit.

Also caught a bunch of cycle counting bugs that were causing v6 to bounce occasionally when it took too long for vblank. That took away some small bit of speed for the sake of caution, the other changes much more than made up for it.

I tried doing color cycling, but most colors look awful with vertically-oriented diagonals on a black background, on my old CRT anyway, so I've kept that on its own branch for now.
User avatar
Bregalad
Posts: 8055
Joined: Fri Nov 12, 2004 2:49 pm
Location: Divonne-les-bains, France

Post by Bregalad »

Well this inspired me to do my own vector graphics code.
And guess what I did all it in an afternoon (well an afternoon + an evening to be exact).

It works quite differently form hcs' and from Ian Bell's I think I took the advantages of both and tried to do my best on it. If someone is interested just say so.

I can currently draw arbitrary pixels and almost* arbitrary lines on a 256x128 pixels monochrome surface.
As for speed it seems I can draw approximately a dozen of vectors per frame I have no idea if this is supposed to be good or bad.

(* I still have bugs with lines longers than 64 pixels I guess it's due to some kind of overflow in 8-bit signed numbers and I need twice the lenght in the line drawing algorithm)

The real problem is that I lack any source for vector graphics to play with my engine. I can draw static images but it gets boring very quickly, and I don't want to code crazy matix rotations or stuff of the like.

The goal would be to have some vertex "video streaming" where I could just have a list of 2D coordinates, and render to the screen. Anyone have an idea how I could do that other than manually entering all the cordinates (which comes quickly annoying and tedious) ?
Useless, lumbering half-wits don't scare us.
hcs
Posts: 31
Joined: Mon Nov 27, 2006 11:34 pm
Location: NYC
Contact:

Post by hcs »

Cool, I'd like to see it. For some point of comparison, it takes 12 frames for Nestify to draw its initial set of 16 (long, overlapping) lines. The next update (erasing 4 and drawing 4 lines) takes 9 frames, and once it gets going it's usually around 7 (though with notable slowdowns). I haven't thoroughly measured (or counted lines) in the Tank Demo but it usually completes an update in one or two frames.

Something like a few rotating regular polygons is easy to do with a sine table, if you want something to test with that takes no artistry. With N entries you just pick x, x+N/4, x+2N/4, x+3N/4 and you have the angles of the corners of a square, just march x along to rotate it.

The cool thing about using Mystify as a test is that it is really easy to set up, and it exercises pretty much every possible scenario of combined writes, erasures, reuse of tiles, line lengths, etc. It's a decent stress test, which makes it an interesting challenge to a rasterizer.
User avatar
Bregalad
Posts: 8055
Joined: Fri Nov 12, 2004 2:49 pm
Location: Divonne-les-bains, France

Post by Bregalad »

Thanks for the idea of doing rotating polygons, it's not very exiting but it gets the job done.

I think I can now share a working version of my engine here :
http://dl.dropbox.com/u/23465629/NES_junk/vector.nes
This ROM is optimized for NTSC (since it's what most emus default to) but I can make the engine works just as well on a PAL console. Oh and I tested both on my real NESes, and both works perfect.

It takes 3 frames to plot 10 vectors (in fact it takes about 2.5 frames but yeah this rounds up to 3). Since half of a frame is reserved for updates anyways I can draw approx. 5 vectors per frame, but it really depends on the lenght of the segment, since each pixel is plotted individually the number of white pixels in a frame directly determines how long it is to render this frame.
Useless, lumbering half-wits don't scare us.
User avatar
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

Pretty smooth demo. You know what I would just love? To see the NES doing animated sequences like the ones in Another World... I'm sure that a LOT of optimization would be necessary for that!
Post Reply