All times are UTC - 7 hours





Posted: Tue Jul 29, 2014 6:45 pm
Site Admin

Joined: Mon Sep 20, 2004 6:04 am
Posts: 3470
Location: Indianapolis
tokumaru wrote:
I really liked Memblers' idea of generating an IRQ when a certain pattern is fetched, so you can trigger it with the background or sprites, according to your needs. In fact, you can still do everything you could with the automatic bankswitching, you'll just have to do the bankswitching yourself, but that's an insignificant price to pay considering you also get to do parallax effects, status bars, split screen 2-player games, rising water, big background bosses, and so on. Those are infinitely more interesting IMO, and I think it would be a shame to sacrifice that so a few lazy programmers can skip a couple of register writes in the opening and end scenes of their games.


I found it's possible to do both. With the 8T-ROM mapper, in my battle to fit as much as possible into 36 macrocells, I ran out of the resources needed to distinguish the two modes, so I compromised by having a tile match trigger both the CHR switch and the IRQ. The hacky part is that I had them share the same 'reset' trigger (release IRQ, and stop forcing CHR bank). So you would need to disable IRQs with SEI to use the CHR switch mode without getting a barrage of interrupts. Of course it'd be possible to make a build without this weird hybrid mode, or use a bigger CPLD and select the mode at will. Maybe not relevant to the discussion, but at least it's an example of getting IRQ capability almost for free.

Actually PPU auto-banking can do something the CPU can't, which is switch banks with pixel precision. Not only that, but multiple times per scanline, and regardless of scroll position. Since my implementation uses tile numbers to force specific banks, the type of use I had in mind was that those tiles would hold something like border graphics, so effectively you have windows on the screen with their own tileset. Or the 'border' tiles could be transparent, maybe that would work with background graphics in general, I'm not sure what problems that would run into. (OK I guess some games by RARE like Pirates! and maybe Marble Madness pull off mid-scanline stuff with code, but certainly with some effort and lots of CPU time).


Posted: Tue Jul 29, 2014 6:49 pm

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19115
Location: NE Indiana, USA (NTSC)
Memblers wrote:
Actually PPU auto-banking can do something the CPU can't, which is switch banks with pixel precision. Not only that, but multiple times per scanline, and regardless of scroll position. Since my implementation uses tile numbers to force specific banks, the type of use I had in mind was that those tiles would hold something like border graphics, so effectively you have windows on the screen with their own tileset. Or the 'border' tiles could be transparent, maybe that would work with background graphics in general, I'm not sure what problems that would run into.

Whatever problems there were, Punch-Out!!, Fire Emblem, and Famicom Wars worked around them. The model you describe sounds very MMC2-ish.


Posted: Tue Jul 29, 2014 7:36 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 6293
Location: Seattle
tokumaru wrote:
That's cool. Traditional scanline counting is probably too complicated to implement with discrete logic parts, so the idea of firing IRQs based on tile fetches sounds like the best compromise to me.
One of the problems with small CPLDs is the lack of space for storing state; farming out the IRQ counter to a CD4020 or 4040 is a way to move a lot of that state into a separate discrete logic IC.
It'd be very easy to clock either of them with PPU A12, much like the MMC3 does, and have the CPLD just compare the pertinent outputs to a latched 8-bit value.
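
A rough behavioral sketch of that arrangement, in Python rather than HDL. Everything here is an assumption for illustration: the class name, the 8-bit width, the absence of any A12 filtering, and counting on the falling edge of A12 (which is how the 4020/4040 family clocks, as far as I know).

```python
class A12CounterIrq:
    """Toy model: external ripple counter clocked by PPU A12,
    with the CPLD comparing its outputs to a latched 8-bit value."""

    def __init__(self, target):
        self.target = target & 0xFF  # latched compare value (lives in the CPLD)
        self.count = 0               # state held in the external counter IC
        self.prev_a12 = 0
        self.irq = False

    def reset(self):
        """Clear the counter and release the IRQ (e.g. once per frame)."""
        self.count = 0
        self.irq = False

    def ppu_address(self, addr):
        """Feed every PPU address seen on the bus, in order."""
        a12 = (addr >> 12) & 1
        if self.prev_a12 and not a12:          # falling edge = one count
            self.count = (self.count + 1) & 0xFF
            if self.count == self.target:
                self.irq = True
        self.prev_a12 = a12


# One A12 pulse per scanline (background at $0000, sprites at $1000).
irq = A12CounterIrq(target=3)
for _ in range(3):
    for addr in (0x0FF0, 0x1FF0, 0x0FF0):
        irq.ppu_address(addr)
print(irq.count, irq.irq)
```

A real board would also need to deal with the extra A12 toggles caused by $2006/$2007 accesses, which this model ignores.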


Posted: Tue Jul 29, 2014 8:05 pm

Joined: Mon Sep 27, 2004 2:57 pm
Posts: 1248
tokumaru wrote:
Programmers shouldn't be this lazy.

It's not any more lazy than taking advantage of a DMA vs. manually doing it in software. I understand your desire for raster effects, and I already said I was planning on adding a scanline counter. (and I was going to do it MMC5 style by detecting a triple-read)

Quote:
Now imagine being able to program a side-scroller without having to deal with progressive name table updates and VRAM bandwidth... I'm sure most programmers would rather use a Game Maker than code for the NES.

32k vram would indeed provide an insane amount of nametables to eliminate seam-painting and things like that, freeing up plenty of vram bandwidth, and although that's a nice feature to have, that's not how everyone's going to use it. :P Besides, Game Maker-like utilities are for people who can't program, not programmers who want to be lazy.

Quote:
My point is that there are literally no advantages for the final product... A person playing a game doesn't care whether the patterns are switched automatically or whether the programmer had to make a couple of calculations and register writes, but they will notice if one game has richer raster effects than another. This feature does not promote the betterment of the game, just the laziness of the programmer.

It's true that a hot-tile bank change vs. scanline counter driven bank changes can look the same, but both methods have their own advantages and disadvantages, and even though the consumer doesn't care how the graphics are being generated, the programmer does, and as I've been repeating over and over, I don't think hardware acceleration promotes lazy programmers.

Quote:
I don't regret calling this laziness, because the complexity involved in preparing the numbers for manual bankswitching is inferior to almost every other aspect of a decent game, such as progressive name table updates, collision detection, physics, AI, you name it. If a person can't do those they shouldn't even be thinking of "beautiful hand-drawn backgrounds", because their games are not going to be any polished. If they can handle those just fine, manual pattern switching couldn't possibly take more than half an hour to implement.

Again and again and again and again, it's not about programmers who can't do things in software, it's about being able to take advantage of the hardware doing something for them. You're right that the complexity involved in calculating numbers for bankswitching is negligible when compared to the other aspects of the game engine, so why does it matter whether the programmer is manually implementing it in software or taking advantage of a mapper feature to do it?

This mapper and its feature set isn't going to suddenly enable a whole bunch of inexperienced programmers to just flood nesdev with garbage, so there's no need to get so worked up over features you may not be used to having.


Edit: @Tepples, I really like the way the MMC2 did their hot-tile bankswitching, so I was hoping to incorporate it, but I think I need to revise my original idea a little more. I don't think IRQ-on-tile provides any advantages over a plain scanline counter (anyone's free to prove me wrong though, I'm open to the idea), so I'd just be focusing on auto bank switching.


Posted: Tue Jul 29, 2014 8:30 pm
Site Admin

Joined: Mon Sep 20, 2004 6:04 am
Posts: 3470
Location: Indianapolis
tepples: Yep, other than playing Punch-Out!!, I've never taken the slightest look into how those games used it. It is very similar, but uses much less state, since it's working within a 16kB RAM space instead of a big ROM.

For an IRQ counter, one thing I wanted to try but haven't yet is similar to something Konami did: count M2 cycles in the sequence 114, 114, 113, and decrement the scanline counter at the end of each count. I suppose that's 2 bits of sequence state machine, 7 bits of M2 count, then however many scanlines you want to count, so 8 bits to get a 'full screen delay' totals 17 bits. I haven't looked into what the PAL timing would be, but I'd be surprised if it converted very well. I did something similar on the old Squeedo board with the PIC18 timer input, I believe pre-scaled by 8, hooked up MMC3-style.
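
To make the bit budget concrete, here is a minimal Python model of that divider. The class and function names are made up for illustration, and the exact point at which the IRQ asserts is an assumption; only the 114, 114, 113 sequence comes from the description above.

```python
class M2ScanlineDivider:
    """Counts M2 (CPU) cycles in a 114, 114, 113 pattern and
    decrements a scanline counter at the end of each run."""

    LINE_LENGTHS = (114, 114, 113)  # 341 CPU cycles per 3 scanlines

    def __init__(self, scanlines):
        self.phase = 0                          # 2-bit sequence state
        self.m2_left = self.LINE_LENGTHS[0]     # 7-bit M2 cycle count
        self.lines_left = scanlines             # 8-bit scanline count (>= 1)
        self.irq = False

    def clock_m2(self):
        if self.irq:
            return
        self.m2_left -= 1
        if self.m2_left == 0:
            self.phase = (self.phase + 1) % 3
            self.m2_left = self.LINE_LENGTHS[self.phase]
            self.lines_left -= 1
            if self.lines_left == 0:
                self.irq = True


def cycles_until_irq(scanlines):
    """Number of M2 cycles before the IRQ fires for a given line count."""
    divider = M2ScanlineDivider(scanlines)
    cycles = 0
    while not divider.irq:
        divider.clock_m2()
        cycles += 1
    return cycles


print(cycles_until_irq(1), cycles_until_irq(3), cycles_until_irq(240))
```

Three scanlines come out to 341 cycles, i.e. 113 2/3 cycles per line on average, which is why the pattern tracks NTSC scanlines without fractional hardware.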


Posted: Tue Jul 29, 2014 8:39 pm

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19115
Location: NE Indiana, USA (NTSC)
IRQ-on-tile allows for multiple split points, one for each sprite that has that tile. It also avoids having to dedicate 16 macrocells to the counter and its reload value.

PAL NES is 106 9/16 cycles per line. Dendy is 113 2/3 like NTSC NES. You won't be able to let the counter free-run because of the missing dot every other frame on NTSC. But whether scaled to scanlines (VRC) or not (FME-7), a CPU cycle counter will still accumulate error from variable interrupt latency when doing multiple splits. If it takes 1, 1, and 1 cycles to start the IRQ counter in one case and 6, 6, and 6 in another case because it happened to hit longer instructions at the wrong time, there's 15 cycles or 45 pixels of error right there. Plus, as you mentioned, a scanline counter eats up more macrocells for the divider. That's why the triple read scanline counter is so useful.
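
The numbers above can be checked with a few lines of Python. This is just the arithmetic, assuming 341 PPU dots per scanline, 3 dots per CPU cycle on NTSC and Dendy, and 3.2 dots per CPU cycle on PAL; the variable names are only for illustration.

```python
from fractions import Fraction

DOTS_PER_LINE = 341

# CPU cycles per scanline = dots per line / dots per CPU cycle
ntsc_cycles_per_line = Fraction(DOTS_PER_LINE, 3)            # also Dendy
pal_cycles_per_line = Fraction(DOTS_PER_LINE) / Fraction(16, 5)

# Accumulated latency error over three splits:
# best case 1 cycle each, worst case 6 cycles each.
best_case = 1 + 1 + 1
worst_case = 6 + 6 + 6
error_cycles = worst_case - best_case
error_pixels_ntsc = error_cycles * 3   # 3 pixels per CPU cycle on NTSC

print(ntsc_cycles_per_line, pal_cycles_per_line)
print(error_cycles, error_pixels_ntsc)
```

The PAL figure comes out to 1705/16, which is 106 9/16.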


Posted: Tue Jul 29, 2014 8:59 pm

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10066
Location: Rio de Janeiro - Brazil
Drag wrote:
I understand your desire for raster effects, and I already said I was planning on adding a scanline counter. (and I was going to do it MMC5 style by detecting a triple-read)

That's great, but then it starts to get a little more complicated, doesn't it? It would be great if I could pick up a handful of discrete logic parts, solder them together according to some schematics and mod an NROM cart into something that can generate scanline IRQs.

Drag wrote:
Again and again and again and again, it's not about programmers who can't do things in software, it's about being able to take advantage of the hardware doing something for them. You're right that the complexity involved in calculating numbers for bankswitching is negligible when compared to the other aspects of the game engine, so why does it matter whether the programmer is manually implementing it in software or taking advantage of a mapper feature to do it?

Because the resources dedicated to this task could be put toward something more useful, that's all.

Quote:
This mapper and its feature set isn't going to suddenly enable a whole bunch of inexperienced programmers to just flood nesdev with garbage, so there's no need to get so worked up over features you may not be used to having.

Nothing against the feature itself, only against sacrificing the better feature in favor of it. =)

Quote:
I don't think IRQ-on-tile provides any advantages over a plain scanline counter

IRQ-on-tile is actually a little worse, since you have to use sprites for variable splits. What I like about those is the simplicity.


Posted: Sat Aug 02, 2014 2:25 pm
Site Admin

Joined: Mon Sep 20, 2004 6:04 am
Posts: 3470
Location: Indianapolis
tepples wrote:
IRQ-on-tile allows for multiple split points, one for each sprite that has that tile. It also avoids having to dedicate 16 macrocells to the counter and its reload value.

PAL NES is 106 9/16 cycles per line. Dendy is 113 2/3 like NTSC NES. You won't be able to let the counter free-run because of the missing dot every other frame on NTSC. But whether scaled to scanlines (VRC) or not (FME-7), a CPU cycle counter will still accumulate error from variable interrupt latency when doing multiple splits. If it takes 1, 1, and 1 cycles to start the IRQ counter in one case and 6, 6, and 6 in another case because it happened to hit longer instructions at the wrong time, there's 15 cycles or 45 pixels of error right there. Plus, as you mentioned, a scanline counter eats up more macrocells for the divider. That's why the triple read scanline counter is so useful.


Oh! I must have missed the thread where triple read detection is mentioned (MMC5 counter if anyone else was wondering), that sounds like an interesting trick. But I wonder how many cycles pass before getting into hblank after such an interrupt. I might have to try it on my next board. BTW, I can think of one application where a CPU-based timer would be desirable, and that's when doing audio stuff.

Thinking more about auto-chr switching gets me thinking about some other wacky ideas. These aren't really suggestions, but like my other posts in this thread, I'm just throwing out whatever mapper-related stuff I can think of. Cycling through 2 nametables with the same data (except attribute table) sounds like a cheap way to get different attribute table sizes, like 8x16 (that's easy enough to do with timed code on a static screen though). Or with nametables changed, new tile sizes like 4x8 (an identical tile would be in another bank with its 4-pixel sections swapped). Or even better, what about a 16x16 mode? Say you write tile #1 to your nametables at $2020,$2021,$2040,$2041. A metatile "generator" could swap in tile #1 from banks 0,1,2,3 (based on the nametable address), giving the programmer 256 metatiles made of 1024 unique tiles with no extra work. Sounds useful, but who has the skills to design a level out of 256 meta-tiles, heheh.
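
A small Python sketch of the 16x16 idea, assuming the bank is chosen from nametable address bits A0 (column parity) and A5 (row parity). The function name and the particular bank numbering are assumptions; a different numbering would work equally well as long as the four positions get four different banks.

```python
def metatile_bank(nt_addr):
    """Pick one of 4 CHR banks from the low column and row bits
    of the nametable address being fetched."""
    col = nt_addr & 1           # A0: even or odd tile column
    row = (nt_addr >> 5) & 1    # A5: even or odd tile row
    return (row << 1) | col


# The four positions from the example all hold tile #1,
# but each one lands in a different bank.
for addr in (0x2020, 0x2021, 0x2040, 0x2041):
    print(hex(addr), metatile_bank(addr))

# 256 tile numbers x 4 banks = 1024 distinct 8x8 tiles
print(256 * 4)
```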

Anyways, it's nice to see some discussions about a new technology mapper, instead of all the goddamn repro stuff.


Posted: Sat Aug 02, 2014 9:14 pm

Joined: Mon Sep 27, 2004 2:57 pm
Posts: 1248
It's important to note, detecting sprite tile fetches doesn't work the way most people would expect. It's based on y position and sprite number; x position is irrelevant. This is because the sprite patterns are only fetched at the end of the scanline, and they're fetched in the order of sprite 0 to sprite 63, which is the order the sprite evaluation scans the OAM in. Even if it were x/y based (which it would be if it were BG tiles and not sprite tiles), I still believe it'd be the most useful to have a tile detector automatically change a bank instead of fire an IRQ, because the IRQ latency would kill everything. That's why I was insistent on leaving IRQs to scanline counters.

My super-old original "wouldn't it be cool if" idea would've been to have the mapper halt the CPU while the mapper itself performs automatic 2005/2006/2007 writes at the PPU clock, which'd be much faster than the CPU doing it in software. Sadly, it's impossible, because the 6502 cannot pause and relinquish the bus without external logic, and while the NES has that logic, it only works for the sprite DMA and for DPCM; there's no way for the cart to tell the CPU to shut up. :P

@Memblers, your attribute idea is pretty interesting. On the spot, the most straightforward way I can think of to implement this is to detect fetches to the attribute region of each nametable (so the last 64 bytes each of banks 8-11), and have the fetches instead come from a special vram page that's just 1k of attribute data (16 attribute tables = four-screen with 8x8 attribute regions). I wonder if there'd be an easy way of making the attribute bytes linear... I'd definitely try; anything to make it easier on the programmer. :P


Posted: Sun Aug 03, 2014 12:21 am

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10066
Location: Rio de Janeiro - Brazil
Drag wrote:
My super-old original "wouldn't it be cool if" idea would've been to have the mapper halt the CPU while the mapper itself performs automatic 2005/2006/2007 writes at the PPU clock, which'd be much faster than the CPU doing it in software. Sadly, it's impossible, because the 6502 cannot pause and relinquish the bus without external logic, and while the NES has that logic, it only works for the sprite DMA and for DPCM; there's no way for the cart to tell the CPU to shut up. :P

Do you really need to go through the CPU and use $2006/7 though? The entire PPU bus is exposed to the cart: address lines, data lines, /RD and /WR. Can't you just use those directly to write to the PPU and leave the CPU be, as long as it doesn't try to access the PPU?


Posted: Sun Aug 03, 2014 12:44 am

Joined: Mon Sep 27, 2004 2:57 pm
Posts: 1248
I meant using 2005/2006/2007 to control scrolling and do quick palette updates, which would require control of the CPU bus, which in turn would require a way to make the 6502 relinquish the bus. The idea of a DMA like this came from trying to imagine how it'd be possible to (easily and practically) do a 4-player splitscreen like N64 games do. Being able to trigger an automatic 2006/2005/2005/2006 sequence in just 8 cycles (exactly enough!) at a particular position on the scanline, using tile counting, would've provided an extremely simple way to do it with (probably) no artifacting. As a bonus, tile fetches always occur at the same locations on the scanline, regardless of fine horizontal scrolling.

But it's impossible, so oh well.
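
For reference, the values in that 2006/2005/2005/2006 sequence can be computed like this. It's a Python sketch of the usual mid-frame scroll write order as I understand it; the function name is made up, and it only produces the (register, value) pairs, it says nothing about the timing of the hypothetical DMA.

```python
def split_scroll_writes(x, y, nametable):
    """Return the four (register, value) writes that set the scroll
    to pixel (x, y) of the given nametable in the middle of a frame."""
    x &= 0xFF
    y &= 0xFF
    return [
        (0x2006, (nametable & 3) << 2),                    # nametable select
        (0x2005, y),                                       # Y scroll
        (0x2005, x),                                       # X scroll (fine X)
        (0x2006, (((y & 0xF8) << 2) & 0xFF) | (x >> 3)),   # coarse Y / coarse X
    ]


for reg, value in split_scroll_writes(x=128, y=120, nametable=1):
    print(hex(reg), hex(value))
```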


Posted: Sun Aug 03, 2014 6:14 pm
Site Admin

Joined: Mon Sep 20, 2004 6:04 am
Posts: 3470
Location: Indianapolis
Yeah, I believe with a normal 6502 chip, you would be able to halt it. The 2A03 uses that feature internally, for DMA like you said. And they had to re-purpose quite a few pins to integrate audio and the $4016/$4017 ports, so stuff had to go.

Last night I figured up the logic for the auto-metatile mode, somehow at first I didn't realize that it's the same logic as handling 8x8 attributes, a mapper could easily do both. Attributes just have more data to enter, while the extra 16x16 nametable data would be implied. If you think of any better way to map attribute memory, that would definitely be interesting.

I didn't think extending the PPU's graphics could be so simple. What I came up with uses 5 inputs on a 74HC30 and 3 inputs to the CPLD. I guess I'm used to thinking of the MMC5 with its EXRAM, which makes it seem more exotic.


Posted: Sun Aug 03, 2014 8:52 pm

Joined: Mon Sep 27, 2004 2:57 pm
Posts: 1248
Yeah, I realized that the ExRAM setup basically gives you a dual-ram setup where you can either have ExRAM be extra 8-bit memory, or have ExRAM be activated at the same time as normal VRAM, giving you 16-bit VRAM. Just simply having 32k VRAM doesn't give you the ability to have more than 256 tiles, but you can do pretty much everything else the MMC5 does.

I'd still like to figure out getting a better extended attribute table though. Making it so each byte refers to a 16x16 area instead of a 32x32 area seems to be the best way to go, but then you have to mangle the result somehow, since only two bits of an attribute byte are used at a time, and which two bits depends on which byte of the nametable is being fetched.

Edit: My implementation of the MMC5 style scanline detector requires one latch. The latch is always going to hold the address from the previous fetch, so I can reuse this latch to encode my own attribute byte address, since the attribute byte is always fetched immediately after the nametable byte, and the attribute byte address depends on the nametable byte address. Then, from the fetched byte, I extract the appropriate two bits and assemble a byte that's just the two bits repeating, and give that to the PPU. That'll save me from having to do a second decoding to figure out where the two bits the PPU wants are.
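
Here's a Python sketch of the "extract two bits and repeat them" step. The byte layout is an assumption made for illustration: one extended attribute byte per 16x16 area, two bits per 8x8 tile, ordered top-left, top-right, bottom-left, bottom-right from the low bits up. The function name is also made up.

```python
def extended_attribute(attr_byte, nt_addr):
    """Given the extended attribute byte for a 16x16 area and the latched
    nametable address of the tile just fetched, build the byte to hand to
    the PPU: the tile's two palette bits repeated four times."""
    col = nt_addr & 1                 # which 8x8 column inside the 16x16 area
    row = (nt_addr >> 5) & 1          # which 8x8 row inside the 16x16 area
    quadrant = (row << 1) | col
    two_bits = (attr_byte >> (quadrant * 2)) & 0b11
    return two_bits * 0x55            # 0b01010101 spreads the pair to all 4 slots


# A byte holding palettes 0, 1, 2, 3 for the four tiles of one area:
for addr in (0x2000, 0x2001, 0x2020, 0x2021):
    print(hex(addr), hex(extended_attribute(0b11100100, addr)))
```

Because all four bit pairs in the returned byte are identical, it doesn't matter which pair the PPU picks for that tile position, which is what removes the need for the second decode.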


Posted: Sun Aug 03, 2014 9:56 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 6293
Location: Seattle
For the initial "write the same value to all four tiles of a metatile" idea, you could easily do something like the Oeka Kids boards, assuming that PPU A0 and A5 arrive at the same time as PPU A12: latch A0 and A5 on a rising edge of A12 and use them to select banks...

It would be nicer to only have to write 1 or 2 bytes, though. I don't see a way to handle that without a good deal more glue logic.


Posted: Sun Aug 03, 2014 10:13 pm

Joined: Mon Sep 27, 2004 2:57 pm
Posts: 1248
Just forgo the PPU's everything and manually encode nametables and attribute tables. :P One nametable with 16x16 tiles fills an entire four-screen, but you'd need 4x the pattern tables. (I don't think I'll add this, but it's still an idea for someone out there)

