All times are UTC - 7 hours





Posted: Sun Apr 21, 2019 4:31 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 8488
Location: Seattle
supercat wrote:
Does ensure that all displayed tiles get updated on the same frame?
I can't entirely tell, but it's definitely using the standard NES thing of keeping track of which tiles need to be updated and only updating those tiles. It's on MMC1, and CHR bankswitching is used extensively – possibly to mask tearing – but it's subtle enough I can't tell. No extra RAM.

Here's a longplay: https://www.youtube.com/watch?v=mLQzL8vsNVM

Quote:
Not familiar with [Driar].
Single-screen collection platformer.
Original
my NROM optimization.

Quote:
As I think about it, though, I wonder if the best way to make a cheap but versatile Nintendo cart might be to adapt the same approach used by the Atari 2600 melody cart, using one 70MHz ARM7TDMI or similar device on each bus, and maybe running an SPI port between them.
I've honestly been wondering why I haven't seen anyone do anything like the Harmony Cart on the NES. Is that extra 600kHz too much? Memory limits? Incompatible with existing library? Two independent buses?

Hard part is not just ending up with something ridiculous like this.

Quote:
[Not being able to move the DMA target] seems like a missed opportunity in the NES design.
Yeah...

Previous times I thought someone had said that the 2C02 can't keep up with OAM DMA's pace, and needed no faster than one byte every 3 CPU cycles. But right now, testing in Visual2C02 seems to imply it works?

Quote:
as compared with something like:
Code:
    stx $FC ; Set bits 8-15 of address for $7C00-$7CFF region
    lda $7C00,y ; uses LSB of address, plus last value accessed at $FC, plus $010000.
Ah, yes. I misunderstood. All clear. Somehow I'd misunderstood you to be talking about blocks of 128 bytes.


Posted: Mon Apr 22, 2019 7:29 am

Joined: Thu Apr 18, 2019 9:13 am
Posts: 151
lidnariq wrote:
supercat wrote:
Does ensure that all displayed tiles get updated on the same frame?
I can't entirely tell, but it's definitely using the standard NES thing of keeping track of which tiles need to be updated and only updating those tiles. It's on MMC1, and CHR bankswitching is used extensively – possibly to mask tearing – but it's subtle enough I can't tell. No extra RAM.

Here's a longplay: https://www.youtube.com/watch?v=mLQzL8vsNVM


Some interesting graphics changes compared to the C64 and Atari versions; I notice they lost the bonus levels that were 20 meta-tiles wide and designed to fit on a single screen, and also run at a much faster gametick rate than normal levels.

Looking at the video, it appears that, as with the original, the CHR bank switching is used to cycle among tile sets to animate everything in a fashion asynchronous to gameplay, to do things like make the diamonds sparkle or (in this version) make the rocks rock back and forth. The entire nametable is getting redrawn every gametick over the course of several frames. Single-step through the video while watching the left and right edges of the screen during horizontal scrolling, and this effect will be visible.

Since Boulderdash tile-set animation is done asynchronously with regard to gameplay, it doesn't really matter if all of the tiles update at once. Ruby Runner, though, uses tile-set animation to smooth out motion and create "in-between" frames that must be synchronized with nametable updates. Since even a stock NES has two frames worth of tiles in the nametable, I don't think page-flipping should pose any difficulty; I'm curious why you think that's symptomatic of using a "wrong" approach.

My objective is to make Ruby Runner have a play mechanic similar to Boulderdash, but not exactly copying it (you'll notice, for example, that the monsters in the animated .GIF move straight ahead if they can, while the Boulderdash monsters either follow the left wall or right wall), but with all of the objects animated to move smoothly, and also hopefully without any display glitches like the sides of the Boulderdash screen. Being able to draw all of the nametable tiles in a single frame would be convenient because it would allow the game logic to compute everything that will happen in a game tick if the player were to remain stationary, then draw all of the nametable entries, wait for the frame cycling animations, wait for the frame before the first animation frame of the next game tick, and read the controller for what should happen on that game tick. The game logic would thus need to synchronize with video only once per game tick, and the game could run smoothly provided only that each gametick's worth of game logic was complete before it was time to show the first frame of the next game tick.

Being limited to updating a quarter of the name table per frame would require either having the game logic for each gametick finish four frames early, or else having a means of starting the name table updates before the game logic is done. The first may or may not adversely affect gameplay; the latter would add complexity.

My guess would be that if I limit boards to about the same size as Boulderdash I could probably get away with the first approach, but if I use an 8K RAM memory expansion so as to allow larger boards, adding four extra frames per game tick could be annoying.

Quote:
Quote:
[Not being able to move the DMA target] seems like a missed opportunity in the NES design.
Yeah...

Previous times I thought someone had said that the 2C02 can't keep up with OAM DMA's pace, and needed no faster than one byte every 3 CPU cycles. But right now, testing in Visual2C02 seems to imply it works?


I wonder why the speed would be so limited, given that the PPU bus normally runs so much faster than that? Perhaps the NES was originally planned to have the CPU run much faster?

There's a lot of really good stuff in the NES design, but a few missteps with how things fit together. The biggest omission, IMHO, is probably the lack of any on-chip way of requesting an interrupt at a certain line, or at least finding out where the beam is. Ruby Runner on the 2600 didn't have any raster interrupts available to it, but it was able to find out how much time remained before the end of overscan or vblank, run the game processing loop until those times were close to used up, and then go into a polling loop to find the exact ends of those intervals. Having an address which, when read, would report half the number of the current scan line would have been enormously useful, and being able to make the NMI trip at a configurable line would have been even more so. Starting blanking early and extending the end of it would have made it possible for games that need to perform more updates during vblank to actually do so.

Quote:
Quote:
as compared with something like:
Code:
    stx $FC ; Set bits 8-15 of address for $7C00-$7CFF region
    lda $7C00,y ; uses LSB of address, plus last value accessed at $FC, plus $010000.
Ah, yes. I misunderstood. All clear. Somehow I'd misunderstood you to be talking about blocks of 128 bytes.


So how do you like that idea now that you understand it? Having larger contiguous regions banked in is useful for running code, or for objects that are going to be accessed via absolute indexed addressing modes. But objects that would need to be accessed via indirect indexed addressing modes when using large-bank switching can be accessed more conveniently using absolute indexed addressing mode and page-level switching.
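
For anyone following along, here is a rough Python model of the address arithmetic in the quoted two-instruction snippet. It is only a sketch: the class name, the memory size, and the choice to latch the page on writes only are assumptions made for illustration, while the $7C00-$7CFF window, the $FC latch address and the $010000 offset come from the comments in the quoted code.

```python
# Hypothetical model of page-level banking: a 256-byte CPU window whose
# upper address bits come from the last value written to a snooped address.
WINDOW_BASE = 0x7C00   # CPU-visible window $7C00-$7CFF (from the quoted code)
LATCH_ADDR = 0x00FC    # zero-page address the cart watches (from the quoted code)
REGION_BASE = 0x010000 # fixed offset mentioned in the quoted comment


class PageWindow:
    def __init__(self, size=0x020000):
        self.mem = bytearray(size)  # cart memory; the size is an assumption
        self.page = 0               # bits 8-15 of the physical address

    def cpu_write(self, addr, value):
        # Only the latch is modelled; other writes are ignored in this sketch.
        if addr == LATCH_ADDR:
            self.page = value & 0xFF

    def physical(self, addr):
        # Low byte from the CPU address, middle byte from the latch.
        assert WINDOW_BASE <= addr <= WINDOW_BASE + 0xFF
        return REGION_BASE | (self.page << 8) | (addr & 0xFF)

    def cpu_read(self, addr):
        return self.mem[self.physical(addr)]
```

With the latch set to $2A, a read of $7C05 in this model lands on physical address $012A05, which is the "stx $FC / lda $7C00,y" pattern with X = $2A and Y = 5.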


Posted: Mon Apr 22, 2019 11:59 am

Joined: Sun Apr 13, 2008 11:12 am
Posts: 8488
Location: Seattle
Quote:
Some interesting graphics changes compared to the C64 and Atari versions; I notice they lost the bonus levels that were 20 meta-tiles wide and designed to fit on a single screen
NES only has 32 tiles / 16 attributes on a screen, unlike the C64's 40. I suppose they could have retained them, if scrolling were acceptable... probably not.

supercat wrote:
The entire nametable is getting redrawn every gametick over the course of several frames. Single-step through the video while watching the left and right edges of the screen during horizontal scrolling, and this effect will be visible.
No, that's not the same thing ... that's how all games on the NES have to do scrolling if the entire level doesn't fit in the available nametables. See the page on our wiki: nesdevwiki:File:NTS scrolling seam.gif.

Or look at the game in Mesen with the "PPU viewer" enabled. (Maybe also the "Event viewer").

Quote:
Since even a stock NES has two frames worth of tiles in the nametable, I don't think page-flipping should pose any difficulty; I'm curious why you think that's symptomatic of using a "wrong" approach.
Because when you repurpose the PPU's scrolling registers to act as double-buffering, it means you can't use the NES's scrolling hardware, which was the thing that made the NES meaningfully different from its predecessors. (There'd been consoles with tilemaps before. The C64 had sub-tile scroll. But the Famicom was the first widespread commercial device to allow both sub-tile and tile-level scrolling and enough tilemap memory for that to be useful)

Now, that said, a few mappers instead let the mapper IC control which of the two nametables is being used at any given moment. It turns out that this commercial release of Boulder Dash actually runs in this 1-screen mode, which is why you saw scrolling seams on all edges.

And in the case of the commercial release, which needs 1KiB just to hold the level state (look in memory from $3E0 to $74F), they couldn't justify the cost of an extra RAM just for an unrolled copy.

Quote:
I wonder why the speed would be so limited, given that the PPU bus normally runs so much faster than that? Perhaps the NES was originally planned to have the CPU run much faster?
It's an entirely independent FSM. If you try to read or write to $2007 during rendering the result on the outputs will be some combination of the two FSMs, smearing data and address across itself as ALE and /WR or /RD are active at the same time.

Quote:
There's a lot of really good stuff in the NES design, but a few missteps with how things fit together.
I'd say a lot more than a few. The original design would have at least had a programmable interval timer, but it was defective and removed instead of fixed in later silicon versions.

Quote:
Ruby Runner on the 2600 didn't have any raster interrupts available to it, but it was able to find out how much time remained before the end of overscan or vblank, run the game processing loop until those times were close to used up, and then go into a polling loop to find the exact ends of those intervals.
Sure? But the 2600 has any timers at all. The NES just lets you misuse the DAC FIFO empty IRQ...

Quote:
So how do you like that idea now that you understand it?
It is a delightful gem to work around the slowness of the indirect modes.


Posted: Mon Apr 22, 2019 1:50 pm

Joined: Thu Apr 18, 2019 9:13 am
Posts: 151
lidnariq wrote:
supercat wrote:
The entire nametable is getting redrawn every gametick over the course of several frames. Single-step through the video while watching the left and right edges of the screen during horizontal scrolling, and this effect will be visible.
No, that's not the same thing ... that's how all games on the NES have to do scrolling if the entire level doesn't fit in the available nametables. See the page on our wiki: nesdevwiki:File:NTS scrolling seam.gif.

Or look at the game in Mesen with the "PPU viewer" enabled. (Maybe also the "Event viewer").


Hmm... if you single-step through the video of world 3-1, the system generally takes multiple frames to process all the name table updates associated with each game tick even in cases where most entries stay the same. It also takes multiple frames to draw the column of tiles which needs to be updated during a side scroll. I thought the multi-frame updates were indicative of blindly copying everything, but it seems the game is using a slow partial-update routine that takes about as long as blindly copying everything would.

Quote:
Quote:
Since even a stock NES has two frames worth of tiles in the nametable, I don't think page-flipping should pose any difficulty; I'm curious why you think that's symptomatic of using a "wrong" approach.
Because when you repurpose the PPU's scrolling registers to act as double-buffering, it means you can't use the NES's scrolling hardware, which was the thing that made the NES meaningfully different from its predecessors. (There'd been consoles with tilemaps before. The C64 had sub-tile scroll. But the Famicom was the first widespread commercial device to allow both sub-tile and tile-level scrolling and enough tilemap memory for that to be useful)


The C64 and Atari 400/800 could easily update their "name tables" fast enough to allow continuous smooth scrolling. On the NES, that would be harder and require quite a bit more code, and would also require using some sprites to mask the right edge of the screen, so for cases where "NES-style" scrolling would be adequate, it would likely be preferable.

Quote:
And in the case of the commercial release, which needs 1KiB just to hold the level state (look in memory from $3E0 to $74F), they couldn't justify the cost of an extra RAM just for an unrolled copy.


If one updates four rows of meta-tiles per frame using an unrolled loop, one would need a 64-byte buffer to accommodate that. That hardly seems excessive.

Quote:
Quote:
Ruby Runner on the 2600 didn't have any raster interrupts available to it, but it was able to find out how much time remained before the end of overscan or vblank, run the game processing loop until those times were close to used up, and then go into a polling loop to find the exact ends of those intervals.
Sure? But the 2600 has any timers at all. The NES just lets you misuse the DAC FIFO empty IRQ...


The 2600 has a RAM/I/O/Timer (RIOT) chip with a timer that can measure duration up to about half a frame with units of 64 cycles.

Quote:
Quote:
So how do you like that idea now that you understand it?
It is a delightful gem to work around the slowness of the indirect modes.


I wonder why I've not seen that approach used on any banking designs other than my own?


Posted: Mon Apr 22, 2019 2:58 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 8488
Location: Seattle
supercat wrote:
Hmm... if you single-step through the video of world 3-1, the system generally takes multiple frames to process all the name table updates associated with each game tick even in cases where most entries stay the same. It also takes multiple frames to draw the column of tiles which needs to be updated during a side scroll. I thought the multi-frame updates were indicative of blindly copying everything, but it seems the game is using a slow partial-update routine that takes about as long as blindly copying everything would.
Yeah, I think they could have done better even without resorting to blind copies.

Maybe the engine still runs on fours, but they deliberately smeared the updates across multiple refreshes to make it feel less quantized? But I bet they just didn't see the need to make it better.

Quote:
If one updates four rows of meta-tiles per frame using an unrolled loop, one would need a 64-byte buffer to accommodate that. That hardly seems excessive.
But you won't have the CPU time to translate metatiles at the same time you're uploading things to the PPU...?

I mean, you can put the unrolled loop in ROM and have it copy bytes from RAM. Slows you down to only 217 in-order bytes in vblank ((20·341÷3 - 514(OAMDMA))÷8 - 8(set scroll)).
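
As a sanity check on that kind of estimate, the expression can be evaluated directly. This is a hedged sketch: the function and parameter names are made up here, the default values are just the numbers from the parenthesised formula, and treating the final 8 as "bytes' worth of time lost to setting scroll" is an assumption about how the formula is meant to be read.

```python
def vblank_copy_budget(scanlines=20, dots_per_line=341, dots_per_cpu_cycle=3,
                       oam_dma_cycles=514, cycles_per_byte=8,
                       scroll_cost_bytes=8):
    """Rough count of bytes an unrolled copy loop can push during vblank."""
    # PPU dots in vblank, converted to CPU cycles (3 dots per cycle on NTSC).
    cpu_cycles = scanlines * dots_per_line / dots_per_cpu_cycle
    # Subtract the OAM DMA, divide by the per-byte cost, subtract scroll setup.
    return (cpu_cycles - oam_dma_cycles) / cycles_per_byte - scroll_cost_bytes


budget = vblank_copy_budget()
```

Evaluated literally with these inputs the result is a little under 212 bytes, and about 220 if the scroll-setting cost is left out, so the exact figure depends on how that last term and the rounding are counted. Either way it lands in the low 200s.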

Quote:
I wonder why I've not seen that approach used on any banking designs other than my own?
I have to assume that people just didn't think of it.

Maybe it's that it's entirely orthogonal to what banking normally does ... Normally banking is a work-around for not being able to address more total address space, but your technique is instead a work-around for a different structural deficiency of the 6502.


Posted: Mon Apr 22, 2019 3:40 pm

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21511
Location: NE Indiana, USA (NTSC)
That and it takes more mapper registers to hold more bank bits and more I/Os to control more address lines.

_________________
Pin Eight | Twitter | GitHub | Patreon


Posted: Mon Apr 22, 2019 3:48 pm

Joined: Thu Apr 18, 2019 9:13 am
Posts: 151
lidnariq wrote:
Quote:
If one updates four rows of meta-tiles per frame using an unrolled loop, one would need a 64-byte buffer to accommodate that. That hardly seems excessive.
But you won't have the CPU time to translate metatiles at the same time you're uploading things to the PPU...?


The average speed of
Code:
; Top half of first line
  lda $C0
  sta $2007
  eor #1
  sta $2007
  lda $C1
  sta $2007
  ...
  lda $CF
  sta $2007
  eor #1
  sta $2007
; Bottom half of first line
  lda $C0
  eor #2
  sta $2007
  eor #1
  sta $2007
  lda $C1
  eor #2
  sta $2007

ends up the same as if all of the tiles were stored individually in zero page. Within each group of four tiles, the upper-left corner takes 7 cycles, the upper-right corner takes 6, the lower-left 9, and the lower-right 6. 7+6+9+6 is 28, the same time as would be needed to fetch each byte individually from zero page.
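
The cycle tally can be checked mechanically. The sketch below assumes standard 6502 timings (zero-page LDA 3 cycles, immediate EOR 2, absolute STA 4) and, for the tile numbering, assumes each metatile's four tiles differ only in the low two bits of the tile number; the helper names are invented for illustration.

```python
# Standard 6502 cycle counts for the three instructions the loop uses.
CYCLES = {"lda_zp": 3, "eor_imm": 2, "sta_abs": 4}


def cost(*ops):
    return sum(CYCLES[op] for op in ops)


upper_left = cost("lda_zp", "sta_abs")              # lda $C0 / sta $2007
upper_right = cost("eor_imm", "sta_abs")            # eor #1  / sta $2007
lower_left = cost("lda_zp", "eor_imm", "sta_abs")   # lda $C0 / eor #2 / sta $2007
lower_right = cost("eor_imm", "sta_abs")            # eor #1  / sta $2007

metatile_total = upper_left + upper_right + lower_left + lower_right
four_plain_copies = 4 * cost("lda_zp", "sta_abs")   # one lda/sta pair per tile


def metatile_tiles(base):
    """Tile numbers the loop writes for one metatile (top row, bottom row)."""
    return (base, base ^ 1), (base ^ 2, base ^ 2 ^ 1)
```

Both totals come to 28 cycles per metatile, and a base tile of $40 expands to $40/$41 on the top row and $42/$43 on the bottom.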

Quote:
I mean, you can put the unrolled loop in ROM and have it copy bytes from RAM. Slows you down to only 217 in-order bytes in vblank ((20·341÷3 - 514(OAMDMA))÷8 - 8(set scroll)).


I'd figured 256 should work. Since the only sprites would be the player sprite, the score, and the side masks, and only the player sprite would need frequent updates, I was figuring on something like:
Code:
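   ldx #0
   stx OAMADDR      ; point at OAM byte 0 (sprite 0 Y)
   ldy playerY
   sty OAMDATA      ; sprite 0 Y; OAMADDR advances to 1
   lda #3
   sta OAMADDR      ; skip to byte 3 (sprite 0 X)
   lda playerX1
   sta OAMDATA      ; sprite 0 X; OAMADDR advances to 4
   sty OAMDATA      ; sprite 1 Y (same Y as sprite 0)
   lda #7
   sta OAMADDR      ; skip to byte 7 (sprite 1 X)
   lda playerX2
   sta OAMDATA      ; sprite 1 X
   stx OAMADDR      ; leave OAMADDR at 0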

That should cost a lot less than 514 cycles.

Quote:
Quote:
I wonder why I've not seen that approach used on any banking designs other than my own?
I have to assume that people just didn't think of it.

Maybe it's that it's entirely orthogonal to what banking normally does ... Normally banking is a work-around for not being able to address more total address space, but your technique is instead a work-around for a different structural deficiency of the 6502.


Probably, but such a design could make a lot of things more efficient on the NES. Games that use bitmap displays could probably benefit from a little assistance there. On my 2600 cart, it's possible to set up a 96x200 bitmap display using stripes that run down 12 pages of RAM, and then plot a pixel at x,y with simply:
Code:
    lda $7F00,x ; Load mask and switch bank to proper stripe [$7F00-$7FFF triggers banking strobes]
    ora $7E00,y ; Mix with data at address Y of stripe
    sta $7E00,y ; Store it back

I've never seen any 6502-based pixel-plotting code faster than that.
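
To make the data layout behind those three instructions concrete, here is a small Python model. It is a sketch under assumptions: the class and table names are invented, and the bit order within a byte (leftmost pixel in bit 7) is a guess. The 96x200 size, the 12 one-page stripes, and the idea that the mask lookup also selects the stripe come from the description above.

```python
WIDTH, HEIGHT = 96, 200
STRIPES = WIDTH // 8   # 12 vertical stripes, each 8 pixels wide, one page each

# Per-x lookup tables, standing in for the table read by "lda $7F00,x".
MASK = [0x80 >> (x & 7) for x in range(WIDTH)]  # bit order is an assumption
STRIPE = [x // 8 for x in range(WIDTH)]         # which page the read banks in


class StripedBitmap:
    def __init__(self):
        self.stripes = [bytearray(256) for _ in range(STRIPES)]
        self.current = 0  # stripe currently mapped into the $7E00 window

    def plot(self, x, y):
        mask = MASK[x]              # lda $7F00,x : fetch the pixel mask...
        self.current = STRIPE[x]    # ...and, as a side effect, bank in the stripe
        page = self.stripes[self.current]
        page[y] |= mask             # ora $7E00,y / sta $7E00,y
```

Because the stripe selection rides along with the mask fetch, the plot needs no address arithmetic at all, which is where the speed comes from.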


Posted: Mon Apr 22, 2019 4:21 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 8488
Location: Seattle
supercat wrote:
Since the only sprites would be the player sprite, the score, and the side masks, and only the player sprite would need frequent updates, I was figuring on something like:
Sadly OAMADDR is buggy.

You can update the first 7 bytes safely, only in order, only by relying on OAMADDR being zero when rendering turns off naturally for vblanking. Otherwise you basically have to use OAMDMA.

Or you could make a PAL-only release, where they fixed the bug :P

Quote:
Probably, but such a design could make a lot of things more efficient on the NES. Games that use bitmap displays could probably benefit from a little assistance there.
Approximately no games do. I think the CPU-to-PPU bandwidth was limited enough that there was every reason to avoid it, and the games that strictly need to not be in a tilemap were either modified heavily or didn't see a port.


Posted: Mon Apr 22, 2019 4:45 pm

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21511
Location: NE Indiana, USA (NTSC)
"Approximately no games" use bitmap-style backgrounds or software composited sprites. But I can think of a few rounding errors that you might enjoy:

  • Licensed in the US market: Qix, Videomation, Faxanadu, Hatris, Color a Dinosaur, Solstice, Shanghai II
  • Europe exclusive, benefiting from longer vblank: Elite
  • Canceled, prototype discovered later: Block Out
  • Japan only: Oeka Kids, Cocoron, Final Fantasy II
  • East Asia: 3D Block
  • Homebrew: All Action 53 volumes, Nova the Squirrel

Posted: Mon Apr 22, 2019 7:35 pm

Joined: Thu Apr 18, 2019 9:13 am
Posts: 151
tepples wrote:
"Approximately no games" use bitmap-style backgrounds or software composited sprites. But I can think of a few rounding errors that you might enjoy:

  • Licensed in the US market: Qix, Videomation, Faxanadu, Hatris, Color a Dinosaur, Solstice, Shanghai II
  • Europe exclusive, benefiting from longer vblank: Elite
  • Canceled, prototype discovered later: Block Out
  • Japan only: Oeka Kids, Cocoron, Final Fantasy II
  • East Asia: 3D Block
  • Homebrew: All Action 53 volumes, Nova the Squirrel


I was thinking most notably of Elite. I'm not sure something like that would be possible on a system with an NTSC vblank unless it had two blocks of memory which could be switched between the CPU and PPU buses, which would require a fair number of multiplexer chips, but then again I'm not sure how Elite manages to obtain any kind of reasonable performance even *with* a PAL vblank. Any idea what it's doing?


Posted: Mon Apr 22, 2019 7:42 pm

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21511
Location: NE Indiana, USA (NTSC)
The same author's "Tank demo" dynamically allocates tiles in RAM, draws to them, copies them and the associated tilemap to VRAM during vblank, and double buffers CHR using the palette. Bit plane 0 is drawn using [black, white, black, white], and bit plane 1 is drawn using [black, black, white, white].

By "dynamically allocates tiles" I mean this: It keeps a tilemap in RAM storing which tile number corresponds to each (x, y) tile position. When drawing a pixel into a tile, it first checks whether a tile is allocated for that (x, y) position, and if not, allocates the next unused tile. Because of the sparse nature of this vector-style geometry, it's unlikely for all tiles in the viewport to get allocated as nonblank.
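
In Python terms, the allocation scheme might look something like the sketch below. This is not the demo's actual code: the class name, reserving tile 0 as the blank tile, the single bit plane, and silently dropping pixels once tiles run out are all assumptions made to keep the example small.

```python
class TileAllocator:
    """Sparse bitmap: a tile is only allocated once a pixel lands in its cell."""

    def __init__(self, max_tiles=256):
        self.max_tiles = max_tiles
        self.tilemap = {}    # (tile_x, tile_y) -> tile number
        self.tiles = {}      # tile number -> 8 rows of one bit plane
        self.next_free = 1   # tile 0 is kept as the shared blank tile (assumption)

    def plot(self, x, y):
        cell = (x // 8, y // 8)
        tile = self.tilemap.get(cell)
        if tile is None:
            if self.next_free >= self.max_tiles:
                return False          # out of tiles: drop the pixel (assumption)
            tile = self.next_free     # allocate the next unused tile
            self.next_free += 1
            self.tilemap[cell] = tile
            self.tiles[tile] = bytearray(8)
        self.tiles[tile][y % 8] |= 0x80 >> (x % 8)
        return True
```

Plotting (0, 0) and (7, 7) touches only one tile, while (8, 0) allocates a second, so a wireframe scene only pays for the cells its lines actually cross.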

Other ways to improve video memory bandwidth are to disable rendering early and enable rendering late. This can become very tricky, as doing so requires working around quirks of the OAM DRAM controller.

Posted: Mon Apr 22, 2019 9:37 pm

Joined: Thu Apr 18, 2019 9:13 am
Posts: 151
lidnariq wrote:
supercat wrote:
Since the only sprites would be the player sprite, the score, and the side masks, and only the player sprite would need frequent updates, I was figuring on something like:
Sadly OAMADDR is buggy.

You can update the first 7 bytes safely, only in order, only by relying on OAMADDR being zero when rendering turns off naturally for vblanking. Otherwise you basically have to use OAMDMA.

Or you could make a PAL-only release, where they fixed the bug :P


Bummer. It would have been nice to avoid having to blow 256 bytes of storage on the OAM. If I need to update two sprites on a frame, would that mean that I'd have to drop back to updating three rows of tiles per frame instead of four? That might not be the worst thing in the world, since the game could probably be pretty zippy even if it wastes five frames per gametick essentially waiting for vblank. Still, it does seem a bit icky.


Posted: Mon Apr 22, 2019 9:48 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 8488
Location: Seattle
Hm. Thinking closely about the bug, maybe there's a goofy workaround...

So, there's two halves to the bug:
1- if you write to OAMADDR, on several CPU-PPU alignments, it'll smear data from one row of OAM DRAM with another.
2- if you leave OAMADDR at a value of 8 or higher, it'll copy the eight bytes from that row of DRAM over the first eight bytes.

So...
if you write eight padding values...
then the eight values you want...
you'll have the two sprites you want in slots 2 and 3, and whatever had been in slots 4 and 5 is copied on top of slots 0 and 1.
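
A toy Python model of that sequence, purely to illustrate where the bytes would end up. It encodes only the second half of the bug as summarised above (a row copy onto the first eight bytes when OAMADDR is left at 8 or higher), it assumes OAMADDR starts at zero and is never written, and the function name is hypothetical. It says nothing about whether real hardware behaves this cleanly.

```python
ROW = 8  # OAM DRAM row size in bytes, as described in the post


def write_with_padding(oam, padding, sprites):
    """Write 8 padding bytes then 8 sprite bytes through OAMDATA, then
    apply the row-copy behaviour described for a nonzero final OAMADDR."""
    assert len(padding) == ROW and len(sprites) == ROW
    oam = bytearray(oam)
    addr = 0  # relies on OAMADDR being 0 when vblank starts
    for value in list(padding) + list(sprites):
        oam[addr] = value
        addr = (addr + 1) & 0xFF
    if addr >= ROW:
        # Row containing OAMADDR is copied over bytes 0-7 (sprite slots 0 and 1).
        row = addr & 0xF8
        oam[0:ROW] = oam[row:row + ROW]
    return oam
```

In this model the wanted data sits in bytes 8-15 (slots 2 and 3), OAMADDR ends at 16, and bytes 16-23 (slots 4 and 5) overwrite the padding in slots 0 and 1.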


Posted: Tue Apr 23, 2019 7:25 am

Joined: Thu Apr 18, 2019 9:13 am
Posts: 151
lidnariq wrote:
Hm. Thinking closely about the bug, maybe there's a goofy workaround...

So, there's two halves to the bug:
1- if you write to OAMADDR, on several CPU-PPU alignments, it'll smear data from one row of OAM DRAM with another.
2- if you leave OAMADDR at a value of 8 or higher, it'll copy the eight bytes from that row of DRAM over the first eight bytes.

So...
if you write eight padding values...
then the eight values you want...
you'll have the two sprites you want in slots 2 and 3, and whatever had been in slots 4 and 5 is copied on top of slots 0 and 1.


That would seem likely to work, but if there's a race condition between the DRAM machinery and the CPU cycles, things may appear to work under most conditions, but fail on some machines under certain temperature conditions, phases of the moon, etc. I wouldn't trust any workaround that couldn't be justified based upon "analog" transistor-level simulation of the components involved.

One weird quirk about DRAM is that the process of reading a row into a buffer corrupts the data on that row within the array. Normally the corruption isn't a problem because the buffer will get written back to the row, but if things are disrupted so that a read occurs without the writeback, the row would likely be corrupted. Depending upon the design of the DRAM, such corruption might only be capable of turning ones into zeroes, only turning zeroes into ones, or doing an arbitrary mixture of both.

Think of DRAM as being a system of reservoirs connected via gates to canals. If the drain is opened on a canal and a gate is opened to the reservoir, the reservoir will be emptied. If a canal is connected to a lake with a water level of 3m and a gate is opened on the reservoir, the reservoir will fill to 3m. Writing is thus pretty simple. Reading, though, is harder. If one were to empty the canal but close the drain, and then opened the gate to a reservoir, the water level in the canal would go up if there was water in the reservoir, but it wouldn't go up to 3m. If the surface area of the canal were equal to that of the reservoir, the level in the canal would go up to 1.5m while the level in the reservoir would go down to 1.5m.

In most DRAM chips, however, the "area" of the array is orders of magnitude larger than that of any individual reservoir. If the canal were drained to zero before reading, the canal would end up with a depth of 0.00m if the reservoir had been empty, but only 0.01m if it had been full. It's hard to tell the difference between something not going up at all, versus it going up 0.01m. It turns out to be much easier to instead start with the canal filled to half depth, and then check whether the level goes up or down. Ideally, the circuit would be precisely balanced so that going up by even a micron would read as "1", and going down by even a micron would read as "0", but in practice circuits aren't going to be perfectly balanced so something which doesn't move meaningfully could read arbitrarily as 1 or 0.
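
The canal arithmetic is just conservation of water volume, and a few lines of Python reproduce the numbers. This is a sketch of the analogy only: the function name is invented, and the area ratio of 299 is an assumption chosen so that a full 3m reservoir raises an empty canal by the 0.01m mentioned above.

```python
def shared_level(canal_level, canal_area, reservoir_level, reservoir_area=1.0):
    """Common water level once the gate between canal and reservoir opens."""
    volume = canal_level * canal_area + reservoir_level * reservoir_area
    return volume / (canal_area + reservoir_area)


# Equal areas: an empty canal and a 3m reservoir settle at 1.5m.
equal = shared_level(0.0, 1.0, 3.0)

# Canal 299 times larger (assumed ratio): the rise is only 0.01m.
from_empty = shared_level(0.0, 299.0, 3.0)

# Pre-filling the canal to half depth makes the two cases move in
# opposite directions, which is easier to sense than "rise vs no rise".
full_case = shared_level(1.5, 299.0, 3.0)   # ends slightly above 1.5m
empty_case = shared_level(1.5, 299.0, 0.0)  # ends slightly below 1.5m
```

With the half-depth start, a full reservoir nudges the level up by 0.005m and an empty one pulls it down by the same amount, so the sense circuit only has to decide the direction of the change.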

I would guess that the DRAM array in the OAM is small enough that it probably doesn't use half-level biasing and, as a consequence, any splatted reads would only be capable of turning 0's to 1's, or only turning 1's to zeroes. If only the former can occur, a Y coordinate written as FF should remain FF. If the latter, a tile number written as 00 should remain 00. I don't know enough about the actual design, however, to know whether half-level biasing could result in other corruption patterns.


Posted: Tue Apr 23, 2019 9:29 am

Joined: Thu Apr 18, 2019 9:13 am
Posts: 151
lidnariq wrote:
I've honestly been wondering why I haven't seen anyone do anything like the Harmony Cart on the NES. Is that extra 600kHz too much? Memory limits? Incompatible with existing library? Two independent buses?

Hard part is not just ending up with something ridiculous like this.


I don't think a Harmony-style approach could work very well with one microcontroller monitoring both buses. The chip it uses has 32KB of flash and 8KB of RAM, which would be a bit small for "main CPU RAM", but would be enough for many kinds of mapper, especially if one added an external serial flash chip. If one didn't need to show anything too high up on the frame, it would probably be possible to load a substantial amount (256 bytes or more) from the external flash chip every frame.

A major difference between a Harmony-style cart and a typical mapper, though, would be that most mapper designs have the main CPU control the banking on the CPU side, but on a Harmony-style mapper it would be awkward to have the cart interact with the main CPU bus in any fashion. For homebrews this would be fine, but I know of no existing mappers that use that approach. Even MMC2 and MMC4, which support bank-switch tiles, use CPU-bus writes to control most mapping functions. I would guess the most practical way of doing things would probably be to have the main CPU write to PPU address range $3000-$3EFE to control things.

I agree with you that a big design challenge would be designing a mapper which is versatile, but retains the flavor of NES programming. Perhaps that could be encouraged by "standardizing" a VM language for emulators with an instruction set that's focused on the kinds of things that would typically be done in hardware or a CPLD (e.g. take bits a..b of register c and merge them using mode d [chosen from and, or, xor, etc.] with some bits starting at e from register f). While it might be possible in theory to express in such code anything that could be done on the ARM, it would be faster to emulate than the ARM code, and anyone wanting to go crazy on the ARM would also have to go equally crazy in the emulator bytecode.
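
One possible reading of that example instruction, sketched in Python. Everything here is hypothetical: no such VM exists in the thread, and the register width, the inclusive bit range, and the choice to leave destination bits outside the field untouched are assumptions made so the example is concrete.

```python
import operator

# Merge modes named in the example; a real design might offer more.
MODES = {"and": operator.and_, "or": operator.or_, "xor": operator.xor}


def merge_bits(regs, a, b, c, d, e, f, width=16):
    """Take bits a..b (inclusive) of regs[c] and merge them, using mode d,
    into the bits of regs[f] starting at bit e. Other bits of regs[f] stay."""
    nbits = b - a + 1
    field = (regs[c] >> a) & ((1 << nbits) - 1)   # extract the source field
    dest_mask = ((1 << nbits) - 1) << e           # where it lands in regs[f]
    old = regs[f]
    combined = MODES[d](old, field << e)
    regs[f] = ((old & ~dest_mask) | (combined & dest_mask)) & ((1 << width) - 1)
    return regs[f]
```

For example, with r0 = $B6, taking bits 2..5 (value $D) and OR-ing them into r1 at bit 8 turns r1 = 0 into $0D00, which is the sort of bit shuffling a CPLD mapper does with wiring alone.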

