All times are UTC - 7 hours

[ 29 posts ]

Minimum randomly-accessible memory
32kB: 33% [ 5 ]
64kB: 7% [ 1 ]
128kB: 33% [ 5 ]
256kB: 13% [ 2 ]
512kB: 7% [ 1 ]
>512kB: 7% [ 1 ]
Total votes: 15
Posted: Tue Feb 02, 2016 2:27 am
Site Admin
Joined: Mon Sep 20, 2004 6:04 am
Posts: 3484
Location: Indianapolis
Let's say that, hypothetically, there could be a cartridge set up similar to the Famicom Disk System, where you have RAM instead of ROM and you load that RAM from some sort of mass storage. But unlike the FDS, loading would be much quicker, say 2kB~3kB per frame. The absolute minimum access time would be 1 byte every 8 CPU cycles, surely fast enough for real-time streaming of data (with an auto-incrementing address). Of course, if you need to index that data in non-sequential order, or execute it as code, then you would need to load it into CPU memory first.

The question I have is: what would be a good minimum amount of RAM to provide to the NES in this situation? For this question, here are some factors to consider:
  • assume a smaller CPU RAM frees up a lot of room for other mapper features, finer-sized CHR pages, lower hardware cost, etc.
  • 8 CPU cycles per byte access is fast enough to keep up with CHR-RAM loading even from an unrolled loop (LDA absolute, STA $2007); some data, like pattern tables and possibly raw nametables, wouldn't ever need to be in CPU memory.
  • assume it could take 3, 2, or just 1 write to set the address (if it could preserve your previously used address)
  • assume the 'mass storage' source is huge (megabytes at the minimum) and inexpensive
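For a rough sense of scale, here is what one byte every 8 CPU cycles means per frame. This is a back-of-the-envelope sketch assuming the standard NTSC figure of about 29,780.5 CPU cycles per frame; `frame_share` is a hypothetical knob for how much of the frame a game spends copying. Note that LDA absolute (4 cycles) plus STA $2007 (4 cycles) is exactly 8 cycles, so even a fully unrolled loop never outruns the data port.

```python
# NTSC: 262 lines x 341 PPU dots, minus one dot on odd frames (rendering on),
# divided by 3 PPU dots per CPU cycle.
CYCLES_PER_FRAME = 29_780.5

def bytes_per_frame(cycles_per_byte: float, frame_share: float = 1.0) -> int:
    """Whole bytes readable in one frame if `frame_share` of it goes to copying."""
    return int(CYCLES_PER_FRAME * frame_share // cycles_per_byte)

def frames_to_load(n_bytes: int, cycles_per_byte: float) -> float:
    """Frames needed to stream `n_bytes` with nothing else running."""
    return n_bytes * cycles_per_byte / CYCLES_PER_FRAME

print(bytes_per_frame(8))                  # theoretical ceiling at 8 cycles/byte
print(bytes_per_frame(8, 0.75))            # leaving a quarter of the frame for game logic
print(round(frames_to_load(8192, 8), 2))   # filling 8 KB of CHR RAM
```

The ceiling comes out a little over 3,700 bytes per frame, so the 2kB~3kB estimate leaves headroom for loop overhead and game logic.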

This is totally different from a normal type of NES cart, so try to give some thought to how much data you would need to randomly index, and how much of that you would need immediate access to in between loading times.

Previously I had thought 128kB would be a good minimum, but for quite a few NES games that is the entirety of CPU memory for the whole game. So I'm not so sure that an even smaller amount of RAM, like 64kB, is unrealistic. 32kB just seems kinda dinky, but I put it in the poll anyway. Any thoughts?

Maybe I should have put this in NESdev instead of Hardware; I'm really trying to address the software aspects of this, not the hardware. I might move it later.


Posted: Tue Feb 02, 2016 5:03 am
Joined: Sat Jul 25, 2015 1:22 pm
Posts: 501
I'm pretty excited about this. Let me see if I understand the question properly.

My understanding:

The cart has mass-storage which holds all PRG and CHR data.

Data copying to RAM can occur behind the scenes with only a mapper register write.

Copying a routine that will be executed will require loading the data into RAM before the code is executed.

Accessing data from random places in a bank will require preloading, as you never know where in the bank the first location to access will be positioned.

Data accessed in a linear fashion at a rate >= 8 cycles per byte can be streamed without significant impact on CPU memory space.

I'd like to try and consider this from as many angles as I can given the information provided.

My first impulse is that this is a per-level load, since the way I currently use bankswapping on an MMC3 would require instant access to the map bank or the object bank. From there, I need instant access to my game code, level bank, object bank, DPCM, and any supplementary code.

The way I can structure the data changes greatly. Fixed bank sizes are a thing of the past, so I could create an engine to populate my CPU RAM with the best spacing for the data exclusive to my level. I won't have to store additional levels in the same level bank, so I can cram the data for each level into the smallest space possible. I could write a routine to rewrite all of my hard-coded address references to their new addresses. With the size of the mass storage, if somebody wanted to get their game done more quickly, they could just store each level linearly, with a new copy of their game engine along with each level, or perhaps unique code for different levels.

I don't know much about the NES audio hardware yet, but I'm guessing that DPCM data could be streamed in this fashion. If that's so, then it should only need a very small space in which to load, constantly rewriting over itself. The amount of space consumed by DPCM during the entire frame is a major contributing factor to running out of PRG space. Other banks you can swap out when unused but not DPCM.
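Whether DPCM can be streamed like this is speculation in the post, but any scheme has to live with how the 2A03's DMC locates samples: the address register ($4012) selects $C000 + 64 × n, the length register ($4013) gives 16 × n + 1 bytes, and the fetch address wraps from $FFFF back to $8000. A small illustrative helper (the function names are made up) that checks a proposed sample region and lists the addresses the DMC would read:

```python
def dmc_registers(start: int, length: int) -> tuple[int, int]:
    """Return the ($4012, $4013) values for a DPCM sample, or raise ValueError."""
    if start < 0xC000 or start > 0xFFC0 or (start - 0xC000) % 64:
        raise ValueError("sample must start at $C000 + 64*n")
    if length < 1 or length > 4081 or (length - 1) % 16:
        raise ValueError("sample length must be 16*n + 1 bytes (max 4081)")
    return (start - 0xC000) // 64, (length - 1) // 16

def dmc_fetch_addresses(start: int, length: int) -> list[int]:
    """CPU addresses the DMC reads, including its $FFFF -> $8000 wraparound."""
    addresses, addr = [], start
    for _ in range(length):
        addresses.append(addr)
        addr = 0x8000 if addr == 0xFFFF else addr + 1
    return addresses

print(dmc_registers(0xC400, 257))
```

Whatever buffer a streaming scheme rewrites has to sit at one of these 64-byte-aligned start points in $C000-$FFFF, and a sample that runs past $FFFF continues at $8000.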

I tend to want to go low on this so I'd see the benefits in other areas:

Memblers wrote:
assume smaller CPU RAM greatly allows other mapper features, finer sized CHR pages, lowers hardware cost, etc.


The CHR, and how it works, will be more important to me than PRG size. Code size you can almost always find ways to reduce, especially given the large array of options this mapper would present. Graphics, however, like DPCM, always have to be there during the frame. I almost feel like I have to know for certain how the CHR will work and how much of it will have to pass through CPU RAM. If I'd like to store a hypothetical maximum of 128KB CHR on a per-load basis, how does that affect my CPU RAM requirements?

All of this considered, I wonder how substantial a difference I would see in other areas from a total lack of banking for CPU RAM, i.e. 32 KB. If I can load my per-level program data without extraneous information, 32 KB may be workable for me for PRG. I'm still a little fuzzy on how this would affect CHR, though, so I'm not casting my vote yet.

Quote:
8 CPU cycles per byte access is fast enough to load CHR-RAM even with an unrolled loop (LDA absolute, STA $2007), some data like pattern tables and possibly raw nametables wouldn't even need to be in CPU memory ever.


If I'm understanding this properly, I'd need to start a background copy to a certain location, allocate probably 256 bytes for a buffer (so it will wrap easily), and begin loading my tiles into CPU RAM while writing them to VRAM. This contradicts what you said, though, so I believe I am missing something in my interpretation.


Posted: Tue Feb 02, 2016 7:15 am
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19233
Location: NE Indiana, USA (NTSC)
So finally someone is deciding to make the "poor man's FDS" that I've been suggesting for years. I voted for a 128K PRG RAM, and I think it should be paired with a 32K CHR RAM.

"assume it could takes 3, 2 or just 1 write to set the address"
With how much delay between setting the address and the data becoming available? And with 1-byte or 512-byte granularity? If immediate and 1-byte, zzo38 would love this, as it'd let him keep his Z-machine interpreter's story file in sequential ROM.

I'm not sure data copying would be so "behind the scenes" as much as just fast PIO (programmed input and output). It'd be more of an unrolled copy loop. For uncompressed tile data:
Code:
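  ldx #number_of_tiles      ; X = number of 16-byte tiles to copy
  copyloop:
    .repeat 16              ; one 8x8 tile = 16 bytes
      lda SDDATA            ; next byte from the flash data port (4 cycles)
      sta PPUDATA           ; write it to CHR RAM through $2007 (4 cycles)
    .endrepeat
    dex
    bne copyloop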

Copying to main RAM would be slightly slower:
Code:
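  ldx #>data_length         ; X = number of 256-byte pages (data_length a multiple of 256)
  ldy #0
  pageloop:
    .repeat 8
      lda SDDATA            ; next byte from the flash data port
      sta (dstptr),y        ; store through the zero-page destination pointer
      iny
    .endrepeat
    bne pageloop            ; Y wraps to 0 after 256 bytes, since 8 divides 256
    inc dstptr+1            ; move the destination pointer to the next page
    dex
    bne pageloop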
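To put numbers on "slightly slower": using the standard 6502 cycle counts (LDA absolute 4, STA absolute 4, STA (zp),Y 6, INY/DEX 2, taken branch 3 when it doesn't cross a page), the per-byte cost of the two loops above works out as below. This is a rough model only; it ignores the final untaken branch and the once-per-page counter and pointer updates.

```python
LDA_ABS = 4        # lda SDDATA
STA_ABS = 4        # sta PPUDATA
STA_IND_Y = 6      # sta (dstptr),y: indexed stores always take the full 6 cycles
INY = DEX = 2
BRANCH_TAKEN = 3   # assumes the branch target is on the same page

def chr_copy_cycles_per_byte() -> float:
    """Tile loop: 16 x (lda abs + sta abs), then dex / bne once per tile."""
    per_tile = 16 * (LDA_ABS + STA_ABS) + DEX + BRANCH_TAKEN
    return per_tile / 16

def ram_copy_cycles_per_byte() -> float:
    """Page loop: 8 x (lda abs + sta (zp),y + iny), then bne once per group of 8."""
    per_group = 8 * (LDA_ABS + STA_IND_Y + INY) + BRANCH_TAKEN
    return per_group / 8

print(chr_copy_cycles_per_byte())   # 8.3125
print(ram_copy_cycles_per_byte())   # 12.375
```

Both loops space their reads of SDDATA at least 8 cycles apart, so neither outruns the port; the copy to main RAM costs roughly 50% more per byte.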


With FDS-style 32K PRG RAM, you can't (say) bankswitch DPCM samples, and with 8K CHR RAM, you can't as easily do SMB2/3-style animated background tiles. But SRAMs are sold in sizes that are a power of four bits, meaning the next sizes up are 128K PRG RAM and 32K CHR RAM, which should be enough for anything I can think of. I remember having worked on a handful of projects bigger than that, but they could be reengineered to fit:
  • "All Your Base" slideshow viewer: large CHR ROM, but accessed sequentially and thus suitable for double buffering.
  • "Breaking the Law" and "Max 300" sample players: lots of samples, but likewise accessed sequentially.
  • Action 53 vol. 1 and 2: 512K PRG ROM, though the biggest single activity in either of them is 128K. Loading a new activity would be entirely sequential.
  • E-book reader: I was planning a compression scheme that represents text as a sequence of indices into a dictionary up to 64K in size. I'd need random access to the dictionary, but the compressed text is accessed sequentially within each page.
  • Haunted: Halloween '85: Three-fourths of the 512K ROM is background maps, each 16K at the most and accessed sequentially, plus cut scene tiles, also loaded sequentially. What needs to be kept in RAM are sprite tiles, collision maps, music, and game code, which total probably under 80K. Depending on the granularity, I might be able to move the sprite tiles and the like out of main memory, barely allowing everything to fit in 64K.
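The "power of four bits" rule of thumb is easy to check: byte-wide 8K, 32K and 128K parts hold 2^16, 2^18 and 2^20 bits, while a 64K part holds 2^19 bits, which is why it is the odd size out. A tiny sketch (function names are illustrative):

```python
def is_power_of_four(n: int) -> bool:
    """True for 1, 4, 16, 64, ...: a single set bit at an even bit position."""
    return n > 0 and (n & (n - 1)) == 0 and n.bit_length() % 2 == 1

def sram_bits(kib: int) -> int:
    """Capacity in bits of a byte-wide SRAM holding `kib` kilobytes."""
    return kib * 1024 * 8

for kib in (8, 32, 64, 128, 512):
    print(f"{kib:>3} KB: power of four bits? {is_power_of_four(sram_bits(kib))}")
```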


Posted: Tue Feb 02, 2016 7:31 am
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10114
Location: Rio de Janeiro - Brazil
The answer will vary greatly depending on the types of games people want to make. For the majority of homebrew games released so far, I'd say even 32KB should be enough, but imposing such a limitation would mean discouraging people from trying to make more advanced games.

I for example am very excited to see 16-bit-like games running on the NES, kinda like the games Hong Kong pirates used to make, but better coded. That usually means larger level maps, more enemies per level, more game state, more complex physics, all things that would have to be readily accessible to the CPU. In a lot of cases, if the programmer is clever, I'd say 64KB would be enough, but to make sure the engine design process won't be a chore and to leave some room for improvement and experimentation of new ideas, 128KB would be ideal.


Posted: Tue Feb 02, 2016 9:00 am
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19233
Location: NE Indiana, USA (NTSC)
I found previous posts about this concept:


Posted: Tue Feb 02, 2016 9:25 am
Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7270
Location: Chexbres, VD, Switzerland
If the loading is extremely fast, then I don't see any point in having more than the good old traditional 32k PRG and 8k CHR. If there is any more, bankswitching will be required, but personally I think it would be confusing to have both bankswitching and your super-modern-ultra-fast loading system. The fast loading system is supposed to replace bankswitching, and as such the "old" bankswitching is undesirable, so you'd get 32k PRG and 8k CHR like the FDS.


Posted: Tue Feb 02, 2016 9:31 am
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19233
Location: NE Indiana, USA (NTSC)
Bregalad wrote:
If the loading is extremely fast, then I don't see any point in having more than the good old traditional 32k PRG and 8k CHR.

In theory, "extremely fast" would involve loading an entire 1024-byte bank into CHR RAM between one scanline and the next. See Cosmic Epsilon for an extreme example, or just any MMC1 or MMC3 game that uses different CHR ROM banks for the playfield and status bar, such as Super Mario Bros. 3. And good luck fitting the title screen of Smash TV into 8K. I don't think this system was intended to be that fast, unless I was misreading something.

[Image: Smash TV title screen]
This title screen has way more than 256 unique tiles
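A quick check of why loading a 1024-byte bank between scanlines is out of reach at 8 cycles per byte: an NTSC scanline is 341 PPU dots, about 113.67 CPU cycles. The sketch assumes the 8-cycle rate and nothing else competing for CPU time.

```python
CYCLES_PER_SCANLINE = 341 / 3   # NTSC: 341 PPU dots per line, 3 dots per CPU cycle

def bytes_per_scanline(cycles_per_byte: float = 8) -> int:
    """Whole bytes that can be streamed in one scanline's worth of CPU time."""
    return int(CYCLES_PER_SCANLINE // cycles_per_byte)

def scanlines_needed(n_bytes: int, cycles_per_byte: float = 8) -> float:
    """Scanlines of CPU time needed to stream `n_bytes`."""
    return n_bytes * cycles_per_byte / CYCLES_PER_SCANLINE

print(bytes_per_scanline())               # about 14 bytes per line
print(round(scanlines_needed(1024), 1))   # a 1 KB bank takes about 72 lines
```

Swapping a CHR ROM bank on a mapper like the MMC3, by contrast, is a register write, which is why mid-frame bank tricks don't carry over directly to a load-into-RAM design.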


And if it's foreground copying (as I suspect), and a game uses Sunsoft bass, prepare to spend a lot of time copying samples into $C000-$DFFF.


Posted: Tue Feb 02, 2016 10:18 am
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10114
Location: Rio de Janeiro - Brazil
Bregalad wrote:
I don't see any point in having more than the good old traditional 32k PRG

You might need more than that depending on the complexity of the game world, especially if a good chunk of $C000-$FFFF is used for samples.

Large game worlds, like those in the Sonic the Hedgehog games, need a lot of space (I just checked, and over 40KB of RAM is dedicated just to the current level map in Sonic games). And when you have several layers of structures/metatiles that depend on each other, you can't progressively decompress small sections of the level; you need immediate access to all structures because any of them can be indexed at any time. Of course you can sacrifice some things and make it fit in less space, but since we're talking about new hardware, I don't see why not bet on versatility. With more resources, you can come up with a greater number of ways to implement features, and you can get things done without having to come up with crazy clever ways to do everything that might still have undesirable restrictions.
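To illustrate why layered structures need to stay randomly accessible, here is a minimal lookup through a hypothetical two-level hierarchy (the layout is invented for the example, not Sonic's actual format): the level map holds chunk ids, each chunk is a 4x4 grid of block ids, and each block is 2x2 tiles. A single tile lookup can land in any chunk and any block definition, so all three tables must be resident at once.

```python
def tile_at(tx: int, ty: int, level_map: list[int], map_w: int,
            chunks: list[list[int]], blocks: list[list[int]]) -> int:
    """Return the tile id at tile coordinates (tx, ty)."""
    bx, by = tx // 2, ty // 2                  # which 2x2-tile block
    cx, cy = bx // 4, by // 4                  # which 4x4-block chunk
    chunk_id = level_map[cy * map_w + cx]      # level map: row-major chunk ids
    block_id = chunks[chunk_id][(by % 4) * 4 + bx % 4]
    return blocks[block_id][(ty % 2) * 2 + tx % 2]

# Two block definitions, two chunks (each filled with one block), a 2x1 map.
blocks = [[0, 1, 2, 3], [4, 5, 6, 7]]
chunks = [[0] * 16, [1] * 16]
level_map = [0, 1]
print(tile_at(9, 1, level_map, 2, chunks, blocks))
```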

In addition to level maps, several other things can consume a lot of space as the complexity of the game grows: logic for all objects (or would you try to make the object routines relocatable so you'd only need to load the code for objects that are actually present in each level?), sprite animations, music code and data... I can easily see all of that going over 32KB in any game more complex-looking than NROM/CNROM games. I maintain the opinion that 128KB should be enough for programmers not to feel trapped even when working on bigger, more 16-bit-like games, as long as they're not making SimCity or anything that needs obscene amounts of game state.


Posted: Tue Feb 02, 2016 10:34 am
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10114
Location: Rio de Janeiro - Brazil
As for the amount of CHR-RAM, more than 32KB is only really useful IMO if it's divided into banks of 2KB (or less). If you can only change the whole 8KB at once, you're restricted to buffering only one type of animation, while everything else is repeated. For example, you can dedicate 1KB of each 8KB bank to the player character, for a total of 4KB worth of player tiles, but you can't change anything else because nothing animates in sync with the player. Another thing you can do is change the background slightly in each bank and have 4 frames of waterfalls, fluffy clouds, sparkling water, swaying leaves... but then you can't even upload new tiles for the player character, because the player doesn't animate in sync with the background. This makes 8KB CHR banking pretty much useless in a typical game scenario. It can still be useful for cutscenes, FMV sequences, 3D rendering, or other experimental stuff, but that's hardly the point.
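The combinatorics behind this, as a sketch: if two animations run independently and only the whole 8 KB can be switched, every combination of their frames needs its own bank, whereas per-window banking only needs each animation's own frames. This assumes the animations share no period or phase; animations running in lockstep need fewer banks.

```python
from math import prod

def banks_whole_8k(frame_counts: list[int]) -> int:
    """8 KB banks needed when independent animations share one switchable bank."""
    return prod(frame_counts)

def frames_fine_banking(frame_counts: list[int]) -> int:
    """Frame slots needed when each animation has its own small switchable window."""
    return sum(frame_counts)

player_and_waterfall = [4, 4]
print(banks_whole_8k(player_and_waterfall) * 8, "KB of CHR RAM with 8 KB switching")
print(frames_fine_banking(player_and_waterfall), "frame slots with finer switching")
```

Two 4-frame animations already need 128 KB with whole-bank switching, four times a 32 KB CHR RAM.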


Posted: Tue Feb 02, 2016 1:00 pm
Joined: Sat Jul 25, 2015 1:22 pm
Posts: 501
tepples wrote:
I'm not sure data copying would be so "behind the scenes" as much as just fast PIO (programmed input and output). It'd be more of an unrolled copy loop. For uncompressed tile data:

If I'm understanding this correctly, you'd have one mapper register exposed for LDA to read data from, and another set of registers holding the address. Loading from the read register would cause the address counter to auto-increment.

If this is correct, could it possibly be cheap to allow for specifying an increment value? This would be similar to the 32-byte increment for VRAM, and would allow for routines that read from the mass storage directly. If this were possible, then map data would never have to be buffered into PRG-RAM. If one were scrolling horizontally on a map with 16x16 metatiles, one could specify an increment of 16 and load a column of metatile numbers directly.

Assuming all of this is accurate, I could see an arbitrary increment value being WAY more useful than having any mapping on PRG. Think about all of the ways you could use it. It would become an index register for your mass storage reads, like accessing megabytes of data with indirect indexed precision. Map data, object data, virtually anything I can imagine aside from code could be stored on mass storage. Also, you're getting the functionality of an LDA (indirect),Y at the speed of an LDA absolute.
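A small software model of the proposal, assuming a data port that returns the byte at the current address and then adds a programmable stride. Everything here, including the `StreamPort` name and the 16-metatile map width, is hypothetical. Reading one column of a row-major map 16 metatiles wide is then just a run of reads with a stride of 16:

```python
class StreamPort:
    """Hypothetical mass-storage data port with a programmable auto-increment."""

    def __init__(self, storage: bytes):
        self.storage = storage
        self.addr = 0
        self.stride = 1

    def set_address(self, addr: int, stride: int = 1) -> None:
        self.addr = addr
        self.stride = stride

    def read(self) -> int:
        """Return the byte at the current address, then advance by the stride."""
        value = self.storage[self.addr]
        self.addr += self.stride
        return value

MAP_W, MAP_H = 16, 15                 # one screen of 16x16-pixel metatiles
level = bytes(range(MAP_W * MAP_H))   # dummy map: metatile id = row * 16 + column
port = StreamPort(level)
port.set_address(5, stride=MAP_W)     # top of column 5
column = [port.read() for _ in range(MAP_H)]
print(column)                         # 5, 21, 37, ... one entry per row
```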

Is there any hope that this would be possible?


Posted: Tue Feb 02, 2016 1:30 pm
Joined: Sun Apr 13, 2008 11:12 am
Posts: 6446
Location: UK (temporarily)
Memblers wrote:
  • 8 CPU cycles per byte access is fast enough to load CHR-RAM even with an unrolled loop (LDA absolute, STA $2007), some data like pattern tables and possibly raw nametables wouldn't even need to be in CPU memory ever.
  • assume it could takes 3, 2 or just 1 write to set the address (if it could preserve your previously used address)
  • assume the 'mass storage' source is huge.. megabytes at the minimum, and inexpensive

I just want to point out that, even as non-preferred as CompactFlash is, this is almost exactly a CompactFlash card. It already supports streaming, it already supports fast I/O, it's already 5V tolerant, &c. The only downside is its higher per-byte cost than other Flash storage media.


Posted: Tue Feb 02, 2016 2:17 pm
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19233
Location: NE Indiana, USA (NTSC)
The "8 cycles per read" gave me the impression that it was some sort of serial flash, like eMMC, with a CPLD or MCU reading a bit every M2 and collecting them in a shift register.
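What that looks like, modeled in software: a shift register that takes one bit per M2 cycle, most significant bit first (the usual order for SPI flash data), yields a byte every 8 cycles. The function names are illustrative only.

```python
def shift_in_byte(bits: list[int]) -> int:
    """Clock 8 bits into an 8-bit shift register, MSB first."""
    value = 0
    for bit in bits:
        value = ((value << 1) | (bit & 1)) & 0xFF
    return value

def bytes_from_bitstream(bits: list[int]) -> bytes:
    """Group a serial bit stream into bytes, one byte per 8 clocks."""
    whole = len(bits) - len(bits) % 8
    return bytes(shift_in_byte(bits[i:i + 8]) for i in range(0, whole, 8))

stream = [1, 0, 1, 0, 0, 1, 0, 1,   # 0xA5
          0, 0, 1, 1, 1, 1, 0, 0]   # 0x3C
print(bytes_from_bitstream(stream).hex())   # a53c
```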


Posted: Thu Feb 04, 2016 2:33 am
Site Admin
Joined: Mon Sep 20, 2004 6:04 am
Posts: 3484
Location: Indianapolis
Thanks for the input. The setup would be precisely as tepples said, SPI flash clocked by M2, which you read/write 8 bits at a time through a mapper register. microSD connectors can be cheap, so it might be nice to have that as an optional add-on, in case anyone wanted to go nuts on storage. For the SPI flash I'd be looking at doing 2Mbytes as the standard offering, bigger sizes are available and smaller ones aren't cheap enough to really matter.

I guess I'm not too surprised that a minimal setup has so many votes, as this sorta does replace bankswitching. I think most games could fit most code and data for a single level/area in 32kB. Kinda interesting too that some stuff, like CHR data would actually be faster to load this way because you would have auto-incrementing on the source and destination addresses.

Though I can see the appeal of a very basic mapper with lots of ROM, I kinda feel like I don't want to make the mapper too simple, to the point where it overlaps with the GTROM board. I think most stuff with simple mapper requirements can fit in GTROM's 512kB, and that will remain my low-cost board for the foreseeable future. If orders keep coming for it, eventually I'll be able to do a second run; then I can write off the setup fees and test fixture costs and lower the price a bit, approaching a single-digit dollar cost for an assembled/tested/programmed board even for a small quantity purchase.

For this mapper, I'd be looking at keeping the mapper budget under $4 (cost of the mapper chips + 3.3V regulator alone). At this cost I can make the MMC3 look weak by comparison, and I've got some novel ideas too, otherwise I wouldn't even bother because what's the fun in making another clone mapper, heheh. :)

Never mind what I said about 1, 2 or 3 writes for setting the address; after I posted I realized it would be a big waste of resources to have this mapper handle any of that. I think you would actually have to write 6 times to change the address, but I may be able to reduce that to 5. So that's kind of a pain, but at least you can read all you want after that address is set. And it's a lot better than doing it serially, which would require more like 34 writes.
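For reference, on common SPI NOR flash a standard Read Data command is the opcode 0x03 followed by a 24-bit address, most significant byte first, after which data streams out for as long as chip select stays asserted. That is 4 bytes, or 32 bit clocks, before the first data bit, which is presumably where a serial figure of around 34 writes comes from once chip select is handled. How the mapper splits this into 5 or 6 writes isn't specified, so this sketch only builds the bytes that go out on the bus (check the specific flash part's datasheet for its opcodes):

```python
SPI_READ = 0x03  # "Read Data" opcode on common SPI NOR flash

def spi_read_header(addr: int) -> bytes:
    """Opcode plus 24-bit big-endian address for a sequential read."""
    if not 0 <= addr < 1 << 24:
        raise ValueError("address must fit in 24 bits")
    return bytes([SPI_READ, (addr >> 16) & 0xFF, (addr >> 8) & 0xFF, addr & 0xFF])

def to_bits(data: bytes) -> list[int]:
    """Serialize bytes MSB first, one entry per SPI clock."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]

header = spi_read_header(0x012345)
print(header.hex(), len(to_bits(header)))   # 03012345 32
```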

With the memory sizes, I mentioned 64kB because, even though it's an odd size, I've come across a virtual lifetime supply of old surplus 64kB RAMs and I'm really tempted to use those for both PRG and CHR. Inevitably those would run out for one reason or another, and then I'd probably redesign it for 128kB. So if I have room I'll either leave an extra bit in every register, or, if I have to, just have a single 'meta-banking' bit that selects the upper/lower 64kB of a larger memory. I'm certainly tempted to go this way and save a couple bucks per board for the time being, unless there are any major objections to 64kB instead of 128kB. Decisions, decisions... but I'll have more time to think it over while I work on the mapper implementation.
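One way a single 'meta-banking' bit could work, sketched with made-up parameters (8 KB windows and a 128 KB SRAM are assumptions, not the actual design): the bit simply drives address line A16, choosing which 64 KB half the normal bank registers see.

```python
BANK_SIZE = 8 * 1024          # hypothetical window size
HALF_SIZE = 64 * 1024         # what the existing bank bits can reach

def sram_address(meta_bit: int, bank: int, offset: int) -> int:
    """Physical SRAM address for a bank/offset pair plus the extra half-select bit."""
    if not 0 <= bank < HALF_SIZE // BANK_SIZE:
        raise ValueError("bank out of range for a 64 KB half")
    if not 0 <= offset < BANK_SIZE:
        raise ValueError("offset outside the bank window")
    return ((meta_bit & 1) << 16) | (bank * BANK_SIZE + offset)

print(hex(sram_address(0, 3, 0x10)))   # 0x6010: lower 64 KB
print(hex(sram_address(1, 3, 0x10)))   # 0x16010: same bank in the upper 64 KB
```

In this sketch a board fitted with a 64 KB part would just leave A16 unconnected, so software written against the bank registers would not need to change.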


Posted: Thu Feb 04, 2016 7:14 am
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19233
Location: NE Indiana, USA (NTSC)
I didn't even know they made 64Kx8 SRAMs. I thought it was either 32K or 128K.

Where does the reset vector come from? I thought unlike a Super NES cart, an NES cart couldn't hold the system in reset until the boot sector was loaded into RAM.


Posted: Thu Feb 04, 2016 7:33 am
Joined: Sat Jul 25, 2015 1:22 pm
Posts: 501
I think I could probably do my game with 64KB for CHR. That being said, I feel like it could become a restriction on some graphically intensive games.

Metal Storm swaps through 64KB of BG tiles on a continuous basis throughout gameplay.

Playing to the strengths of the NES, you can't make really big characters, and you can't make really colorful characters, but you can play a lot of frames of animation really fast because of CHR banking.

