All times are UTC - 7 hours





[ 11 posts ]
Author Message
Posted: Fri Sep 07, 2018 10:45 am
Formerly WheelInventor

Joined: Thu Apr 14, 2016 2:55 am
Posts: 1783
Location: Gothenburg, Sweden
Preamble - skip if uninteresting.
In my notebook of wacky designs to maybe flesh out some day, there's a note about using a dual-ported RAM with two separate read and write indexes going round and round to make an audio delay/loop unit (as an alternative to a bucket brigade delay or a conventional RAM-based digital delay unit). The distance between the two indexes determines the delay length. If you stop writing new data and create an artificial wraparound, you get audio looping with settable loop lengths. The dual ports also mean you could get away with doing the read and write business at half the clock frequency, since two separate circuits, each with its own port into the RAM, can do both simultaneously as long as they don't hit the same cell at the same time. I don't really know if that'll work out nicely unless I can find the time to put some actual effort into it eventually, but it got me thinking about something more NES related*: an onboard stack pointer register into WRAM.
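A rough software model of the two-index idea, just to make the behaviour concrete. This is only a sketch under assumptions of mine: the `DelayLine` class, its sizes and the `start_loop` method are made-up names, and real hardware would do the read and the write in parallel on the two ports rather than one after the other.

```python
class DelayLine:
    """Circular RAM with separate read and write indexes (hypothetical model)."""

    def __init__(self, size, delay):
        assert 0 < delay < size
        self.ram = [0] * size
        self.size = size
        self.delay = delay
        self.read_idx = 0
        self.write_idx = delay    # write index leads the read index by `delay` cells
        self.looping = False

    def start_loop(self, loop_len):
        """Stop writing and wrap the read index artificially after loop_len cells."""
        assert 0 < loop_len <= self.delay
        self.looping = True
        self.loop_start = self.read_idx
        self.loop_len = loop_len
        self.loop_pos = 0

    def tick(self, sample_in):
        """One sample period: returns the delayed (or looped) sample."""
        if self.looping:
            out = self.ram[(self.loop_start + self.loop_pos) % self.size]
            self.loop_pos = (self.loop_pos + 1) % self.loop_len
            return out
        out = self.ram[self.read_idx]
        self.ram[self.write_idx] = sample_in
        self.read_idx = (self.read_idx + 1) % self.size
        self.write_idx = (self.write_idx + 1) % self.size
        return out
```

With `DelayLine(size=8, delay=3)`, each input sample comes back out three ticks later; after `start_loop(2)` the two oldest stored samples repeat indefinitely.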

*Well, I guess something similar could be used to create a delay/looper unit to expand the NES APU without needing an expansion synth. For example, laying down a basic bass, drum or chord pattern that loops, or freeing a channel from delay duties. The question is whether the total circuit can get cheaper than the $6 needed for a bucket brigade delay + clock. Both need some passive filtering to sound nice, but that's the smallest cost component.

===

The philosophical, more nes dev related question.
Would it be useful to have a mapper with a hardware-accelerated additional stack? That is, you have an r/w register where every access also increments/decrements an onboard index register used for pushing/pulling the stack-in-WRAM.

That is, compared to just doing it manually in RAM or exploiting the PC stack. An obvious downside is having to use WRAM in the first place. For someone going for WRAM anyway, it might be nice.
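To pin down what such a register would do, here is a minimal behavioural sketch. Everything in it is an assumption for illustration: the `MapperStack` name, the 8 KB WRAM size, the base offset, the depth, the wrap-around, and the choice of store-then-increment on writes and decrement-then-fetch on reads.

```python
class MapperStack:
    """Hypothetical mapper register backed by a slice of WRAM."""

    def __init__(self, wram, base, depth):
        self.wram = wram      # shared WRAM, still directly addressable by the CPU
        self.base = base      # offset of the stack region inside WRAM (assumed)
        self.depth = depth    # bytes reserved for the stack (assumed)
        self.sp = 0           # onboard index register

    def write(self, value):
        """CPU write to the port: store the byte, then bump the index (push)."""
        self.wram[self.base + self.sp] = value & 0xFF
        self.sp = (self.sp + 1) % self.depth

    def read(self):
        """CPU read from the port: step the index back, then fetch (pull)."""
        self.sp = (self.sp - 1) % self.depth
        return self.wram[self.base + self.sp]


wram = bytearray(0x2000)                      # 8 KB of WRAM, assumed
stack = MapperStack(wram, base=0x1F00, depth=32)
stack.write(0x12)
stack.write(0x34)
top = stack.read()                            # last byte pushed comes back first
```

Since the data lives in ordinary WRAM, the CPU could still peek at the pushed bytes directly, which is the one thing a stack hidden inside the mapper wouldn't allow.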

What sort of usage, if any, would benefit from having another hardware-driven stack?

_________________
http://www.frankengraphics.com - personal NES blog


Posted: Fri Sep 07, 2018 2:27 pm

Joined: Sat Aug 15, 2015 3:42 pm
Posts: 147
Location: France
I like the idea. Do you have a code example of how it could be used?

_________________
My first game : Twin Dragons available at Broke Studio.


Posted: Fri Sep 07, 2018 3:17 pm
Formerly WheelInventor

Joined: Thu Apr 14, 2016 2:55 am
Posts: 1783
Location: Gothenburg, Sweden
Which of them? :mrgreen:

If you mean the additional hardware stack, that's the question... this is not something I've touched before. I guess it could make a C implementation a little simpler?

Interfacing with the register responsible for the on-cartridge stack is always going to take 4 cycles. This is as efficient as pulling things off the PC stack (4), but a little less efficient than pushing things onto it (3).
It is on par with just doing indexed reads into some unused part of RAM. So we're not gaining anything instruction for instruction.

But for nesting routines and wanting to have a(n otherwise virtual) data stack, this should be more convenient than storing stuff manually in RAM. Well, you still need to push and pull it manually, but you wouldn't need to index it manually and keep track of the index.
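Putting numbers on that, including the index upkeep that the port would take over. The cycle counts are the documented 6502 ones with no page crossings; the manual stack layout (`dex` + `sta stack,x` to push, `lda stack,x` + `inx` to pull) is just one assumed way of doing it.

```python
# Documented 6502 cycle counts, no page crossings assumed.
CYCLES = {
    "pha": 3, "pla": 4,
    "lda abs": 4, "sta abs": 4,
    "lda abs,x": 4, "sta abs,x": 5,
    "inx": 2, "dex": 2,
}


def cost(*instructions):
    """Total cycles for a short instruction sequence."""
    return sum(CYCLES[i] for i in instructions)


# (push cycles, pull cycles) for each approach
push_pull = {
    "CPU stack": (cost("pha"), cost("pla")),
    "mapper port": (cost("sta abs"), cost("lda abs")),
    "manual RAM stack": (cost("dex", "sta abs,x"), cost("lda abs,x", "inx")),
}

for name, (push, pull) in push_pull.items():
    print(f"{name:18s} push {push}  pull {pull}")
```

So the access itself is a wash, and whatever is gained comes from not spending cycles (and X) on maintaining the index.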



If you meant the looper/delay conceptualized around a dual-ported RAM device, it's just a loose sketch for my synth building hobby (no software involved), although it could perhaps apply to a NES cartridge. The halved clock rate could potentially help with lowering power consumption if anything MOSFET-related needs to do fast switching. This RAM wouldn't need to be visible to the CPU at all btw. It's just for the audio path.



A third, much simpler project, which I just wrote a line about in another thread, is adding suboctaves to the channels of the APU (to the pulses primarily). It's like ~0.25€ per channel for the effect itself, and then the cost of the interface to be able to gate or volume control it.

To be able to preview the effect, one would need to write a new module for famitracker with the intended control interface as a single column, and copy the sound off the parasited APU channel and do the software equivalent of a D-flipflop.
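The software equivalent of the flip-flop is small. A sketch, assuming the channel is available as a list of signed samples and that a simple threshold is good enough to detect edges; `suboctave` and `threshold` are made-up names.

```python
def suboctave(samples, threshold=0):
    """Divide-by-two: toggle the output on every rising edge of the input,
    like a D flip-flop clocked by the channel with its inverted output fed back."""
    out = []
    state = False        # flip-flop output
    prev_high = False
    for s in samples:
        high = s > threshold
        if high and not prev_high:   # rising edge clocks the flip-flop
            state = not state
        prev_high = high
        out.append(1 if state else -1)
    return out


square = [1, 1, -1, -1] * 4          # input with a period of 4 samples
octave_down = suboctave(square)      # output has a period of 8 samples
```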

Posted: Fri Sep 07, 2018 3:28 pm

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 6889
Location: Canada
So you're proposing something like this?
Code:
sta $4080 ; push A to mapper stack
lda $4080 ; pull A from mapper stack

You might consider not storing this stack in CPU space at all, eliminating the dual ported RAM thing, unless you need direct random access to it?

I wouldn't really have any ideas for what to use this for. I think it might make a FORTH implementation easier/faster, since it tends to require multiple stacks?

...but to me it's kind of a "solution" looking for a problem, rather than the other way around.


Back toward the question of audio, though, you might also consider Super Russian Roulette, which I believe (not certain) has a large serial (1-bit at a time) ROM that automatically reads out 8 bits to a register after every DPCM read, essentially allowing DPCM data to be streamed indefinitely. The quality is unfortunately NES DPCM quality but on the upside it uses only ~4% of the CPU time (normal DPCM cycle stealing) and doesn't get interrupted by NMI/etc.


Posted: Fri Sep 07, 2018 3:30 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 7664
Location: Seattle
FrankenGraphics wrote:
But for nesting routines and wanting to have a(n otherwise virtual) data stack, this should be more convenient than storing stuff manually in RAM. Well, you still need to push and pull it manually, but you wouldn't need to index it manually and keep track of the index.
The important thing in C is being able to address things relative to the stack; there's a reason why cc65 uses the (sp),y addressing mode. So it's not enough to have a single byte access: you really want something equivalent to (sp),y, with some external hardware that speeds up reading the stack.

However, (sp),y is only one cycle slower than abs,y, and abs,y is the same speed as abs (as long as there's no page crossing). And you still have to be able to move the stack pointer separately from reading/writing to it. The big thing that slows down cc65's software stack—I think, I might be misremembering—is repeatedly having to load values into Y.
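For anyone not used to that addressing mode, here's a small model of what `lda (sp),y` does, plus the per-access cost when Y has to be reloaded each time. The zero-page location of `sp` and the stack address are arbitrary picks for the example; the cycle counts are the documented ones without page crossings.

```python
def read_local(mem, sp_zp, y):
    """Emulate `lda (sp),y`: fetch the 16-bit pointer stored at zero-page
    address sp_zp (low byte first), add Y, and read the byte there."""
    base = mem[sp_zp] | (mem[sp_zp + 1] << 8)
    return mem[(base + y) & 0xFFFF]


mem = bytearray(0x10000)
SP_ZP = 0x02                     # assumed zero-page home of the stack pointer
mem[SP_ZP], mem[SP_ZP + 1] = 0x00, 0x7F   # software stack pointer = $7F00
mem[0x7F03] = 0xAB               # a local variable at offset 3 in the frame
value = read_local(mem, SP_ZP, 3)

# Cycles per access, no page crossing.
LDY_IMM, LDA_IND_Y, LDA_ABS = 2, 5, 4
cost_stack_local = LDY_IMM + LDA_IND_Y    # ldy #offset / lda (sp),y
cost_static_var = LDA_ABS                 # lda variable
```

The point being that a push/pull port only ever reaches the top byte, while compiled C wants any offset within the frame.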

Quote:
A third, much simpler project, which I just wrote a line about in another thread, is adding suboctaves to the channels of the APU (to the pulses primarily). It's like ~0.25€ per channel for the effect itself, and then the cost of the interface to be able to gate or volume control it.
But there's no way from the cartridge to only affect one voice... And only the Famicom provides the ability to get in the middle of the audio path; the most we can do on the NES is mix something new in.


Posted: Fri Sep 07, 2018 4:35 pm
Formerly WheelInventor

Joined: Thu Apr 14, 2016 2:55 am
Posts: 1783
Location: Gothenburg, Sweden
lidnariq wrote:
But there's no way from the cartridge to only affect one voice...

Ack, didn't think that far.. Mixing two square voices before a flip-flop divider is only useful in a few situations, and a whole mix practically never. Just the one voice.. well, you could use the other square for octaving in that case.

So, mostly only useful if it was an option tucked onto a synthesizer on-cart.

rainwarrior wrote:
So you're proposing something like this?

Yeah that was what i thought about.

The point of having it open to random access is.. well it's rather the other way around. If you already have WRAM for some good reason (battery backed RAM, simulation games, a highly manipulable game screen, self-modifying code, whatever it may be), you could additionally use a portion of it to have another stack with hardware driven features if the mapper allowed for it.

You're also right about the solution looking for a problem. I wanted to see what you guys had to say about its use (if any) for C, since that seems to be more and more popular, or if there's anything in assembly one might need it for. But, you could also keep a shallow data stack in ZP. I don't think anyone's attempted FORTH for game development here?

Quote:
Back toward the question of audio, though, you might also consider Super Russian Roulette, which I believe (not certain) has a large serial (1-bit at a time) ROM that automatically reads out 8 bits to a register after every DPCM read, essentially allowing DPCM data to be streamed indefinitely. The quality is unfortunately NES DPCM quality but on the upside it uses only ~4% of the CPU time (normal DPCM cycle stealing) and doesn't get interrupted by NMI/etc.

That's a pretty neat trick. It also means one wouldn't need modifications or expansion port dongles for it to be heard on the NES.

Posted: Fri Sep 07, 2018 4:45 pm

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 6889
Location: Canada
FrankenGraphics wrote:
The point of having it open to random access is.. well it's rather the other way around. If you already have WRAM for some good reason (battery backed RAM, simulation games, a highly manipulable game screen, self-modifying code, whatever it may be), you could additionally use a portion of it to have another stack with hardware driven features if the mapper allowed for it.

I think there's a bit of magnitude of difference between "already have WRAM" and "already have dual-ported WRAM", though. (There was some theoretical discussion about how to cycle-steal to get around needing the dual port, but it sounded like a complicated thing to build.)

FrankenGraphics wrote:
I don't think anyone's attempted FORTH for game development here?

I'm not really a FORTHer myself, but some users here seemed to have been working with it on the NES at least a bit (Garth, pubby, maybe others).


Posted: Fri Sep 07, 2018 5:47 pm
Formerly WheelInventor

Joined: Thu Apr 14, 2016 2:55 am
Posts: 1783
Location: Gothenburg, Sweden
Quote:
a bit of magnitude of difference between "already have WRAM" and "already have dual-ported WRAM"

yeah i think it's ~5€ extra for the dual port feature, not good at all unless you have a very pronounced use for it.

Do you need it to be dual ported for this task though? I don't think you're ever accessing it more than once at any time. The sp needs to be stored and read from somewhere, I guess, which you could keep in an SMT array of flip-flops for 0.45€* (first quoted as 0.15€) or so.

The dual porting was for creating the low-clocked, RAM-based digital delay with two revolving indices/stack pointers for simultaneous reads and writes, which got me thinking about stacks in general. But even for the audio purpose, some ~8€ might be too much if put on a cartridge. As a synth module meant as part of a musical instrument, this price is negligible. For a game - not so much. Taking turns reading and writing at a higher clock speed might be more reasonable after all.


*edit: forgot filtering out cmos level devices in my search at first.


Last edited by FrankenGraphics on Fri Sep 07, 2018 5:55 pm, edited 1 time in total.

Posted: Fri Sep 07, 2018 5:53 pm

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 6889
Location: Canada
FrankenGraphics wrote:
Do you need it to be dual ported for this task though? I don't think you're ever accessing it more than once at any time.

Ahh, yeah I guess you don't actually. I think your mapper would have to intercept all of the address lines to the RAM to be able to address the reads and writes to the stack pointer, but the data bus can go straight through. (That's probably a lot of pins to claim on your CPLD though.)
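Roughly what I mean by intercepting the address lines, as a behavioural sketch rather than anything resembling real CPLD logic. The port address is borrowed from the example earlier in the thread; `STACK_BASE` and the function name are made up, and WRAM is assumed to sit at the usual $6000-$7FFF.

```python
PORT = 0x4080          # port address from the example earlier in the thread
STACK_BASE = 0x1F00    # assumed offset of the stack region inside an 8 KB WRAM


def wram_address(cpu_addr, sp):
    """Pick the address driven onto the WRAM chip for one CPU access.
    Returns None when the WRAM is not selected at all."""
    if cpu_addr == PORT:
        return STACK_BASE + sp       # mapper substitutes the stack pointer
    if 0x6000 <= cpu_addr <= 0x7FFF:
        return cpu_addr - 0x6000     # ordinary WRAM access passes through
    return None
```

Only the address side needs this multiplexing; the data byte is the same either way, which is why the data bus can bypass the mapper.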


Posted: Sat Sep 08, 2018 3:07 am
Formerly WheelInventor

Joined: Thu Apr 14, 2016 2:55 am
Posts: 1783
Location: Gothenburg, Sweden
About pins (and provided I need instant sp access for every call), what would count as a significant stack depth improvement over keeping it in ZP? 8? 16? 32? More?
I guess the need for depth depends on how far one would want recursion to go?

Also, I need to look into what could be done about (),y.

- But in sum, the utility of a secondary hardware-driven stack doesn't seem that strong (in relation to already existing software techniques); it's only something that might be interesting if it can be had as a cheap (max $1) add-on to a preexisting WRAM requirement.

Posted: Sat Sep 08, 2018 7:34 am

Joined: Wed Apr 02, 2008 2:09 pm
Posts: 1251
The pro of a stack is that it auto increments. An auto incrementing read/write would make a lot of stuff faster. Having to push there (or copy from ROM to WRAM) first isn't terrible, but it eats a lot of the gains.

So the concept of the stack is beneficial so long as I can set the stack pointer to any place in ROM. Something like $2006/$2007 pair except for the CPU address space. Even better if there are two autoincrementing registers, for source and destination.
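A sketch of what I'm picturing, modelled loosely on how $2006/$2007 behave (two address writes, high byte first, then a data port that bumps the address on every read). The class and method names are invented, and the memory image is just a stand-in for whatever the cartridge can see.

```python
class AutoIncPort:
    """Hypothetical address/data register pair in the spirit of $2006/$2007."""

    def __init__(self, memory):
        self.memory = memory      # stand-in for CPU-visible ROM/WRAM
        self.addr = 0
        self.latch_high = True    # like $2006: first write is the high byte

    def write_addr(self, byte):
        if self.latch_high:
            self.addr = (byte << 8) | (self.addr & 0x00FF)
        else:
            self.addr = (self.addr & 0xFF00) | byte
        self.latch_high = not self.latch_high

    def read_data(self):
        """Fetch the byte at the current address, then auto-increment."""
        value = self.memory[self.addr]
        self.addr = (self.addr + 1) & 0xFFFF
        return value


memory = bytes(range(256)) * 256      # dummy 64 KB image: each byte equals its low address byte
port = AutoIncPort(memory)
port.write_addr(0x80)                 # high byte
port.write_addr(0x10)                 # low byte -> pointer = $8010
first_three = [port.read_data() for _ in range(3)]
```

Two of these, one for the source and one for the destination, would cover the copy case.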

The only thing I use a lot of stack for is PPU updates, and only because it's faster. (And it's only faster on pull because of the autoincrement thing, filling it in the first place is slow.) Basically I invest a lot of time filling it so that unfilling it is fast. I'd rather not have to fill it.

You'd save a byte/2 cycles/The Y Register by avoiding using iny for most data accesses with (Indirect),Y, as well as an additional cycle (or two!) because absolute addressing is faster anyway.
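Worked out per byte for an unrolled copy to $2007, assuming no page crossings and a hypothetical auto-incrementing read port at an absolute address:

```python
# Documented 6502 cycle counts, no page crossings.
LDA_IND_Y, LDA_ABS, STA_ABS, INY = 5, 4, 4, 2

per_byte_indirect = LDA_IND_Y + STA_ABS + INY   # lda (ptr),y / sta $2007 / iny
per_byte_port = LDA_ABS + STA_ABS               # lda PORT / sta $2007


def cycles_for(n_bytes, per_byte):
    """Cycles for an unrolled copy of n_bytes (no loop overhead counted)."""
    return n_bytes * per_byte


saved = cycles_for(64, per_byte_indirect) - cycles_for(64, per_byte_port)
```

That's 11 versus 8 cycles per byte, so 192 cycles back on a 64-byte update, before counting the freed-up Y register.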

You'd also be able to save the increment byte and cycles on certain unrolled loops. And if the unrolled loop was the kind without increments, all of the addresses you want to unroll could use ONE unrolled loop, saving many bytes.

Better than both is a CPU DMA, but I have zero idea about the cost effectiveness of any of it. I cannot think of a way I would be helped by the hardware stack. You pay for the fast unfilling by filling it.

Edit: You know what? I change my mind. I don't yet use WRAM, so I've been... ignoring that it's there too. You can copy a lot of data you want unrolled on level load, and then there is room to play. OAM updating is one of the immediate things that comes to mind.

You can use absolute X for the wrapping (to do sprite shuffling) but read the data through the port. This frees up Y to keep track of how many sprites in the metasprite are left, or whatever else.

_________________
https://kasumi.itch.io/indivisible

