nesdev.comhttp://forums.nesdev.com/ Practicality of bytebeat: Can "Crowd" be optimized?http://forums.nesdev.com/viewtopic.php?f=32&t=15661 Page 1 of 1

 Author: tepples [ Sun Mar 12, 2017 6:02 pm ] Post subject: Practicality of bytebeat: Can "Crowd" be optimized? After I saw the suggestion to add the forty-two melody as an easter egg somewhere in Action 53 volume 3, I got to thinking whether more substantial bytebeat compositions such as "Crowd" by Kragen might be doable. Is it possible to compute each sample of "Crowd" in exactly 224 cycles on the MOS 6502? Or are all the 32-bit shifts and bitwise ops too complex?Code:#include static inline void putchar(int c) {  *(uint8_t *)0x4011 = c >> 1;}int main(void) {  for (uint32_t t = 0; ; ++t) {    uint8_t out =       ((t<<1)^((t<<1)+(t>>7)&t>>12))      | t>>(4-(1^7&(t>>19)))      | t>>7;    putchar(out);  }}Or is this a moot point because of undefined behavior involving t>>(4-(1^7&(t>>19)))?

 Author: lidnariq [ Sun Mar 12, 2017 6:20 pm ] Post subject: Re: Practicality of bytebeat: Can "Crowd" be optimized? It doesn't really answer the question, but there are a few bytebeats ported to the 2600: e.g. http://www.acc.umu.se/~tjoppen/files/vcs/genmusic1/

 Author: rainwarrior [ Sun Mar 12, 2017 7:02 pm ] Post subject: Re: Practicality of bytebeat: Can "Crowd" be optimized? tepples wrote:Is it possible to compute each sample of "Crowd" in exactly 224 cycles on the MOS 6502?What's your estimate for a "naive" implementation? (Why does it have to be exactly 224? I'd imagine it would still be musically recognizable at a range of speeds.)tepples wrote:Or is this a moot point because of undefined behavior involving t>>(4-(1^7&(t>>19)))?Since when does "undefined" mean "unimplementable"? (Just in case my tone is unclear, I'm not trying to be dismissive, just not ready to dive in and solve it myself right now. I think this is an interesting idea, so I'll be glad to hear if you come up with anything here.)

 Author: tepples [ Sun Mar 12, 2017 7:37 pm ] Post subject: Re: Practicality of bytebeat: Can "Crowd" be optimized? rainwarrior wrote:tepples wrote:Is it possible to compute each sample of "Crowd" in exactly 224 cycles on the MOS 6502?What's your estimate for a "naive" implementation? (Why does it have to be exactly 224? I'd imagine it would still be musically recognizable at a range of speeds.)I was mostly trying to rule out jitter-fests due to implementations that aren't constant time, such as an inc/bne implementation of t++ or a looping implementation of the term with the variable shift amount. A constant time implementation of t++ alone would take 34 cycles:Code:clclda #1adc t_0sta t_0lda #0adc t_8sta t_8lda #0adc t_16sta t_16lda #0adc t_24sta t_24Calculating one of the shifted versions takes 32 cycles:Code:lda t_0asl asta tshl1_0lda t_8rol asta tshl1_8lda t_16rol asta tshl2_16lda t_24rol asta tshl2_24rainwarrior wrote:tepples wrote:Or is this a moot point because of undefined behavior involving t>>(4-(1^7&(t>>19)))?Since when does "undefined" mean "unimplementable"? "Crowd" is defined by a C program, but C doesn't define what negative shift amounts or shift amounts greater than the size in bits of the left operand do. At t == 0x200000 samples (around 43 minutes 41 seconds), for instance, this particular sub-expression reduces toCode:0x200000>>(4-(1^7&(0x200000>>19)))0x200000>>(4-(1^7&(4)))0x200000>>(4-(1^4))0x200000>>(4-5))One might consider defining the piece to end at the first undefined behavior. But in C, undefined behavior causes time travel:The C committee wrote:However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

 Author: JRoatch [ Sun Mar 12, 2017 7:47 pm ] Post subject: Re: Practicality of bytebeat: Can "Crowd" be optimized? The nice thing about casting shifted 32 bit numbers into 8 bits is that you don't have to bit shift all 4 bytes. Crowd also doesn't use multiplication which was the most expensive part of the forty-two melody.I not sure if the answers to this stackoverflow question helps how t >> -1 is implemented today.Edit:Or since it is 43 minutes in, we could just force the sequence to cycle at that point by changing the term to t>>(4-(1^3&(t>>19))) masking to 2 bits instead of 3.

 Author: rainwarrior [ Sun Mar 12, 2017 8:10 pm ] Post subject: Re: Practicality of bytebeat: Can "Crowd" be optimized? tepples wrote:undefined behavior causes time travelThat's not relevant to this. If you want to play the specific song, do what the specific implementation does. It doesn't matter whether it's a well formed C question. If the song plays, the compiler had a way to implement it.tepples wrote:C doesn't define what negative shift amounts or shift amounts greater than the size in bits of the left operand do.I believe Intel implemented SHL/SHR to take only an unsigned parameter, bitmasked to the available shift amounts (i.e. -1=31, -2=30, etc). That's what I'd expect would happen here, but it's easy to check.Anyhow, if the computation is unfeasible as-is, there's always lookup tables. If you're just looking for a way to fill unused ROM space, you could just precompute the whole function for as long as the space allows. ;P

Author:  rainwarrior [ Sun Mar 12, 2017 11:06 pm ]
Post subject:  Re: Practicality of bytebeat: Can "Crowd" be optimized?

tepples wrote:
Is it possible to compute each sample of "Crowd" in exactly 224 cycles on the MOS 6502?

So... I got bored on the toilet, and I think the answer turns out to be yes.

This uses only 222 cycles, but you can add a NOP in there somewhere probably. The code isn't fancy, it's just not a whole lot of computation when you restrict it to just the bits you need.

I'm not entirely sure it's correct, because I don't want to listen to it for ~150 hours to compare, and also the loss of 1 bit of precision means some "quiet" sounds are lost in the mix, but... seems pretty close so far?

Edit: Realized it doesn't use the upper bits, so it actually loops after about 8.5 minutes. After listening to the whole thing (and hearing/correcting a single character typo in the 6th >>19 sound phase), I'm pretty sure this is it.

Edit: Or... well I guess some of the "negative" shifts use the upper bits? Maybe 18 hours in or something three of the eight >>19 phases will start to sound a little different?

Author:  rainwarrior [ Mon Mar 13, 2017 12:15 am ]
Post subject:  Re: Practicality of bytebeat: Can "Crowd" be optimized?

And just because I wanted to see it, here's my naive forty-two. Takes 237 cycles.

I guess JRoatch already made one of these, but hey you can always use another.