It is currently Mon Feb 18, 2019 4:42 pm

 All times are UTC - 7 hours

 Page 1 of 1 [ 11 posts ]
 Print view Previous topic | Next topic
Author Message
 Post subject: Practicality of bytebeat: Can "Crowd" be optimized?Posted: Sun Mar 12, 2017 6:02 pm

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21100
Location: NE Indiana, USA (NTSC)
After I saw the suggestion to add the forty-two melody as an easter egg somewhere in Action 53 volume 3, I got to thinking whether more substantial bytebeat compositions such as "Crowd" by Kragen might be doable. Is it possible to compute each sample of "Crowd" in exactly 224 cycles on the MOS 6502? Or are all the 32-bit shifts and bitwise ops too complex?

Code:
#include <stdint.h>
static inline void putchar(int c) {
*(uint8_t *)0x4011 = c >> 1;
}

int main(void) {
for (uint32_t t = 0; ; ++t) {
uint8_t out =
((t<<1)^((t<<1)+(t>>7)&t>>12))
| t>>(4-(1^7&(t>>19)))
| t>>7;
putchar(out);
}
}

Or is this a moot point because of undefined behavior involving t>>(4-(1^7&(t>>19)))?

Top

 Posted: Sun Mar 12, 2017 6:20 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 8138
Location: Seattle
It doesn't really answer the question, but there are a few bytebeats ported to the 2600: e.g. http://www.acc.umu.se/~tjoppen/files/vcs/genmusic1/

Top

 Posted: Sun Mar 12, 2017 7:02 pm

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7210
tepples wrote:
Is it possible to compute each sample of "Crowd" in exactly 224 cycles on the MOS 6502?

What's your estimate for a "naive" implementation? (Why does it have to be exactly 224? I'd imagine it would still be musically recognizable at a range of speeds.)

tepples wrote:
Or is this a moot point because of undefined behavior involving t>>(4-(1^7&(t>>19)))?

Since when does "undefined" mean "unimplementable"?

(Just in case my tone is unclear, I'm not trying to be dismissive, just not ready to dive in and solve it myself right now. I think this is an interesting idea, so I'll be glad to hear if you come up with anything here.)

Top

 Posted: Sun Mar 12, 2017 7:37 pm

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21100
Location: NE Indiana, USA (NTSC)
rainwarrior wrote:
tepples wrote:
Is it possible to compute each sample of "Crowd" in exactly 224 cycles on the MOS 6502?

What's your estimate for a "naive" implementation? (Why does it have to be exactly 224? I'd imagine it would still be musically recognizable at a range of speeds.)

I was mostly trying to rule out jitter-fests due to implementations that aren't constant time, such as an inc/bne implementation of t++ or a looping implementation of the term with the variable shift amount. A constant time implementation of t++ alone would take 34 cycles:
Code:
clc
lda #1
sta t_0
lda #0
sta t_8
lda #0
sta t_16
lda #0
sta t_24

Calculating one of the shifted versions takes 32 cycles:
Code:
lda t_0
asl a
sta tshl1_0
lda t_8
rol a
sta tshl1_8
lda t_16
rol a
sta tshl2_16
lda t_24
rol a
sta tshl2_24

rainwarrior wrote:
tepples wrote:
Or is this a moot point because of undefined behavior involving t>>(4-(1^7&(t>>19)))?

Since when does "undefined" mean "unimplementable"?

"Crowd" is defined by a C program, but C doesn't define what negative shift amounts or shift amounts greater than the size in bits of the left operand do. At t == 0x200000 samples (around 43 minutes 41 seconds), for instance, this particular sub-expression reduces to
Code:
0x200000>>(4-(1^7&(0x200000>>19)))
0x200000>>(4-(1^7&(4)))
0x200000>>(4-(1^4))
0x200000>>(4-5))

One might consider defining the piece to end at the first undefined behavior. But in C, undefined behavior causes time travel:
The C committee wrote:
However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

Top

 Posted: Sun Mar 12, 2017 7:47 pm
 Formerly 43110

Joined: Wed Feb 05, 2014 7:01 am
Posts: 377
Location: us-east
The nice thing about casting shifted 32 bit numbers into 8 bits is that you don't have to bit shift all 4 bytes. Crowd also doesn't use multiplication which was the most expensive part of the forty-two melody.

I not sure if the answers to this stackoverflow question helps how t >> -1 is implemented today.

Edit:
Or since it is 43 minutes in, we could just force the sequence to cycle at that point by changing the term to t>>(4-(1^3&(t>>19))) masking to 2 bits instead of 3.

Top

 Posted: Sun Mar 12, 2017 8:10 pm

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7210
tepples wrote:
undefined behavior causes time travel

That's not relevant to this. If you want to play the specific song, do what the specific implementation does. It doesn't matter whether it's a well formed C question. If the song plays, the compiler had a way to implement it.

tepples wrote:
C doesn't define what negative shift amounts or shift amounts greater than the size in bits of the left operand do.

I believe Intel implemented SHL/SHR to take only an unsigned parameter, bitmasked to the available shift amounts (i.e. -1=31, -2=30, etc). That's what I'd expect would happen here, but it's easy to check.

Anyhow, if the computation is unfeasible as-is, there's always lookup tables. If you're just looking for a way to fill unused ROM space, you could just precompute the whole function for as long as the space allows. ;P

Top

 Posted: Sun Mar 12, 2017 11:06 pm

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7210
tepples wrote:
Is it possible to compute each sample of "Crowd" in exactly 224 cycles on the MOS 6502?

So... I got bored on the toilet, and I think the answer turns out to be yes.

This uses only 222 cycles, but you can add a NOP in there somewhere probably. The code isn't fancy, it's just not a whole lot of computation when you restrict it to just the bits you need.

I'm not entirely sure it's correct, because I don't want to listen to it for ~150 hours to compare, and also the loss of 1 bit of precision means some "quiet" sounds are lost in the mix, but... seems pretty close so far?

Edit: Realized it doesn't use the upper bits, so it actually loops after about 8.5 minutes. After listening to the whole thing (and hearing/correcting a single character typo in the 6th >>19 sound phase), I'm pretty sure this is it.

Edit: Or... well I guess some of the "negative" shifts use the upper bits? Maybe 18 hours in or something three of the eight >>19 phases will start to sound a little different?

Last edited by rainwarrior on Mon Mar 13, 2017 1:11 pm, edited 4 times in total.
Top

 Posted: Mon Mar 13, 2017 12:15 am

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7210
And just because I wanted to see it, here's my naive forty-two. Takes 237 cycles.

I guess JRoatch already made one of these, but hey you can always use another.

Top

 Posted: Mon Mar 13, 2017 7:44 am

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21100
Location: NE Indiana, USA (NTSC)
rainwarrior wrote:
I'm not entirely sure it's correct, because I don't want to listen to it for ~150 hours to compare

One could test it in a 6502 simulator by logging \$4011 writes and comparing the output to that from a modified crowd.c that adds (...)>>1. Does NSFPlay support non-returning play?

Quote:
Edit: Realized it doesn't use the upper bits, so it actually loops after about 8.5 minutes.

And after thinking it over, I'm really only concerned about the first 8.5 minutes, given that crowd.ogg linked from Kragen's bytebeat page is only the first 8.5 minutes.

Quote:
After listening to the whole thing (and hearing/correcting a single character typo in the 6th >>19 sound phase), I'm pretty sure this is it.

May I use this as the test for \$4011 in 240p Test Suite for NES?

Top

 Posted: Mon Mar 13, 2017 9:34 am

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7210
tepples wrote:
One could test it in a 6502 simulator by logging \$4011 writes and comparing the output to that from a modified crowd.c that adds (...)>>1. Does NSFPlay support non-returning play?

Yeah, but it'd probably take longer to rig up and execute that test than it took to write in the first place. Kinda exceeds my interest level at this point.

tepples wrote:
May I use this as the test for \$4011 in 240p Test Suite for NES?

I didn't put an explicit license on the code, but yes, please do whatever you like with the code. Copy it, modify it, etc. I'm don't really care about being attributed or not, though I guess the original work is CC-BY so you might be obliged to attribute at least Kragen.

Top

 Posted: Mon Mar 13, 2017 1:10 pm

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 7210
rainwarrior wrote:
I guess some of the "negative" shifts use the upper bits? Maybe 18 hours in or something three of the eight >>19 phases will start to sound a little different?

So, this long phase stuff is really only the top 3 bits, and they are being shifted to the low 1-3 bits of output and ORed with it, so their actual impact on the sound is quite minimal. (This implementation discards the low bit of output too, so it's only the top 2 bits.)

Using a debugger it's easy to set the top 2 bits of "t" and listen, and it only affects 2 of the 8 mid-range phases (about 1 minute each of the ~8 minute "loop"). They don't really sound that different. The negative shifts are more like a surrogate for a 0 term than an actual useful result.

Originally I was considering that the calculation could be simplified by shifting the output early (>> 1) instead of at the end (and also remember only 7 of the bits matter). In particular the terms (t << 1) >> 1 and (t >> 7) >> 1 can just be directly read as single bytes from the 4-byte accumulator. I think this would lose precision on the lowest bit of output from (t>>1) + (t>>7), but would mostly sound the same. (Similarly the 8 mid-phase shift operations can each be customized with an extra >> 1.) Also if you want to ignore the long phase entirely (since it has minimal effect anyway), you can just treat the high byte of "t" as a permanent 0 and skip the extra load/add/store in its increment.

Ultimately I thought it was best to keep it as accurate as I could, but if you need extra cycles, these are options.

Top

 Display posts from previous: All posts1 day7 days2 weeks1 month3 months6 months1 year Sort by AuthorPost timeSubject AscendingDescending
 Page 1 of 1 [ 11 posts ]

 All times are UTC - 7 hours

#### Who is online

Users browsing this forum: No registered users and 1 guest

 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot post attachments in this forum

Search for:
 Jump to:  Select a forum ------------------ NES / Famicom    NESdev    NESemdev    NES Graphics    NES Music    Homebrew Projects       2018 NESdev Competition       2017 NESdev Competition       2016 NESdev Competition       2014 NESdev Competition       2011 NESdev Competition    Newbie Help Center    NES Hardware and Flash Equipment       Reproduction    NESdev International       FCdev       NESdev China       NESdev Middle East Other    General Stuff    Membler Industries    Other Retro Dev       SNESdev       GBDev    Test Forum Site Issues    phpBB Issues    Web Issues    nesdevWiki