N64 benchmarks

calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

N64 benchmarks

Post by calima »

Like on NES some time back, here's some data for which things to pick on N64. I plan to bench some audio and video codecs next. All on gcc 8.2 -O3.


Results are from the cen64 emulator, whose timing differs slightly from real hardware (~5%).

Text decompression. Input: the LGPLv3 license text (7.5 KB). Speed in KB/s.

Algo    | Ratio | Speed | License, comments
-------------------------------------------
zstd    | 0.333 | 1457  | BSD, requires ~160 KB RAM
zlib    | 0.343 | 2823  | zlib, requires ~4 KB RAM (tinfl)
lzo     | 0.402 | 4773  | GPL, no RAM required
lz4hc   | 0.475 | 10471 | BSD, no RAM required
lzjb    | 0.591 | 4998  | CDDL, no RAM required, nemequ github version


Audio, 10s 44100 Hz mono clip, % realtime

Algo            | Ratio | Speed | License, comments
-------------------------------------------
Speex           | 0.038 | 208   | BSD, fixed point
Vorbis 128      | 0.158 | 410   | BSD, Tremor lowmem, ~35 KB RAM measured
Vorbis 96       | 0.122 | 458   |
Vorbis 64       | 0.089 | 498   |
Vorbis 48       | 0.068 | 498   |
Opus 64         | 0.099 | 215   | BSD, fixed point, ~95 KB RAM measured
Opus 48         | 0.075 | 229   |
Opus 32         | 0.049 | 252   |
MP3 128         | 0.131 | 215   | PD, no RAM required, lieff/minimp3
MP3 96          | 0.109 | 215   |
MP3 64          | 0.087 | 219   |
MP3 32          | 0.044 | 430   | LAME chose to downsample to 22 kHz and MPEG-2 Layer III
iSAC 56         | 0.105 | 234   | 32 kHz, ~400 KB RAM usage


Audio, 10s 16000 Hz mono clip, % realtime

Algo            | Ratio | Speed | License, comments
-------------------------------------------
Speex           | 0.071 | 582   | BSD, fixed point
Vorbis 64       | 0.173 | 1066  | BSD, Tremor lowmem, ~32 KB RAM measured
Vorbis 48       | 0.142 | 1165  |
Vorbis 32       | 0.111 | 1206  |
Opus 64         | 0.266 | 252   | BSD, fixed point
Opus 48         | 0.199 | 264   |
Opus 32         | 0.135 | 276   |


Video, 5s 320x136 25fps clip, xvid simple profile L3, 247 kbps
libxvidcore (GPL) decoding to I420: 98% realtime
Zstd is pretty disappointing given how hyped it is: it compresses barely better than zlib, is much slower, and needs far more RAM (~160 KB vs ~4 KB).
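A quick way to read the text table is to turn each row into a compressed size and a decompression time for the 7.5 KB input. The helpers below are a hypothetical sketch; they assume Ratio means compressed size divided by original size and that Speed counts uncompressed KB produced per second, which the post does not state explicitly.

```c
#include <assert.h>

/* Hypothetical helpers for reading the text table. ASSUMPTIONS:
 * ratio = compressed size / original size, and speed = uncompressed
 * KB produced per second. */

/* Compressed size in KB of an input_kb-KB file. */
double packed_kb(double input_kb, double ratio) {
    assert(ratio > 0.0);
    return input_kb * ratio;
}

/* Milliseconds needed to decompress input_kb KB at speed_kbs KB/s. */
double decode_ms(double input_kb, double speed_kbs) {
    assert(speed_kbs > 0.0);
    return input_kb / speed_kbs * 1000.0;
}

/* Percentage of one 60 Hz frame (~16.7 ms) taken by a task of task_ms. */
double percent_of_frame(double task_ms) {
    return 100.0 * task_ms / (1000.0 / 60.0);
}
```

For example, decode_ms(7.5, 2823) comes to about 2.66 ms for zlib and decode_ms(7.5, 1457) to about 5.15 ms for zstd, roughly 16% and 31% of a 60 Hz frame.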
Last edited by calima on Sat Jan 05, 2019 12:15 pm, edited 5 times in total.
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: N64 benchmarks

Post by tepples »

How does Zstandard compare to implementations of Deflate other than zlib's, such as 7-Zip's or Google's Zopfli? I ask because I'm familiar with these two in particular from the advzip and advpng tools in AdvanceCOMP. Decompression speed probably wouldn't differ much, but compression would probably be slower, and the compression ratio might differ.

Would you be interested in results for DTE and Huffword codecs as a low water mark? But I'll admit these may not be quite as useful on Nintendo 64 as they are on a small-RAM, execute-in-place environment like the NES.
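For reference, DTE (dual tile encoding) is byte-pair substitution: a common pair of characters is replaced by one otherwise unused byte, and the decoder expands it through a small table. Below is a minimal sketch of one greedy encoder pass; the function name and the "at least 3 occurrences" cutoff are assumptions for illustration, and a real tool would repeat the pass with fresh codes until no pair pays off.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One greedy DTE (byte-pair substitution) pass, run on the PC side:
 * find the most frequent adjacent byte pair in buf[0..len) and replace
 * its non-overlapping occurrences, left to right, with the single byte
 * `code`, which the caller guarantees is unused in buf. The chosen pair
 * goes to pair_out for the decoder's table. Returns the new length, or
 * len unchanged when no pair is worth replacing. Hypothetical helper. */
size_t dte_pass(unsigned char *buf, size_t len, unsigned char code,
                unsigned char pair_out[2]) {
    static unsigned counts[256][256];   /* 256 KB: encoder-side tool, not N64 code */
    unsigned best = 0;
    size_t i, out = 0;

    assert(buf != NULL && pair_out != NULL);
    memset(counts, 0, sizeof counts);
    pair_out[0] = pair_out[1] = 0;
    /* Counts include overlapping pairs ("aaa" counts "aa" twice), so the
     * real saving can be a little lower than the count suggests. */
    for (i = 0; i + 1 < len; i++)
        counts[buf[i]][buf[i + 1]]++;
    for (i = 0; i < 256 * 256; i++) {
        if (counts[i / 256][i % 256] > best) {
            best = counts[i / 256][i % 256];
            pair_out[0] = (unsigned char)(i / 256);
            pair_out[1] = (unsigned char)(i % 256);
        }
    }
    /* Each table entry costs 2 bytes, so a pair must occur at least
     * 3 times for the substitution to save space. */
    if (best < 3)
        return len;

    for (i = 0; i < len;) {
        if (i + 1 < len && buf[i] == pair_out[0] && buf[i + 1] == pair_out[1]) {
            buf[out++] = code;   /* pair -> one byte */
            i += 2;
        } else {
            buf[out++] = buf[i++];
        }
    }
    return out;
}
```

Running it on "abababc" with code 0x80 replaces the three "ab" pairs and shrinks the string from 7 to 4 bytes, plus a 2-byte table entry. The decoder only needs a two-byte lookup per code.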
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: N64 benchmarks

Post by calima »

Zstd on modern computers beats even the best zlib implementations, according to all reports. The small size of the test data here probably hinders it a bit, and it's not very speedy on an old MIPS like this. I'll probably use zlib for everything, it hits the sweet spot here.

As for any additional codecs, sure, I'll add any data points.
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: N64 benchmarks

Post by koitsu »

Results for lzjb? (Yes I'm aware lz4 is a faster implementation/replacement for lzjb)
lidnariq
Posts: 11432
Joined: Sun Apr 13, 2008 11:12 am

Re: N64 benchmarks

Post by lidnariq »

I wonder if Zstd is here specifically crippled by the N64's tiny cache.
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: N64 benchmarks

Post by calima »

lzjb and speex added. Speex compresses quite well, but at these speeds it's not that suitable for many voices at once. For cutscenes or RPG-style talking to one character, it should work great.
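To put "not suitable for many voices at once" in numbers: a decoder at P% of realtime needs 100/P of the CPU per stream. Here is a minimal sketch with hypothetical helper names, meant to be fed the "% realtime" figures from the tables (cen64 figures, so roughly ±5% on hardware).

```c
#include <assert.h>

/* Hypothetical helpers for reading the "% realtime" columns. A decoder
 * running at pct_realtime percent of realtime (e.g. 208 for Speex at
 * 44.1 kHz) spends 100 / pct_realtime of the CPU on one stream. */
double cpu_percent_per_stream(double pct_realtime) {
    assert(pct_realtime > 0.0);
    return 100.0 * 100.0 / pct_realtime;
}

/* Streams that fit in budget_percent of the CPU, rounded down.
 * Ignores mixing, resampling and cache contention between streams. */
int max_streams(double pct_realtime, double budget_percent) {
    assert(budget_percent >= 0.0);
    return (int)(budget_percent / cpu_percent_per_stream(pct_realtime));
}
```

max_streams(208, 100) is 2: at 44.1 kHz one Speex stream takes about 48% of the CPU, so even the whole machine fits only two. At 16 kHz (582%) it drops to about 17% per stream.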

lzjb was fast to test, but I had never even heard of it. Why do you find it interesting? ZFS was a Sparc thing, no MIPS relation.
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: N64 benchmarks

Post by tepples »

SILK, the low-rate voice mode in Opus, is similar to Speex in several ways. Is Opus on the whole too slow for the N64? Or Codec 2?
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: N64 benchmarks

Post by calima »

It's on the list to test, along with Vorbis and MP3.
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: N64 benchmarks

Post by koitsu »

calima wrote: lzjb was fast to test, but I had never even heard of it. Why do you find it interesting? ZFS was a Sparc thing, no MIPS relation.
ZFS isn't a "Sparc" thing; it was originally a "Solaris thing", and Solaris ran on x86 as well as SPARC. Sun released ZFS as open source (under the CDDL) as part of OpenSolaris, which led to it being imported into FreeBSD and to a (slow) FUSE version for Linux. After Oracle bought Sun and closed OpenSolaris, the community carried it on as OpenIndiana/Illumos (think: open-source Solaris). This all later resulted in OpenZFS and ZFS on Linux, so now FreeBSD, Linux, and OpenIndiana/Illumos all have ZFS, regardless of architecture (x86, amd64, aarch64/ARM, etc.).

Why I found it interesting: I've known it to be faster than gzip/zlib but slower than lz4 (which is much newer), and I wanted to see how it performed on the N64.

I don't know if there's some "easy to add" code that would be testable, but gzip and bzip2 (for text) might be interesting as well. I've seen many cases where text compresses better with gzip than bzip2, and in other cases the exact opposite. Another one to consider might be some bare-bones native Huffman implementation, although I wouldn't be surprised if one of the previously-tested algorithms dynamically implements something like that.
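A bare-bones Huffman coder mostly comes down to computing code lengths from symbol frequencies. Deflate (and therefore zlib) already Huffman-codes its literals and match lengths, so one of the tested codecs does include this step. The sketch below computes the lengths and the coded size in bits; the function name is hypothetical, and it omits canonical code assignment, the bit writer, and any length limit a real format would need.

```c
#include <assert.h>
#include <stddef.h>

/* Huffman code lengths for byte frequencies freq[256], written to len[256].
 * Returns the size of the coded data in bits (excluding the code table).
 * Plain O(n^2) "merge the two lightest nodes" construction: fine for a
 * 256-symbol alphabet in an offline encoder. Not length-limited. */
unsigned long huffman_bits(const unsigned long freq[256], unsigned char len[256]) {
    unsigned long weight[511];
    int parent[511], alive[511], sym_node[256];
    int n = 0;
    unsigned long total = 0;

    assert(freq != NULL && len != NULL);
    for (int s = 0; s < 256; s++) {          /* one leaf per used symbol */
        len[s] = 0;
        sym_node[s] = -1;
        if (freq[s] != 0) {
            weight[n] = freq[s];
            parent[n] = -1;
            alive[n] = 1;
            sym_node[s] = n++;
        }
    }
    if (n == 1) {                            /* a lone symbol still needs 1 bit */
        for (int s = 0; s < 256; s++)
            if (sym_node[s] >= 0) {
                len[s] = 1;
                return freq[s];
            }
    }
    int nodes = n;
    for (int m = 0; m < n - 1; m++) {        /* n - 1 merges build the tree */
        int a = -1, b = -1;                  /* lightest and second lightest */
        for (int i = 0; i < nodes; i++) {
            if (!alive[i])
                continue;
            if (a < 0 || weight[i] < weight[a]) {
                b = a;
                a = i;
            } else if (b < 0 || weight[i] < weight[b]) {
                b = i;
            }
        }
        weight[nodes] = weight[a] + weight[b];
        parent[nodes] = -1;
        alive[nodes] = 1;
        alive[a] = alive[b] = 0;
        parent[a] = parent[b] = nodes++;
    }
    for (int s = 0; s < 256; s++) {          /* code length = leaf depth */
        if (sym_node[s] < 0)
            continue;
        int depth = 0;
        for (int i = sym_node[s]; parent[i] >= 0; i = parent[i])
            depth++;
        len[s] = (unsigned char)depth;
        total += freq[s] * (unsigned long)depth;
    }
    return total;
}
```

For "aaaabbc" (a:4, b:2, c:1) it assigns 1, 2 and 2 bits, 10 bits in total versus 56 bits as plain bytes.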

For audio, you might look into Codec 2, which is known for being open source and having extremely high compression rates, but again I have no idea how easy it would be to add and test.
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: N64 benchmarks

Post by tepples »

PKZIP and Gzip use the same Deflate algorithm as zlib, and bzip2 is very RAM-intensive, on the order of 1 MB, which is one-fourth of the N64's 4 MB of RAM (without the Expansion Pak).
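The ~1 MB figure depends heavily on the block size chosen at compression time. The bzip2 manual estimates decompression memory as 100 KB + 4 × block size, or 100 KB + 2.5 × block size in its slower small-memory mode (-s), with block size = level × 100 KB. A minimal sketch of that arithmetic (the function name is hypothetical):

```c
#include <assert.h>

/* Approximate bzip2 decompression memory in KB for block-size level
 * 1..9 (block size = level x 100 KB), per the bzip2 manual's estimate:
 * 100 KB + 4 x block size by default, or 100 KB + 2.5 x block size in
 * the small-memory mode (bzip2 -s). */
int bzip2_decompress_kb(int level, int small_mode) {
    assert(level >= 1 && level <= 9);
    int block_kb = level * 100;
    return small_mode ? 100 + block_kb * 5 / 2 : 100 + block_kb * 4;
}
```

With the default -9 blocks, decompression needs about 3.7 MB (2.35 MB with -s), nearly all of the base 4 MB; only -1 and -2 stay under 1 MB in the default mode.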

How much of the decoding for MP3, Vorbis, Speex, Opus, Codec 2, or FLAC can be done on the RSP? FLAC can be turned into a time-domain lossy codec using the LossyWAV preprocessor, which bit-crushes each 512-sample block with noise-shaped dithering so that FLAC has fewer significant bits to code.
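The LossyWAV idea can be sketched in a few lines: round each sample in a block to a multiple of 2^k, after which the low k bits are zero in every sample and FLAC can signal them as "wasted bits" instead of coding them. This is a simplified sketch under stated assumptions: plain rounding with no noise shaping, and k is a parameter here, whereas LossyWAV chooses it per block from its own noise analysis. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round every 16-bit sample in s[0..n) to the nearest multiple of 2^bits
 * (ties away from zero). Plain rounding, no dither or noise shaping. */
void quantize_block(int16_t *s, size_t n, int bits) {
    assert(bits >= 0 && bits < 15);
    int32_t step = (int32_t)1 << bits;
    for (size_t i = 0; i < n; i++) {
        int32_t v = s[i];
        int32_t q = (v >= 0 ? v + step / 2 : v - step / 2) / step * step;
        if (q > INT16_MAX) q -= step;   /* stay in range after rounding up */
        if (q < INT16_MIN) q += step;
        s[i] = (int16_t)q;
    }
}

/* Trailing zero bits shared by all samples, which FLAC can strip per
 * subframe as "wasted bits". Returns 16 for an all-zero block. */
int shared_zero_bits(const int16_t *s, size_t n) {
    uint16_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc |= (uint16_t)s[i];          /* two's-complement low bits kept */
    if (acc == 0)
        return 16;
    int k = 0;
    while (!(acc & 1u)) {
        acc >>= 1;
        k++;
    }
    return k;
}
```

With k = 2 the block {5, -6, 32767, 13} becomes {4, -8, 32764, 12}, and shared_zero_bits reports 2. In practice n would be the 512-sample block.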
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: N64 benchmarks

Post by calima »

Audio codecs generally don't benefit from SIMD; there isn't any vectorizable processing going on. The RSP's SU (scalar unit) lacks multiply and 64-bit instructions, so even using it as a "second core" would be slower than the main core. I expect graphics processing to take most of the RSP's frame time anyway.
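To see why a scalar unit without a multiply instruction is a poor fit for codec inner loops, consider what every product turns into. Below is a generic shift-and-add routine; it is a hypothetical illustration in C, not RSP code, and it counts one loop iteration per examined bit of the second operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift-and-add multiply (result modulo 2^32, same as a * b on uint32_t),
 * the kind of routine a core without a multiply instruction falls back on.
 * *steps, if non-NULL, receives the number of loop iterations. */
uint32_t mul_shift_add(uint32_t a, uint32_t b, int *steps) {
    uint32_t result = 0;
    int n = 0;
    while (b != 0) {
        if (b & 1u)
            result += a;   /* add the shifted multiplicand for each set bit */
        a <<= 1;
        b >>= 1;
        n++;
    }
    if (steps != NULL)
        *steps = n;
    return result;
}
```

Multiplying by 5678 takes 13 iterations of several instructions each, where the main CPU needs a single multiply instruction.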
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: N64 benchmarks

Post by tepples »

calima wrote: Audio codecs generally don't benefit from SIMD, there isn't any vectorizable processing going on.
Not even inverse FFTs or MDCTs or filtering?
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: N64 benchmarks

Post by calima »

The amount of data in one audio packet is so small that the overhead usually kills any speedup. FFT/DCT for images is a different case, if you process a mb at once instead of a hundred bytes.
lidnariq
Posts: 11432
Joined: Sun Apr 13, 2008 11:12 am

Re: N64 benchmarks

Post by lidnariq »

... How bad is the overhead? I haven't yet found any documentation on how to write code for the RDP/RSP.
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: N64 benchmarks

Post by calima »

Well, I was speaking in general: even on x86_64 you won't get much speedup, if any, from vectorizing parts of audio decoding. You have to load data into vector registers, often from unaligned or scattered addresses, do the calculation, and store the result. The load/permute/store steps can make the 8x/16x speedup of the processing step worthless if you don't have much data, even more so if the vectorizable parts alternate with non-vectorizable ones.
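The argument can be made concrete with a toy cost model: scalar code costs a fixed amount per element, while vector code pays a fixed load/permute/store overhead plus one operation per `width` elements. All numbers are hypothetical parameters, not measurements of any x86_64 CPU or of the RSP.

```c
#include <assert.h>

/* Toy model: speedup of vector over scalar code for n elements.
 * scalar: n * scalar_cost cycles.
 * vector: overhead (loads, permutes, stores) + ceil(n / width) ops.
 * Every parameter is a hypothetical cycle count. */
double simd_speedup(unsigned n, unsigned width, double scalar_cost,
                    double vector_op_cost, double overhead) {
    assert(width > 0 && n > 0);
    unsigned vector_ops = (n + width - 1) / width;   /* ceil(n / width) */
    double scalar = n * scalar_cost;
    double vector = overhead + vector_ops * vector_op_cost;
    return scalar / vector;
}
```

With width 8, one cycle per operation and a 40-cycle overhead, 16 elements run at about 0.38x (slower than scalar), 64 elements at about 1.33x, and 4096 elements at about 7.4x. The model does not cover alternation between vector and scalar sections, which would pay the overhead repeatedly.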

RSP specifically: vector loads have three delay slots, so an aligned, ideal load effectively costs four instructions' worth of time. Then you have to DMA data in and out of the 4 KB data memory (DMEM), which adds further overhead. "SGI_Nintendo_64_RSP_Programmers_Guide.pdf" is available on the ultra64.ca site, as is an RDP register doc.
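The small local memory means any RSP-side processing streams data through a window in chunks. This sketch simulates that pattern on the CPU with memcpy standing in for DMA; LOCAL_BYTES and process_in_chunks are hypothetical names, and real DMEM also has to hold the task's other data, so the usable chunk would be smaller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOCAL_BYTES 4096   /* hypothetical usable local memory */
#define LOCAL_SAMPLES (LOCAL_BYTES / sizeof(int16_t))

static int16_t local_mem[LOCAL_SAMPLES];

/* Halves every sample (stand-in for real work) by copying chunks into
 * local_mem, processing them there, and copying them back.
 * Returns the number of simulated DMA transfers. */
unsigned process_in_chunks(int16_t *buf, size_t n) {
    unsigned transfers = 0;
    assert(buf != NULL || n == 0);
    for (size_t off = 0; off < n; off += LOCAL_SAMPLES) {
        size_t len = n - off < LOCAL_SAMPLES ? n - off : LOCAL_SAMPLES;
        memcpy(local_mem, buf + off, len * sizeof(int16_t));   /* "DMA in" */
        for (size_t i = 0; i < len; i++)
            local_mem[i] = (int16_t)(local_mem[i] / 2);
        memcpy(buf + off, local_mem, len * sizeof(int16_t));   /* "DMA out" */
        transfers += 2;
    }
    return transfers;
}
```

A 5000-sample buffer needs three chunks of up to 2048 samples, i.e. six transfers; on real hardware each one adds DMA setup and wait time on top of the processing.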

I've read pretty much all the N64 docs by now. In some ways it's better than expected and in others worse. For example, there is no flipped Z comparison mode, and rendering triangles is very much a PITA, but on the other hand the RSP will allow many kinds of software pixel effects: Gaussian blurs, additive rendering, better scaling algorithms, maybe even some form of shadow mapping.
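As an example of the per-pixel work such effects involve, here is a scalar C reference for saturating additive blending. It assumes a 16-bit RGBA5551 framebuffer (red, green, blue in 5-bit fields from the top, alpha in bit 0). An RSP version could apply the same per-channel add and clamp across the eight 16-bit lanes of a vector register, which is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating additive blend of two RGBA5551 pixels (ASSUMED format:
 * R in bits 15-11, G in 10-6, B in 5-1, alpha in bit 0). Each color
 * channel is added and clamped to 31; destination alpha is kept. */
uint16_t add_rgba5551(uint16_t dst, uint16_t src) {
    uint16_t out = 0;
    for (int shift = 11; shift >= 1; shift -= 5) {   /* R at 11, G at 6, B at 1 */
        unsigned c = ((dst >> shift) & 31u) + ((src >> shift) & 31u);
        if (c > 31u)
            c = 31u;                                  /* clamp instead of wrapping */
        out |= (uint16_t)(c << shift);
    }
    return (uint16_t)(out | (dst & 1u));
}
```

Adding two half-intensity reds (16 + 16) clamps to 31 instead of wrapping to 0.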