Non-MSU video player

Discussion of hardware and software development for Super NES and Super Famicom.

Moderator: Moderators

Forum rules
  • For making cartridges of your Super NES games, see Reproduction.
psycopathicteen
Posts: 2963
Joined: Wed May 19, 2010 6:12 pm

Non-MSU video player

Post by psycopathicteen » Sun Sep 20, 2020 9:17 am

If I'm doing a music video from a Disney animated movie, I would need 24 fps. For a fullscreen 256x224 image at 4bpp, I've calculated that I would have roughly 13 kB, or 416 8x8 tiles, per animation frame. That means tiles would have to be duplicated to fill up the screen. I want to know if anybody has already made a video converter that limits the color depth but also limits the number of unique tiles. It would be ideal if such a converter could optimize an image around 8 palettes of 16 colors, and even do stuff like tile V-flip and H-flip, but it would be okay if it's just limited to a basic 16-color palette per frame. The 65816 probably can't handle decompressing that many tiles at 20 fps, but I want to see what it looks like before I attempt any decompression benchmarks.

Nikku4211
Posts: 212
Joined: Sun Dec 15, 2019 1:28 pm
Location: Bronx, New York
Contact:

Re: Non-MSU video player

Post by Nikku4211 » Sun Sep 20, 2020 9:49 am

Cool.
I have an ASD, so empathy is not natural for me. If I hurt you, I apologise.

coto
Posts: 50
Joined: Wed Mar 06, 2019 6:00 pm
Location: Chile

Re: Non-MSU video player

Post by coto » Sun Sep 20, 2020 10:04 am

I'm working on MPEG-2 video decoding, and I-frames and B-frames within a GOP are stored in YV12 format. The Nintendo DS already requires handwritten assembly and ARMv5TE opcodes (smlawb, smulwb) to convert from YV12 to ARGB15 on the fly. So if I were you, I'd preprocess everything computer-side and render the frames in a format the PPU understands (such as a planar bitmap format or tile mode, à la Mario Kart: Super Circuit (GBA), which translates a Mode 7 plane into tile/text mode).
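For a sense of that per-pixel cost, the conversion being discussed amounts to roughly the following (a sketch only: the BT.601-style full-range coefficients and the 15-bit BGR packing are my assumptions, not coto's actual code):

```python
# Sketch of YCbCr -> RGB -> 15-bit BGR conversion. Coefficients are
# BT.601-style full range (an assumption, not taken from coto's code).
def yuv_to_rgb(y, u, v):
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    clamp = lambda x: max(0, min(255, int(round(x))))
    return clamp(r), clamp(g), clamp(b)

def rgb_to_bgr555(r, g, b):
    # Pack 8-bit channels down to 5 bits each, BGR order (SNES-style).
    return ((b >> 3) << 10) | ((g >> 3) << 5) | (r >> 3)
```

Doing this per pixel at video rates is exactly the kind of work that is cheap on a PC and expensive on the console, which is the argument for preprocessing.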

psycopathicteen
Posts: 2963
Joined: Wed May 19, 2010 6:12 pm

Re: Non-MSU video player

Post by psycopathicteen » Sun Sep 20, 2020 10:56 am

http://forums.nesdev.com/viewtopic.php?f=21&t=14807

I don't know what happened to the pictures, but this thread has something relevant to what I'm talking about.

tepples
Posts: 22143
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: Non-MSU video player

Post by tepples » Sun Sep 20, 2020 10:57 am

I was thinking color cell compression (allowing up to two out of 16 colors per 4x4-pixel area) at 120x80 pixels (letterboxed) and 12 fps. Disney movies were animated on twos, and the N64 Resident Evil game made graphics smaller to fit. For bitrate, color cell compression would give 120*80/4/4=600 quarter tiles per frame, where each quarter tile is 24 bits (color if 0, color if 1, and a bitmap) for 1800 ROM bytes per frame or 21600 ROM bytes per second or 1266 KiB per minute. I think (but am not certain) that dealing with the pixels of two quarter tiles in parallel would let the 65816 decode it in real time.

calima
Posts: 1238
Joined: Tue Oct 06, 2015 10:16 am

Re: Non-MSU video player

Post by calima » Sun Sep 20, 2020 11:01 am

24 fps doesn't divide evenly into 60 fps, so you will run into the ~5 kB vblank upload limit per frame. At 24 fps you'll sometimes have two and sometimes three display frames to upload the next video frame. Unless your demo is for PAL, where 50 Hz aligns better and the longer vblank lets you upload enough in two frames.

superfamiconv can (maybe) process an image into 8 16-color palettes, with flips. It can't do lossy conversion though, and I haven't seen any lossy tile converter for the SNES, only for the NES, as you linked.

psycopathicteen
Posts: 2963
Joined: Wed May 19, 2010 6:12 pm

Re: Non-MSU video player

Post by psycopathicteen » Sun Sep 20, 2020 1:16 pm

tepples wrote:
Sun Sep 20, 2020 10:57 am
I was thinking color cell compression (allowing up to two out of 16 colors per 4x4-pixel area) at 120x80 pixels (letterboxed) and 12 fps. Disney movies were animated on twos, and the N64 Resident Evil game made graphics smaller to fit. For bitrate, color cell compression would give 120*80/4/4=600 quarter tiles per frame, where each quarter tile is 24 bits (color if 0, color if 1, and a bitmap) for 1800 ROM bytes per frame or 21600 ROM bytes per second or 1266 KiB per minute. I think (but am not certain) that dealing with the pixels of two quarter tiles in parallel would let the 65816 decode it in real time.
https://www.youtube.com/watch?v=qJJog4BJrUw&t=65s

There are parts where this is animated at 24 fps, but the actual song is only about 2 minutes 30 seconds, so it adds up to a little over 6 MB at 24 fps, which fits in an 8 MB (64-megabit) cart with room for sound samples included.

120x80 is still pretty tiny, though. I think it can do a little bit better with some LZ4 or RLE-like encoding to pack it down even more.
calima wrote:
Sun Sep 20, 2020 11:01 am
24 fps doesn't divide evenly into 60 fps, so you will run into the ~5 kB vblank upload limit per frame. At 24 fps you'll sometimes have two and sometimes three display frames to upload the next video frame.
frames 0-1: upload frame A
frame 2: upload end of frame A, and beginning of frame B.
frames 3-4: upload rest of frame B.
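The alternating two/three-frame windows can be illustrated by stepping 2.5 display frames per video frame. This is a non-overlapping version of the same bandwidth budget (`upload_windows` is a hypothetical helper, not anyone's actual code):

```python
# Which 60 Hz display frames are available to upload each 24 fps video
# frame: each video frame owns a 2.5-display-frame slice of bandwidth.
def upload_windows(video_frames):
    windows = []
    pos = 0.0
    for _ in range(video_frames):
        start = int(pos)
        pos += 60 / 24                      # 2.5 display frames each
        windows.append((start, int(pos) - 1))
    return windows

# Alternates 2- and 3-frame windows: [(0, 1), (2, 4), (5, 6), (7, 9)]
```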
calima wrote:
Sun Sep 20, 2020 11:01 am
superfamiconv can (maybe) process an image into 8 16-color palettes, with flips. It can't do lossy conversion though, and I haven't seen any lossy tile converter for the SNES, only for the NES, as you linked.
I tried downloading that but couldn't get it to run.

Señor Ventura
Posts: 157
Joined: Sat Aug 20, 2016 3:58 am

Re: Non-MSU video player

Post by Señor Ventura » Sun Sep 20, 2020 2:26 pm

With your permission, I'm going to take advantage of this thread to ask something related to it (in terms of sound).


I saw a video on YouTube from a guy who restored the original audio samples, cleaning them up so they sound uncompressed.

Well, these are .pcm files sampled at 44 kHz, but we can get the idea. The question is: could the original samples all have fit at 32 kHz without heavy compression?

https://youtu.be/J2xAM_KcSeY?t=468

none
Posts: 51
Joined: Thu Sep 03, 2020 12:56 am

Re: Non-MSU video player

Post by none » Sun Sep 20, 2020 4:51 pm

I'm currently trying out 16 bit FMV too, I am using this video for testing:

https://www.youtube.com/watch?v=jnoKw0PxugQ

Downscaled to 256x96, it is 12 kB per frame uncompressed. I'm currently just using LZ4 for compression; with that, I can get the whole sequence under 3 megabytes. Each frame, 4 kB are unpacked and uploaded, so I have a video frame every three display frames (which approximately matches the framerate of the original video).

I think that because there is a lot of fast motion in the video you want to try, you won't be able to get very good compression ratios with LZ4 alone. Compression ratios will be even worse if you don't use the raw data but additionally apply color cell compression or similar.

I am also trying to figure out a lossy compression scheme (more suitable for lots of flat-shaded stuff, like in the video you are trying to do, but probably also not so good with fast motion).

I have considered a quadtree based approach, where the screen is subdivided in 2x2 cells (128x128px each for fullscreen) and these cells are subdivided again.

Each node would be encoded in 1 byte: the high 4 bits store a fixed color, and the low 4 bits encode, for each of the four subcells, whether it should be further subdivided or filled with the fixed color (by updating the tilemap). When filling cells, the first color means transparent, i.e. the tile(s) from the previous frame are kept (so only 15 colors are usable for filling regions). This continues until a cell is 2x2 tiles in size. For these leaf nodes, the byte encoding is different: each tile uses 2 bits, where 0 means leave the tile unchanged, 1 means upload a new tile, 2 means reuse a cached tile from VRAM, and 3 has a special meaning explained below. Reusing a cached tile requires an additional 2-byte word for updating the tilemap (allowing mirroring of previously used tiles and maybe even changing the palette).
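The interior-node byte described above could be packed like this (a sketch of my reading of the layout; the exact bit assignment is hypothetical):

```python
# Interior quadtree node: high nibble = fill color, low nibble = one
# "subdivide" flag per quadrant (the bit order is an arbitrary choice).
def pack_node(color, subdivide):    # color: 0-15, subdivide: 4 bools
    bits = sum(1 << i for i, s in enumerate(subdivide) if s)
    return (color << 4) | bits

def unpack_node(byte):
    return byte >> 4, [bool(byte & (1 << i)) for i in range(4)]
```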

The data for the individual tiles that do change in each frame is not stored in the quadtree itself but in a separate memory region, sequentially (so that they can be uploaded to VRAM with dma all at once). The tilemap needs to be updated to reflect the new tiles, the indices are stored implicitly. In VRAM, 1024 tiles can be cached (minus 16 tiles for the flat colors, which remain constant).

When uploading new tiles, care must be taken not to overwrite tiles that are still part of the current tilemap. In that case, the upload must be relocated if possible, or split into several DMAs. This can be detected and handled in the encoder, however, so there is no big runtime overhead; this is where the "3" from above comes into play: for these cases, the leaf node stores an additional byte for updating the tile index accordingly. I'm not sure whether fragmentation can become a problem though (ruining DMA bandwidth); maybe the encoder would need to force re-uploading tiles in some places.

I think with this approach, unpacking could be reasonably fast (walking the quadtree should not be very costly with an efficient implementation), and it would let the encoder reduce the bitrate easily (by reducing the amount of updates dynamically: either not updating some tiles at all, or selectively filling them with flat colors or with similar tiles that are already cached). The encoder could prioritize tiles based on, for example, a mix of amount of change, color contrast, color importance, and distance to the center. It would also be neat to let the encoder change palettes on a per-frame basis, to improve color depth and update sizes.

If a lot changes (more than about 66% of individual tiles), it would probably be more size-efficient to skip storing the quadtree but still use the caching scheme (storing just the leaf cells); that could still prove worthwhile for reducing both bitrate and DMA bandwidth.

Maybe additional compression would also make sense, if there is still CPU time left (LZ4 over everything, and / or a tile compression scheme, come to mind).

creaothceann
Posts: 270
Joined: Mon Jan 23, 2006 7:47 am
Location: Germany
Contact:

Re: Non-MSU video player

Post by creaothceann » Mon Sep 21, 2020 1:03 am

psycopathicteen wrote:
Sun Sep 20, 2020 1:16 pm
24 fps doesn't divide evenly into 60 fps, so you will run into the ~5 kB vblank upload limit per frame. At 24 fps you'll sometimes have two and sometimes three display frames to upload the next video frame.
frames 0-1: upload frame A
frame 2: upload end of frame A, and beginning of frame B.
frames 3-4: upload rest of frame B.
If it's animated at 12 fps then it'd fit evenly into 60 fps.

Another option is enabling interlaced mode and telecining the source material; that's how the industry dealt with converting film to SD video.
My current setup:
Super Famicom ("2/1/3" SNS-CPU-GPM-02) → SCART → OSSC → StarTech USB3HDCAP → AmaRecTV 3.10

calima
Posts: 1238
Joined: Tue Oct 06, 2015 10:16 am

Re: Non-MSU video player

Post by calima » Mon Sep 21, 2020 1:10 am

psycopathicteen wrote:
Sun Sep 20, 2020 1:16 pm
frames 0-1: upload frame A
frame 2: upload end of frame A, and beginning of frame B.
frames 3-4: upload rest of frame B.
No, it doesn't work like that. You can't upload the end of frame A during frame 2 if the timing happens to be such that you already had to display it. And with only 64 kB of VRAM, you can't buffer far enough ahead either.

In general, the technique of dividing an image into tiles, then reducing the tile count lossily while keeping as much quality as possible, is called vector quantization. Many early video codecs used it: Cinepak, Indeo, and several game codecs like RoQ. You can look up VQ algorithms on arxiv.org and pick one to implement. When I was playing with VQ, I used the state-of-the-art genetic one and got PSNR numbers similar to the paper's. However, it's really slow, taking several seconds to encode a 512x512 greyscale image, multithreaded on a 6-core. If you can accept worse quality, it can be fast.
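For anyone wanting a starting point far simpler (and lower quality) than the genetic algorithm, plain k-means over flattened tile vectors is the classic VQ baseline. A minimal sketch, not taken from any of the codecs mentioned:

```python
# Vector quantization of tiles via k-means: cluster flattened tile
# vectors, then represent each tile by its nearest codebook entry.
import random

def kmeans_vq(tiles, k, iters=10):
    random.seed(0)                      # deterministic for illustration
    codebook = random.sample(tiles, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for t in tiles:
            # Assign the tile to its nearest codebook entry (squared error).
            best = min(range(k),
                       key=lambda i: sum((a - b) ** 2
                                         for a, b in zip(t, codebook[i])))
            groups[best].append(t)
        for i, g in enumerate(groups):
            if g:                       # move entry to the mean of its group
                codebook[i] = [sum(col) / len(g) for col in zip(*g)]
    return codebook
```

A real encoder would need a better initialization and a perceptual distance metric, but the structure (assign, then update centroids) is the same.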

Cinepak also had optimizations like using one tileset across many frames, with each frame only updating part of the tileset, etc. You may want to clone Cinepak for the most part.

psycopathicteen
Posts: 2963
Joined: Wed May 19, 2010 6:12 pm

Re: Non-MSU video player

Post by psycopathicteen » Mon Sep 21, 2020 5:59 am

calima wrote:
Mon Sep 21, 2020 1:10 am
psycopathicteen wrote:
Sun Sep 20, 2020 1:16 pm
frames 0-1: upload frame A
frame 2: upload end of frame A, and beginning of frame B.
frames 3-4: upload rest of frame B.
No, it doesn't work like that. You can't upload the end of frame A during frame 2 if the timing happens to be such that you already had to display it. And with only 64 kB of VRAM, you can't buffer far enough ahead either.
What do you mean it doesn't work like that? If I had 13 kB of CHR for frame A and 13 kB of CHR for frame B, I would still have less than half of VRAM in use. Even without any kind of vector quantization, I'd still have enough VRAM to fit 2 frames of CHR at 4bpp, plus a tilemap.
calima wrote:
Mon Sep 21, 2020 1:10 am
In general, the technique of dividing an image into tiles, then reducing the tile count lossily while keeping as much quality as possible, is called vector quantization. Many early video codecs used it: Cinepak, Indeo, and several game codecs like RoQ. You can look up VQ algorithms on arxiv.org and pick one to implement. When I was playing with VQ, I used the state-of-the-art genetic one and got PSNR numbers similar to the paper's. However, it's really slow, taking several seconds to encode a 512x512 greyscale image, multithreaded on a 6-core. If you can accept worse quality, it can be fast.

Cinepak also had optimizations like using one tileset across many frames, with each frame only updating part of the tileset, etc. You may want to clone Cinepak for the most part.
Do you know of any program I can download that supports that?

calima
Posts: 1238
Joined: Tue Oct 06, 2015 10:16 am

Re: Non-MSU video player

Post by calima » Mon Sep 21, 2020 9:54 am

Draw a horizontal line, and add 60fps points above and 24fps points below. Perhaps visualizing it like that makes it clear.

No ready-made program; you'll have to get your hands dirty ;) Unless you meant Cinepak natively? In that case there are plenty of media players and encoders.

psycopathicteen
Posts: 2963
Joined: Wed May 19, 2010 6:12 pm

Re: Non-MSU video player

Post by psycopathicteen » Mon Sep 21, 2020 3:31 pm

I figured out how to do the stuff tepples mentioned without taking up CPU performance. Use Mode 1. BG1 is filled with 4x4 solid-color squares from 16 colors, and BG2 is filled with another set of 4x4 solid-color squares. BG3 is filled with a 1bpp bitplane (the way the S-PPU is set up makes this easy) of transparent or black pixels. BG1 and BG3 are on the main screen while BG2 is on the subscreen, so if transparency is enabled on BG3, it should overlay BG2's colors on top of it.
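A quick software model of how those layers would combine (my reading of the trick, so treat the compositing rule as an assumption): the 1bpp BG3 plane acts as a per-pixel selector, with black BG3 pixels plus additive color math effectively showing the subscreen (BG2) color.

```python
# Per-pixel model of the Mode 1 trick: where the BG3 mask bit is set,
# black + additive color math with the subscreen shows BG2's cell color;
# elsewhere the main screen shows BG1's cell color.
def composite(bg1, bg2, mask):
    return [c2 if m else c1 for c1, c2, m in zip(bg1, bg2, mask)]
```

This is exactly the two-colors-per-cell selection of color cell compression, done by the PPU instead of the CPU.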

none
Posts: 51
Joined: Thu Sep 03, 2020 12:56 am

Re: Non-MSU video player

Post by none » Tue Sep 22, 2020 1:56 pm

I've made some progress with lossy compression. The tree part is not finished yet; just the quantization/caching is working right now.

Here's a test video at 256x224 pixels, with 1400 frames compressed into 2 megabytes. The VRAM upload needed is mostly around 3 to 4 kilobytes per video frame.

Original:
Image

Encoded and converted back to gif:
Image

It suffers from heavy blockiness in some parts, but that somewhat improves when compressing to around 4 Megabytes.

The encoder is here, maybe it is of use to you with your project:

https://github.com/rmn0/tie
