All times are UTC - 7 hours





PostPosted: Thu Jul 13, 2017 1:04 pm 
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19355
Location: NE Indiana, USA (NTSC)
Would using a bucket sort, maintaining a display list for each texture and adding each quad to the appropriate texture, run faster than qsort? You might need two passes over the tilemap, one to count tiles using each texture and one to actually build the display lists.


PostPosted: Fri Jul 14, 2017 1:08 pm
Joined: Thu May 25, 2017 7:27 am
Posts: 24
tepples wrote:
Would using a bucket sort, maintaining a display list for each texture and adding each quad to the appropriate texture, run faster than qsort? You might need two passes over the tilemap, one to count tiles using each texture and one to actually build the display lists.


Dunno, how much memory would take the bucket sort?

For the function I wrote two code paths; the first one seems to run faster than qsort on layers like this:
Code:
// tile size: (32x32)
// 10 x 4
static uint16_t map2 [40] =
{
   106,107,106,107,0,106,107,106,107,0,
   108,109,108,109,0,108,109,108,109,0,
   110,0,110,0,0,110,0,110,0,0,
   111,0,111,0,0,111,0,111,0,0
};


The textures run consecutively from 106 to 111 and there are not many draw calls (0 is ignored); however, the difference is only 1 or 2fps.

Code:
void texture_list(int opt)
{
   int zx,zy;

   // optimized for layers
   if (opt==0)
   {   
      // check
      if (min_tex>max_tex) { min_tex=max_tex; }
   
      for(zx=min_tex;zx<max_tex+1;zx++)
      {   
         // load texture into TMEM
         if (last_texture!=zx)
         {
            rdp_load_texture(graph[zx]);
            last_texture=zx;
         }   
         
         for(zy=0;zy<num_tex;zy++)
         {   
            if (zx==tex_tile[zy].tex)
            {   
               // draw      
               rdp_draw_sprite(tex_tile[zy].x,tex_tile[zy].y,0);   
            }   
         }
      }
      
   }   
   else // optimized for main scroll
   {   
      
      qsort( tex_tile, num_tex, sizeof(tex_tile[0]), compare );
      
      // display
      for(zx=0;zx<num_tex;zx++)
      {   
   
         // load texture into TMEM
         if (last_texture!=tex_tile[zx].tex)
         {
            rdp_load_texture(graph[tex_tile[zx].tex]);
            last_texture=tex_tile[zx].tex;
         }
                           
         // draw      
         rdp_draw_sprite(tex_tile[zx].x,tex_tile[zx].y,0);      
         
      }
      
   }   

   // reset values
   min_tex=9999;
   max_tex=0;
   num_tex=0;
   texture_sort=0;
}


Last edited by BMBx64 on Fri Jul 14, 2017 7:40 pm, edited 1 time in total.

PostPosted: Fri Jul 14, 2017 2:17 pm
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19355
Location: NE Indiana, USA (NTSC)
BMBx64 wrote:
tepples wrote:
Would using a bucket sort, maintaining a display list for each texture and adding each quad to the appropriate texture, run faster than qsort? You might need two passes over the tilemap, one to count tiles using each texture and one to actually build the display lists.

Dunno, how much memory would take the bucket sort?

Unlike question inversion in some other languages, question inversion in English moves only the tensed verb: the "would" in "would take". This way, the listener can still tell subject and object apart: "Weena would eat..." becomes "What would Weena eat?" not "What would eat Weena?", which sounds a bit more cannibalistic to an English speaker. With that out of the way:

"how much memory would the bucket sort take?"
Enough to hold all the display list entries, plus as many pointers as there are textures. Each pointer points to the display list associated with a particular texture. For example, if there are four textures and up to 300 tiles or portions thereof on the screen (assuming a 304x224 canvas with partial tiles), you'd need four display list pointers plus enough memory for 300 quads.

How did I arrive at 300 tiles? Say you're drawing 16x16-pixel tiles to a 304x224-pixel canvas, with 8 pixels of black border on each side. This is the typical canvas size for Neo Geo and Nintendo 64 games so that they don't have to spend GPU time rendering so much of the part of the picture hidden by overscan. This is 14 rows of 19 tiles each. But when scrolling a tilemap, you need an additional row and column to account for tiles being half scrolled off, so 15 rows of 20 tiles, or 300 in all.
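
A minimal sketch of the two-pass bucket sort described above, assuming four textures and the 300-tile budget from the example. The names tile_t, NUM_TEXTURES, MAX_TILES and bucket_tiles are hypothetical and not part of libdragon; the fields mirror the tex, x and y members used by tex_tile[] earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tile record with the same three fields as tex_tile[]. */
typedef struct {
    uint16_t tex;
    int16_t x, y;
} tile_t;

#define NUM_TEXTURES 4    /* assumed texture count, as in the example */
#define MAX_TILES    300  /* 20 columns x 15 rows of 16x16 tiles */

/* Counting (bucket) sort of tiles by texture id.
   Pass 1 counts tiles per texture, pass 2 places each tile in its bucket.
   start[t] receives the index of the first tile of texture t in `out`;
   start[NUM_TEXTURES] receives the total number of tiles placed. */
static void bucket_tiles(const tile_t *in, size_t n, tile_t *out,
                         size_t start[NUM_TEXTURES + 1])
{
    size_t count[NUM_TEXTURES] = {0};
    size_t next[NUM_TEXTURES];
    size_t i, t, total = 0;

    /* Pass 1: count how many tiles use each texture. */
    for (i = 0; i < n; i++)
        if (in[i].tex < NUM_TEXTURES)
            count[in[i].tex]++;

    /* A prefix sum turns the counts into bucket start offsets. */
    for (t = 0; t < NUM_TEXTURES; t++) {
        start[t] = total;
        next[t] = total;
        total += count[t];
    }
    start[NUM_TEXTURES] = total;

    /* Pass 2: append each tile to the bucket of its texture. */
    for (i = 0; i < n; i++)
        if (in[i].tex < NUM_TEXTURES)
            out[next[in[i].tex]++] = in[i];
}
```

After the call, the tiles of texture t sit in out[start[t]] up to out[start[t + 1] - 1], so each texture only needs to be loaded once before drawing its run. Extra memory is one output array of up to MAX_TILES entries plus a few counters per texture.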


PostPosted: Sun Jul 23, 2017 5:05 am
Joined: Thu May 25, 2017 7:27 am
Posts: 24
ALPHA BLENDING TEST
Small test that shows 32 levels of transparency in 1cycle mode (16bit mode / RDP).
Image

CONTROLS
A - Flip Sprite
Z - Enable/Disable Additive blending
L/R - Alpha control
Joy - Scroll
C buttons - X / Y Scale (zoom)

DOWNLOAD
https://mega.nz/#!g4ZlELCY!1Wbgx4wzpJalM798Ck0zXLHRK2Q6qNmYzyY9zFcOHvY (edit: fixed NTSC video)

New functions added to libdragon (I may post the new lib if anyone is interested):

fixes x_scale, RGB scale support, no alpha support
Code:
void rdp_texture_1cycle( void );

same, with bilinear filter when zoomed
Code:
void rdp_texture_filter( void );

like 1cycle but enables alpha support
Code:
void rdp_alpha_blending( void );

like 1cycle with alpha support plus additive
Code:
void rdp_additive_blending( void );

controls alpha blending and rgb scale (tint)
Code:
void rdp_rgba_scale(uint8_t _r, uint8_t _g, uint8_t _b, uint8_t _alpha);


- Additive blending has no color cap, so it needs to be controlled with rdp_rgba_scale.
- RDP commands used: Set Other Modes, Color Combiner, Set Prim Color.
- Small fixes on libdragon.

Thanks Krom for explaining to me how the RDP works.


Last edited by BMBx64 on Sat Jul 29, 2017 7:18 pm, edited 1 time in total.

PostPosted: Sun Jul 23, 2017 9:11 am
Joined: Sun Apr 13, 2008 11:12 am
Posts: 6540
Location: Seattle
BMBx64 wrote:
Thanks Krom for explaining to me how the RDP works.
I found a post on ElOtroLado.net containing your username and their username ... is that basically the same information?


PostPosted: Sun Jul 23, 2017 10:04 am
Joined: Thu May 25, 2017 7:27 am
Posts: 24
This one?
https://www.elotrolado.net/hilo_hilo-de-detalles-y-curiosidades-de-n64_2171497

Yeah, similar info regarding libdragon; there is a bit of trivia and analysis of commercial games as well (framerate, debug, wireframes, poly count, etc).

Krom's ASM examples can be found here:
https://github.com/PeterLemon/N64


PostPosted: Sun Aug 20, 2017 5:42 pm
Joined: Thu May 25, 2017 7:27 am
Posts: 24
ADDED 4/8 BIT TEXTURES
This took a while; I also had to update the Sprite64 tool, which I will post in the next few days.

The implementation is not finished yet. I followed Krom's ASM examples, but for some reason, when working together with libdragon, it fails to show any texture above 1KB (the maximum should be 2KB, since the other 2KB of TMEM are for the TLUT). So I can't do 64x64 4bit textures (2 pixels per byte) or 64x32 8bit textures yet, but I'm working on fixing it.
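
The sizes mentioned above can be checked with a little arithmetic. This is only a sketch: texture_bytes and fits_paletted are hypothetical helper names, and the 2KB limit for paletted textures is taken from the post itself.

```c
#include <stdint.h>

#define TMEM_BYTES         4096u  /* total texture memory on the RDP */
#define TMEM_PALETTED_MAX  2048u  /* lower half; the upper half holds the TLUT */

/* Bytes needed by a width x height texture at `bits` bits per pixel. */
static uint32_t texture_bytes(uint32_t width, uint32_t height, uint32_t bits)
{
    return (width * height * bits) / 8u;
}

/* Nonzero if a 4bit or 8bit (paletted) texture fits in the lower half of TMEM. */
static int fits_paletted(uint32_t width, uint32_t height, uint32_t bits)
{
    return texture_bytes(width, height, bits) <= TMEM_PALETTED_MAX;
}
```

Both 64x64 at 4bit and 64x32 at 8bit come out at exactly 2048 bytes, the full paletted half, while 64x32 at 4bit and 32x32 at 8bit are 1024 bytes, which is the size that currently works.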

I only did the RDP side; these textures won't be compatible with the software rendering of libdragon (graphics.c).

TEST 1: PALETTE EDITOR
This test uses a 4bit sprite (1cycle, 2x scaled). You can pick a color and replace any palette color (A button).
Image

Pressing L starts an automatic palette rotation; you can stop it at any time with R.
Image

Press Start to restore the default palette.

I had problems with cache invalidation in this particular test (the palette would not refresh); the problem was solved by replacing -O2 with -Os in the compiler options.

DOWNLOAD
https://mega.nz/#!l4JFxIiB!jTS5TAQl8-0b5008yk7QQ392qOlYw0cbAK0QbAGOKSw

TEST 2: PERFORMANCE TEST
The same Goldenaxe test from the previous page, with 16x16 4bit textures and a single palette, since the whole map uses only 1 of the 4 Mega Drive palettes.
Image

4bit Mirrored (with the current optimization align)
x= 0 - 166fps
x= 194 - 161fps
x= 974 - 143fps
x= 1552 - 169fps

16bit Mirrored (same align)
x= 0 - 161fps
x= 194 - 156fps
x= 974 - 138fps
x= 1552 - 163fps

Overall there seems to be a small improvement when all the tiles use the same palette; it could be quite interesting to check this map with 64x64 tiles.

DOWNLOAD
https://mega.nz/#!5x4zAYrT!6xAwSK76h8GaGyKfLo05vDUHjEU33MMmKpHDGAumUSc

NEW FUNCTIONS
I have organized the RDP functions I wrote a bit better; they have changed a bit since the last post:

Code:
void rdp_send( void );
void rdp_command( uint32_t data );
void rdp_cp_sprite( int x, int y, int flags, int cp_x, int cp_y, int line );
void rdp_cp_sprite_scaled( int x, int y, float x_scale, float y_scale, int flags, int cp_x, int cp_y, int line );
void rdp_enable_filter( int type );
void rdp_enable_alpha( int type );
void rdp_enable_tlut( int type );
void rdp_texture_1cycle( void );
void rdp_additive_blending( void );
void rdp_intensify( void );
void rdp_color( void );
void rdp_rgba_scale(uint8_t _r, uint8_t _g, uint8_t _b, uint8_t _alpha);
void rdp_load_tlut(uint8_t _pal_bp, uint8_t _pal_num, uint16_t *_palette);


All the following functions work in combination with a call to rdp_enable_texture_copy or rdp_texture_1cycle:

// 0 disable, 1 enable
void rdp_enable_filter( int type ); // point sampled or bilinear
void rdp_enable_alpha( int type );
void rdp_enable_tlut( int type );

These functions work in combination with rdp_rgba_scale to control RGB values:

void rdp_intensify( void ); takes RGB from 0 (normal) to 255 (white) and can do this:
Image

void rdp_color( void ); shows the sprite silhouette in 1 RGB color and can do this kind of effect:
Image

I will be adding more combiner options for different effects. I will update rdp.c and rdp.h soon; I want to fix the small bugs mentioned above and a few more things first.

I'm also interested in disabling atomic_prim in libdragon. I already did it, but then the sprites sometimes miss 1 vertical or horizontal line.
Image


PostPosted: Wed Aug 23, 2017 1:54 pm
Joined: Mon Nov 26, 2012 1:03 pm
Posts: 2
Does libdragon support triangle rendering, shading, etc?


PostPosted: Thu Aug 24, 2017 5:30 am
Joined: Thu May 25, 2017 7:27 am
Posts: 24
claws wrote:
Does libdragon support triangle rendering, shading, etc?

With the default libdragon you can render 2D non-shaded triangles in the blend color, using the function rdp_draw_filled_triangle.
Image

On my side I have added a function called rdp_triangle_setup( num ); to select any type from this list: 0 = flat, 1 = Gouraud, 2 = textured, etc.
Image

Right now I'm trying to use textured triangles to rotate sprites. It will work once I update the texture coefficients accordingly, and I'm planning more effects with triangles.
Image

--
I have also fixed 32bit mode. Textures in this mode have to be set to 1cycle to be displayed with the RDP, or the system crashes (they worked with the software renderer in the libdragon examples, but that's pretty slow).

32bit mode supports textures of 4,8,16 and 32bits.
16bit mode supports textures of 4,8,16 and 32bits too (they are converted on the fly)

This is a screen built with 16x16 32bit textures in 32bit mode, running in cen64.
Image

16bit mode with 32bit textures: the color depth is reduced, as you can see from the banding in the sky.
Image
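
The banding comes from losing the low bits of each channel. Below is a sketch of the usual truncating conversion from 8 bits per channel to the 16bit RGBA 5551 layout; rgba8888_to_5551 is a hypothetical helper, and the conversion the library actually performs may differ in detail.

```c
#include <stdint.h>

/* Convert 8-bit-per-channel RGBA to the 16bit RGBA 5551 layout
   (5 bits red, 5 green, 5 blue, 1 alpha) by dropping the low bits. */
static uint16_t rgba8888_to_5551(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 3) << 6) |
                      ((b >> 3) << 1) | (a >> 7));
}
```

Every run of 8 neighbouring input values collapses to a single output level (red 8 and red 15 give the same result), which is why smooth gradients such as a sky show visible steps.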

PERFORMANCE TEST
This test is the Mario scroll again with 3 different settings:

Same spot
16bit mode, 16x16 16bits copy texture = 333fps
16bit mode, 16x16 16bits 1cycle texture = 266fps
32bit mode, 16x16 16bits 1cycle texture = 211fps

1cycle texture mode slows things down a bit, and 32bit mode is pretty slow as well. It may be worth it for some special effects or intros, but I would stay in 16bit mode for 2D.
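
Frame rates this high are easier to compare as time per frame. A small sketch (frame_time_us is a hypothetical helper) that converts the three figures above to microseconds:

```c
#include <stdint.h>

/* Time spent per frame, in microseconds, for a given frame rate. */
static uint32_t frame_time_us(uint32_t fps)
{
    return fps ? 1000000u / fps : 0u;
}
```

333fps is about 3003 us per frame, 266fps about 3759 us and 211fps about 4739 us, so switching from copy to 1cycle costs roughly 0.75 ms per frame in this scene, and 32bit mode roughly another 1 ms.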

DOWNLOAD
https://mega.nz/#!5gIiAACS!V--UGTsV7FULDG0-E2OC_b2QO7HX02-ujwxZF7xzuDA

--

Other things to fix:
- Triple buffer is not working, you get full speed instead of sync video
- AA disabled causes glitches at 60hz (PAL seems fine)
- Fix loading speed, load 1200 tiles/files of 512bytes = 47 seconds
- 4, 8 and 32bit textures only use half of TMEM (this is top priority)
- Palette colors are sent in a different way, to upload 16 colors you have to send 64 instead (4 times more)

* 16bit textures can use full 4KB of TMEM, surprisingly 16bit textures were the only ones supported on the RDP by libdragon.
* Both palette and texture problems are TMEM related.
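
A quick check of the palette sizes, assuming each 16bit palette entry is stored four times as described in the list above. tlut_bytes and TLUT_COPIES are hypothetical names used only for this arithmetic.

```c
#include <stdint.h>

#define TLUT_COPIES 4u  /* assumption: each color is stored four times */

/* Bytes occupied by a palette of `colors` 16bit entries. */
static uint32_t tlut_bytes(uint32_t colors)
{
    return colors * 2u * TLUT_COPIES;
}
```

Under that assumption a 16-color palette takes 128 bytes (64 entries) and a 256-color palette takes 2048 bytes, which is exactly the 2KB half of TMEM reserved for the TLUT.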


PostPosted: Thu Aug 24, 2017 10:00 pm
Joined: Mon Sep 15, 2014 4:35 pm
Posts: 3166
Location: Nacogdoches, Texas
Amazing work as usual! :)

If only I were determined enough to make this much progress... :lol:


PostPosted: Wed Aug 30, 2017 3:12 pm
Joined: Thu May 25, 2017 7:27 am
Posts: 24
Thx Espozo :beer:

32bit TMEM texture fix
Now the 4KB of TMEM can be used for 32bit textures (32x32)

It was a bug in the rdp_load_texture function of libdragon:
Code:
(((((real_width / 8) + round_amount) * sprite->bitdepth) & 0x1FF) << 9)

The fix (tile line param is the same one for 16bit and 32bit textures):
Code:
(((((cache.real_width  >> 3) + round_amount) << 1) & 0x1FF) << 9)
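
To see what the change does, both expressions can be wrapped in small functions and compared. This is a sketch under two assumptions that are not stated in the post: sprite->bitdepth holds bytes per pixel (2 for 16bit, 4 for 32bit), and round_amount is 1 when the width is not a multiple of 8. The function names are hypothetical.

```c
#include <stdint.h>

/* Line field as computed before the fix. `bytes_per_pixel` stands in for
   sprite->bitdepth (assumed: 2 for 16bit, 4 for 32bit textures). */
static uint32_t tile_line_old(uint32_t real_width, uint32_t bytes_per_pixel)
{
    uint32_t round_amount = (real_width % 8u) ? 1u : 0u;  /* assumed rounding rule */
    return ((((real_width / 8u) + round_amount) * bytes_per_pixel) & 0x1FFu) << 9;
}

/* Line field after the fix: the multiplier is always 2, whatever the depth. */
static uint32_t tile_line_fixed(uint32_t real_width)
{
    uint32_t round_amount = (real_width % 8u) ? 1u : 0u;  /* assumed rounding rule */
    return ((((real_width >> 3) + round_amount) << 1) & 0x1FFu) << 9;
}
```

With these assumptions, 16bit textures get the same value from both versions, while a 32-pixel-wide 32bit texture used to get a line value twice as large as the fixed one.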


TEST
Image

Scroll of 32x32 tiles with 32bit textures at 320x240x32bit; each tile has a different texture.

Performance:
16x16 = 51-55fps
32x32 = 72-76fps

No optimizations were possible besides disabling the framebuffer clear; reusing 32bit textures across tiles could be necessary to keep good performance.

The test uses 300 tiles instead of 1200, which reduced loading times to 3 or 4 sec.

DOWNLOAD
https://mega.nz/#!w1IFWD4A!k4F3BoVzp1vQ-69SQ4Tfti1E5MTEotgGTYDrw-acxPM

rdp_set_blend_color fix
This function is necessary to color triangles; the RDP command requires RGBA components:

The original function may produce wrong colors (especially in 16bit color mode); it is called in combination with graphics_make_color:
Code:
void rdp_set_blend_color( uint32_t color )
{
    __rdp_ringbuffer_queue( 0xF9000000 );
    __rdp_ringbuffer_queue( color );
    __rdp_ringbuffer_send();
}

Fix (direct RGBA input):
Code:
void rdp_set_blend_color(uint8_t _r, uint8_t _g, uint8_t _b, uint8_t _alpha)
{
    __rdp_ringbuffer_queue( 0x39000000 );
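    /* cast before shifting so values of 128 or more never reach the sign bit of an int */
    __rdp_ringbuffer_queue( ((uint32_t)_r << 24) | ((uint32_t)_g << 16) | ((uint32_t)_b << 8) | _alpha );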
    __rdp_ringbuffer_send();
}
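
The packing of the second command word can be checked on its own. In this sketch pack_rgba8888 is a hypothetical helper, not a libdragon function; it casts each component to uint32_t before shifting so that a red value of 128 or more is never shifted into the sign bit of an int.

```c
#include <stdint.h>

/* Pack four 8-bit components into one 32-bit RGBA word, red in the top byte. */
static uint32_t pack_rgba8888(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)r << 24) | ((uint32_t)g << 16) |
           ((uint32_t)b << 8) | (uint32_t)a;
}
```

Opaque red packs to 0xFF0000FF, for example.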

Color combiner fix
It seems textured rectangles and triangles need different color combinations; some work for rectangles while triangles look like this:
Image

I found a compatible combination for both rectangles and triangles for every effect, now triangles can do alpha blending, intensify, etc just like rectangles.

Added rdp_enable_1primitive
This function enables or disables atomic prim since there's some performance impact on the fillrate.

Some glitches happen with particular texture sizes / bit depths. For example, a 16x16 16bit texture is fine while a 16x16 4bit one may show a garbage line (or empty line) at the bottom. I think this function could be useful for tiles, since you know they will all have the same size, but it could be safer to keep atomic prim enabled for sprites because their sizes vary.

Performance tests (no glitches in any of them):

Mario scroll
On - 333fps
Off - 342fps

Goldenaxe scroll
On - 167fps
Off - 172fps

Fillrate test
On - 1280 16x32 sprites at 50fps
Off - 1320 at 50fps

5-6 fps improvement or close to 1Mp/s fillrate gain like the book suggests.
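
The fillrate figure follows from the sprite counts above. A sketch of the arithmetic (fillrate_px_per_s is a hypothetical helper):

```c
#include <stdint.h>

/* Pixels drawn per second for `sprites` sprites of w x h pixels at `fps`. */
static uint32_t fillrate_px_per_s(uint32_t sprites, uint32_t w, uint32_t h,
                                  uint32_t fps)
{
    return sprites * w * h * fps;
}
```

1280 sprites of 16x32 at 50fps is 32,768,000 pixels per second and 1320 sprites is 33,792,000, a difference of 1,024,000 pixels per second, which is where the roughly 1Mp/s gain comes from.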

--

To fix the TLUT it seems necessary to load the same palette 4 times (and it works that way in libdragon); however, Krom's ASM examples load only 16 colors instead of 64 :?:
Image

Right now I can do a 64x32x4bit texture, but 64x64 won't display; maybe the problem is related to the way the colors are loaded. My next goal is to try to fix 4 and 8bit textures and then release the current rdp.c.


PostPosted: Thu Sep 07, 2017 4:40 am
Joined: Thu May 25, 2017 7:27 am
Posts: 24
ADDED FRAMEBUFFER EFFECTS
Several games use different framebuffer effects, such as Mario Kart 64 for the background screen. I thought it was a good idea to have a few functions to allow this:
Image

void rdp_buffer_screen( display_context_t disp, int texture_mode );

This function reads the buffer and generates 16bit textures on the fly; it reads only the pixels needed to build them.

It has 3 texture modes:
0 - Full Screen on 1 texture of 32x32 (respect ratio of any resolution)
1 - Upper half on maximum texture size (around 64x32)
2 - Bottom half
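
Reading only the necessary pixels can be done with nearest-neighbor sampling. The following is a sketch of that idea for mode 0, not the actual rdp_buffer_screen code; downsample_to_texture, TEX_W and TEX_H are hypothetical names and the framebuffer is assumed to be a plain array of 16bit pixels.

```c
#include <stdint.h>

#define TEX_W 32
#define TEX_H 32

/* Nearest-neighbor reduction of a 16bit framebuffer to a TEX_W x TEX_H
   texture: only one source pixel is read per texel. */
static void downsample_to_texture(const uint16_t *fb, int fb_w, int fb_h,
                                  uint16_t tex[TEX_H][TEX_W])
{
    int tx, ty;
    for (ty = 0; ty < TEX_H; ty++) {
        int sy = ty * fb_h / TEX_H;      /* source row for this texel row */
        for (tx = 0; tx < TEX_W; tx++) {
            int sx = tx * fb_w / TEX_W;  /* source column for this texel */
            tex[ty][tx] = fb[sy * fb_w + sx];
        }
    }
}
```

For a 320x240 screen this reads 1024 of the 76800 pixels, one every 10 columns and roughly every 7.5 rows.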

Image

The Mario Kart example has a very slow refresh rate. I could improve the function to allow selecting a refresh rate, increasing performance at the cost of keeping the textures in memory; an option to select a specific area could be great too.

void rdp_buffer_copy( display_context_t disp, uint16_t x_buf, uint16_t y_buf, uint8_t width, uint8_t height );

This one is more flexible and allows a 1:1 buffer copy of any compatible texture size in any position of the screen.
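
A sketch of what a 1:1 copy of a screen region involves, assuming the framebuffer is a plain array of 16bit pixels. copy_region and its parameters are hypothetical and only mirror the x_buf, y_buf, width and height arguments of the prototype above.

```c
#include <stdint.h>

/* Copy a width x height block starting at (x_buf, y_buf) out of a 16bit
   framebuffer that is fb_w pixels wide and fb_h pixels tall.
   Returns 1 on success, 0 if the block does not fit inside the buffer. */
static int copy_region(const uint16_t *fb, int fb_w, int fb_h,
                       int x_buf, int y_buf, int width, int height,
                       uint16_t *dst)
{
    int row, col;

    if (x_buf < 0 || y_buf < 0 || width <= 0 || height <= 0 ||
        x_buf + width > fb_w || y_buf + height > fb_h)
        return 0;

    for (row = 0; row < height; row++)
        for (col = 0; col < width; col++)
            dst[row * width + col] = fb[(y_buf + row) * fb_w + (x_buf + col)];

    return 1;
}
```

The destination is tightly packed, one row after another, which is the layout a texture upload would typically expect.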

Once the texture is loaded into TMEM it can be drawn as normal and any effect can be applied. In this example a row of textures is flipped and tinted with a blue scale to simulate an ice or water reflection.

Image

Both functions are only compatible with 16bit mode at the moment; a few more are planned to provide different effects.


PostPosted: Sat Sep 09, 2017 3:43 pm
Joined: Thu May 25, 2017 7:27 am
Posts: 24
I made a GitHub page. It is not a clone / fork of the original libdragon GitHub; I'm still learning how it works, so this is just a test, but I have uploaded some files in the meantime.

Github:
https://github.com/conker64/libdragon

The interesting ones are rdp.c & rdp.h. There is one TLUT example as well, and I will keep adding more examples.

Under tools you can find an update of Sprite 64.
Image

For 4 and 8 bit, use PNG TO TILEMAP with textures of at most 64x32x4bit or 32x32x8bit. They are not fixed yet, and PNG TO SPRITE will attempt to reach the maximum size for each sprite.


PostPosted: Sun Sep 17, 2017 10:50 am
Joined: Thu May 25, 2017 7:27 am
Posts: 24
ADDED FIRE EXAMPLE
Image

This example features a few effects provided by the new libdragon functions.
- S deformation of a specific framebuffer area
- Blur using multiple alpha levels

CONTROLS
Hold Z - To do blur effect
Press A - To generate fire (up to 99, fire variables are shared with blur and recycled when maxed)
Press Start - Delete fire

DOWNLOAD (with source)
https://github.com/conker64/libdragon/tree/master/examples/fire

--
I'm working on more framebuffer tricks; this one needs to be cleaned up before uploading. Since the framebuffer is sent as a texture for manipulation, other effects can be applied, like reverse flip, resize or masking with invisible colors.
Image

Other effects that I'm interested in replicating (such as expanding waves):
Image


PostPosted: Fri Oct 20, 2017 3:45 am
Joined: Thu May 25, 2017 7:27 am
Posts: 24
ADDED DITHER
Another built-in RDP setting.

- rdp_rgb_dither(num);
0 magic square
1 standard
2 random
3 disabled

- rdp_alpha_dither(num);
0 pattern
1 ~pattern
2 random
3 disabled

Random rgb dither applied to the sky surface; it can be useful for a film effect or a Silent Hill filter.
Image

Random rgb dither + rdp_rgba_scale test.
Image

These new functions need at least 1CYCLE mode since COPY bypasses most of the RDP.
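
For readers unfamiliar with pattern dithering, here is a generic sketch of ordered dithering when reducing an 8-bit channel to 5 bits. It uses the textbook 4x4 Bayer matrix purely as an illustration; the RDP's own magic square and standard patterns are not reproduced here, and bayer4 and dither_8_to_5 are hypothetical names.

```c
#include <stdint.h>

/* Textbook 4x4 Bayer threshold matrix, values 0..15 (illustration only). */
static const uint8_t bayer4[4][4] = {
    {  0,  8,  2, 10 },
    { 12,  4, 14,  6 },
    {  3, 11,  1,  9 },
    { 15,  7, 13,  5 }
};

/* Reduce an 8-bit channel to 5 bits with ordered dithering: a
   position-dependent offset of 0..7 is added before the low 3 bits
   are dropped, so neighbouring pixels round in different directions. */
static uint8_t dither_8_to_5(uint8_t value, unsigned x, unsigned y)
{
    unsigned v = value + (bayer4[y & 3u][x & 3u] >> 1);
    if (v > 255u)
        v = 255u;  /* clamp so the brightest values do not wrap around */
    return (uint8_t)(v >> 3);
}
```

An input of 100 becomes level 12 at pixel (0,0) but level 13 at pixel (1,0), and that mix of levels is what hides the banding.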

