It is currently Thu Nov 23, 2017 9:57 am

All times are UTC - 7 hours





PostPosted: Fri Jun 09, 2017 2:25 am 
Joined: Thu Oct 05, 2006 6:29 am
Posts: 911
Quote:
I can whine at him to check the repository

I was saying that he's been pretty quick to respond in my experience, in case BMBx64's concern was that he's completely abandoned libdragon, so there's no need for nagging :wink:

These routines look like they could be useful to others, and I think it's far superior to have a single up-to-date repository on GitHub, rather than a bunch of enhanced code-drops scattered on various forums.


PostPosted: Fri Jun 09, 2017 6:07 am
Joined: Thu May 25, 2017 7:27 am
Posts: 23
lidnariq, nice :) Do you know any values I could use for testing?

I regret not buying an Everdrive with USB, since I have to do all the tests with SD.

--
mic_, I may try. If he is still active, I also have a bunch of questions for him :wink:

For example, all the commands to set up the RDP: for some reason he did alpha blending on the CPU instead of using the color combiner on the RDP. Even in software, the 16bit mode has no alpha blending support (just transparency: show the pixel or don't), when for performance it should be the preferred mode.

The games are set to run at 60fps by default, but with frameskip. I would prefer a frameskip of 0, especially for 2D, or a visible option to change it; I haven't figured out the right value yet.

At least I found why the program syncs to 25fps when the audio buffer is filled, though I should use another way to play audio.
Code:
(audio.c)
#define CALC_BUFFER(x)  ( ( ( ( x ) / 25 ) >> 3 ) << 3 )
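A quick sketch of what that macro works out to, in case it helps others: it takes 1/25 of the sample rate and rounds down to a multiple of 8, so each buffer holds roughly 1/25 s of audio. If the main loop waits for a free buffer once per frame, that would presumably lock it to about 25 iterations per second. The helper name `buffers_per_second` and the 44100 Hz rate used below are assumptions for illustration, not taken from libdragon.

```c
#include <assert.h>

/* Same arithmetic as CALC_BUFFER in audio.c: samples per buffer for a
 * given sample rate, rounded down to a multiple of 8. */
#define CALC_BUFFER(x)  ( ( ( ( x ) / 25 ) >> 3 ) << 3 )

/* Hypothetical helper: how many whole buffers are consumed per second. */
static int buffers_per_second(int sample_rate)
{
    int samples = CALC_BUFFER(sample_rate);
    assert(samples > 0);   /* rates below 200 Hz would give an empty buffer */
    return sample_rate / samples;
}
```

At an assumed 44100 Hz, 44100 / 25 = 1764, which rounds down to 1760 samples per buffer, and 44100 / 1760 is just over 25 buffers per second.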

--
Right now I'm building a small tool: a texture converter, plus other things.

Image
- Reads a folder and converts all PNGs into N64 textures (sprites) in 1 step.
- Splits the sprite to fit TMEM in compatible texture sizes; a manual setting is also available.
- 2 modes: sprites, and tilesets for large images (scroll).
- Can generate an array (code) for the tileset (it compares tiles and excludes repeated ones).
- Has an animation viewer and center align support (like the SOTN pic shown on the previous page).
- Uses the hslices and vslices header fields for the center align X/Y (I removed stride from libdragon, so I can use them to carry other data).

I may add 4bit or 8bit sprite support or texture compression when I advance a bit more on the lib. I may upload a beta of the tool in the next few days.


PostPosted: Fri Jun 09, 2017 9:50 am
Joined: Tue Oct 06, 2015 10:16 am
Posts: 584
Perhaps you could get one of the SD cards with wifi?


PostPosted: Fri Jun 09, 2017 12:36 pm
Joined: Sun Apr 13, 2008 11:12 am
Posts: 6449
Location: UK (temporarily)
BMBx64 wrote:
lidnariq nice :) , do you know any values i could use for testing?
It's going to be so dependent on the specific hardware, so it's hard to say. On the bright side, if you're only using it to transfer texture data and not code, it doesn't matter if you get garbage out, because the CPU and GPU won't crash.

So ... you could just start with the defaults I specified and start decreasing them. I have no idea why it reserves 1 whole microsecond after it's loaded the address before it starts reading. Maybe making a tool to directly play around with the default values I specified and let you set them, and alternate between different textures every other vblank to let you watch for video corruption?

Oh, but don't change the "autoincrement_address_bits" register. That's a sharp-edged behavior of the memory you're reading from/writing to. Values that are too low will hurt performance, and values that are too high will just not work at all. The original N64 mask ROMs strictly support no larger value than "7". The Everdrive 64 might support more, but... unless you want to make software that can only run on the everdrive 64 you shouldn't go higher.

The math gets kinda gross for an unaligned transfer, so let's just do the math for a fully-aligned transfer.
It looks like the address-loading cycle (the duration that either ALE_H or ALE_L are high) takes a total of 21 FSB cycles. (I could be wrong, but it's my best guess given both my own low-precision logic analyzer trace (80ns sample time) and the 10ns one from crazynation there)
So each address load takes (22+time_after_address_latch) FSB cycles.
Assuming what I've called "dunno" is actually the amount of time /RD or /WR is high between I/O within the same block...
Each word read takes (time_during_read_or_write+1+dunno+1) FSB cycles.
Transferring 256 words in a single perfectly aligned block should thus take (22+time_after_address_latch)+256×(time_during_read_or_write+1+dunno+1).
With the defaults, that means transferring 256 words, or 512 bytes, should take 5974 FSB cycles, or 95584 ns: about 5.36 MB/sec.
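The arithmetic above can be checked with a few lines of C. This is only a sketch of the formula as written in this post. The register values 64, 18 and 3 used in the usage note are assumptions: the defaults were given on an earlier page, and these are picked because they reproduce the 5974 figure. The 16 ns cycle length is likewise inferred from 95584 ns / 5974 cycles.

```c
#include <assert.h>
#include <stdint.h>

/* FSB cycle length implied by the post: 95584 ns / 5974 cycles = 16 ns. */
#define FSB_CYCLE_NS 16u

/* Cycles for one fully aligned block, following the formula in the post:
 * (22 + time_after_address_latch)
 *   + words * (time_during_read_or_write + 1 + dunno + 1) */
static uint32_t pi_block_cycles(uint32_t time_after_address_latch,
                                uint32_t time_during_read_or_write,
                                uint32_t dunno,
                                uint32_t words)
{
    uint32_t address_load = 22u + time_after_address_latch;
    uint32_t per_word = time_during_read_or_write + 1u + dunno + 1u;
    return address_load + words * per_word;
}

/* Throughput in bytes per second for a block of 16-bit words. */
static double pi_block_bytes_per_sec(uint32_t cycles, uint32_t words)
{
    double seconds = (double)cycles * FSB_CYCLE_NS * 1e-9;
    return (words * 2.0) / seconds;
}
```

With the assumed values, `pi_block_cycles(64, 18, 3, 256)` gives 86 + 256 × 23 = 5974 cycles, which matches the total quoted above. Lowering any of the three inputs shows directly how much each register would buy.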

... There's some time taken in the middle of a multiple-block transfer after the last /RD before it loads the next cycle also. I don't know what controls that at all. It looks like it's usually 500ns? But I don't know if that's fixed or a property of some other register.

Do you have access to some high-resolution timer? It might be interesting just to see if the constants I called out (52 FSB cycles to load an address? Extra gaps between every 64 transfers?) actually are a function of the register contents, or are truly fixed.


PostPosted: Fri Jun 09, 2017 1:20 pm
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19252
Location: NE Indiana, USA (NTSC)
I'm guessing the 1000 ns wait for seek might have been to support slower memory technologies with a wide row size. This way, the memory would spend 1000 ns reading an entire row into a 2048-bit buffer and stream 16 bits of that out per read clock.

You don't have to accommodate the limits of N64 mask ROMs if you're making your own memory. The only limit is how many bits of counter you can stuff into the memory controller CPLD on the cart. How close are we to a homebrew clone of the N64 key chip?

Does the N64 have an MMU? If so, did any games just mmap the cart, paging in data on demand rather than treating it like an 8 to 64 MB file system?


PostPosted: Fri Jun 09, 2017 1:35 pm
Joined: Sun Apr 13, 2008 11:12 am
Posts: 6449
Location: UK (temporarily)
tepples wrote:
How close are we to a homebrew clone of the N64 key chip?
Done. There's already one for the ATtiny25 and there's ... uh, a public release of 99% of one for PICs. That author intentionally omitted the seed/boot sector checksums, but included everything else.

Quote:
Does the N64 have an MMU?
Yes, but it also has an MMU bypass. Half the memory map is an ordinary TLB; the other half is direct-mapped.
Quote:
If so, did any games just mmap the cart, paging in data on demand rather than treating it like an 8 to 64 MB file system?
The cart interface seems to be particularly bad at ordinary CPU requests. The entire structure is really built around "manually ask the RCP to transfer data between the cart and RDRAM"; there's even a warning in libdragon:
Code should never make raw 32-bit reads or writes in the cartridge domain as it could collide with an in-progress DMA transfer or run into caching issues.

(edit) The PIF IPL does do a bunch of raw 32-bit reads and writes, but it also busy-waits to make sure that any DMA is done before it starts that and a bunch of other things.


PostPosted: Fri Jun 09, 2017 7:54 pm
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19252
Location: NE Indiana, USA (NTSC)
lidnariq wrote:
tepples wrote:
did any games just mmap the cart, paging in data on demand rather than treating it like an 8 to 64 MB file system?
The cart interface seems to be particularly bad at ordinary CPU requests. The entire structure is really built around "manually ask the RCP to transfer data between the cart and RDRAM"

So is the ATA or SATA interface in a PC. Yet operating systems implement mmap() and the swap file by having the page fault handler start a DMA transfer of a 4096-byte chunk of a file into a page of RAM and then mapping that page of RAM to the page of virtual address space that was accessed. The thread that made the read gets blocked until the DMA completes.
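To make the mechanism concrete, here is a toy host-side model of demand paging. It is a sketch only: `memcpy` stands in for the DMA, a linear search stands in for the TLB, and all the names and sizes (`cart`, `RAM_FRAMES`, `mapped_read`, round-robin replacement) are made up for illustration. A real kernel would block the faulting thread until the DMA completes instead of copying synchronously.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE   4096u
#define CART_PAGES  8u      /* hypothetical backing store: 32 KiB */
#define RAM_FRAMES  2u      /* hypothetical: only two pages fit in RAM */

static uint8_t  cart[CART_PAGES * PAGE_SIZE];        /* stands in for the cart/disk */
static uint8_t  ram[RAM_FRAMES][PAGE_SIZE];          /* physical page frames */
static int      frame_page[RAM_FRAMES] = { -1, -1 }; /* page held by each frame */
static unsigned next_victim;                         /* round-robin replacement */
static unsigned fault_count;

/* "Fault handler": copy one page in (memcpy stands in for the DMA) and
 * record the new mapping. */
static unsigned handle_fault(unsigned page)
{
    unsigned frame = next_victim;
    next_victim = (next_victim + 1u) % RAM_FRAMES;
    memcpy(ram[frame], &cart[page * PAGE_SIZE], PAGE_SIZE);
    frame_page[frame] = (int)page;
    fault_count++;
    return frame;
}

/* Read one byte through the mapping, faulting the page in on a miss. */
static uint8_t mapped_read(uint32_t addr)
{
    unsigned page = addr / PAGE_SIZE;
    unsigned frame;
    assert(page < CART_PAGES);
    for (frame = 0; frame < RAM_FRAMES; frame++)
        if (frame_page[frame] == (int)page)
            return ram[frame][addr % PAGE_SIZE];
    frame = handle_fault(page);
    return ram[frame][addr % PAGE_SIZE];
}
```

Two reads from the same 4096-byte page cost one fault; a read from a different page costs another. The caller never sees the transfer, which is the point of the mmap() approach.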


PostPosted: Fri Jun 09, 2017 8:02 pm
Joined: Sun Apr 13, 2008 11:12 am
Posts: 6449
Location: UK (temporarily)
That's an entirely different level of abstraction, though. That's "just" having a fault handler, and dispatching the DMA-based data transfers in response to a fault, instead of using CPU-based I/O.

(You're talking about the difference between read() and mmap(); I'm talking about the difference between IDE PIO and ATA UDMA)

It also assumes enough of a microkernel to support faults and (ideally) preemptable threads, which isn't yet part of libdragon.


PostPosted: Tue Jun 20, 2017 5:24 pm
Joined: Sat Aug 11, 2007 9:36 am
Posts: 15
Great work so far. You can reprogram the PI timing registers at runtime. Commercial games do this when accessing 64DD and cart SRAM/Flashram data. Actually, sram/flash is bursted at a much faster speed than normal cart reads. Yes, you can reprogram cart domain to run faster but it only speeds up DMAs slightly, it doesn't affect execution. Also some backup devices won't support really fast timings. Cart ROM is basically bulk memory like a disk.
There are 4 parameters:
1. Latency (from address latch to first r/w)
2. Pulse width (rd/wr active low time)
3. Release width (delay after active time)
4. Page size (2^y bytes per address latch)
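For anyone who wants to experiment, the four parameters could be carried around in a small struct like the one below. This is a sketch, not libdragon API: the type and function names are hypothetical, and the register order (latency, pulse width, page size, release) is an assumption that should be checked against hardware documentation. The function takes the register block as a pointer so it can be tried against a plain array on a PC first. The fields appear to correspond to the "time_after_address_latch", "time_during_read_or_write", "dunno" and "autoincrement_address_bits" names used earlier in the thread, but that mapping is a guess.

```c
#include <assert.h>
#include <stdint.h>

/* The four per-domain timing parameters listed above. */
typedef struct {
    uint32_t latency;      /* address latch to first read/write */
    uint32_t pulse_width;  /* /RD or /WR active-low time */
    uint32_t release;      /* delay after the active time */
    uint32_t page_size;    /* exponent y: 2^y bytes per address latch */
} pi_timing_t;

/* Write the parameters to a block of four consecutive 32-bit registers.
 * ASSUMPTION: the register order is latency, pulse width, page size,
 * release; verify this before using it on real hardware. */
static void pi_timing_apply(volatile uint32_t *regs, const pi_timing_t *t)
{
    assert(regs != 0 && t != 0);
    regs[0] = t->latency;
    regs[1] = t->pulse_width;
    regs[2] = t->page_size;
    regs[3] = t->release;
}

/* Bytes transferred per address latch, per the "2^y" description. */
static uint32_t pi_timing_page_bytes(const pi_timing_t *t)
{
    return (uint32_t)1 << t->page_size;
}
```

Keeping the original values in a second `pi_timing_t` makes it easy to restore them after a test transfer, which matters if a backup device cannot handle the faster timings.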

This is one of my checks in the MGC 2011 demo to see if it's running on an emulator or not.


You should join #n64dev IRC on EFnet, there is lots of stuff going on. libdragon has a lot of limitations and there's a replacement in the works, so far with threading, message queues, dynamic heap management, much faster context switching, etc.

_________________
retroactive.be


PostPosted: Wed Jun 21, 2017 7:17 am
Joined: Thu May 25, 2017 7:27 am
Posts: 23
Sorry for the late reply.

calima wrote:
Perhaps you could get one of the SD cards with wifi?

I got a Toshiba FlashAir last week, but sadly it didn't go well, so I returned it.
--

lidnariq, thanks for the info, I will start testing soon.
--

marshallh, that sounds very interesting :D I may look into that IRC channel.
--

I have uploaded a small beta of "Sprite64". It needs some beta testing, though; I will explain all the features a bit.

Image

Input: png folder (drop all the pngs here, will ignore any other file or subfolder)
Output: sprite folder (N64 sprite textures)

INCLUDE CODE
If enabled, the tool will generate arrays (code) for all the generated data.

INCLUDE PNG
Also writes a PNG copy of the converted textures, plus other features.

ZLIB
Compressed textures will have the .zsprite extension instead of .sprite. This needs zlib on the N64; I'm not sure if I can find one already ported.

GROUP / SEPARATED
Group: Everything goes in the same folder; textures are named in order, starting from 0 for sprites and 1 for tilemaps (0 is empty).
Separated: Every PNG conversion gets its own folder (useful when there are many sprites).

OPTIMIZE
The tool will choose the TMEM size based on the performance preference, for example:
Image

Libdragon limits:
- Horizontal size: texture widths have to be even; the tool fixes odd widths by adding empty space.
- Vertical size: textures can be any of these sizes: 4,8,16,32,64,128,256
- Maximum size: 256 width or height

OPTIMIZE L/R
Based on the optimized textures generated, the tool will attempt to find and trim any empty area for the best result.
Image

Optimize L: Optimize from the left. This can cause alignment problems, but I did a workaround in libdragon.
Optimize R: Optimize from the right. No issues found.

Improvements:
Original texture of 68x69: 18.3KB (32bit)
Texture conversion (68x72, 9 steps * 8 height): 19.1KB
Optimize R: 14.8KB
Optimize L+R: 11.2KB
Zlib: 1.60KB

* Optimize works differently in 16bit mode.
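The 68x69 example can be reproduced with a small sketch of the size rules. The selection rule here (take the tallest allowed height whose slice still fits in 4KB of TMEM) is only a guess that happens to match the "9 steps * 8 height" result; it is not necessarily what Sprite64 does internally, and the function names are hypothetical.

```c
#include <assert.h>

#define TMEM_BYTES 4096u   /* TMEM size mentioned in the post */

/* Widths must be even: pad odd widths by one column. */
static unsigned pad_width(unsigned w)
{
    return (w + 1u) & ~1u;
}

/* Tallest allowed slice height (a power of two from 4 to 256) whose
 * pixels fit in TMEM; returns 0 if even a 4-row slice does not fit.
 * ASSUMPTION: this selection rule is a guess, not the tool's logic. */
static unsigned slice_height(unsigned w, unsigned bytes_per_pixel)
{
    unsigned h;
    for (h = 256u; h >= 4u; h >>= 1)
        if (pad_width(w) * h * bytes_per_pixel <= TMEM_BYTES)
            return h;
    return 0u;
}

/* Number of slices needed to cover an image of height img_h. */
static unsigned slice_count(unsigned img_h, unsigned slice_h)
{
    assert(slice_h > 0u);
    return (img_h + slice_h - 1u) / slice_h;
}
```

For a 68-pixel-wide 32bit texture, a 16-row slice would need 4352 bytes, so the sketch settles on 8 rows. 69 rows then take 9 slices, i.e. 72 rows, and 68 × 72 × 4 = 19584 bytes, which is the 19.1KB listed above.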

RGB
Makes the input color transparent for all the PNG files; a transparent alpha channel is also supported.

PNG TO TILEMAP
Converts a PNG into a tilemap, and converts the tiles into N64 textures.

If include png is enabled, it will generate:
- A copy of the full map
- A copy of all the tiles
- A tileset of all the tiles generated

If include code is enabled, it will generate:
- Different arrays of the tilemap as .c files (code.c)

CUSTOM
Allows a custom valid size, even if it's beyond the 4KB TMEM, in case we want to use software rendering.

CHECK TILE
Checks whether a tile is a mirrored repeat of another one and generates flags.c if any are found.
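A mirror check of this kind can be sketched as a brute-force comparison under each flip. This is an illustration only: the 8x8 tile size, the 16-bit pixel type and the flag values are assumptions, and the real layout of flags.c is not known from this post.

```c
#include <assert.h>
#include <stdint.h>

#define TILE_W 8
#define TILE_H 8

enum { FLIP_NONE = 0, FLIP_H = 1, FLIP_V = 2, NO_MATCH = -1 };

/* Compare tile a against tile b read with the given flip (a bitmask of
 * FLIP_H / FLIP_V). Pixels are row-major, one uint16_t each. */
static int tile_equal(const uint16_t *a, const uint16_t *b, int flip)
{
    int x, y;
    for (y = 0; y < TILE_H; y++) {
        for (x = 0; x < TILE_W; x++) {
            int sx = (flip & FLIP_H) ? TILE_W - 1 - x : x;
            int sy = (flip & FLIP_V) ? TILE_H - 1 - y : y;
            if (a[y * TILE_W + x] != b[sy * TILE_W + sx])
                return 0;
        }
    }
    return 1;
}

/* Return the flip flags that turn b into a, or NO_MATCH. */
static int tile_match(const uint16_t *a, const uint16_t *b)
{
    int flip;
    for (flip = FLIP_NONE; flip <= (FLIP_H | FLIP_V); flip++)
        if (tile_equal(a, b, flip))
            return flip;
    return NO_MATCH;
}
```

A converter would run `tile_match` for each new tile against the tiles already kept, storing the earlier tile's index plus the returned flags instead of a new texture when a match is found.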

SMB map example
Image

Tileset of 8x8 (but we want bigger textures on N64 for better performance)
Image

Tileset of 16x16: fewer tile matches, but better performance.
Image

Golden Axe II example
Image

Tileset of 16x16, with mirrored tiles removed
Image

- Tiles that are completely transparent will be dropped.
- If check tile and include code are disabled, all the tiles from the map will be generated (since we don't know the array).

PNG TO SPRITE
The second step of the tool is the animation viewer and CP editor.
Image

The gif is fairly self-explanatory.

Tick is the delay of the animation.
View provides a ghost reference for faster alignment.
Rect shows all the rectangles and the number of textures generated for that particular PNG.
BG changes the background, so we can test whether a sprite has "hidden transparent pixels" inside.

Controls:
F1 - Normal window size
F2 - Double size
Left / Right - Anim left or right
Up / Down (mouse wheelup/wheeldown) - Zoom
Num keys - Input
Mouse for anything else

This will generate extra files if include code is enabled:
- animcp_save.dat: the program session, in case we want to edit it later.
- animcp.c: arrays of all the editable content.
* CP x and CP y are embedded in the sprite format using hslices/vslices, so you don't need to save or remember them.

The downloads are attached; both are in a WIP state.

Avoid rdp_cp_sprite_scaled since I haven't fixed it in libdragon yet; use rdp_cp_sprite instead.


Attachments:
rdp.rar [8.58 KiB]
Downloaded 38 times
Sprite64.rar [1.68 MiB]
Downloaded 36 times
PostPosted: Fri Jun 23, 2017 8:32 pm
Joined: Mon Sep 15, 2014 4:35 pm
Posts: 3114
Location: Nacogdoches, Texas
I'm so glad to see a fully featured tilemap on the N64. :) :beer: I'm assuming it just loads new data into the texture cache for every tile, but would it be possible to gain any speed by loading a set of 64 8x8 tiles, or 16 16x16 tiles, and filling all the spots that use these first? I have no idea how you'd reasonably do this, (as in not having the CPU search through each tilemap a ton of times) other than by making your tilemap just a giant linked list. What's the main bottleneck in making a tilemap on the N64 anyway?


PostPosted: Sun Jun 25, 2017 3:06 am
Joined: Thu May 25, 2017 7:27 am
Posts: 23
Espozo wrote:
I'm so glad to see a fully featured tilemap on the N64. :) :beer: I'm assuming it just loads new data into the texture cache for every tile, but would it be possible to gain any speed by loading a set of 64 8x8 tiles, or 16 16x16 tiles, and filling all the spots that use these first? I have no idea how you'd reasonably do this, (as in not having the CPU search through each tilemap a ton of times) other than by making your tilemap just a giant linked list. What's the main bottleneck in making a tilemap on the N64 anyway?


Yeah, for backgrounds I usually use 1 call of 64x32 textures; for the main scroll it depends on how often tiles repeat, but I would prefer something like 32x32. I have 2 basic optimizations:
- Empty/transparent tiles are not processed (no rectangles are generated, no textures are checked)
- Last tile texture check (the texture is not reloaded if the tile uses the same one as the previous tile)
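These two optimizations can be sketched as a loop that only counts what a renderer would do. It is a host-side illustration with made-up names; tile index 0 is treated as empty, following the tilemap numbering described for the tool, and the actual texture load and rectangle calls are left as comments rather than guessed at.

```c
#include <assert.h>
#include <stddef.h>

#define TILE_EMPTY 0   /* index 0 is the empty tile in the tool's tilemaps */

/* Walk a tilemap in order and count how many texture loads and
 * rectangles a renderer would issue with both optimizations on. */
static void count_draw_calls(const unsigned char *map, size_t count,
                             unsigned *loads, unsigned *rects)
{
    int last_tex = -1;   /* no texture loaded yet */
    size_t i;
    *loads = 0;
    *rects = 0;
    for (i = 0; i < count; i++) {
        if (map[i] == TILE_EMPTY)
            continue;                   /* no rectangle, no texture check */
        if ((int)map[i] != last_tex) {  /* last-tile texture check */
            last_tex = (int)map[i];
            (*loads)++;                 /* a real renderer would load here */
        }
        (*rects)++;                     /* ...and draw the rectangle here */
    }
}
```

For a row such as 1, 1, 0, 2, 2, 2, 0, 1 this gives 3 loads and 6 rectangles instead of 8 of each, which is why the gain depends so much on how the map repeats tiles.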

For the Super Mario Bros map, it could be great to replace the blue background with transparency and then use the RDP in fillcolor mode to fill each tile with a solid color, or even the whole screen. I believe a single 320x240 fill color call is faster than 8 or 9 calls per tile; I can do a test of that.

Libdragon can load a maximum of 8 textures into TMEM, but I think it was a bit buggy; I removed that feature a long time ago.

If you do many calls, the RDP queue bottlenecks the system. In one test, drawing a single pixel with the CPU was faster than drawing a single pixel with the RDP when each pixel is its own call.


PostPosted: Fri Jun 30, 2017 4:19 am
Joined: Thu May 25, 2017 7:27 am
Posts: 23
So I did some tests with a tile scroll.

I've used the same SMB map I've been showing (16x16 tiles filling a 320x240 screen instead of 256x240), 16bit textures, with the following results:
Image

- 132 fps
- 172 fps (last texture check)
- 273 fps (remove blue sky texture and replace it by a full screen fillcolor rectangle)
- 318 fps (disable flush textures)

Looking at the results, the very basic texture check gives a good performance boost, but this will be very dependent on the map. Fillcolor is pretty fast, and a better texture strategy could increase the fps further.

DOWNLOAD
https://mega.nz/#!84AU1TwJ!jkMuXQgISfD36pZbtqtqPW0XtSLqqAVjUyxkmXVFk_E

For this second map I've used textures of 64x32 or 32x64, 7 scroll layers with relative speeds, 16bit textures.
Image

Performance is not great, about 80fps, but no optimizations were made (114fps with flush textures disabled); it would be 39fps if all the layers were 16x16.

Maybe in this case some kind of checkerboard optimization could be good, to discard tiles that are fully covered by others, along with a better texture strategy.

DOWNLOAD
https://mega.nz/#!g0BWnTbA!TgkCuPlWfxNHw3onUrUT10B-E9_TfTHikWUdi9I2yhU


PostPosted: Sun Jul 09, 2017 7:11 pm
Joined: Thu May 25, 2017 7:27 am
Posts: 23
Testing some kind of scroll optimizations..

This map is a good example: it has 6 scroll layers, but they are mostly covered by the main layer:
Image

Performance (same spot area):
- 90fps (main scroll - 16x16 tiles)
- 116fps (64x32 tiles)

Each tile checks a binary list to discard covered tiles. (This could be really slow; I'm not sure whether it would be faster to generate lists for every layer and then check them in 1 step, or to check each valid tile.)
Image
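One way to picture the "binary list" is a bitmap with one bit per screen cell, set wherever the main layer is fully opaque. A background tile can then be skipped when every cell under it is set. This is a sketch under assumptions, not the code from the demo: the 16-pixel cell size, the 320x240 screen and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SCREEN_W 320
#define SCREEN_H 240
#define CELL     16                  /* hypothetical cell size in pixels */
#define CELLS_X  (SCREEN_W / CELL)   /* 20 cells per row */
#define CELLS_Y  (SCREEN_H / CELL)   /* 15 rows */

/* One bit per screen cell: set when the main layer fully covers it. */
static uint32_t covered[CELLS_Y];

static void mark_covered(int cx, int cy)
{
    assert(cx >= 0 && cx < CELLS_X && cy >= 0 && cy < CELLS_Y);
    covered[cy] |= (uint32_t)1 << cx;
}

/* Return 1 if the pixel rectangle lies entirely under covered cells.
 * Parts outside the screen are ignored, since they are not visible. */
static int rect_is_hidden(int x, int y, int w, int h)
{
    int x0 = x < 0 ? 0 : x;
    int y0 = y < 0 ? 0 : y;
    int x1 = (x + w > SCREEN_W ? SCREEN_W : x + w) - 1;
    int y1 = (y + h > SCREEN_H ? SCREEN_H : y + h) - 1;
    int cx, cy;
    if (x0 > x1 || y0 > y1)
        return 1;                    /* entirely off screen */
    for (cy = y0 / CELL; cy <= y1 / CELL; cy++)
        for (cx = x0 / CELL; cx <= x1 / CELL; cx++)
            if (!(covered[cy] & ((uint32_t)1 << cx)))
                return 0;            /* at least one cell shows through */
    return 1;
}
```

The bitmap would be rebuilt once per frame from the main layer (clearing it first), then each background tile costs only a few bit tests. Because the layers scroll at different speeds, a background tile usually straddles several cells, which is why the test loops over a cell range rather than a single bit.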

Performance:
- 159fps (main scroll - 32x32 tiles)
- 178fps (64x32 tiles)

If 64x32 tiles are used this is less effective, but still faster in most scroll areas (some others are slower), so it fluctuates between 178 and 130 in the worst case.
Image

These maps seem to load faster if there are fewer files to load, even if they are larger in total size (16x16 306KB vs 64x32 536KB).

Controls
- Joystick
- Z to disable main layer

DOWNLOAD
https://mega.nz/#!t4RSQCpT!gEQYQ_SJGDnqKV7h3TvAtZJybtj6mz_zEWrDE8DRDJQ

I may post a playable demo of something soon :beer:


PostPosted: Thu Jul 13, 2017 12:25 pm
Joined: Thu May 25, 2017 7:27 am
Posts: 23
Added a display list for drawing the scroll in texture order; it uses qsort.
Image
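The idea of drawing in texture order can be sketched with the standard `qsort`. The `tile_cmd_t` struct and function names below are hypothetical, not the ones in the demo; the sketch only counts texture switches before and after sorting rather than issuing real RDP commands.

```c
#include <assert.h>
#include <stdlib.h>

/* One entry of a hypothetical display list: which texture to use and
 * where to draw it. */
typedef struct {
    int tex;
    int x, y;
} tile_cmd_t;

/* qsort comparator: order by texture index so equal textures are adjacent. */
static int cmp_by_tex(const void *a, const void *b)
{
    const tile_cmd_t *ta = (const tile_cmd_t *)a;
    const tile_cmd_t *tb = (const tile_cmd_t *)b;
    return (ta->tex > tb->tex) - (ta->tex < tb->tex);
}

/* Count texture switches when drawing the list in its current order. */
static unsigned count_tex_loads(const tile_cmd_t *list, size_t n)
{
    unsigned loads = 0;
    size_t i;
    for (i = 0; i < n; i++)
        if (i == 0 || list[i].tex != list[i - 1].tex)
            loads++;
    return loads;
}

static void sort_by_texture(tile_cmd_t *list, size_t n)
{
    qsort(list, n, sizeof list[0], cmp_by_tex);
}
```

Reordering is only safe when the tiles in the list do not overlap each other, as within a single scroll layer, because the draw order then has no visible effect. Note that `qsort` is not guaranteed to be stable, so tiles sharing a texture may come out in any order.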

Test:
- 104fps (16x16 tiles)
- 169fps (with texture order)

Normal tiles VS mirrored tiles: (to save space)

Normal
x= 0 - 169fps
x= 194 - 161fps
x= 974 - 144fps
x= 1552 - 171fps

Mirrored
x= 0 - 164fps
x= 194 - 158fps
x= 974 - 139fps
x= 1552 - 166fps

DOWNLOAD
https://mega.nz/#!R9R2RAQC!vLy-XwgGz6lR2irvO7twmrgMxHaSw4zGinRSJCzG2y0

