3DS reverse engineering

profi200
Posts: 64
Joined: Fri May 10, 2019 4:48 am

Re: 3DS reverse engineering

Post by profi200 » Mon May 17, 2021 3:28 pm

nocash wrote:
Mon May 17, 2021 7:28 am
Okay, now I got the delay reproduced using SDMMCCTL on 3DS. I don't know if there's a similar way to trigger the delay by software on DSi.
I got the same timings on 3DS, "400h SHL (0..14) HCLKs" for value 0..14. And "100h HCLKs" for value 15.
Status bit4 and bit5 get set after that delay.

Alongside that, I've noticed that Status bit9 and bit10 are also getting set about 2D000h..36000h HCLKs (=circa 3ms) after activating SDMMCCTL. That happens even without cart inserted, it's not useful, but it's interesting: Apparently, at that time, the data line pull-up resistors are receiving enough volts to be treated as "high", though the supply is probably still less than 3.3V at that point. So I've scope-checked the SD Slot 3.3V power pin on DSi and Old3DS...

On the DSi, it does increase almost linearly from 0V to 3V within 3ms, and then suddenly jumps from 3V to 3.3V. That's happening on power-up, before booting from SD/MMC, so software won't have to care about that.
On the 3DS, it does increase somewhat sine-like from 0V to 3.3V within 4.2ms (once when activating it via SDMMCCTL bit0=0?). Together with the 1ms card boot up time, the delay should be at least 5.2ms. So, yes, setting 9 (almost 8ms) sounds right, and 8 (almost 4ms) would be too short.

I didn't know the toshsd.* source code. Yup, it looks closer (or more complete; the other chips should have the Error bits too, but the tmio source code didn't bother to define those bits).
Small addition to what I said above. A controller reset does not trigger the card detection timer, but it does reset the whole reg, so both timeout and card detection are set to 14. 15 is probably a test mode, as said, to test your driver for proper error handling (less useful for card detection).
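To make the numbers above easier to check, here is a minimal sketch of the delay arithmetic. It assumes HCLK on 3DS is 67.027964 MHz (twice the 33.513982 MHz full-speed SD clock); the function names are made up for illustration and are not from any existing driver.

```c
#include <assert.h>
#include <stdint.h>

// Assumption: 3DS controller HCLK, twice the 33.513982 MHz SD clock.
#define HCLK_HZ 67027964u

// HCLK cycles for a 4-bit delay setting as measured in this thread:
// settings 0..14 give 0x400 << setting, setting 15 gives 0x100.
uint32_t delay_hclks(unsigned setting)
{
    assert(setting <= 15u);
    return (setting == 15u) ? 0x100u : (0x400u << setting);
}

// The same delay in microseconds, rounded down.
uint32_t delay_us(unsigned setting)
{
    return (uint32_t)(((uint64_t)delay_hclks(setting) * 1000000u) / HCLK_HZ);
}

// Smallest setting in 0..14 whose delay is at least min_us, or -1 if none.
int smallest_setting_for(uint32_t min_us)
{
    for (unsigned s = 0; s <= 14u; s++)
    {
        if (delay_us(s) >= min_us) return (int)s;
    }
    return -1;
}
```

With the 5.2 ms requirement (4.2 ms ramp plus 1 ms card boot time), smallest_setting_for(5200) picks setting 9, which is the value discussed above.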

Yeah, bit 9 and 10 are the DAT3 versions of 4 and 5. It's wonky in practice and Process9 does ignore these bits (9 is also getting masked). Something I have not tried with SDMMCCTL is whether we can move the SD card port glitch-free from one controller to the other if we leave the power on. I also don't know how the power bits are implemented in hardware. Maybe it's a few more GPIOs that go to the PMIC or MOSFETs? I doubt the silicon is switching that load since SD cards can pull a few hundred mA.

Jumping from 3V right up to 3.3V is quite odd. The spec recommends a smooth ramp-up.

Toshiba seems to be making a huge number of variants of these controllers matching the requirements of the customer. I really think Nintendo could have chosen a better one. The power of 2 divider alone is a no-go (Why, Toshiba? An integer clock divider is piss easy to implement.). But I guess they went with it for cost. Yet implementing a whole bunch of per-CMD response types in hardware requires a lookup table and die space again. This is so inconsistent. And let's not even start with the 32 bit FIFO they duct taped on top...
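For reference, a rough sketch of what the power of 2 divider means in practice. It assumes the low 8 bits of sd_clk_ctrl encode the divider as "0 = HCLK/2, single set bit n = HCLK/(4 << n)", and that HCLK is 67.027964 MHz on 3DS and 33.513982 MHz on DSi. Those assumptions are consistent with the 523 kHz / 261 kHz init clocks quoted in the driver comments in this thread, but the helper names are hypothetical.

```c
#include <stdint.h>

#define HCLK_3DS_HZ 67027964u // Assumption: 2 x 33.513982 MHz.
#define HCLK_TWL_HZ 33513982u // Assumption: 2 x 16.756991 MHz.

// SD clock for a divider field value (0 or a single set bit in 0x01..0x80).
uint32_t sd_clock_hz(uint32_t hclk_hz, uint32_t div_bits)
{
    const uint32_t divisor = (div_bits == 0u) ? 2u : div_bits * 4u;
    return hclk_hz / divisor;
}

// Fastest divider setting whose clock does not exceed max_hz.
// Falls back to the slowest setting (0x80) if nothing is slow enough.
uint32_t div_bits_for(uint32_t hclk_hz, uint32_t max_hz)
{
    if (hclk_hz / 2u <= max_hz) return 0u;
    for (uint32_t bits = 1u; bits <= 0x80u; bits <<= 1)
    {
        if (hclk_hz / (bits * 4u) <= max_hz) return bits;
    }
    return 0x80u;
}
```

Asking for at most 400 kHz gives 1u<<6 on 3DS and 1u<<5 on DSi, so the controller ends up at about 261 kHz. With only power of 2 steps there is nothing between that and about 523 kHz.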

nocash
Posts: 1398
Joined: Fri Feb 24, 2012 12:09 pm

Re: 3DS reverse engineering

Post by nocash » Mon May 17, 2021 5:39 pm

I've just tested which ARM register/bit is controlling the SD Slot's 3.3V power: It's done via SDMMCCTL.bit0=0 (and can also be disabled again via bit0=1). I should have tested that when the console was still disassembled; the TP13 power supply probably does not come from the CPU directly, so there should be another TP testpoint with a "logic power on" signal passed from CPU to Powerman.

SDIO Wifi 3.3V uses the system wide 3.3V supply, and eMMC probably too (at least the eMMC pull-ups do so, I couldn't check the BGA supply pins underneath the actual eMMC chip). Anyway, SDMMCCTL.bit1 and bit2 may enable something internally for the controllers, but they don't affect the external supply voltage.

For SDMMCCTL, how about if we rename it to SDMMC_CTL? Or SDMMC_CNT?
When comparing Old3DS and New3DSXL, I've noticed that Old3DS has bit3=1, but New3DSXL has bit3=0 (as initial power up values).
And bit7 is always 0 on Old3DS, but it's R/W on New3DSXL.
I don't really know what bit3, bit6, and bit7 are doing.

What makes timing setting 15 a test mode? To me, it's just the smallest delay setting. For the timeout it's nonsense because it does timeout before even starting. For card detect delay it could be more useful: If you know that power is already on, then the smallest delay would be best (eg. if you do really want to switch between ARM9 and ARM11 controllers).

For the problem with sending multiple write commands: That sounds like a State issue (see http://problemkaputt.de/gbatek-dsi-sd-m ... -state.htm). During writes, the card switches to "rcv" and "prg" state, and most commands (including further writes) will be ignored in those states. Polling status via CMD13 should solve that (wait until you are back in "tran" state).
The write duration may vary on slower cards or when replacing broken sectors. STOP_TRANSMISSION (at end of WRITE_MULTIPLE) does reply with R1b (status plus busy flag), maybe IRQ_STAT.bit29 can use that busy flag to know when the write has completed (that might also imply being back in "tran" state, not sure there).
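A minimal sketch of the CMD13 idea, in case it helps: the R1 card status carries CURRENT_STATE in bits 12:9 and READY_FOR_DATA in bit 8 (per the SD/MMC specs). The callback type and function names below are hypothetical; the callback stands in for whatever sends CMD13 with the card's RCA and returns the R1 word.

```c
#include <stdbool.h>
#include <stdint.h>

// Card status (R1) fields.
#define R1_READY_FOR_DATA (1u << 8)
#define R1_STATE(r1)      (((r1) >> 9) & 0xFu)

// CURRENT_STATE values relevant for writes.
enum { STATE_TRAN = 4, STATE_RCV = 6, STATE_PRG = 7 };

// True if the card is back in "tran" state and can take another write.
bool card_ready_for_write(uint32_t r1)
{
    return R1_STATE(r1) == STATE_TRAN && (r1 & R1_READY_FOR_DATA) != 0u;
}

// Hypothetical callback: sends CMD13 and stores R1. Returns 0 on success.
typedef int (*SendStatusFn)(void *ctx, uint32_t *r1_out);

// Polls CMD13 until the card is ready.
// Returns 0 if ready, -1 on command error, -2 if max_polls ran out.
int wait_for_tran(SendStatusFn send_status, void *ctx, unsigned max_polls)
{
    while (max_polls--)
    {
        uint32_t r1;
        if (send_status(ctx, &r1) != 0) return -1;
        if (card_ready_for_write(r1)) return 0;
    }
    return -2;
}
```

The poll count limit is only there so a dead card cannot hang the loop; a real driver would probably use a timer instead.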

EDIT: I am not using CMD13 after writes in my own code (though maybe I should do so).
What I am doing is - after the LAST block of READ/WRITE_MULTIPLE commands - wait for DATAEND flag or any ERROR bits to get set (IRQ_STAT bit2 or bit16-22).

Post by profi200 » Tue May 18, 2021 3:21 am

Dunno about eMMC and WiFi module. I can get the eMMC to stop responding by setting bit 1 of that reg. The WiFi module is untested. The eMMC should have a power bit, because there are a few cases where this is needed (certain errors require power cycling). It would also be interesting for power saving.

I will see later if i can get a better name for this reg. I documented most bits here btw: https://www.3dbrew.org/wiki/CONFIG9_Reg ... 9_SDMMCCTL
I didn't know bit 7 exists. Must be a small change on the upgraded "LGR" SoC.

Test mode because it doesn't really match the calculation and it's short enough to trigger errors.

I don't think the card state is the problem. The busy signal is exactly there for this, so the card can signal when it has finished the writes. Even then, if the card ignores the write cmd a timeout should occur, but it does not.
This isn't documented (not even on gbatek). Bit 23 of the status is SD card busy (DAT0). It's 1 when the card is holding DAT0 low.

Post by nocash » Tue May 18, 2021 8:15 am

Just tried status bit23 (on DSi) with externally GNDing the data pins... yes, it's Data0, works here, too (and can be polled even without executing any command/transfer). But it's 0=low, 1=high (opposite of what you said). And it works for both eMMC and SD/MMC Slot (depending on which one is currently selected) (unlike the Data3 flag in bit10, which is always from SD/MMC Slot).

Test mode just sounds as if there's more going on, misleading people into thinking that they can unlock debug registers or execute built-in self tests.

Yeah, needing to poll the "state" via CMD13 may have been nonsense. It might also work with the busy bits, the trick is to know which ones. There are so many of them, bit0=CMDRESPEND, bit2=DATAEND, bit29=CMD_READY, bit30=CMD_BUSY. Maybe some of them might toggle twice (for the WRITE, and for STOP_TRANSMISSION).
I am using only bit0 (when sending any commands) and bit2 (after sending data, and before sending another command thereafter), the undocumented flags in bit29/bit30 are left unused. It's working well enough for the unlaunch installer, without being flooded by complaints about bricked consoles.

For SDMMCCTL:
Errors that require a power cycle might occur once in a million years (or so), and then it should be enough to recover by pushing the power button.
Power saving might be nice, unless the Wifi and eMMC can be switched into low-power modes anyways. On DSi, they did even have the SD/MMC slot permanently powered (even when running in NDS mode). Maybe they needed to add SD/MMC Slot power control on 3DS to save energy when somebody is using cheap power hungry memory cards (unfortunately, they seem to have also disabled the card detect switch in that state).
Yes, bit3 might be just some boring unused hardware, hard to say.
Bit6 might be wifi related, not sure if the theory about wifi pull-ups is right, or what pull-ups that's referring to, the SDIO data lines?
I've tried to toggle bit3,6,7 to see if that could magically switch SD slot power from 3.3V to 1.8V mode or so (it can't). I haven't tested if it could change the SDCLK rate. Other than that, I've no idea what the bit(s) could be good for.

Post by profi200 » Fri May 21, 2021 6:57 am

Hmm, yeah, I got bit 23 wrong. This bit should be useful for stuff like erase, which can take multiple seconds if you erase large areas. The hardware timeout will not cut it here. I think the card/DAT3 detection can be changed to other ports of the same controller with reg 0xF8, 0xFA, 0xFC and 0xFE. Not sure.
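As a rough sketch of how bit 23 could be used for long operations like erase: poll the status register until DAT0 goes high again, with a software limit instead of the hardware timeout. The polarity follows nocash's measurement (0 = DAT0 low = busy, 1 = DAT0 high). The names are made up for illustration, and the poll count limit is a placeholder for a proper timer.

```c
#include <stdbool.h>
#include <stdint.h>

// Status bit 23: raw DAT0 level. 0 = low (card busy), 1 = high (idle).
#define STATUS_DAT0_LEVEL (1u << 23)

// True while the card is holding DAT0 low.
bool card_is_busy(uint32_t status)
{
    return (status & STATUS_DAT0_LEVEL) == 0u;
}

// Polls the status register until the card releases DAT0.
// Returns true if it became idle within max_polls reads, false otherwise.
bool wait_card_idle(const volatile uint32_t *status_reg, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++)
    {
        if (!card_is_busy(*status_reg)) return true;
    }
    return false;
}
```

Since the bit can be read without executing any command, this should also work between commands, for example right after an erase CMD has returned its response.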

I currently have this. It's pretty WIP but does work fine already:

Code:

/*
 *   This file is part of open_agb_firm
 *   Copyright (C) 2021 derrek, profi200
 *
 *   This program is free software: you can redistribute it and/or modify
 *   it under the terms of the GNU General Public License as published by
 *   the Free Software Foundation, either version 3 of the License, or
 *   (at your option) any later version.
 *
 *   This program is distributed in the hope that it will be useful,
 *   but WITHOUT ANY WARRANTY; without even the implied warranty of
 *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *   GNU General Public License for more details.
 *
 *   You should have received a copy of the GNU General Public License
 *   along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

#include "types.h"
#include "hardware/toshsd.h"
#include "hardware/regs/toshsd.h"
#ifdef _3DS
#ifdef ARM9
#include "arm9/hardware/interrupt.h"
#include "arm9/hardware/cfg9.h"
#elif ARM11
#include "arm11/hardware/interrupt.h"
#endif // #ifdef ARM9
#elif TWL
// TODO: DSi IRQ stuff.
#endif // #ifdef _3DS


#ifdef _3DS
#ifdef ARM9
#define SLOT_PORT     (0u) // TODO: SD/MMC on controller 3 support.
#define eMMC_PORT     (1u)
#elif ARM11
#define SLOT_PORT     (2u)
#define eMMC_PORT     (3u) // Unused/not connected.
#endif // #ifdef ARM9

#elif TWL

// TODO: Slot IRQ number define.
#define SLOT_PORT     (0u)
#define eMMC_PORT     (1u)
#endif // #ifdef _3DS



static void toshsdIsr(UNUSED u32 id)
{
	Toshsd *const regs = getToshsdRegs(SLOT_PORT / 2u);
	regs->sd_status = ~(STATUS_INSERT | STATUS_REMOVE);
	// TODO: Some kind of event to notify the main loop.
}

void TOSHSD_init(void)
{
	// TODO: 3DS: Do we get controller 3 IRQs on the side the controller is NOT mapped to?
#ifdef _3DS
#ifdef ARM9
	IRQ_registerIsr(IRQ_SDIO_1, toshsdIsr); // TODO: SD/MMC on controller 3 support.
	IRQ_registerIsr(IRQ_SDIO_3, NULL);
	// IRQ_SDIO_1_ASYNC not needed.
	// IRQ_SDIO_3_ASYNC not needed.
#elif ARM11
	IRQ_registerIsr(IRQ_SDIO2, 14, 0, NULL);
	IRQ_registerIsr(IRQ_SDIO3, 14, 0, toshsdIsr);
	//IRQ_registerIsr(IRQ_SDIO2_IRQ, 14, 0, toshsdIsr); // TODO: Should we register this externally?
	// IRQ_SDIO3_IRQ not needed.
#endif // #ifdef ARM9
#elif TWL
	// TODO: DSi IRQ stuff.
#endif // #ifdef _3DS

	// Reset all controllers.
	for(u8 i = 0; i < 2; i++)
	{
		Toshsd *const regs = getToshsdRegs(i);
		// Setup 32 bit FIFO.
		regs->sd_fifo32_cnt   = FIFO32_CLEAR | FIFO32_ENABLE;
		regs->sd_blocklen32   = 512;
		regs->sd_blockcount32 = 1;
		regs->dma_ext_mode    = DMA_EXT_DMA_MODE;

		// Reset. Unlike similar controllers no delay is needed.
		regs->soft_rst = SOFT_RST_RST;
		regs->soft_rst = SOFT_RST_NORST;

		regs->sd_portsel                = PORTSEL_P0;
		regs->sd_blockcount             = 1;
		regs->sd_status_mask            = STATUS_MASK_DEFAULT;
		regs->sd_clk_ctrl               = SD_CLK_DIV_128;
		regs->sd_blocklen               = 512;
		regs->sd_option                 = OPTION_BUS_WIDTH1 | OPTION_UNK14 | 0xE9; // Timeout 14, card detect delay 9 (~7.8 ms).
		regs->ext_card_detect_mask      = 0xDB;
		regs->ext_card_detect_dat3_mask = 0xDB;

		// SDIO and power init here?
	}

#ifdef _3DS
#ifdef ARM9
	// TODO: Make this more configurable.
	// Note: The power bits don't affect regular card detect. Port remapping does.
	// TODO: Can we switch controllers/ports glitch-free?
	getCfg9Regs()->sdmmcctl = SDMMCCTL_SD_TMIO1_SEL | SDMMCCTL_TMIO3_MAP11 | SDMMCCTL_UNKBIT6;
#endif // #ifdef ARM9
#endif // #ifdef _3DS
}

// TODO: Deinit function with(out) poweroff?

void TOSHSD_initPort(ToshsdPort *const port, u8 portNum)
{
	// Reset port state.
	port->portNum     = portNum;
	port->sd_clk_ctrl = SD_CLK_DIV_128;
	port->sd_blocklen = 512;
	port->sd_option   = OPTION_BUS_WIDTH1 | OPTION_UNK14 | 0xE9;
}

static void setPort(Toshsd *const regs, const ToshsdPort *const port)
{
	// TODO: Can we somehow prevent all these reg writes each time?
	//       Maybe some kind of dirty flag + active port check?
	regs->sd_portsel    = port->portNum % 2u;
	regs->sd_clk_ctrl   = port->sd_clk_ctrl;
	const u16 blocklen = port->sd_blocklen;
	regs->sd_blocklen   = blocklen;
	regs->sd_option     = port->sd_option;
	regs->sd_blocklen32 = blocklen;
}

bool TOSHSD_cardDetected(void)
{
	return getToshsdRegs(SLOT_PORT / 2u)->sd_status & STATUS_DETECT;
}

bool TOSHSD_cardSliderUnlocked(void)
{
	return getToshsdRegs(SLOT_PORT / 2u)->sd_status & STATUS_WP_UNLOCK;
}

// TODO: Clock in Hz?
void TOSHSD_setClock(ToshsdPort *const port, u16 clk)
{
	// On SD/MMC init we have to permanently turn on clock
	// for a while so this needs to immediately take effect.
	port->sd_clk_ctrl = clk;
	getToshsdRegs(port->portNum / 2u)->sd_clk_ctrl = clk;
}

static void getResponse(Toshsd *const regs, ToshsdPort *const port, u16 cmd)
{
	if((cmd & 0x700u) != CMD_RESP_R2)
	{
		port->resp[0] = regs->sd_resp[0];
	}
	else // 136 bit responses need special treatment...
	{
		u32 resp[4];
		for(u32 i = 0; i < 4; i++) resp[i] = regs->sd_resp[i];

		port->resp[0] = resp[3]<<8 | resp[2]>>24;
		port->resp[1] = resp[2]<<8 | resp[1]>>24;
		port->resp[2] = resp[1]<<8 | resp[0]>>24;
		port->resp[3] = resp[0]<<8; // TODO: Add the missing CRC7 and the always 1 bit?
	}
}

static void doCpuTransfer(Toshsd *const regs, u16 cmd, u32 *buf)
{
	const u32 blockLen = regs->sd_blocklen; // | TODO: GCC adds a weird & to one of these 2.
	u32 blockCount = regs->sd_blockcount;   // |
	vu32 *const fifo = getToshsdFifo(regs);
	if(cmd & CMD_DIR_R)
	{
		do
		{
			__wfi();
			if(regs->sd_fifo32_cnt & FIFO32_FULL) // RX ready.
			{
				//regs->sd_status = ~STATUS_RX_RDY; // Acknowledge.

				const u32 *const blockEnd = buf + (blockLen / 4);
				do
				{
					buf[0] = *fifo;
					buf[1] = *fifo;
					buf[2] = *fifo;
					buf[3] = *fifo;

					buf += 4;
				} while(buf < blockEnd);

				blockCount--;
			}
		// TODO: Check detect bit (needs insert/remove IRQ handling).
		// TODO: Use DATA_END IRQ instead of counter?
		} while((regs->sd_status & STATUS_MASK_ERR) == 0 && blockCount);
	}
	else
	{
		// TODO: Write first block ahead of time.
		// gbatek Command/Param/Response/Data at bottom of page.
		do
		{
			__wfi();
			if(!(regs->sd_fifo32_cnt & FIFO32_NOT_EMPTY)) // TX request.
			{
				//regs->sd_status = ~STATUS_TX_REQ; // Acknowledge.

				const u32 *const blockEnd = buf + (blockLen / 4);
				do
				{
					*fifo = buf[0];
					*fifo = buf[1];
					*fifo = buf[2];
					*fifo = buf[3];

					buf += 4;
				} while(buf < blockEnd);

				blockCount--;
			}
		// TODO: Check detect bit (needs insert/remove IRQ handling).
		// TODO: Use DATA_END IRQ instead of counter? This may be difficult for write
		//       if we want the last block to be processed in background.
		} while((regs->sd_status & STATUS_MASK_ERR) == 0 && blockCount);
	}
}

u32 TOSHSD_sendCommand(ToshsdPort *const port, u16 cmd, u32 arg)
{
	Toshsd *const regs = getToshsdRegs(port->portNum / 2u); // TODO: gcc generates a udf instruction for this line.

	// TODO: Handle this differently. The last block of the previous
	//       write transfer may still be in progress.
	// TODO: Why is waiting for CMD_BUSY getting cleared not enough?
	//       Hangs on 2 single block writes in a row otherwise.
	while((regs->sd_status & (1u<<29 | STATUS_CMD_BUSY)) != 1u<<29);

	setPort(regs, port);
	regs->sd_blockcount   = port->blocks;
	// sd_blockcount32 doesn't need to be set for the 32 bit FIFO to work.
	regs->sd_stop         = ((cmd & CMD_MBT) ? STOP_AUTO_STOP : 0); // TODO: Works with SDIO?
	regs->sd_arg          = arg;
	regs->sd_fifo32_cnt   = ((cmd & CMD_DIR_R) ? FIFO32_FULL_IE : FIFO32_NOT_EMPTY_IE) |
	                        FIFO32_CLEAR | FIFO32_ENABLE;
	regs->sd_cmd          = cmd; // Start

	u32 *buf = port->buf;
	// Check for data transfer and if buf is NULL (NULL means DMA).
	if((cmd & CMD_DT) && (buf != NULL))
		doCpuTransfer(regs, cmd, buf);

	// On multi-block read transfer response end fires
	// while reading the last block from FIFO
	// so we need to check before __wfi().
	while(!(regs->sd_status & STATUS_RESP_END)) __wfi();
	getResponse(regs, port, cmd);

	const u32 status = regs->sd_status & STATUS_MASK_ERR;
	regs->sd_status = STATUS_CMD_BUSY; // Acknowledge all but CMD busy.
	return status;
}

#ifdef ARM11
#include "arm11/fmt.h"
void TOSHSD_dbgPrint(ToshsdPort *const port)
{
	Toshsd *const regs = getToshsdRegs(port->portNum / 2u); // TODO: gcc generates a udf instruction for this line.

	ee_printf("Toshsd last cmd: %u\n"
	          " sd_status: 0x%lX\n"
	          " sd_err_status: 0x%lX\n",
	          regs->sd_cmd & 0xFFu,
	          regs->sd_status,
	          regs->sd_err_status);
	ee_printf(" sd_portsel: 0x%X\n"
	          " sd_clk_ctrl: 0x%X\n"
	          " sd_option: 0x%X\n"
	          " resp: ",
	          regs->sd_portsel,
	          regs->sd_clk_ctrl,
	          regs->sd_option);
	for(u32 i = 0; i < 4; i++) ee_printf("%08lX ", port->resp[i]);
	ee_puts("");
}
#endif
Eh, it's better to have the option than not in my opinion. eMMC does have a sleep mode but no current is still better than a little. I have no idea what these other bits really do. It's wild guesses. And unfortunately no 1.8V switch as far as I can tell. That would save some power. I don't fully understand how 1.8V mode works with SD/(e)MMC though. It seems only the signal voltage is switched and the rest is still 3.3V?

Btw bit 9 of sd_clk_ctrl is not just "clock freeze". It's a power saving feature. Clock will stop when the controller is idle and start when there is activity.

Here is the SD/(e)MMC part of my code in case there is something you didn't know about yet:

Code:

#include <string.h>
#include "hardware/mmc/sdmmc.h"
#include "hardware/toshsd.h"
#ifdef _3DS
#ifdef ARM9
#include "arm9/hardware/timer.h"
#include "util.h" // wait_cycles()
#elif ARM11
#include "arm11/hardware/timer.h"
#endif // #ifdef ARM9
#elif TWL
	// TODO
#endif // #ifdef _3DS
#include "hardware/mmc/sd_spec.h"
#include "hardware/mmc/mmc_spec.h"


// Maximum clock according to the specs is 400 kHz at init.
// The controller can only do 261 or 523 kHz.
// In practice 523 kHz works very well (except for very old, 1 bit MMC)
// but if you want to play it safe comment out the FAST_INIT define.
//#define FAST_INIT          (1u)
#define IF_COND_ARG        (SD_CMD8_VHS_2_7_3_6V | SD_CMD8_CHK_PATT)
#define SD_OP_COND_ARG     (SD_ACMD41_XPC | SD_OCR_3_2_3_3V)          // We support 150 mA and 3.3V. Without HCS bit.
#define MMC_OP_COND_ARG    (/*MMC_OCR_SECT_MODE |*/ MMC_OCR_3_2_3_3V) // We support 3.3V. The sector addressing bit is commented out.
#define SD_OCR_VOLT_MASK   (SD_OCR_3_2_3_3V)                          // We support 3.3V only.
#define MMC_OCR_VOLT_MASK  (MMC_OCR_3_2_3_3V)                         // We support 3.3V only.

#ifdef _3DS
#ifdef ARM9
#define DEV_SLOT_PORT  (0u) // TODO: Support SD/MMC on controller 3.
#define DEV_eMMC_PORT  (1u)
#define DELAY_MULT     (1u) // Assumes ARM9 timer. Same speed as controller.
#elif ARM11
#define DEV_SLOT_PORT  (2u)
#define DEV_eMMC_PORT  (3u) // Not connected/accessible.
#define DELAY_MULT     (2u) // Assumes ARM11 timer. 2x controller speed.
#endif // #ifdef ARM9

#ifdef FAST_INIT
#define INIT_CLK       (1u<<5) // 523 kHz
#define INIT_DELAY     (DELAY_MULT * 128 * 74)
#else
#define INIT_CLK       (1u<<6) // 261 kHz
#define INIT_DELAY     (DELAY_MULT * 256 * 74)
#endif // #ifdef FAST_INIT

#define SDR12_CLK      (1u) // 16.756991 MHz
#define SDR25_CLK      (0u) // 33.513982 MHz

#elif TWL

#define DEV_SLOT_PORT  (0u)
#define DEV_eMMC_PORT  (1u)

#ifdef FAST_INIT
#define INIT_CLK       (1u<<4)         // 523 kHz
#define INIT_DELAY     (1u * 64 * 74)  // Assumes ARM9 timers. Same speed as controller.
#else
#define INIT_CLK       (1u<<5)         // 261 kHz
#define INIT_DELAY     (1u * 128 * 74) // Assumes ARM9 timers. Same speed as controller.
#endif // #ifdef FAST_INIT

#define SDR12_CLK      (0u) // 16.756991 MHz
#endif // #ifdef _3DS

#define CTYPE_NONE      (0u) // Uninitialized/no card.
#define CTYPE_SDSC      (1u) // SDSC.
#define CTYPE_SDHC      (2u) // SDHC, SDXC.
#define CTYPE_MMC       (3u) // (e)MMC.
#define CTYPE_MMC_HC    (4u) // High capacity (e)MMC (>2 GB).


typedef struct
{
	ToshsdPort port;
	u8 cardType;
	u8 spec_vers;   // (e)MMC only SPEC_VERS from CSD. 0 for SD.
	u16 rca;        // Relative Card Address (RCA).
	// TODO: State?
	u16 ccc;        // SD/(e)MMC command class support. One per bit starting at 0.
	u32 sectors;    // Size in 512 byte units.

	// Cached card infos.
	u32 cid[4];     // Raw CID with the CRC zeroed out.
} SdmmcDev;

SdmmcDev g_devs[2] = {0};



/*static u32 sendCardStatus(ToshsdPort *const port, u32 rca, u32 *const statusOut)
{
	// Same CMD for SD/(e)MMC but the argument format differs slightly.
	const u32 res = TOSHSD_sendCommand(port, MMC_SEND_STATUS, rca);
	if(res == 0) *statusOut = port->resp[0];

	return res;
}*/

static u32 sdSendAppCmd(ToshsdPort *const port, u16 cmd, u32 arg, u32 rca)
{
	u32 res = TOSHSD_sendCommand(port, SD_APP_CMD, rca); // TODO: How do we handle the R1 response?
	if(res == 0)
	{
		res = TOSHSD_sendCommand(port, cmd, arg);
	}

	return res;
}

static u32 goIdleState(ToshsdPort *const port)
{
	// Enter idle state before we start the init procedure.
	// Works from all but inactive state. CMD is the same for SD/(e)MMC.
	// For (e)MMC there are optional init paths:
	// arg = 0x00000000 -> GO_IDLE_STATE.
	// arg = 0xF0F0F0F0 -> GO_PRE_IDLE_STATE.
	// arg = 0xFFFFFFFA -> BOOT_INITIATION.
	u32 res = TOSHSD_sendCommand(port, MMC_GO_IDLE_STATE, 0);
	if(res != 0) return SDMMC_ERR_GO_IDLE_STATE;

	return SDMMC_ERR_OK;
}

static u32 initIdleState(ToshsdPort *const port, u8 *const cardTypeOut)
{
	// Tell the card what interfaces and voltages we support.
	// Only SD v2 and up will respond. (e)MMC won't respond.
	u32 res = TOSHSD_sendCommand(port, SD_SEND_IF_COND, IF_COND_ARG);
	if(res == 0)
	{
		// If the card supports the interfaces and voltages
		// it should echo back the check pattern and set the
		// support bits. Since we don't support anything but
		// the standard SD interface at 3.3V we can check the
		// whole response at once.
		if(port->resp[0] != IF_COND_ARG) return SDMMC_ERR_IF_COND_RESP;
	}
	else if(res != TSD_ERR_CMD_TMOUT) return SDMMC_ERR_SEND_IF_COND;

	// Send the first app CMD. If this times out it's (e)MMC.
	// If previous CMD timed out tell the SD card we are a v1 host.
	const u32 opCondArg = SD_OP_COND_ARG | (res<<8 ^ SD_ACMD41_HCS);
	u8 cardType = CTYPE_SDSC;
	res = sdSendAppCmd(port, SD_APP_SD_SEND_OP_COND, opCondArg, 0);
	if(res != 0)
	{
		if(res == TSD_ERR_CMD_TMOUT) cardType = CTYPE_MMC;          // Continue with (e)MMC init.
		else                         return SDMMC_ERR_SEND_OP_COND; // Unknown error.
	}

	if(cardType == CTYPE_SDSC) // SD card.
	{
		// Loop until a timeout of 1 second or the card is ready.
		u32 tries = 199;
		u32 ocr;
		do
		{
			// Linux uses 10 ms but the card doesn't become ready faster
			// when polling with delay. Use 5 ms as compromise so not much
			// time is wasted when the card becomes ready in the middle of the delay.
			TIMER_sleepMs(5);

			res = sdSendAppCmd(port, SD_APP_SD_SEND_OP_COND, opCondArg, 0);
			if(res != 0) return SDMMC_ERR_SEND_OP_COND;

			ocr = port->resp[0];
		} while(--tries && !(ocr & SD_OCR_NOT_BUSY));

		// SD card didn't finish init within 1 second.
		if(tries == 0) return SDMMC_ERR_OP_COND_TMOUT;

		// TODO: From sd.c in Linux:
		// "Some SD cards claims an out of spec VDD voltage range.
		//  Let's treat these bits as being in-valid and especially also bit7."
		if(!(ocr & SD_OCR_VOLT_MASK)) return SDMMC_ERR_VOLT_UNSUPP;
		if(ocr & SD_OCR_CCS) cardType = CTYPE_SDHC;
	}
	else // (e)MMC
	{
		// Loop until a timeout of 1 second or the card is ready.
		u32 tries = 200;
		u32 ocr;
		do
		{
			res = TOSHSD_sendCommand(port, MMC_SEND_OP_COND, MMC_OP_COND_ARG);
			if(res != 0) return SDMMC_ERR_SEND_OP_COND;

			ocr = port->resp[0];
			if(!--tries || (ocr & MMC_OCR_NOT_BUSY)) break;

			// Linux uses 10 ms but the card doesn't become ready faster
			// when polling with delay. Use 5 ms as compromise so not much
			// time is wasted when the card becomes ready in the middle of the delay.
			TIMER_sleepMs(5);
		} while(1);

		// (e)MMC didn't finish init within 1 second.
		if(tries == 0) return SDMMC_ERR_OP_COND_TMOUT;

		// Check if the (e)MMC supports the voltage and if it's high capacity.
		if(!(ocr & MMC_OCR_VOLT_MASK)) return SDMMC_ERR_VOLT_UNSUPP; // Voltage not supported.
		// TODO: High capacity (e)MMC check.
	}

	*cardTypeOut = cardType;

	return SDMMC_ERR_OK;
}

static u32 initReadyState(SdmmcDev *const dev)
{
	ToshsdPort *const port = &dev->port;

	// SD card voltage switch sequence goes here if supported.

	// Get the CID. CMD is the same for SD/(e)MMC.
	u32 res = TOSHSD_sendCommand(port, MMC_ALL_SEND_CID, 0);
	if(res != 0) return SDMMC_ERR_ALL_SEND_CID;
	memcpy(dev->cid, port->resp, 16);

	return SDMMC_ERR_OK;
}

static u32 initIdentState(SdmmcDev *const dev, const u8 cardType, u32 *const rcaOut)
{
	ToshsdPort *const port = &dev->port;

	u32 rca;
	if(cardType < CTYPE_MMC)
	{
		// Ask the SD card to send its RCA.
		u32 res = TOSHSD_sendCommand(port, SD_SEND_RELATIVE_ADDR, 0);
		if(res != 0) return SDMMC_ERR_SET_SEND_RCA;

		rca = port->resp[0]>>16; // RCA in upper 16 bits.
	}
	else
	{
		// Set the RCA of the (e)MMC to 1. 0 is reserved and 1 also
		// doesn't seem to be a good choice since it's the default.
		// However Linux uses 1 so we will too.
		// A few extremely old, unbranded (but Nokia?) MMCs will time
		// out here for an unknown reason. They won't work on DSi anyway (FAT12).
		// The RCA is in the upper 16 bits of the argument.
		u32 res = TOSHSD_sendCommand(port, MMC_SET_RELATIVE_ADDR, 1u<<16); // TODO: Should we check the R1 response?
		if(res != 0) return SDMMC_ERR_SET_SEND_RCA;

		rca = 1;
	}

	dev->rca = rca;
	*rcaOut = rca<<16;

	return SDMMC_ERR_OK;
}

// Based on code from linux/drivers/mmc/core/sd.c.
// Works only with u32[4] buffer.
#define UNSTUFF_BITS(resp, start, size)                     \
({                                                          \
	const unsigned int __size = size;                       \
	const u32 __mask = (__size < 32 ? 1u<<__size : 0u) - 1; \
	const unsigned int __off = 3 - ((start) / 32u);         \
	const unsigned int __shift = (start) & 31u;             \
	u32 __res;                                              \
	                                                        \
	__res = resp[__off]>>__shift;                           \
	if(__size + __shift > 32)                               \
		__res |= resp[__off - 1]<<((32 - __shift) % 32u);   \
	__res & __mask;                                         \
})

static void parseCsd(SdmmcDev *const dev, const u8 cardType)
{
	// Note: The MSBs are in csd[0].
	const u32 *const csd = dev->port.resp;

	// structure = 0 is CSD version 1.0.
	const u8 structure = UNSTUFF_BITS(csd, 126, 2); // [127:126]
	dev->spec_vers = UNSTUFF_BITS(csd, 122, 4);     // [125:122] All 0 for SD cards.
	u32 sectors;
	if(structure == 0 || cardType == CTYPE_MMC)
	{
		// Same calculation for SDSC and (e)MMC <=2 GB.
		// TODO: https://github.com/torvalds/linux/blob/master/drivers/mmc/core/sd.c#L121
		const u32 read_bl_len = UNSTUFF_BITS(csd, 80, 4);  // [83:80]
		const u32 c_size      = UNSTUFF_BITS(csd, 62, 12); // [73:62]
		const u32 c_size_mult = UNSTUFF_BITS(csd, 47, 3);  // [49:47]

		sectors = (c_size + 1) * (1u<<(c_size_mult + 2)) * (1u<<(read_bl_len - 9)); // "- 9" -> "/ 512"
	}
	else
	{
		// SD CSD version 3.0 format.
		// For version 2.0 this is 22 bits, however the upper bits
		// are reserved and zero filled so this is fine.
		const u32 c_size = UNSTUFF_BITS(csd, 48, 28); // [75:48]

		sectors = (c_size + 1) * 1024u;
	}
	// TODO: High capacity (e)MMC encodes the size in the ext CSD.
	dev->sectors = sectors;

	const u16 ccc = UNSTUFF_BITS(csd, 84, 12); // [95:84]
	dev->ccc = ccc;
}

static u32 initStandbyState(SdmmcDev *const dev, const u8 cardType, const u32 rca)
{
	ToshsdPort *const port = &dev->port;

	// Get the CSD. CMD is the same for SD/(e)MMC.
	u32 res = TOSHSD_sendCommand(port, MMC_SEND_CSD, rca);
	if(res != 0) return SDMMC_ERR_SEND_CSD;
	parseCsd(dev, cardType);

	// Select card and switch to transfer state.
	const u16 selCardCmd = (cardType < CTYPE_MMC ? SD_SELECT_CARD : MMC_SELECT_CARD);
	res = TOSHSD_sendCommand(port, selCardCmd, rca); // TODO: Should we check the R1 response?
	if(res != 0) return SDMMC_ERR_SELECT_CARD;

	// The SD card spec mentions that we should check the lock bit in the
	// response to CMD7 to identify cards requiring a password
	// to unlock which we don't support. Same seems to apply for (e)MMC.
	// Same bit for SD/(e)MMC R1 card status.
	if(port->resp[0] & MMC_R1_CARD_IS_LOCKED)
		return SDMMC_ERR_LOCKED;

	return SDMMC_ERR_OK;
}

static u32 initTranState(SdmmcDev *const dev, const u8 cardType, const u32 rca)
{
	ToshsdPort *const port = &dev->port;

	if(cardType < CTYPE_MMC)
	{
		// Remove DAT3 pull-up.
		u32 res = sdSendAppCmd(port, SD_APP_SET_CLR_CARD_DETECT, 0, rca); // arg = 0 removes the pull-up.
		if(res != 0) return SDMMC_ERR_SET_CLR_CD;

		// Switch to 4 bit bus mode.
		res = sdSendAppCmd(port, SD_APP_SET_BUS_WIDTH, 2, rca); // arg = 2 is 4 bit bus width.
		if(res != 0) return SDMMC_ERR_SET_BUS_WIDTH;
		TOSHSD_setBusWidth(port, 4);

#ifndef TWL
		// TODO: Is it faster to double the clock earlier or to run this CMD with 4 bit bus width?
		if(dev->ccc & 1u<<10) // Class 10 command support.
		{
			TOSHSD_setBlockLen(port, 64);
			alignas(4) u8 switchStat[64]; // MSB first and big endian.
			TOSHSD_setBuffer(port, (u32*)switchStat, 1);
			res = TOSHSD_sendCommand(port, SD_SWITCH_FUNC, 0x80FFFFF1);
			if(res != 0) return SDMMC_ERR_SWITCH_HS;
			TOSHSD_setBlockLen(port, 512);

			// [415:400] Support Bits of Functions in Function Group 1.
			if(switchStat[63 - 400 / 8] & 1u<<1) // Is group 1, function 1 "SDR25" supported?
			{
				// SDR25 (50 MHz) supported. Switch to highest supported clock.
				// Stop clock at idle. 33 MHz.
				TOSHSD_setClock(port, (1u<<9) | (1u<<8) | SDR25_CLK);
			}
		}
#endif
	}
	else
	{
		// Helpful sections of the spec:
		// A.8.2  Switching to high-speed mode
		// A.8.3  Changing the data bus width

		// Very old 1 bit bus MMC will timeout and set the SWITCH_ERROR bit
		// for these CMDs. Only try with (e)MMC spec >4.0.
		if(dev->spec_vers >= 4) // Version 4.1–4.2–4.3 or higher.
		{
			// Switch to 4 bit bus mode.
			u32 arg = MMC_SWITCH_ARG(MMC_SWITCH_ACC_WR_BYTE, 183, 1, 0);
			u32 res = TOSHSD_sendCommand(port, MMC_SWITCH, arg);
			if(res != 0) return SDMMC_ERR_SET_BUS_WIDTH;
			TOSHSD_setBusWidth(port, 4);

#ifndef TWL
			// Switch to high speed timing (52 MHz).
			arg = MMC_SWITCH_ARG(MMC_SWITCH_ACC_WR_BYTE, 185, 1, 0);
			res = TOSHSD_sendCommand(port, MMC_SWITCH, arg);
			if(res != 0) return SDMMC_ERR_SWITCH_HS;
			// Stop clock at idle. 33 MHz.
			TOSHSD_setClock(port, (1u<<9) | (1u<<8) | SDR25_CLK);
#endif

			// We also should check in the ext CSD the power budget for the card.
			// Nintendo seems to leave it on default (no change).
		}
	}

	// SD:     The description for CMD SET_BLOCKLEN says 512 bytes is the default.
	// (e)MMC: The description for READ_BL_LEN (CSD) says 512 bytes is the default.
	// So it's not required to set the block length?
	//u32 res = TOSHSD_sendCommand(port, MMC_SET_BLOCKLEN, 512);
	//if(res != 0) return SDMMC_ERR_SET_BLOCKLEN;

	return SDMMC_ERR_OK;
}

static inline u8 dev2portNum(u8 devNum)
{
	u8 portNum;
	switch(devNum)
	{
		default: ;
		case SDMMC_DEV_SLOT:
			portNum = DEV_SLOT_PORT;
			break;
		case SDMMC_DEV_eMMC:
			portNum = DEV_eMMC_PORT;
	}

	return portNum;
}

// TODO: In many places we also want to check the card's response.
u32 SDMMC_init(u8 devNum)
{
	if(devNum > SDMMC_DEV_eMMC) return SDMMC_ERR_INVAL_PARAM;

	SdmmcDev *const dev = &g_devs[devNum];
	ToshsdPort *const port = &dev->port;

	if(dev->cardType != CTYPE_NONE) return SDMMC_ERR_DOUBLE_INIT;

	// TODO: When does the card detection timer start? Does not restart on controller reset.
	TOSHSD_initPort(port, dev2portNum(devNum));
	TOSHSD_setClock(port, (1u<<8) | INIT_CLK); // Continuous clock, 261/523 kHz.
#ifdef _3DS
#ifdef ARM9
	// TODO: Use a timer instead? The delay is only a few hundred us though.
	wait_cycles(2 * INIT_DELAY); // CPU is 2x timer frequency.
#elif ARM11
	// TODO: Is it worth using a timer? The delay is only a few hundred us.
	TIMER_sleepTicks(INIT_DELAY);
#endif // #ifdef ARM9
#elif TWL
#error "SD/MMC necessary delay unimplemented."
#endif // #ifdef _3DS

	u32 res = goIdleState(port);
	if(res != 0) return res;

	// SD/(e)MMC now in idle state (idle).
	u8 cardType;
	res = initIdleState(port, &cardType);
	if(res != 0) return res;

	// Stop clock at idle. 261/523 kHz.
	TOSHSD_setClock(port, (1u<<9) | (1u<<8) | INIT_CLK);

	// SD/(e)MMC now in ready state (ready).
	res = initReadyState(dev);
	if(res != 0) return res;

	// SD/(e)MMC now in identification state (ident).
	u32 rca;
	res = initIdentState(dev, cardType, &rca);
	if(res != 0) return res;

	// Maximum at this point would be 25 MHz for SD and 20 for (e)MMC.
	// SD: We can increase the clock after end of identification state.
	// TODO: eMMC spec section 7.6
	// "Until the contents of the CSD register is known by the host,
	// the fPP clock rate must remain at fOD. (See Section 12.7 on page 176.)"
	// Since the absolute minimum clock rate is 20 MHz and we are in push-pull
	// mode already can we cheat and switch to 16 MHz before getting the CSD?
	// Note: This seems to be working just fine in all tests.
	// Stop clock at idle. 16 MHz.
	TOSHSD_setClock(port, (1u<<9) | (1u<<8) | SDR12_CLK);

	// SD/(e)MMC now in stand-by state (stby).
	res = initStandbyState(dev, cardType, rca);
	if(res != 0) return res;

	// SD/(e)MMC now in transfer state (tran).
	res = initTranState(dev, cardType, rca);
	if(res != 0) return res;

	// Useful state diagrams:
	// SD: 4.2.2 Operating Condition Validation
	// eMMC: 7.4.2 Operating voltage range validation

	dev->cardType = cardType;

	return SDMMC_ERR_OK;
}

// TODO: Is there any "best practice" way of deinitializing cards?
//       Kick the card back into idle state maybe?
//       Linux seems to deselect cards on "suspend".
u32 SDMMC_deinit(u8 devNum)
{
	if(devNum > SDMMC_DEV_eMMC) return SDMMC_ERR_INVAL_PARAM;

	g_devs[devNum].cardType = CTYPE_NONE;

	return SDMMC_ERR_OK;
}

u32 SDMMC_getCid(u8 devNum, u32 *const cidOut)
{
	if(devNum > SDMMC_DEV_eMMC) return SDMMC_ERR_INVAL_PARAM;

	if(cidOut != NULL) memcpy(cidOut, g_devs[devNum].cid, 16);

	return SDMMC_ERR_OK;
}

u32 SDMMC_getSectors(u8 devNum)
{
	if(devNum > SDMMC_DEV_eMMC) return 0;

	return g_devs[devNum].sectors;
}

u32 SDMMC_readSectors(u8 devNum, u32 sect, u32 *const buf, u16 count)
{
	if(devNum > SDMMC_DEV_eMMC || count == 0) return SDMMC_ERR_INVAL_PARAM;

	SdmmcDev *const dev = &g_devs[devNum];
	const u8 cardType = dev->cardType;
	if(cardType == CTYPE_NONE) return SDMMC_ERR_NO_CARD;

	ToshsdPort *const port = &dev->port;
	TOSHSD_setBuffer(port, buf, count);

	if(cardType == CTYPE_SDSC || cardType == CTYPE_MMC) sect *= 512;
	// Read a single 512 bytes block. Same CMD for SD/(e)MMC.
	// Read multiple 512 byte blocks. Same CMD for SD/(e)MMC.
	const u16 cmd = (count == 1 ? MMC_READ_SINGLE_BLOCK : MMC_READ_MULTIPLE_BLOCK);
	const u32 res = TOSHSD_sendCommand(port, cmd, sect);
	if(res != 0) return SDMMC_ERR_SECT_RW; // TODO: In case of errors check the card status.

	return SDMMC_ERR_OK;
}

u32 SDMMC_writeSectors(u8 devNum, u32 sect, const u32 *const buf, u16 count)
{
	if(devNum > SDMMC_DEV_eMMC || count == 0) return SDMMC_ERR_INVAL_PARAM;

	SdmmcDev *const dev = &g_devs[devNum];
	const u8 cardType = dev->cardType;
	if(cardType == CTYPE_NONE) return SDMMC_ERR_NO_CARD;

	ToshsdPort *const port = &dev->port;
	TOSHSD_setBuffer(port, (u32*)buf, count);

	if(cardType == CTYPE_SDSC || cardType == CTYPE_MMC) sect *= 512;
	// Write a single 512 bytes block. Same CMD for SD/(e)MMC.
	// Write multiple 512 byte blocks. Same CMD for SD/(e)MMC.
	const u16 cmd = (count == 1 ? MMC_WRITE_BLOCK : MMC_WRITE_MULTIPLE_BLOCK);
	const u32 res = TOSHSD_sendCommand(port, cmd, sect);
	if(res != 0) return SDMMC_ERR_SECT_RW; // TODO: In case of errors check the card status.

	return SDMMC_ERR_OK;
}

#ifdef ARM11
#include "arm11/fmt.h"
void SDMMC_printCardInfos(u8 devNum)
{
	if(devNum > SDMMC_DEV_eMMC) return;

	SdmmcDev *const dev = &g_devs[devNum];
	ToshsdPort *const port = &dev->port;

	ee_printf("Card infos:\n cardType: %u\n spec_vers: %u\n rca: 0x%X\n ccc: 0x%X\n sectors: %lu\n CID: ",
	          dev->cardType, dev->spec_vers, dev->rca, dev->ccc, dev->sectors);
	for(u32 i = 0; i < 4; i++)
	{
		ee_printf("%08lX", dev->cid[i]);
	}
	ee_printf("\n Bus width: %u bit\n", (port->sd_option & 1u<<15 ? 1u : 4u));
	const u32 clkSetting = port->sd_clk_ctrl & 0xFFu;
	ee_printf(" Clock: %lu Hz\n", TOSHSD_HCLK / (clkSetting ? clkSetting<<2 : 2u));
}
#endif
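
The capacity calculation in parseCsd() can be tried on a PC with a hand-built CSD. The following is only a sketch: it is a portable rewrite of the bit extraction (no GCC statement expression), and `csd_bits()`, `csd_set_bits()` and `csd_sectors()` are names made up for this example, not part of the driver. It assumes the same layout as above, with the MSBs in word 0.

```c
#include <stdint.h>

// Extract `size` bits (1..32) starting at bit `start` from a 128 bit
// response stored MSB first (resp[0] holds bits 127:96).
static uint32_t csd_bits(const uint32_t resp[4], unsigned start, unsigned size)
{
	const unsigned off   = 3u - start / 32u;
	const unsigned shift = start % 32u;
	const uint32_t mask  = (size < 32u ? 1u << size : 0u) - 1u;

	uint32_t res = resp[off] >> shift;
	// Field crosses a word boundary. shift is >= 1 here so the
	// left shift below is always smaller than 32.
	if(size + shift > 32u) res |= resp[off - 1u] << (32u - shift);
	return res & mask;
}

// Example-only helper: write `size` bits of `val` at bit `start`.
static void csd_set_bits(uint32_t resp[4], unsigned start, unsigned size, uint32_t val)
{
	for(unsigned i = 0; i < size; i++)
	{
		const unsigned bit = start + i;
		const uint32_t m   = 1u << (bit % 32u);
		if((val >> i) & 1u) resp[3u - bit / 32u] |= m;
		else                resp[3u - bit / 32u] &= ~m;
	}
}

// Number of 512 byte sectors. The version 1.0 layout is used for
// SDSC and for (e)MMC <=2 GB, like in parseCsd().
static uint32_t csd_sectors(const uint32_t csd[4], int isMmc)
{
	const uint32_t structure = csd_bits(csd, 126, 2);
	if(structure == 0 || isMmc)
	{
		const uint32_t read_bl_len = csd_bits(csd, 80, 4);  // [83:80]
		const uint32_t c_size      = csd_bits(csd, 62, 12); // [73:62]
		const uint32_t c_size_mult = csd_bits(csd, 47, 3);  // [49:47]
		// Assumes read_bl_len >= 9 (512 bytes) like the driver does.
		return (c_size + 1u) * (1u << (c_size_mult + 2u)) * (1u << (read_bl_len - 9u));
	}

	return (csd_bits(csd, 48, 28) + 1u) * 1024u; // [75:48]
}
```

With read_bl_len = 10, c_size = 4095 and c_size_mult = 7 this gives 4194304 sectors, which is the 2 GB maximum of the old layout.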

nocash
Posts: 1398
Joined: Fri Feb 24, 2012 12:09 pm

Re: 3DS reverse engineering

Post by nocash » Fri May 21, 2021 5:11 pm

profi200 wrote:
Fri May 21, 2021 6:57 am
bit 23... should be useful for stuff like erase which can take multiple seconds if you erase large areas. The hardware timeout will not cut it here.
Yeah, could be useful for that. A large timeout (combined with a slower SD CLK) might also work. Then again, the erase commands are pretty useless (unless you are planning to sell your old SD cards on a flea market and want to erase your private data first).
profi200 wrote:
Fri May 21, 2021 6:57 am
I think the card/DAT3 detection can be changed to other ports of the same controller with reg 0xF8, 0xFA, 0xFC and 0xFE. Not sure.
I have probably tried reading eMMC DAT3 from those registers. But yes, writing one of those bits to map eMMC DAT3 elsewhere is also a good idea/theory.
profi200 wrote:
Fri May 21, 2021 6:57 am
Eh, it's better to have the option than not in my opinion. eMMC does have a sleep mode but no current is still better than a little.
Well, yes, but it doesn't look as if Nintendo did so. I've tested only the supply for Wifi, and vice-versa, tested only the pull-ups for eMMC, and neither can be switched off. So it seems to be more of a rhetorical question... hmmm, a power switch isn't 100% effective when switched on, and a sleep mode isn't 0% effective when off, so it could be difficult to say what is really better.
profi200 wrote:
Fri May 21, 2021 6:57 am
I have no idea what these other bits really do. It's wild guesses.
Bit0,1,2 of SDMMCCTL are about right, the corresponding devices stop working when setting those bits. In case of bit1 and 2, they might just "power off" something internally (like stopping clocks or whatever).
Atheros Wifi does only need bit2 to be cleared, all other bits (including the New3DSXL extra bit7) seem to be don't care. So if bit6 is wifi related... it might just fine tune something that isn't really required, or affect only NDS-Wifi, or... there have been rumours about dev units having a separate Debug Wifi port or Debug SDIO or something... I don't know if that's really true(?)
profi200 wrote:
Fri May 21, 2021 6:57 am
And unfortunately no 1.8V switch as far as i can tell. That would save some power. I don't fully understand how 1.8V mode works with SD/(e)MMC though. It seems only the signal voltage is switched and the rest is still 3.3V?
Don't know much about that either, I think I had the impression that it's only the signals dropping to 1.8V and supply staying 3.3V, too.
For the Old3DS, I can confirm that that mixed voltage combination isn't supported (the pull-ups are wired directly to SD slot power).
profi200 wrote:
Fri May 21, 2021 6:57 am
Btw bit 9 of sd_clk_ctrl is not just "clock freeze". It's a power saving feature. Clock will stop when the controller is idle and start when there is activity.
Ah, okay. Then it's best to have that bit always set, or does that cause problems/slowdowns with some cards/commands?

Well, if some or all cards really need 74 clks on power up, then that would be a case for needing bit9=0. Btw. would you first power up and then start the clock (in separate steps), or simply enable the clock before power up? The latter would be a bit easier, but some cards might be confused if the clock is already running during power up.
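
To make the sd_clk_ctrl bits from this discussion concrete, here is a rough sketch that decodes a register value the way the driver code posted earlier uses it: bits 0-7 select the divider, bit 8 enables the clock and bit 9 stops the clock while the controller is idle. `SdClkInfo` and `decodeSdClkCtrl()` are invented for the example. HCLK is passed in as a parameter because it differs between consoles.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
	bool     enabled;  // Bit 8: clock output enabled.
	bool     autoStop; // Bit 9: clock stops while the controller is idle.
	uint32_t hz;       // Resulting SD clock for the given HCLK.
} SdClkInfo;

// Decodes an sd_clk_ctrl value. The divider formula is the one used
// by SDMMC_printCardInfos(): setting 0 means HCLK / 2, otherwise
// HCLK / (setting * 4).
static SdClkInfo decodeSdClkCtrl(uint16_t reg, uint32_t hclk)
{
	const uint32_t div = reg & 0xFFu;

	SdClkInfo info;
	info.enabled  = (reg & (1u<<8)) != 0;
	info.autoStop = (reg & (1u<<9)) != 0;
	info.hz       = hclk / (div ? div<<2 : 2u);
	return info;
}
```

Assuming an HCLK of 67027964 Hz on 3DS (my assumption, not stated in the thread), divider setting 0 comes out at about 33.5 MHz and setting 1 at about 16.8 MHz, which would fit the "33 MHz" and "16 MHz" comments in the driver.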
homepage - patreon - you can think of a bit as a bottle that is either half full or half empty

profi200
Posts: 64
Joined: Fri May 10, 2019 4:48 am

Re: 3DS reverse engineering

Post by profi200 » Sat May 22, 2021 7:26 am

nocash wrote:
Fri May 21, 2021 5:11 pm
Yeah, could be useful for that. A large timeout (combined with a slower SD CLK) might also work. Then again, the erase commands are pretty useless (unless you are planning to sell your old SD cards on a flea market and want to erase your private data first).
I think you are misunderstanding the purpose of erase. SD cards are complex beasts these days as the flash is getting cheaper but also increasingly shitty. They try to cram more and more bits into each flash cell which reduces the lifetime drastically. The workaround is controllers with complex flash management algorithms to distribute all writes equally across the whole flash. Erase speeds up writes but also tells the controller which areas are unused. For SSDs this is often called TRIM. There are multiple variants supported by SD cards by now which have different effects and give the controller more or less room to decide what to actually do and when. There is also secure erase which is what you are thinking of. It immediately erases everything including reserve flash pages which are not accessible otherwise unless the controller maps them in. It all appears like one continuous drive from the host perspective but there is a lot more going on under the hood.

This is several years old but still relevant and a good read: https://www.bunniestudios.com/blog/?p=3554
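
For anyone who hasn't looked at erase yet, a minimal sketch of the SD command sequence: CMD32 (ERASE_WR_BLK_START) and CMD33 (ERASE_WR_BLK_END) set the range and CMD38 (ERASE) starts the operation. (e)MMC uses CMD35/CMD36 for the range instead. The `SendCmdFn` callback, `sdEraseRange()` and the `CmdLog` recorder are hypothetical and only stand in for a real driver. Waiting for the card to leave the busy state afterwards, which is the slow part, is left out.

```c
#include <stdint.h>

// Hypothetical command callback. Returns 0 on success.
typedef uint32_t (*SendCmdFn)(void *ctx, uint8_t cmdIdx, uint32_t arg);

// Erases [first, last] (inclusive). Addresses are in the unit the card
// expects (bytes for SDSC, 512 byte blocks for SDHC/SDXC).
// Returns 0 on success or the index of the failing command.
static uint32_t sdEraseRange(SendCmdFn send, void *ctx, uint32_t first, uint32_t last)
{
	if(send(ctx, 32, first) != 0) return 32; // ERASE_WR_BLK_START
	if(send(ctx, 33, last)  != 0) return 33; // ERASE_WR_BLK_END
	if(send(ctx, 38, 0)     != 0) return 38; // ERASE, arg 0 = normal erase.
	return 0;
}

// Recorder to show the command order without real hardware.
typedef struct
{
	uint8_t  cmds[8];
	uint32_t args[8];
	unsigned count;
} CmdLog;

static uint32_t logCmd(void *ctx, uint8_t cmdIdx, uint32_t arg)
{
	CmdLog *const rec = (CmdLog*)ctx;
	if(rec->count >= 8) return 1; // Recorder full.

	rec->cmds[rec->count] = cmdIdx;
	rec->args[rec->count] = arg;
	rec->count++;
	return 0;
}
```

Calling `sdEraseRange(logCmd, &rec, first, last)` with a zeroed `CmdLog rec` records the three commands in order. A real implementation would also check the R1 responses and use a long software timeout for the busy phase.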
nocash wrote:
Fri May 21, 2021 5:11 pm
Bit0,1,2 of SDMMCCTL are about right, the corresponding devices stop working when setting those bits. In case of bit1 and 2, they might just "power off" something internally (like stopping clocks or whatever).
Atheros Wifi does only need bit2 to be cleared, all other bits (including the New3DSXL extra bit7) seem to be don't care. So if bit6 is wifi related... it might just fine tune something that isn't really required, or affect only NDS-Wifi, or... there have been rumours about dev units having a separate Debug Wifi port or Debug SDIO or something... I don't know if that's really true(?)
The debug WiFi stuff was a guess based on controller 3 usually being mapped to the ARM11 side and the nwm module apparently referencing it in code, so it might be possible that they changed which port the WiFi module is connected to mid-development.
nocash wrote:
Fri May 21, 2021 5:11 pm
Ah, okay. Then it's best to have that bit always set, or does that cause problems/slowdowns with some cards/commands?

Well, if some or all cards really need 74 clks on power up, then that would be a case for needing bit9=0. Btw. would you first power up and then start the clock (in separate steps), or simply enable the clock before power up? The latter would be a bit easier, but some cards might be confused if the clock is already running during power up.
You want to keep the clock running continuously until after the card reports it's not busy anymore in the SEND_OP_COND loop. The spec mentions it works without but then you have to poll the card at least every 50 ms. For me, not letting the clock run continuously (in idle state) has caused problems. I dunno if stopping the clock in between has any negative performance impact on init, but polling faster or slower with a continuous clock doesn't matter in the tests i have done (see code above in initIdleState()). In every other case i didn't notice any impact on performance. Many controllers actually have this power saving feature.

nocash
Posts: 1398
Joined: Fri Feb 24, 2012 12:09 pm

Re: 3DS reverse engineering

Post by nocash » Sun May 23, 2021 8:59 am

Yes, doing a large erase can be faster than several small sector erases/writes, and could also affect the card lifetime. I am not so sure if that's really needed, or if the cards can automatically group continuous writes into larger blocks in most cases. I've mostly considered the ERASE and ERASE_COUNT commands to be some weird relic from the original SD/MMC specs that is barely used in practice.

In an OS or other stuff with heavy writes, it might make sense to use ERASE, though that might require a lot of motivation, bug testing, problem fixing, benchmarking with big and small writes, on hundreds of different cards, write-behind caching, filesystem interactions, and so on, and I wouldn't be surprised if the result is that it's only slightly faster in a few situations on a few cards. Before going through that hassle, it may be best to do some test writes on a bunch of "unformatted" cards without filesystem.

On the DSi, Nintendo doesn't use any ERASE commands (tested when changing options in System Settings, it's merely doing raw WRITEs, and oddly, writing twice to the same cluster, so they don't even seem to have a write cache that could group "write data + append more data" into single writes) (or well, what they are doing is "write+readback+writeagain" so a possibly existing write cache might get confused by that).

During inactivity (when waiting for title selection), DSi System Menu is completely stopping the SD/MMC clocks via CLK_CTL.bit8=0. It doesn't do anything like polling the card every 50ms during that time. When starting a title it does just switch the clock back on and then resumes by issuing READ_MULTIPLE (there might be delay in that process, but it doesn't do anything like completely re-initializing the card).

For CLK_CTL bit9, the odd thing is that it freezes the CLK pin in its current state (high or low). I don't know if that can cause major problems (or slightly affect power consumption). But it can probably be avoided by changing bit9 only when bit8 is off.
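
The workaround from the last paragraph could look roughly like this. It's a sketch against a plain variable instead of the real register: `setAutoStop()` and the two bit names are made up, and on hardware `reg` would point to the controller's clock control register. Whether this really avoids freezing the pin in the wrong state is untested.

```c
#include <stdbool.h>
#include <stdint.h>

#define CLK_ENABLE    (1u<<8) // Clock output on/off.
#define CLK_AUTO_STOP (1u<<9) // Freeze/stop the clock while idle.

// Changes bit 9 only while bit 8 is cleared, then restores bit 8.
static void setAutoStop(volatile uint16_t *reg, bool autoStop)
{
	const uint16_t old = *reg;

	// 1. Switch the clock off (bit 8 = 0), keep everything else.
	*reg = (uint16_t)(old & ~CLK_ENABLE);

	// 2. Change bit 9 while the clock is off.
	uint16_t tmp = (uint16_t)(old & ~(CLK_ENABLE | CLK_AUTO_STOP));
	if(autoStop) tmp |= CLK_AUTO_STOP;
	*reg = tmp;

	// 3. Restore bit 8 to what it was before.
	*reg = (uint16_t)(tmp | (old & CLK_ENABLE));
}
```

The divider bits are left untouched, so the function can be called at any point after the clock has been configured.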

nocash
Posts: 1398
Joined: Fri Feb 24, 2012 12:09 pm

Re: 3DS reverse engineering

Post by nocash » Sat May 29, 2021 1:37 pm

I am adding support for uploading GBA rom-images to 3DS via wifiboot. One problem is interlacing versus flickering sprites.
GBA and NDS interlace did draw each 2nd horizontal scanline at half intensity.
3DS interlace seems to be not updating each 2nd vertical scanline at all ???
That 3DS problem occurs both on upper screen (in 400pix mode), and lower screen (320pix).

For example, F-Zero does display flickering rocket engine flames when holding the A-button. Those flames appear as solid vertical stripes on 3DS, which looks very bad (although not sooo much worse than on GBA, the flames did look very distorted there, too).
In GT Advance, the car's black shadow seems to be occasionally toggling between yloc+0 and yloc+1, so the pixels at the upper/lower edge would appear as smooth semi-transparent edges on GBA, but the 3DS is instead drawing solid dots in each 2nd pixel, resulting in not so smooth "zigzagged" edges.

Maybe the "ghosting" feature is intended to fix issues with flickering sprites? Or is there a better workaround, or am I doing something wrong? Being able to disable interlace would be neat (I don't really know what it is good for anyways, do LCD displays somehow require that stuff?)

profi200
Posts: 64
Joined: Fri May 10, 2019 4:48 am

Re: 3DS reverse engineering

Post by profi200 » Sat May 29, 2021 3:46 pm

Linux actually does have support for erase and has been using it for a while. It works well. I have a few older cards which either pretend they did something or they don't support erase at all. All the newer ones i have support it. Pretty useful if you can wipe the entire card in seconds and freshly format it. Linux also supports it for the fs (fstrim command).
nocash wrote:
Sun May 23, 2021 8:59 am
During inactivity (when waiting for title selection), DSi System Menu is completely stopping the SD/MMC clocks via CLK_CTL.bit8=0. It doesn't do anything like polling the card every 50ms during that time. When starting a title it does just switch the clock back on and then resumes by issuing READ_MULTIPLE (there might be delay in that process, but it doesn't do anything like completely re-initializing the card).

For CLK_CTL bit9, the odd thing is that it freezes the CLK pin in its current state (high or low). I don't know if that can cause major problems (or slightly affect power consumption). But it can probably be avoided by changing bit9 only when bit8 is off.
I think you misunderstood. The 50 ms polling is only for the SEND_OP_COND part of the card init. After that part you can stop the clock freely and as long as you want. Didn't have issues with the controller stopping clock in either position.
nocash wrote:
Sat May 29, 2021 1:37 pm
I am adding support for uploading GBA rom-images to 3DS via wifiboot. One problem is interlacing versus flickering sprites.
GBA and NDS interlace did draw each 2nd horizontal scanline at half intensity.
3DS interlace seems to be not updating each 2nd vertical scanline at all ???
That 3DS problem occurs both on upper screen (in 400pix mode), and lower screen (320pix).

For example, F-Zero does display flickering rocket engine flames when holding the A-button. Those flames appear as solid vertical stripes on 3DS, which looks very bad (although not sooo much worse than on GBA, the flames did look very distorted there, too).
In GT Advance, the car's black shadow seems to be occasionally toggling between yloc+0 and yloc+1, so the pixels at the upper/lower edge would appear as smooth semi-transparent edges on GBA, but the 3DS is instead drawing solid dots in each 2nd pixel, resulting in not so smooth "zigzagged" edges.

Maybe the "ghosting" feature is intended to fix issues with flickering sprites? Or is there a better workaround, or am I doing something wrong? Being able to disable interlace would be neat (I don't really know what it is good for anyways, do LCD displays somehow require that stuff?)
Are you referring to rapidly flickering graphics? The pixel response time of the old LCD displays is really shitty which many games abused to create transparency effects by flickering sprites/tiles rapidly. This doesn't work anymore on modern LCDs because they are too fast. What you want is to blend the current frame with the previous frame (inter-frame blending) which will fix this. I have not implemented this yet in open_agb_firm. As for "ghosting" i'm not sure how that works. It smears the output so much that i believe they are blending the current GBA frame on top of the last (GPU rendered) frame so you get multiple passes of blending instead of just one pass (previous GBA frame with current GBA frame).
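
A rough idea of what inter-frame blending could look like in software, assuming 16 bit RGB565 frames (an assumption for this sketch, it is not taken from open_agb_firm). `blend565()` averages two pixels without unpacking the channels: the mask 0xF7DE drops the lowest bit of each channel so the shift can't leak into the neighbouring channel.

```c
#include <stddef.h>
#include <stdint.h>

// 50/50 blend of two RGB565 pixels. Each channel is rounded down.
static inline uint16_t blend565(uint16_t a, uint16_t b)
{
	return (uint16_t)((a & b) + (((a ^ b) & 0xF7DEu) >> 1));
}

// Blends the current frame with the previous one into `out`.
// All three buffers hold `pixels` RGB565 pixels.
static void blendFrames(uint16_t *out, const uint16_t *cur,
                        const uint16_t *prev, size_t pixels)
{
	for(size_t i = 0; i < pixels; i++)
	{
		out[i] = blend565(cur[i], prev[i]);
	}
}
```

A sprite that is visible only every 2nd frame ends up at roughly half intensity in every output frame, which is the transparency effect the games were going for. On 3DS this would more likely be done by the GPU than by the CPU.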

lidnariq
Posts: 10659
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: 3DS reverse engineering

Post by lidnariq » Sat May 29, 2021 3:54 pm

profi200 wrote:
Sat May 29, 2021 3:46 pm
Are you referring to rapidly flickering graphics? The pixel response time of the old LCD displays is really shitty which many games abused to create transparency effects by flickering sprites/tiles rapidly. This doesn't work anymore on modern LCDs because they are too fast.
GBA's LCD is already too fast for any blurring in that effect. I posted an animated GIF (1/60th real speed) of the flashing column of light in Castlevania: Circle of the Moon in this post

nocash
Posts: 1398
Joined: Fri Feb 24, 2012 12:09 pm
Contact:

Re: 3DS reverse engineering

Post by nocash » Sat May 29, 2021 10:09 pm

profi200 wrote:
Sat May 29, 2021 3:46 pm
I think you misunderstood. The 50 ms polling is only for the SEND_OP_COND part of the card init.
Ah, now I got it, thanks.

I think I have solved the interlace issue: My current theory is that each 2nd pixel is drawn darker/brighter in each 2nd frame (much like GBA/NDS), but that seems to happen only if the VCOM voltage isn't properly calibrated in MCU[03h] and MCU[04h].
After battery removal, the MCU boots up with uncalibrated values, so one needs to run the official firmware at least once to initialize those settings (or initialize them manually, using the values from the "HWCAL" files or the "config" file (dunno which file is best, or if they all contain the same values)).
Once calibrated, all pixels seem to be drawn at the same brightness, i.e. the interlace effect is completely gone.
I don't remember, did the GBA/NDS consoles have VCOM potentiometers? Maybe calibrating that could eliminate interlace there, too...?
profi200 wrote:
Sat May 29, 2021 3:46 pm
As for "ghosting" i'm not sure how that works. It smears the output so much that i believe they are blending the current GBA frame on top of the last (GPU rendered) frame so you get multiple passes of blending instead of just one pass (previous GBA frame with current GBA frame).
That sounds quite horrible. Maybe it's some anti-seizure feature. For two-frame flicker patterns it would be better to blend only the last two frames, but there can also be flicker patterns that extend through three or more frames, so maybe they also wanted to reduce flickering in those cases as much as possible.
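
The difference between the two schemes is easy to see with one colour channel of a single pixel over a few frames. This is only a sketch of the two ideas from the posts (blending the last two input frames versus blending each input frame on top of the previous output). Whether the official "ghosting" really works like the feedback variant is a guess, and both function names are made up.

```c
#include <stddef.h>
#include <stdint.h>

// Two frame blend: out[n] = (in[n] + in[n-1]) / 2.
// The frame before the first one is treated as black.
static void blendTwoFrame(uint8_t *out, const uint8_t *in, size_t frames)
{
	uint8_t prevIn = 0;
	for(size_t i = 0; i < frames; i++)
	{
		out[i] = (uint8_t)((in[i] + prevIn) / 2);
		prevIn = in[i];
	}
}

// Feedback blend: out[n] = (in[n] + out[n-1]) / 2.
// Old frames never fully disappear, they just fade out.
static void blendFeedback(uint8_t *out, const uint8_t *in, size_t frames)
{
	uint8_t prevOut = 0;
	for(size_t i = 0; i < frames; i++)
	{
		out[i] = (uint8_t)((in[i] + prevOut) / 2);
		prevOut = out[i];
	}
}
```

For a single bright frame (0, 255, 0, 0, 0) the two frame blend gives 0, 127, 127, 0, 0 while the feedback blend gives 0, 127, 63, 31, 15. So the feedback variant leaves a trail over several frames, which would explain the heavy smearing but also dampens flicker patterns longer than two frames.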

I've just looked into how to buy a GBA title from Nintendo's eshop... but it seems they have never ever sold any GBA titles?
The only exception seems to be that they had released 10 GBA titles as a "surprise gift" for people who had purchased an Old3DS (and connected to the eshop) during the first few months after the Old3DS release date.
That was before 3DS XL and 2DS and New3DS were released, so those GBA titles seem to exist for original Old3DS only (though the System Transfer tool might allow to transfer them from Old3DS to New3DS).

And I've experimented with the LGYFB scaling feature. Starting with the official 6:4 scaling:


 dw 011011b     ;pattern
 dw 6-1         ;length
 dw 0000h,0000h,0000h,0000h,0000h,0000h,    0,    0 ;<-- for 1st input pixel
 dw 0000h,0000h,0000h,0000h,0000h,0000h,    0,    0 ;<-- for 2nd input pixel
 dw 0000h,2000h,4000h,0000h,2000h,4000h,    0,    0 ;<-- for 3rd input pixel
 dw 4000h,2000h,0000h,4000h,2000h,0000h,    0,    0 ;<-- for 4th input pixel
 dw 0000h,0000h,0000h,0000h,0000h,0000h,    0,    0 ;<-- for 5th input pixel
 dw 0000h,0000h,0000h,0000h,0000h,0000h,    0,    0 ;<-- for 6th input pixel
Then I've tried 3:2 instead of 6:4, using the settings below, which seem to produce the same effect...


   dw 011b        ;pattern
   dw 3-1         ;length
   dw 0000h,0000h,0000h,    0,    0,    0,    0,    0 ;<-- for 1st input pixel
   dw 0000h,0000h,0000h,    0,    0,    0,    0,    0 ;<-- for 2nd input pixel
   dw 0000h,2000h,4000h,    0,    0,    0,    0,    0 ;<-- for 3rd input pixel
   dw 4000h,2000h,0000h,    0,    0,    0,    0,    0 ;<-- for 4th input pixel
   dw 0000h,0000h,0000h,    0,    0,    0,    0,    0 ;<-- for 5th input pixel
   dw 0000h,0000h,0000h,    0,    0,    0,    0,    0 ;<-- for 6th input pixel
In both of the above cases, the hardware does output 2 sharp pixels, followed by 1 heavily blurred pixel.
So I've tried to smoothen that by using 3 slightly blurred pixels...


   dw 011b        ;pattern
   dw 3-1         ;length
   dw 0000h,0000h,0000h,    0,    0,    0,    0,    0 ;<-- for 1st input pixel
   dw 0000h,0000h,0000h,    0,    0,    0,    0,    0 ;<-- for 2nd input pixel
   dw 0000h,0fffh,3000h,    0,    0,    0,    0,    0 ;<-- for 3rd input pixel
   dw 3000h,3000h,0fffh,    0,    0,    0,    0,    0 ;<-- for 4th input pixel
   dw 0fffh,0000h,0000h,    0,    0,    0,    0,    0 ;<-- for 5th input pixel
   dw 0000h,0000h,0000h,    0,    0,    0,    0,    0 ;<-- for 6th input pixel
That's looking more or less okay... it's smoother... but less sharp.

In the above table, I had originally used values 3000h+1000h, but for whatever reason the DMA didn't start with those settings.
It did start working when using 3000h+0FFFh instead. I've no idea why. 3000h+1000h didn't exceed the max brightness, it's about same as 4000h+0 or 2000h+2000h (which are both working), and even if it would overflow max brightness: The hardware can saturate slight overflows (or output unsaturated wrong pixels in case of heavy overflows).
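
A quick way to compare such tables is to add up each column. The sketch below assumes that each column holds the weights for one output pixel of the pattern and that 4000h corresponds to full brightness, as the sums quoted above suggest. `columnSum()` and the array names are made up for the example. The two arrays are the 3:2 tables from this post.

```c
#include <stdint.h>

#define LGY_ROWS 6 // One row per input pixel.
#define LGY_COLS 8 // One column per output pixel of the pattern.

// Sums the coefficients contributing to one output pixel.
static uint32_t columnSum(const uint16_t m[LGY_ROWS][LGY_COLS], unsigned col)
{
	uint32_t sum = 0;
	for(unsigned row = 0; row < LGY_ROWS; row++) sum += m[row][col];
	return sum;
}

// 3:2 table giving 2 sharp pixels and 1 heavily blurred pixel.
static const uint16_t g_sharp[LGY_ROWS][LGY_COLS] =
{
	{0x0000, 0x0000, 0x0000, 0, 0, 0, 0, 0},
	{0x0000, 0x0000, 0x0000, 0, 0, 0, 0, 0},
	{0x0000, 0x2000, 0x4000, 0, 0, 0, 0, 0},
	{0x4000, 0x2000, 0x0000, 0, 0, 0, 0, 0},
	{0x0000, 0x0000, 0x0000, 0, 0, 0, 0, 0},
	{0x0000, 0x0000, 0x0000, 0, 0, 0, 0, 0}
};

// 3:2 table with 3 slightly blurred pixels (3000h+0FFFh).
static const uint16_t g_smooth[LGY_ROWS][LGY_COLS] =
{
	{0x0000, 0x0000, 0x0000, 0, 0, 0, 0, 0},
	{0x0000, 0x0000, 0x0000, 0, 0, 0, 0, 0},
	{0x0000, 0x0FFF, 0x3000, 0, 0, 0, 0, 0},
	{0x3000, 0x3000, 0x0FFF, 0, 0, 0, 0, 0},
	{0x0FFF, 0x0000, 0x0000, 0, 0, 0, 0, 0},
	{0x0000, 0x0000, 0x0000, 0, 0, 0, 0, 0}
};
```

The first three columns of `g_sharp` each sum to 4000h and those of `g_smooth` to 3FFFh. So the working tables differ from the failing 3000h+1000h variant only in how the sum is split up, not in the total, which doesn't explain why the DMA refuses to start.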

And finally, I had high hopes about using perfectly sharp 3:1 scaling at 800pix horizontal resolution... but the LGYFB_SIZE register allows to output only max 512pix per line, crap.
With that 512pix limit, the only way to get reasonably sharp images seems to be software scaling or GPU scaling, and one can more or less forget about the LGYFB scaling hardware. For 3:1 scaling, one could also simply DMA each line thrice (but I guess one would first have to store the unscaled LGYFB data in memory, then use another DMA for scaling from memory to vram).
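
As a point of reference for the software route, sharp 3:1 scaling of one line is just pixel replication. A minimal sketch, assuming 16 bit pixels and an unscaled copy of the line in memory (`scaleLine3x()` is a made-up name):

```c
#include <stddef.h>
#include <stdint.h>

// Nearest neighbour 3:1 scaling of one line.
// `out` must have room for 3 * width pixels.
static void scaleLine3x(uint16_t *out, const uint16_t *in, size_t width)
{
	for(size_t x = 0; x < width; x++)
	{
		const uint16_t px = in[x];
		out[3 * x + 0] = px;
		out[3 * x + 1] = px;
		out[3 * x + 2] = px;
	}
}
```

A 240 pixel GBA line becomes 720 pixels this way, which fits into the 800pix mode but not into the 512pix LGYFB limit. Doing this per frame on the CPU costs time, so the GPU or the DMA trick mentioned above is probably the better choice in practice.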

profi200
Posts: 64
Joined: Fri May 10, 2019 4:48 am

Re: 3DS reverse engineering

Post by profi200 » Sun May 30, 2021 6:36 am

lidnariq wrote:
Sat May 29, 2021 3:54 pm
GBA's LCD is already too fast for any blurring in that effect. I posted an animated GIF (1/60th real speed) of the flashing column of light in Castlevania: Circle of the Moon in this post
Hmm, ok. That is pretty much impossible to simulate with the 3DS LCDs. They can only do 60 Hz so we can't split one frame into multiple to simulate this effect.
nocash wrote:
Sat May 29, 2021 10:09 pm
Ah, now I got it, thanks.

I think I have solved the interlace issue: My current theory is that each 2nd pixel is drawn darker/brighter in each 2nd frame (much like GBA/NDS), but that seems to happen only if the VCOM voltage isn't properly calibrated in MCU[03h] and MCU[04h].
After battery removal, the MCU boots up with uncalibrated values, so one needs to run the official firmware at least once to initialize those settings (or initialize them manually, using the values from the "HWCAL" files or the "config" file (dunno which file is best, or if they all contain the same values)).
Once calibrated, all pixels seem to be drawn at the same brightness, i.e. the interlace effect is completely gone.
I don't remember, did the GBA/NDS consoles have VCOM potentiometers? Maybe calibrating that could eliminate interlace there, too...?

That sounds quite horrible. Maybe it's some anti-seizure feature. For two-frame flicker patterns it would be better to blend only the last two frames, but there can also be flicker patterns that extend through three or more frames; maybe they also wanted to reduce flickering in those cases as much as possible.

I've just looked into how to buy a GBA title from Nintendo's eshop... but it seems they have never ever sold any GBA titles?
The only exception seems to be that they released 10 GBA titles as a "surprise gift" for people who had purchased an Old3DS (and connected to the eshop) during the first few months after the Old3DS release date.
That was before the 3DS XL, 2DS and New3DS were released, so those GBA titles seem to exist for the original Old3DS only (though the System Transfer tool might allow transferring them from Old3DS to New3DS).

And I've experimented with the LGYFB scaling feature. Starting with the official 6:4 scaling:

Code: Select all

 dw 011011b     ;pattern
 dw 6-1         ;length
 dw 0000h,0000h,0000h,0000h,0000h,0000h,    0,    0 ;<-- for 1st input pixel
 dw 0000h,0000h,0000h,0000h,0000h,0000h,    0,    0 ;<-- for 2nd input pixel
 dw 0000h,2000h,4000h,0000h,2000h,4000h,    0,    0 ;<-- for 3rd input pixel
 dw 4000h,2000h,0000h,4000h,2000h,0000h,    0,    0 ;<-- for 4th input pixel
 dw 0000h,0000h,0000h,0000h,0000h,0000h,    0,    0 ;<-- for 5th input pixel
 dw 0000h,0000h,0000h,0000h,0000h,0000h,    0,    0 ;<-- for 6th input pixel
Then I've tried 3:2 instead of 6:4, using the settings below, which seem to produce the same effect...

Code: Select all

   dw 011b        ;pattern
   dw 3-1         ;length
   dw 0000h,0000h,0000h,    0,    0,    0,    0,    0 ;<-- for 1st input pixel
   dw 0000h,0000h,0000h,    0,    0,    0,    0,    0 ;<-- for 2nd input pixel
   dw 0000h,2000h,4000h,    0,    0,    0,    0,    0 ;<-- for 3rd input pixel
   dw 4000h,2000h,0000h,    0,    0,    0,    0,    0 ;<-- for 4th input pixel
   dw 0000h,0000h,0000h,    0,    0,    0,    0,    0 ;<-- for 5th input pixel
   dw 0000h,0000h,0000h,    0,    0,    0,    0,    0 ;<-- for 6th input pixel
In both of the above cases, the hardware does output 2 sharp pixels, followed by 1 heavily blurred pixel.
So I've tried to smoothen that by using 3 slightly blurred pixels...

Code: Select all

   dw 011b        ;pattern
   dw 3-1         ;length
   dw 0000h,0000h,0000h,    0,    0,    0,    0,    0 ;<-- for 1st input pixel
   dw 0000h,0000h,0000h,    0,    0,    0,    0,    0 ;<-- for 2nd input pixel
   dw 0000h,0fffh,3000h,    0,    0,    0,    0,    0 ;<-- for 3rd input pixel
   dw 3000h,3000h,0fffh,    0,    0,    0,    0,    0 ;<-- for 4th input pixel
   dw 0fffh,0000h,0000h,    0,    0,    0,    0,    0 ;<-- for 5th input pixel
   dw 0000h,0000h,0000h,    0,    0,    0,    0,    0 ;<-- for 6th input pixel
That's looking more or less okay... it's smoother... but less sharp.

In the above table, I had originally used values 3000h+1000h, but for whatever reason the DMA didn't start with those settings.
It did start working when using 3000h+0FFFh instead. I've no idea why. 3000h+1000h doesn't exceed the max brightness, it's about the same as 4000h+0 or 2000h+2000h (which are both working), and even if it did overflow max brightness: The hardware can saturate slight overflows (or output unsaturated wrong pixels in case of heavy overflows).

And finally, I had high hopes about using perfectly sharp 3:1 scaling at 800pix horizontal resolution... but the LGYFB_SIZE register allows a maximum of only 512pix per line, crap.
With that 512pix limit, the only way to get reasonably sharp images seems to be software scaling or GPU scaling, and one can more or less forget about the LGYFB scaling hardware. For 3:1 scaling, one could also simply DMA each line thrice (but I guess one would first have to store the unscaled LGYFB data in memory, then use another DMA for scaling from memory to vram).
That's odd. Never had this problem with my LCDs but I usually boot Horizon OS at least once.

Yeah, it really does look that bad. I do have the official Ambassador GBA games and they don't look good with all the filtering/blending they do. Could not find pictures which show this effect but it really does look like multi-stage blending, and the setting in the footer is the transparency for the blend passes.

Regarding range limitations, I documented this odd behavior here. It makes no sense at all. I was getting imprecise data aborts (enable via cpsie a) when I tried to set a value in that matrix out of range. The scale matrix actually works a little differently than what you documented on gbatek. It can also blend previous and following pixels. I asked Sono a good while ago about this. Unfortunately he left homebrew development entirely a few days ago.

Yeah, 3x integer scale horizontally can only be done with the GPU. mGBA recently added this but it doesn't really look impressive unfortunately. Scaling artefacts are much more visible vertically on this LCD.

nocash
Posts: 1398
Joined: Fri Feb 24, 2012 12:09 pm

Re: 3DS reverse engineering

Post by nocash » Sun May 30, 2021 12:50 pm

profi200 wrote:
Sun May 30, 2021 6:36 am
Regarding range limitations, I documented this odd behavior here. It makes no sense at all.
Oh, you came across that, too! Glad that you've confirmed it (I was quite afraid that I was only dreaming it and talking nonsense).
With your table, it makes sense to have larger scale values in the middle of the array, but it doesn't make sense to implement hardware error checking for that... well, unless they have actually used smaller multipliers that cannot multiply bigger numbers for some array entries.
profi200 wrote:
Sun May 30, 2021 6:36 am
I was getting imprecise data aborts (enable via cpsie a) when I tried to set a value in that matrix out of range.
Good to know. I was aware of the "CPSIE A" opcode, but I didn't know what "A" is meant to do.
Basically, it's for external aborts (e.g. from DMAs), as opposed to the internal data/prefetch aborts (from the CPU itself)?
As far as I understand, "CPSIE A" does clear CPSR bit8 to enable those imprecise aborts...
I couldn't find out what happens then... does it trigger one of the standard exception/interrupt vectors? With some status flag to indicate that the vector was triggered due to imprecise abort?
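
For the mask bit alone, here is a minimal sketch of what "CPSIE A" and "CPSID A" do to a CPSR value. It models only CPSR bit8 (alongside the I and F mask bits for reference) and says nothing about which vector is taken when an imprecise abort actually arrives; the function names are made up for illustration.

```c
#include <stdint.h>

#define CPSR_F_BIT (1u << 6)  /* FIQ disable */
#define CPSR_I_BIT (1u << 7)  /* IRQ disable */
#define CPSR_A_BIT (1u << 8)  /* imprecise abort disable */

/* CPSIE A: clear the A bit, so imprecise aborts are no longer masked. */
uint32_t cpsr_after_cpsie_a(uint32_t cpsr)
{
    return cpsr & ~CPSR_A_BIT;
}

/* CPSID A: set the A bit, so imprecise aborts are masked again. */
uint32_t cpsr_after_cpsid_a(uint32_t cpsr)
{
    return cpsr | CPSR_A_BIT;
}

/* Nonzero if imprecise aborts are currently unmasked. */
int imprecise_aborts_enabled(uint32_t cpsr)
{
    return (cpsr & CPSR_A_BIT) == 0;
}
```

For example, a CPSR of 1D3h (A, I and F all masked) becomes 0D3h after "CPSIE A".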
profi200 wrote:
Sun May 30, 2021 6:36 am
The scale matrix actually works a little differently than what you documented on gbatek. It can also blend previous and following pixels.
I've just called them "1st,2nd,3rd,4th,5th,6th input pixels" instead of "in[-3,-2,-1,0,1,2]". The numbering doesn't matter so much.
I could/should try to add some (hopefully not too confusing) notes on "which entry is the 'current' pixel" and "which entries are padded on screen edges".
profi200 wrote:
Sun May 30, 2021 6:36 am
Scaling artefacts are much more visible vertically on this LCD.
To me, it does look as if they are equally visible horizontally and vertically (when using the low resolution 400x240 pixel mode).
But it can depend on the source data (if the source contains many horizontal or vertical lines with high contrast).
And, it can also depend on how the source pixels are aligned to the scale pattern, especially when using two sharp pixels and one blurred pixel (eg. 4000h, 4000h, 2000h+2000h), and not so much when slightly blurring all pixels (eg. 3000h+FFFh, FFFh+3000h, 3000h+FFFh).

In your lgyfb source code,

Code: Select all

#define REG_LGYFB_TOP_DITHPATT0 ((vu32*)(LGYFB_TOP_REGS_BASE + 0x100)) // 2 u32 regs with 4x2 pattern bits (mask 0xCCCC) each.
#define REG_LGYFB_TOP_DITHPATT1 ((vu32*)(LGYFB_TOP_REGS_BASE + 0x108)) // 2 u32 regs with 4x2 pattern bits (mask 0xCCCC) each.
I am quite sure that it should be:

Code: Select all

#define REG_LGYFB_TOP_DITHER0 ((vu32*)(LGYFB_TOP_REGS_BASE + 0x100)) // u32 reg with 4x2 pattern bits (mask 0xCCCC)
#define REG_LGYFB_TOP_DITHER1 ((vu32*)(LGYFB_TOP_REGS_BASE + 0x108)) // u32 reg with 4x2 pattern bits (mask 0xCCCC)
#define REG_LGYFB_TOP_DITHER2 ((vu32*)(LGYFB_TOP_REGS_BASE + 0x110)) // u32 reg with 4x2 pattern bits (mask 0xCCCC)
#define REG_LGYFB_TOP_DITHER3 ((vu32*)(LGYFB_TOP_REGS_BASE + 0x118)) // u32 reg with 4x2 pattern bits (mask 0xCCCC)
There are four dithering registers, at 8-byte aligned addresses (and the same goes for the bottom screen LGYFB_BOT registers).
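
The same layout can also be written as an index calculation instead of four separate defines. This is just a sketch: the helper names are hypothetical, and the register base is passed in as a parameter rather than hardcoded, so it works for the top and bottom screen blocks alike.

```c
#include <stdint.h>

#define LGYFB_DITHER_FIRST  0x100u  /* offset of the first dither register */
#define LGYFB_DITHER_STRIDE 8u      /* registers sit 8 bytes apart */
#define LGYFB_DITHER_COUNT  4u      /* four dither registers per screen */

/* Byte offset of dither register n (0..3) from the LGYFB register base. */
uint32_t lgyfb_dither_offset(uint32_t n)
{
    return LGYFB_DITHER_FIRST + n * LGYFB_DITHER_STRIDE;
}

/* Pointer to dither register n for a given register base
 * (top or bottom screen; the caller supplies the base address). */
volatile uint32_t *lgyfb_dither_reg(uintptr_t regs_base, uint32_t n)
{
    return (volatile uint32_t *)(regs_base + lgyfb_dither_offset(n));
}
```

That yields the offsets 100h, 108h, 110h and 118h for n=0..3.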

lidnariq
Posts: 10659
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: 3DS reverse engineering

Post by lidnariq » Sun May 30, 2021 12:57 pm

profi200 wrote:
Sun May 30, 2021 6:36 am
lidnariq wrote:
Sat May 29, 2021 3:54 pm
GBA's LCD is already too fast for any blurring in that effect. I posted an animated GIF (1/60th real speed) of the flashing column of light in Castlevania: Circle of the Moon in this post
Hmm, ok. That is pretty much impossible to simulate with the 3DS LCDs. They can only do 60 Hz so we can't split one frame into multiple to simulate this effect.
My point was just that the GBA does some weird interlacing-like behavior, and there's no interframe blurring other than that. (The "rolling" down the screen is just reflecting how most LCDs work, much like the SloMo Guys' recording of an HDTV)
