3DS reverse engineering

profi200
Posts: 66
Joined: Fri May 10, 2019 4:48 am

Re: 3DS reverse engineering

Post by profi200 »

nocash wrote: Mon Dec 02, 2019 5:40 pm
profi200 wrote: Sat Nov 30, 2019 6:16 pm
Figured out automatic output switching. Bank 0x64 reg 0x45 bit 4 controls the current output and bit 5 selects between manual mode (1) or automatic (0). There is a catch though. Bit 4 must match the current GPIO state or it will be stuck despite enabling auto mode.
Ah, okay, but once auto-mode is initialized with correct values... does it then start to work automatically, without further manual updating?
After you had posted that, I have checked if any registers change on headphone insert/eject... TSC[64h:2Ch] bit0 seems to be 1=connected, and bit1 sometimes gets set too, but not always.
I guess I should also do some headphone tests on DSi to see if it does switch speakers/headphones automatically, I haven't tested much of the DSi sound hardware yet; because I always had the volume set to zero for avoiding the System Menu music.

Looking at the code comments that you had added recently, I figured out that you had derived most of the TSC initialization constants from the HWCAL0/1.dat file's entries at file offset 2E0h and up:

Code:

Codec (CDC) (aka TSC)
  2E0h 1    u8 DriverGainHP       ;TSC[65h:0Ch].bit3-7
  2E1h 1    u8 DriverGainSP       ;TSC[65h:12h].bit2-7 and TSC[65h:13h].bit2-7
  2E2h 1    u8 AnalogVolumeHP     ;TSC[65h:16h] and TSC[65h:17h]
  2E3h 1    u8 AnalogVolumeSP     ;TSC[65h:1Bh] and TSC[65h:1Ch]
  2E4h 1    s8 ShutterVolume0     ;TSC[00h:41h] and TSC[00h:42h] ;\maybe camera
  2E5h 1    s8 ShutterVolume1     ;TSC[64h:7Bh]                  ;/sound fx?
  2E6h 1    u8 MicrophoneBias     ;TSC[65h:33h]
  2E7h 1    u8 QuickCharge (???)  ;TSC[65h:42h].bit0-1
  2E8h 1    u8 PGA_GAIN (mic)     ;TSC[65h:41h].bit0-5
  2E9h 3    u8 reserved[3]
  2ECh 1Eh  s16 FilterHP32[3*5]   ;TSC[0Bh:02h..1Fh] and TSC[0Bh:42h..5Fh]
  30Ah 1Eh  s16 FilterHP47[3*5]   ;TSC[0Bh:20h..3Dh] and TSC[0Bh:60h..7Dh]
  328h 1Eh  s16 FilterSP32[3*5]   ;TSC[0Ch:02h..1Fh] and TSC[0Ch:42h..5Fh]
  346h 1Eh  s16 FilterSP47[3*5]   ;TSC[0Ch:20h..3Dh] and TSC[0Ch:60h..7Dh]
  364h 38h  s16 FilterMic32[1+2+ (1+4)*5] ;TSC[05h:08h..3Fh], TSC[05h:48h..7Fh]
  39Ch 38h  s16 FilterMic47[1+2+ (1+4)*5] ;(...same as above...?)
  3D4h 38h  s16 FilterFree [1+2+ (1+4)*5] ;TSC[08h:xxh, 09h:xxh, 0Ah:xxh]
  40Ch 1    u8 AnalogInterval          ;\
  40Dh 1    u8 AnalogStabilize         ;
  40Eh 1    u8 AnalogPrecharge         ;
  40Fh 1    u8 AnalogSense             ; maybe TSC[67h:xxh] ?
  410h 1    u8 AnalogDebounce          ;
  411h 1    u8 Analog_XP_Pullup        ;
  412h 1    u8 YM_Driver               ;/
  413h 1    u8 reserved
  414h 2    Checksum?
  416h 2    Zero
Good to know by itself, and also good because the names for the .dat file entries do imply the TSC register purposes.
I am not sure what ShutterVolume means; my only idea would be that it might override slider-volume when playing camera shutter sounds and other alerts(?)
And also unsure what the AnalogXxx values mean, AnalogDebounce sounds touchscreen related, or maybe microphone related.
What does FilterMic47 do? You seem to leave that entry unused, apart from the comment saying that it is the same as FilterMic32... does that mean one of the filters initialized with FilterMic32 is actually FilterMic47?
Oh, and one general thing: The DSi doesn't seem to use any miniDSP instructions anywhere (I don't even know if the TSC chip does actually support them). Would be good to know if the 3ds is using miniDSP code somewhere.

Here is a summary of the used TSC registers, based on your code and on the HWCAL entry names, and with some notes on some further "unused" registers with nonzero values.

Code:

3DS TSC, Register Summary
-------------------------

Page Selection
  TSC[xxh:00h]=page    ;Page (each TSC SPI bus probably has own page+index?)

Page 00h-01h (DSi Registers)
  TSC[00h:02h]=read    ;DSi Undocumented status (reserved bits)
  TSC[00h:03h]=read    ;DSi Overtemperature OT Flag (reserved bits)
  TSC[00h:0Bh]=87h     ;DSi DAC NDAC Value
  TSC[00h:39h]=66h     ;DSi ADC DC Measurement 1 (reset=00h, ORed with 66h)
  TSC[00h:3Fh]=D4h     ;DSi DAC Data-Path Setup  (reset=D4h, ORed with C0h)
  TSC[00h:40h]=00h     ;DSi DAC Volume Control
  TSC[00h:41h]=FDh     ;DSi DAC Left Volume Control  ;\aka 3DS     ;HWCAL[2E4h]
  TSC[00h:42h]=FDh     ;DSi DAC Right Volume Control ;/ShutterVol0 ;HWCAL[2E4h]
  TSC[01h:2Fh]=2Bh     ;DSi MIC PGA
  TSC[01h:30h]=40h     ;DSi P-Terminal ADC Channel Fine-Gain Input (reset=40h)
  TSC[01h:31h]=40h     ;DSi M-Terminal ADC Input Selection         (reset=40h)
The 3DS does usually access only the registers mentioned above (but there are
many more DSi-style registers in page 00h,01h,03h; see DSi chapter for
details).

Page 04h-0Ch (DSi Coefficient RAM)
  TSC[04h:08h-0Dh]=... ;DSi Some coeff's (7Fh,E1h,80h,1Fh,7Fh,C1h)
  TSC[05h:08h-3Fh]=... ;3DS FilterMic32                      ;HWCAL[364h-39Bh]
  TSC[05h:48h-7Fh]=... ;3DS FilterMic32, too? Or Mic47?      ;HWCAL[xxxh..]
  TSC[08h:0Ch-3Dh]=... ;3DS FilterFreeB   ;\                 ;HWCAL[3DAh-40Bh]
  TSC[08h:4Ch-7Dh]=... ;3DS FilterFreeB'  ; initialized for  ;HWCAL[3DAh-40Bh]
  TSC[09h:02h-07h]=... ;3DS FilterFreeA   ; non-GBA only     ;HWCAL[3D4h-3D9h]
  TSC[09h:08h-0Dh]=... ;3DS FilterFreeA'  ;/                 ;HWCAL[3D4h-3D9h]
  TSC[0Ah:02h-07h]=... ;3DS FilterFreeA''                    ;HWCAL[3D4h-3D9h]
  TSC[0Ah:0Ch-3Dh]=... ;3DS FilterFreeB''                    ;HWCAL[3DAh-40Bh]
  TSC[0Bh:02h-1Fh]=... ;3DS FilterHP32                       ;HWCAL[2ECh-309h]
  TSC[0Bh:20h-3Dh]=... ;3DS FilterHP47                       ;HWCAL[30Ah-327h]
  TSC[0Bh:42h-5Fh]=... ;3DS FilterHP32'                      ;HWCAL[2ECh-309h]
  TSC[0Bh:60h-7Dh]=... ;3DS FilterHP47'                      ;HWCAL[30Ah-327h]
  TSC[0Ch:02h-1Fh]=... ;3DS FilterSP32                       ;HWCAL[328h-345h]
  TSC[0Ch:20h-3Dh]=... ;3DS FilterSP47                       ;HWCAL[346h-363h]
  TSC[0Ch:42h-5Fh]=... ;3DS FilterSP32'                      ;HWCAL[328h-345h]
  TSC[0Ch:60h-7Dh]=... ;3DS FilterSP47'                      ;HWCAL[346h-363h]
The above coefficient RAM pages exist on DSi, too. However, the DSi usually
initializes only those in page 04h.
Unknown how the 3DS is using the extra coefficients... does it use miniDSP
instructions for that?

Page 64h (3DS Sound/Microphone Config)
  TSC[64h:01h]=01h     ;3DS Software Reset (?)
  TSC[64h:22h]=18h     ;3DS ? (reset=00h, ORed with 18h, later bit2=cleared)
  TSC[64h:25h]=read    ;3DS status, wait for bit3,7
  TSC[64h:26h]=read    ;3DS status, wait for bit3,7
  TSC[64h:2Ch]         ;unused, but nonzero  ;bit0,1=headphone connect status
  TSC[64h:30h]         ;unused, but nonzero
  TSC[64h:31h]=00h/44h ;3DS ? (reset=00h) (GBA:00h, Other:44h)
  TSC[64h:43h]=11h/91h ;3DS set to 11h, later toggles bit7=0 then bit7=1
  TSC[64h:44h]         ;unused, but nonzero
  TSC[64h:45h]=20h/30h ;3DS Speaker off (reset=00h, later=20h, 30h=speakerOff)
  TSC[64h:75h]         ;unused, but nonzero
  TSC[64h:76h]=14h/D4h ;3DS ? (reset=14h, ORed with C0h)
  TSC[64h:77h]=0Ch/00h ;3DS ? (reset=0Ch, later clear bit2,3 after coeff init)
  TSC[64h:78h]=00h     ;3DS ?
  TSC[64h:7Ah]=00h     ;3DS ?
  TSC[64h:7Bh]=ECh     ;3DS ShutterVolume1                   ;HWCAL[2E5h]
  TSC[64h:7Ch]=0Ah     ;3DS ? (reset=0Ah, later clears bit0)

Page 65h (3DS Sound/Microphone Gains)
  TSC[65h:0Ah]=0Ah     ;3DS ?
  TSC[65h:0Bh]=1Ch/3Ch ;3DS ?  ... depends on TSC[00h:02h..03h]
  TSC[65h:0Ch]=04h     ;3DS DriverGainHP                     ;HWCAL[2E0h]*8+4
  TSC[65h:11h]=10h/D0h ;3DS ? (reset=00h, ORed with 10h, later ORed with C0h)
  TSC[65h:12h]=06h     ;3DS DriverGainSP    ;\maybe left?    ;HWCAL[2E1h]*4+2
  TSC[65h:13h]=06h     ;3DS DriverGainSP'   ;/      right?   ;HWCAL[2E1h]*4+2
  TSC[65h:16h]=00h     ;3DS AnalogVolumeHP  ;\maybe left?    ;HWCAL[2E2h]
  TSC[65h:17h]=00h     ;3DS AnalogVolumeHP' ;/      right?   ;HWCAL[2E2h]
  TSC[65h:1Bh]=07h     ;3DS AnalogVolumeSP  ;\maybe left?    ;HWCAL[2E3h]
  TSC[65h:1Ch]=07h     ;3DS AnalogVolumeSP' ;/      right?   ;HWCAL[2E3h]
  TSC[65h:33h]=03h     ;3DS MicrophoneBias                   ;HWCAL[2E6h]
  TSC[65h:41h]=00h+wait;3DS PGA_GAIN (mic)      (bit0-5)     ;HWCAL[2E8h]
  TSC[65h:42h]=02h+wait;3DS QuickCharge (what?) (bit0-1)     ;HWCAL[2E7h]
  TSC[65h:47h,4Bh,4Ch,4Dh,4Eh,52h,53h]  ;unused, but nonzero
  TSC[65h:77h]=94h/95h ;3DS ? (reset=94h, ORed with 01h)
  TSC[65h:78h]         ;unused, but nonzero
  TSC[65h:7Ah]=01h     ;3DS ?

Page 67h,FBh (3DS Touchscreen/Circle Pad)
  TSC[67h:17h]=43h     ;3DS ?
  TSC[67h:19h]=69h     ;3DS ?
  TSC[67h:1Bh]=80h     ;3DS ?
  TSC[67h:24h]=98h/18h ;3DS bit7=0=touchscreen.on  ;bit2=1=has new touchdata?
  TSC[67h:25h]=43h/53h ;3DS bit5-2=0100b=touchscreen.on
  TSC[67h:26h]=00h/ECh ;3DS bit7=1=touchscreen.on  ;bit1=1=had old touchdata?
  TSC[67h:27h]=11h     ;3DS ?
  TSC[FBh:01h]=read    ;3DS fifo 26x16bit; 5xTSC.x, 5xTSC.y, 8xCPAD.y, 8xCPAD.x
There are many more unused/zero registers, I haven't tried yet if any of them are R/W, or if there are other ways to get nonzero values into them.

Edit: Are those "TIMER_sleepMs(nn)" delays based on timings from Nintendo? I guess some of the delays might be there to avoid minor speaker noise during init, and some delays might actually be required for working initialization.
The old TSC datasheet mentions a 100us delay between SoftReset and Coefficient RAM initialization, something like that might be needed here, too. I have some doubts about needing a 40ms delay after SoftReset though.
Yes, once auto mode is set up correctly it is fully automatic.
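For anyone following along, the two output-select bits can be put into a small helper. This is only a sketch based on the description above and on the commented-out manual-mode line in CODEC_init. The macro and function names are made up for illustration, and which value of bit 4 means speaker or headphone is not established here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// TSC[64h:45h] output-select field as described in this thread.
#define OUTSEL_CURRENT (1u<<4) // Currently selected output
#define OUTSEL_MANUAL  (1u<<5) // 1 = manual mode, 0 = automatic
#define OUTSEL_MASK    (OUTSEL_CURRENT | OUTSEL_MANUAL)

static_assert(OUTSEL_MASK == 0x30, "same mask as codecMaskReg(0x64, 0x45, ..., 0x30)");

// Hypothetical helper: builds the value for the two output-select bits.
// jackGpio is the headphone jack GPIO state (only bit 0 is used). Bit 4 has
// to mirror it, otherwise the output stays stuck even in automatic mode.
static uint8_t outputSelectBits(bool manual, uint8_t jackGpio)
{
	uint8_t val = (uint8_t)((jackGpio & 1u)<<4);
	if(manual) val |= OUTSEL_MANUAL;
	return val;
}
```

The result would be written with mask 0x30, for example through codecMaskReg(0x64, 0x45, value, OUTSEL_MASK).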

Shutter volume is for the camera app probably. When you take a picture it plays this insanely loud shutter sound. Unknown how this is triggered/used.

The data for filterMic32/47 is the same, but it is actually stored twice in the fallback calibration data struct. The two may differ in the actual calibration of a 3DS. What I'm writing is the fallback calibration data, which is hardcoded and only used if loading the calibration data from eMMC (the config savegame, which contains a copy of the data from the HWCAL files) fails.
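As a cross-check between the calibration struct and the HWCAL offsets quoted above, the layout can be verified at compile time. A sketch, assuming the codec block starts at file offset 2E0h as listed; the type name CodecCalLayout and the CAL_FILE_OFFSET macro are hypothetical and not part of the posted code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HWCAL_CODEC_OFFSET 0x2E0u // File offset of the codec block

typedef struct
{
	uint8_t driverGainHP;     // 2E0h
	uint8_t driverGainSP;     // 2E1h
	uint8_t analogVolumeHP;   // 2E2h
	uint8_t analogVolumeSP;   // 2E3h
	int8_t  shutterVolume[2]; // 2E4h
	uint8_t microphoneBias;   // 2E6h
	uint8_t quickCharge;      // 2E7h
	uint8_t pgaGain;          // 2E8h
	uint8_t reserved[3];      // 2E9h
	int16_t filterHP32[15];   // 2ECh
	int16_t filterHP47[15];   // 30Ah
	int16_t filterSP32[15];   // 328h
	int16_t filterSP47[15];   // 346h
	int16_t filterMic32[28];  // 364h
	int16_t filterMic47[28];  // 39Ch
	int16_t filterFree[28];   // 3D4h
	uint8_t analogInterval;   // 40Ch
	uint8_t analogStabilize;  // 40Dh
	uint8_t analogPrecharge;  // 40Eh
	uint8_t analogSense;      // 40Fh
	uint8_t analogDebounce;   // 410h
	uint8_t analogXpPullup;   // 411h
	uint8_t ymDriver;         // 412h
	uint8_t reserved2;        // 413h
} CodecCalLayout;

// File offset of a struct member, for comparison with the HWCAL table.
#define CAL_FILE_OFFSET(member) (HWCAL_CODEC_OFFSET + offsetof(CodecCalLayout, member))

static_assert(CAL_FILE_OFFSET(filterHP32)     == 0x2EC, "FilterHP32");
static_assert(CAL_FILE_OFFSET(filterSP47)     == 0x346, "FilterSP47");
static_assert(CAL_FILE_OFFSET(filterMic47)    == 0x39C, "FilterMic47");
static_assert(CAL_FILE_OFFSET(analogInterval) == 0x40C, "AnalogInterval");
static_assert(sizeof(CodecCalLayout) == 0x414 - 0x2E0, "block ends before the checksum");
```

If any of the assertions failed, the struct would have picked up padding or a wrong array length somewhere.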

There is no code uploading any data other than these filters. So no, I don't think these miniDSP instructions are ever used (if supported at all).

These sleeps are 1:1 what Nintendo does in their code. A few of them are probably only there to free some CPU time for other threads, and others are required. Keep in mind this is some weird compatibility-bodged CODEC/touchscreen chip, not an actual TSC2117, so it may behave completely differently, at least when using the new regs.
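The wait loops in the code all follow the same pattern: poll a status register once per millisecond and carry on silently after N tries. Below is a sketch of the same loop with a return value, so a caller could at least notice a timeout. The callback types and the fake register read are hypothetical and only there to keep the example self-contained. This is not how Nintendo's code is structured.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical callbacks so the sketch does not depend on the SPI driver.
typedef uint8_t (*ReadRegFn)(uint8_t bank, uint8_t reg);
typedef void (*SleepMsFn)(uint32_t ms);

// Polls a status register until all bits in 'mask' are set.
// Returns false on timeout instead of silently continuing.
static bool codecWaitBitsSet(ReadRegFn readReg, SleepMsFn sleepMs,
                             uint8_t bank, uint8_t reg, uint8_t mask, uint32_t maxTries)
{
	for(uint32_t i = 0; i < maxTries; i++)
	{
		if((readReg(bank, reg) & mask) == mask) return true;
		sleepMs(1);
	}
	return false;
}

// Stand-ins for real hardware access: the status reads as ready on the third poll.
static uint32_t fakePolls = 0;

static uint8_t fakeReadReg(uint8_t bank, uint8_t reg)
{
	(void)bank;
	(void)reg;
	return (++fakePolls >= 3) ? 0x88 : 0x08;
}

static void fakeSleepMs(uint32_t ms)
{
	(void)ms; // No real delay in this sketch
}
```

With the real driver, readReg would be codecReadReg and sleepMs would be TIMER_sleepMs, e.g. waiting for bits 3 and 7 of TSC[64h:25h] with up to 100 tries.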

Updated the code a final time. Some bug fixes, and I swapped the val and mask params of the mask reg function to match Nintendo's code:
https://gist.github.com/profi200/492664 ... adbffb4aaf
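For reference, the read-modify-write step of the mask reg function boils down to the following, with the (val, mask) parameter order mentioned above. This is a sketch of the bit logic only; applyMask is a hypothetical name and the actual SPI access is left out.

```c
#include <stdint.h>

// Pure version of the read-modify-write step: bits selected by 'mask'
// are taken from 'val', all other bits keep their old value.
static uint8_t applyMask(uint8_t old, uint8_t val, uint8_t mask)
{
	return (uint8_t)((old & ~mask) | (val & mask));
}
```

For example, the register summary lists TSC[64h:76h] as reset=14h, ORed with C0h, which gives D4h when passed through this function.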
nocash
Posts: 1405
Joined: Fri Feb 24, 2012 12:09 pm

Re: 3DS reverse engineering

Post by nocash »

Copy/paste for people without github...

Code:

/*
 *   This file is part of fastboot 3DS
 *   Copyright (C) 2019 Sergi Granell (xerpi), Paul LaMendola (paulguy), derrek, profi200
 *
 *   This program is free software: you can redistribute it and/or modify
 *   it under the terms of the GNU General Public License as published by
 *   the Free Software Foundation, either version 3 of the License, or
 *   (at your option) any later version.
 *
 *   This program is distributed in the hope that it will be useful,
 *   but WITHOUT ANY WARRANTY; without even the implied warranty of
 *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *   GNU General Public License for more details.
 *
 *   You should have received a copy of the GNU General Public License
 *   along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

// Based on code from https://github.com/xerpi/linux_3ds/blob/master/drivers/input/misc/nintendo3ds_codec_hid.c

#include "types.h"
#include "arm11/hardware/spi.h"
#include "arm11/hardware/timer.h"
#include "arm11/hardware/gpio.h"


typedef struct
{
	u8 driverGainHP;
	u8 driverGainSP;
	u8 analogVolumeHP;
	u8 analogVolumeSP;
	s8 shutterVolume[2];
	u8 microphoneBias;
	u8 quickCharge;
	u8 PGA_GAIN; // microphone gain
	u8 reserved[3];
	s16 filterHP32[15]; // 3 * 5
	s16 filterHP47[15];
	s16 filterSP32[15];
	s16 filterSP47[15];
	s16 filterMic32[28]; // (1+2)+((1+4)*5)
	s16 filterMic47[28];
	s16 filterFree[28];
	u8 analogInterval;
	u8 analogStabilize;
	u8 analogPrecharge;
	u8 analogSense;
	u8 analogDebounce;
	u8 analog_XP_Pullup;
	u8 YM_Driver;
	u8 reserved2;
} CodecCal;


alignas(4) static CodecCal fallbackCal =
{
	0u,
	1u,
	0u,
	7u,
	{0xFD, 0xEC},
	3u,
	2u,
	0u,
	{0, 0, 0},
	{32767, 0, 0, 0, 0, 32767, 0, 0, 0, 0, 32736, 49168, 0, 16352, 0},
	{32767, 0, 0, 0, 0, 32767, 0, 0, 0, 0, 32745, 49164, 0, 16361, 0},
	{32767, 38001, 22413, 30870, 36440, 51536, 30000, 51536, 0, 0, 32736, 49168, 0, 16352, 0},
	{32767, 36541, 25277, 31456, 35336, 51134, 30000, 51134, 0, 0, 32745, 49164, 0, 16361, 0},
	{32767, 0, 0, 32767, 0, 0, 0, 0, 32767, 0, 0, 0, 0, 32767, 0, 0, 0, 0, 32767, 0, 0, 0, 0, 32767, 0, 0, 0, 0},
	{32767, 0, 0, 32767, 0, 0, 0, 0, 32767, 0, 0, 0, 0, 32767, 0, 0, 0, 0, 32767, 0, 0, 0, 0, 32767, 0, 0, 0, 0},
	{32767, 0, 0, 52577, 56751, 32767, 8785, 12959, 52577, 56751, 32767, 8785, 12959, 52577, 56751, 32767, 8785, 12959, 32767, 0, 0, 0, 0, 32767, 0, 0, 0, 0},
	1u,
	9u,
	4u,
	3u,
	0u,
	6u,
	1u,
	0u
};



static void codecSwitchBank(u8 bank)
{
	static u8 curBank = 0x63;
	if(bank != curBank)
	{
		alignas(4) u8 inBuf[4];
		inBuf[0] = 0; // Write
		inBuf[1] = bank;
		NSPI_writeRead(NSPI_DEV_CTR_CODEC, (u32*)inBuf, NULL, 2, 0, true);

		curBank = bank;
	}
}

static void codecReadRegBuf(u8 bank, u8 reg, u32 *buf, u32 size)
{
	codecSwitchBank(bank);

	alignas(4) u8 inBuf[4];
	inBuf[0] = reg<<1 | 1u;
	NSPI_writeRead(NSPI_DEV_CTR_CODEC, (u32*)inBuf, buf, 1, size, true);
}

static u8 codecReadReg(u8 bank, u8 reg)
{
	alignas(4) u8 outBuf[4];
	codecReadRegBuf(bank, reg, (u32*)outBuf, 1);

	return outBuf[0];
}

static void codecWriteRegBuf(u8 bank, u8 reg, u32 *buf, u32 size)
{
	codecSwitchBank(bank);

	alignas(4) u8 inBuf[4];
	inBuf[0] = reg<<1; // Write
	NSPI_writeRead(NSPI_DEV_CTR_CODEC, (u32*)inBuf, NULL, 1, 0, false);
	NSPI_writeRead(NSPI_DEV_CTR_CODEC, buf, NULL, size, 0, true);
}

static void codecWriteReg(u8 bank, u8 reg, u8 val)
{
	codecSwitchBank(bank);

	alignas(4) u8 inBuf[4];
	inBuf[0] = reg<<1; // Write
	inBuf[1] = val;
	NSPI_writeRead(NSPI_DEV_CTR_CODEC, (u32*)inBuf, NULL, 2, 0, true);
}

static void codecMaskReg(u8 bank, u8 reg, u8 val, u8 mask)
{
	u8 data = codecReadReg(bank, reg);
	data = (data & ~mask) | (val & mask);
	codecWriteReg(bank, reg, data);
}

// Helpers
// Byte-swaps a run of 16-bit filter coefficients in place.
static void codecSwapCoeffs(s16 *coeffs, u32 count)
{
	u16 *const tmp = (u16*)coeffs;
	for(u32 i = 0; i < count; i++)
	{
		tmp[i] = __builtin_bswap16(tmp[i]);
	}
}

static void codecSwapCalibrationData(CodecCal *cal)
{
	codecSwapCoeffs(cal->filterHP32, 15);
	codecSwapCoeffs(cal->filterHP47, 15);
	codecSwapCoeffs(cal->filterSP32, 15);
	codecSwapCoeffs(cal->filterSP47, 15);
	codecSwapCoeffs(cal->filterMic32, 28);
	codecSwapCoeffs(cal->filterMic47, 28);
	codecSwapCoeffs(cal->filterFree, 28);
}

static void codecMaskWaitReg(u8 bank, u8 reg, u8 val, u8 mask)
{
	for(u32 i = 0; i < 64; i++) // Some kind of timeout? No error checking.
	{
		codecMaskReg(bank, reg, val, mask);
		if((codecReadReg(bank, reg) & mask) == val) break;
	}
}

static void codecEnableTouchscreen(void)
{
	codecMaskReg(0x67, 0x26, 0x80, 0x80);
	codecMaskReg(0x67, 0x24, 0, 0x80);
	codecMaskReg(0x67, 0x25, 0x10, 0x3C);
}

static void codecDisableTouchscreen(void)
{
	codecMaskReg(0x67, 0x26, 0, 0x80);
	codecMaskReg(0x67, 0x24, 0x80, 0x80);
}

static void codecLegacyStuff(bool enabled)
{
	if(enabled)
	{
		*((vu16*)0x10141114) |= 2u;
		*((vu16*)0x10141116) |= 2u;
		codecMaskReg(0x67, 0x25, 0x40, 0x40);
	}
	else
	{
		codecMaskReg(0x67, 0x25, 0, 0x40);
		*((vu16*)0x10141114) &= ~2u;
	}
}


void CODEC_init(void)
{
	static bool inited = false;
	if(inited) return;
	inited = true;


	NSPI_init();

	// TODO: Load calibration from HWCAL files on eMMC.
	CodecCal *const cal = &fallbackCal;
	codecSwapCalibrationData(cal); // Come the fuck on. Why is this not stored in the correct endianness?

	// General codec reset + init?
	*((vu8*)0x10141220) = 2;
	codecWriteReg(0x64, 1, 1);
	TIMER_sleepMs(40);
	codecSwitchBank(0); // What? Dummy switch after reset?
	codecWriteReg(0x64, 0x43, 0x11);
	codecMaskReg(0x65, 0x77, 1, 1);
	codecMaskReg(0, 0x39, 0x66, 0x66);
	codecWriteReg(0x65, 0x7A, 1);
	codecMaskReg(0x64, 0x22, 0x18, 0x18);
	GPIO_config(GPIO_2_0, GPIO_IRQ_ENABLE | GPIO_EDGE_RISING | GPIO_INPUT); // Headphone jack IRQ
	//codecMaskReg(0x64, 0x45, (*((vu8*)0x10147010) & 1u)<<4 | 1u<<5, 0x30); // GPIO bitmask 8.
	codecMaskReg(0x64, 0x45, 0, 0x30); // With automatic output switching
	codecMaskReg(0x64, 0x43, 0, 0x80);
	codecMaskReg(0x64, 0x43, 0x80, 0x80);
	codecWriteReg(0, 0xB, 0x87);
	codecMaskReg(0x64, 0x7C, 0, 1);

	// sub_3257FC()
	codecMaskReg(0x64, 0x22, 0, 4);
	// In AgbBg this is swapped at runtime.
	alignas(4) static const u16 unkData1[3] = {0xE17F, 0x1F80, 0xC17F};
	codecWriteRegBuf(4, 8, (u32*)unkData1, 6);
	codecWriteRegBuf(5, 8, (u32*)cal->filterMic32, 56);
	codecWriteRegBuf(5, 0x48, (u32*)cal->filterMic47, 56);
	codecMaskReg(1, 0x30, 0x40, 0xC0);
	codecMaskReg(1, 0x31, 0x40, 0xC0);
	codecWriteReg(0x65, 0x33, cal->microphoneBias);
	codecMaskWaitReg(0x65, 0x41, cal->PGA_GAIN, 0x3F);
	codecMaskWaitReg(0x65, 0x42, cal->quickCharge, 3);
	codecWriteReg(1, 0x2F, 0x2Bu & 0x7Fu);
	codecMaskReg(0x64, 0x31, 0x44, 0x44); // AgbBg uses val = 0 here
	codecWriteReg(0, 0x41, cal->shutterVolume[0]);
	codecWriteReg(0, 0x42, cal->shutterVolume[0]);
	codecWriteReg(0x64, 0x7B, cal->shutterVolume[1]);

	// Sound stuff starts here
	GPIO_config(GPIO_4_0, GPIO_OUTPUT);
	GPIO_write(GPIO_4_0, 1); // GPIO bitmask 0x40
	TIMER_sleepMs(10); // Fixed 10 ms delay when setting this GPIO.
	*((vu16*)0x10145000) = 0xE800u; // 47.61 kHz. codec module writes 0xC800 instead.
	*((vu16*)0x10145002) = 0xE000u;
	codecMaskReg(0x65, 0x11, 0x10, 0x1C);
	codecWriteReg(0x64, 0x7A, 0);
	codecWriteReg(0x64, 0x78, 0);
	{ // This code block is missing in AgbBg but present in codec module.
		const bool flag = (~codecReadReg(0, 0x40) & 0xCu) == 0;
		codecMaskReg(0, 0x3F, 0, 0xC0);
		codecWriteReg(0, 0x40, 0xC);
		for(u32 i = 0; i < 100; i++) // Some kind of timeout? No error checking.
		{
			if(!(~codecReadReg(0x64, 0x26) & 0x44u)) break;
			TIMER_sleepMs(1);
		}
		codecWriteRegBuf(9, 2, (u32*)cal->filterFree, 6);
		codecWriteRegBuf(8, 0xC, (u32*)&cal->filterFree[3], 50);
		codecWriteRegBuf(9, 8, (u32*)cal->filterFree, 6);
		codecWriteRegBuf(8, 0x4C, (u32*)&cal->filterFree[3], 50);
		if(!flag)
		{
			codecMaskReg(0, 0x3F, 0xC0, 0xC0);
			codecWriteReg(0, 0x40, 0);
		}
	}
	{
		const bool flag = (~codecReadReg(0x64, 0x77) & 0xCu) == 0;
		codecMaskReg(0x64, 0x77, 0xC, 0xC);
		for(u32 i = 0; i < 100; i++) // Some kind of timeout? No error checking.
		{
			if(!(~codecReadReg(0x64, 0x26) & 0x88u)) break;
			TIMER_sleepMs(1);
		}
		codecWriteRegBuf(0xA, 2, (u32*)cal->filterFree, 6);
		codecWriteRegBuf(0xA, 0xC, (u32*)&cal->filterFree[3], 50);
		if(!flag) codecMaskReg(0x64, 0x77, 0, 0xC);
	}

	codecWriteRegBuf(0xC, 2, (u32*)cal->filterSP32, 30);
	codecWriteRegBuf(0xC, 0x42, (u32*)cal->filterSP32, 30);
	codecWriteRegBuf(0xC, 0x20, (u32*)cal->filterSP47, 30);
	codecWriteRegBuf(0xC, 0x60, (u32*)cal->filterSP47, 30);
	codecWriteRegBuf(0xB, 2, (u32*)cal->filterHP32, 30);
	codecWriteRegBuf(0xB, 0x42, (u32*)cal->filterHP32, 30);
	codecWriteRegBuf(0xB, 0x20, (u32*)cal->filterHP47, 30);
	codecWriteRegBuf(0xB, 0x60, (u32*)cal->filterHP47, 30);
	codecMaskReg(0x64, 0x76, 0xC0, 0xC0);
	TIMER_sleepMs(10);
	for(u32 i = 0; i < 100; i++) // Some kind of timeout? No error checking.
	{
		if(!(~codecReadReg(0x64, 0x25) & 0x88u)) break;
		TIMER_sleepMs(1);
	}
	codecWriteReg(0x65, 0xA, 0xA);

	codecMaskReg(0, 0x3F, 0xC0, 0xC0);
	codecWriteReg(0, 0x40, 0);
	codecMaskReg(0x64, 0x77, 0, 0xC);

	u8 val;
	if((codecReadReg(0, 2) & 0xFu) <= 1u && ((codecReadReg(0, 3) & 0x70u)>>4 <= 2u))
	{
		val = 0x3C;
	}
	else val = 0x1C;
	codecWriteReg(0x65, 0xB, val);

	codecWriteReg(0x65, 0xC, (cal->driverGainHP<<3) | 4);
	codecWriteReg(0x65, 0x16, cal->analogVolumeHP);
	codecWriteReg(0x65, 0x17, cal->analogVolumeHP);
	codecMaskReg(0x65, 0x11, 0xC0, 0xC0);
	codecWriteReg(0x65, 0x12, (cal->driverGainSP<<2) | 2);
	codecWriteReg(0x65, 0x13, (cal->driverGainSP<<2) | 2);
	codecWriteReg(0x65, 0x1B, cal->analogVolumeSP);
	codecWriteReg(0x65, 0x1C, cal->analogVolumeSP);
	TIMER_sleepMs(38);
	GPIO_write(GPIO_4_0, 0); // GPIO bitmask 0x40
	TIMER_sleepMs(18); // Fixed 18 ms delay when unsetting this GPIO.

	// Circle pad
	codecWriteReg(0x67, 0x24, 0x98);
	codecWriteReg(0x67, 0x26, 0x00);
	codecWriteReg(0x67, 0x25, 0x43);
	codecWriteReg(0x67, 0x24, 0x18);
	codecWriteReg(0x67, 0x17, cal->analogPrecharge<<4 | cal->analogSense);
	codecWriteReg(0x67, 0x19, cal->analog_XP_Pullup<<4 | cal->analogStabilize);
	codecWriteReg(0x67, 0x1B, cal->YM_Driver<<7 | cal->analogDebounce);
	codecWriteReg(0x67, 0x27, 0x10u | cal->analogInterval);
	codecWriteReg(0x67, 0x26, 0xEC);
	codecWriteReg(0x67, 0x24, 0x18);
	codecWriteReg(0x67, 0x25, 0x53);
	// Not needed?
	//I2C_writeReg(I2C_DEV_CTR_MCU, 0x26, I2C_readReg(I2C_DEV_CTR_MCU, 0x26) | 0x10);


	codecEnableTouchscreen();

	//static const u8 sliderBounds[2] = {0xE, 0xF6}; // Volume slider 0% and 100% offset
	//I2C_writeRegBuf(I2C_DEV_CTR_MCU, 0x58, sliderBounds, 2);
	*((vu16*)0x10103000) = 0x8000; // CSND master volume 0-0x8000
	*((vu16*)0x10103002) = 1u<<15 | 1u<<14;

	*((vs16*)0x10103502) = -(s16)(67027964u / 16360);
	*((vu16*)0x10103504) = 0x8000; // Volume right 0-0x8000
	*((vu16*)0x10103506) = 0x8000; // Volume left 0-0x8000
	*((vu16*)0x10103508) = 0x8000; // Cap volume right? 0-0x8000
	*((vu16*)0x1010350A) = 0x8000; // Cap volume left? 0-0x8000
	*((vu32*)0x1010350C) = 0; // Start address
	*((vu32*)0x10103510) = 0; // Size
	*((vu32*)0x10103514) = 0; // Loop restart address
	*((vu32*)0x10103518) = 0; // Start IMA-ADPCM state
	*((vu32*)0x1010351C) = 0; // Loop Restart IMA-ADPCM state
	*((vu16*)0x10103500) = 1u<<15 | 1u<<14 | 3u<<12 | 1u<<10 | 3u  /*| 3u<<8*/; // Control

	/**((vu16*)0x10103502) = -(s16)((67027964u / 22050) & 0xFFFFu);
	*((vu16*)0x10103504) = 0x8000;
	*((vu16*)0x10103506) = 0x8000;
	*((vu16*)0x10103508) = 0x8000;
	*((vu16*)0x1010350A) = 0x8000;
	*((vu32*)0x1010350C) = (u32)mario; // Start address
	*((vu32*)0x10103510) = 43622; // Size
	*((vu32*)0x10103514) = 0; // Loop restart address
	*((vu32*)0x10103518) = 0;
	*((vu32*)0x1010351C) = 0;
	*((vu16*)0x10103500) = 1u<<15 | 1u<<14 | 1u<<12 | 2u<<10;*/
}

bool touchscreenState = false;
bool legacySwitchState = false;

void CODEC_deinit(void)
{
	GPIO_write(GPIO_4_0, 1); // GPIO bitmask 0x40
	TIMER_sleepMs(10); // Fixed 10 ms delay when setting this GPIO.
	legacySwitchState = (codecReadReg(0x67, 0x25) & 0x40u) != 0;
	if(!legacySwitchState) codecLegacyStuff(true);
	codecMaskReg(0x67, 0x25, 0, 3);
	touchscreenState = (codecReadReg(0x67, 0x24)>>7) == 0;
	codecDisableTouchscreen();
	codecMaskReg(0x64, 0x76, 0, 0xC0);
	TIMER_sleepMs(30);
	for(u32 i = 0; i < 100; i++)
	{
		if(!(codecReadReg(0x64, 0x25) & 0x88u)) break;
		TIMER_sleepMs(1);
	}
	codecMaskReg(0x64, 0x22, 2, 2);
	TIMER_sleepMs(30);
	for(u32 i = 0; i < 64; i++)
	{
		if(codecReadReg(0x64, 0x22) & 1u) break;
		TIMER_sleepMs(1);
	}
	*((vu16*)0x10145000) &= ~0x8000u;
	*((vu16*)0x10145002) &= ~0x8000u;
	*((vu8* )0x10141220) = 0;
	GPIO_write(GPIO_4_0, 0); // GPIO bitmask 0x40
}

void CODEC_wakeup(void)
{
	GPIO_write(GPIO_4_0, 1); // GPIO bitmask 0x40
	TIMER_sleepMs(10); // Fixed 10 ms delay when setting this GPIO.
	*((vu8* )0x10141220) = 2u;
	*((vu16*)0x10145000) |= 0x8000u;
	*((vu16*)0x10145002) |= 0x8000u;
	codecMaskReg(0x64, 0x45, 0, 0x30); // Output select automatic
	codecMaskReg(0x64, 0x43, 0, 0x80);
	codecMaskReg(0x64, 0x43, 0x80, 0x80);
	codecMaskReg(0x64, 0x22, 0, 2);
	TIMER_sleepMs(40);
	for(u32 i = 0; i < 40; i++)
	{
		if(!(codecReadReg(0x64, 0x22) & 1u)) break;
		TIMER_sleepMs(1);
	}
	codecMaskReg(0x64, 0x76, 0xC0, 0xC0);
	TIMER_sleepMs(10);
	for(u32 i = 0; i < 100; i++)
	{
		if(!(~codecReadReg(0x64, 0x25) & 0x88u)) break;
		TIMER_sleepMs(1);
	}
	codecMaskReg(0x67, 0x25, 3, 3);
	codecLegacyStuff(legacySwitchState);
	if(touchscreenState) codecEnableTouchscreen();
	GPIO_write(GPIO_4_0, 0); // GPIO bitmask 0x40
	TIMER_sleepMs(18); // Fixed 18 ms delay when unsetting this GPIO.
}

void CODEC_getRawAdcData(u32 buf[13])
{
	//codecSwitchBank(0x67);
	// This reg read seems useless and doesn't affect functionality.
	//codecReadReg(0x26);

	codecReadRegBuf(0xFB, 1, buf, 52);
}
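A side note on the SPI framing used by the read/write helpers in the code above: the first byte carries the register index in bit1-7 and the direction in bit0. A minimal sketch of that encoding, with codecCmdByte being a hypothetical name that does not exist in the posted code:

```c
#include <stdbool.h>
#include <stdint.h>

// First byte of a register access as built by the helpers above:
// register index in bit1-7, bit0 = 1 for read, 0 for write.
static uint8_t codecCmdByte(uint8_t reg, bool read)
{
	return (uint8_t)((reg << 1) | (read ? 1u : 0u));
}
```

Writing register 00h (the page select) gives a command byte of 00h, which matches what codecSwitchBank sends before the bank number.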
homepage - patreon - you can think of a bit as a bottle that is either half full or half empty
nocash
Posts: 1405
Joined: Fri Feb 24, 2012 12:09 pm

Re: 3DS reverse engineering

Post by nocash »

"unkData1[3] = {0xE17F, 0x1F80, 0xC17F};"
Going by the TSC2117 datasheet, that is the IIR filter for microphone autogain. The same constants (7Fh,E1h,80h,1Fh,7Fh,C1h) are also used in the DSi's init function. The three 16bit values are big-endian (defining big-endian 7FE1h as little-endian E17Fh in source code... well, that works, but I would at least add a cautionary comment about it in the source code).
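One way to avoid the pre-swapped constants would be to keep the values as written above and serialize them MSB first. This is just a sketch of that idea; the names micAutogainIir and coeffsToBigEndian are made up, and the byte buffer would still have to be handed to the register write function.

```c
#include <stddef.h>
#include <stdint.h>

// Mic autogain IIR coefficients in natural (big-endian) notation.
static const uint16_t micAutogainIir[3] = {0x7FE1, 0x801F, 0x7FC1};

// Serializes 16-bit coefficients MSB first, independent of host endianness.
// 'out' must have room for 2 * count bytes.
static void coeffsToBigEndian(const uint16_t *coeffs, size_t count, uint8_t *out)
{
	for(size_t i = 0; i < count; i++)
	{
		out[2 * i]     = (uint8_t)(coeffs[i] >> 8);
		out[2 * i + 1] = (uint8_t)(coeffs[i] & 0xFFu);
	}
}
```

The resulting byte stream is 7Fh,E1h,80h,1Fh,7Fh,C1h, the same bytes that the little-endian constants E17Fh, 1F80h, C17Fh produce in memory on the ARM11.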

I've tested the headphone (HP) registers, and they are for Left and Right.

Code:

  TSC[65h:12h]=06h     ;3DS DriverGainSP    ;\maybe left?    ;HWCAL[2E1h]*4+2
  TSC[65h:13h]=06h     ;3DS DriverGainSP'   ;/      right?   ;HWCAL[2E1h]*4+2
  TSC[65h:16h]=00h     ;3DS AnalogVolumeHP Left  (0..7Eh?)   ;HWCAL[2E2h]   <-- this tested
  TSC[65h:17h]=00h     ;3DS AnalogVolumeHP Right (0..7Eh?)   ;HWCAL[2E2h]   <-- this tested
  TSC[65h:1Bh]=07h     ;3DS AnalogVolumeSP  ;\maybe left?    ;HWCAL[2E3h]
  TSC[65h:1Ch]=07h     ;3DS AnalogVolumeSP' ;/      right?   ;HWCAL[2E3h]
I couldn't test the speaker (SP) registers, but there's a good chance that they may turn out to be for Left and Right, too.
The AnalogVolumeHP registers seem to be 7bit (with perhaps something else in bit7). 00h is loudest, and 7Eh is (near?) silent. Using value 7Fh on one side causes both sides (left and right) to go off.
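Based on those observations, a helper for setting the headphone volume would probably want to clamp at 7Eh. A sketch under exactly those assumptions (7bit field, 00h loudest, 7Fh switches both sides off, bit7 unknown and therefore preserved); the names are hypothetical.

```c
#include <stdint.h>

#define HP_VOL_MAX_ATTEN 0x7Eu // Quietest value that keeps both sides on
#define HP_VOL_FIELD     0x7Fu // bit0-6, purpose of bit7 is unknown

// Hypothetical helper: atten 0 = loudest, larger = quieter.
// Clamps to 7Eh because 7Fh on one side was observed to switch off both
// sides. Bit 7 of the old register value is kept as it was.
static uint8_t hpVolumeReg(uint8_t oldReg, uint8_t atten)
{
	if(atten > HP_VOL_MAX_ATTEN) atten = HP_VOL_MAX_ATTEN;
	return (uint8_t)((oldReg & ~HP_VOL_FIELD) | atten);
}
```

The same value would go to both TSC[65h:16h] and TSC[65h:17h] unless a left/right balance is wanted.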

And, comparing the filter coefficients with the datasheet and the info on the hwcal file... the picture might look like this...

Code:

  TSC[04h:08h-0Dh]=... ;DSi Mic Autogain ;IIR, as DSi (7Fh,E1h,80h,1Fh,7Fh,C1h)
  TSC[05h:08h-3Fh]=... ;3DS FilterMic32  ;IIR+Biquad A,B,C,D,E;HWCAL[364h-39Bh]
  TSC[05h:48h-7Fh]=... ;3DS FilterMic32, too? Or Mic47?       ;HWCAL[xxxh..]
  TSC[08h:0Ch-3Dh]=... ;3DS FilterFreeB  ;Biquad ;\initialized;HWCAL[3DAh-40Bh]
  TSC[08h:4Ch-7Dh]=... ;3DS FilterFreeB' ;Biquad ; for        ;HWCAL[3DAh-40Bh]
  TSC[09h:02h-07h]=... ;3DS FilterFreeA  ;IIR.L  ; non-GBA    ;HWCAL[3D4h-3D9h]
  TSC[09h:08h-0Dh]=... ;3DS FilterFreeA' ;IIR.R  ;/only       ;HWCAL[3D4h-3D9h]
  TSC[0Ah:02h-07h]=... ;3DS FilterFreeA'';IIR?            ;\  ;HWCAL[3D4h-3D9h]
  TSC[0Ah:0Ch-3Dh]=... ;3DS FilterFreeB'';Biquad?         ;/  ;HWCAL[3DAh-40Bh]
  TSC[0Bh:02h-1Fh]=... ;3DS FilterHP32   ;Biquad.L A,B,C  ;\  ;HWCAL[2ECh-309h]
  TSC[0Bh:20h-3Dh]=... ;3DS FilterHP47   ;Biquad.L D,E,F  ;/  ;HWCAL[30Ah-327h]
  TSC[0Bh:42h-5Fh]=... ;3DS FilterHP32'  ;Biquad.R A,B,C  ;\  ;HWCAL[2ECh-309h]
  TSC[0Bh:60h-7Dh]=... ;3DS FilterHP47'  ;Biquad.R D,E,F  ;/  ;HWCAL[30Ah-327h]
  TSC[0Ch:02h-1Fh]=... ;3DS FilterSP32   ;Biquad.L A,B,C  ;\  ;HWCAL[328h-345h]
  TSC[0Ch:20h-3Dh]=... ;3DS FilterSP47   ;Biquad.L D,E,F  ;/  ;HWCAL[346h-363h]
  TSC[0Ch:42h-5Fh]=... ;3DS FilterSP32'  ;Biquad.R A,B,C  ;\  ;HWCAL[328h-345h]
  TSC[0Ch:60h-7Dh]=... ;3DS FilterSP47'  ;Biquad.R D,E,F  ;/  ;HWCAL[346h-363h]
That is, the datasheet has "IIR" filters with three 16bit values.
And, the datasheet has "Biquad A,B,C,D,E" filters with five 16bit values (for each of the five biquads, so 5x5 values in total) for microphone.
And, the datasheet has "Biquad A,B,C,D,E,F" filters with five 16bit values (for each of the six biquads, so 6x5 values in total) for sound.
That does match up with what the 3ds init is doing.
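The arithmetic behind that can be checked quickly. A sketch, assuming 16bit coefficients, three values per IIR and five per biquad as described above; the enum and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum
{
	COEFF_BYTES   = 2, // Each coefficient is 16 bits
	IIR_COEFFS    = 3, // IIR: three values
	BIQUAD_COEFFS = 5, // One biquad: five values

	MIC_FILTER_BYTES   = (IIR_COEFFS + 5 * BIQUAD_COEFFS) * COEFF_BYTES, // IIR + Biquad A..E
	SOUND_HALF_BYTES   = 3 * BIQUAD_COEFFS * COEFF_BYTES,                // Biquad A..C or D..F
	SOUND_FILTER_BYTES = 2 * SOUND_HALF_BYTES                            // Biquad A..F
};

static_assert(MIC_FILTER_BYTES == 0x38, "size of the FilterMic32/47 entries");
static_assert(SOUND_HALF_BYTES == 0x1E, "size of the FilterHP32/HP47/SP32/SP47 entries");

// Last register index covered by a block of 'bytes' starting at 'firstReg'.
static uint8_t lastReg(uint8_t firstReg, uint8_t bytes)
{
	return (uint8_t)(firstReg + bytes - 1);
}
```

For example, a microphone block starting at 08h ends at 3Fh, and the two sound halves starting at 02h and 20h end at 1Fh and 3Dh, as in the table.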

The 3ds seems to have separate sound filters for speaker (SP) and headphone (HP), the datasheet didn't mention that feature.

The 3dbrew "hardware calibration" wiki page uses those weird filter names like FilterSP32 and FilterSP47, with suffix -32 and -47.
Looking at gbatek DSi specs, I think 32 and 47 are meant to be corresponding to this table...

Code:

  TSC[8:42h..4Bh]  C33..C37       N0,N1,N2,D1,D2 for Right Biquad A
  TSC[8:4Ch..55h]  C38..C42       N0,N1,N2,D1,D2 for Right Biquad B
  TSC[8:56h..5Fh]  C43..C47       N0,N1,N2,D1,D2 for Right Biquad C
  TSC[8:60h..69h]  C48..C52       N0,N1,N2,D1,D2 for Right Biquad D
  TSC[8:6Ah..73h]  C53..C57       N0,N1,N2,D1,D2 for Right Biquad E
  TSC[8:74h..7Dh]  C58..C62       N0,N1,N2,D1,D2 for Right Biquad F
Well, the register page number is a bit different. And 32 and 47 would match only if the coefficients were renumbered from C1-C255 to C0-C254. And I have no idea why 3dbrew is listing Biquad A,B,C and Biquad D,E,F as separate calibration entries (instead of one large entry for Biquad A,B,C,D,E,F).
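To illustrate the renumbering theory: in the table above, coefficient Cn sits at register index 2*n (C33 at 42h, C48 at 60h). A sketch of both numberings, valid only for the single page shown in the table; the function names are hypothetical and the zero-based variant is just the guess described above, not something confirmed by official sources.

```c
#include <stdint.h>

// Register index of a coefficient within the page, using the one-based
// numbering from the table above (C33 -> 42h, C48 -> 60h).
static uint8_t coeffRegOneBased(uint8_t c)
{
	return (uint8_t)(c * 2u);
}

// Same, but for a zero-based numbering (the assumed C0..C254 renumbering).
static uint8_t coeffRegZeroBased(uint8_t c)
{
	return coeffRegOneBased((uint8_t)(c + 1u));
}
```

With the zero-based variant, 32 lands on 42h and 47 lands on 60h, which are the start registers of the -32' and -47' entries in the register summary.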

Is there some reason for the 3dbrew "hardware calibration" page describing 32 and 47 like that? Are there official sources using the same names... or is the naming with 32 and 47 just some sort of reverse-engineering theory?

PS. Good to know that there is a master volume! I thought the register had only bit15 R/W, but that impression was caused by written values 8001h..FFFFh being automatically replaced by 8000h as max volume (as for the channel volume registers).
And good to know which calibration entries go to which TSC registers in register page 67h.

PPS. Are the FilterMic47 values verified to be used & written to TSC[05h:48h-7Fh]? I guess that could be tested by patching the hwcal values (to make them different from FilterMic32) and then checking which values are written.
I have no good idea what FilterMic32 and FilterMic47 are good for... they might be left and right... but then, sound output seems to be using the same values for left+right... not to mention that the microphone is mono anyway.
profi200
Posts: 66
Joined: Fri May 10, 2019 4:48 am

Re: 3DS reverse engineering

Post by profi200 »

nocash wrote: Mon Dec 09, 2019 6:29 pm And, comparing the filter coefficients with the datasheet and the info on the hwcal file... the picture might look like this...

Code:

  TSC[04h:08h-0Dh]=... ;DSi Mic Autogain ;IIR, as DSi (7Fh,E1h,80h,1Fh,7Fh,C1h)
  TSC[05h:08h-3Fh]=... ;3DS FilterMic32  ;IIR+Biquad A,B,C,D,E;HWCAL[364h-39Bh]
  TSC[05h:48h-7Fh]=... ;3DS FilterMic32, too? Or Mic47?       ;HWCAL[xxxh..]
  TSC[08h:0Ch-3Dh]=... ;3DS FilterFreeB  ;Biquad ;\initialized;HWCAL[3DAh-40Bh]
  TSC[08h:4Ch-7Dh]=... ;3DS FilterFreeB' ;Biquad ; for        ;HWCAL[3DAh-40Bh]
  TSC[09h:02h-07h]=... ;3DS FilterFreeA  ;IIR.L  ; non-GBA    ;HWCAL[3D4h-3D9h]
  TSC[09h:08h-0Dh]=... ;3DS FilterFreeA' ;IIR.R  ;/only       ;HWCAL[3D4h-3D9h]
  TSC[0Ah:02h-07h]=... ;3DS FilterFreeA'';IIR?            ;\  ;HWCAL[3D4h-3D9h]
  TSC[0Ah:0Ch-3Dh]=... ;3DS FilterFreeB'';Biquad?         ;/  ;HWCAL[3DAh-40Bh]
  TSC[0Bh:02h-1Fh]=... ;3DS FilterHP32   ;Biquad.L A,B,C  ;\  ;HWCAL[2ECh-309h]
  TSC[0Bh:20h-3Dh]=... ;3DS FilterHP47   ;Biquad.L D,E,F  ;/  ;HWCAL[30Ah-327h]
  TSC[0Bh:42h-5Fh]=... ;3DS FilterHP32'  ;Biquad.R A,B,C  ;\  ;HWCAL[2ECh-309h]
  TSC[0Bh:60h-7Dh]=... ;3DS FilterHP47'  ;Biquad.R D,E,F  ;/  ;HWCAL[30Ah-327h]
  TSC[0Ch:02h-1Fh]=... ;3DS FilterSP32   ;Biquad.L A,B,C  ;\  ;HWCAL[328h-345h]
  TSC[0Ch:20h-3Dh]=... ;3DS FilterSP47   ;Biquad.L D,E,F  ;/  ;HWCAL[346h-363h]
  TSC[0Ch:42h-5Fh]=... ;3DS FilterSP32'  ;Biquad.R A,B,C  ;\  ;HWCAL[328h-345h]
  TSC[0Ch:60h-7Dh]=... ;3DS FilterSP47'  ;Biquad.R D,E,F  ;/  ;HWCAL[346h-363h]
That is, the datasheet has "IIR" filters with three 16bit values.
And, the datasheet has "Biquad A,B,C,D,E" filters with five 16bit values (for each of the five biquads, so 5x5 values in total) for microphone.
And, the datasheet has "Biquad A,B,C,D,E,F" filters with five 16bit values (for each of the six biquads, so 6x5 values in total) for sound.
That does match up with what the 3ds init is doing.

The 3ds seems to have separate sound filters for speaker (SP) and headphone (HP), the datasheet didn't mention that feature.
Because it's not a TSC2117. It has many differences but the regs in the lower banks are pretty much the same for compatibility.

And regarding the sound filters (3 * 5): Someone tested them and confirms you get 3 biquad IIR filters which seem to be in the order of "B0 B1 B2 A0 A1" or A1 and A2 according to the following: https://en.wikipedia.org/wiki/Digital_biquad_filter
(I don't know anything about these filters and basically just redirect findings i got from someone on IRC.)
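For anyone else who hasn't worked with these filters, here is a minimal sketch of what one biquad stage computes, following the difference equation from the Wikipedia article linked above. It is only an illustration: it uses floats rather than the codec's 16bit coefficient format (whose exact scaling isn't modeled here), it assumes a0 is normalized to 1 so that five values remain per stage, and the names Biquad, biquad_step and biquad_chain are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only: one biquad stage, Direct Form I.
 * a0 is assumed to be normalized to 1, leaving five coefficients. */
typedef struct {
    float b0, b1, b2;   /* numerator (feed-forward) coefficients */
    float a1, a2;       /* denominator (feedback) coefficients   */
    float x1, x2;       /* previous two inputs                   */
    float y1, y2;       /* previous two outputs                  */
} Biquad;

/* y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2] */
static float biquad_step(Biquad *f, float x)
{
    assert(f != NULL);
    float y = f->b0 * x + f->b1 * f->x1 + f->b2 * f->x2
            - f->a1 * f->y1 - f->a2 * f->y2;
    f->x2 = f->x1; f->x1 = x;   /* shift input history  */
    f->y2 = f->y1; f->y1 = y;   /* shift output history */
    return y;
}

/* Run a buffer through a chain of stages, e.g. 3 stages for "A,B,C". */
static void biquad_chain(Biquad *stages, size_t nstages, float *buf, size_t n)
{
    assert(stages != NULL && buf != NULL);
    for (size_t i = 0; i < n; i++) {
        float s = buf[i];
        for (size_t k = 0; k < nstages; k++)
            s = biquad_step(&stages[k], s);
        buf[i] = s;
    }
}
```

With b0=1 and all other coefficients zero a stage passes the signal through unchanged, which might be a handy reference setting when testing which coefficient slots actually affect the output.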
nocash wrote: Mon Dec 09, 2019 6:29 pm The 3dbrew "hardware calibration" wiki page uses those weird filter names like FilterSP32 and FilterSP47, with suffix -32 and -47.
Looking at gbatek DSi specs, I think 32 and 47 are meant to be corresponding to this table...

Code: Select all

  TSC[8:42h..4Bh]  C33..C37       N0,N1,N2,D1,D2 for Right Biquad A
  TSC[8:4Ch..55h]  C38..C42       N0,N1,N2,D1,D2 for Right Biquad B
  TSC[8:56h..5Fh]  C43..C47       N0,N1,N2,D1,D2 for Right Biquad C
  TSC[8:60h..69h]  C48..C52       N0,N1,N2,D1,D2 for Right Biquad D
  TSC[8:6Ah..73h]  C53..C57       N0,N1,N2,D1,D2 for Right Biquad E
  TSC[8:74h..7Dh]  C58..C62       N0,N1,N2,D1,D2 for Right Biquad F
Well, the register page number is a bit different. And 32 and 47 would match only if the coefficients were renumbered from C1-C255 to C0-C254. And I have no idea why 3dbrew is listing Biquad A,B,C and Biquad D,E,F as separate calibration entries (instead of one large entry for Biquad A,B,C,D,E,F).

Is there some reason for the 3dbrew "hardware calibration" page having 32 and 47 described like that? Are there official sources using the same names... or is the naming with 32 and 47 just some sort of a reverse-engineering theory?
I don't know what 32 and 47 mean exactly but my best guess is 2 different sets of filters for 32 and 47 kHz.
nocash wrote: Mon Dec 09, 2019 6:29 pm PS. Good to know that there is a master volume! I thought the register had only bit15 R/W, but that was caused by writing 8001h..FFFFh being automatically replaced by 8000h as max volume (as for the channel volume registers).
And good to know which calibration entries go to which TSC registers in register page 67h.
That volume control is only for CSND though. It has zero effect on anything else (tested and confirmed).

nocash wrote: Mon Dec 09, 2019 6:29 pm PPS. Are the FilterMic47 values verified to be used & written to TSC[05h:48h-7Fh]? I guess that could be tested by patching the hwcal values (to make them different than FilterMic32, and then check which values are written).
I have no good idea what FilterMic32 and FilterMic47 are good for... they might be left and right... but then, sound output seems to be using the same values for left+right... not to mention that the microphone is mono anyways.
They are written. See the source you posted a copy of above. I only removed that because it was identical to filterMic32 to save some space. And 32/47 may again refer to samplerate (for the sound backend).
Also gists are visible to everyone. No account needed but i won't keep them forever.

Btw i added the GPIO to IRQ ID mapping i got from the other gist that i decompiled from AgbBg: https://github.com/derrekr/fastboot3DS/ ... h#L95-L111
nocash
Posts: 1405
Joined: Fri Feb 24, 2012 12:09 pm
Contact:

Re: 3DS reverse engineering

Post by nocash »

profi200 wrote: Sun Dec 15, 2019 2:22 pmBecause it's not a TSC2117. It has many differences but the regs in the lower banks are pretty much the same for compatibility.
I know that. I just meant that the IIR and Biquad values are more or less resembling what is described in the datasheet (I didn't mean to say that your code is wrong because it doesn't match up 100% with the datasheet).
For the TSC2117 chip, the coefficient RAM can contain IIR, Biquad, or FIR filters in different combinations. That is, depending on the PRB_xx mode selection in TSC[0:3Ch] and TSC[0:3Dh].
I guess that PRB_xx modes might refer to pre-programmed miniDSP filters in ROM (possibly with different pre-programmed ROM code in DSi and/or 3DS).
profi200 wrote: Sun Dec 15, 2019 2:22 pmI don't know what 32 and 47 mean exactly but my best guess is 2 different sets of filters for 32 and 47 kHz.
Good point, that might well explain why they came up with those numbers... Okay, I've tried to test that...

Code: Select all

  TSC[0Bh:02h-1Fh]=... ;3DS FilterHP32 Biquad A,B,C  ;-no effect
  TSC[0Bh:20h-3Dh]=... ;3DS FilterHP47 Biquad D,E,F  ;-Left Headphone
  TSC[0Bh:42h-5Fh]=... ;3DS FilterHP32 Biquad A,B,C  ;-no effect
  TSC[0Bh:60h-7Dh]=... ;3DS FilterHP47 Biquad D,E,F  ;-Right Headphone
Hmmm, so far, only HP47 is used - no matter if selecting 32kHz or 47kHz in the SNDEXCNT register (10145000h).
For using HP32 one must probably set a TSC bit somewhere... or change the PRB_xx setting in TSC[0:3Ch] (or in an equivalent 3DS-register in page 64h).

Headphone output (both Left+Right) is also affected by these registers:

Code: Select all

  TSC[0Ah:02h-07h]=... ;3DS FilterFreeA IIR
  TSC[0Ah:0Ch-3Dh]=... ;3DS FilterFreeB Biquad A,B,C,D,E
I have no way to test HP32, and SP32.L/R, and SP47.L/R. But I guess that they are also combined with FilterFree settings (in page 08h, 09h, or 0Ah).

Microphone input... I am confident that I can receive something (because the buffer overrun flag in MIC_CNT gets set after a while). But, I am still having two problems (probably unrelated to each other):
1) I do only receive FFFFh's in MIC_DATA register (even when setting all Mic IIR/Biquad's to zero).
2) I haven't found any NDMA startup mode or XDMA/CDMA peripheral ID for microphone.

On the DSi, the microphone is unmuted as so:

Code: Select all

  TSC[1:2Eh]=03h  ;MICBIAS=AVDD
  TSC[0:51h]=80h  ;ADC Digital Mic, on
  TSC[0:52h]=00h  ;ADC Digital Volume Control Fine Adjust, unmute
  TSC[1:2Fh]=37h  ;MIC PGA=27.5dB (or use other value, if desired)
Your code seems to be already doing something equivalent to Microphone Bias and Mic PGA. So the missing part would be the mute & unmute flags in TSC[0:51h] and TSC[0:52h]. Those registers don't seem to work on 3DS though, so there are probably mute/unmute flags for 3DS elsewhere, in page 64h presumably.

For NDMA/CDMA/XDMA, I think that I have found all startup modes & peripheral IDs for everything... except microphone. Maybe microphone DMA requires an extra enable bit somewhere... or maybe microphone DMA is supported on Teak DSP side only?
profi200 wrote: Sun Dec 15, 2019 2:22 pmAlso gists are visible to everyone. No account needed but i won't keep them forever.
Yes, github pages are probably visible to everyone else. Their https stuff just doesn't let me view them on my old PC, so I posted them here using a tablet (I could also PM or email them to my PC, but I thought that it would be no problem... because the pages were already public).
profi200 wrote: Sun Dec 15, 2019 2:22 pmBtw i added the GPIO to IRQ ID mapping i got from the other gist that i decompiled from AgbBg: https://github.com/derrekr/fastboot3DS/ ... h#L95-L111
I've had a look (using the tablet thingy), it looks more or less same as the irq list in gbatek, or did I miss some differences?
What is nice is that you have confirmed the GPIO irqs in range 68h..73h (some of them were only guessed in gbatek, because they didn't respond to switching the GPIO lines to output direction) (so, with your info, I can remove some question marks).
Gbatek does also have some more interrupt numbers (eg. the CDMA event/fault interrupts at 30h..3Bh).
profi200
Posts: 66
Joined: Fri May 10, 2019 4:48 am

Re: 3DS reverse engineering

Post by profi200 »

nocash wrote: Fri Dec 20, 2019 2:33 am Microphone input... I am confident that I can receive something (because the buffer overrun flag in MIC_CNT gets set after a while). But, I am still having two problems (probably unrelated to each other):
1) I do only receive FFFFh's in MIC_DATA register (even when setting all Mic IIR/Biquad's to zero).
2) I haven't found any NDMA startup mode or XDMA/CDMA peripheral ID for microphone.

On the DSi, the microphone is unmuted as so:

Code: Select all

  TSC[1:2Eh]=03h  ;MICBIAS=AVDD
  TSC[0:51h]=80h  ;ADC Digital Mic, on
  TSC[0:52h]=00h  ;ADC Digital Volume Control Fine Adjust, unmute
  TSC[1:2Fh]=37h  ;MIC PGA=27.5dB (or use other value, if desired)
Your code seems to be already doing something equivalent to Microphone Bias and Mic PGA. So the missing part would be the mute & unmute flags in TSC[0:51h] and TSC[0:52h]. Those registers don't seem to work on 3DS though, so there are probably mute/unmute flags for 3DS elsewhere, in page 64h presumably.

For NDMA/CDMA/XDMA, I think that I have found all startup modes & peripheral IDs for everything... except microphone. Maybe microphone DMA requires an extra enable bit somewhere... or maybe microphone DMA is supported on Teak DSP side only?
To power on the MIC, the codec module does this (which is basically the same as for DSi mode):

"cdc:MIC" cmd 0x00030040 "SetPower" code.

Code: Select all

	codecWriteReg(1, 0x2E, 3);

	codecMaskReg(0, 0x51, 0x80, 0x80);
	for(u32 i = 0; i < 100; i++)
	{
		if(codecReadReg(0, 0x24) & 0x40u) break;
		TIMER_sleepMs(1);
	}

	codecMaskReg(0, 0x52, 0, 0x80);
As for DMA: There probably is no DMA startup mode but i have not reverse engineered the mic module at all.
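To make the mask calls in that sequence easier to follow, here is a small sketch of the read-modify-write behaviour, run against a fake register file in RAM instead of the real codec. Everything in it is an assumption for illustration: the sim* names are made up, the (page, reg, value, mask) argument order is only inferred from the calls above, and the 128-registers-per-page size is a guess based on the register numbers seen so far. The ready-flag polling loop is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the codec: 256 pages of 128 registers in RAM. */
static uint8_t sim_regs[256][128];

static uint8_t simReadReg(uint8_t page, uint8_t reg)
{
    assert(reg < 128);
    return sim_regs[page][reg];
}

static void simWriteReg(uint8_t page, uint8_t reg, uint8_t val)
{
    assert(reg < 128);
    sim_regs[page][reg] = val;
}

/* Read-modify-write: only the bits set in mask are replaced by val. */
static void simMaskReg(uint8_t page, uint8_t reg, uint8_t val, uint8_t mask)
{
    uint8_t old = simReadReg(page, reg);
    simWriteReg(page, reg, (uint8_t)((old & ~mask) | (val & mask)));
}

/* The sequence from the post, minus the ready-flag polling. */
static void simMicPowerOn(void)
{
    simWriteReg(1, 0x2E, 3);          /* mic bias                      */
    simMaskReg(0, 0x51, 0x80, 0x80);  /* set bit 7, other bits kept    */
    simMaskReg(0, 0x52, 0x00, 0x80);  /* clear bit 7, other bits kept  */
}
```

The point of the mask variant is that the remaining seven bits of registers 51h and 52h keep whatever values they had before, unlike the plain DSi-style writes of 80h and 00h.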
nocash
Posts: 1405
Joined: Fri Feb 24, 2012 12:09 pm
Contact:

Re: 3DS reverse engineering

Post by nocash »

profi200 wrote: Fri Dec 20, 2019 9:23 amTo power on the MIC codec module does this (which is basically the same as for DSi mode)...
Thanks. Hmmm, I thought I had already done that...
Ah, no, stupid mistake: I had two sound test functions, and put the microphone init into the wrong one.
Okay, now it's working: I am getting noisy random values in MIC_DATA.
Though, oddly, those noisy random values still arrive even when setting all MicFilter values to all zeroes.

EDIT: Or maybe my writes to MicFilter coefficient RAM didn't get through (for the HP filters, I had to "unlock" the RAM via TSC[64h:76h].bit6-7 and issue a delay before writing the HP filter values... maybe the Mic filters require something similar).
nocash
Posts: 1405
Joined: Fri Feb 24, 2012 12:09 pm
Contact:

Re: 3DS reverse engineering

Post by nocash »

Btw. is it already known how to enable the Infrared LED for the New3DS internal camera?

If it isn't... is there a known way to see if the LED is on or off (other than wiring a multimeter to the LED pins)?
- Does the LED emit visible light through the 'black' plastic cover?
- If not, does it emit visible light when removing the 'black' plastic cover?
- Does the camera capture pictures with different brightness when covering the LED?
I can't easily test that with my broken console and japanese gui... but I guess everyone else could easily see if the LED emits visible light (best try in a dark place, of course) (either when using the camera, or when playing a stereoscopic game with head-tracking).

And then there are those "Programmable Infrared Transmitter (PIT)" entries in the HWCAL file. Whatever that is...
- IrDA receiver/transmitter related? The IrDA chip doesn't seem to have calibration registers though.
- IrDA might be wired to an external LED amplifier, maybe that can be calibrated somehow.
- Or maybe related to the IR LED for the camera (or the other way around: the IR sensitivity of the camera).
nocash
Posts: 1405
Joined: Fri Feb 24, 2012 12:09 pm
Contact:

Re: 3DS reverse engineering

Post by nocash »

Hmmm, okay, it's probably not so realistic to get such questions answered, and nintendo probably didn't document the LED light's visibility in the user manual : /

But here is a more official topic: I've extracted the ARM VFP floating point specs from the official ARM specs (getting the original 340 pages down to 6 pages):

Code: Select all

ARM Vector Floating-point Unit (VFP)
------------------------------------

The VFP unit exists on 3DS ARM11.

--> ARM VFP Floating Point Registers
--> ARM VFP Floating Point Control/Status Registers
--> ARM VFP Floating Point Opcode Encoding
--> ARM VFP Floating Point Maths Opcodes
--> ARM VFP Floating Point Load/Store Opcodes

Floating Point
The floating point hardware is called VFPv2 (Vector Floating-point).
 ARM DDI 0100I ARM Architecture Reference Manual (for ARMv6 with VFPv2)
 ARM DDI 0360F ARM11 MPCore r2p0, contains more (mostly useless) VFPv2 info
The Fxxxx floating point opcodes are aliases for CP10/CP11 coprocessor numbers;
CP10 used for single, and CP11 for double precision instructions.

ARM VFP Floating Point Registers
--------------------------------

Floating point Registers
Registers S0-S31 can contain Single-precision float values, or 32bit Integers
(for conversion to/from float format), or a pair of two Single-precision
registers can contain one Double-precision float value.
  Scalar Bank       Vector bank 1       Vector bank 2       Vector bank 3
  S1:S0    D0       S9:S8    D4         S17:S16  D8         S25:S24  D12
  S3:S2    D1       S11:S10  D5         S19:S18  D9         S27:S26  D13
  S5:S4    D2       S13:S12  D6         S21:S20  D10        S29:S28  D14
  S7:S6    D3       S15:S14  D7         S23:S22  D11        S31:S30  D15
The VFP supports "Scalar" and "Vector" modes (and a mixed "Vector/Scalar"
mode).
The "Vector" mode can perform simultaneous operations on up to 8 singles, or up
to 4 doubles (via Vector LEN and STRIDE selected in FPSCR register).
The registers are organized in "banks", and vectors cannot cross banks (eg.
using operand S23 with LEN=3, STRIDE=2 would use registers S23,S17,S19).

Scalar Mode, Fd=Fm <op> Fn
The "Scalar" mode performs operations on 1 single or double. This is done in
any of the following situations:
 - When FPSCR register is set to Vector LEN=1 (and STRIDE=1), or
 - When Destination is S0..S7 or D0..D3 (scalar bank), or
 - When using FCMP comparison opcodes, or
 - When using FCVT or FxxTOxx conversion opcodes, or
 - When using FMxxRxx register transfer opcodes, or
 - When using FLDxx/FSTxx load/store (whereof, FLDM/FSTM can transfer multiple
   registers in vector-like fashion; regardless of LEN/STRIDE settings)

Vector Mode, Fd[LEN]=Fm[LEN] <op> Fn[LEN]
The vector mode does merely perform the selected operation on all array
elements, this is correct for cases like Vector+Vector addition, but incorrect
for Vector*Vector multiplication (to get the final result one must manually
compute the sum of the results).
 - When FPSCR register is set to Vector LEN=2..8 (and STRIDE=1..2), and
 - When Source and Destination are S8..S31 or D4..D15 (vector banks), and
 - When using FADD, FSUB, FDIV, FCPY, FABS, FNEG, FSQRT, or FxMxx multiply

Mixed Mode, Fd[LEN]=Fm <op> Fn[LEN]
This allows to add/multiply/etc. all elements of a vector by a scalar value.
This is done when combining vector operands with the following:
 - When Source operand Fm is S0..S7 or D0..D3 (scalar bank), and
 - otherwise same conditions as for Vector mode

Integer Format (S0..S31 aka I0..I31)
  31-0   Integer (signed or unsigned, depending on FxxTOxx opcode)
The VFP can't do integer maths, however, one can load/store integer values in
S0..S31, and then use the FxxTOxx opcodes to convert integers to/from float
format. The integers are always 32bit (no matter if converting Single/Double
precision float values).

Single Precision Registers (S0..S31)
  31     1bit  Sign (0=Positive, 1=Negative)
  30-23  8bit  Exponent (01h..FEh=for 2^(N-7Fh), or 00h/FFh=Special)
  22-0   23bit Fraction (0..7FFFFFh)

Double Precision Registers (D0..D15)
  63     1bit  Sign (0=Positive, 1=Negative)
  62-52  11bit Exponent (001h..7FEh=for 2^(N-3FFh), or 000h/7FFh=Special)
  51-0   52bit Fraction (0..FFFFFFFFFFFFFh)

Exponent 01h..FEh (Single) or 001h..7FEh (Double):
  Sign * 2^(exponent-7Fh) * (1.fraction)    ;Single
  Sign * 2^(exponent-3FFh) * (1.fraction)   ;Double
Exponent 00h (Single) or 000h (Double), aka Small Numbers and Zero:
  Sign * 2^(-7Eh) * (0.fraction)            ;Single
  Sign * 2^(-3FEh) * (0.fraction)           ;Double
  The above includes 0 being encoded as fraction=0, the sign bit is ignored
  for cases like "compare +/-0", but the sign is used for "divide by +/-0".
  Small numbers in 0.fraction format may require extra clock cycles for
  counting leading zeroes; unknown if that problem does actually exist on
  ARM hardware, however, the "flush to zero" feature (see FPSCR.bit24) can
  be used to avoid that issue; 0.fraction will be then replaced by 0.000.
Exponent FFh (Single) or 7FFh (Double), aka NaN's and Infinite:
  fraction=000000h          or 0000000000000h                 +/-Infinite
  fraction=000001h..3FFFFFh or 0000000000001h..7FFFFFFFFFFFFh +/-Signaling NaNs
  fraction=400000h          or 8000000000000h                 +/-Default NaN
  fraction=400000h..7FFFFFh or 8000000000000h..FFFFFFFFFFFFFh +/-Quiet NaNs
  NaNs (Not a Number) can be used for abstract non-numeric expressions; this
  isn't useful for normal maths, but may be useful if a database contains
  entries like "Weight=UNKNOWN". If so, one may handle the NaN before passing
  it to the floating point unit, or otherwise the hardware will either trigger
  an exception (Signaling NaNs) or leave the NaN unchanged (Quiet NaNs),
  eg. "UNKNOWN*2+3 = UNKNOWN", or replace it by Default NaN (if FPSCR.bit25=1).
  Different NaNs can be compared using integer comparisons, float comparisons
  of NaNs have "unordered" results (even when comparing a NaN with itself).

ARM VFP Floating Point Control/Status Registers
-----------------------------------------------

FPSID Register (Floating Point System ID) (R)
  31-24  Implementor code (41h=ARM)
  23     Hardware/software implementation (0=Hardware, 1=Software)
  22-21  FSTMX/FLDMX format  (0=Format 1, Other=Reserved)
  20     Supported Precision (0=Single and Double, 1=Single only)
  19-16  Architecture version number (0=VFPv1, 1=VFPv2, 2-15=Reserved)
  15-8   Primary part number of VFP implementation (20h=VFP11) ;\Implementation
  7-4    Variant number                            (0Bh=MPCore); defined
  3-0    Revision number of the part               (04h=Fourth);/
New3DS: 410120b4h = VFPv2 D variant (with single AND double precision).

FPSCR Register (Floating Point Status/Control Register for user-level) (R/W)
  31     N Flag (1=Comparison result is Less Than)
  30     Z Flag (1=Comparison result is Equal)
  29     C Flag (1=Comparison result is Equal, Greater Than, or Unordered)
  28     V Flag (1=Comparison result is Unordered)
         Note: Use FMSTAT opcode to transfer above flags to ARM CPSR flags
  27-26  Unused (0)
  25     Default NaN mode   (XXX see page C2-16) (0=Disable, 1=Enable)
  24     Flush-to-zero mode (XXX see page C2-14) (0=Disable, 1=Enable)
  23-22  Rounding mode (0=To Nearest, 1=Up, 2=Down, 3=Towards Zero)
  21-20  Vector Stride (0/3 = 1/2 Singles; or 0/3 = 1/2 Doubles) (1/2=Reserved)
  19     Unused (0)
  18-16  Vector Len    (0..7 = 1..8 Singles; or 0..3 = 1..4 Doubles)
  15     Trap Enable Input Denormal (aka Subnormal)     ;\
  14-13  Unused (0)                                     ;
  12     Trap Enable Inexact                            ; Trap Enable aka
  11     Trap Enable Underflow                          ; Exception Enable
  10     Trap Enable Overflow                           ;
  9      Trap Enable Division by Zero                   ;
  8      Trap Enable Invalid Operation                  ;/
  7      Cumulative Exception Input Denormal            ;\
  6-5    RES                                            ;
  4      Cumulative Exception Inexact                   ; Cumulative what...?
  3      Cumulative Exception Underflow                 ;
  2      Cumulative Exception Overflow                  ;
  1      Cumulative Exception Division by Zero          ;
  0      Cumulative Exception Invalid Operation         ;/

FPEXC Register (Floating Point Exception Register for system-level) (R/W)
  31     Exception Flag ... long blurb related to process swap code
  30     Enable Floating Point Instructions (0=Disable, 1=Enable)
  29-0   Sub-architecture defined (see below for mpcore)
 Extra mpcore bits:
  29     Unused (0)
  28     FPINST2 instruction valid flag
  27-11  Unused (0)
  10-8   VECITR Number of remaining iterations after exception (0..6=1..7, 7=0)
  7      INV Input exception flag
  6-4    Unused (0)
  3      UFC Potential Underflow Flag
  2      OFC Potential Overflow Flag
  1      Unused (0)
  0      IOC Potential invalid operation flag
The exception handler must clear bit31 and bit28.

FPINST - Floating-Point Instruction Register, Privileged 0xEE000A00 (R/W)
Contains the opcode that has triggered the exception. The Cond field in
bit28-31 is changed to 0Eh (Always), and the Fd:D, Fn:N, Fm:M are changed to
indicate the fault-location within a vector (with FPEXC.bit8-10 indicating the
remaining unprocessed elements of the vector).

FPINST2 - Floating-Point Instruction Register 2, Privileged UNP (R/W)
If FPEXC.bit28=1, then this register contains another float opcode (that was
prefetched, but not yet executed). The Cond field in bit28-31 is changed to 0Eh
(Always). The exception handler should handle the failed FPINST opcode, then
try to execute prefetched FPINST2 opcode, and then return from exception.

MVFR0, Media and VFP Feature Register 0, Any 0x11111111 (R)
  31-28  VFP hardware support level when user traps are disabled
         (01h=In MPCore processors when Flush-to-Zero and Default_NaN and
         Round-to-Nearest are all selected in FPSCR, the coprocessor does not
         require support code. Otherwise floating-point support code is
         required)
  27-24  Support for short vectors           (01h=Yes)
  23-20  Support for hardware square root    (01h=Yes)
  19-16  Support for hardware divide         (01h=Yes)
  15-12  Support for software/user traps     (01h=Yes/support code is required)
  11-8   Support for double precision VFP    (01h=Yes, v2)
  7-4    Support for single precision VFP    (01h=Yes, v2)
  3-0    Support for the media register bank (01h=Yes/support 16, 64bit regs)

MVFR1 - Media and VFP Feature Register 1, Any 0x00000000 (R)
  31-28  Reserved
  11-8   Support for media extension, single precision floating-point (00h=No)
  7-4    Support for media extension, integer instructions            (00h=No)
  3-0    Support for media extension, load/store instructions         (00h=No)

ARM VFP Floating Point Opcode Encoding
--------------------------------------

Comparison of normal ARM copro opcodes and VFP opcodes
  |..3 ..................2 ..................1 ..................0|
  |1_0_9_8_7_6_5_4_3_2_1_0_9_8_7_6_5_4_3_2_1_0_9_8_7_6_5_4_3_2_1_0|
  |_Cond__|1_1_0_0_0_1_0|L|__Rn___|__Rd___|__CP#__|_CPopc_|__CRm__| 2reg normal
  |_Cond__|1_1_0_0_0_1_0|L|__Rn___|__Rd___|__CP#__|0|0|M|1|__Fm___| 2reg on VFP
  |_Cond__|1_1_0|P|U|N|W|L|__Rn___|__CRd__|__CP#__|____Offset_____| Mem normal
  |_Cond__|1_1_0|P|U|D|W|L|__Rn___|__Fd___|__CP#__|____Offset_____| Mem on VFP
  |_Cond__|1_1_1_0|_CPopc_|__CRn__|__CRd__|__CP#__|_CP__|0|__CRm__| CDP normal
  |_Cond__|1_1_1_0|p|D|q|r|__Fn___|__Fd___|__CP#__|N|s|M|0|__Fm___| CDP on VFP
  |_Cond__|1_1_1_0|CPopc|L|__CRn__|__Rd___|__CP#__|_CP__|1|__CRm__| 1reg normal
  |_Cond__|1_1_1_0|CPopc|L|__Fn___|__Rd___|__CP#__|N|0_0|1|0_0_0_0| 1reg on VFP

  Cond             = Condition
  L                = Load/Store direction for memory/register transfers
  Fm:M, Fn:N, Fd:D = Float Registers S0..S31 (or D0..D15, with LSB=0)
  Rd, Rn           = ARM Registers
  PUW, pqrs, CPopc = Opcode bits
  CP#              = Coprocessor number (0Ah=Single-, 0Bh=Double-Precision)
  Offset           = Address step, implies number of registers for FLDM/FSTM

ARM VFP Floating Point Maths Opcodes
------------------------------------

VFP data-processing primary opcodes
  pqrs cp10/cp11                  Instruction functionality
  0000 FMAC{S|D}{cond} Fd,Fn,Fm   Fd = +(Fn*Fm)+Fd   ;Multiply, Add
  0001 FNMAC{S|D}{cond} Fd,Fn,Fm  Fd = -(Fn*Fm)+Fd   ;Multiply, Negate, Add
  0010 FMSC{S|D}{cond} Fd,Fn,Fm   Fd = +(Fn*Fm)-Fd   ;Multiply, Subtract
  0011 FNMSC{S|D}{cond} Fd,Fn,Fm  Fd = -(Fn*Fm)-Fd   ;Multiply, Negate, Sub
  0100 FMUL{S|D}{cond} Fd,Fn,Fm   Fd = +(Fn*Fm)      ;Multiply
  0101 FNMUL{S|D}{cond} Fd,Fn,Fm  Fd = -(Fn*Fm)      ;Multiply, Negate
  0110 FADD{S|D}{cond} Fd,Fn,Fm   Fd = Fn+Fm         ;Add
  0111 FSUB{S|D}{cond} Fd,Fn,Fm   Fd = Fn-Fm         ;Sub
  1000 FDIV{S|D}{cond} Fd,Fn,Fm   Fd = Fn/Fm         ;Divide
  1001 -Undefined-
  1010 -Undefined-
  1011 -Undefined-
  1100 -Undefined-
  1101 -Undefined-
  1110 -Undefined-
  1111 -Extension instructions-

VFP data-processing extension opcodes
  Fn   N cp10/cp11                  Instruction functionality
  0000 0 FCPY{S|D}{cond} Fd,Fm      Fd = Fm        ;Copy
  0000 1 FABS{S|D}{cond} Fd,Fm      Fd = abs(Fm)   ;Absolute
  0001 0 FNEG{S|D}{cond} Fd,Fm      Fd = -Fm       ;Negate
  0001 1 FSQRT{S|D}{cond} Fd,Fm     Fd = sqrt(Fm)  ;Square root
  001x x -Undefined-
  0100 0 FCMP{S|D}{cond} Fd,Fm      Fd-Fm     ;Compare
  0100 1 FCMPE{S|D}{cond} Fd,Fm     Fd-Fm     ;Compare, exception on quiet NaNs
  0101 0 FCMPZ{S|D}{cond} Fd        Fd-0      ;Compare
  0101 1 FCMPEZ{S|D}{cond} Fd       Fd-0      ;Compare, exception on quiet NaNs
  0110 x -Undefined-
  0111 0 -Undefined-
  0111 1 FCVT{DS|SD}{cond} Fd,Fm    Single <--> Double-precision conversion
  1000 0 FUITO{S|D}{cond} Fd,Im     Unsigned integer --> float
  1000 1 FSITO{S|D}{cond} Fd,Im     Signed integer --> float
  1001 x -Undefined-
  101x x -Undefined-
  1100 0 FTOUI{S|D}{cond} Id,Fm     Float --> unsigned integer
  1100 1 FTOUIZ{S|D}{cond} Id,Fm    Float --> unsigned integer, round to zero
  1101 0 FTOSI{S|D}{cond} Id,Fm     Float --> signed integer
  1101 1 FTOSIZ{S|D}{cond} Id,Fm    Float --> signed integer, round to zero
  111x x -Undefined-
cp10: FCVTDS Dd,Sm  ;Double <-- Single
cp11: FCVTSD Sd,Dm  ;Single <-- Double

Nocash syntax
The useless {S|D} and {DS|SD} suffixes are omitted. FCVT is renamed to FMOV.
F{UI|SI}TO{UI|SI}{Z} is renamed to FMOV{UI|SI}{Z}, with operand I0..I31 for the
integer register.

ARM VFP Floating Point Load/Store Opcodes
-----------------------------------------

VFP single register transfer instructions
  cp   opcode L Instruction name   Instruction functionality
  cp10 000    0 FMSR{cond} Sn,Rd   Sn = Rd        ;\Single-Precision or Integer
  cp10 000    1 FMRS{cond} Rd,Sn   Rd = Sn        ;/
  cp10 111    0 FMXR{cond} sys,Rd  Reg(Fn,N) = Rd ;\SystemReg (FPSID, etc.)
  cp10 111    1 FMRX{cond} Rd,sys  Rd = Reg(Fn,N) ;/  ;<-- or FMSTAT{cond}
  cp11 000    0 FMDLR{cond} Dn,Rd  Dn.31-0 = Rd   ;\LSW of Double-Precision
  cp11 000    1 FMRDL{cond} Rd,Dn  Rd = Dn.31-0   ;/
  cp11 001    0 FMDHR{cond} Dn,Rd  Dn.63-32 = Rd  ;\MSW of Double-Precision
  cp11 001    1 FMRDH{cond} Rd,Dn  Rd = Dn.63-32  ;/
  others        -Undefined-
System Register encodings:
  Fn   N  System register
  0000 0  FPSID (New3DS: 410120b4h = VFPv2 with single AND double precision)
  0001 0  FPSCR      ;(FMSTAT opcode encodes as FMRX R15,FPSCR)
  0110 0? MVFR1                       ;\mpcore only
  0111 0? MVFR0                       ;/
  1000 0  FPEXC
  1001 0? FPINST                      ;\mpcore only
  1010 0? FPINST2                     ;/

VFP two register transfer instructions (VFPv2 and above)
  cp   L Instruction name               Instruction functionality
  cp10 0 FMSRR{cond} {Sm,Sm+1},Rd,Rn    Fm = Rn, (Fm+1) = Rd    ;XXX swapped?
  cp10 1 FMRRS{cond} Rd,Rn,{Sm,Sm+1}    Rn = Fm, Rd = (Fm+1)    ;XXX swapped?
  cp11 0 FMDRR{cond} Dm,Rd,Rn           Fm[31:0] = Rd, Fm[63:32] = Rn
  cp11 1 FMRRD{cond} Rd,Rn,Dm           Rd = Fm[31:0], Rn = Fm[63:32]

VFP load and store instructions
  PUW  L=0/1,cp10/cp11                             Registers transferred
  000  -Two-register transfer instructions-        -
  001  -Undefined-                                 -
  010  FSTM|FLDMIA{S|D|X}{<cond>} Rn,{Fd,Fd+1,..}  Multiple Registers
  011  FSTM|FLDMIA{S|D|X}{<cond>} Rn!,{Fd,Fd+1,..} Multiple Registers Increment
  100  FST|FLD{S|D}{<cond>} Fd, [Rn{,-offs*4}]     One register, -offs
  101  FSTM|FLDMDB{S|D|X}{<cond>} Rn!,{Fd,Fd+1,..} Multiple Registers Decrement
  110  FST|FLD{S|D}{<cond>} Fd, [Rn{,+offs*4}]     One register, +offs
  111  -Undefined-                                 -
FSTM/FLDM do transfer multiple words (with offs containing the number of words
to be transferred, 1..32 for {S}, or an even number 2..32 for {D}).
VFP load/store multiple addressing modes
  Non-stacking mnemonic   Stacking mnemonic
  FLDMIA{S|D|X}           FLDMFD{S|D|X}     FPOP{S|D|X}
  FLDMDB{S|D|X}           FLDMEA{S|D|X}
  FSTMIA{S|D|X}           FSTMEA{S|D|X}
  FSTMDB{S|D|X}           FSTMFD{S|D|X}     FPUSH{S|D|X}

Nocash syntax
The useless {S|D} suffixes are omitted, the weird {X} suffix is kept to
preserve weirdness.
Fancy {} brackets are omitted, LDM/STM must use square [Rn] brackets, the
register list in LDM/STM is specified as Fx-Fz (rather than Fx,Fy,Fz).
All FMxxxx opcodes are renamed to FMOV{LSW|MSW}.

Weird STM/LDM{X} - for registers with unknown precision
The weird {X} mode is same as {D}, but with offset.bit0=1 (ie. with offs=3..33
instead of 2..32; and thereby actually transferring an unused dummy word).
The weird {X} mode is/was intended for registers with unknown content (eg. when
pushing/popping registers without knowing if they contain integer/single/double
precision values; which might be a problem with internal accumulators in the
VFP unit).
The weird {X} mode was declared as "deprecated in ARMv6" in DDI 0100I, but
later re-declared as required for "compatibility with future VFP
implementations" in DDI 0360F. However, unknown if there are/were/will be any
such implementations that do require it.
For now, it should be best to use {D} mode instead of weird {X}. Probably even
{S} should also work the same (if the endianness-based word-order doesn't
matter).
I hope that I have covered the most important stuff in there. What is yet missing are details about timings and possible data hazards.
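As a quick cross-check of the Single Precision format table in the listing, here is a small sketch that classifies a 32bit pattern by its exponent and fraction fields. It is plain C for running on a PC, not VFP code, and the names FloatClass, classify_single and unbiased_exponent_single are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    F_ZERO, F_SUBNORMAL, F_NORMAL, F_INFINITE, F_SNAN, F_QNAN
} FloatClass;

/* Classify a single-precision bit pattern using the field layout above. */
static FloatClass classify_single(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;   /* bits 30-23 */
    uint32_t fraction = bits & 0x7FFFFFu;       /* bits 22-0  */

    if (exponent == 0x00u)                      /* small numbers and zero */
        return fraction ? F_SUBNORMAL : F_ZERO;
    if (exponent == 0xFFu) {                    /* NaNs and Infinite */
        if (fraction == 0)
            return F_INFINITE;
        return (fraction & 0x400000u) ? F_QNAN : F_SNAN;
    }
    return F_NORMAL;
}

/* Power-of-two exponent: N-7Fh for normal numbers, fixed -7Eh for 0.fraction. */
static int unbiased_exponent_single(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;
    assert(exponent != 0xFFu);   /* NaNs/Infinite have no numeric exponent */
    return exponent ? (int)exponent - 0x7F : -0x7E;
}
```

For example, 3F800000h (1.0) classifies as a normal number with exponent 0, and 7FC00000h is the Default NaN, which falls into the Quiet NaN range.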

The floating point hardware is quite simple in terms of merely supporting Add, Sub, Mul, Div, Sqrt (without functions for Sine or Exponent etc). The special feature is support for what they call "Vector" maths (performing multiple Additions or Multiplications on small arrays of numbers). In case of multiplications that doesn't really produce a correct result:

Code: Select all

  Sum             = Vector(x,y,z,w) * Vector(x,y,z,w)  ;Wanted result for vector maths
  Vector(x,y,z,w) = Vector(x,y,z,w) * Vector(x,y,z,w)  ;Actual ARM VFP hardware result
Ie. to get the correct result, one needs to manually compute the sum of the resulting x,y,z,w values.
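In plain C, the two steps look roughly like this. This is only a model of the behaviour described above, not VFP code: vec_mul stands for what a vector multiply produces, and vec_sum is the manual follow-up step that the hardware doesn't do by itself. Both function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* What a vector multiply does conceptually: dst[i] = a[i] * b[i]. */
static void vec_mul(float *dst, const float *a, const float *b, size_t len)
{
    assert(len >= 1 && len <= 8);   /* LEN range for singles */
    for (size_t i = 0; i < len; i++)
        dst[i] = a[i] * b[i];
}

/* The manual follow-up step: add up the products to get a single sum. */
static float vec_sum(const float *v, size_t len)
{
    assert(v != NULL);
    float sum = 0.0f;
    for (size_t i = 0; i < len; i++)
        sum += v[i];
    return sum;
}
```

For (1,2,3,4) * (5,6,7,8) the multiply step yields (5,12,21,32), and only the extra summation gives the single value 70.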

The hardware allows to select LEN=1..8 and STRIDE=1..2 for the vectors. The LEN is the number of elements, eg. 4 for Vector(x,y,z,w). And STRIDE is the register step, specifying register S16 with LEN=4 and STRIDE=1 would use S16,S17,S18,S19 for the vector, and STRIDE=2 would use S16,S18,S20,S22.
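The register selection can be sketched in C as below, including the wraparound inside a bank of eight single-precision registers and the FPSCR field encoding from the listing (LEN-1 in bit16-18, STRIDE as 0 or 3 in bit20-21). This is a model under assumptions, not tested on hardware: the function names are made up, it handles singles only, it doesn't apply the rule that operands in the scalar bank S0..S7 are treated as scalars, and it doesn't check whether a given LEN/STRIDE combination is actually permitted.

```c
#include <assert.h>
#include <stdint.h>

/* Fill regs[] with the single-precision register numbers touched by a
 * vector operand starting at S<first>. Banks are 8 registers wide and
 * the index wraps around inside the bank. */
static void vfp_vector_regs(unsigned first, unsigned len, unsigned stride,
                            unsigned regs[8])
{
    assert(first < 32 && len >= 1 && len <= 8);
    assert(stride == 1 || stride == 2);
    unsigned bank = first & ~7u;   /* base of the bank: S0, S8, S16 or S24 */
    unsigned idx  = first & 7u;    /* position within the bank             */
    for (unsigned i = 0; i < len; i++) {
        regs[i] = bank + idx;
        idx = (idx + stride) & 7u; /* wrap inside the bank */
    }
}

/* Build the FPSCR LEN (bit16-18) and STRIDE (bit20-21) fields. */
static uint32_t fpscr_len_stride(unsigned len, unsigned stride)
{
    assert(len >= 1 && len <= 8);
    assert(stride == 1 || stride == 2);
    return ((uint32_t)(len - 1) << 16) | ((stride == 2 ? 3u : 0u) << 20);
}
```

Starting at S23 with LEN=3 and STRIDE=2 this gives S23,S17,S19, matching the example in the listing.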

An 8-dimensional vector with LEN=8 is rarely needed, but it could be useful for performing two calculations on two different 4-dimensional vectors at once.

---

What I am unsure about is the optimal way to calculate the sum of the multiplication results. With the STRIDE feature, something like this would be optimal:

Code: Select all

  x1 y1 z1 w1 x2 y2 z2 w2
  p1 q1 r1 s1 p2 q2 r2 s2
  |  |  |  |  |  |  |  |    <-- multiply with len=8, stride=1
  x1 y1 z1 w1 x2 y2 z2 w2
  |_/   |_/   |_/   |_/     <-- add with len=4, stride=2
  |     |     |     |
  |____/      |____/        <-- add with len=2, stride=4 (but, stride>2 is
  |           |                                           not supported?)
  n1          n2
But the specs seem to say that only STRIDE=1 and STRIDE=2 are supported? It's a bit weird because the STRIDE setting is encoded in a 2bit field (but nevertheless seems to support only two settings).

Next idea would be this:

Code: Select all

  x1 y1 z1 w1 x2 y2 z2 w2
  p1 q1 r1 s1 p2 q2 r2 s2
  |  |  |  |  |  |  |  |    <-- multiply with len=8, stride=1
  x1 y1 z1 w1 x2 y2 z2 w2
  |_/   |_/   |_/   |_/     <-- add with len=4, stride=2
  |.... | ... | ... | ....
  |____/|____/|____/|____/  <-- add with len=4, stride=2 (maybe allowed if
  |     |     |     |                                     dest=other bank?)
  n1   junk   n2    junk
I am not sure if it's allowed to use "overlapping" source registers for different additions, and the junk results are a bit dirty.

And another idea...

Code: Select all

  x1 y1 z1 w1 x2 y2 z2 w2
  p1 q1 r1 s1 p2 q2 r2 s2
  |  |  |  |  |  |  |  |    <-- multiply with len=8, stride=1
  x1 y1 z1 w1 x2 y2 z2 w2
  |_/   |_/   |_/   |_/     <-- add with len=4, stride=2
  |     |     |     |
  |____/      |     |       <-- add with dest=scalar
  |           |     |
  |           |____/        <-- add with dest=scalar
  |   _______/
  |  |
  n1 n2
That's quite dumb and requires one extra ADD opcode; the advantage is that one can use the extra ADD to store the result in a contiguous set of registers. So maybe that's the best way to go... or are there better ways?
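
The data flow of that last variant can be modelled in Python as below. This only mirrors the arithmetic of the diagram (one multiply, one paired add, two scalar adds); it says nothing about whether the real VFP accepts these register combinations, and the function name is invented for the sketch.

```python
def two_dot_products(a, b):
    """Compute two 4-element dot products from two 8-element operands."""
    # multiply with len=8, stride=1: eight separate products
    prod = [x * y for x, y in zip(a, b)]
    # add with len=4, stride=2: element 0+1, 2+3, 4+5, 6+7
    pair = [prod[i] + prod[i + 1] for i in range(0, 8, 2)]
    # two scalar adds, storing n1 and n2 next to each other
    n1 = pair[0] + pair[1]
    n2 = pair[2] + pair[3]
    return n1, n2


a = [1, 2, 3, 4, 5, 6, 7, 8]   # x1 y1 z1 w1 x2 y2 z2 w2
b = [1, 1, 1, 1, 2, 2, 2, 2]   # p1 q1 r1 s1 p2 q2 r2 s2
print(two_dot_products(a, b))
```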

Implementing "Vector=Vector*Matrix" or "Matrix=Matrix*Matrix" would work similarly (needing to repeat the above steps several times). One problem is that one can't store all operands in the 32 hardware registers.
One could store one 4x4 matrix in 16 registers, and then load the other operands on the fly. I guess that could work well (without needing to repeatedly load the same source operands into hardware registers several times).

Anyways, seeing some sample code for VFP matrix maths would be interesting!
homepage - patreon - you can think of a bit as a bottle that is either half full or half empty
nocash
Posts: 1405
Joined: Fri Feb 24, 2012 12:09 pm

Re: 3DS reverse engineering

Post by nocash »

I have rev-engineered the PXI_SYNC register aka IPC_SYNC register, with new descriptions for bit29 and bit30:

Code: Select all

10008000h - ARM9 - PXI_SYNC9
10163000h - ARM11 - PXI_SYNC11
  0-7   R    Data received from remote SYNC bit8-15
  8-15  W    Data sent to remote SYNC bit0-7 (CAUTION: write-only, unlike NDS!)
  16-22 -    Unused (0)
  23    -    Unused (0)   ;<-- reportedly "?" whatever that means, if anything?
  24-28 -    Unused (0)
  29    -    PXI_SYNC11: Unused (0)
  30    W    PXI_SYNC11: Send IRQ to ARM9 IF.bit12 (0=No change, 1=Yes)
  29    W    PXI_SYNC9: Send IRQ to ARM11 IRQ 50h  (0=No change, 1=Yes)
  30    W    PXI_SYNC9: Send IRQ to ARM11 IRQ 51h  (0=No change, 1=Yes)
  31    R/W  Enable IRQ from remote CPU            (0=Disable, 1=Enable)
The major missing part is bit23. It isn't R/W (neither on ARM9 nor ARM11 side).
But 3dbrew says that it is "?" which would suggest that... I don't know what it might suggest... the bit being Read-only or Write-only on ARM9 or ARM11 or both?
Is anything known about if/how/when/where bit23 is being used, and what for?
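
For reference, the bit layout from the table above can be expressed as a small Python sketch. The helper and constant names are invented here; only the bit positions come from the table.

```python
PXI_SYNC_IRQ_BIT29 = 1 << 29   # PXI_SYNC9: ARM11 IRQ 50h (unused on PXI_SYNC11)
PXI_SYNC_IRQ_BIT30 = 1 << 30   # PXI_SYNC9: ARM11 IRQ 51h, PXI_SYNC11: ARM9 IF.bit12
PXI_SYNC_IRQ_ENABLE = 1 << 31  # enable IRQ from remote CPU


def pxi_sync_write_value(send_data, irq_bits=0, irq_enable=False):
    """Compose a 32-bit value to write to PXI_SYNC9 or PXI_SYNC11."""
    if not 0 <= send_data <= 0xFF:
        raise ValueError("send_data must fit in 8 bits")
    if irq_bits & ~(PXI_SYNC_IRQ_BIT29 | PXI_SYNC_IRQ_BIT30):
        raise ValueError("irq_bits may only contain bit29/bit30")
    value = (send_data << 8) | irq_bits   # bits 8-15: data sent to remote
    if irq_enable:
        value |= PXI_SYNC_IRQ_ENABLE
    return value


def pxi_sync_received(reg_value):
    """Extract the byte received from the remote side (bits 0-7)."""
    return reg_value & 0xFF


print(hex(pxi_sync_write_value(0x5A, PXI_SYNC_IRQ_BIT30, True)))
```

Since bits 8-15 are write-only according to the table, a read-modify-write of the register would presumably not preserve the previously sent byte, so software likely needs to keep its own copy.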
nocash
Posts: 1405
Joined: Fri Feb 24, 2012 12:09 pm

Re: 3DS reverse engineering

Post by nocash »

RSA hardware is now more or less rev-engineered on hardware level. The old docs did have "big-endian" confused with "little-endian" byte order, plus unspecified "normal" and "reversed" word orders, and very little info on data aborts and slotcnt/slotsize registers... I have rewritten pretty much everything in the RSA chapter (except for some obscure notes at the bottom of the document, like "if 2 divides mod", I am not sure what that could mean... or if it is important for anything... it was probably only relevant for early speculations on how to dump the bootrom keys).

I have also measured some RSA timings for 100h-byte and 80h-byte keys. The hardware acceleration is quite slow:
200ms for 100h-byte private keys doesn't seem to be fit for use... even when doing only a single calculation.
3.3ms for 100h-byte public keys might be usable... but I wouldn't be surprised if Nintendo is doing dozens or hundreds of public key checks during booting... which might explain why the 3DS is booting like crap : /
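
The millisecond figures follow from the raw clock counts listed under "RSA Timing Examples" in the register notes of this post. A small Python sketch of the conversion, assuming a round 67MHz timer clock (the exact timer frequency is an assumption here):

```python
TIMER_HZ = 67_000_000  # assumed; the notes only say "67MHz timer"


def clks_to_ms(clks):
    """Convert timer clock cycles to milliseconds."""
    return clks * 1000.0 / TIMER_HZ


for label, clks in [
    ("Public key, 80h-bytes", 0xE475),
    ("Public key, 80h-bytes padded to 100h", 0x3571D),
    ("Public key, 100h-bytes", 0x3574A),
    ("Private key, 100h-bytes", 0xCE59A5),
]:
    print(f"{clks_to_ms(clks):8.2f} ms  {label}")
```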

Now I am wondering if the ARM11 would be faster when doing RSA by software without using the RSA hardware. Hmmm, the RSA software function that I am using on PC contains a 64bit/32bit divide opcode... that could have worked on DSi, but 3DS seems to have erased all integer division support (and the new floating point unit supports only max 32bit integers)... so porting that code would require some code changes.

Code: Select all

3DS Crypto - RSA Registers (ARM9)
  1000B000h 4    R/W RSA_CNT        Control/status and keyslot select
  1000B0F0h 4    ?   RSA_UNKNOWN    Unknown
  1000B1x0h 4    R/W RSA_SLOTCNT_x  Keyslot 0..3 control/status (x=0..3)
  1000B1x4h 4    R   RSA_SLOTSIZE_x Keyslot 0..3 size/status    (x=0..3)
  1000B200h 4    W   RSA_EXPFIFO    Exponent (10001h, or private key) ;\for
  1000B204h FCh  W   RSA_EXPFIFO    Mirrors of above                  ; current
  1000B400h 100h R/W RSA_MOD        Modulus (public key)              ;/keyslot
  1000B800h 100h R/W RSA_DATA       Incoming Data and Result

1000B000h - ARM9 - RSA_CNT (R/W)
  0     Start/Busy (0=Idle/Ready, 1=Enable/Busy)
  1     IRQ Enable (0=Disable, 1=Enable, set ARM9 IF.bit22 when Ready)
  2-3   Unused (0)
  4-5   Keyslot    (0..3=Key 0-3)   ;for Start/Busy and RSA_MOD,EXPFIFO
  6-7   Unused (0)
  8     Byte order (0=Little endian, 1=Big Endian) ;for RSA_MOD,DATA,EXPFIFO
  9     Word order (0=Little endian, 1=Big Endian) ;for RSA_MOD,DATA
  10-31 Unused (0)

1000B0F0h - ARM9 - RSA_UNKNOWN (R and/or W)
  0-28  Unknown/unused?   (always zero)
  29    Unknown/readonly? (always set)
  30-31 Unknown/unused?   (always zero)
The bootrom writes zero in RSA IRQ init, so some bits might be write-only?
Writing seems to have no effect on anything though.

1000B100h+(0..3)*10h - ARM9 - RSA_SLOTCNT_0..3 (R/W)
  0     Read: Exponent Status (0=Bad=LessThan4orOdd, 1=Good=4orMoreAndEven) (R)
        Write: Clear RSA_SLOTSIZE/EXPFIFO (0=Clear, 1=No Change)            (W)
  1     Disable RSA_EXPFIFO Writes        (0=Normal, 1=DataAbort)         (R/W)
  2     Disable RSA_MOD Reads             (0=Normal, 1=DataAbort)         (R/W)
  3-30  Unused (0)
  31    Disable RSA_SLOTCNT_x Writes      (0=Normal, 1=Disable/permanent) (R/w)
Bit1,2 can be changed only if Status Bit0 was set (Good=4orMoreAndEven).

1000B104h+(0..3)*10h - ARM9 - RSA_SLOTSIZE_0..3 (R)
  0-31  Number of words written to EXPFIFO (range 0..40h)
Indicates the EXP size (and implied: the MOD and DATA sizes). Whilst that
number is important, programmers should usually know how many words they had
used, without needing to use the SLOTSIZE register.

1000B200h - ARM9 - RSA_EXPFIFO - Exponent; usually 10001h, or Private Key (W)
1000B204h..1000B2FFh - ARM9 - RSA_EXPFIFO mirrors (W)
  0-31  FIFO, in current byte-order, to be written MSW first (max 40h words)
The number of words written to EXPFIFO does imply the size of the MOD and DATA
values (that makes sense for private exponents, but the public exponent 10001h
must be padded with leading zeroes, and then followed by 10001h in last word,
aka 01000100h for big-endian).
The RSA hardware wants the number of words to be an even number in range of
4..40h, aka a multiple of 8 bytes in range 10h..100h (and if so, sets
SLOTCNT.bit0).

1000B400h..1000B4FFh - ARM9 - RSA_MOD - Modulus; Public Key (R/W)
  100h-byte area, in currently selected byte/word-order, for current keyslot
The upper bits are unused/don't care when SLOTSIZE<40h.

1000B800h..1000B8FFh - ARM9 - RSA_DATA (100h bytes) (R/W)
  100h-byte area, in currently selected byte/word-order
Contains the data that is to be encrypted/decrypted, and contains the result
after completion.
The upper bits are unused/don't care when SLOTSIZE<40h.

RSA Timing Examples (versus 67MHz timer)
  0.9ms (E475h clks)   Public key, 80h-bytes (DSi-style)
  3.3ms (3571Dh clks)  Public key, 80h-bytes zeropadded to 100h-bytes size
  3.3ms (3574Ah clks)  Public key, 100h-bytes (3DS-style)
  200ms (CE59A5h clks) Private key, 100h-bytes (3DS-style)

Invalid Operations with/without Data Abort
  Reading MOD or DATA when busy (and maybe also on writing?)  --> Data Abort
  Reading MOD when disabled in SLOTCNT                        --> Data Abort
  Writing EXPFIFO when disabled in SLOTCNT                    --> Data Abort
  Writing more than 40h words to EXPFIFO                      --> Data Abort
  Reading EXPFIFO or reading unused registers like 1000B5xxh  --> Returns Zero

Keyslot 0..3 usage
During boot, the bootrom uses the following PUBLIC keys (100h byte modulus,
with exponent 10001h):
  Slot 0 uninitialized                     (unused)
  Slot 1 retail=FFFFB1E0h, debug=FFFFC4E0h (for FIRM from eMMC)
  Slot 2 retail=FFFFB2E0h, debug=FFFFC5E0h (for FIRM from Wifi-Flash/NDS-Cart)
  Slot 3 retail=FFFFB0E0h, debug=FFFFC3E0h (for NCSD from eMMC)
After boot, the bootrom does replace the above keys by four PRIVATE keys (with
100h byte modulus, and 100h byte exponent):
  Slot 0 retail=FFFFB3E0h, debug=FFFFC6E0h ;\Hardware slots (modulus+exponent)
  Slot 1 retail=FFFFB5E0h, debug=FFFFC8E0h ; (the modulus are also stored in
  Slot 2 retail=FFFFB7E0h, debug=FFFFCAE0h ; RAM at ITCM+3D00h+(0..3)*100h)
  Slot 3 retail=FFFFB9E0h, debug=FFFFCCE0h ;/
  Slot 4 retail=FFFFBBE0h, debug=FFFFCEE0h ;\
  Slot 5 retail=FFFFBDE0h, debug=FFFFD0E0h ; RAM slots (modulus+exponent are
  Slot 6 retail=FFFFBFE0h, debug=FFFFD2E0h ; stored at ITCM+4100h+(0..3)*200h)
  Slot 7 retail=FFFFC1E0h, debug=FFFFD4E0h ;/
Later on, the firmware does replace some slots by other keys:
  Slot 0 Arbitrary (uh?)
  Slot 1 CXI access desc (following the exheader, uh?)
  Slot 2 Unused (contains the private key from bootrom)
  Slot 3 Unused (contains the private key from bootrom)

RSA Basics and Differences to DSi
The 3DS RSA hardware is more or less same as the DSi RSA BIOS functions, see
there for general info about RSA maths and RSA padding:
--> BIOS RSA Functions (DSi only)
Apart from being hardware-implemented, the 3DS is usually using 100h-bytes
(instead of 80h), with SHA256 signatures (instead of SHA1), and with OpenPGP
headers (instead of raw padding).

Blurb
Writing to RSA_MOD does not change the exponent written with RSA_EXPFIFO. An
attack based on the Pohlig-Hellman algorithm exists to "read" the contents of
RSA_EXPFIFO as a result (see 3DS System Flaws).

RSA Overview
The RSA module is essentially a hardware-accelerated modular exponentiation
engine. It is specially optimized for RSA applications, so its behavior can be
incoherent when RSA's invariants are broken.
The PKCS (uh?) message padding must be manually checked by software, as
hardware will only do raw RSA operations.

Observed edge cases
  "if 2 divides mod, output == 0"
uh, how to "divide" a "mod" ?
uh, for mod2, remainder SHOULD be 0 or 1 (or is it ALWAYS 0 here?)
uh, also, MOD may be required to be bigger than DATA?
PS. Does anyone know what "PKCS" means? I guess it refers to RSA-padding... or padding with OID values (which I have called "OpenPGP" in gbatek, but I am not sure if that is the official name, or who came up with those OID values originally).
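
The EXPFIFO rule from the register notes above (even word count in range 4..40h, MSW first, public exponent 10001h padded with leading zero words) can be sketched in Python. The helper names are hypothetical and this only prepares the word list; it does not touch any hardware.

```python
def expfifo_words(exponent, num_words):
    """Return the 32-bit words for RSA_EXPFIFO, most significant word first."""
    if num_words % 2 or not 4 <= num_words <= 0x40:
        raise ValueError("word count must be even and in range 4..40h")
    if exponent >> (32 * num_words):
        raise ValueError("exponent does not fit into num_words words")
    return [(exponent >> (32 * i)) & 0xFFFFFFFF
            for i in reversed(range(num_words))]


def byteswap32(word):
    """Value a little-endian CPU writes when the byte order is big-endian."""
    return int.from_bytes(word.to_bytes(4, "big"), "little")


words = expfifo_words(0x10001, 0x40)     # 100h-byte key, public exponent
print(len(words), hex(words[-1]), hex(byteswap32(words[-1])))
```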
PSI
Posts: 17
Joined: Mon May 13, 2019 5:32 pm

Re: 3DS reverse engineering

Post by PSI »

PKCS means Public-Key Cryptography Standards. It's the format used for RSA signatures in the 3DS, which consists of padding, some metadata, and an SHA-256 hash. Probably all of this is useful to avoid brute-force, though I'm not a crypto expert.

The public keys should be more important in everyday 3DS operation. You can use a public key to verify a signature is correct, and it doesn't hurt for those keys to be public as you can't make your own signatures with them.
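
A toy Python illustration of that point, using the classic textbook parameters (far too small for real use, and without any padding): signing needs the private exponent, verifying needs only the public one.

```python
# Toy RSA parameters, for illustration only
p, q = 61, 53
n = p * q                            # modulus (public)
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (needs Python 3.8+)


def sign(message):
    """Raw RSA signature: requires the private exponent d."""
    return pow(message, d, n)


def verify(message, signature):
    """Raw RSA verification: requires only the public values e and n."""
    return pow(signature, e, n) == message


sig = sign(65)
print(verify(65, sig), verify(66, sig))
```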
profi200
Posts: 66
Joined: Fri May 10, 2019 4:48 am

Re: 3DS reverse engineering

Post by profi200 »

nocash wrote: Fri Dec 20, 2019 12:32 pm Btw. is it already known how to enable the Infrared LED for the New3DS internal camera?

If it isn't... is there a known way to see if the LED is on or off (other than wiring a multimeter to the LED pins)?
- Does the LED emit visible light through the 'black' plastic cover?
- If not, does it emit visible light when removing the 'black' plastic cover?
- Does the camera capture pictures with different brightness when covering the LED?
I can't easily test that with my broken console and japanese gui... but I guess everyone else could easily see if the LED emits visible light (best try in a dark place, of course) (either when using the camera, or when playing a stereoscopic game with head-tracking).

And then there are that "Programmable Infrared Transmitter (PIT)" entries in the HWCAL file. Whatever that is...
- IrDA receiver/transmitter related? The IrDA chip doesn't seem to have calibration registers though.
- IrDA might be wired to an external LED amplifier, maybe that can be calibrated somehow.
- Or maybe related to the IR LED for the camera (or the other way around: the IR sensitivity of the camera).
You can see the IR LED with pretty much any crap camera. In the dark there is also a visible glow (these LEDs emit a little bit of visible light). No idea how to enable it.

nocash wrote: Fri Dec 27, 2019 8:33 am I have rev-engineered the PXI_SYNC register aka IPC_SYNC register, with new descriptions for bit29 and bit30:

Interesting, so there is another sync bit, and the unknown IRQ fires when writing 1. Guess: this may be intended for TWL/AGB_FIRM to inform the ARM11 when the ARM7/9 talk to each other, but I'm not sure.

nocash wrote: Sun Jan 05, 2020 10:51 pm RSA hardware is now more or less rev-engineered on hardware level. The old docs did have "big-endian" confused with "little-endian" byte order, plus unspecified "normal" and "reversed" word orders, and very little info on data aborts and slotcnt/slotsize registers... I have rewritten pretty much everything in the RSA chapter (except for some obscure notes at the bottom of the document, like "if 2 divides mod", I am not sure what that could mean... or if it is important for anything... it was probably only relevant for early speculations on how to dump the bootrom keys).

I have also measured some RSA timings for 100h-byte and 80h-byte keys. The hardware acceleration is quite slow:
200ms for 100h-byte private keys doesn't seem to be fit for use... even when doing only a single calculation.
3.3ms for 100h-byte public keys might be usable... but I wouldn't be surprised if Nintendo is doing dozens or hundreds of public key checks during booting... which might explain why the 3DS is booting like crap : /

Now I am wondering if the ARM11 would be faster when doing RSA by software without using the RSA hardware. Hmmm, the RSA software function that I am using on PC contains a 64bit/32bit divide opcode... that could have worked on DSi, but 3DS seems to have erased all integer division support (and the new floating point unit supports only max 32bit integers)... so porting that code would require some code changes.

Private key operations have always been slower. It's just how it works. And I doubt software RSA has any advantage because that code just clogs the caches, takes code space and eats CPU time it could spend on other things (with extra hardware, CPU and RSA stuff can run in parallel). I hope the DSi measurements account for the timer clock being half of the 3DS ARM9 timers.
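
A rough way to see why, assuming plain square-and-multiply exponentiation (what the RSA hardware actually does internally isn't known here): the number of modular multiplications depends on the bit length of the exponent and on how many bits are set. The 2048-bit exponent below is a made-up bit pattern, not a real key.

```python
def modmul_count(exponent):
    """Modular multiplications needed by left-to-right square-and-multiply."""
    squarings = exponent.bit_length() - 1
    multiplies = bin(exponent).count("1") - 1
    return squarings + multiplies


public_exp = 0x10001
private_like_exp = int("10" * 1024, 2)   # hypothetical 2048-bit exponent

print(modmul_count(public_exp))
print(modmul_count(private_like_exp))
```

Under that model the private operation needs roughly 180 times as many multiplications. The measured ratio (200ms versus 3.3ms, about 60x) is smaller, so this is at best a rough picture.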

And yeah, what PSI said. PKCS is a standard for signature padding. You can see on the 3DS what happens if you don't follow the standard correctly... *cough* sighax *cough*
For signatures you want deterministic padding to prevent brute force attacks. PKCS signatures contain some fixed padding and ASN.1 data which tell you where in the signature the hash is located (it's probably always at the end but you need to parse it anyway for security).
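
As a sketch of what such a padded block looks like, here is the standard PKCS #1 v1.5 signature encoding for SHA-256 in Python. The layout and the DigestInfo prefix are the generic ones from the PKCS #1 spec; whether every 3DS signature matches this byte for byte is not claimed here, and the function name is made up.

```python
import hashlib

# DER-encoded ASN.1 DigestInfo prefix for SHA-256 (as listed in PKCS #1)
SHA256_DIGESTINFO = bytes.fromhex("3031300d060960864801650304020105000420")


def pkcs1_v15_encode(message, em_len=0x100):
    """Build 00 01 FF..FF 00 || DigestInfo || SHA-256(message)."""
    digest = hashlib.sha256(message).digest()
    t = SHA256_DIGESTINFO + digest
    padding = b"\xff" * (em_len - len(t) - 3)   # deterministic FFh padding
    return b"\x00\x01" + padding + b"\x00" + t


em = pkcs1_v15_encode(b"example")
print(len(em), em[:4].hex(), em[-32:].hex())
```

A verifier would compute the raw RSA result with the public key and compare it against this whole block, not just the trailing hash.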



PSI:
Can you come online on IRC? There is something we want to ask you.
nocash
Posts: 1405
Joined: Fri Feb 24, 2012 12:09 pm

Re: 3DS reverse engineering

Post by nocash »

Okay, PKCS means Public-Key Cryptography Standards, and that OID stuff is called ASN.1. Yeah, that seems to pre-date the RFC 4880 OpenPGP specs that I had referred to.
The advantage of the OpenPGP specs is that they contain some kind of eye-catching tables; the PKCS specs are looking so abstract that I would almost prefer to have never heard of them ; )

Good to know that there's a visible glow on the IR LED... then it would be unnecessary (and pretty stupid) to wire a multimeter to it. I have tried to toggle a bunch of bits here and there to try to light the LED, but nothing happened so far. Best would be probably to trace the LED wire to see which chip it is connected to.

My guess for the two SYNC IRQs would have been that they were intended for sending separate IRQs to the two ARM11 cores. If so, they seem to have dropped that idea in New3DS (at least I haven't found extra IRQs for the QuadCore).
Something related to ARM7/ARM9 might be also possible, I have no idea if ARM7/ARM9 is sharing the same sync IRQ signals as used by ARM9/ARM11.

The RSA timings are all measured on 3DS hardware in 3DS mode (I have only called the 80h-byte keys "DSi-style" because the DSi did use keys of that size) (note that the 3DS can compute 80h-byte keys in two ways: With 80h-byte size (fast), or with leading zeroes, padded to 100h-byte size (gives the same result, but works almost 4x slower)).
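
That both ways give the same result is simply because leading zeroes don't change the numeric value. A small Python check with made-up numbers (the modulus below is an arbitrary odd 80h-byte value, not a real key):

```python
def rsa_raw(data_bytes, exponent, modulus_bytes):
    """Raw modular exponentiation on big-endian byte strings."""
    data = int.from_bytes(data_bytes, "big")
    modulus = int.from_bytes(modulus_bytes, "big")
    return pow(data, exponent, modulus)


mod80 = ((1 << 1023) + 0x1234567).to_bytes(0x80, "big")   # hypothetical modulus
data80 = (0xDEADBEEF << 512).to_bytes(0x80, "big")        # hypothetical data

# same values, zero-padded to 100h bytes
mod100 = b"\x00" * 0x80 + mod80
data100 = b"\x00" * 0x80 + data80

print(rsa_raw(data80, 0x10001, mod80) == rsa_raw(data100, 0x10001, mod100))
```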
nocash
Posts: 1405
Joined: Fri Feb 24, 2012 12:09 pm

Re: 3DS reverse engineering

Post by nocash »

And some more info on SHA Registers. Apart from general clean-up, new details are...
That the length is being counted in byte-units, and that (and when) the length is automatically being updated by hardware... and info on how to write single bytes to the FIFO (eg. needed & working when length is odd).
There are still some unknown corner cases, but at least they are now properly question-marked... and of course one should rather avoid corner cases like writing when FIFO full anyways.
I don't know if there's support for 32bit writes when old fifo-content has odd length, that might be useful - if it should be supported.
For the "reportedly "?" (but actually: always 0)" bits, I guess that info could be completely removed... unless there are known cases where hardware or software is using those bits?
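
The byte-write rule for SHA_FIFO (the lower two address bits must match the number of bytes written so far, while the word address is don't care) can be sketched as a write plan in Python. The function is a hypothetical helper that only lists the writes; it uses 32bit writes for whole words and 8bit writes for the tail.

```python
def fifo_write_plan(data):
    """Return (fifo_offset, width_in_bytes, value) tuples for one message."""
    plan = []
    pos = 0
    # whole words: word address is don't care, so always use FIFO+0
    while len(data) - pos >= 4:
        word = int.from_bytes(data[pos:pos + 4], "little")  # 1st byte in bit0-7
        plan.append((0, 4, word))
        pos += 4
    # trailing bytes: low two address bits = number of bytes already written
    while pos < len(data):
        plan.append((pos & 3, 1, data[pos]))
        pos += 1
    return plan


for offset, width, value in fifo_write_plan(bytes(range(1, 8))):
    print(f"write{width * 8:<2} FIFO+{offset} = {value:#x}")
```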

Code: Select all

3DS Crypto - SHA Registers
  1000A000h 4     ARM9        SHA_CNT           ;\
  1000A004h 4     ARM9        SHA_BLKCNT        ; for ARM9
  1000A040h 20h   ARM9        SHA_HASH          ;
  1000A080h 40h   ARM9        SHA_FIFO          ;/
  10101000h 4     ARM11/ARM9  SHA_CNT           ;\for ARM11 (some registers
  10101004h 4     ARM11/ARM9  SHA_BLKCNT        ; can be also accessed by ARM9,
  10101040h 20h   ARM11/ARM9  SHA_HASH          ; but FIFO and DMA DRQs are
  10301000h 40h   ARM11       SHA_FIFO          ;/working for ARM11 only)

1000A000h/10101000h - SHA_CNT - SHA Control (R/W)
  0     Read: IN_FIFO full   (0=No/ready, 1=Full/Busy)                      (R)
        Write: First round   (0=No change, 1=Reset BLKCNT and HASH)         (W)
  1     Final round          (0=No/ready, 1=Enable/Busy)                  (R/W)
  2     IN_FIFO DMA Enable   (0=Disable, 1=Enable CDMA DRQ 0Bh)           (R/W)
  3     Byte order of Result (0=Little endian, 1=Big endian/Standard)     (R/W)
  4-5   Mode                 (0=SHA256, 1=SHA224, 2=3=SHA1)               (R/W)
  6-7   Unused (0)  ;reportedly "?" (but actually: always 0)                (?)
  8     OUT_FIFO Enable      (0=No, 1=Readback Mode)           ;\optional (R/W)
  9     OUT_FIFO Status      (0=Empty, 1=Non-empty)            ; readback   (R)
  10    OUT_FIFO DMA Enable  (0=Disable, 1=Enable CDMA DRQ 0Ch);/         (R/W)
  11-15 Unused (0)                                                          (-)
  16-17 Unused (0)  ;reportedly "?" (but actually: always 0)                (?)
  18-31 Unused (0)                                                          (-)
The optional readback mode allows reading back each 40h-byte block from the
FIFO; this can reduce memory reads (and temporary memory writes), for example:
  EMMC --> SHA --> AES --> Memory        ;saves 1xMemWrite and 2xMemRead
  EMMC --> AES --> SHA --> Memory        ;saves 1xMemRead
However, the readback mode is reportedly slow, and it may be faster to use an
extra memory read instead of readback(?)

1000A004h/10101004h - SHA_BLKCNT - SHA Input Length (R/W)
  0-31  Length in bytes (0..FFFFFFFFh)
The length is automatically updated by hardware:
  Length is reset to zero when setting SHA_CNT.bit0 (start).
  Length increments by 40h after each 40h-byte FIFO block.
  Length increments by remaining FIFO size after setting SHA_CNT.bit1 (final).
The hardware does automatically append the length value (and some padding bits)
to the data stream before computing the final result.

1000A040h/10101040h - SHA_HASH - State/Result (20h bytes) (R/W)
Contains the SHA state/result. The word order is fixed, the byte order (per
32bit word) depends on SHA_CNT.bit3.
Setting SHA_CNT.bit0 does automatically apply the following initial values:
 SHA256 6A09E667,BB67AE85,3C6EF372,A54FF53A,510E527F,9B05688C,1F83D9AB,5BE0CD19
 SHA224 C1059ED8,367CD507,3070DD17,F70E5939,FFC00B31,68581511,64F98FA7,BEFA4FA4
 SHA1   67452301,EFCDAB89,98BADCFE,10325476,C3D2E1F0,0       ,0       ,0
The values are updated after each 40h-byte FIFO block, and updated once more
after final round.
SHA1 leaves the last 3 words unused. SHA224 and SHA256 do internally use all 8
words (but the last word is usually omitted when reading the SHA224 result).

1000A080h/10301000h - SHA_FIFO (40h bytes) - SHA_IN (W) and SHA_OUT (R)
  0-7   1st byte        ;\
  8-15  2nd byte        ; data to be checksummed
  16-23 3rd byte        ;
  24-31 4th byte        ;/
The FIFO is mapped to a 40h-byte area at FIFO+0..3Fh. The word address is don't
care (one can write all words to FIFO+0, or to FIFO+0,4,8,..,3Ch).
However, 8bit/16bit writes do REQUIRE the lower two address bits to match up
with the number of previously written bytes (eg. byte writes must go to
FIFO+0,1,2,3,4,5,..,3Fh or FIFO+0,1,2,3,0,1,..,3). Writing 8bit/16bit is
normally needed only for the last block before setting final flag (and only if
the length isn't a multiple of 40h). Writing 8bit/16bit may be also needed if
the data comes from an odd source address (but that would slow down everything).

Invalid Operations with/without Data Abort
  Reading FIFO when CNT.bit8=0 returns ZERO (readback disabled)
  Reading FIFO when CNT.bit8=1 and FIFO empty causes Data Abort (enabled+empty)
Untested...
  Writing FIFO when FIFO full... is ignored? or data abort?
  Writing 32bit when FIFO content is odd (not N*4 bytes)... causes what?
  Writing 32bit when FIFO already contains 3Dh..3Fh bytes... causes what?
I haven't measured SHA timings (nor timings for SHA interacting with other hardware). But added a note that the FIFO readback mode could be slower than separate transfers.
If it is actually slower... I guess that could depend on whether SHA is done before or after AES (ie. for the two examples mentioned in above doc, one case would save 1xMemRead, and the other (probably rarer) case would save 2xMemRead+1xMemWrite).
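
For the SHA_BLKCNT behaviour described in the register notes above, here is a small Python model of the length counter, together with the standard SHA-1/SHA-224/SHA-256 padding that the hardware presumably appends on the final round. The class and function names are invented for this sketch; the padding format is the generic one from the SHA spec, not something read back from the hardware.

```python
class ShaLengthModel:
    """Models how SHA_BLKCNT changes, per the register description."""

    def __init__(self):
        self.blkcnt = 0   # SHA_BLKCNT, length in bytes
        self.fifo = 0     # bytes currently waiting in the FIFO

    def start(self):      # setting SHA_CNT.bit0
        self.blkcnt = 0
        self.fifo = 0

    def write(self, nbytes):
        self.fifo += nbytes
        while self.fifo >= 0x40:      # each full 40h-byte block
            self.fifo -= 0x40
            self.blkcnt += 0x40

    def final(self):      # setting SHA_CNT.bit1
        self.blkcnt += self.fifo      # remaining FIFO size
        self.fifo = 0


def sha_padding(length_bytes):
    """Standard padding: 80h, zeroes, then the 64-bit big-endian bit count."""
    zeroes = (55 - length_bytes) % 64
    return b"\x80" + b"\x00" * zeroes + (length_bytes * 8).to_bytes(8, "big")


m = ShaLengthModel()
m.start()
m.write(100)
print(hex(m.blkcnt))                 # after one full block
m.final()
print(m.blkcnt, len(sha_padding(m.blkcnt)))
```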