Reformatting GBATEK html version

Discussion of development of software for any "obsolete" computer or video game system.
nocash
Posts: 1258
Joined: Fri Feb 24, 2012 12:09 pm

Post by nocash » Wed Oct 21, 2020 11:47 pm

profi200 wrote:
Sat Oct 17, 2020 8:05 am
Recommendation (not exactly related to 3DS reverse engineering):
Split gbatek into multiple html files. It has become so big that lower end devices struggle with the site, for example older ARMv7 tablets/smartphones, or a Raspberry Pi.
It even took seconds to load on a relatively modern i7 machine when I still had ADSL (I recently got my connection upgraded to 100 Mbit/s VDSL).
Yes, the html version is very slow and doesn't look pretty. I am mostly using the source code (about equivalent to viewing "gbatek.txt" in your favorite text editor), and the no$gba.exe help engine (for having bold headlines and hyperlinks). But I've been wondering about making a version with multiple-htm-files for a while, too. And also about a few changes...

New version with multiple .htm files
Just started working on that today. Current plan is simply making a separate page for each chapter (for each section with headline on gray background). Alternatively, I could group all Video chapters or all Sound chapters into larger pages, but I lean towards one chapter per page.
Apart from being faster to load, I hope that the pages will also get more hits from internet search engines (which currently more or less ignore gbatek, probably because they consider it too large to match specific search requests).
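A minimal sketch of the one-page-per-chapter split, written in Python and not based on how the no$gba help engine actually generates its output. It assumes (hypothetically) that a chapter headline in the plain-text export is a non-blank line directly followed by a line of dashes; the real gbatek.txt markup may differ, and a table whose header row is underlined with dashes would be misread as a chapter. The function names and slug-style file names are made up for illustration.

```python
from __future__ import annotations

import html
import re


def split_chapters(text: str) -> list[tuple[str, str]]:
    """Split plain text into (title, body) pairs.

    ASSUMPTION: a headline is a non-blank line followed by a line of 3+ dashes.
    Text before the first headline is dropped.
    """
    lines = text.splitlines()
    chapters: list[tuple[str, list[str]]] = []
    i = 0
    while i < len(lines):
        underline = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if lines[i].strip() and re.fullmatch(r"-{3,}", underline):
            chapters.append((lines[i].strip(), []))
            i += 2  # skip the headline and its dashed underline
            continue
        if chapters:
            chapters[-1][1].append(lines[i])
        i += 1
    return [(title, "\n".join(body).strip("\n")) for title, body in chapters]


def page_name(title: str, taken: set[str]) -> str:
    """Derive a unique file name such as 'gba-sound.htm' from a chapter title."""
    base = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "chapter"
    name, n = f"{base}.htm", 2
    while name in taken:
        name, n = f"{base}-{n}.htm", n + 1
    taken.add(name)
    return name


def build_pages(text: str) -> dict[str, str]:
    """Return {file name: HTML}: one page per chapter plus an index.htm."""
    chapters = split_chapters(text)
    taken = {"index.htm"}  # reserve the name of the contents page
    names = [page_name(title, taken) for title, _ in chapters]
    pages = {}
    for n, ((title, body), name) in enumerate(zip(chapters, names)):
        nav = ['<a href="index.htm">Contents</a>']
        if n > 0:
            nav.append(f'<a href="{names[n - 1]}">Prev</a>')
        if n + 1 < len(names):
            nav.append(f'<a href="{names[n + 1]}">Next</a>')
        pages[name] = (
            f"<html><head><title>{html.escape(title)}</title></head><body>\n"
            f"{' | '.join(nav)}\n<h2>{html.escape(title)}</h2>\n"
            f"<pre>{html.escape(body)}</pre>\n</body></html>\n"
        )
    items = "\n".join(
        f'<li><a href="{name}">{html.escape(title)}</a></li>'
        for (title, _), name in zip(chapters, names)
    )
    pages["index.htm"] = f"<html><body><ul>\n{items}\n</ul></body></html>\n"
    return pages
```

Title-based file names stay stable when chapters are inserted elsewhere, which may help with search engine links; numbered names would be simpler but would shift around between releases.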

Old version with single .txt/.htm file
I'll keep hosting the old whole-document-in-one-file, too (and you could also keep generating it yourself using the "Save as help.txt/htm" function in no$gba help engine).
http://problemkaputt.de/gbatek.txt - text 3.8Mbyte
http://problemkaputt.de/gbatek.htm - html 4.7Mbyte
I am wondering what is making the page loading so slow. It might be my server being not the fastest one, or the network transfer speed, or browsers just being unable to display large documents. The first two issues could be avoided by saving a local copy of the htm/txt file.

Width Limit?
How would one limit the html page width to 80 characters? The various tables are using a nonproportional font with max 80 chars per line, and the line wrapping for the remaining text should be matched to that width (currently it tends to be 200+ chars/line when using a fullscreen browser window, which doesn't look nice).

Search Function?
Is there some freeware search engine for searching keywords/strings in multiple html files (eg. in /gbatek/*.htm), using php or so?

Blind People?
I don't know if there are many blind people interested in video game hardware, but just in case: What could/should be improved? I am aware of some possible issues:
The headlines are currently merely using large font size (or bold text for sub-headers), I am not sure if it's a real problem, but I guess they should be ideally marked as html headlines?
The tables are just plain txt without row/column markers (I think that's useful when copying chapters into text editors, eg. as source code comments), but for blind people it's probably almost impossible to keep track of the column widths. I am wondering if html tables with row/column markers are much easier to read though. Reading two-dimensional tables via voice-output is probably quite confusing.
Well, and then there are some ascii-art drawings, that's probably about as bad as GIFs or JPEGs.

VG Wort Links
I am planning to add VG Wort links to the document, which will hopefully get me a little money per view. The money comes from copyright fees charged on selling storage media like hard disks, CD-Rs, SD cards, etc. in Germany (accordingly, it only works for German authors and views from German readers; the money comes from the above fees, so you won't have to pay anything or register/login when reading the html document).
Counting the number of views is done by linking to a 1x1 pixel image on the vgwort.de server. I haven't heard anybody complaining about that method yet.
Concerning privacy, it should be anonymous (and even if there would be a data leak, a list of IP addresses that have accessed gbatek is probably not so harmful).
The other issue is that the browser may be trying to download the pixel during offline reading, ie. bugging you to go online, or even automatically going online. The latter might be a problem if your internet provider is charging money per traffic or online time. Is there a way around that? Like adding a "skip-if-offline" flag to the vgwort-link?
homepage - patreon - you can think of a bit as a bottle that is either half full or half empty

tepples
Posts: 22141
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Reformatting GBATEK html version

Post by tepples » Thu Oct 22, 2020 9:07 am

nocash wrote:
Wed Oct 21, 2020 11:47 pm
I am wondering what is making the page loading so slow. It might be my server being not the fastest one, or the network transfer speed, or browsers just being unable to display large documents. The first two issues could be avoided by saving a local copy of the htm/txt file.
In my experience, it's not the server, and it's not the network. I think it's just web browsers not optimized for the use case of displaying and scrolling through 3 megabytes of text. Most HTML documents are far shorter than that. Another complaint I also see in a Discord server about Game Boy and Game Boy Advance homebrew development is that loading GBATEK blocks rendering the GBA stuff until the browser has finished parsing a bunch of DSi and 3DS stuff in which they're not interested. Would you say the 3DS in 3DS mode is almost as different from the GBA and DS as the GBA is from the GB (despite GB mode) or the Super NES is from the GB (despite the SGB accessory)?
nocash wrote:
Wed Oct 21, 2020 11:47 pm
Width Limit?
How would one limit the html page width to 80 characters? The various tables are using a nonproportional font with max 80 chars per line, and the line wrapping for the remaining text should be matched to that width (currently it tends to be 200+ chars/line when using a fullscreen browser window, which doesn't look nice).
The typical way is to use CSS to set a maximum width on the body element and horizontally center it within its parent (that is, within the html element).

/* Something like this inside your <style> element should get you started. */
body { max-width: 40em; margin: 0 auto }
nocash wrote:
Wed Oct 21, 2020 11:47 pm
Search Function?
Is there some freeware search engine for searching keywords/strings in multiple html files (eg. in /gbatek/*.htm), using php or so?
Looking at the "search" page on the IndieWeb wiki, I'd consider many of the solutions listed in the "site search with site backend - level 4" section overkill for a small static site. At a previous job, I implemented a full-text search engine over thousands of product description pages in PHP.
nocash wrote:
Wed Oct 21, 2020 11:47 pm
Blind People?
I don't know if there are many blind people interested in video game hardware, but just in case: What could/should be improved? I am aware of some possible issues:
The headlines are currently merely using large font size (or bold text for sub-headers), I am not sure if it's a real problem, but I guess they should be ideally marked as html headlines?
Proper use of headline elements helps users and assistive devices infer your document's outline.
Use h1 through h6 to mark section titles at different depths in the outline, or, equivalently, use section elements to mark nested sections and give each one an h1 title. It might even let software generate a table of contents for you, if your source code isn't already doing that.
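To illustrate the table-of-contents point, here is a small sketch using Python's standard html.parser module: it collects h1 to h6 elements (with their optional id attributes) and nests them into ul/li lists by heading level. The class and function names are illustrative, and a skipped level (say h1 followed directly by h3) is simply nested one step deeper rather than padded out.

```python
from __future__ import annotations

import html
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, id, text) for every h1..h6 element."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default: &amp; arrives as &
        self.headings: list[tuple[int, str | None, str]] = []
        self._open: tuple[int, str | None, list[str]] | None = None

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._open = (int(tag[1]), dict(attrs).get("id"), [])

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == f"h{self._open[0]}":
            level, anchor, parts = self._open
            self.headings.append((level, anchor, "".join(parts).strip()))
            self._open = None


def build_toc(document: str) -> str:
    """Return nested <ul> markup mirroring the heading outline of document."""
    parser = HeadingCollector()
    parser.feed(document)
    parser.close()
    out: list[str] = []
    stack: list[int] = []  # heading level of each currently open <ul>
    for level, anchor, text in parser.headings:
        if not stack or level > stack[-1]:
            out.append("<ul>")  # sub-list goes inside the still-open <li>
            stack.append(level)
        else:
            out.append("</li>")
            while len(stack) > 1 and level <= stack[-2]:
                out.append("</ul></li>")  # climb back out of deeper lists
                stack.pop()
            stack[-1] = level
        label = html.escape(text)
        if anchor:
            label = f'<a href="#{html.escape(anchor)}">{label}</a>'
        out.append(f"<li>{label}")
    if stack:
        out.append("</li>" + "</ul></li>" * (len(stack) - 1) + "</ul>")
    return "".join(out)
```

Headings without an id still show up in the list, just without a link; giving every h1/h2 an id would make all entries clickable.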
nocash wrote:
Wed Oct 21, 2020 11:47 pm
The tables are just plain txt without row/column markers (I think that's useful when copying chapters into text editors, eg. as source code comments), but for blind people it's probably almost impossible to keep track of the column widths. I am wondering if html tables with row/column markers are much easier to read though.
A screen reader uses the table, tr, th, and td elements to figure out what the row and column context for each cell is. So does a visual web browser running on a small screen, such as a mobile phone display that can't show much more on each line than the 40 columns of C64 or MSX text mode. As for automated conversion from preformatted text to actual table cells, I'd recommend starting by widening the gutter between cells in each table to 2 spaces. This way a program can determine what is and isn't a table cell. This should let you keep the tables as preformatted text and then convert that to HTML each time the HTML version is rebuilt, though it might be tricky to specify when a single row spans multiple lines of text.
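A minimal sketch of the 2-space-gutter idea in Python. The function name and the choice to treat the first row as a header (marked th with scope="col") are assumptions, not part of the suggestion above. As noted, this naive per-line split can't represent blank cells or rows that wrap onto several lines.

```python
import html
import re

GUTTER = re.compile(r" {2,}")  # assumed convention: 2+ spaces separate cells


def pre_table_to_html(block: str, header: bool = True) -> str:
    """Convert a preformatted text table into an HTML table.

    Each non-blank line becomes one row; cells are split at runs of 2+ spaces,
    so single spaces inside a cell ("BG Mode") are kept. Blank cells and rows
    that wrap onto several lines are NOT handled.
    """
    rows = [GUTTER.split(line.strip()) for line in block.splitlines() if line.strip()]
    out = ["<table>"]
    for i, cells in enumerate(rows):
        if header and i == 0:
            row = "".join(f'<th scope="col">{html.escape(c)}</th>' for c in cells)
        else:
            row = "".join(f"<td>{html.escape(c)}</td>" for c in cells)
        out.append(f"<tr>{row}</tr>")
    out.append("</table>")
    return "\n".join(out)
```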
nocash wrote:
Wed Oct 21, 2020 11:47 pm
Well, and then there are some ascii-art drawings, that's probably about as bad as GIFs or JPEGs.
SVG is preferred for diagrams, ideally with alt attributes for short substitutes and D-links to longer substitutes. I remembered that a lot of ASCII art diagrams on wiki.nesdev.com are data flow diagrams, which led me to search the web for data flow diagram accessibility. This led me to "Making Data Flow Diagrams Accessible for Visually Impaired Students Using Excel Tables" by Vicki L. Sauter (PDF) from the 1Q 2015 issue of Journal of Information Systems Education. What it says about Excel tables you could apply to HTML table elements.
nocash wrote:
Wed Oct 21, 2020 11:47 pm
VG Wort Links
I am planning to add VG Wort links to the document, which will hopefully get me a little money per view. The money comes from copyright fees
I guess this raises a question of how much of the NO$GMB version of Pan Docs is Pan's and how much is yours. I've noticed that the GB homebrew community has forked the document on https://gbdev.io/pandocs/, and I think it'd be for the best to figure out where credit is due.
nocash wrote:
Wed Oct 21, 2020 11:47 pm
The other issue is that the browser may be trying to download the pixel during offline reading, ie. bugging you to go online, or even automatically going online. The latter might be a problem if your internet provider is charging money per traffic or online time.
Such as a mobile device going out of Wi-Fi range and switching over to an LTE network that charges $10 per gigabyte. Web browsers provide "data saver" and "tracking protection" options for users who frequently run into this.

I'll notify the GBA homebrew community of this discussion so they can weigh in.

nin-kuuku
Posts: 28
Joined: Tue Jan 24, 2017 1:23 am

Re: Reformatting GBATEK html version

Post by nin-kuuku » Thu Oct 22, 2020 9:58 am

I'm testing gbatek.htm with my ARMv4+ARMv5 dumbphone (sometimes called Nintendo DS) and it is indeed somewhat slow. But it's still faster than this thread. I think new browsers just don't know what to do with actual content. More Flash animations needed?

profi200
Posts: 49
Joined: Fri May 10, 2019 4:48 am

Re: Reformatting GBATEK html version

Post by profi200 » Fri Oct 23, 2020 7:20 am

nocash wrote:
Wed Oct 21, 2020 11:47 pm
Old version with single .txt/.htm file
I'll keep hosting the old whole-document-in-one-file, too (and you could also keep generating it yourself using the "Save as help.txt/htm" function in no$gba help engine).
http://problemkaputt.de/gbatek.txt - text 3.8Mbyte
http://problemkaputt.de/gbatek.htm - html 4.7Mbyte
I am wondering what is making the page loading so slow. It might be my server being not the fastest one, or the network transfer speed, or browsers just being unable to display large documents. The first two issues could be avoided by saving a local copy of the htm/txt file.
I doubt it's the server. But a slow connection increases initial loading time a little. I'm on Firefox 82 now and have force-enabled WebRender (it's basically full GPU acceleration), and once the site is loaded it's fine. On lower end devices it's noticeably slower in both initial loading and navigation. Modern browsers are probably not made for this amount of html in one chunk, but x86(_64) CPUs are so fast these days that it doesn't really show on these machines.
nocash wrote:
Wed Oct 21, 2020 11:47 pm
Search Function?
Is there some freeware search engine for searching keywords/strings in multiple html files (eg. in /gbatek/*.htm), using php or so?
Yeah, this may be needed for a split version. With the all-in-one version you can just use the browser's search feature.
nocash wrote:
Wed Oct 21, 2020 11:47 pm
VG Wort Links
I am planning to add VG Wort links to the document, which will hopefully get me a little money per view. The money comes from copyright fees charged on selling storage media like hard disks, CD-Rs, SD cards, etc. in Germany (accordingly, it only works for German authors and views from German readers; the money comes from the above fees, so you won't have to pay anything or register/login when reading the html document).
Counting the number of views is done by linking to a 1x1 pixel image on the vgwort.de server. I haven't heard anybody complaining about that method yet.
Concerning privacy, it should be anonymous (and even if there would be a data leak, a list of IP addresses that have accessed gbatek is probably not so harmful).
The other issue is that the browser may be trying to download the pixel during offline reading, ie. bugging you to go online, or even automatically going online. The latter might be a problem if your internet provider is charging money per traffic or online time. Is there a way around that? Like adding a "skip-if-offline" flag to the vgwort-link?
I'm not sure this will work. You have some stuff on gbatek that is of questionable origin (from the TWL SDK). And as mentioned by others many browsers block stuff like hidden pixels because it's considered tracking (even if they promise to keep it anonymous you can never know). Ad-blockers are also common now and once you land on one of the blocking lists hits will plummet.

nitro2k01
Posts: 229
Joined: Sat Aug 28, 2010 9:01 am

Re: Reformatting GBATEK html version

Post by nitro2k01 » Sat Oct 24, 2020 3:25 am

I've been hosting a "wiki-fied" version of Pan Docs since about 2008. However, I later started regretting this decision, and took the initiative to put a version of Pan Docs under the control of the gbdev organization on Github, which is run by avivace, me and a bunch of other people. What we do there is store the "source code" of the document as Markdown, a format that's meant to be readable as plain text. If you put a row of equals or minus characters under a line, it becomes a header. If you put asterisks around a word, it becomes italic; double asterisks make it bold. And so on. It's meant to be both human readable as text, and machine readable. This is then used to output HTML, although it could be converted to almost any format using a tool (ironically and unrelatedly) called Pandoc.
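To make those Markdown rules concrete, here is a deliberately tiny Python sketch covering just the ones mentioned: underlined headers ("===" for a top-level header, "---" for a second-level one), *italic* and **bold**. It is only an illustration and says nothing about how Pandoc works internally; real Markdown (e.g. CommonMark) has many more rules, and this toy treats only the single line directly above an underline as the header.

```python
import html
import re


def render_inline(text: str) -> str:
    """Escape HTML, then turn **bold** and *italic* into tags (bold first)."""
    text = html.escape(text, quote=False)
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    return re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)


def mini_markdown(source: str) -> str:
    """Toy subset: setext headings ('===' -> h1, '---' -> h2) and paragraphs.

    Not a real Markdown implementation; use Pandoc or a CommonMark library for that.
    """
    lines = source.splitlines()
    out = []
    para = []  # lines of the paragraph currently being collected

    def flush():
        if para:
            out.append("<p>" + render_inline(" ".join(para)) + "</p>")
            para.clear()

    i = 0
    while i < len(lines):
        line = lines[i].strip()
        nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if line and nxt and set(nxt) in ({"="}, {"-"}):
            flush()
            tag = "h1" if nxt[0] == "=" else "h2"
            out.append(f"<{tag}>{render_inline(line)}</{tag}>")
            i += 2  # skip the text line and its underline
            continue
        if line:
            para.append(line)
        else:
            flush()  # a blank line ends the paragraph
        i += 1
    flush()
    return "\n".join(out)
```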

I don't know if you are intent on using your own tools, but maybe this would be something to consider? Both the part about using more standard tooling to do the conversion from a source format like Markdown, and making it more of a community project. Maybe doing that would run the risk of wasting your time dealing with "people problems" instead of doing actual research, though.

I think Pandocs fits comfortably into a single document. In Pandocs, I often navigate by simply searching with ctrl+f in the web browser. In GBATEK, the document is far too big and searching that way can easily become sluggish. I think it should definitely be split up into smaller sections. I still personally prefer the "single document" layout as a concept, but maybe splitting it up as GBA/NDS/DSi/3DS/CPU reference would be a good compromise. With that said, I don't program for these machines (only for GB) so I rarely have a reason to look at GBATEK.

As for VG Wort, we have something similar in Sweden. It's called the cassette tape tax (or formally privatkopieringsersättning, compensation for private copying) but applies to all storage media sold. In my opinion, this is an immoral tax collected by a private organization. I personally would want nothing to do with it, including collecting money from it. I think a better way (both morally and in terms of actual income) would be to push your Patreon more and also post more actively there. I'm pretty sure there are more than 32 people in the world willing to give you money for the work you do.

calima
Posts: 1238
Joined: Tue Oct 06, 2015 10:16 am

Re: Reformatting GBATEK html version

Post by calima » Sun Oct 25, 2020 1:48 am

FWIW, even fullsnes.htm could use splitting, maybe main system on one page and all the weird extension devices on another.

nocash
Posts: 1258
Joined: Fri Feb 24, 2012 12:09 pm

Re: Reformatting GBATEK html version

Post by nocash » Sun Oct 25, 2020 1:37 pm

tepples wrote:
Thu Oct 22, 2020 9:07 am

body { max-width: 40em; margin: 0 auto }
Thanks! That's about good enough (and easier than expected; I have never used CSS, and thought that it would require an external .css file).
The 40em doesn't exactly translate to 80 fixed-width characters in PRE sections (results are a bit random depending on the default font sizes used by the browser).
I have added a few more style settings for font-size & font-family to avoid that random appearance:

<HTML>
 <HEAD>
  <STYLE>
    body,h2{font-size:12px}
    pre{font-size:11px}
    h1,h2{font-weight:normal;margin:0;display:inline;padding:0}
    body,h1,h2{font-family:Arial, Helvetica, sans-serif}
    pre{font-family:"Courier New", Courier, monospace}
    body{max-width:45em}
  </STYLE>
 </HEAD>
 <BODY>
  <!-- chapter text goes here -->
 </BODY>
</HTML>
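As a rough sanity check on those numbers (an estimate only, assuming Courier New is actually the font used for PRE, where each glyph advances about 0.6 em, and keeping in mind that the em in body's max-width resolves against body's own 12px font size, not the browser default):

```latex
\begin{aligned}
w_{\text{pre}}       &\approx 80 \times 0.6\,\text{em} \times 11\,\text{px/em} = 528\,\text{px} \\
w_{\text{body}}      &= 45\,\text{em} \times 12\,\text{px/em} = 540\,\text{px} \\
w_{\text{body,40em}} &= 40\,\text{em} \times 12\,\text{px/em} = 480\,\text{px} < 528\,\text{px}
\end{aligned}
```

So 80 columns of 11px Courier New fit inside 45em of 12px body text, while 40em would be 48px too narrow. With a different fallback monospace font the 0.6 factor changes, which is one reason results vary with browser defaults.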
Are there any drawbacks when using hardcoded things like "font-size:12px"? I hope it won't hurt people with bad eyes, prevent page zooming, or mess up printing (in case somebody is still using paper documents).
Marking headlines as h1 and h2... it may be good for blind people... but it vandalizes the visual appearance (using weird bold font sizes and inserting blank lines above & below the headline). The above STYLE settings fix those issues, so I guess I could add those h1/h2 tags (it'll look ugly on older pre-CSS browsers, though).
tepples wrote:
Thu Oct 22, 2020 9:07 am
I'd recommend starting by widening the gutter between cells in each table to 2 spaces. This way a program can determine what is and isn't a table cell.
Good idea... but it might run into practical problems... exceeding the 80 column limit, acting weirdly on blank cells, or requiring a stricter format (with a constant number of chars per column). And after conversion, many browsers may be unable to copy/paste the table converted back from htm to txt.
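One possible way around the blank-cell problem, sketched under assumptions: instead of splitting each line on its own, look for character columns that are blank in every line of the table, and treat runs of at least two such columns as gutters. A blank cell then just becomes an empty string. This relies on the columns being consistently aligned (which the fixed-width tables mostly are), and it breaks if any row's text crosses a gutter, e.g. a long comment spilling into the next column. The function names are hypothetical.

```python
from __future__ import annotations


def column_spans(lines: list[str], min_gutter: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) character ranges of table columns.

    A gutter is a run of >= min_gutter positions that are blank in every line;
    narrower blank runs (like the space in 'BG Mode') stay inside a cell.
    """
    width = max((len(line) for line in lines), default=0)
    padded = [line.ljust(width) for line in lines]
    blank = [all(row[c] == " " for row in padded) for c in range(width)]
    spans: list[tuple[int, int]] = []
    start = None
    c = 0
    while c < width:
        if not blank[c]:
            if start is None:
                start = c  # first character of a new column
            c += 1
            continue
        run_end = c
        while run_end < width and blank[run_end]:
            run_end += 1
        if start is not None and run_end - c >= min_gutter:
            spans.append((start, c))  # a wide enough gap closes the column
            start = None
        c = run_end
    if start is not None:
        spans.append((start, width))
    return spans


def split_table(lines: list[str], min_gutter: int = 2) -> list[list[str]]:
    """Cut every line at the shared column positions; blank cells become ''."""
    spans = column_spans(lines, min_gutter)
    return [[line[s:e].strip() for s, e in spans] for line in lines]
```

The resulting rows could then be fed into any table-to-HTML step, with empty strings emitted as empty td cells instead of shifting the remaining columns to the left.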

For the Search Function, I have near zero experience with server script languages, and don't know where to start. Some compact ready for use solution would be nice. It would require loading file(s), searching string(s), and displaying a list of search results with links to the corresponding chapters.
It would be nice if the search function (and results page) could parse html (i.e. remove <tags> and convert &chr; entities to actual characters). Otherwise, it could also search the gbatek.txt version.
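A minimal sketch of those three steps (load files, search strings, list results with links), written in Python rather than PHP; the logic should port to any server-side language. It strips tags crudely with a regex, decodes entities with html.unescape, does a case-insensitive substring search, and renders a result list. It's deliberately naive: no ranking, one hit per page, the contents of style/script blocks are searched too, the text is assumed to be mostly ASCII, and the UTF-8 file encoding is a guess. All function names are made up for this example.

```python
from __future__ import annotations

import glob
import html
import os
import re

TAG = re.compile(r"<[^>]*>")  # crude: matches anything from '<' to the next '>'


def html_to_text(markup: str) -> str:
    """Replace <tags> with spaces, then decode &amp; &lt; &#65; etc."""
    return html.unescape(TAG.sub(" ", markup))


def search_documents(docs: dict[str, str], query: str,
                     context: int = 40) -> list[tuple[str, str]]:
    """Case-insensitive substring search; one (file name, snippet) per matching page."""
    needle = query.lower()
    if not needle:
        return []  # an empty query would otherwise match every page
    hits = []
    for name, markup in sorted(docs.items()):
        text = html_to_text(markup)
        pos = text.lower().find(needle)
        if pos < 0:
            continue
        snippet = text[max(0, pos - context):pos + len(needle) + context]
        hits.append((name, " ".join(snippet.split())))  # collapse whitespace
    return hits


def load_folder(folder: str) -> dict[str, str]:
    """Read every folder/*.htm file (the encoding is an assumption)."""
    docs = {}
    for path in glob.glob(os.path.join(folder, "*.htm")):
        with open(path, encoding="utf-8", errors="replace") as f:
            docs[os.path.basename(path)] = f.read()
    return docs


def results_page(query: str, hits: list[tuple[str, str]]) -> str:
    """Render the hits as an html list linking to each chapter page."""
    items = "\n".join(
        f'<li><a href="{html.escape(name)}">{html.escape(name)}</a>: '
        f"...{html.escape(snippet)}...</li>"
        for name, snippet in hits
    )
    return f"<h2>Search results for {html.escape(query)}</h2>\n<ul>\n{items}\n</ul>"
```

A server script would call load_folder("gbatek") (or read gbatek.txt instead), then return results_page(query, search_documents(docs, query)) for each request. Since the pages only change with new releases, the loaded text could also be cached between requests.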