Auditing your own word censors.

Posted: Fri Jan 10, 2020 3:45 am
by Pokun
[Split from this topic, from which a spam reply was deleted]

What the...?? What kind of filter would replace things by "my mom"!?

Anyway, you can record a checksum of your ROM and, if any other files are included, of the compressed archive, to make sure nothing is altered at any point. To make sure the cartridge was made correctly, I guess you have to dump a copy and compare it against the ROM checksum you recorded.

As for cleaning up unused code, I guess that's fine if the game is tested thoroughly after that. After final testing, the ROM should probably not be changed, or you'll have to do the testing over again.
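
A minimal sketch of that checksum step, in Python. The choice of SHA-256 and the helper names `sha256_of_file` and `matches_recorded` are assumptions for illustration; any checksum you record consistently would serve the same purpose.

```python
import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_recorded(path, recorded_hex):
    """Compare a file (e.g. a cartridge dump) against a recorded digest."""
    return sha256_of_file(path) == recorded_hex.strip().lower()
```

Record the digest of the final ROM once, then run `matches_recorded` on the dumped copy (and on the archive, if you ship one).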

Re: Auditing your own code.

Posted: Fri Jan 10, 2020 6:10 pm
by nocash
The filter does replace "via-gra" with "my mom", and maybe other words, too. That might explain some confusing spam posts that have occurred over the past few years, like people offering to "buy (my mom) online"... it might also shed new light on posts from people saying that they were "building a desktop PC for (my mom)"?

Re: Auditing your own code.

Posted: Fri Jan 10, 2020 8:34 pm
by rainwarrior
cruddy 3V bootleg carts █RX SPAM█?

wow, that kind of silent replacement is pretty fucking disturbing. I do not like this at all.

It'd be one thing if it replaced blocked words with **** or whatever, but making substitutions like that erodes my confidence in all communication.

Re: Auditing your own code.

Posted: Fri Jan 10, 2020 9:23 pm
by tokumaru
I always chuckle when I see spammers talking about their moms here! I love it! I'm pretty sure this filter has been in place for years, but I still don't know which specific words get replaced (in addition to the one that was just mentioned).

Re: Auditing your own code.

Posted: Fri Jan 10, 2020 11:10 pm
by Memblers
That word filter is my fault; it's been in there for ages. It wrecks the spam URLs and was a funny/dumb way of bullying the spammers, in the style of "why do you keep hitting yourself?". The only ones replaced with "my mom" were v_iagra, l_evitra, and c_ialis (I remember now that we did have a problem with it affecting "specialist" at one time, so now it has the wildcard characters removed). There are 14 in total, mostly related to pharmaceuticals and warcraft currencies.

Guess I could change the v_word at least, obviously it was never supposed to affect actual user posts, sorry about that.
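
To illustrate the "specialist" problem mentioned above, here is a rough Python sketch comparing a substring match (roughly what a wildcard on both sides does) with a whole-word match. This is not the forum's actual filter code; the function names are made up for the example.

```python
import re

def censor_substring(text, word, replacement):
    """Replace the word anywhere, even inside longer words."""
    return re.sub(re.escape(word), lambda m: replacement, text,
                  flags=re.IGNORECASE)

def censor_whole_word(text, word, replacement):
    """Replace the word only when it stands alone (word boundaries)."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.sub(pattern, lambda m: replacement, text,
                  flags=re.IGNORECASE)

# The substring version mangles an innocent word:
print(censor_substring("ask a specialist", "cialis", "my mom"))
# The whole-word version leaves it alone but still hits the spam:
print(censor_whole_word("ask a specialist", "cialis", "my mom"))
print(censor_whole_word("buy cialis online", "cialis", "my mom"))
```

The whole-word version corresponds to removing the wildcards: it misses some obfuscated spam, but it stops hitting ordinary words that merely contain the pattern.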

Re: Auditing your own code.

Posted: Fri Jan 10, 2020 11:18 pm
by tepples
Without giving too much away, I'll explain all current word censors in general terms.

- Test patterns used by spambots
- Prescription drugs
- Services to gain an advantage in an MMORPG
- A couple specific reddit posts copied and pasted here, replaced with a sandwich ad
- The URL of one of blargg's old sites, replaced with a newer URL

Re: Auditing your own code.

Posted: Sat Jan 11, 2020 2:38 am
by nocash
I like the replacement, it's quite funny, and it hardly hits non-spam posts. The post about building a PC for my mom was probably meant as written... although some people probably really do build PCs to get paid in special goods ; )
If anything, it's possibly a bit unfair to real moms. Something that isn't a person might work better. My nostrils, half-eaten bananas, used chewing gum, my intimate thoughts about useless things, whatever.
Uhm, but please not ******, that makes me feel uneasy and reminds me of scary people who say "f***ing cool sh**" instead of "fucking cool shit".

Re: Auditing your own code.

Posted: Sat Jan 11, 2020 3:34 am
by Pokun
OK that explains everything, and makes sense considering how popular this place is among spammers. I do remember one time when a spammer was seemingly trying to sell his mom. It was very funny. :)

Re: Auditing your own code.

Posted: Sat Jan 11, 2020 10:58 am
by rainwarrior
Mainly I just want to be able to believe that what someone typed is what I'm looking at. If you wanna tell me that the substitutions hit 0% of real posts, then it's not a problem, but I just have to trust you on it.

tepples: I don't want to know the list of words to try and reverse engineer original messages from substituted ones. :P That just adds another layer of puzzle on top of the comprehension and trust problem. Also shouldn't those words be kept secret anyway, as per their function as an anti-spam factor?

Like whatever secret stuff you gotta do to fight spam is fine. If there's a reason that my mom is more suitable than **** then go ahead and stick with it, but if it's all the same to you I'd much rather know when the text I'm looking at has been altered than not know. I don't know if that's worse for spam security though, and I presume that's best discussed in private anyway.

Re: Auditing your own code.

Posted: Sat Jan 11, 2020 2:12 pm
by tepples
That's why I mentioned categories, to keep the actual words secret. The only category likely to show up in remotely on-topic posts is the substitution for one of blargg's old sites. If it'd help, I could add special punctuation to all substitutions.
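
As a sketch of what "special punctuation" on substitutions could look like, the Python below wraps a note in a marker character so readers can tell the text was altered. The marker, the note text, the function names, and the blocked word `widgetol` are all placeholders for illustration, not the forum's real configuration.

```python
import re

MARK = "\u2588"  # full block character, chosen here only as an example marker

def censor_with_marker(text, word, replacement, note="RX SPAM"):
    """Replace a whole word and append a visible note to the replacement."""
    pattern = r"\b" + re.escape(word) + r"\b"
    marked = f"{replacement} {MARK}{note}{MARK}"
    return re.sub(pattern, lambda m: marked, text, flags=re.IGNORECASE)

def was_altered(text):
    """True if the text carries the marker, i.e. a substitution happened."""
    return MARK in text

print(censor_with_marker("buy widgetol online", "widgetol",
                         "cruddy 3V bootleg carts"))
```

The point of the marker is only that a reader can see a substitution happened, without learning which word triggered it.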

Re: Auditing your own code.

Posted: Sun Jan 12, 2020 3:13 am
by Memblers
I went in and added a note to most of the substitutions, at least the ones that have a non-zero chance of being used in a normal post. Instead of spammers looking like they're engaging in some bizarre form of human trafficking, now it just looks like they're selling cruddy 3V bootleg carts █RX SPAM█. Which might be almost as offensive around here, heheh.

I almost forgot about that "building a PC for my mom" thread. Thinking about it, I do remember doing a double-take when I first saw that. Fun coincidence.

Re: Auditing your own code.

Posted: Sun Jan 12, 2020 12:12 pm
by Bregalad
rainwarrior wrote:
Fri Jan 10, 2020 8:34 pm
!cruddy 3V bootleg carts (word replaced by spam filter)?

wow, that kind of silent replacement is pretty fucking disturbing. I do not like this at all.

I can see where you're coming from, but on the other hand, I like this funny, home-made, NESdev-specific spam fight. It's not creepy like Facebook or Google doing massive private (non-state-controlled) and possibly politically or economically biased censorship.

Re: Auditing your own code.

Posted: Sun Jan 12, 2020 6:36 pm
by rainwarrior
That post you quoted has said 3 different things over the course of this conversation without me editing the post. ;)

Re: Auditing your own code.

Posted: Sun Jan 12, 2020 7:41 pm
by tepples
Fighting spam would be a lot easier with a non-ancient version of MySQL that accepts non-BMP characters like 🐖

EDIT: Apparently non-BMP characters can be used in posts, but not in word censors. I was thinking of surrounding each replacement with a pig emoji on each side, but that won't work in the current setup.
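
For reference, a non-BMP character is one with a code point above U+FFFF, and it takes 4 bytes in UTF-8. MySQL's older 3-byte `utf8` character set cannot store those, while `utf8mb4` can; whether that is exactly what limits the word censor table here is an assumption. A small Python check, with hypothetical helper names:

```python
def has_non_bmp(text):
    """True if any character lies outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in text)

def max_utf8_bytes(text):
    """Largest number of UTF-8 bytes needed by any single character."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)

print(has_non_bmp("\U0001F416"), max_utf8_bytes("\U0001F416"))  # pig emoji
print(has_non_bmp("\u2588RX SPAM\u2588"), max_utf8_bytes("\u2588RX SPAM\u2588"))
```

The block character used in the current note is inside the BMP (3 bytes in UTF-8), which is consistent with it working in a censor where the pig emoji does not.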

Re: Auditing your own code.

Posted: Sun Jan 12, 2020 8:09 pm
by Gilbert
This thread is really off-topic now, as everyone is talking about your mom forum spam filters. Maybe it's a good time to split the topic?