Porting my tools to C or C++


tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Porting my tools to C or C++

Post by tepples »

Some of the data conversion and compression tools in the build process of my public and private programming projects for retro game consoles are time-consuming. For example, 240p Test Suite takes 10 seconds to run a DTE encoder on the help pages, and per Amdahl's law, the rest of the build process has to stall on this one step no matter how many cores I devote to everything else. My pipelines are slow in part because they are written in the dynamic language Python and run in the CPython interpreter. I know of two ways to speed up a program written in Python:

Run the tools in PyPy
PyPy is a JIT compiler that speeds up pure-Python code. I tried this, and it didn't speed things up much. It turned out that much of the time was spent in PyPy's compatibility layer for CPython's extension API, which is slow. Many of my tools use Pillow (Python Imaging Library) to crop an image into 8x8- or 16x16-pixel chunks and do operations on each chunk's data, and the round trip from PyPy into Pillow and back for each chunk is slow. A feature request for PyPy support in Pillow was closed as soon as it built, not as soon as it was fast. A fast PyPy extension would use CFFI instead.
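
Here's a minimal sketch of the per-chunk access pattern I mean (the filename and the 8-pixel chunk size are placeholders):

    from PIL import Image

    im = Image.open("tiles.png").convert("RGB")
    chunks = []
    for y in range(0, im.size[1], 8):
        for x in range(0, im.size[0], 8):
            # Each crop() and tobytes() call crosses into Pillow's C extension;
            # under PyPy, that boundary goes through the slow compatibility layer.
            chunks.append(im.crop((x, y, x + 8, y + 8)).tobytes())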

Rewrite the extension in a less dynamic language
In this post, jroatch reported well over a hundredfold speedup from rewriting a data compression utility from Python to C. This is practical on Debian or Ubuntu, where I can just change which packages I ask the user to apt install. But I don't know how to make installing a C or C++ compiler, plus image reading and writing libraries for said compiler, easy for my collaborators who use the Windows operating system. I've made a walkthrough for installing Git Bash, Make, Python, Pillow, and cc65 on Windows, but I'd need the same for a C or C++ compiler. I would also need to support my collaborators in case they miss a step in the walkthrough, the Windows GUI changes out from under them (which it has done), or a bug fix or enhancement to a tool breaks functionality only on Windows. And I imagine this will prove more difficult seeing as I have only occasional access to a Windows PC.

Which of these two options (or a third that I didn't mention but you will) is any good? And what else will I need to know to make it practical? For example, what image reading and writing library for Python is better than Pillow, and what image reading and writing library for C or C++ is any good?
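
For context, the DTE encoder in question is essentially byte-pair encoding. A naive pure-Python version looks something like the following sketch (for illustration only; it is not the actual 240p Test Suite tool, and the code range is a placeholder). Each replacement pass rescans the whole text, which is part of why a pure-Python implementation gets slow on large inputs:

    def dte_encode(text, first_code=0x80, num_codes=0x40):
        # Naive DTE (byte-pair) encoder: repeatedly replace the most common
        # byte pair with an unused code.  ASCII input keeps bytes below 0x80,
        # so codes 0x80 and up are free to stand for pairs.
        data = list(text.encode("ascii"))
        table = []  # table[n] is the pair that code first_code + n expands to
        for code in range(first_code, first_code + num_codes):
            pairs = {}
            for pair in zip(data, data[1:]):
                pairs[pair] = pairs.get(pair, 0) + 1
            if not pairs or max(pairs.values()) < 2:
                break  # no pair repeats; more codes won't help
            best = max(pairs, key=pairs.get)
            table.append(best)
            out, i = [], 0
            while i < len(data):
                if i + 1 < len(data) and (data[i], data[i + 1]) == best:
                    out.append(code)
                    i += 2
                else:
                    out.append(data[i])
                    i += 1
            data = out
        return bytes(data), table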
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: Porting my tools to C or C++

Post by calima »

You're wasting your time trying to cater to Windows users with source. They expect exes, so build them an exe via mingw.
gauauu
Posts: 779
Joined: Sat Jan 09, 2016 9:21 pm
Location: Central Illinois, USA

Re: Porting my tools to C or C++

Post by gauauu »

It really depends on what your main goals are. The two important questions are:
- What would you enjoy doing?
- What do your collaborators want and care about?

Are your collaborators complaining about tool speed or difficulty with Python? Are you excited about the idea of switching to C or C++? If not, then it just might not be worth the effort of converting and maintaining a C or C++ version. Tool speed is often not the most important metric.
calima wrote:You're wasting your time trying to cater to Windows users with source. They expect exes, so build them an exe via mingw.
100% agreed. The standard Windows way to do it is to provide an exe. Linux users are comfortable getting C or C++ code and doing something with it. Windows users (even most developers) aren't.
slembcke
Posts: 172
Joined: Fri Nov 24, 2017 2:40 pm
Location: Minnesota

Re: Porting my tools to C or C++

Post by slembcke »

Mac/Linux machines ship with Python out of the box, and installing a C compiler is one command away. Windows ships with neither. To get Python _or_ a C compiler they are going to need to install something like Cygwin anyway, and once that's done, whatever else they need is only a dozen clicks away.

Admittedly I'm not much of a Windows user, but I don't see Python having any advantage there.
Dwedit
Posts: 4924
Joined: Fri Nov 19, 2004 7:35 pm

Re: Porting my tools to C or C++

Post by Dwedit »

How about both... a pre-built EXE for Windows users, and source code for people who need to maintain or modify the tools.
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Porting my tools to C or C++

Post by rainwarrior »

slembcke wrote:Mac/Linux machines ship with Python out of the box
Mac still ships with only Python 2.7, so they might need to install Python 3 anyway. :P

In response to OP though, honestly I don't think you'll get any meaningful benefit out of porting all your tools to C.

jroatch had a high-performance need for a compressor; that's a good reason to rewrite something. But that situation probably doesn't apply to most of your Python tools, as far as I've seen. (Even your DTE encoder problem could probably be sidestepped a million times more easily with a simple cached result, something like the sketch below.)
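
This is what I mean by a cached result, keyed on a hash of the input (a sketch: cached_encode and the .dtecache directory are made-up names, and encode stands for whatever slow function you already have):

    import hashlib
    import os
    import shutil

    def cached_encode(src, dst, encode, cache_dir=".dtecache"):
        # Re-run the slow encoder only when the input file's contents change.
        os.makedirs(cache_dir, exist_ok=True)
        with open(src, "rb") as f:
            key = hashlib.sha256(f.read()).hexdigest()
        cached = os.path.join(cache_dir, key)
        if os.path.exists(cached):
            shutil.copyfile(cached, dst)  # cache hit: skip the slow step
        else:
            encode(src, dst)  # cache miss: pay the cost once
            shutil.copyfile(dst, cached)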


Installing Python 3 on Windows is pretty easy IMO, at least in 2019. A lot better than it used to be. Installing PIL is not quite so easy, but at least it's a well-maintained library that pip isn't going to barf on.


Personally I've had a few requests to offer EXE versions of Python things I'd written and made freely available, and I'm usually more than happy to just say no.


Also there's the question of Python-to-EXE compilers: these have always been and still are terrible, as far as my experience trying them goes.
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Porting my tools to C or C++

Post by tepples »

One thing I care about is the ability to build the whole project quickly in a fresh folder when I add a new file to the build, to ensure I didn't forget to add an essential source code file to the repository. Sometimes even parallel make doesn't help when one long-running task, such as compressing all of a game's dialogue, dominates the Gantt chart of a build process.
gauauu wrote:Are your collaborators complaining about tool speed or difficulty with Python?
Both: some complain about one, others about the other. In particular, koitsu (though not a collaborator on any of my current projects) falls into the latter camp, having complained of difficulty using a program developed against a different version of Python than the one installed. (Search for coiler by author koitsu for examples.) The switch from Python 2 to Python 3 was especially troublesome.
gauauu wrote:The standard Windows way to do it is to provide an exe.
But how would the executables be distributed to my collaborators? I thought it was standard practice not to check executables into a version control repository. Is it standard to distribute them separately? If so, what is the standard way to ensure that the version of the executable matches the version of the source code, in case a change to a tool alters the format of its output and a corresponding change to the game engine's source expects the changed format?
slembcke wrote:To get Python _or_ a C compiler they are going to need to install something like Cygwin anyway
The steps that I currently supply to my collaborators install Git Bash, ezwinports Make, and Python.org Python. To what extent is this "something like Cygwin"? I'm asking sincerely, as I don't know to what extent these incorporate Cygwin.

I guess I could cross-compile the Windows version on my GNU/Linux box. That would need four things:
  1. Instructions to build a MinGW-w64 cross-compiler under GNU/Linux or Darwin (or should I use the Windows native compiler in Wine? or is Ubuntu's mingw-w64 package OK?)
  2. Confidence that an executable that works as expected in Wine will work as expected in Windows as well
  3. Means to release the executables in sync with the source code
  4. Again, suggestions for libraries to load and save images
gauauu
Posts: 779
Joined: Sat Jan 09, 2016 9:21 pm
Location: Central Illinois, USA

Re: Porting my tools to C or C++

Post by gauauu »

tepples wrote:One thing I care about is the ability to build the whole project quickly in a fresh folder when I add a new file to the build, to ensure I didn't forget to add an essential source code file to the repository. Sometimes even parallel make doesn't help when one long-running task, such as compressing all of a game's dialogue, dominates the Gantt chart of a build process.
I have a personal Jenkins server that rebuilds fresh copies of my whole project after I push. It may be overkill, but it has saved me from that mistake of forgetting to add a file to the repo.
gauauu wrote:Are your collaborators complaining about tool speed or difficulty with Python?
Both: some complain about one, others about the other. In particular, koitsu (though not a collaborator on any of my current projects) falls into the latter camp, having complained of difficulty using a program developed against a different version of Python than the one installed. (Search for coiler by author koitsu for examples.) The switch from Python 2 to Python 3 was especially troublesome.
If it were me, I'd limit my efforts to my current collaborators. Don't spend a ton of time on a situation that MIGHT happen.
gauauu wrote:The standard Windows way to do it is to provide an exe.
But how would the executables be distributed to my collaborators? I thought it was standard practice not to check executables into a version control repository. Is it standard to distribute them separately? If so, how is it standard to ensure that the version of the executable matches the version of the source code, in case a change to the tool changes the format of the tool's output and a corresponding change to the source of the game engine expects the changed format?
As far as I can tell, the standard is to have a build machine that auto-compiles the Windows version from the repo source and makes that available. Or just build it and post it on the web somewhere.

But again, it all depends on who your collaborators are. Which is why I suggest that it's important to know who you're making this for, and to know what you're really trying to accomplish.
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Porting my tools to C or C++

Post by rainwarrior »

tepples wrote:Sometimes even parallel make doesn't help when one long-running task, such as compressing all of a game's dialogue, dominates the Gantt chart of a build process.
I wanna echo the "who is this for" question. If you're having an actual performance problem with your build tool on a project that you're working on, you really don't need us to help you make a decision for that.

Instead, let's not be vague. You've published a lot of python script tools. Some are being actively used by people, some are not. There's no reason for you to undertake a project to convert them all to another language, where they'll do the exact same thing. It's a waste of your time that could be better spent making real improvements, or making new things, or otherwise just enjoying life.

Whatever language you choose, if enough people try to use it, there's going to be someone who complains that it's not their favourite language. That's just something you have to accept. If you'd written a tool in C, maybe Koitsu would have had an easier time running it that one time, but some other time someone else would have complained that it wasn't in Python. :P (To be fair, Koitsu even tried to run it, and I'm sure the problem that came up for him could have been easily addressed within Python.)

If you wrote (past tense) a compressor tool in Python and it was slow, the question of whether you should rewrite it comes down to who needs it to be faster, how much, and why. All of that is critical to making a decision; there is no blanket answer anyone here could give when the question is posed vaguely and generically. Only you know the specifics, and all I can advise is that you really shouldn't spend 6 hours rewriting something for 5 people to save 20 seconds each over the next few years.


If you're concerned about Python as your default language going forward... IMO it's pretty fine on Windows these days. These are some things I sometimes try to do with my own releases if I think they'll help:
  • Make a point of saying Python 3 instead of just Python.
  • Start a script with a hashbang and an assert for Python 3 (see the sketch after this list).
  • Avoid using any additional libraries if I can. If I can't, try to only use something well supported like PIL.
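
A minimal version of that guard (just a sketch):

    #!/usr/bin/env python3
    import sys
    # Fail early with a clear message instead of a confusing error later on.
    assert sys.version_info[0] >= 3, "This script requires Python 3"
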
Though really I think the whole Python schism problem is kinda dying out at this point in time. It doesn't come up much at all for me anymore.


And tepples, your dedication to helping people do mundane things like learn how to use a command line, or install and set up a compiler, is pretty remarkable and commendable, but I don't think you should treat every single program you write as if it's someone's first programming experience ever. Every program has a context, and expected prerequisites. Anything that enough people try to use will find a user that doesn't understand the prerequisites, but you can only do so much to prevent that upstream.

As far as I'm concerned, since almost everything of yours is offered free and open source, you've got no obligation to make any of them easier to enter than they are. Most of the ones I've tried are OK to get into, and honestly most of them serve such a niche purpose that a change of language isn't really going to have much effect.
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: Porting my tools to C or C++

Post by rainwarrior »

Oh yeah, I also forgot, I heard about Numba as an "easy" solution to just make Python code faster.

http://numba.pydata.org/

Unfortunately my experience with it so far has been that it works fine on Linux, and doesn't work at all on Windows. Might be better today (it says it works on the website), or might not, but it might be an option for someone looking to improve Python performance without a lot of extra work.

Edit: Tried it again today, and it installed fine through PIP in Windows. Tried it on a fractal rendering I'd recently rewritten from Python to C++ for performance. Numba gave about a 10x speedup; C++ was more like 100x. So... at least in this case Numba isn't as good as the rewrite, but it's still a pretty huge boost for very little extra work.
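
For reference, the kind of self-contained numeric inner loop Numba handles well looks something like this (a made-up minimal example, not my actual renderer):

    from numba import njit

    @njit  # compiled to machine code on first call; later calls run at native speed
    def mandel_iters(cr, ci, max_iter):
        # Count iterations until the Mandelbrot orbit of (cr, ci) escapes.
        zr = zi = 0.0
        for n in range(max_iter):
            zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
            if zr * zr + zi * zi > 4.0:
                return n
        return max_iter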
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: Porting my tools to C or C++

Post by koitsu »

tepples wrote:But how would the executables be distributed to my collaborators? I thought it was standard practice not to check executables into a version control repository. Is it standard to distribute them separately? If so, how is it standard to ensure that the version of the executable matches the version of the source code, in case a change to the tool changes the format of the tool's output and a corresponding change to the source of the game engine expects the changed format?
GitHub will contact you and scream quite loudly if you check humongous binaries (of any sort; think several hundred megabytes, or a large number of smaller binaries) into your source repository directly (I speak from experience, having worked at a 120+ employee company where that was done, with a high-end GitHub account). That isn't how you do it.

How you do it is through GitHub's Releases tab/page. There are programs a thousand times larger than yours released this way -- OBS Studio is the first that comes to my mind. GitHub talks about it. You can tie this in to a CI/CD (continuous integration / continuous deployment) service like Travis CI, which connects directly with GitHub to do this. That way, you aren't storing binary releases in your GitHub source repo directly. Travis CI is not the only one; for example, Sour uses AppVeyor. Heroku might be another possibility. Generally speaking, people tend to use CI when they integrate some form of unit testing performed on each commit (that's what those "Build Passing" status icons represent -- you can see that OBS uses both Travis CI and AppVeyor). Or you can do it yourself using shell scripts and Python and Wine or whatever.

You also, obviously, do not have to use GitHub if you prefer a different hosting service (GitLab, Bitbucket, etc.).

I mirror calima's sentiments about Windows users, of which I am one: give us a standalone binary if possible; if not, give us binary + DLLs + whatever in a zip file that can be extracted into a single directory and run directly from there (i.e. "standalone"). Nobody wants to deal with installing programming languages and other crap just to run whatever this program is, I assure you.

Edit: words.
Last edited by koitsu on Mon Feb 18, 2019 8:58 pm, edited 1 time in total.
Banshaku
Posts: 2417
Joined: Tue Jun 24, 2008 8:38 pm
Location: Japan

Re: Porting my tools to C or C++

Post by Banshaku »

After having had to stop working on my project for a long period of time, what I learned is that it's not worth rewriting something unless the gain is substantial. Most of the time, you just end up playing in your sandbox with your toys without accomplishing anything new :lol:
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: Porting my tools to C or C++

Post by rainwarrior »

calima wrote:You're wasting your time trying to cater to Windows users with source. They expect exes, so build them an exe via mingw.
koitsu wrote:I mirror calima's sentiments about Windows users, of which I am one: give us a standalone binary if possible; if not, give us binary + DLLs + whatever in a zip file that can be extracted into a single directory and run directly from there (i.e. "standalone"). Nobody wants to deal with installing programming languages and other crap just to run whatever this program is, I assure you.
I'd agree with this for any software that's intended for widespread use. A portable EXE with all of its dependencies already included is usually the ideal way to receive a program.

Part of the reason this is practical is just how good Windows has been about backward compatibility. Binaries have a much longer shelf life there than in Linux or Mac. (Mac users will need binaries too, but they're even more of a hassle to prepare.)

It's also why I've advocated that any commercial modern NES release needs a stand-alone executable, and not just a ROM. Assuming someone has or can install an emulator is a pretty big filter.


However... and I can't stress this enough, the "who is this for" question is relevant here. Tepples has made a lot of tools with a lot of different purposes, and they don't deserve a one-size-fits-all rule. Most of them are not for general widespread use. A lot of them are very specific niche tools for NES developers.

If this is a tool for software development, or for a software developer, there's usually not a huge barrier to installing a widely used programming language, and in a lot of cases they'll already have it ready to use. It's easier for the individual not to have to install anything, but sometimes it's a good trade for the developer to simply require a prerequisite that makes developing the software easier. This is especially true when you're providing it for free. Providing nicely packaged releases is extra work, which should be undertaken for a clear purpose.

For example, with releases for the NESDev compo you shouldn't need to package an NES emulator, or instructions for finding and setting up your first NES emulator on 6 different platforms. There is context to everything, and there are always things that are reasonable to expect as a prerequisite.

...and maybe also consider that sometimes it's a good filter not to try to pick up every possible user. Some software, even free software, should be a bit exclusive, and should require some prerequisite learning or understanding before it's accessible to a person. If someone who isn't ready yet is keen enough to try, they might even learn something more broadly useful in the process of preparing to use your tool.


For an example from my own work: something like NSFPlay absolutely gets a binary release, because it's made for people who just want to listen to NES music. That's who it's for and why I maintain it. On the other hand, my EZNSF tool for building an .NES ROM album out of an .NSF music file has a Python prerequisite, and I'll never change that. Its intended user is someone who's willing to work hard enough to produce an NES ROM, and if they're not willing to meet that minimum bar, I'm not interested in working hard to lower it for them. It was much easier to write in Python, and bending over backwards to provide binaries, or porting to C++ or whatever, would have been a detriment to the project: wasted time that would have been better spent improving its actual function (or doing just about anything else). It's also much easier to customize and modify the project because it doesn't have a binary form and the source is directly accessible. I think there are many good reasons to go either way for different projects.
Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Re: Porting my tools to C or C++

Post by Oziphantom »

For source on Windows, make it compile with VS. Installing a bunch of Linux stuff to compile a two-file project is more hassle than it's worth. If one can just drop the files into a command-line project and hit build, it's 10000% better.

Maybe go for C#; it has nice parallel support, and its async/await system might be handy for you as well. It has a bunch of built-in image tools as part of the .NET Framework, and there is Mono for Mac and Linux people. It's also really, really nice to code in, it compiles, and it's fast.
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Porting my tools to C or C++

Post by tepples »

Let me explain some of the "who is this for":

NES homebrew community in general

I have uploaded several of my retro console projects as public repositories on GitHub. Building these requires GNU Coreutils, GNU Make, Python, Pillow, and ca65. Python and Pillow are used for image conversion and data compression. On Windows, I have recommended obtaining Coreutils via Git Bash and Make via ezwinports, through an installation procedure that I fine-tuned with the help of Greg Caldwell at Retrotainment Games.

Just because you have installed a toolchain to cross-assemble NES binaries doesn't mean you have installed one to native-assemble PC binaries. I had hoped to avoid requiring Visual Studio, as its multi-gigabyte download size could cause someone behind a harshly capped Internet connection, such as satellite or cellular, to run up a substantial data transfer overage bill, angering the head of household who has to pay the ISP bill. In several forums or chat communities to which I have belonged for more than a year, I've seen at least one member mention being behind dial-up or satellite because neither cable nor fiber nor DSL serves his or her address: R*** N**** (dial-up), T**H****S******* (satellite), and our own Rahsennor (satellite). I'll admit that since Python 3.4 bundled pip, the "coiler" compatibility issues that koitsu mentioned in the past have become much less apparent.

Retrotainment Games

I am lead programmer for several projects by Retrotainment Games, a video game developer owned by the used game store Cash-In Culture based out of Pennsylvania. Each game has a repository containing assembly language source code and the Python source code of the data conversion tools. Many of these tools are released in my public repositories; others have been cleared for release but not yet released; still others are private because they are specific to a game.

I feel comfortable explaining the following work process publicly because it could be reasonably inferred from the data structures of the Haunted games and the NES projects in my public repositories.

When an artist wants to add a level to a game, he pulls the repository, adds a PNG file of the background, adds a text file containing collision rectangles, enemy positions, and exit positions, and edits the master level list to add the map's background PNG filename, background palette, and collision map filename. Then he tests the level by building the game on his computer. The game's makefile runs a conversion program written in Python on the PNG file, or on all PNG files if the repository was freshly cloned or if the format of compressed backgrounds has changed. Converting a background a dozen screens long can take several seconds because the tool has to find the optimal subpalette for each 16x16-pixel piece of the background. The main Mall map in HH85, for instance, is roughly 24 screens long. Conversion will fail when the background violates some constraint of the engine, such as having more than 256 unique tiles whose visibility ranges cross a given 264x240-pixel window. If this happens, the tool produces an error report, and the artist must edit the background to fix the violations and re-run the build process on any backgrounds in violation.
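
The slow inner step amounts to something like the following (a simplified sketch, not the actual converter; the real tool scores candidate subpalettes rather than just collecting color sets, and colors_per_attribute_area is a made-up name):

    from PIL import Image

    def colors_per_attribute_area(im):
        # Collect the set of distinct colors in each 16x16-pixel area;
        # the converter then searches for subpalettes that cover these sets.
        w, h = im.size
        areas = {}
        for y in range(0, h, 16):
            for x in range(0, w, 16):
                area = im.crop((x, y, x + 16, y + 16))
                areas[(x // 16, y // 16)] = set(area.getdata())
        return areas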

When an enemy designer wants to change speed, flight pattern, or other build-time constants or data tables for an enemy that I have coded, he pulls the repository, edits whatever constants are needed, and builds the game to test it. Again, if he hasn't built the game in a while, the background images must be reconverted.

When a writer wants to change a dialogue cue, he edits the text and places a dialogue trigger in the level. A text compression program written in Python then compresses the entire script, which currently takes several seconds. I have yet to thoroughly investigate whether this is a genuine algorithmic issue or just dynamic language overhead.
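
When I do investigate, the first step will probably be a profile along these lines (compress_script and the filenames are stand-ins for the real entry point):

    import cProfile
    import pstats

    # Profile one run of the compressor to see where the time actually goes.
    cProfile.run("compress_script('dialogue.txt', 'dialogue.bin')", "profile.out")
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)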

If I were to rewrite the most time-consuming of these tools in a compiled language, I'd probably need to make two kinds of artifact: a release of the game itself and a release of just the compiled data conversion tools. Releases of the latter would generally come between releases of the former. Do binary release platforms generally support this? Or do they require all artifacts to be built from the same tag? Would it be better to make two separate repositories for each game: one for the game's source code and assets, and one for its asset conversion tools?

In fact, I'm already facing this problem with ft2pently, a tool by NovaSquirrel that converts a FamiTracker text export to a Pently score. ft2pently is written in C and must be compiled before use. A composer who prefers to work in FamiTracker and to keep a FamiTracker module would need a compiled copy of ft2pently around. I suspect that the inability to instantly hear the Pently conversion of a module is why at least one Retrotainment project stuck with what is now called FamiTone4.

Eventually, the public

I may want to rewrite the Action 53 builder to allow a more drag-and-drop procedure to build a collection. This will require writing a GUI app for PC. The same might be true if I decide to make a game with a level editor.

At times, I've tried to develop PC games, distributing Windows binaries built with MinGW along with source code. I've occasionally received complaints that they don't build on GNU/Linux or macOS, or that if they do build, they don't run or lack sound or joystick support or whatever. This was before I switched from Windows to GNU/Linux as a daily driver, and I still don't own a sufficiently recent Mac on which to test. If I ever get back into PC game development, I'll need a way to reach PC platforms other than my own.