Memory management in PC programming languages

Post by tepples »

A topic about avoiding Windows 10 drifted to the merits of various memory management paradigms: manual malloc/free (C), destructors/RAII (C++), reference counting (C++/CPython), tracing garbage collection (Java, C#, and Python for cycles), and scope guards ("finally" in Java, C#, and Python).
In this post (https://forums.nesdev.com/viewtopic.php?p=224408#p224408), Nicole wrote: Rust does not use garbage collection; its memory safety comes from static analysis at compile-time.
Is that anything like destructors/RAII in C++?

Post by adam_smasher »

Thanks for the split, tepples.

Rust does have RAII, but its ownership system is something quite different.

The rough idea is that the current owner and lifetime of an object are tracked in the type system, and violations of memory safety are flagged as compile-time errors. So, like RAII, it makes it impossible to forget to free an object, but it goes much further than that. You can't, say, return a pointer to a stack-allocated object; you can pass a pointer to a stack-allocated object into a function, but only if the function is only "borrowing" it; the function can't, say, store it and retain a dangling pointer. Unless you explicitly turn off the borrow checker with an "unsafe" declaration, Rust code is guaranteed to be memory safe - it's impossible for a compiling program to crash or corrupt memory due to memory mismanagement.

Post by tepples »

Can a function in Rust return a self-destructing pointer to a heap-allocated object, which is automatically deallocated once the caller exits? C++ std::unique_ptr does this. It's like a std::shared_ptr, except the reference count is always in effect 1 so it has to be passed around as a move (change of ownership) rather than a copy.
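For reference, a minimal C++ sketch of the std::unique_ptr behaviour described above. Sprite, live_sprites, make_sprite and consume are hypothetical names made up for illustration; the counter is only there so the moment of destruction can be observed.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical instance counter, used only to observe when a Sprite is destroyed.
int live_sprites = 0;

// Hypothetical heap-allocated type.
struct Sprite {
    int x = 0;
    Sprite() { ++live_sprites; }
    ~Sprite() { --live_sprites; }
};

// Factory: ownership of the heap object is handed to the caller.
std::unique_ptr<Sprite> make_sprite(int x) {
    auto p = std::make_unique<Sprite>();
    p->x = x;
    return p;  // moved (or elided) out; unique_ptr cannot be copied
}

// Taking the unique_ptr by value forces the caller to std::move it in,
// giving up ownership. The Sprite is destroyed when `owned` is destroyed.
int consume(std::unique_ptr<Sprite> owned) {
    assert(owned != nullptr);
    return owned->x;
}
```

A caller would write something like `auto s = make_sprite(7); consume(std::move(s));`. After the move, `s` is null, and the Sprite is freed when its last owner goes away, which is not necessarily when the original caller exits.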

Post by niconii »

tepples wrote:
In this post (https://forums.nesdev.com/viewtopic.php?p=224408#p224408), Nicole wrote: Rust does not use garbage collection; its memory safety comes from static analysis at compile-time.
Is that anything like destructors/RAII in C++?
Rust has that too, but I meant in the sense that things like memory corruption, use-after-free, iterator invalidation, etc. aren't possible in safe Rust (assuming no compiler bugs). The same ownership model Rust uses can even ensure thread safety, preventing all data races.

That said, Rust isn't magic. This comes with extra verbosity in code due to lifetime annotations (though in some cases they can be omitted), and there are some limitations regarding Rust's ownership system. For example, Rust has trouble with structs holding references to their own fields, because it doesn't know how to express the lifetime of that reference.

This doesn't mean there's no way to get around these limitations though, which is why I referred earlier to "safe Rust". Any code within an unsafe { ... } block can execute functions declared as unsafe fn, as well as dereference "raw pointers" (equivalent to C pointers). It doesn't simply disable the borrow checker (references are still checked), but you can explicitly cast from a reference to a raw pointer within an unsafe block. The main advantage here is that unsafe code is clearly marked; if by some chance memory corruption happens, the amount of code you have to look through for memory-safety bugs is reduced.
tepples wrote:Can a function in Rust return a self-destructing pointer to a heap-allocated object, which is automatically deallocated once the caller exits? C++ std::unique_ptr does this. It's like a std::shared_ptr, except the reference count is always in effect 1 so it has to be passed around as a move (change of ownership) rather than a copy.
Yes, Box<T> would be the equivalent in Rust. (In fact, it's the usual way to allocate something on the heap in Rust, aside from Vec<T>, the equivalent to std::vector<T> in C++.)

In Rust, values are actually moved by default. Types can be declared as copy-by-default instead if that's more appropriate, such as for primitive number types and immutable references.

Post by Rahsennor »

There are a bunch of languages that can do safe memory management without garbage collection these days. Most of them (including Rust) are based on linear logic, which, like the other things koitsu mentioned before the split, isn't new - it dates back to 1987, and AFAICT first saw practical application in the early nineties.

I'm currently reading up on ATS. It can prove the safety of almost any use of memory, even things like strcpy(), entirely at compile time... but, well, just look at the example code. The syntax is a mess, the documentation is flaky and the compiler errors are numeric gibberish. It does work, but the learning curve is more of a cliff.

I hear the author is planning a redesign though so that may change in future.

Post by Oziphantom »

If you want a good guide on how to stuff up memory management, Apple paints a perfect Greek Tragedy ;)

Most of the issues come from trying to simplify C++. People have this idea that C++ is the evil harsh language (I sadly know a lot of Java programmers), and hence its memory management system is dastardly. The reason for this is the whole Heap/Stack and Pointer/Object thing that confuses so many people. So languages make it simpler by making everything harder, and hence we get Heap & Object only, which makes life a lot more painful. But it makes it harder for people to stuff up. So then people get the idea that C++ doesn't have "one way to rule them all", and hence they have to think, and it gets complicated from there.

On one hand, I like how in C++ I can choose how something lives and dies, and now in C++11+ I can choose not to care if I want to ;)
On the other hand, in C# I don't have to care at all, and that is fine with me, mostly.

It all comes down to what you are trying to do and what constraints you are trying to do it under. I have 16GB RAM, so if my tiny application eats a few hundred KB for 3 mins longer than it needs to, I don't even notice. If I want some code to run on my A1000, then yes, every byte lives and dies at the click of my finger; it's the only way we survive.

I think Garbage Collection has a bad name. 1) Java used it, badly, and as the corporate world bought into "Java makes coding faster" and applications moved to it, back in the PIII days, GC would kill performance, gobble RAM and just make life miserable. 2) Apple "it's just better" used it (they have recently banned it as of 2~3 OSX versions ago...) and they did it badly, and hence Macs needed lots and lots of expensive RAM in order to do trivial things.
Some early parts of iOS were still using it internally, and that would suddenly cause your app to have a really long pause.. Sadly there was a watchdog timer, and if your app didn't respond on the main thread for some number of seconds, your app would get killed. So long task, then GC kicks in and BOOM, crash... and people lost their work.. not fun.

Also GC on BASIC 2.0 on a C64 can be "fun and games". I think there was a SYS you could call that would force collection; it came in handy sometimes.

Basically Memory Management is the Captain America vs Iron Man Argument of the computing world.
Cap says Freedom for all aka manual memory management, do what you want.
Iron says We can't have liberty without restrictions and prevention is better, hence Garbage Collection and other 'helpers'

Post by slembcke »

Oziphantom wrote:Apple "it's just better" used it (they have recently banned it as of 2~3 OSX versions ago...) and they did it badly, and hence Macs needed lots and lots of expensive RAM in order to do trivial things.

...

Some early parts of iOS were still using it internally, and that would suddenly cause your app to have a really long pause..
Apple had their conservative GC for Obj-C, but almost nobody used it (like seriously, I've only ever had a dozen programs that did). It wasn't particularly slow or bad, but it had a couple of minor annoyances that the "old fashioned" reference counting stuff didn't. Automatic reference counting (ARC) is the replacement. It works really quite well, though it has a lot more CPU overhead than GC, but no pauses. You certainly don't want it enabled for hotspot functions if you care about performance.

iOS never, ever had the conservative GC enabled. That's part of the reason why so few people ever used it on Mac.

IMO, the biggest problem with GC in general in 2018 is that languages that use it rely on it too much, and the idiomatic way to do anything is to have a rat's nest of separate allocations and 64-bit references all over the place. The programming model basically becomes compute via cache miss. While it doesn't matter for a lot of simple editor programs, it's pretty bad for anything that needs performance. You can write decently high performance code in C# for example, but it's not going to look like how the "best practices" tell you to do it. In my experience, most programmers will refuse to write SoA code or use SIMD intrinsics. I guess that's job security for me though, since I'm often hired to fix slow OO code. ;)

Don't get me wrong though, GC is great in general. I just think best practices should encourage using it more coarsely.
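To make the "rat's nest of separate allocations" versus SoA contrast concrete, here is a rough C++ sketch. Particle, ParticlePtrs, ParticlesSoA and step are hypothetical names invented for illustration, and nothing here is measured; it only shows the two memory layouts side by side.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical particle type, used only for illustration.
struct Particle {
    float x, y, vx, vy;
};

// "Rat's nest" layout: one heap allocation and one pointer per object.
using ParticlePtrs = std::vector<std::unique_ptr<Particle>>;

// Structure-of-arrays (SoA) layout: each field lives in its own contiguous array.
struct ParticlesSoA {
    std::vector<float> x, y, vx, vy;
};

// Each iteration follows a pointer to wherever that particle was allocated.
void step(ParticlePtrs& particles, float dt) {
    for (auto& p : particles) {
        p->x += p->vx * dt;
        p->y += p->vy * dt;
    }
}

// The same update walks four arrays front to back.
void step(ParticlesSoA& particles, float dt) {
    assert(particles.y.size() == particles.x.size());
    assert(particles.vx.size() == particles.x.size());
    assert(particles.vy.size() == particles.x.size());
    for (std::size_t i = 0; i < particles.x.size(); ++i) {
        particles.x[i] += particles.vx[i] * dt;
        particles.y[i] += particles.vy[i] * dt;
    }
}
```

Both versions compute the same result. The SoA loop touches memory sequentially, which tends to suit caches and auto-vectorization better, though how much that matters depends on the workload.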

Post by Oziphantom »

I thought the OS was using it. In that it had one GC to rule them all (about when Apple thought Ruby was worth as much as actual rubies), like way back in the day. If not, what is its excuse???

ARC breaks the entire language though. In the beginning there was "alloc and release", then Apple added ref counting and autorelease pools, which became a balance nightmare. Then they added ARC to try and fix it, broke the language in the process, and then decided, stuff it, let's just make a new language... then broke that 3 times ;)

Sadly I can't find the article, but there was this "revelation" by a Java programmer that they had been using OO wrong all the time, and basically they described coding in the C function way, just with everything being objects.. It was funny, but I don't think they realized. C# has SIMD intrinsics? There is the System.Numerics namespace, or do you use a custom lib?

Post by slembcke »

Oziphantom wrote:I thought the OS was using it. In that it had one GC to rule them all (about when Apple thought Ruby was worth as much as actual rubies), like way back in the day. If not, what is its excuse???
It would be very much news to me if the OS ever used it internally. Ruby? When was Apple infatuated with Ruby? I mean they include it along with Python in the OS, but it's never been used in any Apple products I'm aware of.
Oziphantom wrote:ARC breaks the entire language though. In the beginning there was "alloc and release", then Apple added ref counting and autorelease pools, which became a balance nightmare. Then they added ARC to try and fix it, broke the language in the process, and then decided, stuff it, let's just make a new language... then broke that 3 times ;)
??? Retain/release/autorelease has been the way to manage memory in the NextStep APIs since long before Apple acquired Next. Unlike regular retain/release systems, autorelease made it so that there were like 4 simple rules that covered 99.9% of use cases. I'm not aware of any ways that ARC broke the language. The only difference was that it made calling retain/release/autorelease explicitly a compile error.
Oziphantom wrote:C# has SIMD intrinsics? There is the System.Numerics namespace, or do you use a custom lib?
Neither. On C#/Java projects sometimes I rewrite parts in C and link them in.

Post by Oziphantom »

2004~5. All the Mac fanboys where I worked started buying Ruby books; when I asked why, "Apple have said it's a really good language" was the gist of what they said. I think it was for making those widget things, does OSX even still allow those?

Well Next is for all intents Apple, in that Apple is now basically just Next. So to be Pedantic Jobs Added it then ;)
For ARC to work, it needs to know what calls what and when and how. And the "great thing about OBJ-C is that it's a Dynamic Language" and "We all know Dynamic Languages are better"..so you have a system that needs to Statically analyze a Dynamic Language... To this end, you can no longer ISA swizzle in Obj-C (I had an XML parser that did it, and I had to replace it in all my projects); also, if you call objc_msgSend(....) it kind of has a fit. I think you can sign a lot of waivers and get it to let you do it, but it really doesn't like it. You basically need to treat it as a static language. And it makes it harder to throw Obj-C out the window and mix in C and C++; as soon as you want to pass an Obj-C pointer through things, you have to start signing waivers ;)

Post by slembcke »

Oziphantom wrote:2004~5. All the Mac fanboys where I worked started buying Ruby books; when I asked why, "Apple have said it's a really good language" was the gist of what they said
That's when I started learning Ruby too, but because I heard nice things from other devs. Only one of them was a Mac guy. (shrugs)
Oziphantom wrote: I think it was for making those widget things, does OSX even still allow those?
Dashboard Widgets? Yeah, they are still around, but they are basically HTML5 before it had a name. (It's the origin of the canvas element, and why its API looks exactly like a simpler CoreGraphics API) I've never seen anyone use them on purpose though. lol
Oziphantom wrote:Well Next is for all intents Apple, in that Apple is now basically just Next. So to be Pedantic Jobs Added it then
So to be clear, we are talking about history that is almost as old as the NES. Good to know. ;)
Oziphantom wrote:For ARC to work, it needs to know what calls what and when and how. And the "great thing about OBJ-C is that it's a Dynamic Language" and "We all know Dynamic Languages are better"..so you have a system that needs to Statically analyze a Dynamic Language...
Not sure where you are going with this. Dynamic lookup or not, every method call is explicit and is turned into a function call (which you seem to know already). It doesn't need to know the full call graph or it would require whole program optimization. It follows the same simple rules, and those don't require the programmer to memorize the call graph either.
Oziphantom wrote:And it makes it harder to throw Obj-C out the window and mix in C and C++; as soon as you want to pass an Obj-C pointer through things, you have to start signing waivers
??? C code doesn't have to be aware at all if ARC will or won't be used in another file. You call the exact same (public, documented) functions to retain/release objects. They even added the "bridging" functions that let you pass ownership between C and ARC within the same module. One of the biggest advantages of ARC over the conservative GC was that it required no ABI changes. Existing, compiled library code could be used without changes.

Where is the joke about "signing waivers" coming from? Everything you described is a public and documented API. I mean it's fine to dislike Apple, but there are plenty of real things they've done to be upset about without needing hyperbole. :p

Anyway, I think ARC is a good way to do GC on a memory or power constrained device. It doesn't waste memory like a GC does, but does spend more CPU time managing a lot of atomic counters. On a desktop machine where memory is plentiful and cheap, modern GC methods have been well proven to have pretty low overhead.

Post by Oziphantom »

Yeah, Dashboard widgets, back then they loved them. Had their screens covered in weather and little Pac-Man that moved around the screen chasing ghosts, all kinds of random stuff. I mean it was something to do with it I guess, other than use Adobe products.. they were programmers ;)

Well OBJ-C has two points of interest in its timeline: when it was made, before C++ came out (a year later?), and when it was used for iOS dev ;) So from a Memory Management standpoint there is the original, then what Next did to it, then what Apple did to it ;) Personally I think AutoRelease made it harder (unless they put "auto" in the name of things), as now you have to remember or guess whether this is or isn't auto and then act accordingly. I get that it's a hack to get around the lack of Stack objects, which is a pain, but I think it just made more pain. It was mostly fine, but then I would have to spend a couple of days tracking all the retains and releases for an object (thankfully you could override the retain and release methods in real time ;) ) while I got them all 'just so' across the code base. Having a strict rule, that if something is returned it needs to be released at some point, would have made life easier I feel, especially given the large number of Java devs coming from J2ME development who were not equipped with techniques to guard themselves from these issues. The pools, and the fact that things only die when the pool cleans up, make it kind of a GC, just one you have more control over. It also gives you bugs where something is fine because it got released to 0, but then something else would retain it before the pool cleaned it up, and then sometimes it would clean up before the next retain :)

It's not explicit, as you can change what it calls and the function that it calls at any point during execution (i.e. you can make it call MyThing or OtherThing, or you can make it so that MyThing actually points to OtherThing, so when it calls MyThing it really calls OtherThing). ARC is not happy about that, and tries to get you to keep it all static. You sign waivers with ARC, not Apple. I.e. __bridge_retained (as in: I solemnly swear, ARC, that I will hold and keep this object, that I will not destroy it without your consent, and promise it will have all the good bits in life) vs RC: hand pointer over, call release sometime, no questions asked ;)

But yes I agree on the devices it is designed for ARC is a lot better idea than GC, and I think they also realised that Dynamic Languages are bad and Static all the way ;)
GC is just a hack to get the convenience of Stack objects back ;)

Post by slembcke »

Autorelease isn't related to stack memory at all. It's so that you can return objects by reference that nothing owns except the autorelease pool, which will release it at a known point in the future. It's what makes the ownership rules simple compared to basic retain counting. You never own a reference to a returned object until you explicitly retain it. Other retain counted systems (like in CoreFoundation for example) need documentation on every function if the returned reference is passing ownership to the caller or not. It's impossible to have a consistent set of API level retain counting rules without autorelease, and I would strongly argue that having them makes it an order of magnitude easier to make a whole code base "just so". When auditing somebody else's code, it's possible to find memory errors without having to look up every function in the documentation. Try working with straight CoreFoundation or Glib for comparison. ;)
Oziphantom wrote:where something is fine because it got released to 0, but then something else would retain it before the pool cleaned it up
Again, that is describing exactly why autorelease pools exist: so that a value can exist for a short but well defined time frame even if nothing owns it. I mean I get that you didn't like Obj-C, but you can't really claim that the short list of rules on the page I linked earlier is overwhelming.
Oziphantom wrote:GC is just a hack to get the convenience of Stack objects back ;)
Oh come now. :p

Post by Oziphantom »

No it replicates the convenience of Stack Objects.

C

Code: Select all

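#include <stdlib.h>

typedef struct { int value; } myStruct;

/* Returns a heap allocation; the caller has to remember to free it. */
myStruct* callSomeFunc(void)
{
  return malloc(sizeof(myStruct));
}

void caller(void)
{
  myStruct* pThing = callSomeFunc();
  /* ... */
  free(pThing);
}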
Not very convenient, and you may or may not need to free something.
C++

Code: Select all

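struct myStruct { int value = 0; };

// Returned by value, so there is no heap allocation for the caller to track.
myStruct callSomeFunc()
{
  return myStruct();
}

void caller()
{
  myStruct thing = callSomeFunc();

  // do nothing, as it will automatically die at the end of the scope
}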
Convenient
Obj-C

Code: Select all

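// Fragment from inside a class's @implementation (manual retain/release, pre-ARC).
- (void)caller
{
  MyClass* pThing = [self callSomeFunc];

  [pThing release];
}

// The caller owns the returned object and has to release it.
- (MyClass*)callSomeFunc
{
  return [[MyClass alloc] init];
}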
basically C
so we add autorelease pools
Obj-C

Code: Select all

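// Fragment from inside a class's @implementation (manual retain/release, pre-ARC).
- (void)caller
{
  MyClass* pThing = [self callSomeFunc];

  // do nothing; it automagically dies at some point in the future
}

// The enclosing autorelease pool releases the object later.
- (MyClass*)callSomeFunc
{
  return [[[MyClass alloc] init] autorelease];
}
Now it's almost as convenient as C++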

Only we now have the case where something may or may not autorelease something. And yes there are "guidelines" to follow that should keep you out of trouble, but as soon as 1 case breaks those guidelines..boom.

I think if we got rid of autorelease and just made it so everything had to be released no matter what, and if the object is "owned" by something else you just do return [thing retain]; It would have made it a lot simpler, and given the Java programmers a new strict rule to follow. There is still the slight issue with stack object and static strings that trip them up, but they kind of learn those eventually.

The Autorelease is not a well defined amount of time; it's until the main loop exits and the pool's cleanup function gets called, which when you have threads and shared objects is not defined at all. Or when you have timers. Basically you are guaranteed to have it for your method, and beyond that is a guess; blocks make this fun also ;) Unless you wrap things in your own autorelease pool, in which case they die when you say so.

ARC has a flaw in that it can't handle circular references, on APIs where everything is MVC they tend to happen a lot. Especially when none of the APIs have the all important "user data" field ;)

Post by slembcke »

In your C example, you could have returned the struct by value as well. Assuming your C++ struct was also a basic POD type, both of them would have been trivially initialized and probably had copy elision applied. If your C++ struct was *not* a POD type (and in C++, there is no general way to know from a snippet like that) then the generated code would be very different. The copy constructor could have allocated some heap memory or some other more expensive resource. (Off topic, all the implicit magic that happens in C++ drives me nuts)
Oziphantom wrote:I think if we got rid of autorelease and just made it so everything had to be released no matter what, and if the object is "owned" by something else you just do return [thing retain]; It would have made it a lot simpler
Well... kind of, but I don't think I've ever seen a RC API work that way for a couple reasons AFAIK.
1) It does place a lot more burden on the programmer to make sure every object reference you receive is released, and that generally means a *lot* of them. Basic things like aFunction(anObject.aProperty) need to be split into several lines because that property reference passed ownership that you need to put in a variable so you can explicitly release it. Instead of an extra rule about how to treat references you don't own, you need to deal with the fact that you own every reference that you ever receive, and you need something like full on RAII in place in order to deal with disposing of them correctly in every case. That's a relatively huge burden on the programmer.
2) Atomic counters are really quite expensive. You really don't want to do it when it's not necessary. C++ shared_ptr is effectively implemented this way. Every time the pointer is copied or destroyed it has to do atomic ops. That's why it's so easy to overuse shared_ptr and make a program that spends more time retain counting than computing. ARC has this problem to some extent too since it's very pedantic, but autorelease allows it to make many fewer of them.
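As a small C++ illustration of point 2, the sketch below shows where shared_ptr touches its reference count and where it doesn't. Texture, count_seen_by_value, count_seen_by_ref and read_id are hypothetical names for illustration only; use_count() is used purely to make the count visible.

```cpp
#include <cassert>
#include <memory>

// Hypothetical resource type, used only for illustration.
struct Texture {
    int id = 0;
};

// By value: the parameter is a copy, so the count is bumped on entry and
// dropped again when the parameter is destroyed.
long count_seen_by_value(std::shared_ptr<Texture> t) {
    assert(t != nullptr);
    return t.use_count();
}

// By const reference: no copy is made, so the count is not touched.
long count_seen_by_ref(const std::shared_ptr<Texture>& t) {
    assert(t != nullptr);
    return t.use_count();
}

// Non-owning access: code that only reads the object can take a plain reference.
int read_id(const Texture& t) {
    return t.id;
}
```

With a single owner, `count_seen_by_value(t)` reports 2 and `count_seen_by_ref(t)` reports 1. In multithreaded builds those increments and decrements are typically atomic operations, so passing by reference in hot paths is one common way to avoid them.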

Autorelease pools are drained on every event loop invocation, definitely not at the end of the main loop. If you make a raw pthread, you have to know to manage your own pool, yes, but every other available thread API on Apple platforms does this for you.

Retain counting in general, including shared_ptr, has problems with circular references. Both ARC and shared_ptr provide thread-safe weak references though, so it's not exactly a fatal flaw, just one you have to be aware of. That's more of an argument for "proper" GC than it is about ARC.
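The circular reference problem and the weak reference fix can be sketched in C++ like this. LeakyNode, Parent, Child, build_strong_cycle and build_parent_child are hypothetical names for illustration; each function returns a std::weak_ptr so that the caller can check whether the objects were actually destroyed.

```cpp
#include <cassert>
#include <memory>

// Two nodes that own each other: neither count can ever reach zero.
struct LeakyNode {
    std::shared_ptr<LeakyNode> peer;
};

struct Child;

struct Parent {
    std::shared_ptr<Child> child;  // owning reference
};

struct Child {
    std::weak_ptr<Parent> parent;  // non-owning back reference
};

// Builds a strong cycle and returns a non-owning observer of one node.
std::weak_ptr<LeakyNode> build_strong_cycle() {
    auto a = std::make_shared<LeakyNode>();
    auto b = std::make_shared<LeakyNode>();
    a->peer = b;
    b->peer = a;
    return a;  // a and b go out of scope here, but keep each other alive
}

// Same shape, but the back reference is weak, so everything is freed on return.
std::weak_ptr<Parent> build_parent_child() {
    auto p = std::make_shared<Parent>();
    p->child = std::make_shared<Child>();
    p->child->parent = p;
    // lock() yields a temporary owning pointer while the parent is still alive.
    assert(p->child->parent.lock() == p);
    return p;
}
```

The observer returned by build_strong_cycle() is not expired after the call, meaning the pair has leaked unless something breaks the cycle by hand (for example by resetting one `peer`). The observer returned by build_parent_child() is already expired.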