
Assume I've got some really big Python class that might consume a fair amount of memory. The class has some method that is responsible for cleaning up some things when the interpreter exits, and it gets registered with the atexit module:

import atexit
import os

class ReallyBigClass(object):
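    def __init__(self, cache_file):
        self.cache_filename = cache_file
        self.cache_file = open(cache_file)
        self.data = ...  # <some large chunk of data>
        atexit.register(self.cleanup)

    # <insert other methods for manipulating self.data>

    def cleanup(self):
        self.cache_file.close()
        os.remove(self.cache_filename)  # os.remove needs the path, not the file object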

Various instances of this class might come and go throughout the life of the program. My questions are:

1. Is registering the instance method with atexit safe if I, say, del all my other references to the instance? In other words, does atexit.register() increment the reference count the same way ordinary binding would?
2. If so, does the entire class instance now have to stay in memory until exit because one of its methods has been registered with atexit, or can portions of the instance be garbage collected?
3. What would be the preferred way to structure such a cleanup at exit for transient class instances like this, so that garbage collection can happen effectively?

Jed

2 Answers


Registering an instance method with atexit makes the whole class instance persist until the interpreter exits: the bound method self.cleanup holds a reference to self, and atexit holds a reference to the bound method until exit. The solution is to decouple any functions that are registered with atexit from the class instance. Then the instances can be successfully garbage collected. For example,

import atexit
import os
import gc
import random

class BigClass1(object):
    """atexit function tied to instance method"""
    def __init__(self, cache_filename):
        self.cache_filename = cache_filename
        self.cache_file = open(cache_filename, 'wb')
        self.data = [random.random() for i in range(10000000)]
        atexit.register(self.cleanup)

    def cleanup(self):
        self.cache_file.close()
        os.remove(self.cache_filename)

class BigClass2(object):
    """atexit function decoupled from instance"""
    def __init__(self, cache_filename):
        self.cache_filename = cache_filename
        cache_file = open(cache_filename, 'wb')
        self.cache_file = cache_file
        self.data = [random.random() for i in range(10000000)]
        
        def cleanup():
            cache_file.close()
            os.remove(cache_filename)

        atexit.register(cleanup)

if __name__ == "__main__":
    import pdb; pdb.set_trace()

    big_data1 = BigClass1('cache_file1')
    del big_data1
    # When you reach this point, check process memory
    # before running the garbage collection below.
    gc.collect()
    # Now check process memory again. Memory usage will
    # be the same as before the garbage collection call, indicating
    # that something (the bound method registered with atexit) still
    # holds a reference to the instance that big_data1 used to reference.

    big_data2 = BigClass2('cache_file2')
    del big_data2
    # When you reach this point, check process memory
    # before running the garbage collection below.
    gc.collect()
    # Now check process memory again. Memory usage will
    # have dropped, indicating that the class instance that
    # big_data2 used to reference has been
    # successfully garbage collected.

Stepping through this line by line and monitoring the process memory shows that memory consumed by big_data1 is held until the interpreter exits while big_data2 is successfully garbage collected after del. Running each test case alone (comment out the other test case) provides the same results.
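Instead of watching process memory in a debugger, the same comparison can be checked deterministically with a weak reference, which does not keep its target alive. The sketch below is a scaled-down restatement of BigClass1 and BigClass2 under hypothetical names (BoundCleanup, ClosureCleanup, survives_del, make_temp_file), assuming CPython 3, temporary files from tempfile.mkstemp in place of 'cache_file1'/'cache_file2', and a much smaller data list.

```python
import atexit
import gc
import os
import tempfile
import weakref


def make_temp_file():
    """Hypothetical helper: create an empty temp file to stand in for the cache file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    return path


class BoundCleanup(object):
    """Like BigClass1: registers a bound method, which references self."""
    def __init__(self):
        self.cache_filename = make_temp_file()
        self.data = [0.0] * 100000  # small stand-in for the large data
        atexit.register(self.cleanup)

    def cleanup(self):
        os.remove(self.cache_filename)


class ClosureCleanup(object):
    """Like BigClass2: registers a closure that never mentions self."""
    def __init__(self):
        cache_filename = make_temp_file()
        self.cache_filename = cache_filename
        self.data = [0.0] * 100000

        def cleanup():
            os.remove(cache_filename)  # captures the local name only, not self

        atexit.register(cleanup)


def survives_del(cls):
    """Create an instance, delete the only strong name for it, collect garbage,
    and report whether the instance is still alive."""
    obj = cls()
    ref = weakref.ref(obj)  # a weak reference does not keep obj alive
    del obj
    gc.collect()
    return ref() is not None


if __name__ == "__main__":
    print("bound method keeps instance alive:", survives_del(BoundCleanup))
    # bound method keeps instance alive: True
    print("closure keeps instance alive:", survives_del(ClosureCleanup))
    # closure keeps instance alive: False
```

The temporary files are still removed by the registered handlers when the interpreter exits; only the lifetime of the instance differs between the two classes.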

Jed
  • This doesn't hold for Python 3. I tested the script in Python 3.9, and it turns out that in both classes the cleanup functions registered with `atexit` are run after the interpreter exits. – Flamingo Jan 23 '23 at 21:51
  • @Flamingo, I just tested Python 3.9.13, and the conditions I described in my answer still hold. It is true that in both cases the `atexit` method is called when the interpreter exits, and that is as designed. The essential issue is that registering an instance method with `atexit` makes it impossible for the interpreter to free the memory used by `big_data1` until the interpreter exits. Conversely, the `cleanup` closure (decoupled from the class instance) in `big_data2` is registered with `atexit`, allowing the memory for `big_data2` to be garbage collected before the interpreter exits. – Jed Jan 25 '23 at 04:48
  • You are right, sorry, I misunderstood it. I tried to change my downvote to an upvote, but it says "You last voted on this answer 2 days ago. Your vote is now locked in unless this answer is edited." It would be really helpful to add a comment after `gc.collect()` about whether or not the object has been garbage collected. Thanks! – Flamingo Jan 26 '23 at 07:04
  • I've updated the example code to include comments about checking process memory before and after the calls to `gc.collect()`. – Jed Jan 27 '23 at 02:46
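On Python 3.4 and later, the standard library's weakref.finalize packages the same decoupling idea as BigClass2: it keeps only a weak reference to the instance, calls the callback when the instance is garbage collected, and by default calls any still-pending callbacks at interpreter exit. A minimal sketch, assuming a temporary file in place of the real cache file; FinalizedBigClass and close() are hypothetical names.

```python
import gc
import os
import tempfile
import weakref


class FinalizedBigClass(object):
    """Hypothetical variant of the question's class using weakref.finalize."""
    def __init__(self):
        fd, self.cache_filename = tempfile.mkstemp()  # stand-in cache file
        os.close(fd)
        self.data = [0.0] * 100000  # stand-in for the large data
        # finalize keeps only a weak reference to self. The callback and its
        # arguments must not reference self, or the instance could never die.
        self._finalizer = weakref.finalize(self, os.remove, self.cache_filename)

    def close(self):
        """Explicit cleanup; the finalizer runs its callback at most once."""
        self._finalizer()


if __name__ == "__main__":
    big = FinalizedBigClass()
    path = big.cache_filename
    del big       # last reference gone: CPython frees the instance immediately
    gc.collect()  # only matters if the instance were part of a reference cycle
    print(os.path.exists(path))  # False: the callback already removed the file
```

Passing os.remove and the filename, rather than a bound method, is what keeps the instance collectable here, exactly as with the closure in BigClass2.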

I imagine the atexit implementation roughly like this:

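import sys

try:
    pass  # Your whole program
finally:
    # sys.exitfunc exists only in Python 2; it was removed in Python 3
    exitfunc = getattr(sys, 'exitfunc', None)
    if exitfunc:
        exitfunc()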

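and the atexit module just sets sys.exitfunc to its own callback (that is how Python 2's atexit worked; Python 3 removed sys.exitfunc, but atexit callbacks still run before the interpreter starts tearing down modules). So you don't need to worry about objects disappearing because of cleanup performed during interpreter shutdown.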

NOTE 1: It's not specified what happens if sys.exitfunc calls sys.exit(EXIT_CODE), but in my case EXIT_CODE is returned.

NOTE 2: atexit executes all registered functions sequentially and catches all exceptions, so it also hides a sys.exit call (since sys.exit just raises a SystemExit exception).
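The sequential, exception-tolerant behavior from NOTE 2 can be observed with the real atexit module in a current Python 3 interpreter. The sketch below (hypothetical names CHILD and run_child; requires Python 3.7+ for capture_output) runs a small child program whose middle exit handler raises. It checks that the other handlers still run, in reverse order of registration, and that the traceback goes to stderr. It deliberately does not check the exit status, which may differ between Python versions.

```python
import subprocess
import sys
import textwrap

# Child program: three exit handlers, and the middle one raises.
CHILD = textwrap.dedent("""
    import atexit

    def first():
        print("first registered, runs last")

    def broken():
        1 / 0

    def third():
        print("third registered, runs first")

    atexit.register(first)
    atexit.register(broken)
    atexit.register(third)
    print("main program done")
""")


def run_child():
    """Run CHILD in a fresh interpreter; return its stdout lines and stderr text."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,  # Python 3.7+
        text=True,
    )
    return result.stdout.splitlines(), result.stderr


if __name__ == "__main__":
    out, err = run_child()
    print(out)
    # ['main program done', 'third registered, runs first', 'first registered, runs last']
    print("ZeroDivisionError" in err)  # True: the traceback went to stderr
```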

ddzialak