OL: infobase memory leaks fixed

We found a memory/thread leak on ol-home related to the FastCGI library used by the infobase process.

[Graph: infobase thread count over the past week (threads-week)]

As the graph shows, restarting infobase restores the thread count to its normal level.


Together with Sam Stoller, we used a gdb script to print the Python-level stack trace of the infobase process.
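The exact gdb script is not reproduced here, but a minimal equivalent command file (an illustrative sketch; it assumes the CPython gdb helpers that provide the `py-bt` command, e.g. from the python2.7-dbg package, are loaded) would be:

```
# pystack.gdb -- run as: gdb -p <infobase-pid> -batch -x pystack.gdb
# Prints the Python-level backtrace of every thread, then detaches.
thread apply all py-bt
detach
quit
```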

Thread 1 (Thread 0x7fb5e3c5c740 (LWP 23820)):
#3 Frame 0xc40860, for file /opt/openlibrary/venv/local/lib/python2.7/site-packages/flup/server/threadedserver.py, line 76, in run (self=<WSGIServer(...) ...(truncated), addr=('0.0.0.0', 7050), flups=)
    return flups.WSGIServer(func, multiplexed=True, bindAddress=addr).run()
#18 Frame 0x7fb5e17cc830, for file /opt/openlibrary/venv/local/lib/python2.7/site-packages/web/wsgi.py, line 42, in runwsgi (func=, args=['7050'])
    return runfcgi(func, validaddr(args[0]))
#21 Frame 0x7fb5dcc1ab00, for file /opt/openlibrary/venv/local/lib/python2.7/site-packages/web/application.py, line 313, in run (self=<application(fvars={'from_json': , 'things': , 'reindex': , 'get_data': , 'seq': , 'app': , 'echo': , 'load_config': , 'logreader': , 'new_key': , 'get_many': , 'to_int': , 'readlog': , 'web': , 'update_config': , 'setup_remoteip': , 'cache': , '__package__': 'infogami.infobase', 'write': , 'start':...(truncated)
    return wsgi.runwsgi(self.wsgifunc(*middleware))
#25 Frame 0x7fb5dcc1e1f8, for file /opt/openlibrary/deploys/openlibrary/6b2cc05/infogami/infobase/server.py, line 615, in run ()
    app.run()
#28 Frame 0x7fb5e18f3cc8, for file /opt/openlibrary/deploys/openlibrary/6b2cc05/infogami/infobase/server.py, line 639, in start (config_file='/olsystem/etc/infobase.yml', args=('fastcgi', '7050'))
    run()
#33 Frame 0x7fb5e1c56230, for file /opt/openlibrary/openlibrary/scripts/infobase-server, line 32, in main (args=['/olsystem/etc/infobase.yml', 'fastcgi', '7050'], server=)
    server.start(*args)
#36 Frame 0x7fb5e3bb8208, for file /opt/openlibrary/openlibrary/scripts/infobase-server, line 61, in <module> ()
    main(sys.argv[1:])

It looks like there is a deadlock related to the multiplexed=True option of the WSGI server, as set in /opt/openlibrary/venv/local/lib/python2.7/site-packages/web/wsgi.py at line 17.

We also found this interesting note about flup's multiplexed mode.

Anand Chitipothu fixed it by switching the flup FastCGI infobase server to multiplexed=False with this patch: https://github.com/internetarchive/openlibrary/pull/234

“The web.py runfcgi is using multiplexed=True option for fastcgi server and that seem to cause some memory leaks. Using a variant of runfcgi that sets multiplexed=False.”
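The essence of the fix can be sketched as follows (a sketch only, not the actual patch; the function name and the injected server class are illustrative, see the PR above for the real change):

```python
# Variant of web.py's runfcgi that disables flup's connection multiplexing.
# server_class is injected (in production it would be
# flup.server.fcgi.WSGIServer) so the sketch can be exercised without a
# FastCGI environment; runfcgi_unmultiplexed is a hypothetical name.
def runfcgi_unmultiplexed(func, addr, server_class):
    server = server_class(func, multiplexed=False, bindAddress=addr)
    return server.run()
```

With multiplexed=False, flup serves one request per FastCGI connection instead of interleaving several over the same connection, which avoids the code path where threads were being leaked.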

This fixed the leak:

[Graph: infobase thread count after the fix (threads-day)]