Updates from March, 2015

  • gio 5:25 pm on March 26, 2015
    Tags: stats   

    OL: statistics graphs

    OL statistics are shown in two different graphs:

    [Screenshots: the two OL statistics graphs, captured 2015-03-26]

    They are generated by two scripts:

    :: /opt/openlibrary/openlibrary/scripts/ipstats.py runs on ol-www1 and creates this graph: [Screenshot 2015-03-31 at 3.50.06 PM]

    :: /opt/openlibrary/openlibrary/scripts/store_counts.py runs on ol-home.
    To generate the count stats for the past n days, first switch to the openlibrary user, activate the virtual environment and move to the scripts directory:

    giovanni@ol-home:~$ sudo -s
    root@ol-home:/home/giovanni# su openlibrary
    openlibrary@ol-home:/home/giovanni$ source /opt/openlibrary/venv/bin/activate
    (venv)openlibrary@ol-home:/home/giovanni$ cd /opt/openlibrary/openlibrary/scripts/

    and run the command:

    $ python store_counts.py /opt/openlibrary/olsystem/etc/infobase.yml /opt/openlibrary/olsystem/etc/openlibrary.yml /opt/openlibrary/olsystem/etc/coverstore.yml  n

    where n is the number of days. This creates these graphs: [Screenshot 2015-03-31 at 3.49.59 PM]

    The scripts are scheduled in /etc/cron.d/openlibrary (every hour, and again at 23:59):

    0 * * * * openlibrary /olsystem/bin/verify-node.sh ol-home && /olsystem/bin/olenv $SCRIPTS/store_counts.py /opt/openlibrary/olsystem/etc/infobase.yml /opt/openlibrary/olsystem/etc/openlibrary.yml /opt/openlibrary/olsystem/etc/coverstore.yml 1
    0 * * * * www-data /olsystem/bin/verify-node.sh ol-www1 && /olsystem/bin/olenv $SCRIPTS/ipstats.py  /opt/openlibrary/olsystem/etc/openlibrary.yml
    59 23 * * * openlibrary /olsystem/bin/verify-node.sh ol-home && /olsystem/bin/olenv $SCRIPTS/store_counts.py /opt/openlibrary/olsystem/etc/infobase.yml /opt/openlibrary/olsystem/etc/openlibrary.yml /opt/openlibrary/olsystem/etc/coverstore.yml 1
    59 23 * * * www-data /olsystem/bin/verify-node.sh ol-www1 && /olsystem/bin/olenv $SCRIPTS/ipstats.py  /opt/openlibrary/olsystem/etc/openlibrary.yml

    If the graphs break for some reason, you have to run the scripts manually.
    Be careful to run them with the right window of days.
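    When you rerun store_counts.py by hand, the tricky part is picking n. Below is a minimal Python 3 sketch. It assumes two things the post does not confirm: that n counts days back from today, and that you know the last date whose counts were stored correctly. The sketch only builds the command line and does not run it.

```python
from datetime import date

# Config files passed to store_counts.py, copied from the cron entries above.
CONFIGS = [
    "/opt/openlibrary/olsystem/etc/infobase.yml",
    "/opt/openlibrary/olsystem/etc/openlibrary.yml",
    "/opt/openlibrary/olsystem/etc/coverstore.yml",
]


def days_to_backfill(last_good, today):
    """Days between the last day with correct stats and today.

    Assumption: n (the last argument of store_counts.py) counts days back
    from today, so the gap after last_good needs n = today - last_good.
    """
    if last_good > today:
        raise ValueError("last_good is in the future")
    return (today - last_good).days


def store_counts_cmd(n):
    """Build (but do not run) the argv for a manual store_counts.py run."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return ["python", "store_counts.py", *CONFIGS, str(n)]


if __name__ == "__main__":
    n = days_to_backfill(date(2015, 3, 20), date(2015, 3, 26))
    print(" ".join(store_counts_cmd(n)))
```

    If the real script counts the current day differently, adjust n by one. Check this on a test window before backfilling.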

    See the code for the details:

  • gio 11:00 pm on March 11, 2015

    OL: how to generate the dump files, step-by-step 

    Here are the instructions for generating the ol_dump.txt.gz files.

    ol-home is the right place to do this.

    1 :: Dump the data table from ol-db1
    This takes around 1 hour to complete.

    giovanni@ol-home:/1/var/tmp$ psql -h ol-db1 -U openlibrary openlibrary -c "copy data to stdout" | gzip -c > data.txt.gz

    2 :: Activate the virtual environment /opt/openlibrary/venv

    giovanni@ol-home:/1/var/tmp$ source /opt/openlibrary/venv/bin/activate

    3 :: Generate the metadata table dump from the archive db
    This takes around 1 hour to complete.

    (venv)giovanni@ol-home:/1/var/tmp$ ARCHIVE_DB_PASSWORD=`/opt/.petabox/dbserver`
    (venv)giovanni@ol-home:/1/var/tmp$ python /opt/openlibrary/openlibrary/scripts/2012/dump-ia-items.py --host db-current --user archive --password $ARCHIVE_DB_PASSWORD --database archive | gzip -c > ia_metadata_dump_2015-03-11.txt.gz

    4 :: Generate the dump of all revisions of all documents
    This takes around 8 hours to complete.

    (venv)giovanni@ol-home:/1/var/tmp$ /opt/openlibrary/openlibrary/scripts/oldump.py cdump data.txt.gz 2015-03-11 | gzip -c > ol_cdump.txt.gz
    (venv)giovanni@ol-home:/1/var/tmp$ rm data.txt.gz

    5 :: Generate the dump of the latest revision of every document
    This takes around 6 hours to complete.

    (venv)giovanni@ol-home:/1/var/tmp$ gzip -cd ol_cdump.txt.gz | python /opt/openlibrary/openlibrary/scripts/oldump.py sort --tmpdir /1/var/tmp | python /opt/openlibrary/openlibrary/scripts/oldump.py dump | gzip -c > ol_dump_2015-03-11.txt.gz
    (venv)giovanni@ol-home:/1/var/tmp$ rm -rf /1/var/tmp/oldumpsort
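    Step 5 keeps only the newest revision of each document. Below is a minimal in-memory Python 3 sketch of that selection. It assumes each cdump row has the tab-separated columns type, key, revision, last_modified, json, which is the layout documented for the public OL dumps. oldump.py presumably sorts on disk with --tmpdir because the full dump does not fit in memory, so use this sketch only on small samples.

```python
import gzip


def latest_revisions(lines):
    """Keep only the highest revision of each key.

    Assumes tab-separated rows: type, key, revision, last_modified, json.
    Everything is held in memory, so use this on samples, not the full dump.
    """
    best = {}  # key -> (revision, original line)
    for line in lines:
        cols = line.rstrip("\n").split("\t", 4)
        if len(cols) != 5:
            continue  # skip malformed rows
        key, rev = cols[1], int(cols[2])
        if key not in best or rev > best[key][0]:
            best[key] = (rev, line)
    # Sort by key so the output order is deterministic.
    return [best[k][1] for k in sorted(best)]


def latest_from_file(path):
    """Read a gzipped cdump and return its latest-revision rows."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return latest_revisions(f)
```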

    6 :: Split the dump into authors, editions, works and redirects

    (venv)giovanni@ol-home:/1/var/tmp$ gzip -cd ol_dump_2015-03-11.txt.gz | python /opt/openlibrary/openlibrary/scripts/oldump.py split --format ol_dump_%s_2015-03-11.txt.gz
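    The split in step 6 routes each row to a per-type file named by the --format pattern. Below is a Python 3 sketch of that idea. It assumes the record type is the first tab-separated column. The TYPE_TO_NAME mapping is hypothetical and may not match what oldump.py actually does.

```python
import gzip
import os
import tempfile

# Hypothetical mapping from record type to the %s part of the --format
# pattern; the real oldump.py may use other type names or skip some types.
TYPE_TO_NAME = {
    "/type/author": "authors",
    "/type/edition": "editions",
    "/type/work": "works",
    "/type/redirect": "redirects",
}


def split_dump(lines, fmt):
    """Route each tab-separated row to the gzip file for its type.

    Rows of other types are skipped. Returns {name: rows_written}.
    """
    files, counts = {}, {}
    try:
        for line in lines:
            name = TYPE_TO_NAME.get(line.split("\t", 1)[0])
            if name is None:
                continue
            if name not in files:
                files[name] = gzip.open(fmt % name, "wt", encoding="utf-8")
                counts[name] = 0
            files[name].write(line)
            counts[name] += 1
    finally:
        for f in files.values():
            f.close()
    return counts


if __name__ == "__main__":
    # Tiny demo in a throwaway directory.
    out = tempfile.mkdtemp()
    rows = ["/type/work\t/works/OL1W\t1\t2015-03-11T00:00:00\t{}\n",
            "/type/author\t/authors/OL1A\t1\t2015-03-11T00:00:00\t{}\n"]
    print(split_dump(rows, os.path.join(out, "ol_dump_%s_demo.txt.gz")))
```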

    7 :: Generate the denormalized works dump (TO FIX: the script currently raises exceptions).
    Each row contains a JSON document with the following fields:

    • work – the work document
    • editions – list of editions that belong to this work
    • authors – all the authors of this work
    • ia – IA metadata for all the IA items referenced in the editions, as a list
    • duplicates – dictionary of duplicates (key -> its duplicates) of the work and edition docs mentioned above
    (venv)giovanni@ol-home:/1/var/tmp$ python /opt/openlibrary/openlibrary/scripts/2011/09/generate_deworks.py ol_dump_2015-03-11.txt.gz ia_metadata_dump_2015-03-11.txt.gz | gzip -c > ol_dump_deworks_2015-03-11.txt.gz
    (venv)giovanni@ol-home:/1/var/tmp$ ls
    ia_metadata_dump_2015-03-11.txt.gz  ol_dump_2015-03-11.txt.gz
    ol_dump_redirects_2015-03-11.txt.gz ol_dump_authors_2015-03-11.txt.gz
    ol_dump_deworks_2015-03-11.txt.gz   ol_dump_editions_2015-03-11.txt.gz
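    Below is a Python 3 sketch of the grouping behind the denormalized works dump. It is not generate_deworks.py. It assumes editions reference works via "works": [{"key": ...}] and works reference authors via "authors": [{"author": {"key": ...}}]. It also leaves out the ia and duplicates fields.

```python
import json
from collections import defaultdict


def denormalize(rows):
    """Yield one {"work", "editions", "authors"} dict per work.

    rows are (type, key, revision, last_modified, json) tuples. The
    reference layout described above is an assumption, not taken from
    generate_deworks.py.
    """
    works, authors = {}, {}
    editions_by_work = defaultdict(list)
    for type_, key, _rev, _ts, doc_json in rows:
        doc = json.loads(doc_json)
        if type_ == "/type/work":
            works[key] = doc
        elif type_ == "/type/author":
            authors[key] = doc
        elif type_ == "/type/edition":
            for ref in doc.get("works", []):
                editions_by_work[ref["key"]].append(doc)
    for key in sorted(works):
        work = works[key]
        author_keys = [a["author"]["key"] for a in work.get("authors", []) if "author" in a]
        yield {
            "work": work,
            "editions": editions_by_work.get(key, []),
            "authors": [authors[k] for k in author_keys if k in authors],
        }
```

    Each yielded dict can be written as one output line with json.dumps(row). Note that this version keeps all records in memory.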

  • gio 7:12 pm on February 26, 2015

    OL: infobase memory leaks fixed 

    We found a memory/thread leak on ol-home related to the FastCGI library used by the infobase process.

    Restarting infobase brings the number of threads back to normal.
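    The post does not say how the thread count was collected for the graph. On Linux, one way to watch it is the Threads: field of /proc/<pid>/status. Below is a minimal Python 3 sketch of that approach. Finding the infobase PID is left out.

```python
import os


def thread_count(status_text):
    """Return the value of the "Threads:" field from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no Threads: line found")


def threads_of(pid):
    """Read the current thread count of a running process (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        return thread_count(f.read())


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    print("this process has", threads_of(os.getpid()), "threads")
```

    Sampling threads_of() periodically for the infobase process would show a leak as a steadily growing count.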


    Together with Sam Stoller, we used a gdb script to print the Python stack trace:

    Thread 1 (Thread 0x7fb5e3c5c740 (LWP 23820)):
    #3 Frame 0xc40860, for file /opt/openlibrary/venv/local/lib/python2.7/site-packages/flup/server/threadedserver.py, line 76, in run (self=<WSGIServer(_appLock=, multiprocess=False, _umask=None, roles=(1,), _hupReceived=False, _connectionClass=, _jobClass=, _threadPool=<ThreadPool(_workQueue=[
    ], _lock=<_Condition(_Condition__lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=, _RLock__count=0) at remote 0x7fb5dcc2d5d0>,
    acquire=, _is_owned=, _release_save=, release=, _acquire_restore=, _Verbose__verbose=False, _Condition__waiters=[, , , , multiprocess=False, _umask=None, roles=(1,), _hupReceived=False, _connectionClass=, _jobClass=, _threadPool=<ThreadPool(_workQueue=[],
    _lock=<_Condition(_Condition__lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=, _RLock__count=0) at remote 0x7fb5dcc2d5d0>,
    acquire=, _is_owned=, _release_save=, release=, _acquire_restore=, _Verbose__verbose=False, _Condition__waiters=[, , , , addr=('', 7050),
        return flups.WSGIServer(func, multiplexed=True, bindAddress=addr).run()
    #18 Frame 0x7fb5e17cc830, for file /opt/openlibrary/venv/local/lib/python2.7/site-packages/web/wsgi.py, line 42, in runwsgi (func=, args=['7050'])
        return runfcgi(func, validaddr(args[0]))
    #21 Frame 0x7fb5dcc1ab00, for file /opt/openlibrary/venv/local/lib/python2.7/site-packages/web/application.py, line 313, in run (self=<application(fvars={'from_json': , 'things': , 'reindex': , 'get_data': , 'seq': , 'app': , 'echo': , 'load_config': , 'logreader': , 'new_key': , 'get_many': , 'to_int': , 'readlog': , 'web': , 'update_config': , 'setup_remoteip': , 'cache': , '__package__': 'infogami.infobase', 'write': , 'start':...(truncated)
        return wsgi.runwsgi(self.wsgifunc(*middleware))
    #25 Frame 0x7fb5dcc1e1f8, for file /opt/openlibrary/deploys/openlibrary/6b2cc05/infogami/infobase/server.py, line 615, in run ()
    #28 Frame 0x7fb5e18f3cc8, for file /opt/openlibrary/deploys/openlibrary/6b2cc05/infogami/infobase/server.py, line 639, in start (config_file='/olsystem/etc/infobase.yml', args=('fastcgi', '7050'))
    #33 Frame 0x7fb5e1c56230, for file /opt/openlibrary/openlibrary/scripts/infobase-server, line 32, in main (args=['/olsystem/etc/infobase.yml', 'fastcgi', '7050'], server=)
    #36 Frame 0x7fb5e3bb8208, for file /opt/openlibrary/openlibrary/scripts/infobase-server, line 61, in  ()

    It looks like there is a deadlock related to the multiplexed=True option of the WSGI server, as set in /opt/openlibrary/venv/local/lib/python2.7/site-packages/web/wsgi.py line 17.

    We also found this interesting note about flup's multiplexed option.

    Anand Chitipothu fixed it by switching the flup FastCGI infobase server to multiplexed=False with the patch: https://github.com/internetarchive/openlibrary/pull/234

    “The web.py runfcgi is using multiplexed=True option for fastcgi server and that seem to cause some memory leaks. Using a variant of runfcgi that sets multiplexed=False.”

    This fixed the leak.


    • internetarchive 7:22 pm on February 26, 2015

      GREAT work, team. Thank you for hunting it down and getting it resolved!

  • gio 10:48 pm on February 25, 2015

    OL: import new books from IA – importer 

    To import new books from IA, add their IA identifiers using the OL admin page: https://openlibrary.org/admin/imports/add

    The manage-imports.py process must be running on ol-home:

    python scripts/manage-imports.py --config /olsystem/etc/openlibrary.yml import-all

    You can launch it with:

    sudo -u openlibrary /olsystem/bin/olenv HOME=/home/openlibrary OPENLIBRARY_RCFILE=/olsystem/etc/olrc-importbot python scripts/manage-imports.py --config /olsystem/etc/openlibrary.yml import-all >> /tmp/importer.log

    The log is written to /tmp/importer.log.
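    To check that the importer is still running on ol-home, the Python 3 sketch below scans /proc/<pid>/cmdline (Linux only) for a "manage-imports.py ... import-all" command line. The matching rule is an assumption based on the command above.

```python
import os


def is_importer(argv):
    """True if argv looks like `... manage-imports.py ... import-all`."""
    return any(a.endswith("manage-imports.py") for a in argv) and "import-all" in argv


def running_importers(proc="/proc"):
    """Return the PIDs whose command line matches the importer (Linux only)."""
    pids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited or is not readable
        argv = [a.decode("utf-8", "replace") for a in raw.split(b"\0") if a]
        if is_importer(argv):
            pids.append(int(entry))
    return pids
```

    An empty list from running_importers() means the importer needs to be relaunched with the command above.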

  • gio 7:18 pm on February 19, 2015

    OL: stats graphs on the homepage

    The statistics graphs on the OL homepage are generated with the script:
    It works by calling the openlibrary.admin code.

  • gio 10:58 pm on February 11, 2015

    OL: recent activities summary 

    Primary results:

    • Open Library has a more reliable and stable infrastructure.
    • Activities like monitoring, system management and diagnosis are now easier.
    • The GitHub community can come back to fix bugs and develop.

    I especially want to thank Raj, Sam, Anand and Andy for helping me in this process.


    == Cluster Reliability ==

    :- Tomcat configuration updated to better fit the OL infrastructure.
    :- Solr configuration updated to match the other Archive Solr instances.
    :- Added HAProxy to ol-solr2 to help Tomcat handle all the connections properly.
    :- Rebooted ol-solr2 on SSD.


    == Diagnostics ==

    :- Installed and configured Munin to produce graphical reports.
    :- Wrote a daemon and a Munin plugin that track response times and status code rates.
    :- Designed, coded, configured and installed a one-page dashboard to better monitor the cluster status, with links to the main OL resources: wiki, nagios, admin-center, lending stats, github, etc.
    :- Updated minor Nagios triggers, making the alarms more useful and effective.


    == Bugs Fixing ==

    :- Found and debugged a memory/thread leak on ol-home; the problem seems related to the original infogami/webpy code. I informed Anand, and I hope he will have time to tell us how to solve this issue ASAP.
    :- Found an outage issue related to ARP and the load balancer DNS. Sam and Andy are working on it.
    :- Found and fixed an important issue in the deployment process.
    :- Fixed some management scripts that were not working properly.
    :- Fixed the backup scripts that were not working properly.
    :- Increasing the underestimated disk space for backups on ol-home.
    Andy and I will finish this week, finally solving the annoying monthly DISK-FULL problem.
    :- Fixed the sitemap generation process. We finally have an up-to-date sitemap that works correctly, which solves some problems we had with Googlebot.
    :- Learned how to recover from an ACS4-related OL outage.
    :- Solved a minor log-rotation/disk-full problem on ol-solr2.
    :- Fixed the issues with the Vagrant development instance.


    == Security Upgrade ==

    :- Removed the SSLv3 protocol support from nginx, solving the POODLE vulnerability.


    == Documentation ==

    :- Updated the wiki page with all the NEW documentation we wrote during these activities.
    :- Updated the wiki page with some old documentation that is still valid.


    == Github and Developers ==

    :- Merged some old pull requests from the community.
    :- General cleanup.

  • gio 6:48 pm on February 10, 2015

    OL: deploying the code 

    To deploy the OL code:

    rkumar@ol-www1:~$ sudo -s
    root@ol-www1:/home/rkumar# su openlibrary
    openlibrary@ol-www1:/home/rkumar$ . /opt/openlibrary/venv/bin/activate 
    (venv)openlibrary@ol-www1:/home/rkumar$ pip install fabric
    (venv)openlibrary@ol-www1:/home/rkumar$ /olsystem/bin/deploy-code openlibrary

    This may not work, because the fabric version we use is very old (quoting Anand).

    In that case, try it from ol-home.

    To fix it, use:

    sudo -u openlibrary rsync -av rsync://ol-home/opt/openlibrary/venv /opt/openlibrary/


    /olsystem/bin/olenv pip install -U fabric==1.1.2

  • gio 6:00 pm on February 10, 2015

    OL: monitoring Nginx response time and response status codes with python and munin 

    To make Nginx log the response time properly, see the blog article “tracking app response time on nginx”.

    We expect /var/log/nginx/access.log to be formatted as:

    '$seed$remote_addr $host $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_time'

    The log file lines look like:

    openlibrary.org - [10/Feb/2015:17:34:35 +0000] "GET /search?q=The+Law&subject_facet=Law&person_facet=R.+A.+Ramsay HTTP/1.1" 200 7159 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0.660
    openlibrary.org - [10/Feb/2015:17:34:35 +0000] "GET /search?q=RICO&author_key=OL2687659A&subject_facet=In+library HTTP/1.1" 200 6895 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0.684
    openlibrary.org - [10/Feb/2015:17:34:35 +0000] "GET /search?sort=old&subject_facet=Protected+DAISY&language=mul&publisher_facet=Nelson+Doubleday&publisher_facet=Macmillan+and+co.%2C+limited&publisher_facet=Franklin+Library HTTP/1.1" 200 5860 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" 0.307
    openlibrary.org - [10/Feb/2015:17:34:35 +0000] "GET /works/OL118971W HTTP/1.0" 301 0 "https://openlibrary.org/" "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv: Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)" 0.057
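    Below is a Python 3 sketch of parsing these lines into (status, request_time). The regex is written against the sample lines above, which start at $host. The $seed$remote_addr prefix of the format string does not appear in the samples, so the sketch does not parse it. Using re.search lets the regex skip over any such prefix.

```python
import re

# Fields as they appear in the sample lines: host, remote_user, [time_local],
# "request", status, body_bytes_sent, "referer", "user_agent", request_time.
LINE_RE = re.compile(
    r'(?P<host>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" (?P<rt>[\d.]+)$'
)


def parse_line(line):
    """Return (status, request_time_seconds), or None if the line does not match."""
    m = LINE_RE.search(line.strip())
    if m is None:
        return None
    return int(m.group("status")), float(m.group("rt"))
```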

    Using a Python script and two Munin plugins, we can plot these two graphs:

    • Nginx response time: the requests/second rate, split into response time ranges.
    • Nginx response code: the requests/second rate for every response code class.

    The script /usr/local/nglog/nglog.py runs in the background, reading access.log line by line and sampling the requests/second rate for the different response time ranges and response code classes. The values are saved in the files /tmp/nginx_request_status_stats and /tmp/nginx_request_time_stats.

    giovanni@ol-www1:/tmp$ cat /tmp/nginx_request_time_stats 
    T1.value 75.100000 
    T2.value 54.555556 
    T3.value 16.938889
    T4.value 1.711111 
    T5.value 2.933333
    giovanni@ol-www1:/tmp$ cat /tmp/nginx_request_status_stats 
    s2xx.value 123.600000 
    s3xx.value 19.755556 
    s4xx.value 7.883333 
    s5xx.value 0.000000
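    The actual nglog.py is not published. The Python 3 sketch below shows the counting it is described as doing:

    • Request times are bucketed using the ranges from the munin plugin labels (T1 <= 0.5 s up to T5 > 10 s).
    • Status codes are grouped by class.
    • Counts are divided by the sampling window to get requests/second.

    The window length nglog.py uses is not stated, so it is a parameter here.

```python
# Upper bounds in seconds for T1..T4, from the munin plugin labels;
# anything slower than 10 s is T5.
BOUNDS = [0.5, 1, 5, 10]


def time_bucket(rt):
    """Map a request time (seconds) to T1..T5."""
    for i, upper in enumerate(BOUNDS, start=1):
        if rt <= upper:
            return "T%d" % i
    return "T5"


def rates(samples, window_s):
    """Turn (status, request_time) samples seen during window_s seconds
    into requests/second per time bucket and per status class."""
    counts = {"T%d" % i: 0 for i in range(1, 6)}
    counts.update({"s%dxx" % c: 0 for c in range(2, 6)})
    for status, rt in samples:
        counts[time_bucket(rt)] += 1
        cls = "s%dxx" % (status // 100)
        if cls in counts:  # ignore 1xx and out-of-range codes
            counts[cls] += 1
    return {k: v / window_s for k, v in counts.items()}


def munin_lines(values, prefix):
    """Format values as `name.value N`, the layout of the /tmp stats files."""
    return "\n".join("%s.value %f" % (k, values[k])
                     for k in sorted(values) if k.startswith(prefix))


def write_stats(values):
    """Overwrite the two files the munin plugins read."""
    with open("/tmp/nginx_request_time_stats", "w") as f:
        f.write(munin_lines(values, "T") + "\n")
    with open("/tmp/nginx_request_status_stats", "w") as f:
        f.write(munin_lines(values, "s") + "\n")
```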

    When it finds a 500 status code, the script logs the request to /var/log/nginx/nglog.log.
    The script is launched by the bash script /usr/local/nglog/nglogd.sh, which executes:

    tail -F /var/log/nginx/access.log | python /usr/local/nglog/nglog.py &
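    With the tail -F | python pipeline, the parser reads the log from stdin indefinitely. The Python 3 sketch below covers only the 500-logging part and is not the real nglog.py. It uses a generator so that each line is handled as it arrives instead of accumulating in memory.

```python
import re
import sys

# The status code follows the closing quote of "$request".
STATUS_RE = re.compile(r'" (\d{3}) \d+ "')


def server_errors(lines):
    """Yield each access-log line whose status code is 500."""
    for line in lines:
        m = STATUS_RE.search(line)
        if m and m.group(1) == "500":
            yield line


def main(log_path="/var/log/nginx/nglog.log"):
    """Read access-log lines from stdin and append 500s to log_path."""
    with open(log_path, "a") as err:
        for line in server_errors(sys.stdin):
            err.write(line)
            err.flush()  # make each entry visible right away
```

    To use it in the pipeline above, call main() under `if __name__ == "__main__":`.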

    To plot the results we use two ad-hoc munin plugins:

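    #!/bin/sh
    # munin plugin: NGINX response time
    case $1 in
        config)
            cat <<'EOM'
    graph_title NGINX response time
    graph_vlabel num_req /sec
    T1.label <= 0.5s
    T2.label >0.5 and <=1
    T3.label >1 and <=5
    T4.label >5 and <=10
    T5.label > 10
    EOM
            exit 0;;
    esac
    # assumed: otherwise print the current values written by nglog.py
    cat /tmp/nginx_request_time_stats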


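    #!/bin/sh
    # munin plugin: NGINX response status code
    case $1 in
        config)
            cat <<'EOM'
    graph_title NGINX response status code
    graph_vlabel req/sec
    s2xx.label 2xx
    s3xx.label 3xx
    s4xx.label 4xx
    s5xx.label 5xx
    EOM
            exit 0;;
    esac
    # assumed: otherwise print the current values written by nglog.py
    cat /tmp/nginx_request_status_stats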

    • Oleg 7:55 pm on January 14, 2017

      Please tell me where I can find the script “nglog/nglog.py” (on GitHub?).

    • gio 6:59 pm on May 23, 2017

      Sorry, the script is not available online. But it is just an Nginx log parser with some counters.

  • gio 11:00 pm on February 4, 2015

    OL: How to generate the sitemaps 

    First you need the latest ol_dump file (ol_dump.txt.gz), which is used as the source.
    It is generated on ol-home; to learn how, see the post: OL: how to generate the dump files

    :: Generate sitemaps on ol-home

    anand@ol-home:/1/var/tmp/sitemaps$ python /opt/openlibrary/openlibrary/scripts/2009/01/sitemaps/sitemap.py ../dumps/ol_dump_2015-01-31/ol_dump_2015-01-31.txt.gz
    Wed Feb  4 12:48:03 2015 writing sitemaps/sitemap_works_1700.xml.gz 39979
    Wed Feb  4 12:48:03 2015 writing sitemaps/sitemap_works_1701.xml.gz 39963
    Wed Feb  4 12:48:04 2015 writing sitemaps/sitemap_works_1702.xml.gz 39975
    Wed Feb  4 12:48:04 2015 writing sitemaps/sitemap_works_1703.xml.gz 39947
    Wed Feb  4 12:48:05 2015 writing sitemaps/sitemap_works_1704.xml.gz 39347
    Wed Feb  4 12:48:05 2015 writing sitemaps/sitemap_works_1705.xml.gz 39691
    Wed Feb  4 12:48:05 2015 writing sitemaps/sitemap_works_1706.xml.gz 38423
    Wed Feb  4 12:48:06 2015 writing sitemaps/sitemap_works_1707.xml.gz 30851
    Wed Feb  4 12:48:06 2015 writing sitemaps/sitemap_works_1708.xml.gz 3447
    Wed Feb  4 12:48:06 2015 writing sitemaps/siteindex.xml.gz 19891
    Wed Feb  4 12:48:06 2015 done
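    The Python 3 sketch below shows what a sitemap generator like sitemap.py has to produce:

    • gzipped <urlset> files of at most 50,000 URLs each (the sitemaps.org protocol limit);
    • a siteindex.xml.gz that points at those files.

    The file naming and the base URL are assumptions. This is not the code of sitemap.py.

```python
import gzip
import os
import tempfile
from xml.sax.saxutils import escape

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50000  # per-file limit in the sitemaps.org protocol


def chunks(urls, size=MAX_URLS):
    """Split an iterable of URLs into lists of at most `size` items."""
    chunk = []
    for url in urls:
        chunk.append(url)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def urlset_xml(urls):
    body = "".join("<url><loc>%s</loc></url>\n" % escape(u) for u in urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="%s">\n%s</urlset>\n' % (NS, body))


def siteindex_xml(sitemap_urls):
    body = "".join("<sitemap><loc>%s</loc></sitemap>\n" % escape(u) for u in sitemap_urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<sitemapindex xmlns="%s">\n%s</sitemapindex>\n' % (NS, body))


def write_sitemaps(urls, out_dir, base_url, prefix="sitemap_works"):
    """Write gzipped sitemap files plus siteindex.xml.gz; return the file names."""
    names = []
    for i, chunk in enumerate(chunks(urls)):
        name = "%s_%04d.xml.gz" % (prefix, i)  # hypothetical naming scheme
        with gzip.open(os.path.join(out_dir, name), "wt", encoding="utf-8") as f:
            f.write(urlset_xml(chunk))
        names.append(name)
    with gzip.open(os.path.join(out_dir, "siteindex.xml.gz"), "wt", encoding="utf-8") as f:
        f.write(siteindex_xml(base_url + n for n in names))
    return names


if __name__ == "__main__":
    out = tempfile.mkdtemp()
    urls = ["https://openlibrary.org/works/OL%dW" % i for i in range(1, 4)]
    print(write_sitemaps(urls, out, "https://openlibrary.org/static/sitemaps/"))
```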

    :: Copy sitemaps to ol-www1

    anand@ol-www1:~$ sudo mkdir -p /1/var/lib/openlibrary/sitemaps
    anand@ol-www1:~$ sudo rsync -av rsync://ol-home/var_1/tmp/sitemaps/sitemaps /1/var/lib/openlibrary/sitemaps/
    sent 94,519 bytes  received 329,106,541 bytes  73,155,791.11 bytes/sec
    total size is 328,678,782  speedup is 1.00

    :: Verify sitemaps are available

    anand@ol-www1:~$ curl -I https://openlibrary.org/static/sitemaps/siteindex.xml.gz
    HTTP/1.1 200 OK
    Server: nginx/1.1.19
    Date: Wed, 04 Feb 2015 16:48:50 GMT
    Content-Type: text/plain
    Content-Length: 14689
    Last-Modified: Wed, 04 Feb 2015 12:48:06 GMT
    Connection: keep-alive
    Accept-Ranges: bytes
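    The same check can be done from Python 3 with only the standard library. In the sketch below, looks_ok encodes the two things the curl output above confirms: status 200 and a non-empty body. Note that urlopen raises urllib.error.HTTPError for 4xx/5xx responses instead of returning them.

```python
import urllib.request

SITEINDEX = "https://openlibrary.org/static/sitemaps/siteindex.xml.gz"


def head(url, timeout=10):
    """Send a HEAD request and return (status, headers as a dict)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, dict(resp.headers)


def looks_ok(status, headers):
    """True if the file is served (200) with a non-empty body."""
    return status == 200 and int(headers.get("Content-Length", "0")) > 0
```

    Usage: looks_ok(*head(SITEINDEX)).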

  • gio 8:21 pm on February 2, 2015

    OL: Sitemap generation 

    To generate the sitemap, run this on ol-home:

    python sitemaps.py ol_dump_works_latest.txt.gz

    sitemaps.py is located at:


    The latest dump is available at: http://openlibrary.org/data/ol_dump_works_latest.txt.gz
    For more details, see https://openlibrary.org/developers/dumps.

    Once the sitemap is generated, place it in /1/var/lib/openlibrary/sitemaps on ol-www1,
    as configured in /olsystem/etc/nginx/sites-available/openlibrary.conf:

        location ~ ^/static/(docs|tour|sitemaps|jsondumps|images/shelfview|sampledump.txt.gz)(/.*)?$ {
            root /1/var/lib/openlibrary/sitemaps;
            autoindex on;
            rewrite ^/static/(.*)$ /$1 break;
        }

    Note: work in /1/var/tmp on ol-home; you can then rsync the output from rsync://ol-home/var_1/tmp/
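    To see which file nginx serves for a given URL, the simplified Python 3 sketch below mirrors the location block: the regex selects the URI, the rewrite strips /static, and root is prepended. It ignores other nginx directives. It shows why the rsync in the "OL: How to generate the sitemaps" post ends up in /1/var/lib/openlibrary/sitemaps/sitemaps/.

```python
import posixpath
import re

# Simplified model of the location block: match, rewrite, then prepend root.
LOCATION_RE = re.compile(
    r"^/static/(docs|tour|sitemaps|jsondumps|images/shelfview|sampledump.txt.gz)(/.*)?$")
ROOT = "/1/var/lib/openlibrary/sitemaps"


def file_for(uri):
    """Return the file nginx would serve for uri, or None if the block does not match."""
    if not LOCATION_RE.match(uri):
        return None
    rewritten = re.sub(r"^/static/(.*)$", r"/\1", uri)  # rewrite ^/static/(.*)$ /$1
    return posixpath.normpath(ROOT + rewritten)
```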
