Tagged: openlibrary

  • gio 6:48 pm on February 10, 2015
    Tags: openlibrary   

    OL: deploying the code 

    To deploy the OL code:

    rkumar@ol-www1:~$ sudo -s
    root@ol-www1:/home/rkumar$ su openlibrary
    openlibrary@ol-www1:/home/rkumar$ . /opt/openlibrary/venv/bin/activate 
    (venv)openlibrary@ol-www1:/home/rkumar$ pip install fabric
    (venv)openlibrary@ol-www1:/home/rkumar$ /olsystem/bin/deploy-code openlibrary

    This may not work, because the version of fabric that we use is very old (cit. anand).

    Try it from ol-home.

    Fix it using:

    sudo -u openlibrary rsync -av rsync://ol-home/opt/openlibrary/venv /opt/openlibrary/


    /olsystem/bin/olenv pip install -U fab==1.1.2
  • gio 6:00 pm on February 10, 2015
    Tags: openlibrary   

    OL: monitoring Nginx response time and response status codes with python and munin 

    To make Nginx log the response time properly, see the blog post "OL tracking app response time on nginx".

    We expect to have the /var/log/nginx/access.log formatted as:

    '$seed$remote_addr $host $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_time'

    The log file lines look like:

    openlibrary.org - [10/Feb/2015:17:34:35 +0000] "GET /search?q=The+Law&subject_facet=Law&person_facet=R.+A.+Ramsay HTTP/1.1" 200 7159 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0.660
    openlibrary.org - [10/Feb/2015:17:34:35 +0000] "GET /search?q=RICO&author_key=OL2687659A&subject_facet=In+library HTTP/1.1" 200 6895 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0.684
    openlibrary.org - [10/Feb/2015:17:34:35 +0000] "GET /search?sort=old&subject_facet=Protected+DAISY&language=mul&publisher_facet=Nelson+Doubleday&publisher_facet=Macmillan+and+co.%2C+limited&publisher_facet=Franklin+Library HTTP/1.1" 200 5860 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" 0.307
    openlibrary.org - [10/Feb/2015:17:34:35 +0000] "GET /works/OL118971W HTTP/1.0" 301 0 "https://openlibrary.org/" "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv: Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)" 0.057
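    As a rough illustration of how such a line can be taken apart, here is a minimal Python sketch. The regular expression and the parse_line name are my own assumptions, not part of any OL script. It matches only from [$time_local] to the end of the line, so it does not depend on what precedes the timestamp.

```python
import re

# Assumption: match from [$time_local] onward and ignore the fields before it
# ($seed$remote_addr, $host, $remote_user).
LOG_RE = re.compile(
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'(?P<request_time>\d+(?:\.\d+)?)\s*$'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["body_bytes_sent"] = int(fields["body_bytes_sent"])
    fields["request_time"] = float(fields["request_time"])
    return fields


# One of the sample lines shown above.
sample = (
    'openlibrary.org - [10/Feb/2015:17:34:35 +0000] '
    '"GET /works/OL118971W HTTP/1.0" 301 0 "https://openlibrary.org/" '
    '"User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv: '
    'Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)" 0.057'
)
fields = parse_line(sample)
print(fields["status"], fields["request_time"])
```

    The sketch assumes that the request, referer and user agent contain no unescaped double quotes.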

    Using a python script and a munin plugin we can plot these two graphs:

    • Nginx response time: the requests/second rate, split into response time ranges.
    • Nginx response code: the requests/second rate for every response code class.

    The script /usr/local/nglog/nglog.py runs in the background, reading access.log line by line and sampling the requests/second rate for the different response time ranges and response code classes. The values are saved in the files /tmp/nginx_request_status_stats and /tmp/nginx_request_time_stats.

    giovanni@ol-www1:/tmp$ cat /tmp/nginx_request_time_stats 
    T1.value 75.100000 
    T2.value 54.555556 
    T3.value 16.938889
    T4.value 1.711111 
    T5.value 2.933333
    giovanni@ol-www1:/tmp$ cat /tmp/nginx_request_status_stats 
    s2xx.value 123.600000 
    s3xx.value 19.755556 
    s4xx.value 7.883333 
    s5xx.value 0.000000

    When a status code 500 is found, the script logs the request to the file /var/log/nginx/nglog.log.
    The script is launched by the bash script /usr/local/nglog/nglogd.sh, which executes:

    tail -F /var/log/nginx/access.log| python /usr/local/nglog/nglog.py &
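    Since nglog.py itself is not published, here is a hedged sketch of the counting it describes. It is not the real script. The names (time_bucket, format_stats, window_stats) and the regular expression are assumptions. The bucket boundaries are read off the munin labels in this post, and the output mimics the T1.value / s2xx.value lines shown above.

```python
import re

# Upper bounds in seconds for T1..T4, read off the munin labels in this post;
# anything slower than the last bound is counted as T5.
TIME_BOUNDS = (0.5, 1.0, 5.0, 10.0)

# Assumption: match only the end of a line, i.e.
# "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_time
TAIL_RE = re.compile(r'" (\d{3}) \d+ "[^"]*" "[^"]*" (\d+(?:\.\d+)?)\s*$')


def time_bucket(request_time):
    """Map a response time in seconds to one of T1..T5."""
    for index, bound in enumerate(TIME_BOUNDS, start=1):
        if request_time <= bound:
            return "T%d" % index
    return "T5"


def format_stats(counts, elapsed_seconds):
    """Render counters as munin 'name.value rate' lines (requests/second)."""
    return "".join(
        "%s.value %f\n" % (name, counts[name] / elapsed_seconds)
        for name in sorted(counts)
    )


def window_stats(lines, elapsed_seconds):
    """Count one sampling window and return (time_stats, status_stats)."""
    time_counts = dict.fromkeys(("T1", "T2", "T3", "T4", "T5"), 0)
    status_counts = dict.fromkeys(("s2xx", "s3xx", "s4xx", "s5xx"), 0)
    for line in lines:
        match = TAIL_RE.search(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        status, request_time = int(match.group(1)), float(match.group(2))
        time_counts[time_bucket(request_time)] += 1
        status_class = "s%dxx" % (status // 100)
        if status_class in status_counts:
            status_counts[status_class] += 1
    return (format_stats(time_counts, elapsed_seconds),
            format_stats(status_counts, elapsed_seconds))


# Two made-up lines in the expected format, treated as a 2-second window.
window = [
    '... "GET /a HTTP/1.1" 200 7159 "-" "Googlebot" 0.660',
    '... "GET /b HTTP/1.0" 301 0 "https://openlibrary.org/" "Firefox" 0.057',
]
time_stats, status_stats = window_stats(window, elapsed_seconds=2.0)
print(time_stats + status_stats, end="")
```

    In the setup described here the lines would arrive on standard input from tail -F, and the two strings would be written to /tmp/nginx_request_time_stats and /tmp/nginx_request_status_stats at the end of each window. The window length is not stated in this post, so it is left as a parameter.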

    To plot the results we use two ad-hoc munin plugins:

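    case $1 in
        config)
            cat <<'EOM'
    graph_title NGINX response time
    graph_vlabel num_req /sec
    T1.label <= 0.5s
    T2.label >0.5 and <=1
    T3.label >1 and <=5
    T4.label >5 and <=10
    T5.label > 10
    EOM
            exit 0;;
    esac
    # Assumption: the values are read from the stats file written by nglog.py
    cat /tmp/nginx_request_time_stats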


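    case $1 in
        config)
            cat <<'EOM'
    graph_title NGINX response status code
    graph_vlabel req/sec
    s2xx.label 2xx
    s3xx.label 3xx
    s4xx.label 4xx
    s5xx.label 5xx
    EOM
            exit 0;;
    esac
    # Assumption: the values are read from the stats file written by nglog.py
    cat /tmp/nginx_request_status_stats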
    • Oleg 7:55 pm on January 14, 2017

      tell me plz, where I can find a script “nglog/nglog.py” (GitHub)

    • gio 6:59 pm on May 23, 2017

      the script is not available online sorry. But it is just a Nginx’s log parser, with some counters.

  • gio 11:00 pm on February 4, 2015
    Tags: openlibrary   

    OL: How to generate the sitemaps 

    First you need the latest ol_dump file.
    This file is generated on ol-home, using the dump file ol_dump.txt.gz as its source. To learn how to generate it, see the post: OL: how to generate the dump files

    :: Generate sitemaps on ol-home

    anand@ol-home:/1/var/tmp/sitemaps$ python /opt/openlibrary/openlibrary/scripts/2009/01/sitemaps/sitemap.py ../dumps/ol_dump_2015-01-31/ol_dump_2015-01-31.txt.gz
    Wed Feb  4 12:48:03 2015 writing sitemaps/sitemap_works_1700.xml.gz 39979
    Wed Feb  4 12:48:03 2015 writing sitemaps/sitemap_works_1701.xml.gz 39963
    Wed Feb  4 12:48:04 2015 writing sitemaps/sitemap_works_1702.xml.gz 39975
    Wed Feb  4 12:48:04 2015 writing sitemaps/sitemap_works_1703.xml.gz 39947
    Wed Feb  4 12:48:05 2015 writing sitemaps/sitemap_works_1704.xml.gz 39347
    Wed Feb  4 12:48:05 2015 writing sitemaps/sitemap_works_1705.xml.gz 39691
    Wed Feb  4 12:48:05 2015 writing sitemaps/sitemap_works_1706.xml.gz 38423
    Wed Feb  4 12:48:06 2015 writing sitemaps/sitemap_works_1707.xml.gz 30851
    Wed Feb  4 12:48:06 2015 writing sitemaps/sitemap_works_1708.xml.gz 3447
    Wed Feb  4 12:48:06 2015 writing sitemaps/siteindex.xml.gz 19891
    Wed Feb  4 12:48:06 2015 done
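
    To give an idea of what a sitemap generator has to do with the dump, here is a minimal Python sketch. It is not the code of sitemap.py. It assumes the tab-separated dump layout described on the developers/dumps page (type, key, revision, last_modified, JSON) and uses the 50,000 URLs per file limit of the sitemaps.org protocol. The function names and the base URL default are illustrative.

```python
from itertools import islice
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50000  # limit set by the sitemaps.org protocol


def iter_works(dump_lines):
    """Yield (key, last_modified) for every /type/work row of the dump."""
    for line in dump_lines:
        columns = line.rstrip("\n").split("\t")
        if len(columns) >= 4 and columns[0] == "/type/work":
            yield columns[1], columns[3]


def chunked(items, size):
    """Yield lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def sitemap_xml(entries, base_url="https://openlibrary.org"):
    """Render one sitemap file for a list of (key, last_modified) pairs."""
    urls = "".join(
        "<url><loc>%s</loc><lastmod>%s</lastmod></url>\n"
        % (escape(base_url + key), escape(last_modified[:10]))
        for key, last_modified in entries
    )
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + urls + "</urlset>\n")


# Made-up dump rows, only to show the shape of the data.
rows = [
    "/type/work\t/works/OL1W\t1\t2015-01-31T00:00:00\t{}\n",
    "/type/edition\t/books/OL1M\t1\t2015-01-31T00:00:00\t{}\n",
]
for number, chunk in enumerate(chunked(iter_works(rows), MAX_URLS_PER_SITEMAP)):
    print("sitemap_works_%04d.xml" % number)
    print(sitemap_xml(chunk), end="")
```

    A real run would read the rows from the gzipped dump (for example with gzip.open in text mode) and write each chunk to its own .xml.gz file, plus an index file that lists them.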

    :: Copy sitemaps to ol-www1

    anand@ol-www1:~$ sudo mkdir -p /1/var/lib/openlibrary/sitemaps
    anand@ol-www1:~$ sudo rsync -av rsync://ol-home/var_1/tmp/sitemaps/sitemaps /1/var/lib/openlibrary/sitemaps/
    sent 94,519 bytes  received 329,106,541 bytes  73,155,791.11 bytes/sec
    total size is 328,678,782  speedup is 1.00

    :: Verify sitemaps are available

    anand@ol-www1:~$ curl -I https://openlibrary.org/static/sitemaps/siteindex.xml.gz
    HTTP/1.1 200 OK
    Server: nginx/1.1.19
    Date: Wed, 04 Feb 2015 16:48:50 GMT
    Content-Type: text/plain
    Content-Length: 14689
    Last-Modified: Wed, 04 Feb 2015 12:48:06 GMT
    Connection: keep-alive
    Accept-Ranges: bytes
  • gio 8:21 pm on February 2, 2015
    Tags: openlibrary   

    OL: Sitemap generation 

    To generate the Sitemap, execute this code on ol-home:

    python sitemaps.py ol_dump_works_latest.txt.gz

    the sitemaps.py is located at


    The latest dump is available at: http://openlibrary.org/data/ol_dump_works_latest.txt.gz
    For more details, see https://openlibrary.org/developers/dumps.

    After the sitemap is generated you need to place it in /1/var/lib/openlibrary/sitemaps
    as defined in /olsystem/etc/nginx/sites-available/openlibrary.conf on ol-www1.

        location ~ ^/static/(docs|tour|sitemaps|jsondumps|images/shelfview|sampledump.txt.gz)(/.*)?$ {
            root /1/var/lib/openlibrary/sitemaps;
            autoindex on;
            rewrite ^/static/(.*)$ /$1 break;
        }

    Note: use /1/var/tmp on ol-home, then you can rsync it from rsync://ol-home/var_1/tmp/

  • gio 7:44 pm on February 2, 2015
    Tags: openlibrary   

    OL tracking app response time on nginx 

    To track the response time let’s edit


    adding the $request_time to the log_format line:

        log_format iacombined '$seed$remote_addr $host $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_time';

    Note: $request_time includes the time to complete sending the data to the client.

    To show the requests longer than 3 seconds:

    sudo tail -F /var/log/nginx/access.log -s 0.1  | perl -ne '$|=1; $_ =~ / ([^ ]+$)/; if ($1 > 3.0) {print;}'

    To show the requests longer than 3 seconds, hiding the 40x errors:

    sudo tail -F /var/log/nginx/access.log -s 0.1  | grep -v '"-" 40. 0' --line-buffered | perl -ne '$|=1; $_ =~ / ([^ ]+$)/; if ($1 > 3.0) {print;}'
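
    For those who prefer Python to the perl one-liner, the same filter can be sketched as below. The function name slow_lines is an assumption. Like the perl version, it looks only at the last space-separated field of each line, which is $request_time in the log_format above.

```python
def slow_lines(lines, threshold=3.0):
    """Yield the lines whose last field is a number greater than threshold."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # empty line
        try:
            request_time = float(fields[-1])
        except ValueError:
            continue  # last field is not a number, e.g. "-"
        if request_time > threshold:
            yield line


# Made-up lines in the expected format.
demo = [
    '... "GET /slow HTTP/1.1" 200 7159 "-" "ua" 3.500\n',
    '... "GET /fast HTTP/1.1" 200 6895 "-" "ua" 0.684\n',
]
for line in slow_lines(demo):
    print(line, end="")
```

    To use it on the live log, feed it sys.stdin with tail -F piped into the script, and flush standard output after each printed line so that matches show up immediately.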
  • gio 11:01 pm on January 27, 2015
    Tags: openlibrary   

    OL: main backups location 

    :: ol-home

    The main backup files are located at:

    • /1/postgres_backups/backups

      contains the postgres database dumps for openlibrary and coverstore databases.

    • /1/postgres_backups/base_backups

      is a copy of the postgres database directory. Very useful for restoring in case of crashes.

    • /1/postgres_backups/pg_xlog_archive

      Postgres maintains Write-Ahead-Log (WAL) files. They are used in replication and for point-in-time recovery (restore the database to a timestamp).

    If the backup files are not being generated, make sure that /olsystem/etc/crond.d/pg-backups has permissions 644.
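
    A quick way to check the permission bits from Python is sketched below. The has_mode helper is a hypothetical name, and the demo runs against a temporary file rather than the real path.

```python
import os
import stat
import tempfile


def has_mode(path, expected=0o644):
    """Return True if the permission bits of `path` equal `expected`."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected


# Demo on a temporary file; on ol-home the path to check would be
# /olsystem/etc/crond.d/pg-backups.
with tempfile.NamedTemporaryFile() as handle:
    os.chmod(handle.name, 0o600)
    print(has_mode(handle.name))
    os.chmod(handle.name, 0o644)
    print(has_mode(handle.name))
```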

  • gio 10:40 pm on January 20, 2015
    Tags: openlibrary   

    OL: how to merge and deploy a pull request 

    First, stash any local changes:

    git stash

    Then pull the pull request code into a local branch:

    git checkout -b bmmcginty-overdriveSearchUrlChange master
    git pull https://github.com/bmmcginty/openlibrary.git overdriveSearchUrlChange

    Remember to test the code rather than just merging it from the GitHub website. Then merge and push:

    git checkout master
    git merge --no-ff bmmcginty-overdriveSearchUrlChange
    git push origin master   # push to my repo
    git push archive master  # push to the archive repo

    Now you can deploy the code.

    To deploy openlibrary to all nodes:

    /olsystem/bin/deploy-code openlibrary

    To deploy olsystem to all nodes:

    /olsystem/bin/deploy-code olsystem