Updates from August, 2015 Toggle Comment Threads | Keyboard Shortcuts

  • gio 10:08 pm on August 3, 2015 Permalink
    Tags: index, , reindexing, ,   

    OL: reindexing the Solr search 

    To reindex the OpenLibrary’s Solr search index making it consistent with the db data we can use the script:


    here you can find the source.

    Here same basic usage:

    /olsystem/bin/olenv python /opt/openlibrary/openlibrary/scripts/ol-solr-indexer.py --config /olsystem/etc/openlibrary.yml --bookmark ol-solr-indexer.bookmark --backward --days 2

    /olsystem/bin/olenv is the script to load the right virtualenv.
    --config /olsystem/etc/openlibrary.yml is the OpenLibrary yml configuration file.
    --bookmark ol-solr-indexer.bookmark is the location of the last scan timestamp (YYYY-MM-DD hh:mm:ss) bookmarked.
    --backward / --forward the direction to do the reindexing
    --days the number of days to reindex.

    the script can run in daemon mode. At the moment we still using the new-solr-updater for the partial updates…

  • gio 10:36 pm on July 17, 2015 Permalink
    Tags: , ,   

    SOLR: commits and optimize 

    • Commit: When you are indexing documents to solr none of the changes you are making will appear until you run the commit command. So timing when to run the commit command really depends on the speed at which you want the changes to appear on your site through the search engine. However it is a heavy operation and so should be done in batches not after every update.

      curl 'http://<SOLR_INSTANCE_URL>/update?commit=true'

    • Optimize: This is similar to a defrag command on a hard drive. It will reorganize the index into segments (increasing search speed) and remove any deleted (replaced) documents. Solr is a read only data store so every time you index a document it will mark the old document as deleted and then create a brand new document to replace the deleted one. Optimize will remove these deleted documents. You can see the search document vs. deleted document count by going to the Solr Statistics page and looking at the numDocs vs. maxDocs numbers. The difference between the two numbers is the amount of deleted (non-search able) documents in the index.

      curl 'http://<SOLR_INSTANCE_URL>/update?optimize=true'

      Also Optimize builds a whole NEW index from the old one and then switches to the new index when complete. Therefore the command requires double the space to perform the action. So you will need to make sure that the size of your index does not exceed %50 of your available hard drive space. (This is a rule of thumb, it usually needs less then %50 because of deleted documents)

    (source: http://stackoverflow.com/a/3737972)

Compose new post
Next post/Next comment
Previous post/Previous comment
Show/Hide comments
Go to top
Go to login
Show/Hide help
shift + esc