In this tutorial, we'll see how to query an Apache Solr instance from a Django console in order to obtain a list of results and apply some filtering to it, using Dismax search. For this tutorial, I will assume that you are familiar with the following technologies and that you have them installed:

  • Django Framework
  • Django API
  • Apache Solr
  • Django Haystack module

I will also assume that your Django project is able to access Solr via Haystack. If it isn't, get it working first by reading the Haystack documentation: http://haystacksearch.org/

For reference, the platform I'll be using for this example is OpenShift, where I have everything needed installed. Now, let's start by studying the output of a basic Solr query that retrieves all results:

/collection1/select?q=*%3A*&wt=json&indent=true

    {
      "responseHeader":{
        "status":0,
        "QTime":0,
        "params":{
          "indent":"true",
          "q":"*:*",
          "wt":"json"}},
      "response":{"numFound":2,"start":0,"docs":[
          {
            "company_name":"SC Something Ltd.",
            "tags":["pistols",
              "rifles",
              "guns"],
            "text":"Something someone somewhere sometime",
            "external_link":"http://something-somewhere.com",
            "_version_":1527471593000796160},
          {
            "company_name":"Miniprix Someone LLC.",
            "tags":["python",
              "dhtml",
              "javascript"],
            "text":"Lorem ipsum dolor something someone somewhere",
            "external_link":"http://miniprixter.com",
            "_version_":1527471593010233344}]
      }}

From what we can see, we have a set of two documents, each containing the following items:

  • company_name
  • tags
  • text
  • external_link
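To make that structure concrete, here is a minimal sketch that parses a response of this shape with Python's standard json module and lists the fields of each document. The sample string is a condensed copy of the output above, and describe_docs is a helper name made up for this illustration; it is not part of Solr or Haystack.

```python
import json

# Condensed copy of the Solr response shown above (same shape, same data).
SAMPLE_RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 0},
  "response": {"numFound": 2, "start": 0, "docs": [
    {"company_name": "SC Something Ltd.",
     "tags": ["pistols", "rifles", "guns"],
     "text": "Something someone somewhere sometime",
     "external_link": "http://something-somewhere.com",
     "_version_": 1527471593000796160},
    {"company_name": "Miniprix Someone LLC.",
     "tags": ["python", "dhtml", "javascript"],
     "text": "Lorem ipsum dolor something someone somewhere",
     "external_link": "http://miniprixter.com",
     "_version_": 1527471593010233344}
  ]}
}
"""


def describe_docs(raw_json):
    """Return (numFound, sorted field names of each document).

    Hypothetical helper; fields starting with an underscore, such as
    _version_, are skipped because Solr manages them internally.
    """
    response = json.loads(raw_json)["response"]
    fields_per_doc = [
        sorted(name for name in doc if not name.startswith("_"))
        for doc in response["docs"]
    ]
    return response["numFound"], fields_per_doc


num_found, fields = describe_docs(SAMPLE_RESPONSE)
print(num_found)
for field_names in fields:
    print(field_names)
```

Both documents expose the same four fields, which is what lets us name them in the qf parameter later on.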

Now let's query Solr from Django. First, we open a Django shell:

    python manage.py shell

Next, we import what we need: SearchQuerySet and AltParser (more about AltParser here: http://django-haystack.readthedocs.org/en/latest/inputtypes.html#haystack.inputs.AltParser)

    >> import haystack
    >> from haystack.query import SearchQuerySet
    >> from haystack.inputs import AltParser

Now we are ready to query Solr. Let's list all results first:

    >> docs = SearchQuerySet()
    >> docs
    [<SearchResult: Anunturi.Firm (pk=u'3')>,
     <SearchResult: Anunturi.Firm (pk=u'4')>]

As we can see, the results are ordered by pk: first 3, then 4.

Say we want to search for the word "someone" in all documents. We will indicate that we want to search in both the text and company_name fields (qf). We will also ask for a minimum match of 1 (mm=1), meaning that at least one of the query terms has to match.

    >> docs = SearchQuerySet()
    >> filter = AltParser('dismax', 'someone', qf='company_name text', mm=1)
    >> filtered_results = docs.filter(content=filter)
    >> filtered_results
    [<SearchResult: Anunturi.Firm (pk=u'4')>,
     <SearchResult: Anunturi.Firm (pk=u'3')>]

Notice that the order of the results is different now: no. 4 comes first, then no. 3. The explanation is that the word "someone":

  • is only found in the company_name field of document no. 4
  • is found in the text field of both documents (no. 3 and no. 4)

Thus, document no. 4 contains the searched word not only in the text field but also in company_name, an advantage that places it first in the result set.
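In Solr terms, a query for an alternate parser can be written inline with the local-params syntax, {!parser key=value}query. The sketch below builds such a string by hand to show what the AltParser arguments stand for. build_local_params is a hypothetical helper written for this tutorial, not Haystack's implementation, and the exact string Haystack sends to Solr may differ.

```python
def build_local_params(parser_name, query_string, **params):
    """Build a Solr local-params query such as {!dismax mm=1}someone.

    Hypothetical helper for illustration only. Values that contain
    spaces are wrapped in single quotes so Solr reads them as one value.
    Values that themselves contain quotes are not handled here.
    """
    bits = []
    for key in sorted(params):
        value = str(params[key])
        if " " in value:
            value = "'%s'" % value
        bits.append("%s=%s" % (key, value))
    return "{!%s %s}%s" % (parser_name, " ".join(bits), query_string)


# Same arguments as the AltParser call above.
print(build_local_params("dismax", "someone", qf="company_name text", mm=1))
# prints: {!dismax mm=1 qf='company_name text'}someone
```

Reading the result left to right: use the dismax parser, require at least one matching term, look in company_name and text, and search for "someone".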

Using the AltParser we can also boost fields at query time. Say we want to influence the default ordering by boosting a field (let's say with a boost factor of 30, written as field^30).

    >> docs = SearchQuerySet()
    >> filter = AltParser('dismax', 'someone', qf='text^30 company_name', mm=1)
    >> filtered_results = docs.filter(content=filter)
    >> filtered_results
    [<SearchResult: Anunturi.Firm (pk=u'4')>,
     <SearchResult: Anunturi.Firm (pk=u'3')>]

Here we indicated that the text field has more importance in our search, so matches of the word "someone" in this field weigh more in the ranking. Both of our documents contain the word in the text field, and in this example the order stays the same: no. 4 is first and no. 3 is second.
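For comparison, the same boosted search can be expressed as plain request parameters to Solr's select handler (defType, q, qf, mm). The sketch below, written for Python 3, only builds the URL and does not send it. The base URL is an assumption about a local Solr install with the collection1 core, and dismax_url is a made-up helper name, so adjust both to your setup.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: a local Solr instance serving the collection1 core.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def dismax_url(term, query_fields, min_match=1, base_url=SOLR_SELECT_URL):
    """Return a select URL that runs `term` through the dismax parser.

    query_fields is a list such as ["text^30", "company_name"], where
    the ^30 suffix is the query-time boost for that field.
    """
    params = {
        "defType": "dismax",
        "q": term,
        "qf": " ".join(query_fields),
        "mm": min_match,
        "wt": "json",
    }
    # urlencode takes care of escaping characters such as ^ and spaces.
    return base_url + "?" + urlencode(params)


url = dismax_url("someone", ["text^30", "company_name"])
print(url)
# Decoding the query string again gives back the qf value we passed in.
print(parse_qs(urlsplit(url).query)["qf"])
```

Opening a URL like this in a browser is a handy way to check how Solr itself ranks the documents before involving Django at all.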

You can find more about boosting with Haystack here: http://django-haystack.readthedocs.org/en/v2.4.1/boost.html?highlight=boost