Word Entity Duet: Scoring Plugins

The Word Entity Duet project includes four scoring plugins, which can be used in queries and in featureset definitions. For each plugin, a description of how the score is computed and a sample query are provided below.

Boolean And

The Boolean And script scores only documents that contain every query term. The score is the document term frequency of the query term that occurs least often in the document. Documents that are missing any query term are scored zero.

{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "body": "president sanders"
        }
      },
      "script_score": {
        "script": {
          "source": "booland",
          "lang": "boolandscript",
          "params": {
            "field": "body",
            "query": "president sanders"
          }
        }
      }
    }
  }
}
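
The Boolean And rule can be sketched in a few lines of Python. This is only an illustration of the scoring logic described above, not the plugin's implementation: the function name `bool_and_score` is hypothetical, and lowercase whitespace tokenization is assumed in place of whatever analysis the `body` field actually uses.

```python
from collections import Counter


def bool_and_score(query: str, document: str) -> int:
    """Return the lowest document term frequency among the query terms.

    Assumption: lowercase whitespace tokenization stands in for the
    field's real analyzer.
    """
    term_counts = Counter(document.lower().split())
    query_terms = query.lower().split()
    if not query_terms:
        return 0
    # Counter returns 0 for a missing term, so a document that lacks
    # any query term gets a minimum (and therefore a score) of 0.
    return min(term_counts[term] for term in query_terms)
```

For the query "president sanders", a document in which "president" occurs twice and "sanders" once would score 1 under this sketch.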

Boolean Or

The Boolean Or script scores documents that contain at least one query term. The score is the document term frequency of the query term that occurs most often in the document. Documents that do not contain any query terms are scored zero.

{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "body": "president sanders"
        }
      },
      "script_score": {
        "script": {
          "source": "boolor",
          "lang": "boolorscript",
          "params": {
            "field": "body",
            "query": "president sanders"
          }
        }
      }
    }
  }
}
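
The Boolean Or rule can be sketched the same way. As before, this is a hedged illustration rather than the plugin's code: `bool_or_score` is a hypothetical name, and lowercase whitespace tokenization is an assumption.

```python
from collections import Counter


def bool_or_score(query: str, document: str) -> int:
    """Return the highest document term frequency among the query terms.

    Assumption: lowercase whitespace tokenization stands in for the
    field's real analyzer.
    """
    term_counts = Counter(document.lower().split())
    query_terms = query.lower().split()
    # A document with none of the query terms has a maximum of 0;
    # default=0 also covers an empty query.
    return max((term_counts[term] for term in query_terms), default=0)
```

For the query "president sanders", a document in which "president" occurs twice and "sanders" once would score 2 under this sketch.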

Coordinate Match

The Coordinate Match script scores each document with the number of query terms that are present in the document.

{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "body": "president sanders"
        }
      },
      "script_score": {
        "script": {
          "source": "cm",
          "lang": "cmscript",
          "params": {
            "field": "body",
            "query": "president sanders"
          }
        }
      }
    }
  }
}
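
A minimal sketch of the Coordinate Match count follows. The function name `coordinate_match_score` is hypothetical, lowercase whitespace tokenization is assumed, and a query term that is repeated in the query is counted once here; the plugin may treat repeated query terms differently.

```python
def coordinate_match_score(query: str, document: str) -> int:
    """Count the distinct query terms that appear in the document.

    Assumptions: lowercase whitespace tokenization, and repeated
    query terms are counted only once.
    """
    document_terms = set(document.lower().split())
    query_terms = set(query.lower().split())
    # The size of the intersection is the number of matching terms.
    return len(query_terms & document_terms)
```

Unlike the two Boolean scripts, this score ignores how often a term occurs: a document gets the same score whether "president" appears once or ten times.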

Tf-Idf

Tf-idf stands for term frequency-inverse document frequency. To use this plugin, the inverse document frequency of each query term must be computed in advance and passed in as a parameter, with one value per query term. The score for each document is the sum, over the query terms, of the term's frequency in the document multiplied by its inverse document frequency.

{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "body": "president sanders"
        }
      },
      "script_score": {
        "script": {
          "source": "boolor",
          "lang": "boolorscript",
          "params": {
            "field": "body",
            "query": "president sanders",
            "idfs": "0.1 0.2"
          }
        }
      }
    }
  }
}
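
The sketch below shows one way the inverse document frequencies might be prepared and how the resulting score is summed. Everything in it is illustrative: the function names are hypothetical, lowercase whitespace tokenization is assumed, the values in `idfs` are assumed to be in the same order as the query terms, and the formula log(N / df) is only one common definition of inverse document frequency, not one prescribed by the plugin.

```python
import math
from collections import Counter


def idf_values(query: str, corpus: list[str]) -> list[float]:
    """Compute log(N / df) for each query term over a small corpus.

    Assumption: this formula is one common choice; use whichever
    definition suits your collection.
    """
    doc_term_sets = [set(doc.lower().split()) for doc in corpus]
    idfs = []
    for term in query.lower().split():
        doc_freq = sum(term in terms for terms in doc_term_sets)
        # A term that appears in no document gets an idf of 0.0 here.
        idfs.append(math.log(len(corpus) / doc_freq) if doc_freq else 0.0)
    return idfs


def tf_idf_score(query: str, document: str, idfs: str) -> float:
    """Sum term frequency times idf over the query terms.

    `idfs` is a space-separated string, as in the sample query, and is
    assumed to be aligned with the query terms by position.
    """
    term_counts = Counter(document.lower().split())
    weights = [float(value) for value in idfs.split()]
    return sum(
        term_counts[term] * weight
        for term, weight in zip(query.lower().split(), weights)
    )
```

The output of `idf_values` can be joined into the string form used by the sample query with `" ".join(str(v) for v in idf_values(query, corpus))`.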