Opensearch Dashboards shows 10000 as hits.total

You might have to open an incognito browser or clear your cache.

Can you check the manifest.yml in the distribution you are using? The component for OpenSearch Dashboards should look like this:

components:
  - name: OpenSearch-Dashboards
    repository: https://github.com/kavilla/OpenSearch-Dashboards-1.git
    ref: avillk/1.2.0/track_total_hits

I actually checked my request for the discover page and it looks like this:

{"params":{"index":"opensearch_dashboards_sample_data_flights","body":{"trackTotalHits":true,"version":true,"size":500,"sort":[{"timestamp":{"order":"desc","unmapped_type":"boolean"}}],"aggs":{"2":{"date_histogram":{"field":"timestamp","calendar_interval":"1d","time_zone":"America/Los_Angeles","min_doc_count":1}}},"stored_fields":["*"],"script_fields":{"hour_of_day":{"script":{"source":"doc['timestamp'].value.hourOfDay","lang":"painless"}}},"docvalue_fields":[{"field":"timestamp","format":"date_time"}],"_source":{"excludes":[]},"query":{"bool":{"must":[],"filter":[{"match_all":{}},{"range":{"timestamp":{"gte":"2021-10-14T18:41:02.301Z","lte":"2022-01-27T19:41:02.301Z","format":"strict_date_optional_time"}}}],"should":[],"must_not":[]}},"highlight":{"pre_tags":["@opensearch-dashboards-highlighted-field@"],"post_tags":["@/opensearch-dashboards-highlighted-field@"],"fields":{"*":{}},"fragment_size":2147483647}},"preference":1643312460877}}

I can see trackTotalHits: true in the request; however, I actually get an error from SQL Workbench. I now believe this might be a case of remote cross-search combined with the @opensearch-project plugins.

When you pull down the distribution and verify it’s the build provided, can you run:
./bin/opensearch-dashboards-plugin remove queryWorkbenchDashboards
Then re-run it. Otherwise, I can create a build without the plugin. Without the plugin I am able to see results fine, and here is what the request looks like:

If it gets working for you, I think we need to figure out where in the distribution this is going wrong, because I don’t believe it’s the default OpenSearch Dashboards.

I don’t believe it was; I believe this might have been an error from the fork.

Hi @kavilla, thanks for all this. I have some good news. When I use an Incognito Window, the hits.total displayed in Discover is accurate! So, that part seems to work in your distribution and requires some browser-side cached item to be refreshed. Even though I have hard refreshed (Shift + refresh) in regular mode, Discover only shows the accurate count when using Incognito Mode. I’ll investigate what I need to do to get my customers to clear the applicable cached item in regular mode.

However, the Visualize feature still has an inaccurate hits.total, and the Request does not contain track_total_hits: true.

I did check manifest.yml, and can confirm the OpenSearch-Dashboards component is exactly how you’ve pasted it.

I did remove the queryWorkbenchDashboards as you asked, and the behavior is the same as above, no change.

In Summary:

  • Discover works! It is passing the track_total_hits: true in the Request and I’m getting accurate hits.total! Awesome!
  • Visualize does not work, the Request does not contain track_total_hits: true, and the hits.total is always maximum of 10000.
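
For context on where the 10000 comes from: unless track_total_hits is set, OpenSearch by default stops counting matches at 10,000 and reports the total as a lower bound. Below is a minimal Python sketch of how a client could read hits.total in either response shape. The function name read_total and the heuristic for the plain-integer form are illustrative assumptions, not OpenSearch Dashboards code.

```python
from typing import Any, Dict, Tuple

# Default cap on counted hits when track_total_hits is not set.
DEFAULT_TRACK_LIMIT = 10_000


def read_total(hits: Dict[str, Any]) -> Tuple[int, bool]:
    """Return (count, possibly_capped) for the "hits" section of a search response."""
    total = hits["total"]
    if isinstance(total, dict):
        # Object form: {"value": N, "relation": "eq" | "gte"}.
        # "gte" means the real count is at least N.
        return total["value"], total.get("relation") == "gte"
    # Integer form carries no relation, so a value equal to the default cap
    # is only suspicious, not proof of truncation (assumed heuristic).
    count = int(total)
    return count, count == DEFAULT_TRACK_LIMIT


if __name__ == "__main__":
    print(read_total({"total": 10000, "max_score": None, "hits": []}))
    print(read_total({"total": {"value": 123456, "relation": "eq"}}))
```

A count flagged as possibly capped is a hint to re-run the search with track_total_hits enabled before trusting the number.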

So I was digging into this. What I see locally is that the request calls internal/opensearch without those params, which then calls the JS client to make the actual HTTP request to OpenSearch. In that request I see the required query string parameter:

{
  "method": "POST",
  "path": "/opensearch_dashboards_sample_data_flights/_search",
  "body": {
    "version": true,
    "size": 500,
    "sort": [
      {
        "timestamp": {
          "order": "desc",
          "unmapped_type": "boolean"
        }
      }
    ],
    "aggs": {
      "2": {
        "date_histogram": {
          "field": "timestamp",
          "fixed_interval": "30s",
          "time_zone": "America/Los_Angeles",
          "min_doc_count": 1
        }
      }
    },
    "stored_fields": [
      "*"
    ],
    "script_fields": {
      "hour_of_day": {
        "script": {
          "source": "doc['timestamp'].value.hourOfDay",
          "lang": "painless"
        }
      }
    },
    "docvalue_fields": [
      {
        "field": "timestamp",
        "format": "date_time"
      }
    ],
    "_source": {
      "excludes": []
    },
    "query": {
      "bool": {
        "must": [],
        "filter": [
          {
            "match_all": {}
          },
          {
            "range": {
              "timestamp": {
                "gte": "2022-01-29T06:58:15.614Z",
                "lte": "2022-01-29T07:13:15.615Z",
                "format": "strict_date_optional_time"
              }
            }
          }
        ],
        "should": [],
        "must_not": []
      }
    },
    "highlight": {
      "pre_tags": [
        "@opensearch-dashboards-highlighted-field@"
      ],
      "post_tags": [
        "@/opensearch-dashboards-highlighted-field@"
      ],
      "fields": {
        "*": {}
      },
      "fragment_size": 2147483647
    }
  },
  "querystring": {
    "ignore_unavailable": true,
    "track_total_hits": true,
    "timeout": "30000ms",
    "preference": 1643440394128
  }
}

When you go to the visualization, can you share the request and response with me (the data values can be filtered out), for example from the network tab in your inspector?

@kavilla, thanks for keeping at this with me. I have inspected the Request and Response with the Network Tab in Developer Tools in Chrome, Incognito Window, using your distribution.
For the Discover feature, I do find evidence that track_total_hits: true is being set in the Request, and Discover does show the accurate hits.total in the UI. :+1:
The Visualization feature does not have evidence of track_total_hits being set in the Request, and the hits.total in the Response is 10000.

Steps to reproduce the inaccurate 10000 result in the Visualization feature:
1. Open an Incognito Window.
2. Go to Visualize.
3. Create Visualization.
4. Choose Data Table.
5. Select an Index Pattern that references an index family residing on a remote cross-search cluster, for example es1:blahblah-*
6. The Count displayed is 10000, which is an obviously inaccurate result.
7. Open View > Developer > Developer Tools and select the Network tab.
8. Refresh the page with the browser refresh function.
9. Inspect the Network item named: _msearch
10. Use the Search feature to find “track_total_hits” anywhere in the _msearch item. There are no results.
11. Expand every section, select all, and copy. I have pasted the _msearch item below, with sensitive information redacted.

{body: {responses: [{took: 267, timed_out: false, num_reduce_phases: 2,…}]}, statusCode: 200,…}
body: {responses: [{took: 267, timed_out: false, num_reduce_phases: 2,…}]}
responses: [{took: 267, timed_out: false, num_reduce_phases: 2,…}]
0: {took: 267, timed_out: false, num_reduce_phases: 2,…}
hits: {total: 10000, max_score: null, hits: []}
hits: []
max_score: null
total: 10000
num_reduce_phases: 2
status: 200
timed_out: false
took: 267
_clusters: {total: 1, successful: 1, skipped: 0}
skipped: 0
successful: 1
total: 1
_shards: {total: 2826, successful: 2826, skipped: 2072, failed: 0}
failed: 0
skipped: 2072
successful: 2826
total: 2826
headers: {x-opaque-id: "redacted", content-type: "application/json; charset=UTF-8",…}
content-length: "289"
content-type: "application/json; charset=UTF-8"
x-opaque-id: "redacted"
meta: {context: null, request: {params: {method: "POST", path: "/_msearch",…},…}, name: "elasticsearch-js",…}
aborted: false
attempts: 0
connection: {url: "https://master-node-2.mydomain.net:9200/",…}
deadCount: 0
headers: {}
id: "https://master-node-2.mydomain.net:9200/"
resurrectTimeout: 0
roles: {master: true, data: true, ingest: true, ml: false}
data: true
ingest: true
master: true
ml: false
status: "alive"
url: "https://master-node-2.mydomain.net:9200/"
_openRequests: 0
context: null
name: "elasticsearch-js"
request: {params: {method: "POST", path: "/_msearch",…},…}
id: 1
options: {querystring: {ignore_throttled: true, ignore_unavailable: true}}
querystring: {ignore_throttled: true, ignore_unavailable: true}
ignore_throttled: true
ignore_unavailable: true
params: {method: "POST", path: "/_msearch",…}
body: "{\"ignore_unavailable\":true,\"index\":\"es1:blahblah-*\"}\n{\"timeout\":\"300000ms\",\"aggs\":{},\"size\":0,\"stored_fields\":[\"*\"],\"script_fields\":{},\"docvalue_fields\":[{\"field\":\"@ti-estamp\",\"format\":\"date_time\"},{\"field\":\"@timestamp\",\"format\":\"date_time\"},{\"field\":\"data.hungProcessStartDateTime\",\"format\":\"date_time\"},{\"field\":\"data.transaction.endTimestamp\",\"format\":\"date_time\"},{\"field\":\"data.transaction.startTimestamp\",\"format\":\"date_time\"}],\"_source\":{\"excludes\":[]},\"query\":{\"bool\":{\"must\":[],\"filter\":[{\"match_all\":{}},{\"range\":{\"@timestamp\":{\"gte\":\"2022-01-29T16:39:02.863Z\",\"lte\":\"2022-01-29T16:54:02.863Z\",\"format\":\"strict_date_optional_time\"}}}],\"should\":[],\"must_not\":[]}}}\n"
bulkBody: "{\"ignore_unavailable\":true,\"index\":\"es1:blahblah-*\"}\n{\"timeout\":\"300000ms\",\"aggs\":{},\"size\":0,\"stored_fields\":[\"*\"],\"script_fields\":{},\"docvalue_fields\":[{\"field\":\"@ti-estamp\",\"format\":\"date_time\"},{\"field\":\"@timestamp\",\"format\":\"date_time\"},{\"field\":\"data.hungProcessStartDateTime\",\"format\":\"date_time\"},{\"field\":\"data.transaction.endTimestamp\",\"format\":\"date_time\"},{\"field\":\"data.transaction.startTimestamp\",\"format\":\"date_time\"}],\"_source\":{\"excludes\":[]},\"query\":{\"bool\":{\"must\":[],\"filter\":[{\"match_all\":{}},{\"range\":{\"@timestamp\":{\"gte\":\"2022-01-29T16:39:02.863Z\",\"lte\":\"2022-01-29T16:54:02.863Z\",\"format\":\"strict_date_optional_time\"}}}],\"should\":[],\"must_not\":[]}}}\n"
headers: {user-agent: "elasticsearch-js/7.10.0-rc.1 (linux 5.4.0-84-generic-x64; Node.js v10.24.1)",…}
authorization: "Bearer redacted, lol"
content-length: "682"
content-type: "application/x-ndjson"
user-agent: "elasticsearch-js/7.10.0-rc.1 (linux 5.4.0-84-generic-x64; Node.js v10.24.1)"
x-opaque-id: "redacted"
x-opensearch-product-origin: "opensearch-dashboards"
method: "POST"
path: "/_msearch"
querystring: "ignore_throttled=true&ignore_unavailable=true"
timeout: 300000
statusCode: 200
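
The manual search for “track_total_hits” in the captured item can also be done programmatically. Here is a rough Python sketch, assuming the NDJSON layout visible in the body field above (a header line followed by a search body line for each search). The function name and the shortened sample payload are hypothetical.

```python
import json
from typing import List


def msearch_bodies_tracking_totals(ndjson: str) -> List[bool]:
    """For each search in an _msearch payload, report whether its body
    sets track_total_hits to true."""
    lines = [line for line in ndjson.split("\n") if line.strip()]
    # _msearch payloads alternate: header line, then search body line.
    bodies = lines[1::2]
    return [json.loads(body).get("track_total_hits") is True for body in bodies]


if __name__ == "__main__":
    # Shortened stand-in for the captured body (hypothetical sample).
    captured = (
        '{"ignore_unavailable":true,"index":"es1:blahblah-*"}\n'
        '{"timeout":"300000ms","aggs":{},"size":0,"query":{"match_all":{}}}\n'
    )
    print(msearch_bodies_tracking_totals(captured))
```

For the captured request above, a check like this would report that the single search body does not set the flag, which matches the 10000 total in the response.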

@mhoydis, this is very insightful thanks.

For my visualization, it makes a call to internal/opensearch which basically is just an endpoint for OpenSearch Dashboards to customize the request prior to sending it to the npm client. In my case, it eventually calls OpenSearch using the _search API [docs]. But in your visualization it would appear it calls directly to the npm client to _msearch which is the multi-search API [docs]. _msearch does not support the track_total_hits param.

Can you share any custom configurations/settings within your stack, if you have them? I’m trying to recreate the direct call to _msearch that you see in the network tab, whereas my local OpenSearch Dashboards makes a call to /internal/opensearch. I believe I have cross-cluster search set up, but the call doesn’t seem to change from /internal/opensearch to _msearch.

Also, are you using a tenant or just the global tenant?

That said, if the visualization is using cluster settings to decide whether to use _msearch, fixing it might require a non-trivial refactor of visualizations. I do still believe it’s attainable for the Discover page and the alerting plugin. I can also attempt to create a build with an updated alerting plugin for verification.

@kavilla, Is your Index Pattern referencing a single index (no wildcard in the name) or is it referencing an index family (wildcard in the name)?
I wonder if that is what causes _msearch to be invoked in my case, but _search to be invoked in your case.

I will verify on my end if using an Index Pattern that references only a single index is relevant to the symptom.

I am using tenants in Opensearch Dashboards. (Global tenant is disabled, I’m using only defined tenants.) I can’t imagine this is relevant, but since you asked…

Hrmmmm… negative. The Index Pattern wildcard seems not to matter. I get _msearch regardless of whether my Index Pattern references multiple indices with a wildcard or a single index (no wildcard).

I will share more details about my cross-search configuration.

Here is my remote cross-search configuration, below.
I am running a “coordinating cluster”, which is a small cluster that Opensearch Dashboards is connected to. That small cluster is configured with the following remote search configuration, and the actual indices I’m searching live on these “data clusters”. In Opensearch Dashboards, all my Index Patterns look like, for example, es1:blahblah-* , referencing the index family with a wildcard, and with the cluster prefix before the colon.

curl --insecure -u admin:redacted -X PUT "https://127.0.0.1:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "search": {
      "remote": {
        "es1": {
          "seeds": [ "10.123.123.1:9300", "10.123.123.2:9300", "10.123.123.3:9300" ]
        },
        "os3": {
          "seeds": [ "10.123.123.4:9300", "10.123.123.5:9300", "10.123.123.6:9300" ]
        },
        "os4": {
          "seeds": [ "10.123.123.7:9300", "10.123.123.8:9300", "10.123.123.9:9300" ]
        }
      }
    }
  }
}
'
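
To make the naming convention concrete, here is a small Python sketch that splits an Index Pattern such as es1:blahblah-* into its cluster prefix and index part, then checks the prefix against the remote names from the settings above. The helper names are hypothetical, and the sketch assumes the pattern contains at most one cluster prefix.

```python
from typing import Optional, Tuple

# Remote cluster names taken from the settings shown above.
CONFIGURED_REMOTES = {"es1", "os3", "os4"}


def split_index_pattern(pattern: str) -> Tuple[Optional[str], str]:
    """Split "cluster:index" into (cluster, index); cluster is None for local patterns."""
    cluster, sep, index = pattern.partition(":")
    if not sep:
        return None, pattern
    return cluster, index


def targets_configured_remote(pattern: str) -> bool:
    """True when the pattern's cluster prefix is one of the configured remotes."""
    cluster, _ = split_index_pattern(pattern)
    return cluster in CONFIGURED_REMOTES


if __name__ == "__main__":
    print(split_index_pattern("es1:blahblah-*"))
    print(targets_configured_remote("es1:blahblah-*"))
```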

Looking into what _msearch is for, it is the “Multi Search API”, which executes several searches in a single API request. I wonder if Opensearch Dashboards is using _msearch as part of an Advanced Setting… something like max concurrent searches or something along those lines… :thinking:

I will experiment with some Advanced Settings that might be forcing use of the multi-search API.

I was able to make my visualizations use _msearch with the setting for Batch concurrent searches toggled on.

I can confirm. Toggling off “Batch concurrent searches” results in Visualize feature returning an accurate result for indices on remote cross-search clusters.

Wow. That’s the ticket.

I assume that feature forces _msearch, which is not compatible with Visualize… but only for remote cross-search sources? :thinking:

Actually, I believe you said the Discover page was showing the right results? It’s more that when this feature is toggled on, it uses _msearch, which executes multiple searches at once. In the application, the query string track_total_hits=true is added by default to all _search requests but explicitly removed for all _msearch requests. However, since it works on the Discover page, it seems that _msearch can take track_total_hits within the body of the request, just not in the query string.
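
The body-level approach described above could look roughly like the following Python sketch, which builds an _msearch NDJSON payload and sets track_total_hits in each search body rather than in the query string. This is an illustration under that assumption, not the actual OpenSearch Dashboards implementation, and build_msearch_body is a hypothetical name.

```python
import json
from typing import Any, Dict, List, Tuple

Search = Tuple[Dict[str, Any], Dict[str, Any]]  # (header, body)


def build_msearch_body(searches: List[Search]) -> str:
    """Serialize searches as NDJSON, adding track_total_hits to every body."""
    lines = []
    for header, body in searches:
        # Copy the body so the caller's dict is not mutated.
        body_with_total = {**body, "track_total_hits": True}
        lines.append(json.dumps(header))
        lines.append(json.dumps(body_with_total))
    # The multi-search payload must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    payload = build_msearch_body([
        ({"ignore_unavailable": True, "index": "es1:blahblah-*"},
         {"size": 0, "query": {"match_all": {}}}),
    ])
    print(payload, end="")
```

Each body is copied before the flag is added, so callers that reuse their search definitions elsewhere are unaffected.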

I think the Discover page is an easy fix because, the way it was implemented, it’s easy to add the parameter to the body of the request. Plugins, on the other hand, might not be trivial to resolve and might require some refactoring. It would help to get insight from someone on the OpenSearch side on why track_total_hits couldn’t be accepted on the overall request or in the query string and then applied to all the sub-requests. That way there wouldn’t have to be any change in OpenSearch Dashboards.

@ahoppity ^ would you be able to comment on this.

@mhoydis

I created an issue in OpenSearch https://github.com/opensearch-project/OpenSearch/issues/2093. Will try this path first.
