k-NN Queries Are Slow and Not Cached

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch Docker image: opensearchproject/opensearch:2.17.0

Describe the issue:
We have built a vector index, and k-NN queries are quite slow (around 20 seconds per query). We checked the k-NN stats and the graph cache appears to be empty, although “knn_query_with_filter_requests” increments as more queries are executed.
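For reference, our queries look roughly like this sketch (only the “embedding” field name comes from the mapping below; the index name, filter clause, vector values, and k are illustrative placeholders, and the real vector has 768 dimensions):

GET /my-index/_search
{
  "size": 10,
  "query": {
    "knn": {
      "embedding": {
        "vector": [0.12, -0.45, 0.33],
        "k": 10,
        "filter": {
          "term": {
            "status": "active"
          }
        }
      }
    }
  }
}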

Configuration:
docker compose:

services:
  opensearch:
    restart: unless-stopped
    image: opensearchproject/opensearch:2.17.0
    ports:
      - 9200:9200 # REST API
      - 9600:9600 # Performance Analyzer
    environment:
      - discovery.type=single-node
      - cluster.name=opensearch-cluster
      - bootstrap.memory_lock=true # Disable JVM heap memory swapping  
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=XYZ
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - opensearch-data:/usr/share/opensearch/data

docker compose override:

services:
  opensearch:
    environment:
      OPENSEARCH_JAVA_OPTS: -Xms32g -Xmx32g
    deploy:
      resources:
        limits:
          memory: 34g
          cpus: "20"

Index settings and mapping:

settings:
  default:
    index:
        number_of_shards: 2
        number_of_replicas: 2
        max_result_window: 3000000
        knn: true
    mappings:
      default:
        dynamic: false
        properties:
          "embedding":
            type: "knn_vector"
            dimension: 768
            method:
              engine: lucene
              name: hnsw
              space_type: l2
              parameters:
                ef_construction: 128
                m: 16

OpenSearch has 32 GB of heap within a 34 GB container memory limit. There are around 4 million documents distributed across several indices.

Relevant Logs or Screenshots:
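
Warmup was triggered per index via the warmup endpoint, along these lines (index names are placeholders):

GET /_plugins/_knn/warmup/my-index-1,my-index-2?pretty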

After warmup (k-NN plugin API - OpenSearch Documentation), the stats look like this:

GET /_plugins/_knn/stats?pretty

{
  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "cluster_name": "opensearch-cluster",
  "circuit_breaker_triggered": false,
  "model_index_status": null,
  "nodes": {
    "xyz...": {
      "max_distance_query_with_filter_requests": 0,
      "graph_memory_usage_percentage": 0,
      "graph_query_requests": 0,
      "graph_memory_usage": 0,
      "cache_capacity_reached": false,
      "load_success_count": 0,
      "training_memory_usage": 0,
      "indices_in_cache": {},
      "script_query_errors": 0,
      "hit_count": 0,
      "knn_query_requests": 2174,
      "total_load_time": 0,
      "miss_count": 0,
      "min_score_query_requests": 0,
      "knn_query_with_filter_requests": 819,
      "training_memory_usage_percentage": 0,
      "max_distance_query_requests": 0,
      "lucene_initialized": true,
      "graph_index_requests": 0,
      "faiss_initialized": false,
      "load_exception_count": 0,
      "training_errors": 0,
      "min_score_query_with_filter_requests": 0,
      "eviction_count": 0,
      "nmslib_initialized": false,
      "script_compilations": 0,
      "script_query_requests": 0,
      "graph_stats": {
        "merge": {
          "current": 0,
          "total": 0,
          "total_time_in_millis": 0,
          "current_docs": 0,
          "total_docs": 0,
          "total_size_in_bytes": 0,
          "current_size_in_bytes": 0
        },
        "refresh": {
          "total": 0,
          "total_time_in_millis": 0
        }
      },
      "graph_query_errors": 0,
      "indexing_from_model_degraded": false,
      "graph_index_errors": 0,
      "training_requests": 0,
      "script_compilation_errors": 0
    }
  }
}

If we understand the docs correctly, the cache should not be empty after warmup:

The k-NN plugin builds a native library index of the vectors for each knn-vector field/Lucene segment pair during indexing, which can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about Lucene segments, see the Apache Lucene documentation. These native library indexes are loaded into native memory during search and managed by a cache. To learn more about preloading native library indexes into memory, refer to the warmup API. Additionally, you can see which native library indexes are already loaded in memory. To learn more about this, see the stats API section.
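
Based on that, we would expect “indices_in_cache” and “graph_memory_usage” to be populated after warmup. If we read the stats API correctly, those stats can also be requested directly with the stat-filtering form of the endpoint (syntax as we understand it from the docs):

GET /_plugins/_knn/stats/indices_in_cache,graph_memory_usage,hit_count,miss_count?pretty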

I wanted to try the Performance Analyzer plugin, but it is currently broken: [BUG] Performance Analyzer webserver on port 9600 not responding to any API calls (caused by JDK upgrade?) · Issue #545 · opensearch-project/performance-analyzer-rca · GitHub

Thanks for any hints.