What should the configuration be for 3-4 TB of data?

I have 14 indexes, each holding 230 GB of data, for a total of approximately 3 TB.

What should the numbers be for the following?

  1. Master Nodes with memory size.
  2. Data Nodes with memory size.
  3. Coordinating Nodes.
  4. Shards per index.
  5. Replica Shards.

It would help to know the use case (log analytics or search), the indexing rate, and the query rate to give a better suggestion. As a starting point, though, the usual best practice is to keep each shard at no more than 50 GB, so each 230 GB index should have at least 5 primary shards. Shards should also be balanced across nodes, so you could start with 5 data nodes, each with 700 GB of storage and 8 GB of memory. Dedicated master nodes are not needed for a small cluster of 5 nodes, and dedicated coordinating nodes are not needed either if you don't use ingest pipelines or aggregations. Start from this configuration and test whether it works for your use case. Good luck!

By the way, here are some best practices for sizing an OpenSearch cluster that are worth checking: Sizing Amazon OpenSearch Service domains - Amazon OpenSearch Service.
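
For reference, the arithmetic behind that starting point can be sketched in a few lines of Python. The index count, index size, 50 GB shard limit, and node count come from this thread; the function name and the replica count of 0 are only assumptions for illustration (the 700 GB per node figure above does not appear to include replicas).

```python
import math

# Figures taken from this thread.
INDEX_COUNT = 14         # number of indexes
INDEX_SIZE_GB = 230      # data per index
MAX_SHARD_SIZE_GB = 50   # shard-size guideline
NODE_COUNT = 5           # suggested number of data nodes
REPLICAS = 0             # assumption: no replica shards counted


def size_cluster(index_count, index_size_gb, max_shard_gb, node_count, replicas):
    """Rough shard and storage estimate; ignores overhead and free-space headroom."""
    # Smallest primary shard count that keeps every shard under the limit.
    primaries_per_index = math.ceil(index_size_gb / max_shard_gb)
    copies = 1 + replicas
    total_shards = index_count * primaries_per_index * copies
    total_data_gb = index_count * index_size_gb * copies
    return {
        "primaries_per_index": primaries_per_index,
        "shard_size_gb": index_size_gb / primaries_per_index,
        "total_shards": total_shards,
        "data_per_node_gb": total_data_gb / node_count,
    }


plan = size_cluster(INDEX_COUNT, INDEX_SIZE_GB, MAX_SHARD_SIZE_GB, NODE_COUNT, REPLICAS)
print(plan)
```

With these inputs each index gets 5 primary shards of 46 GB, and each node holds about 644 GB of primary data, which fits in 700 GB with little room to spare. Setting `REPLICAS = 1` doubles both the shard count and the storage per node, so the node count or disk size would need to grow accordingly.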


Thanks @gaobinlong. My use case is search. I currently have 4 data nodes, each with 64 GB of RAM (r5.2xlarge), but searches are still slow (the search queries combine Boolean queries and aggregations).

Could you show the index mappings and query DSL?

Below is my index mapping:

{
  "properties": {
    "added": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "articleId": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "articleSentiment": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "authors_byline": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "categories": {
      "type": "object",
      "properties": {
        "id": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "label": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "links": {
          "type": "object",
          "properties": {
            "parents": {
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              },
              "type": "text"
            },
            "self": {
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              },
              "type": "text"
            }
          }
        },
        "name": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "score": {
          "type": "float"
        }
      }
    },
    "combined_text": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "companies": {
      "type": "object",
      "properties": {
        "domains": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "id": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "label": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "links": {
          "type": "object",
          "properties": {
            "parents": {
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              },
              "type": "text"
            },
            "self": {
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              },
              "type": "text"
            }
          }
        },
        "name": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "score": {
          "type": "float"
        },
        "symbols": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        }
      }
    },
    "content": {
      "fields": {
        "classic_text": {
          "type": "text",
          "analyzer": "classic_analyzer"
        },
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        },
        "no_apostrophe_text": {
          "type": "text",
          "analyzer": "remove_apostrophe_analyzer"
        }
      },
      "analyzer": "classic_analyzer",
      "type": "text"
    },
    "country": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "data_source": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "description": {
      "fields": {
        "classic_text": {
          "type": "text",
          "analyzer": "classic_analyzer"
        },
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        },
        "no_apostrophe_text": {
          "type": "text",
          "analyzer": "remove_apostrophe_analyzer"
        }
      },
      "analyzer": "classic_analyzer",
      "type": "text"
    },
    "entities": {
      "type": "object",
      "properties": {
        "data": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "label": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "mentions": {
          "type": "long"
        },
        "type": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        }
      }
    },
    "file_name": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "final_theme": {
      "type": "object",
      "properties": {
        "label": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "value": {
          "type": "float"
        }
      }
    },
    "id": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "image_url": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "isIngested": {
      "type": "boolean"
    },
    "keywords": {
      "type": "object",
      "properties": {
        "name": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "weight": {
          "type": "float"
        }
      }
    },
    "labels": {
      "type": "object",
      "properties": {
        "name": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        }
      }
    },
    "language": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "link": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "locations": {
      "type": "object",
      "properties": {
        "area": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "city": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "country": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "county": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "state": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        }
      }
    },
    "matchedAuthors": {
      "type": "object",
      "properties": {
        "id": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "name": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        }
      }
    },
    "matched_authors": {
      "type": "object",
      "properties": {
        "id": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "name": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        }
      }
    },
    "mediaType": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "message_no": {
      "type": "long"
    },
    "negative": {
      "type": "float"
    },
    "neutral": {
      "type": "float"
    },
    "places": {
      "type": "object",
      "properties": {
        "amenity": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "city": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "coordinates": {
          "type": "object",
          "properties": {
            "lat": {
              "type": "float"
            },
            "lon": {
              "type": "float"
            }
          }
        },
        "country": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "country_code": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "county": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "neighbourhood": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "osm_id": {
          "type": "long"
        },
        "postcode": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "quarter": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "road": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "state": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "state_district": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "suburb": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "town": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        }
      }
    },
    "positive": {
      "type": "float"
    },
    "pubDate": {
      "type": "date"
    },
    "reach": {
      "type": "float"
    },
    "reprint": {
      "type": "boolean"
    },
    "reprint_group_id": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "search_id": {
      "type": "long"
    },
    "source": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "sources": {
      "type": "object",
      "properties": {
        "domain": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "home_page_url": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "id": {
          "type": "long"
        },
        "locations": {
          "type": "object",
          "properties": {
            "city": {
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              },
              "type": "text"
            },
            "country": {
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              },
              "type": "text"
            },
            "state": {
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              },
              "type": "text"
            }
          }
        },
        "name": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "paywall": {
          "type": "boolean"
        },
        "scopes": {
          "type": "object",
          "properties": {
            "city": {
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              },
              "type": "text"
            },
            "country": {
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              },
              "type": "text"
            },
            "state": {
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              },
              "type": "text"
            }
          }
        }
      }
    },
    "sub_media_type": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "summary": {
      "fields": {
        "classic_text": {
          "type": "text",
          "analyzer": "classic_analyzer"
        },
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        },
        "no_apostrophe_text": {
          "type": "text",
          "analyzer": "remove_apostrophe_analyzer"
        }
      },
      "analyzer": "classic_analyzer",
      "type": "text"
    },
    "title": {
      "fields": {
        "classic_text": {
          "type": "text",
          "analyzer": "classic_analyzer"
        },
        "keyword": {
          "type": "keyword",
          "ignore_above": 256,
          "normalizer": "normalized_keyword"
        },
        "no_apostrophe_text": {
          "type": "text",
          "analyzer": "remove_apostrophe_analyzer"
        }
      },
      "analyzer": "classic_analyzer",
      "type": "text"
    },
    "top_source": {
      "type": "boolean"
    },
    "total_articles": {
      "type": "long"
    },
    "translation_text": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "url": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "wordCloud": {
      "type": "object",
      "properties": {
        "label": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "value": {
          "type": "float"
        }
      }
    }
  }
}

This is the basic Boolean query:

GET amx-data/_search
{
  "size": 50,
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "Apple AND Samsung",
          "fields": [
            "title",
            "summary",
            "description",
            "content",
            "title.no_apostrophe_text",
            "summary.no_apostrophe_text",
            "description.no_apostrophe_text",
            "content.no_apostrophe_text"
          ]
        }
      },
      "filter": [
        {
          "bool": {
            "should": [
              {
                "match_phrase": {
                  "mediaType": "Online"
                }
              }
            ]
          }
        },
        {
          "bool": {
            "should": [
              {
                "match_phrase": {
                  "language": "en"
                }
              }
            ]
          }
        },
        {
          "bool": {
            "should": [
              {
                "match_phrase": {
                  "locations.country": "us"
                }
              }
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "total_count": {
      "value_count": {
        "field": "_id"
      }
    }
  }
}

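I think the query DSL can be optimized to reduce search latency:

  1. The value_count aggregation on the _id field only counts the documents that match the query. That number is already returned as hits.total in the search response (set "track_total_hits": true if you need an exact count above 10,000), and GET {index}/_count with the same query returns it too, so the aggregation is not needed at all.
  2. A match_phrase query is slower than a match query or a term query, and each filter here checks only a single value, so you can change the 3 match_phrase queries to term queries on the keyword sub-fields, like
"term": {
   "mediaType.keyword": "Online"
}

A term query is not analyzed, so the value must match the stored keyword exactly, including case. Hope this helps.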
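
Putting both suggestions together, the request body could look roughly like the following. This is only a sketch in Python that builds the body as a dict: the helper name `build_search_body` is made up for illustration, the field list is copied from the original query, and the filter values assume the keyword sub-fields store exactly "Online", "en", and "us". `track_total_hits` is included on the assumption that an exact total is wanted; leave it out if an approximate count is enough.

```python
import json

# Text fields searched by the original query_string clause.
SEARCH_FIELDS = [
    "title",
    "summary",
    "description",
    "content",
    "title.no_apostrophe_text",
    "summary.no_apostrophe_text",
    "description.no_apostrophe_text",
    "content.no_apostrophe_text",
]


def build_search_body(query_text, media_type, language, country, size=50):
    """Build a search body with term filters and no value_count aggregation."""
    return {
        "size": size,
        # Exact hit count in hits.total replaces the value_count aggregation.
        "track_total_hits": True,
        "query": {
            "bool": {
                "must": {
                    "query_string": {
                        "query": query_text,
                        "fields": SEARCH_FIELDS,
                    }
                },
                # term queries are not analyzed: values must match the
                # stored keyword exactly (assumed values, case-sensitive).
                "filter": [
                    {"term": {"mediaType.keyword": media_type}},
                    {"term": {"language.keyword": language}},
                    {"term": {"locations.country.keyword": country}},
                ],
            }
        },
    }


body = build_search_body("Apple AND Samsung", "Online", "en", "us")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after `GET amx-data/_search`. Because the three conditions sit in the filter clause, they do not contribute to scoring and can be cached by the engine.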


Thanks, I updated my query and aggregations. It is working fine.