I have 14 indexes, each holding 230GB of data, for a total of roughly 3TB.
What numbers should I use for the following?
- Master nodes, with memory size.
- Data nodes, with memory size.
- Coordinating nodes.
- Shards per index.
- Replica shards.
It would help to know the use case (log analytics or search), the indexing rate, and the query rate, since those allow a better recommendation. Going by the common best practice that a shard should be no larger than 50GB, each 230GB index should have at least 5 primary shards. It is also best for shards to be balanced evenly across nodes, so you could start with 5 data nodes, each with 700GB of storage and 8GB of memory. Dedicated master nodes are not needed for a small 5-node cluster, and dedicated coordinating nodes are also not needed if you don't use ingest pipelines or aggregations. Start from this configuration and test whether it works for your use case. Good luck!
By the way, here are some best practices for sizing an OpenSearch cluster that are worth checking: Sizing Amazon OpenSearch Service domains - Amazon OpenSearch Service.
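The sizing arithmetic in the reply above can be sketched in a few lines of Python. This is only an illustration of the rule of thumb, not a sizing tool: the function names and the `replicas` parameter are hypothetical, and the 50GB shard limit and 5-node count are taken from the reply as given.

```python
import math

# Figures from the thread.
INDEX_COUNT = 14
INDEX_SIZE_GB = 230
MAX_SHARD_SIZE_GB = 50   # rule-of-thumb upper bound quoted in the reply
NODE_COUNT = 5           # suggested starting point, not a hard requirement


def primary_shards(index_size_gb, max_shard_gb):
    """Smallest shard count that keeps every shard at or below max_shard_gb."""
    return math.ceil(index_size_gb / max_shard_gb)


def storage_per_node_gb(total_gb, nodes, replicas=0):
    """Raw data per node, assuming shards spread evenly (no headroom added)."""
    return total_gb * (1 + replicas) / nodes


shards_per_index = primary_shards(INDEX_SIZE_GB, MAX_SHARD_SIZE_GB)
total_primary_shards = shards_per_index * INDEX_COUNT
total_data_gb = INDEX_COUNT * INDEX_SIZE_GB
per_node_gb = storage_per_node_gb(total_data_gb, NODE_COUNT)

print(f"primary shards per index: {shards_per_index}")
print(f"primary shards in total:  {total_primary_shards}")
print(f"total primary data (GB):  {total_data_gb}")
print(f"data per node (GB):       {per_node_gb}")
```

With these inputs each node holds about 644GB of primary data, which is consistent with the suggested 700GB per node for primaries only. Calling `storage_per_node_gb(total_data_gb, NODE_COUNT, replicas=1)` shows that adding one replica doubles the per-node figure, so the storage estimate would need revisiting if replicas are enabled.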
Thanks @gaobinlong. My use case is search. I currently have 4 data nodes, each with 64GB of RAM (r5.2xlarge), but searches are still slow. The search requests combine Boolean queries with aggregations.
Could you show the index mappings and query DSL?
Below is my index mapping:
{
"properties": {
"added": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"articleId": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"articleSentiment": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"authors_byline": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"categories": {
"type": "object",
"properties": {
"id": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"label": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"links": {
"type": "object",
"properties": {
"parents": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"self": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
}
}
},
"name": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"score": {
"type": "float"
}
}
},
"combined_text": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"companies": {
"type": "object",
"properties": {
"domains": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"id": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"label": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"links": {
"type": "object",
"properties": {
"parents": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"self": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
}
}
},
"name": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"score": {
"type": "float"
},
"symbols": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
}
}
},
"content": {
"fields": {
"classic_text": {
"type": "text",
"analyzer": "classic_analyzer"
},
"keyword": {
"type": "keyword",
"ignore_above": 256
},
"no_apostrophe_text": {
"type": "text",
"analyzer": "remove_apostrophe_analyzer"
}
},
"analyzer": "classic_analyzer",
"type": "text"
},
"country": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"data_source": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"description": {
"fields": {
"classic_text": {
"type": "text",
"analyzer": "classic_analyzer"
},
"keyword": {
"type": "keyword",
"ignore_above": 256
},
"no_apostrophe_text": {
"type": "text",
"analyzer": "remove_apostrophe_analyzer"
}
},
"analyzer": "classic_analyzer",
"type": "text"
},
"entities": {
"type": "object",
"properties": {
"data": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"label": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"mentions": {
"type": "long"
},
"type": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
}
}
},
"file_name": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"final_theme": {
"type": "object",
"properties": {
"label": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"value": {
"type": "float"
}
}
},
"id": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"image_url": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"isIngested": {
"type": "boolean"
},
"keywords": {
"type": "object",
"properties": {
"name": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"weight": {
"type": "float"
}
}
},
"labels": {
"type": "object",
"properties": {
"name": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
}
}
},
"language": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"link": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"locations": {
"type": "object",
"properties": {
"area": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"city": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"country": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"county": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"state": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
}
}
},
"matchedAuthors": {
"type": "object",
"properties": {
"id": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"name": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
}
}
},
"matched_authors": {
"type": "object",
"properties": {
"id": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"name": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
}
}
},
"mediaType": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"message_no": {
"type": "long"
},
"negative": {
"type": "float"
},
"neutral": {
"type": "float"
},
"places": {
"type": "object",
"properties": {
"amenity": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"city": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"coordinates": {
"type": "object",
"properties": {
"lat": {
"type": "float"
},
"lon": {
"type": "float"
}
}
},
"country": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"country_code": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"county": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"neighbourhood": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"osm_id": {
"type": "long"
},
"postcode": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"quarter": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"road": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"state": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"state_district": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"suburb": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"town": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
}
}
},
"positive": {
"type": "float"
},
"pubDate": {
"type": "date"
},
"reach": {
"type": "float"
},
"reprint": {
"type": "boolean"
},
"reprint_group_id": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"search_id": {
"type": "long"
},
"source": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"sources": {
"type": "object",
"properties": {
"domain": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"home_page_url": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"id": {
"type": "long"
},
"locations": {
"type": "object",
"properties": {
"city": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"country": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"state": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
}
}
},
"name": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"paywall": {
"type": "boolean"
},
"scopes": {
"type": "object",
"properties": {
"city": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"country": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"state": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
}
}
}
}
},
"sub_media_type": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"summary": {
"fields": {
"classic_text": {
"type": "text",
"analyzer": "classic_analyzer"
},
"keyword": {
"type": "keyword",
"ignore_above": 256
},
"no_apostrophe_text": {
"type": "text",
"analyzer": "remove_apostrophe_analyzer"
}
},
"analyzer": "classic_analyzer",
"type": "text"
},
"title": {
"fields": {
"classic_text": {
"type": "text",
"analyzer": "classic_analyzer"
},
"keyword": {
"type": "keyword",
"ignore_above": 256,
"normalizer": "normalized_keyword"
},
"no_apostrophe_text": {
"type": "text",
"analyzer": "remove_apostrophe_analyzer"
}
},
"analyzer": "classic_analyzer",
"type": "text"
},
"top_source": {
"type": "boolean"
},
"total_articles": {
"type": "long"
},
"translation_text": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"url": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"wordCloud": {
"type": "object",
"properties": {
"label": {
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"type": "text"
},
"value": {
"type": "float"
}
}
}
}
}
Here is the basic Boolean query:
GET amx-data/_search
{
"size": 50,
"query": {
"bool": {
"must": {
"query_string": {
"query": "Apple AND Samsung",
"fields": [
"title",
"summary",
"description",
"content",
"title.no_apostrophe_text",
"summary.no_apostrophe_text",
"description.no_apostrophe_text",
"content.no_apostrophe_text"
]
}
},
"filter": [
{
"bool": {
"should": [
{
"match_phrase": {
"mediaType": "Online"
}
}
]
}
},
{
"bool": {
"should": [
{
"match_phrase": {
"language": "en"
}
}
]
}
},
{
"bool": {
"should": [
{
"match_phrase": {
"locations.country": "us"
}
}
]
}
}
]
}
},
"aggs": {
"total_count": {
"value_count": {
"field": "_id"
}
}
}
}
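I think the query DSL can be optimized to achieve lower search latency:
- The value_count aggregation on _id is equivalent to GET {index}/_count, which returns the number of documents, so the aggregation is not needed at all.
- For exact-value filters, use a term query on the keyword sub-field instead of match_phrase on the text field (the value must match the indexed value exactly, including case), for example:
"term": {
"mediaType.keyword": "online"
}
Hope this helps.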
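As a rough sketch of how the two suggestions combine, the Python snippet below builds a revised request body for the query posted earlier. It only constructs and prints JSON and does not talk to a cluster. The helper names are hypothetical, the keyword sub-fields come from the mapping posted above, and using `track_total_hits` to read an exact count from `hits.total` is an assumption about how the count would be obtained once the aggregation is gone. The filter value "Online" is copied from the original query, since a term filter on a keyword field is case-sensitive.

```python
import json

# Text fields searched by the original query_string clause.
SEARCH_FIELDS = [
    "title",
    "summary",
    "description",
    "content",
    "title.no_apostrophe_text",
    "summary.no_apostrophe_text",
    "description.no_apostrophe_text",
    "content.no_apostrophe_text",
]


def term_filter(field, value):
    """Exact-match filter on a keyword field (not analyzed, not scored)."""
    return {"term": {field: value}}


def build_search_body(text, filters, size=50):
    """Build the request body without the value_count aggregation."""
    return {
        "size": size,
        # Assumption: ask for an exact hits.total instead of aggregating on _id.
        "track_total_hits": True,
        "query": {
            "bool": {
                "must": {
                    "query_string": {"query": text, "fields": SEARCH_FIELDS}
                },
                "filter": [term_filter(f, v) for f, v in filters.items()],
            }
        },
    }


body = build_search_body(
    "Apple AND Samsung",
    {
        "mediaType.keyword": "Online",
        "language.keyword": "en",
        "locations.country.keyword": "us",
    },
)

# Paste the printed JSON after "GET amx-data/_search" to try it out.
print(json.dumps(body, indent=2))
```

If only the count is needed and not the hits, the same `query` object could be sent to GET amx-data/_count instead.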
Thanks, I updated my query and aggregations. It is working fine.