Parent child relationship

searchwithme · July 18, 2023, 1:55pm

Is there any performance impact with this type of query? (parent/child)
Also, if both parent and child are in the same shard, could this cause shard skewness?
Are there any alternative solutions to parent/child?

Opster_support · July 25, 2023, 7:27pm

Yes, using the join field type for parent/child relationships in OpenSearch can have a performance impact. This is because when you execute a query, OpenSearch needs to resolve the join at query time, which can be expensive in terms of CPU and memory usage.
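For reference, here is a minimal sketch of what such a setup might look like, written as plain Python dictionaries that could be sent as request bodies. The field name `relation`, the `post`/`comment` relation names, and the `text` field are assumptions made up for illustration; they are not from the original question.

```python
import json

# Hypothetical index layout: "post" parents with "comment" children.
index_body = {
    "mappings": {
        "properties": {
            "relation": {  # assumed name for the join field
                "type": "join",
                "relations": {"post": "comment"},
            },
            "text": {"type": "text"},
        }
    }
}

# A child document names its parent. It has to be indexed with routing
# set to the parent ID so that both end up on the same shard.
child_doc = {
    "text": "nice article",
    "relation": {"name": "comment", "parent": "post-1"},
}

# Find parents that have at least one matching child. The join between
# the two document types is resolved when this query runs.
has_child_query = {
    "query": {
        "has_child": {
            "type": "comment",
            "query": {"match": {"text": "nice"}},
        }
    }
}

print(json.dumps(has_child_query, indent=2))
```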

Regarding shard skewness, the join field type requires a parent and all of its children to be routed to the same shard. If the number of child documents per parent varies significantly, this can lead to uneven distribution of data across shards. Some shards may then be larger and more heavily loaded than others, which can impact performance.
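The skew effect can be illustrated with a small simulation. This is only a sketch: `zlib.crc32` is a stand-in for the hash that OpenSearch applies to the routing value, and the parent IDs and child counts are invented. The point is just that every child follows its parent's routing, so one very large family lands entirely on one shard.

```python
import zlib
from collections import Counter
from typing import Dict


def shard_for(routing_value: str, num_shards: int) -> int:
    # Stand-in hash for illustration only; not the hash OpenSearch uses.
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def docs_per_shard(children_per_parent: Dict[str, int], num_shards: int) -> Counter:
    """Count documents per shard when children are routed by parent ID."""
    counts: Counter = Counter()
    for parent_id, n_children in children_per_parent.items():
        shard = shard_for(parent_id, num_shards)
        counts[shard] += 1 + n_children  # the parent plus all of its children
    return counts


# Hypothetical data: most parents have a few children, one has very many.
families = {f"parent-{i}": 3 for i in range(100)}
families["parent-big"] = 50_000

for shard, count in sorted(docs_per_shard(families, num_shards=5).items()):
    print(f"shard {shard}: {count} docs")
```

Whichever shard receives `parent-big` ends up holding the bulk of the documents, regardless of how evenly the other parents are spread.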

As for alternatives to parent/child relationships, you have a few options:

Denormalization: Instead of maintaining separate parent and child documents, you could denormalize your data and store all relevant information in a single document. This can simplify queries and improve performance, but it can also increase storage requirements and make updates more complex.
Nested objects: If the relationship between parent and child documents is one-to-many, you could use nested objects. This allows you to store multiple objects within a single document, which can be queried as a unit. However, nested objects can also increase storage requirements and complexity.
Application-side joins: Instead of relying on OpenSearch to resolve joins, you could handle this in your application. This involves executing separate queries for parent and child documents and combining the results in your application. This can be more efficient than using the join field type, but it also requires more application logic.
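As a rough sketch of the application-side join option above: run one search for parents, a second search for children filtered by the returned parent IDs, and stitch the results together in code. The two search functions here are in-memory stand-ins rather than real client calls, and the `parent_id` field is an assumed plain keyword field on the child documents.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]


def children_query(parent_ids: List[str]) -> Dict[str, object]:
    # Request body for the second search: a terms filter on the
    # (assumed) parent_id field of the child documents.
    return {"query": {"terms": {"parent_id": parent_ids}}}


def app_side_join(
    search_parents: Callable[[], List[Doc]],
    search_children: Callable[[List[str]], List[Doc]],
) -> List[Doc]:
    """Fetch parents, then their children, and merge them in the application."""
    parents = search_parents()
    parent_ids = [str(p["id"]) for p in parents]
    children = search_children(parent_ids)

    by_parent: Dict[str, List[Doc]] = {}
    for child in children:
        by_parent.setdefault(str(child["parent_id"]), []).append(child)

    return [{**p, "children": by_parent.get(str(p["id"]), [])} for p in parents]


# In-memory stand-ins for the two searches (hypothetical data).
PARENTS = [{"id": "p1", "title": "first"}, {"id": "p2", "title": "second"}]
CHILDREN = [{"id": "c1", "parent_id": "p1"}, {"id": "c2", "parent_id": "p1"}]


def fake_search_parents() -> List[Doc]:
    return list(PARENTS)


def fake_search_children(parent_ids: List[str]) -> List[Doc]:
    wanted = set(parent_ids)
    return [c for c in CHILDREN if c["parent_id"] in wanted]


joined = app_side_join(fake_search_parents, fake_search_children)
```

With this approach the parent and child documents can live in separate indices with no routing constraint, at the cost of two round trips and the merge logic.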

Remember, the best approach depends on your specific use case and requirements.

Disclaimer: OpsGPT.io helped with part of this answer

radu.gheorghe · July 26, 2023, 2:29pm

One thing to add here is that nested docs require you to reindex the whole ensemble, even if you update just a parent or a child. It’s as if it were one document, even though under the hood you have multiple Lucene docs.
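To make that concrete, here is a small sketch with an assumed `comments` nested field and invented document contents. Changing one nested comment still means producing and reindexing the full document body, parent fields and all other comments included.

```python
import copy
from typing import Dict

# Hypothetical mapping: comments stored as nested objects inside a post.
nested_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "comments": {
                "type": "nested",
                "properties": {
                    "id": {"type": "keyword"},
                    "text": {"type": "text"},
                },
            },
        }
    }
}


def with_updated_comment(post: Dict, comment_id: str, new_text: str) -> Dict:
    """Return the full document body that has to be reindexed."""
    updated = copy.deepcopy(post)  # leave the caller's copy untouched
    for comment in updated["comments"]:
        if comment["id"] == comment_id:
            comment["text"] = new_text
    return updated  # the whole ensemble, not just the changed comment


post = {
    "title": "Parent child relationship",
    "comments": [
        {"id": "c1", "text": "first"},
        {"id": "c2", "text": "second"},
    ],
}

new_body = with_updated_comment(post, "c2", "edited")
```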

In general, you have a trade-off between flexibility and performance. So from fastest to most flexible, you have the following solutions:

Denormalization
Nested
Parent-child
Application-side joins
