Parent child relationship

  1. Is there any performance impact with this type of query? (parent/child)
  2. Also, if both parent and child are in the same shard, could this cause shard skewness?
  3. Any alternative solutions to parent/child?

Yes, using the join field type for parent/child relationships in OpenSearch can have a performance impact. This is because when you execute a query, OpenSearch needs to resolve the join at query time, which can be expensive in terms of CPU and memory usage.
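To make that concrete, here is a rough sketch of what a join setup looks like, written as plain Python dicts that you could send as request bodies. The index layout (questions as parents, answers as children) and the names `qa_relation`, `question`, `answer`, `child_doc` and `has_child_query` are made up for illustration; only the `join` field type and the `has_child` query come from OpenSearch itself.

```python
import json

# Hypothetical index layout: one "question" parent with many "answer" children.
JOIN_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "body": {"type": "text"},
            # The join field declares the relation names and who parents whom.
            "qa_relation": {
                "type": "join",
                "relations": {"question": "answer"},
            },
        }
    }
}


def child_doc(body: str, parent_id: str) -> dict:
    """Build a child document body; index it with routing set to parent_id."""
    return {"body": body, "qa_relation": {"name": "answer", "parent": parent_id}}


def has_child_query(text: str) -> dict:
    """Return parents that have at least one child whose body matches `text`."""
    return {
        "query": {
            "has_child": {
                "type": "answer",
                "query": {"match": {"body": text}},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(has_child_query("shard routing"), indent=2))
```

The `has_child` clause is the part that gets resolved at query time: the inner query runs against the children first, and the matches are then joined back to their parents.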

Regarding shard skewness: with the join field, a parent and all of its children have to live on the same shard, so children are indexed with the parent's ID as the routing value. If the number of child documents per parent varies significantly, this can distribute data unevenly, leaving some shards larger and more heavily loaded than others, which can hurt performance.
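A small simulation shows how this plays out. It is only a sketch: the hash below (CRC32) is a stand-in and not the routing hash OpenSearch actually uses, and the family sizes are invented. The point is just that a whole family lands on whichever shard the parent's ID hashes to.

```python
import zlib
from collections import Counter


def shard_for(routing: str, num_shards: int) -> int:
    # Stand-in hash for illustration; OpenSearch uses its own routing hash.
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def docs_per_shard(children_per_parent: dict, num_shards: int) -> Counter:
    """Count documents per shard when every child is routed by its parent's ID."""
    counts = Counter()
    for parent_id, n_children in children_per_parent.items():
        shard = shard_for(parent_id, num_shards)
        counts[shard] += 1 + n_children  # the parent plus all of its children
    return counts


if __name__ == "__main__":
    # Hypothetical data: 99 small families and one very large one.
    families = {f"parent-{i}": 5 for i in range(99)}
    families["parent-big"] = 10_000
    for shard, count in sorted(docs_per_shard(families, 4).items()):
        print(f"shard {shard}: {count} docs")
```

Whichever shard receives `parent-big` ends up holding the large majority of the documents, no matter how evenly the remaining parents spread out.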

As for alternatives to parent/child relationships, you have a few options:

  1. Denormalization: Instead of maintaining separate parent and child documents, you could denormalize your data and store all relevant information in a single document. This can simplify queries and improve performance, but it can also increase storage requirements and make updates more complex.

  2. Nested objects: If the relationship between parent and child documents is one-to-many, you could use nested objects. This allows you to store multiple objects within a single document, which can be queried as a unit. However, nested objects can also increase storage requirements and complexity.

  3. Application-side joins: Instead of relying on OpenSearch to resolve joins, you could handle this in your application. This involves executing separate queries for parent and child documents and combining the results in your application. This can be more efficient than using the join field type, but it also requires more application logic.
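For the application-side option, a minimal sketch could look like the following. It assumes two separate indices (`answers` and `questions`) and a `question_id` field on each answer, all of which are hypothetical. The search call is injected as a function so the sketch does not depend on a particular client library; `make_fake_search` is an in-memory stand-in for trying it locally, not a real client.

```python
from typing import Callable, Dict, List

# A search function takes an index name and a request body and returns hits.
SearchFn = Callable[[str, dict], List[dict]]


def questions_with_matching_answers(search: SearchFn, text: str) -> List[dict]:
    # Step 1: find matching children and collect the parent IDs they reference.
    answers = search("answers", {"query": {"match": {"body": text}}})
    parent_ids = sorted({hit["_source"]["question_id"] for hit in answers})
    if not parent_ids:
        return []
    # Step 2: fetch the parents by ID in one second request.
    return search("questions", {"query": {"ids": {"values": parent_ids}}})


def make_fake_search(data: Dict[str, List[dict]]) -> SearchFn:
    """In-memory stand-in for a cluster, only for trying the sketch locally."""

    def search(index: str, body: dict) -> List[dict]:
        query = body["query"]
        if "ids" in query:
            wanted = set(query["ids"]["values"])
            return [doc for doc in data[index] if doc["_id"] in wanted]
        field, text = next(iter(query["match"].items()))
        return [
            doc for doc in data[index]
            if text.lower() in doc["_source"][field].lower()
        ]

    return search


if __name__ == "__main__":
    data = {
        "questions": [
            {"_id": "q1", "_source": {"title": "Shard skew?"}},
            {"_id": "q2", "_source": {"title": "Nested or join?"}},
        ],
        "answers": [
            {"_id": "a1", "_source": {"question_id": "q1", "body": "Use custom routing"}},
            {"_id": "a2", "_source": {"question_id": "q2", "body": "Try nested fields"}},
        ],
    }
    hits = questions_with_matching_answers(make_fake_search(data), "routing")
    print([hit["_id"] for hit in hits])
```

The cost is two round trips and a list of IDs that grows with the number of matching children, so in practice you would likely want to cap or page through step 1.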

Remember, the best approach depends on your specific use case and requirements.

Disclaimer: OpsGPT.io helped with part of this answer.

One thing to add here is that nested docs require you to reindex the whole ensemble, even if you update just a parent or a child. It’s as if it were one document, even though under the hood you have multiple Lucene docs.
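Here is a sketch of what that means in practice, again as plain Python dicts. The mapping, the field names and the helper `with_edited_answer` are hypothetical; only the `nested` field type and the `nested` query are OpenSearch features. Editing a single answer still means sending the full question, with every other answer, back to the index.

```python
import copy

# Hypothetical mapping: answers are stored inside the question as nested objects.
NESTED_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "answers": {
                "type": "nested",
                "properties": {
                    "author": {"type": "keyword"},
                    "body": {"type": "text"},
                },
            },
        }
    }
}


def nested_query(author: str, text: str) -> dict:
    """Match questions where one single answer satisfies both conditions."""
    return {
        "query": {
            "nested": {
                "path": "answers",
                "query": {
                    "bool": {
                        "must": [
                            {"term": {"answers.author": author}},
                            {"match": {"answers.body": text}},
                        ]
                    }
                },
            }
        }
    }


def with_edited_answer(question: dict, position: int, new_body: str) -> dict:
    """Return the full document to reindex after editing one nested answer."""
    updated = copy.deepcopy(question)
    updated["answers"][position]["body"] = new_body
    return updated  # the whole question, including every untouched answer


if __name__ == "__main__":
    question = {
        "title": "Parent child relationship",
        "answers": [
            {"author": "alice", "body": "Use the join field"},
            {"author": "bob", "body": "Denormalize"},
        ],
    }
    print(with_edited_answer(question, 1, "Denormalize where you can"))
```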

In general, you have a trade-off between flexibility and performance. So from fastest to most flexible, you have the following solutions:

  1. Denormalization
  2. Nested
  3. Parent-child
  4. Application-side joins