## Problem Statement
Currently, OpenSearch lacks a direct means of providing insights into “top queries” that have a significant impact on latency and resource usage (such as CPU consumption and memory usage). At present, users can utilize [OpenSearch's slow log](https://opensearch.org/docs/latest/monitoring-your-cluster/logs/#slow-logs) to gain a certain degree of insight into the execution time of slow queries. However, the slow log offers limited context for those queries: it provides no way to trace back to the original request and the corresponding full query body, and it does not cover any resource usage dimensions. Additionally, the current slow query log doesn't provide insights into resource usage and latency for **_individual_** phases of query execution, preventing answers to critical questions like, "Which phase of the slow search query consumes the most time?" Moreover, slow query logs lack an aggregated view of the "**Top N slow queries by a specific resource usage or latency**" over a specified time frame. These limitations make it challenging for OpenSearch users to identify, diagnose, and correlate slow queries, and to further optimize those queries to prevent negative impacts on overall system health.
## Objective
The objective of this RFC is to propose and define the architecture, components, and key features of the “Top N queries” framework, which aims to provide real-time insights into the most impactful queries based on resource usage and latency. By aggregating crucial metrics such as query latency, with planned expansion to CPU consumption and memory usage in future iterations, this feature will empower users to identify and optimize resource-intensive queries, thereby enhancing the debugging and tracing experience.
The framework's scope encompasses the following key aspects:
* Implementation of a **priority queue-based** in-memory data store, with configurable window size, on the coordinator node, designed to efficiently store the top N queries. The data model of the stored query attributes should be highly extensible for different types of resources and metrics.
* Introduction of generic and extensible metrics collection workflows capable of gathering per-request latency and resource usage metrics from diverse sources. The initial focus will be on implementing a generic collection method for query _**latency**_ metrics. Future iterations will extend this method to additional metrics such as CPU usage, leveraging the per-request resource collection capability from the [Resource Tracking Framework](https://github.com/opensearch-project/OpenSearch/issues/1179).
* Support for exporting the top N query results to different sinks on a predefined schedule (push model). Initial iterations will include a basic sink that logs to disk, while subsequent versions will introduce support for advanced sinks such as OpenTelemetry (OTel) or other databases.
* Support for related APIs to facilitate querying of top N query results (pull model). Providing an API to fetch this data offers users greater flexibility for analysis, with future potential for integration into dashboards, enhancing usability and analysis capabilities.
With the above framework components, a typical workflow for the top N query feature would be:
1. Per-request metrics (latency and resource usage) are collected and sent to the in-memory data store on the coordinator node. The data store aggregates these metrics, maintaining a record of "queries with top resource usages." Granularity levels, such as per minute, per hour, and per day, can be configured based on user preferences.
2. The "queries with top resource usages" data is published to different sinks on a schedule based on the user's configuration.
3. Users can query the "queries with top resource usages" data via an API for further analysis.
<p align="center">
<img src="https://github.com/opensearch-project/OpenSearch/assets/7891523/acd50987-a048-4a04-a264-ba5e1f89877f" width="600">
</p>
<p align="center">
<img src="https://github.com/opensearch-project/OpenSearch/assets/7891523/fe546f83-17b1-45cd-8aad-5ed6daac520a" width="600">
</p>
## Detailed Solution
### Data model to store query information
The data model designed for storing query information should contain essential data for each query request, such as _search query type, total shards, and indices involved in the search request_. Moreover, the data model should be structured to accommodate customized fields, making it straightforward to add other customized properties in the future for extensibility. Additionally, the data model should feature a comprehensive breakdown of latency and resource usage across individual phases of the query execution. The data models to store query information are outlined below.
```java
import java.util.Map;

// Base class to store request information for a request
abstract class RequestData {
    private SearchType searchType;
    private int totalShards;
    private String[] indices;
    // Property map to store future customized attributes for the
    // request; included for extensibility. For example, we could later
    // add the user account that initiated the request.
    private Map<String, String> propertyMap;
    // Getters and setters.
}

// Child class to store latency information for a request
class RequestDataLatency extends RequestData {
    private Map<String, Long> phaseLatencyMap;
    private Long totalLatency;
    // Getters and setters.
}

// Child class to store CPU resource usage information for a request
class RequestDataCPU extends RequestData {
    // Boxed Float is required here; Java generics cannot take primitive types
    private Map<String, Float> phaseResourceUsageMap;
    private float totalResourceUsage;
    // Getters and setters.
}
```
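For illustration, here is a short hypothetical snippet showing how the extensible `propertyMap` could carry a customized attribute alongside the per-phase latency breakdown. It assumes the getters and setters noted in the class outlines above; all names and values are illustrative only:
```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical usage of the data model sketch above.
Map<String, Long> phaseLatencies = new HashMap<>();
phaseLatencies.put("query", 100L);
phaseLatencies.put("fetch", 200L);

RequestDataLatency data = new RequestDataLatency();
data.setPhaseLatencyMap(phaseLatencies);
data.setTotalLatency(300L);

// A customized attribute carried through the extensible property map,
// e.g. the user account that initiated the request.
data.setPropertyMap(Map.of("user_id", "value"));
```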
### In-memory data store on Coordinator node
The proposed data store, `SearchQueryAnalyzer`, is a generic, extensible, priority-queue-based object for tracking the "Top N queries" by a specific resource usage within a defined time window. As previously mentioned, this data store should support exporting data to various sinks and also offer a dedicated method for querying data via an API.
```java
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

abstract class SearchQueryAnalyzer {
    // Number of top queries to keep
    private int topNSize;
    // Window size in seconds
    private int windowSize;
    // Getters and setters.
}

public class SearchQueryLatencyAnalyzer extends SearchQueryAnalyzer {
    private PriorityQueue<RequestDataLatency> topRequests;

    // Async function to ingest latency data for a request
    public void ingestDataForQuery(SearchRequest request, Map<String, String> customizedAttributes, Long latency) {}

    public List<RequestDataLatency> getTopLatencyQueries() {
        return List.copyOf(topRequests);
    }

    public void export(Sink sink) {}
}

public class SearchQueryCPUAnalyzer extends SearchQueryAnalyzer {
    private PriorityQueue<RequestDataCPU> topRequests;

    // Async function to ingest CPU usage data for a request
    public void ingestDataForQuery(SearchRequest request, Map<String, String> customizedAttributes, float cpuUsage) {}

    public List<RequestDataCPU> getTopNCpuUsageQueries() {
        return List.copyOf(topRequests);
    }

    public void export(Sink sink) {}
}
```
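To make the priority-queue behavior concrete, below is a minimal sketch (not the actual implementation) of how an analyzer might keep its queue bounded to `topNSize` elements. The `BoundedTopNQueue` name is hypothetical, and the `getTotalLatency()` getter is assumed from the data model sketch:
```java
import java.util.Comparator;
import java.util.PriorityQueue;

// A min-heap ordered by total latency keeps the smallest of the current
// top N at the head, so an insertion beyond capacity evicts the
// lowest-latency entry in O(log N).
class BoundedTopNQueue {
    private final int topNSize;
    private final PriorityQueue<RequestDataLatency> topRequests;

    BoundedTopNQueue(int topNSize) {
        this.topNSize = topNSize;
        this.topRequests = new PriorityQueue<>(Comparator.comparing(RequestDataLatency::getTotalLatency));
    }

    void offer(RequestDataLatency record) {
        topRequests.offer(record);
        if (topRequests.size() > topNSize) {
            topRequests.poll(); // drop the lowest-latency entry
        }
    }
}
```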
It's crucial to note that the `topN` and `windowSize` values for each analyzer store should be customizable through a configuration file. To enhance flexibility, an API will also be provided, allowing users to dynamically adjust these values without restarting the OpenSearch process. Further details regarding the configuration API are elaborated in the "APIs for Configurations" section below.
### Metrics Collection
A different metrics collection workflow will be implemented for each type of metric. The initial focus will be on the collection workflow for latency metrics (considering latency the primary resource usage dimension in the first iteration). Future iterations can incorporate additional dimensions, such as CPU usage, based on metrics available from the [resource tracking framework](https://github.com/opensearch-project/OpenSearch/issues/1179).
To capture per-request phase latency information, a listener-based approach will be employed. Building on the [newly introduced framework to track search query latency](https://github.com/opensearch-project/OpenSearch/issues/7334), a new component, `SearchRequestResourceListener`, will be defined. Its listener functions will be executed during each phase of a search query, recording latency and storing all relevant information into the `SearchQueryLatencyAnalyzer`. The high-level class design is presented below.
```java
import java.util.HashMap;
import java.util.Map;

public class SearchRequestResourceListener implements SearchRequestOperationsListener {
    private final SearchQueryLatencyAnalyzer analyzer;
    private long startTime;
    // Per-phase latency collected over the lifetime of the request
    private final Map<String, Long> phaseLatencyMap = new HashMap<>();

    public SearchRequestResourceListener(SearchQueryLatencyAnalyzer analyzer) {
        this.analyzer = analyzer;
    }

    // Query phase
    void onQueryPhaseStart(SearchPhaseContext context) {
        // record the timestamp when the query phase starts
    }
    void onQueryPhaseEnd(SearchPhaseContext context, long tookTime) {
        // calculate and store the latency for this phase
        phaseLatencyMap.put("query", tookTime);
    }
    void onQueryPhaseFailure(SearchPhaseContext context) {}
    void onRequestStart(SearchPhaseContext context) {
        // Set the start time
        startTime = System.nanoTime();
    }
    void onRequestEnd(SearchPhaseContext context) {
        // Report total and per-phase latency to the analyzer, e.g.
        // analyzer.ingestDataForQuery(request, customizedAttributes, totalLatency)
    }
}
```
### Data export - pull model
To facilitate user access to the "top queries with resource usage" information, we will implement dedicated API endpoints in OpenSearch. The structure of these API endpoints is outlined below:
#### Top Queries - general API
**Endpoint:**
`GET /insights/top_queries`
**Parameters:**
* `type`: The type of metric on which the top N queries are calculated
**Example Usage:**
`GET /insights/top_queries?type=latency`
**Response:**
Upon querying the `GET /insights/top_queries?type=latency` API, the response will be a structured set of data providing insights into the top queries based on latency within the specified parameters:
```js
{
  "timestamp": "2023-11-09T12:30:45Z",
  "top_queries": [
    {
      "query_id": "12345",
      "latency": 600,
      "timestamp": "2023-11-09T12:29:15Z",
      "search_type": "QUERY_THEN_FETCH",
      "indices": [
        "index1",
        "index2"
      ],
      "shards": 5,
      "phases_details": {
        "query": 100,
        "fetch": 200,
        "expand": 300
      },
      // Additional customized attributes specific to the query
      "attributes": {
        "user_id": "value"
      }
    }
    // Additional top query results based on latency
  ]
}
```
#### Top Queries by CPU resource usage (future iterations)
**Endpoint example:**
`GET /insights/top_queries?type=cpu`
**Parameters:**
* `type`: The type of metric on which the top N queries are calculated; in this case, CPU usage
**Response:**
The response will contain details about the top queries based on CPU usage within the specified time window and limit.
```js
{
  "timestamp": "2023-11-09T12:30:45Z",
  "top_queries": [
    {
      "query_id": "12345",
      "cpu_usage": 0.6,
      "timestamp": "2023-11-09T12:29:15Z",
      "search_type": "QUERY_THEN_FETCH",
      "indices": [
        "index1",
        "index2"
      ],
      "shards": 5,
      "phases_details": {
        "query": 0.1,
        "fetch": 0.2,
        "expand": 0.3
      },
      // Additional customized attributes specific to the query
      "attributes": {
        "user_id": "value"
      }
    }
    // Additional top query results based on CPU usage
  ]
}
```
The sequence diagram below illustrates the workflows and detailed interactions among components within the OpenSearch backend when the “Top N queries” feature is enabled. It captures how latency is calculated and stored while OpenSearch processes a search query, and how the results are returned when queried.

### Data export - push model
A Java scheduler can be used to automate the export of the top N queries at scheduled intervals, pushing the data to various sinks. In the initial iteration, we will establish support for a fundamental sink that writes the top N queries to a log file on disk.
Subsequent iterations will introduce more advanced sinks, such as OpenTelemetry (OTel) and SQL/time-series databases, enhancing the export capabilities. This iterative approach allows the feature to evolve and accommodate a broader spectrum of user needs, expanding the export functionality to diverse and advanced storage and analysis solutions.
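As a rough sketch of the scheduling piece, a `ScheduledExecutorService` could drive the periodic export. The `TopQueriesExporter` class is hypothetical, and `Sink` refers to the placeholder type from the class design above:
```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical wiring for the push model: periodically push the current
// top N queries to the configured sink at a fixed interval.
public class TopQueriesExporter {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void start(SearchQueryLatencyAnalyzer analyzer, Sink sink, long exportIntervalSeconds) {
        scheduler.scheduleAtFixedRate(
            () -> analyzer.export(sink),
            exportIntervalSeconds, // initial delay
            exportIntervalSeconds, // period, e.g. the export_interval setting
            TimeUnit.SECONDS
        );
    }

    public void stop() {
        scheduler.shutdown();
    }
}
```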
### APIs for Configurations
As previously mentioned, the `topN` and `windowSize` values and the data export interval for each `SearchQueryAnalyzer` store should be highly configurable. This can be done through a configuration file that is read when the OpenSearch process starts. Additionally, an API will be provided to dynamically adjust these values without requiring a restart of the OpenSearch process. The proposed API endpoint to configure `windowSize`, `topN`, and `export_interval` is described as follows:
```js
PUT /_cluster/settings
{
  "persistent": {
    "search.top_n_queries.latency.enabled": true,
    "search.top_n_queries.latency.window_size": 60,
    "search.top_n_queries.latency.top_n_size": 5,
    "search.top_n_queries.latency.export_interval": 30
  }
}
```
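On the backend, these keys could be registered as dynamic cluster settings so that updates issued through the API above take effect without a restart. The sketch below is illustrative only; it uses OpenSearch's `Setting` and `ClusterSettings` classes and assumes the setters from the `SearchQueryAnalyzer` outline:
```java
import org.opensearch.common.settings.ClusterSettings;
import org.opensearch.common.settings.Setting;

// Illustrative registration of the proposed settings as dynamic cluster
// settings; defaults mirror the example request above.
public final class TopNQueriesSettings {
    public static final Setting<Boolean> LATENCY_ENABLED = Setting.boolSetting(
        "search.top_n_queries.latency.enabled", false,
        Setting.Property.NodeScope, Setting.Property.Dynamic);

    public static final Setting<Integer> LATENCY_WINDOW_SIZE = Setting.intSetting(
        "search.top_n_queries.latency.window_size", 60,
        Setting.Property.NodeScope, Setting.Property.Dynamic);

    public static final Setting<Integer> LATENCY_TOP_N_SIZE = Setting.intSetting(
        "search.top_n_queries.latency.top_n_size", 5,
        Setting.Property.NodeScope, Setting.Property.Dynamic);

    // Apply dynamic updates to the analyzer without a restart
    // (assumes the setters from the SearchQueryAnalyzer sketch).
    public static void register(ClusterSettings clusterSettings, SearchQueryLatencyAnalyzer analyzer) {
        clusterSettings.addSettingsUpdateConsumer(LATENCY_WINDOW_SIZE, analyzer::setWindowSize);
        clusterSettings.addSettingsUpdateConsumer(LATENCY_TOP_N_SIZE, analyzer::setTopNSize);
    }
}
```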