Mapping ambiguous numeric tokens like "1/4" to the correct product attribute

Describe the issue:

We’re running OpenSearch with a Magento/Adobe Commerce catalog of industrial tools (endmills, drills, etc.) with millions of SKUs. Products have many dimensional attributes (cutter diameter, overall length, shank diameter, flute length, etc.), and many of these share common values.

When a user searches for 1/4, how should we approach mapping that query to the correct attribute(s)?

For example, 1/4 could mean:

  • 1/4" cutter diameter

  • 1/4" shank diameter

  • 1/4" overall length

Today we’re just doing a broad match across all fields, which returns a lot of noise. What strategies have you used to:

  1. Prioritize certain attributes over others when the query is just a bare dimension (e.g., assume cutter diameter first)?

  2. Handle compound queries like 1/4 endmill where part of the query is a dimension and part is a product type?

  3. Use query-time or index-time boosting to weight the most commercially relevant attribute?

Is this something best handled with function_score queries, custom analyzers, or some kind of query classification layer before hitting OpenSearch?

Any experience or patterns appreciated.

@carlosjimenezdev Thank you for the question. There are, of course, a number of ways to achieve this. The following is the most straightforward approach, which you can adapt to your business needs.

Create the sample index:

PUT /industrial_tools
{
  "settings": {
    "analysis": {
      "analyzer": {
        "dimension_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "sku":                 { "type": "keyword" },
      "product_name":        { "type": "text", "analyzer": "standard" },
      "product_type":        { "type": "text", "analyzer": "standard", "fields": { "keyword": { "type": "keyword" } } },
      "dimensions_catchall": { "type": "text", "analyzer": "dimension_analyzer" },
      "popularity_score":    { "type": "float" },
      "is_featured":         { "type": "boolean" },
      "sales_rank":          { "type": "float" },
      "cutter_diameter_str": {
        "type": "text",
        "analyzer": "dimension_analyzer",
        "copy_to": "dimensions_catchall",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "shank_diameter_str": {
        "type": "text",
        "analyzer": "dimension_analyzer",
        "copy_to": "dimensions_catchall",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "overall_length_str": {
        "type": "text",
        "analyzer": "dimension_analyzer",
        "copy_to": "dimensions_catchall",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "flute_length_str": {
        "type": "text",
        "analyzer": "dimension_analyzer",
        "copy_to": "dimensions_catchall",
        "fields": { "keyword": { "type": "keyword" } }
      }
    }
  }
}

Populate with sample data for testing:

POST /industrial_tools/_bulk
{ "index": { "_id": "1" } }
{ "sku": "EM-001", "product_name": "2-Flute Carbide Endmill", "product_type": "endmill", "cutter_diameter_str": "1/4", "shank_diameter_str": "1/4", "overall_length_str": "2-1/2", "flute_length_str": "3/4", "popularity_score": 95.0, "is_featured": true, "sales_rank": 1 }
{ "index": { "_id": "2" } }
{ "sku": "EM-002", "product_name": "4-Flute Carbide Endmill", "product_type": "endmill", "cutter_diameter_str": "1/2", "shank_diameter_str": "1/2", "overall_length_str": "3", "flute_length_str": "1/4", "popularity_score": 80.0, "is_featured": false, "sales_rank": 2 }
{ "index": { "_id": "3" } }
{ "sku": "DR-001", "product_name": "HSS Twist Drill Bit", "product_type": "drill", "cutter_diameter_str": "1/4", "shank_diameter_str": "1/4", "overall_length_str": "4", "flute_length_str": "2-1/2", "popularity_score": 70.0, "is_featured": false, "sales_rank": 5 }
{ "index": { "_id": "4" } }
{ "sku": "EM-003", "product_name": "Ball Nose Endmill", "product_type": "endmill", "cutter_diameter_str": "3/8", "shank_diameter_str": "1/4", "overall_length_str": "2", "flute_length_str": "1", "popularity_score": 60.0, "is_featured": false, "sales_rank": 8 }
{ "index": { "_id": "5" } }
{ "sku": "EM-004", "product_name": "Single Flute Endmill", "product_type": "endmill", "cutter_diameter_str": "1/4", "shank_diameter_str": "3/8", "overall_length_str": "2", "flute_length_str": "1/2", "popularity_score": 88.0, "is_featured": true, "sales_rank": 3 }
{ "index": { "_id": "6" } }
{ "sku": "DR-002", "product_name": "Cobalt Drill Bit", "product_type": "drill", "cutter_diameter_str": "1/2", "shank_diameter_str": "1/2", "overall_length_str": "1/4", "flute_length_str": "3", "popularity_score": 50.0, "is_featured": false, "sales_rank": 12 }

Searching using only “1/4”:

GET /industrial_tools/_search
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            { "match": { "cutter_diameter_str":  { "query": "1/4", "boost": 5 } } },
            { "match": { "shank_diameter_str":   { "query": "1/4", "boost": 4 } } },
            { "match": { "flute_length_str":     { "query": "1/4", "boost": 3 } } },
            { "match": { "overall_length_str":   { "query": "1/4", "boost": 2 } } },
            { "match": { "dimensions_catchall":  { "query": "1/4", "boost": 1 } } }
          ],
          "minimum_should_match": 1
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "popularity_score",
            "factor": 0.1,
            "modifier": "log1p",
            "missing": 1
          }
        },
        {
          "filter": { "term": { "is_featured": true } },
          "weight": 1.5
        },
        {
          "gauss": {
            "sales_rank": {
              "origin": 1,
              "scale": 10,
              "decay": 0.5
            }
          }
        }
      ],
      "score_mode": "sum",
      "boost_mode": "multiply"
    }
  }
}

Searching using “1/4 endmill”:

GET /industrial_tools/_search
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "bool": {
                "must": [
                  { "match": { "dimensions_catchall": { "query": "1/4", "boost": 3 } } },
                  { "match": { "product_type": { "query": "endmill", "boost": 3 } } }
                ],
                "boost": 4
              }
            },
            { "match": { "cutter_diameter_str":  { "query": "1/4", "boost": 5 } } },
            { "match": { "shank_diameter_str":   { "query": "1/4", "boost": 4 } } },
            { "match": { "flute_length_str":     { "query": "1/4", "boost": 3 } } },
            { "match": { "overall_length_str":   { "query": "1/4", "boost": 2 } } },
            { "match": { "dimensions_catchall":  { "query": "1/4", "boost": 1 } } },
            { "match": { "product_type":         { "query": "endmill", "boost": 2 } } },
            { "match": { "product_name":         { "query": "1/4 endmill", "boost": 1 } } }
          ],
          "minimum_should_match": 1
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "popularity_score",
            "factor": 0.1,
            "modifier": "log1p",
            "missing": 1
          }
        },
        {
          "filter": { "term": { "is_featured": true } },
          "weight": 1.5
        },
        {
          "gauss": {
            "sales_rank": {
              "origin": 1,
              "scale": 10,
              "decay": 0.5
            }
          }
        }
      ],
      "score_mode": "sum",
      "boost_mode": "multiply"
    }
  }
}

The approach uses a bool query with should clauses to search across all dimensional attributes at once via a dimensions_catchall field (which every dimension field feeds into via copy_to), while still awarding extra score to documents where 1/4 lands on a more commercially important attribute such as cutter_diameter_str rather than overall_length_str. On top of that, a function_score wrapper multiplies the text relevance score by the sum of three commercial signals (popularity, featured status, and sales rank), so between two products with the same dimensional match, the one that sells better or is featured by the business will generally rank higher.
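To make the effect of the function_score wrapper concrete, here is a rough Python sketch of the multiplier it produces for a document. It assumes the documented formulas: log1p is the base-10 log of (1 + factor * value), gauss decays to the decay value at a distance of scale from origin, and a filter function with only a weight contributes that weight when the filter matches. The helper names are made up for illustration; this is an approximation for reasoning about the numbers, not what OpenSearch runs internally.

```python
import math


def popularity_factor(popularity_score, factor=0.1):
    """field_value_factor with modifier log1p: log10(1 + factor * value)."""
    return math.log10(1 + factor * popularity_score)


def gauss_decay(value, origin=1.0, scale=10.0, decay=0.5):
    """Gaussian decay: 1.0 at origin, equal to `decay` at distance `scale`."""
    sigma_sq = -(scale ** 2) / (2 * math.log(decay))
    return math.exp(-((value - origin) ** 2) / (2 * sigma_sq))


def function_multiplier(popularity_score, is_featured, sales_rank):
    """score_mode "sum": add up every function that applies to the document."""
    total = popularity_factor(popularity_score) + gauss_decay(sales_rank)
    if is_featured:
        total += 1.5  # weight of the is_featured filter function
    return total


if __name__ == "__main__":
    # Values taken from the sample documents EM-001 and DR-001 above.
    print("EM-001:", round(function_multiplier(95.0, True, 1), 4))
    print("DR-001:", round(function_multiplier(70.0, False, 5), 4))
```

With boost_mode set to multiply, the text score of each hit is multiplied by this sum. For the sample data, EM-001 ends up with a multiplier of roughly 3.52 and DR-001 with roughly 1.80, so the featured flag alone accounts for most of the gap between them.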

However, rather than maintaining two separate queries, it would be simpler to handle this in your application layer: detect whether the search string contains a product type token, and conditionally include or exclude the inner must block and the product_type/product_name clauses before sending the query to OpenSearch. That way there is one query template that adapts to whatever the user types.
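A minimal sketch of that application-layer template in Python might look like the following. KNOWN_PRODUCT_TYPES, DIMENSION_RE, parse_query and build_query are hypothetical names introduced only for this example, and the product type list and the dimension pattern are assumptions you would replace with your own catalog vocabulary. The field names and boosts are copied from the two queries above.

```python
import re

# Hypothetical vocabulary; in practice this would come from the catalog.
KNOWN_PRODUCT_TYPES = {"endmill", "drill"}

# Assumed pattern for dimension tokens such as "1/4", "2-1/2" or "3".
DIMENSION_RE = re.compile(r"(?:\d+-)?\d+(?:/\d+)?")

# Field -> boost, in the same order as the hand-written queries.
DIMENSION_BOOSTS = {
    "cutter_diameter_str": 5,
    "shank_diameter_str": 4,
    "flute_length_str": 3,
    "overall_length_str": 2,
    "dimensions_catchall": 1,
}

FUNCTIONS = [
    {"field_value_factor": {"field": "popularity_score", "factor": 0.1,
                            "modifier": "log1p", "missing": 1}},
    {"filter": {"term": {"is_featured": True}}, "weight": 1.5},
    {"gauss": {"sales_rank": {"origin": 1, "scale": 10, "decay": 0.5}}},
]


def match(field, query, boost):
    """Build a single boosted match clause."""
    return {"match": {field: {"query": query, "boost": boost}}}


def parse_query(text):
    """Return (dimension, product_type); either may be None."""
    dimension = product_type = None
    for token in text.lower().split():
        bare = token.rstrip('"')  # drop a trailing inch mark, if any
        if bare in KNOWN_PRODUCT_TYPES:
            product_type = bare
        elif DIMENSION_RE.fullmatch(bare):
            dimension = bare
    return dimension, product_type


def build_query(text):
    """Build one function_score request body that adapts to the input."""
    dimension, product_type = parse_query(text)
    should = []
    if dimension and product_type:
        # Compound clause: reward documents matching both parts together.
        should.append({"bool": {
            "must": [match("dimensions_catchall", dimension, 3),
                     match("product_type", product_type, 3)],
            "boost": 4,
        }})
    if dimension:
        should += [match(f, dimension, b) for f, b in DIMENSION_BOOSTS.items()]
    if product_type:
        should.append(match("product_type", product_type, 2))
        should.append(match("product_name", text, 1))
    if not should:
        # Nothing recognised: fall back to a plain product name search.
        should.append(match("product_name", text, 1))
    return {"query": {"function_score": {
        "query": {"bool": {"should": should, "minimum_should_match": 1}},
        "functions": FUNCTIONS,
        "score_mode": "sum",
        "boost_mode": "multiply",
    }}}
```

Calling build_query("1/4") yields the five dimension clauses only, while build_query("1/4 endmill") adds the compound must block and the product_type/product_name clauses, which mirrors the two hand-written queries. The returned dictionary can be sent as the search body with whichever client you already use.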

Hope this helps

@Anthony Thanks for the detailed example. The function_score with tiered boosting makes sense for a small set of attributes, but our situation is a bit different.

We have 100+ measurable attributes across the catalog, and any given product might only have 10-15 of them. An endmill has cutter diameter, shank diameter, flute length, overall length, helix angle, etc. A saw blade has width, arbor diameter, tooth count, plate thickness. A tap has pitch diameter, thread size, chamfer length. There’s very little overlap between product types.

So we can’t realistically define a static boost per attribute — we’d be comparing apples to oranges. Cutter diameter being boosted at 5 is meaningless for a saw blade that doesn’t even have that attribute.

But the deeper question remains: when a user searches for 1/4, how do we determine which attribute they’re referring to? Even within a single product type, 1/4 could match 3-4 different attributes.

Is there a pattern for this kind of attribute disambiguation at query time?

Or is this fundamentally something that has to be solved outside OpenSearch?

@carlosjimenezdev I’m not sure I follow your question. If you are asking whether OpenSearch can somehow determine on its own which product the user is looking for from the search “1/4”, the answer is no. You can, however, manually add a popularity_score to your products so that, when the user has not provided a product name, results are returned in the order you have decided. Without access to information about product popularity (e.g. an index of top sales), OpenSearch cannot determine this.