Mapping - Fields Type English or Custom Analyzers

I have learned that custom analyzers are a somewhat new feature, but my question pertains to what is currently allowed.

With Elasticsearch, this mapping can be specified:

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "text": { 
        "type": "text",
        "fields": {
          "english": { 
            "type":     "text",
            "analyzer": "english"
          }
        }
      }
    }
  }
}

The important part is the “english” entry inside fields. To use "analyzer": "english", must we install a custom plugin?

Thank you,

I have found that, according to Elastic, it is now a built-in analyzer. Is it true that I would need to somehow find this analyzer and implement it as a custom analyzer for OpenSearch?

Hi, you can also use a language value with the analyzer field in OpenSearch.
Here is the reference documentation; see full-text queries, Options table.
Hope that helps.


@developerbear - I don’t think you’d have to do that. Try checking out the documentation here - it shows the various analyzers that we allow. I realize it says <language> - I’ll see if I can’t get a full list of languages we support, but english should be supported without a plugin.

Have you tried creating this mapping on OpenSearch, and did you receive an error?

Nate


Thank you. Yeah, that was my next question: which languages are supported, but you anticipated that already :). Also, thanks for the reply; I'm looking forward to learning more about OpenSearch. Better than some places where there is no response.


Thanks for that.

I am slightly confused now. In the documentation, the analyzer for the language is specified in the query. In my case, I’m specifying it in the mapping before the index is created, so that it can then be used for queries. Are you saying that if I explicitly map something, this is not required, and the English analyzer is native to text search?

If it’s native, hey, that is even better. But when it comes to other languages, would we need to specify this, or which languages are also built in?

Thanks again. So based on what I’m learning you’re saying we don’t have to specify a mapping for this if we want to explicitly define mappings and it should work for any field with type “keyword”?

Elastic for some reason shows their example by specifying it in the mapping then using it in a query.

For more clarity, the example is the last one here:


I’m not sure whether it needs to be specified in the mapping or at query time or both, although I imagine a bit of testing would reveal the answer to that.

This would be an awesome subject for some learning material. Do you write? :rofl:

(seriously - we’re always accepting community blog posts if there’s use cases and/or learning material that’s helpful to everyone)

Nate


Yeah, I was hoping someone knew the answer right off the bat because it is difficult to find. Actually, I was originally searching for all the different types of “fields” that could be used when doing explicit mappings. The functionality looked like something I could use on my own test project, so I was hoping there were some answers. I checked Elastic’s Git and they’ve had that library since around 2017-2018, so maybe it’s already implemented, but I’ll have to do more testing…

In regards to writing - maybe. I'm just really not sure what that would entail, or whether I'd be signing up for more than I bargained for, like anything in life. :smile:

I write you edit? Haha.


off the top of my head:
if you define an analyzer in the mapping it’s used on the document content when ingesting new documents, i.e. the content gets analyzed and the transformed values get stored in the index (so you can query them later). if you define an analyzer in a query it’s used on the query string.

AFAIK this also means that if you use two completely separate analyzers it might be that you don’t get a match even though you enter the same text as in the document. i’d have to test around for a real example, but a made-up example might be that you have an analyzer which transforms “kittens” to “cat” (plural → singular + dictionary) in the indexing phase but you use another analyzer in the query which transforms “kittens” to “kitten” (only plural → singular). so you’re essentially searching for “kitten” and won’t get a match because you’d have to search for “cat”.
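
a toy sketch of that mismatch in Python. this is not how OpenSearch implements analysis; `index_analyzer`, `query_analyzer` and the one-entry dictionary are made-up stand-ins for the two analyzers in the made-up example above, and the "index" is just a dict from token to document ids.

```python
# Toy model of index-time vs. query-time analysis (not OpenSearch code).
from collections import defaultdict

# Hypothetical dictionary used only by the index-time analyzer.
DICTIONARY = {"kitten": "cat"}


def singular(token):
    """Very naive plural -> singular: strip one trailing 's'."""
    return token[:-1] if token.endswith("s") else token


def index_analyzer(text):
    """Lowercase, singularize, then apply the dictionary (kittens -> cat)."""
    tokens = (singular(t) for t in text.lower().split())
    return [DICTIONARY.get(t, t) for t in tokens]


def query_analyzer(text):
    """Lowercase and singularize only (kittens -> kitten)."""
    return [singular(t) for t in text.lower().split()]


def build_index(docs, analyzer):
    """Store analyzed tokens: each token maps to the ids of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyzer(text):
            index[token].add(doc_id)
    return index


def search(index, query, analyzer):
    """Analyze the query string, then look up each resulting token."""
    hits = set()
    for token in analyzer(query):
        hits |= index.get(token, set())
    return hits


docs = {1: "Two kittens"}
index = build_index(docs, index_analyzer)

print(search(index, "kittens", query_analyzer))  # different analyzers: no match
print(search(index, "kittens", index_analyzer))  # same analyzer: finds doc 1
```

the index only ever contains "cat", so the query has to be reduced to "cat" as well before anything can match.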

here’s a random article from one of the first search results: Introduction to Analysis and analyzers in Elasticsearch | by Arun Mohan | elasticsearch | Medium
note that most documentation about elasticsearch still applies to OpenSearch as the latter is based on Elasticsearch 7.10.2 and not much has diverged (yet).

and yes, the OpenSearch docs could go into a lot more details here :slight_smile: (though i’m no expert on it, so i’m the wrong one to contribute this).


I am just getting back to this topic. I found out that an index using the English analyzer does work when you explicitly map it.

Searching works similarly to Elastic’s example :slight_smile: .


{
  "indexName": "listings1a",
  "searchBody": {
    "query": {
      "multi_match": {
        "query": "foxes",
        "fields": [
          "Title.raw",
          "Title.english"
        ]
      }
    }
  }
}

Where Title was the example’s text field.

This is good; at least it works, so I cannot complain. Results seem very fast once you use the analyzer to find the base form of words.

I know documentation and writing is still in the works, but I think if we had more examples of fields in mappings it might help others realize the potential of what can be done because many people like to organize the data before starting a database.

Thus they will know the data types and other info about each property before creating things.

This is helpful.

I ran a test, though, using two different indexes. For some reason, when I specify the analyzer in the query (without explicitly specifying the analyzer in the mapping), I get different results from what I expect. I expected the analyzer to do the same thing as when I explicitly defined it in the mapping, but instead it returns only 1 result, which is the base form.

In the example’s case, it was only coming back with Fox rather than both Foxes and Fox. I am not exactly sure why this is, but I think explicitly mapping it might have an advantage here if you need this specific functionality.

Maybe someone else can repeat the test and re-verify, but I’ve tried in various ways to include the analyzer in the query, and it only comes back with the base phrase (which means the analyzer works) instead of both the base phrase and the full phrase.
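
One possible explanation, sketched as a toy model in Python and not verified against OpenSearch itself: if the field was indexed with the default standard analyzer, which does not stem, the index holds "fox" and "foxes" as separate terms. Applying the English analyzer only at query time reduces the query "foxes" to "fox", which matches the first term but not the second. `standard_like` and `english_like` below are simplified, hypothetical stand-ins for the real analyzers.

```python
# Toy model: why a query-time-only stemming analyzer can return just "Fox".

def standard_like(text):
    """Stand-in for an analyzer without stemming: lowercase and split."""
    return text.lower().split()


def english_like(text):
    """Stand-in for a stemming analyzer: additionally reduce 'foxes' to 'fox'."""
    return ["fox" if t == "foxes" else t for t in standard_like(text)]


def search(titles, query, index_analyzer, query_analyzer):
    """Return titles that share at least one analyzed token with the query."""
    query_tokens = set(query_analyzer(query))
    return [t for t in titles if query_tokens & set(index_analyzer(t))]


titles = ["Fox", "Foxes"]

# Analyzer only in the query; the field was indexed without stemming.
print(search(titles, "foxes", standard_like, english_like))  # ['Fox']

# Analyzer in the mapping, so it is applied to documents and to the query.
print(search(titles, "foxes", english_like, english_like))   # ['Fox', 'Foxes']
```

If this is what is happening, it would explain why the explicitly mapped Title.english sub-field returns both titles while the query-time analyzer alone returns only the base form.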

Thanks for all of this discourse! We should file a GitHub issue under the documentation-website repo pointing to this thread and asking for more clarifying documentation.

Great stuff!

Nate

Hi @developerbear,
I’ve tested your query (mapping plus a specified language analyzer) in Dashboards and it works with each language we support. You can explicitly specify a language with the "analyzer": "<language>" syntax. We support these language values for the analyzer (docs to be updated shortly!):
arabic
armenian
basque
bengali
brazilian
bulgarian
catalan
czech
danish
dutch
english
estonian
finnish
french
galician
german
greek
hindi
hungarian
indonesian
irish
italian
latvian
lithuanian
norwegian
persian
portuguese
romanian
russian
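
As a small sketch of how the mapping from the first post generalizes to the languages listed above, the hypothetical Python helper `language_multifield` below builds the request body for a given field and language. The function name and the `LANGUAGES` set are illustrative assumptions, and the set copies only a few names from the list in this post.

```python
import json

# A few analyzer names copied from the list in this post (not exhaustive).
LANGUAGES = {"english", "french", "german", "italian", "russian"}


def language_multifield(field, language):
    """Build a mapping body: a text field plus a sub-field using a language analyzer."""
    if language not in LANGUAGES:
        raise ValueError(f"unknown language analyzer: {language}")
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "fields": {
                        language: {"type": "text", "analyzer": language}
                    },
                }
            }
        }
    }


# Body to send when creating the index (PUT <index-name>).
print(json.dumps(language_multifield("text", "english"), indent=2))
```

Called with "text" and "english", this produces the same structure as the mapping at the top of the thread; queries can then target the sub-field as text.english.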
