I am running this in Docker without `-d`, and the logs haven't been much help. I have actually worked out what the problem was: even though I thought the setup script was deploying the model, it was never actually deployed. Deploying the model manually afterwards seems to have made it work.
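For anyone hitting the same thing, a sketch of what "deploying manually" looks like via the ML Commons REST API (the model ID is a placeholder from your own register response):

```
# Deploy a registered model
POST /_plugins/_ml/models/<your_model_id>/_deploy

# Check that model_state is DEPLOYED before using it
GET /_plugins/_ml/models/<your_model_id>
```

If the state shows `REGISTERED` but not `DEPLOYED`, queries that reference the model will fail even though registration succeeded, which is what had happened in my case.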
I do still get this error occasionally. I have an index with neural search and a search pipeline that uses a Hugging Face model for the embeddings and OpenAI for the chat completion. Generally it works quite well, but I still sometimes get the error I first mentioned.
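For reference, my setup is roughly like this, a search pipeline with a RAG response processor (names and IDs here are placeholders, and the exact fields may differ depending on your OpenSearch version):

```
PUT /_search/pipeline/my_rag_pipeline
{
  "response_processors": [
    {
      "retrieval_augmented_generation": {
        "model_id": "<openai_connector_model_id>",
        "context_field_list": ["text"],
        "system_prompt": "You are a helpful assistant"
      }
    }
  ]
}
```

The embedding model is separate and is referenced in the `neural` query itself rather than in the pipeline.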
I think another reason you may get this error is that the context contains too many tokens. I have reduced the context size from 5 to 2 and I get the error a lot less.
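The context size is the one passed per query in the `generative_qa_parameters` extension; something like this is what I changed (index name, field, and model IDs are placeholders):

```
GET /my-index/_search?search_pipeline=my_rag_pipeline
{
  "query": {
    "neural": {
      "my_embedding_field": {
        "query_text": "my question",
        "model_id": "<embedding_model_id>",
        "k": 5
      }
    }
  },
  "ext": {
    "generative_qa_parameters": {
      "llm_question": "my question",
      "context_size": 2
    }
  }
}
```

With `context_size: 2` only the top two retrieved documents get stuffed into the LLM prompt, so you stay under the token limit much more often than with 5.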