Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
2.3
Describe the issue:
I am trying to write to AOSS 2.3 from PySpark / Scala using the OpenSearch Spark connector (the jar is listed in the steps to reproduce below).
However, I keep receiving the following exception:
An error was encountered: An error occurred while calling o278.save.
: org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: org.opensearch.spark.sql. Please find packages at `https://spark.apache.org/third-party-projects.html`.
    at org.apache.spark.sql.errors.QueryExecutionErrors$.dataSourceNotFoundError(QueryExecutionErrors.scala:725)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:647)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:697)
    at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:863)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:257)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:240)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.ClassNotFoundException: org.opensearch.spark.sql.DefaultSource
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
    at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:633)
    at scala.util.Try$.apply(Try.scala:213)
    at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:633)
    at scala.util.Failure.orElse(Try.scala:224)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:633)
    ... 15 more
Configuration:
Steps to Reproduce:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json

spark = (
    SparkSession.builder.appName("OpenSearchExample")
    .config("spark.jars", "home/opensearch-spark-2.13-1.2.0-SNAPSHOT.jar")
    .getOrCreate()
)
df = spark.createDataFrame([(1, "value1"), (2, "value2")], ["id", "value"])
df.show()
(
    df.write
    .format("org.opensearch.spark.sql")
    .option("inferSchema", "true")
    .option("opensearch.nodes", "https://xxxxxxx.us-east-1.aoss.amazon.com")
    .option("opensearch.port", "9200")
    .option("opensearch.net.http.auth.user", "admin")
    .option("opensearch.net.http.auth.pass", "admin")
    .option("opensearch.net.ssl", "true")
    .option("opensearch.net.ssl.cert.allow.self.signed", "true")
    .option("opensearch.batch.write.retry.count", "9")
    .option("opensearch.http.retries", "9")
    .option("opensearch.http.timeout", "18000")
    .mode("append")
    .save("test-index")
)
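A quick local check that may help narrow this down: the "Caused by" line says the class org.opensearch.spark.sql.DefaultSource could not be loaded, so it is worth confirming that the jar passed to spark.jars exists at the path given and actually contains that class. The sketch below uses only the Python standard library. The helper name `jar_provides_class` is hypothetical (not part of Spark or the connector), and the jar path is copied unchanged from the snippet above. It only inspects the local file; it does not prove that Spark placed the jar on the driver or executor classpath.

```python
import os
import zipfile

# Class file corresponding to the class named in the "Caused by" line of the trace.
DATA_SOURCE_CLASS = "org/opensearch/spark/sql/DefaultSource.class"


def jar_provides_class(jar_path, class_entry=DATA_SOURCE_CLASS):
    """Return True if jar_path is a readable jar (zip) that contains class_entry.

    Hypothetical helper for illustration only.
    """
    # A path without a leading slash is relative to the current working directory.
    if not os.path.isfile(jar_path):
        return False
    if not zipfile.is_zipfile(jar_path):
        return False
    with zipfile.ZipFile(jar_path) as jar:
        return class_entry in jar.namelist()


if __name__ == "__main__":
    # Path copied as-is from the spark.jars setting in the reproduction steps.
    jar_path = "home/opensearch-spark-2.13-1.2.0-SNAPSHOT.jar"
    print("resolved path:", os.path.abspath(jar_path))
    print("contains DefaultSource:", jar_provides_class(jar_path))
```

If this prints False, the file is either missing at the resolved path or does not contain the data source class, which would be consistent with the DATA_SOURCE_NOT_FOUND error. If it prints True, the jar itself looks fine and the next thing to look at would be how the jar is handed to the Spark session.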
Relevant Logs or Screenshots: