## Summary
This document proposes extending the OpenSearch Migration Assistant …to support cloud-agnostic deployments, enabling customers running OpenSearch on GCP, Azure, or bare-metal Kubernetes to use the same migration tooling currently available to AWS users. The changes build on existing abstractions in the codebase rather than requiring architectural changes.
## Motivation
The Migration Assistant is among the most mature open-source tools for migrating Elasticsearch clusters to OpenSearch. However, its deployment and storage layers are currently tied to AWS services, which limits adoption among the portion of the OpenSearch community running on other cloud providers or on-premises infrastructure.
Aiven is an AWS partner that operates managed data infrastructure across AWS, GCP, Azure, and smaller regional cloud providers. Aiven is cloud-agnostic: our customers choose the provider that best fits their needs, and we deploy workloads wherever they need them. Our OpenSearch customers migrating from Elasticsearch need a reliable migration path regardless of where their target cluster runs. Today, customers running on non-AWS providers cannot use the Migration Assistant. This proposal does not seek to diminish the AWS deployment path; it seeks to extend the same quality of migration tooling to the broader OpenSearch community.
Making the Migration Assistant cloud-agnostic would:
- Expand the tool's addressable user base significantly
- Align with OpenSearch's identity as a vendor-neutral, community-driven project
- Enable managed service providers to offer integrated migration tooling
- Reduce the barrier to OpenSearch adoption for non-AWS users
## Current State
The codebase already has meaningful abstraction points that make this work tractable:
### Existing Cloud-Agnostic Layers
- **Core pipeline** (`coreUtilities`, `transformation`, `RfsCommon`, `RfsPipeline`, `MetadataMigration`) has zero AWS dependencies
- [`SourceRepo`][SourceRepo] interface in `RfsCommon` abstracts snapshot storage with [`FileSystemRepo`][FileSystemRepo] and [`S3Repo`][S3Repo] implementations
- [`SnapshotCreator`][SnapshotCreator] abstract class supports filesystem and [`S3SnapshotCreator`][S3SnapshotCreator] variants
- [`RequestTransformer`][RequestTransformer] / [`IAuthTransformerFactory`][IAuthTransformerFactory] interfaces make authentication pluggable (SigV4, BasicAuth, NoAuth)
- [`BlobSource`][BlobSource] functional interface in `RfsCommon` further abstracts blob reading from any storage backend
- **Python console** uses a strategy/factory pattern ([`factories.py`][factories]) with ECS, Kubernetes, and Docker backends for orchestration ([`backfill_base.py`][backfill_base], [`replayer_base.py`][replayer_base])
- **Prometheus metrics** source already exists alongside CloudWatch ([`metrics_source.py`][metrics_source])
- **Kubernetes Secrets** support exists alongside AWS Secrets Manager ([`cluster.py`][cluster])
- **Helm charts** in [`deployment/k8s/`][helm] include `valuesForLocalK8s.yaml` and AWS-specific toggles
### Remaining AWS-Coupled Areas
1. [`S3Repo`][S3Repo] in `SnapshotReader` — hardcodes `S3AsyncClient` construction (but implements [`SourceRepo`][SourceRepo] interface)
2. [`S3TupleSink`][S3TupleSink] in `TrafficCapture/tupleSink` — no abstraction layer for tuple output destination
3. **`TrafficReplayer`** main class — inline `S3AsyncClient` creation for tuple output
4. **`captureKafkaOffloader`** — MSK IAM auth with no alternative
5. **Helm charts** — some AWS-specific defaults and assumptions remain in the non-toggled paths
## Proposal
### Phase 1: Cloud-Agnostic Object Storage
**Goal:** Enable snapshot reading and tuple storage on GCS and Azure Blob Storage.
#### 1a. Snapshot Storage Backends
Implement [`SourceRepo`][SourceRepo] for GCS and Azure Blob Storage in `SnapshotReader`:
- `GcsRepo implements SourceRepo` — using the Google Cloud Storage client library
- `AzureBlobRepo implements SourceRepo` — using the Azure Storage Blob SDK
Implement corresponding [`SnapshotCreator`][SnapshotCreator] subclasses:
- `GcsSnapshotCreator extends SnapshotCreator`
- `AzureBlobSnapshotCreator extends SnapshotCreator`
The existing [`SourceRepo`][SourceRepo] and [`SnapshotCreator`][SnapshotCreator] abstractions mean no changes to `RfsCommon`, `RfsPipeline`, or `DocumentsFromSnapshotMigration` are required.
#### 1b. Tuple Sink Abstraction
Introduce a `TupleSink` interface (or extend the existing sink pattern) with the following implementations:
- [`S3TupleSink`][S3TupleSink] (existing, refactored behind interface)
- `GcsTupleSink`
- `AzureBlobTupleSink`
- `FileSystemTupleSink` (useful for local testing)
Update `TrafficReplayer` to accept the sink via dependency injection rather than constructing `S3AsyncClient` inline.
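
The shape of this change can be sketched as follows. This is a minimal illustration in Python for brevity, not the proposed Java code: the `TupleSink` method names (`write`, `close`), the JSON-lines file layout, and the `Replayer` stand-in for `TrafficReplayer` are all assumptions made for the example.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Iterable


class TupleSink(ABC):
    """Hypothetical destination for replayed request/response tuples."""

    @abstractmethod
    def write(self, tuple_record: Dict) -> None:
        """Persist one tuple."""

    @abstractmethod
    def close(self) -> None:
        """Flush and release any underlying resources."""


class FileSystemTupleSink(TupleSink):
    """Writes one JSON document per line to a local file (assumed layout)."""

    def __init__(self, directory: Path, file_name: str = "tuples.jsonl"):
        directory.mkdir(parents=True, exist_ok=True)
        self.path = directory / file_name
        self._handle = self.path.open("a", encoding="utf-8")

    def write(self, tuple_record: Dict) -> None:
        self._handle.write(json.dumps(tuple_record, sort_keys=True) + "\n")

    def close(self) -> None:
        self._handle.close()


class Replayer:
    """Stand-in for TrafficReplayer: it receives a sink instead of
    constructing a storage client itself."""

    def __init__(self, sink: TupleSink):
        self._sink = sink

    def replay(self, tuples: Iterable[Dict]) -> int:
        written = 0
        try:
            for record in tuples:
                self._sink.write(record)
                written += 1
        finally:
            # The sink is always closed, even if a write fails.
            self._sink.close()
        return written
```

With this shape, swapping S3 for GCS, Azure Blob, or a local directory becomes a wiring decision at startup, and the replayer can be tested against the filesystem sink without cloud credentials.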
#### 1c. Python Console Snapshot Support
Add `GcsSnapshot` and `AzureBlobSnapshot` implementations alongside the existing `S3Snapshot` and `FileSystemSnapshot` in the console's [factory dispatch][factories] ([`snapshot.py`][snapshot]).
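
One possible way to extend the dispatch is sketched below. It is self-contained and does not import `console_link`; the `SnapshotTarget` class, the `gcs` and `azure_blob` config keys, and their field names (`bucket`, `base_path`, `account`, `container`) are hypothetical, and the `s3` and `fs` field names are illustrative rather than a statement of the console's current schema.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class SnapshotTarget:
    """Minimal stand-in for a console snapshot model (not the real class)."""
    snapshot_name: str
    backend: str
    repo_location: str


def _build_s3(name: str, cfg: Dict[str, Any]) -> SnapshotTarget:
    return SnapshotTarget(name, "s3", cfg["repo_uri"])


def _build_fs(name: str, cfg: Dict[str, Any]) -> SnapshotTarget:
    return SnapshotTarget(name, "fs", cfg["repo_path"])


def _build_gcs(name: str, cfg: Dict[str, Any]) -> SnapshotTarget:
    # Hypothetical keys: "bucket" (required) and "base_path" (optional).
    location = f"gs://{cfg['bucket']}/{cfg.get('base_path', '')}".rstrip("/")
    return SnapshotTarget(name, "gcs", location)


def _build_azure_blob(name: str, cfg: Dict[str, Any]) -> SnapshotTarget:
    # Hypothetical keys: "account" and "container".
    return SnapshotTarget(name, "azure_blob", f"{cfg['account']}/{cfg['container']}")


_BUILDERS: Dict[str, Callable[[str, Dict[str, Any]], SnapshotTarget]] = {
    "s3": _build_s3,
    "fs": _build_fs,
    "gcs": _build_gcs,
    "azure_blob": _build_azure_blob,
}


def get_snapshot(config: Dict[str, Any]) -> SnapshotTarget:
    """Pick the backend from whichever single backend key is present."""
    selected = [key for key in _BUILDERS if key in config]
    if len(selected) != 1:
        raise ValueError(
            f"Expected exactly one of {sorted(_BUILDERS)}, found {selected}"
        )
    key = selected[0]
    return _BUILDERS[key](config["snapshot_name"], config[key])
```

For example, `get_snapshot({"snapshot_name": "snap1", "gcs": {"bucket": "my-bucket"}})` would select the GCS builder, while a config with no backend key, or with two, is rejected up front.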
### Phase 2: Cloud-Agnostic Kafka Authentication
**Goal:** Remove the hard dependency on MSK IAM auth for Kafka connectivity.
- Make `captureKafkaOffloader` support SASL/SCRAM and mTLS as first-class authentication methods (these are standard Kafka auth mechanisms, widely supported by self-managed and managed Kafka offerings, though availability varies by provider)
- Ensure the Helm charts can configure Kafka brokers without MSK-specific properties
- The console's `StandardKafka` and `ScramKafka` classes already handle this on the Python side ([`kafka.py`][kafka])
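
To make the difference between the auth modes concrete, the sketch below maps a small auth description to standard Kafka client properties. The property names are the stock Kafka client and MSK IAM library settings; the function name, the shape of the `auth` dictionary, and its keys are hypothetical and do not reflect how `captureKafkaOffloader` is configured today.

```python
from typing import Dict


def _jaas_quote(value: str) -> str:
    """Escape backslashes and double quotes for a quoted JAAS option value."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def kafka_client_properties(auth: Dict[str, str]) -> Dict[str, str]:
    """Translate a hypothetical auth description into Kafka client properties."""
    kind = auth["type"]
    if kind == "scram":
        username = _jaas_quote(auth["username"])
        password = _jaas_quote(auth["password"])
        return {
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": auth.get("mechanism", "SCRAM-SHA-512"),
            "sasl.jaas.config": (
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                f'username="{username}" password="{password}";'
            ),
        }
    if kind == "mtls":
        return {
            "security.protocol": "SSL",
            "ssl.keystore.location": auth["keystore_location"],
            "ssl.keystore.password": auth["keystore_password"],
            "ssl.truststore.location": auth["truststore_location"],
            "ssl.truststore.password": auth["truststore_password"],
        }
    if kind == "msk_iam":
        # Existing AWS path: requires the aws-msk-iam-auth library on the classpath.
        return {
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "AWS_MSK_IAM",
            "sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
            "sasl.client.callback.handler.class":
                "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
        }
    raise ValueError(f"Unsupported Kafka auth type: {kind!r}")
```

Only the `msk_iam` branch depends on an AWS-specific library; the other two branches use settings that ship with the Kafka client itself, which is what makes them suitable provider-neutral defaults.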
### Phase 3: Infrastructure Provisioning and Lifecycle
**Goal:** Provide full infrastructure automation — provisioning and deprovisioning — on GCP, Azure, and other providers, matching the turnkey experience that the AWS CDK path provides today.
The AWS implementation provisions the entire stack (VPC, ECS/Fargate, MSK, S3, ALB, IAM, security groups) and tears it all down when the migration is complete. Non-AWS users must have the same experience: a single command to stand up all migration infrastructure, and a single command to remove it.
The existing repo structure already supports this cleanly. Deployment methods are organized as siblings:
```
deployment/
├── cdk/ # AWS CDK (TypeScript)
├── k8s/ # Helm charts (Kubernetes)
├── migration-assistant-solution/ # AWS one-click CloudFormation
└── terraform/ # NEW — multi-cloud provisioning
├── gcp/
├── azure/
└── modules/ # Shared modules (K8s config, Helm install)
```
Adding `deployment/terraform/` is purely additive — the CDK and CloudFormation paths remain untouched.
#### 3a. Infrastructure Provisioning Modules
Create infrastructure-as-code modules that provision the complete migration infrastructure per cloud provider:
- **GCP:** GKE cluster, VPC, Cloud Storage bucket, firewall rules, IAM service accounts, Kafka (self-managed on GKE or Confluent Cloud)
- **Azure:** AKS cluster, VNet, Blob Storage container, NSGs, managed identities, Kafka (self-managed on AKS or Event Hubs)
- **Generic/bare-metal:** Documentation and scripts for environments without a managed K8s offering
Each module should:
- Accept a minimal configuration (source endpoint, target endpoint, auth credentials, cloud region)
- Provision a K8s cluster with the correct storage classes, networking, and access to source/target clusters
- Install the Helm charts with the appropriate provider-specific values
- Expose a single teardown command that cleanly deprovisions all resources
#### 3b. Refactor Helm Charts for Multi-Cloud
The existing charts already use a base + overlay pattern with AWS-specific templates gated behind `aws.configureAwsEksResources` in a dedicated `templates/resources/aws/` directory. The refactoring formalizes and extends this pattern.
**Add cloud-specific resource directories and values overlays:**
```
templates/resources/
├── aws/ # Already exists, gated by aws.configureAwsEksResources
├── gcp/ # NEW, gated by gcp.configureGkeResources
├── azure/ # NEW, gated by azure.configureAksResources
└── objectStore/ # Refactored from s3/, with cloud-conditional templates
valuesEks.yaml # Already exists
valuesGke.yaml # NEW
valuesAks.yaml # NEW
```
**Specific changes:**
- **Object storage:** Refactor `templates/resources/s3/` (which currently uses `aws s3` CLI directly for bucket creation and deletion) into a cloud-conditional `objectStore/` directory with templates for GCS and Azure Blob alongside the existing S3 templates
- **Storage classes:** Add GCP Persistent Disk (`pd-ssd`) and Azure Managed Disk (`Premium_LRS`) StorageClass definitions in the new cloud-specific directories, following the existing `aws/gp3StorageClass.yaml` pattern
- **Node autoscaling:** Gate existing Karpenter NodePool/NodeClass templates and affinity rules so they only apply on EKS. Add equivalent configurations for GKE Node Auto-Provisioning and AKS Cluster Autoscaler
- **Argo Workflows artifact storage:** Currently configured for S3. Add GCS and Azure Blob as artifact repository options, selectable via the values overlay
- **Observability:** The base OTEL collector config already uses generic Prometheus/OTLP exporters. Ensure the GKE and AKS overlays configure appropriate exporters (e.g., Google Cloud Monitoring, Azure Monitor) rather than inheriting the CloudWatch EMF exporters from the EKS overlay
- **Certificate management:** The default path already uses a cloud-agnostic self-signed CA chain via cert-manager. AWS PCA integration is opt-in. No changes needed for non-AWS providers unless they want to integrate with Google Cloud Certificate Authority Service or Azure Key Vault
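
As a rough illustration of how a provisioning module might select the overlay, the sketch below builds a `helm upgrade --install` invocation from a provider name. The helper name, the default release name and namespace, and the assumption that the overlay files sit at the root of the chart directory are hypothetical; `valuesGke.yaml` and `valuesAks.yaml` are the files proposed above and do not exist yet.

```python
from pathlib import PurePosixPath
from typing import List

# Provider name -> values overlay file. Only valuesEks.yaml exists today.
_VALUES_OVERLAYS = {
    "eks": "valuesEks.yaml",
    "gke": "valuesGke.yaml",  # proposed in this RFC
    "aks": "valuesAks.yaml",  # proposed in this RFC
}


def helm_install_command(
    provider: str,
    chart_dir: str,
    release: str = "migration-assistant",  # hypothetical release name
    namespace: str = "ma",                 # hypothetical namespace
) -> List[str]:
    """Build the argument list for installing the chart with one overlay."""
    if provider not in _VALUES_OVERLAYS:
        raise ValueError(
            f"Unknown provider {provider!r}; expected one of {sorted(_VALUES_OVERLAYS)}"
        )
    chart = PurePosixPath(chart_dir)
    return [
        "helm", "upgrade", "--install", release, str(chart),
        "--namespace", namespace, "--create-namespace",
        "--values", str(chart / _VALUES_OVERLAYS[provider]),
    ]
```

Returning an argument list rather than a shell string keeps the helper easy to test and avoids quoting issues when it is eventually passed to a process runner.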
### Phase 4: End-to-End Validation
**Goal:** Verify the complete migration workflow — including infrastructure provisioning and teardown — on non-AWS providers.
- Establish a CI test matrix covering GKE and AKS deployments
- Test the full lifecycle: provision infrastructure, run migration (metadata + backfill + live capture), deprovision
- Test with Aiven for OpenSearch as the target cluster
- Document provider-specific configuration and troubleshooting
## What This RFC Does NOT Propose
- **Rewriting the CDK deployment** — the AWS CDK path remains as-is for AWS-native users
- **Removing AWS support** — all changes are additive
- **New migration capabilities** — the scope is portability of existing features, not new functionality
## Aiven's Commitment
Aiven is prepared to:
- Contribute engineering resources to implement and test these changes
- Maintain the GCS and Azure storage backends going forward
- Provide CI infrastructure for multi-cloud testing
- Contribute documentation for non-AWS deployment scenarios
## Open Questions
1. **Module organization:** Should GCS/Azure implementations live in `SnapshotReader` alongside `S3Repo`, or in separate Gradle modules (e.g., `SnapshotReaderGcs`, `SnapshotReaderAzure`) to avoid adding cloud SDK dependencies to the default build?
2. **Helm chart structure:** Should provider-specific values files live in the main chart or as separate sub-charts?
3. **CI ownership:** How should multi-cloud CI be structured? Aiven can host the GCP/Azure test infrastructure, but the test definitions should live in the main repo.
4. **Release cadence:** Should multi-cloud support be gated behind a feature flag initially, or ship as generally available from the start?
5. **IaC tooling:** Terraform is the assumed default for infrastructure provisioning in Phase 3, but alternatives like OpenTofu, Pulumi, or Crossplane are also viable. What does the community prefer?
## References
- [OpenSearch Migration Assistant Documentation](https://docs.opensearch.org/3.0/migration-assistant/)
- [opensearch-project/opensearch-migrations](https://github.com/opensearch-project/opensearch-migrations)
- [Existing K8s Helm charts](https://github.com/opensearch-project/opensearch-migrations/tree/main/deployment/k8s)
[SourceRepo]: https://github.com/opensearch-project/opensearch-migrations/blob/cae2480e4f205a64fe09f45974a7ffb114dcdda7/RfsCommon/src/main/java/org/opensearch/migrations/bulkload/common/SourceRepo.java#L6
[BlobSource]: https://github.com/opensearch-project/opensearch-migrations/blob/cae2480e4f205a64fe09f45974a7ffb114dcdda7/RfsCommon/src/main/java/org/opensearch/migrations/bulkload/common/BlobSource.java#L10-L11
[S3Repo]: https://github.com/opensearch-project/opensearch-migrations/blob/cae2480e4f205a64fe09f45974a7ffb114dcdda7/SnapshotReader/src/main/java/org/opensearch/migrations/bulkload/common/S3Repo.java#L23
[FileSystemRepo]: https://github.com/opensearch-project/opensearch-migrations/blob/cae2480e4f205a64fe09f45974a7ffb114dcdda7/SnapshotReader/src/main/java/org/opensearch/migrations/bulkload/common/FileSystemRepo.java#L12
[SnapshotCreator]: https://github.com/opensearch-project/opensearch-migrations/blob/cae2480e4f205a64fe09f45974a7ffb114dcdda7/RFS/src/main/java/org/opensearch/migrations/bulkload/common/SnapshotCreator.java#L15
[S3SnapshotCreator]: https://github.com/opensearch-project/opensearch-migrations/blob/cae2480e4f205a64fe09f45974a7ffb114dcdda7/RFS/src/main/java/org/opensearch/migrations/bulkload/common/S3SnapshotCreator.java#L10
[RequestTransformer]: https://github.com/opensearch-project/opensearch-migrations/blob/cae2480e4f205a64fe09f45974a7ffb114dcdda7/RfsHttp/src/main/java/org/opensearch/migrations/bulkload/common/http/RequestTransformer.java#L9
[IAuthTransformerFactory]: https://github.com/opensearch-project/opensearch-migrations/blob/cae2480e4f205a64fe09f45974a7ffb114dcdda7/TrafficCapture/trafficReplayer/src/main/java/org/opensearch/migrations/transform/IAuthTransformerFactory.java#L7
[S3TupleSink]: https://github.com/opensearch-project/opensearch-migrations/blob/cae2480e4f205a64fe09f45974a7ffb114dcdda7/TrafficCapture/tupleSink/src/main/java/org/opensearch/migrations/replay/sink/S3TupleSink.java#L38
[factories]: https://github.com/opensearch-project/opensearch-migrations/blob/cae2480e4f205a64fe09f45974a7ffb114dcdda7/migrationConsole/lib/console_link/console_link/models/factories.py#L48
[backfill_base]: https://github.com/opensearch-project/opensearch-migrations/blob/cae2480e4f205a64fe09f45974a7ffb114dcdda7/migrationConsole/lib/console_link/console_link/models/backfill_base.py#L28
[replayer_base]: https://github.com/opensearch-project/opensearch-migrations/blob/cae2480e4f205a64fe09f45974a7ffb114dcdda7/migrationConsole/lib/console_link/console_link/models/replayer_base.py#L52
[metrics_source]: https://github.com/opensearch-project/opensearch-migrations/blob/cae2480e4f205a64fe09f45974a7ffb114dcdda7/migrationConsole/lib/console_link/console_link/models/metrics_source.py#L66
[cluster]: https://github.com/opensearch-project/opensearch-migrations/blob/cae2480e4f205a64fe09f45974a7ffb114dcdda7/migrationConsole/lib/console_link/console_link/models/cluster.py#L130
[snapshot]: https://github.com/opensearch-project/opensearch-migrations/blob/cae2480e4f205a64fe09f45974a7ffb114dcdda7/migrationConsole/lib/console_link/console_link/models/snapshot.py#L74
[kafka]: https://github.com/opensearch-project/opensearch-migrations/blob/cae2480e4f205a64fe09f45974a7ffb114dcdda7/migrationConsole/lib/console_link/console_link/models/kafka.py#L158
[helm]: https://github.com/opensearch-project/opensearch-migrations/tree/cae2480e4f205a64fe09f45974a7ffb114dcdda7/deployment/k8s