Forming an OpenSearch 2 cluster with transport security enabled (mTLS) and a custom certificate authority

Why?

Although I've supported Elasticsearch/OpenSearch for a number of years in my professional life, I'd never had the opportunity to use the security plugins.

All that changed rather abruptly when we decided to run OpenSearch on Kubernetes using the OpenSearch Kubernetes Operator. Suddenly, the security plugins were required.

Rather than try to figure out the chart and the security plugins at the same time, I decided to deploy the security plugins in a more traditional VM-based OpenSearch deployment first. Was that a good idea? You can skip to "Conclusions" if you're curious ;). Otherwise, let's get into the details.

Environment

3x Debian 12 VMs with 2.5 GB RAM, 1 vCPU, and 40 GB disk running in Proxmox Virtual Environment. I've installed OpenSearch via its Debian packages.

OpenSearch Security Concepts That Took Me Awhile to Understand

AKA, "Things I wish I'd've known before I got started."

1. Editing the config files and restarting OpenSearch is not enough to apply security changes

After updating the security config (a bunch of YAML files found in /etc/opensearch/opensearch-security if you installed via package), you have to manually run the securityadmin.sh shell script as described in the OpenSearch documentation. The script's arguments are not easy to follow, so let's break down the exact arguments I'm using in my lab:

# apply-security-admin.sh
#!/usr/bin/env bash
/usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh \
  --resolve-env-vars \
  --hostname {{ ansible_hostname }} \
  -cacert {{ cert_path }}/admin.ca.crt \
  -cert {{ cert_path }}/admin.crt \
  -key {{ key_path }}/admin.key \
  --clustername {{ cluster_name }} \
  --accept-red-cluster \
  --diagnose \
  -cd /etc/opensearch/opensearch-security/

  • --resolve-env-vars: Resolve environment variables referenced in the security config files.
  • --hostname: The hostname you'll use when attempting a connection. This matters not only from a network standpoint, but also from a mutual TLS standpoint (see below).
  • -cacert, -cert, -key: These should tip you off that we're using mTLS to authenticate (sadly, it took me awhile to realize this). A sketch of minting these certificates with a custom CA follows this list.
  • --clustername: Self-explanatory.
  • --accept-red-cluster: Don't check cluster status before applying security updates. This can be dangerous because the security config actually lives in an index; if your cluster nodes don't agree on the security index state, users could lose access to your cluster. This is not a good setting for production clusters.
  • --diagnose: Debug-level output. Causes securityadmin.sh to produce a log called securityadmin_diag_trace_${TIMESTAMP}.txt.
  • -cd: The directory that contains the OpenSearch security config files.
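
The script assumes a CA plus separate node and admin certificates already exist. If you need to mint them, here's a minimal sketch of one way to do it with openssl. The file names, DNs, and validity periods here are illustrative only, and the node and admin DNs must differ (see concept #2 below):

# A sketch, not a hardened PKI. genpkey emits PKCS#8 keys, which is
# what OpenSearch expects (see troubleshooting item 2 below).

# 1. Custom certificate authority
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:4096 -out ca.key
openssl req -x509 -new -key ca.key -sha256 -days 3650 \
  -subj "/CN=lab-ca" -out admin.ca.crt

# 2. Node certificate; its DN must match plugins.security.nodes_dn.
#    (Production node certs typically also carry subjectAltName entries
#    so transport-layer hostname verification can pass.)
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out node.key
openssl req -new -key node.key \
  -subj "/CN=node@opensearch-a.host.rl" -out node.csr
openssl x509 -req -in node.csr -CA admin.ca.crt -CAkey ca.key \
  -CAcreateserial -sha256 -days 365 -out node.crt

# 3. Admin certificate; its DN must match plugins.security.authcz.admin_dn
#    and must NOT overlap with nodes_dn.
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out admin.key
openssl req -new -key admin.key \
  -subj "/CN=admin@opensearch.host.rl" -out admin.csr
openssl x509 -req -in admin.csr -CA admin.ca.crt -CAkey ca.key \
  -CAcreateserial -sha256 -days 365 -out admin.crt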

2. The securityadmin script authenticates via mutual TLS, not a username and password

How does a script that needs to set up security keep from locking itself out? How can it work securely without an existing security config?

OpenSearch cleverly sidesteps these potential problems by:

  • Authenticating with a client certificate (mutual TLS) instead of credentials stored in the possibly nonexistent security config. (Note: in OpenSearch 2.x, securityadmin.sh connects to the HTTP port, 9200, as the output later in this post shows; earlier versions used the transport port, 9300.)

  • Using auth settings that live directly in the main opensearch.yml instead of within the security config files. Specifically, these two settings (a sketch of both follows this list):

  • plugins.security.authcz.admin_dn: Controls which distinguished names (DNs) are considered admins. The security admin script must present a certificate matching one of these DNs, or it will be rejected by the cluster. These DNs can't overlap with the DNs defined in plugins.security.nodes_dn; OpenSearch will reject the connection as insecure if you try to use the same certs for both.

  • plugins.security.nodes_dn: Controls which distinguished names (DNs) are considered part of the cluster. Only nodes that present certificates matching these DNs are allowed to join.
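
Here's a sketch of how those two settings might look in opensearch.yml, reusing the illustrative DNs from the certificate sketch above; substitute your own:

# /etc/opensearch/opensearch.yml (excerpt; DNs are illustrative)
plugins.security.authcz.admin_dn:
  - "CN=admin@opensearch.host.rl"
plugins.security.nodes_dn:
  - "CN=node@opensearch-a.host.rl"
  - "CN=node@opensearch-b.host.rl"
  - "CN=node@opensearch-c.host.rl"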

Troubleshooting in the "Real World"

You're probably not going to run the security demo in production. Since the docs don't cover much beyond that, there's a period of trial and error. Luckily, the OpenSearch APIs are very good about producing helpful error messages. Here are a few problems I ran into, along with their solutions:

1. Using the node certificate instead of admin

apply-security-admin.sh is the wrapper script I shared above. Here's what happens when you try to run it with a node certificate:

[root@opensearch-α ~]# ./apply-security-admin.sh
WARNING: nor OPENSEARCH_JAVA_HOME nor JAVA_HOME is set, will use /usr/bin/java
Security Admin v7
Will connect to opensearch.service.rl:9200 ... done
Connected as "CN=node@opensearch-a.host.rl"
ERR: "CN=node@opensearch-a.host.rl" is not an admin user
Seems you use a node certificate. This is not permitted, you have to use a client certificate and register it as admin_dn in opensearch.yml

According to the error messages, this used to be allowed, but recent versions of OpenSearch don't let you use the same certs for node and admin operations.

You can verify the distinguished name using openssl or any number of tools. I prefer the GnuTLS utility, certtool:

[root@opensearch-α ~]# cat /etc/ssl/certs/node.crt | certtool -i | grep "Subject:"
        Subject: CN=node@opensearch-a.host.rl
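
If you'd rather stick with openssl, a roughly equivalent check is:

openssl x509 -in /etc/ssl/certs/node.crt -noout -subject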

2. The private key format for OpenSearch must be PKCS#8

As a Java application, OpenSearch allows the use of Java keystores for PKI. However, the developers were nice enough to support the more common PEM certificate format as well. The catch is that PEM private keys must be in PKCS#8 format rather than the traditional SEC1 (EC) or PKCS#1 (RSA) formats that some openssl commands emit.
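
The PEM header tells you which format you have, and openssl can convert between them. A quick sketch with hypothetical file names:

# "BEGIN EC PRIVATE KEY" (SEC1) or "BEGIN RSA PRIVATE KEY" (PKCS#1)
# means the key needs converting; "BEGIN PRIVATE KEY" is already PKCS#8.
head -1 node.key

# Convert to an unencrypted PKCS#8 key
openssl pkcs8 -topk8 -nocrypt -in node.key -out node-pkcs8.key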

What does success look like?

Once everything is working, the cluster will form. You should see log lines similar to:

...{172.19.98.61}{172.19.98.61:9300}{dimr}{shard_indexing_pressure_enabled=true} elect leader, {opensearch-a.host.rl}{lSsEWeGyTY-xjphhPy0Emg}{hAFesxHIQ7WxBAw6qzX7EQ}{172.19.98.60}{172.19.98.60:9300}{dimr}{shard_indexing_pressure_enabled=true} elect leader, _BECOME_CLUSTER_MANAGER_TASK_, _FINISH_ELECTION_], term: 13, version: 200, delta: cluster-manager node changed {previous [], current [{opensearch-a.host.rl}{lSsEWeGyTY-xjphhPy0Emg}{hAFesxHIQ7WxBAw6qzX7EQ}{172.19.98.60}{172.19.98.60:9300}{dimr}{shard_indexing_pressure_enabled=true}]}

At which point, you should be able to access the cluster via its HTTP REST API:

curl -u ${PW} https://opensearch.service.rl:9200/_cat/nodes
172.19.98.60 47 85 1 0.11 0.08 0.03 dimr cluster_manager,data,ingest,remote_cluster_client - opensearch-a.host.rl
172.19.98.61 26 82 2 0.20 0.11 0.03 dimr cluster_manager,data,ingest,remote_cluster_client * opensearch-b.host.rl
172.19.98.62 15 86 4 0.01 0.02 0.00 dimr cluster_manager,data,ingest,remote_cluster_client - opensearch-c.host.rl
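
As a final sanity check, you can confirm that the security config made it into its backing index; by default the plugin stores it in the .opendistro_security index:

curl -u ${PW} "https://opensearch.service.rl:9200/_cat/indices/.opendistro_security?v"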

Conclusions

So, was my decision to avoid Kubernetes and try to figure out OpenSearch security plugins in a more familiar, VM-based environment a good one? Well...yes and no.

The Kubernetes operator runs securityadmin.sh on your behalf. Watching the operator do its thing actually helped me understand the workflow. But I'm still glad I started with VMs, because now I have some idea of what it might look like to enable the security plugins on a production cluster. And I hope it helps you as well!