SpecMesh OS - Governance with AWS MSK, August 2023
– Neil Avery (Ex-Confluent)
Working with MSK
A walkthrough showing how SpecMesh hides topic provisioning and messy Kafka ACL configuration to create an effortless experience
Confluent Cloud and AWS MSK are the two main players in the world of cloud-based Kafka. While MSK is introducing features like tiered storage and connectors, and making it more AWS-native (Lambdas), the IAM-based governance supported by MSK is often considered clunky and cumbersome. Confluent's RBAC also has its limitations. However, I'm going to show you how SpecMesh changes all of this. It is designed to work on any Apache Kafka runtime that supports the SimpleAclAuthorizer.
Technical TL;DR
Run a cluster; associate user/secret pairs (principals) in the Secrets Manager with the cluster; use these user/secret pairs in the client connections. SpecMesh will create the ACLs to ensure access control works and you have self-service governance!
A really short Kafka security primer
The importance of governance:
- Control access to your company data
- Rich control mechanisms for Read/Write/Configure access of topic data
- Self-governance mechanisms are needed - scale the organisation, don’t create central team congestion
- Should be simple; most ACL/Governance is anything but simple
Kafka Security is multi-faceted:
- Authentication – who? - aka the principal or user
- Authorization – what? (which resource, and how are 'they' trying to access it)
- Wire encryption
- Encryption at rest
What's the problem?
Kafka governance is painful and complicated, and good tools don't really exist. Confluent wraps a set of predefined roles into what they call RBAC - but it's nothing as comprehensive as AWS IAM. Unfortunately IAM, while incredibly powerful and technically superior, is also incredibly finicky and challenging to set up and get right.
MSK IAM
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kafka-cluster:Connect",
        "kafka-cluster:AlterCluster",
        "kafka-cluster:DescribeCluster"
      ],
      "Resource": [
        "arn:aws:kafka:region:Account-ID:cluster/MSKTutorialCluster/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "kafka-cluster:*Topic*",
        "kafka-cluster:WriteData",
        "kafka-cluster:ReadData"
      ],
      "Resource": [
        "arn:aws:kafka:region:Account-ID:topic/MSKTutorialCluster/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "kafka-cluster:AlterGroup",
        "kafka-cluster:DescribeGroup"
      ],
      "Resource": [
        "arn:aws:kafka:region:Account-ID:group/MSKTutorialCluster/*"
      ]
    }
  ]
}
Confluent RBAC API: https://docs.confluent.io/platform/current/security/rbac/rbac-config-using-rest-api.html
confluent iam rbac role-binding create \
--principal User:<my-user-name> \
--role SystemAdmin \
--kafka-cluster-id <kafka-cluster-id>
Wouldn't it be nice if governance had something as simple as a shared file system to configure - say:
$ chmod ugo+rwx /usr/myapp/stuff
SpecMesh achieves this via structured topics that are modelled as part of the AsyncAPI spec for your app. The template goes:
<domain-id>.<_public|_private|_protected>.<topic-name>
A spec looks like this:
id: 'urn:acme.lifestyle.onboarding'
channels:
  _public.user_signed_up:
    publish:
      message:
        payload:
          $ref: "/schema/simple.schema_demo._public.user_signed_up.avsc"
Opinions matter
Due to Kafka's lack of opinionated topic structures, orgs will choose something that works for them, or often nothing at all - flat topic structures are the norm. This also means there is unlikely to be an obvious relationship between ACLs and topics - that relationship might be stored in code or scripts. SpecMesh forces the use of structure - and even without SpecMesh this is something that many orgs already do! Hint: it's common to use public and private keywords in topic names to help with ACL writing and formulation.
MSK security options for Authentication and Authorization
AWS MSK supported security mechanisms include:
- mTLS: https://catalog.workshops.aws/msk-labs/en-US/securityencryption/tlsmauth
- SASL/SCRAM: (authentication) https://catalog.workshops.aws/msk-labs/en-US/securityencryption/saslscram
- IAM: https://catalog.workshops.aws/msk-labs/en-US/securityencryption/iam
SpecMesh works with vendor-agnostic SASL/SCRAM authentication (cross-vendor, multi-language) and ACLs (access control) - but not IAM (authentication and access control).
MSK has a few limitations
- AWS wants you to use IAM - however this only works with Java clients using their jars
- Non-Java clients (Rust, Go) will need to use SASL/SCRAM with MSK - IAM won't work without building your own IAM integration
- IAM isn't that easy to set up and/or automate
- Broker property: super.users is not supported on MSK (this property bypasses all ACL checks - see https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/security/authorizer/AclAuthorizer.scala). A workaround is to use multi-authentication (below)
MSK Multi-Authentication - SASL/SCRAM + IAM
When multi-authentication is enabled on an MSK cluster, authorization depends on which of those access control methods a client uses to access the cluster.
Let’s consider the following example: both IAM and SASL/SCRAM are enabled, and ‘client A’ is accessing the MSK cluster via IAM authentication, while ‘client B’ is accessing the cluster via SASL/SCRAM. In this scenario, you can still invoke Apache Kafka ACL APIs and add ACLs for an MSK cluster that uses IAM access control. However, ACLs stored in Apache ZooKeeper will have no effect on authorization for IAM roles. Therefore, access and authorization for ‘client A,’ which uses IAM auth, will be controlled by the IAM policy alone, as the added ACLs do not affect ‘client A’ in any way, even though they are added.
On the other hand, when a client is using non-IAM authentication, the added ACLs (including the “allow.everyone.if.no.acl.found” setting) will have an effect. In this case, authorization will be controlled by ACLs. So, when ‘client B,’ which uses SASL/SCRAM, attempts to perform any operations, it will be validated against the ACLs that were added.
In short, the following table fills in the gaps:
| Authn & Authz mechanism | Kafka client authn | Kafka client authz | Kafka ACL behaviour | allow.everyone.if.no.acl.found |
|---|---|---|---|---|
| SASL/IAM clients | SASL/IAM | IAM | No effect | No effect |
| SASL/SCRAM clients | SASL/SCRAM | ACLs | Applies/Does have an effect | Applies/Does have an effect |
Follow on using a provided Spec
Ideally, you can follow these steps with your own Spec. Grab some from the SpecMesh ApacheKafka demo repository and create your own repo.
asyncapi: '2.5.0'
id: 'urn:acme.simple_range.life_enhancer'
info:
  title: ACME Life Enhancer
  version: '1.0.0'
  description: |
    ACMEs Life enhancer records and predicts how ones life will change due to many events that are experienced - see http://acme.org/life_range for more info
  license:
    name: Apache 2.0
    url: 'https://www.apache.org/licenses/LICENSE-2.0'
servers:
  test:
    url: test.mykafkacluster.org:8092
    protocol: kafka-secure
    description: Test broker
channels:
  _public.user_signed_up:
    bindings:
      kafka:
        envs:
          - staging
          - prod
        partitions: 3
        replicas: 1
        configs:
          cleanup.policy: delete
          retention.ms: 999000
    publish:
      summary: Inform about signup
      operationId: onSignup
      message:
        bindings:
          kafka:
            schemaIdLocation: "payload"
        schemaFormat: "application/vnd.apache.avro+json;version=1.9.0"
        contentType: "application/octet-stream"
        payload:
          $ref: "/schema/acme.simple_range.life_enhancer._public.user_signed_up.avsc"
Requirements
- Admin access to your AWS console (ability to create an MSK cluster, start EC2 instances, configure IAM roles, configure secrets, etc.)
- Basic understanding of Kafka broker (broker properties), and client eco-system (producer, consumer, client.properties)
SASL/SCRAM on MSK has the following limitations:
- The super.users parameter is not supported
- MSK only supports SCRAM-SHA-512 authentication
- An MSK cluster can have up to 1000 users
- You must use an AWS KMS key with your Secret
- You cannot use a Secret that uses the default Secrets Manager encryption key with Amazon MSK
- You can’t use an asymmetric KMS key with Secrets Manager
- You can associate up to 10 secrets with a cluster at a time using the BatchAssociateScramSecret operation
- The name of secrets associated with an Amazon MSK cluster must have the prefix AmazonMSK_
- Secrets associated with an Amazon MSK cluster must be in the same Amazon Web Services account and AWS region as the cluster
Source: https://docs.aws.amazon.com/msk/latest/developerguide/msk-password.html#msk-password-limitations
Steps
1. Start your MSK Kafka cluster with SASL/SCRAM AND IAM
- Sign in to the AWS Management Console and open the Amazon MSK console
- Choose Create cluster
- Enter a cluster name, and leave all other settings unchanged
- From the table under All cluster settings, copy the values of the following settings and save them because you need them later in this tutorial: VPC, Subnets, Security groups associated with VPC
- Choose Create cluster
Note: Creation will take about 15 minutes.
Later we will change the default configuration for the SimpleAclAuthorizer by setting allow.everyone.if.no.acl.found=false (the default is true).
Source: https://docs.aws.amazon.com/msk/latest/developerguide/msk-configuration.html
2. Make the cluster public and enable SASL + IAM
a. Navigate to the AWS MSK console
b. Choose the MSK cluster you just created in Step 1
c. Click on the Properties tab
d. In the Security settings section, choose Edit
e. Check the checkbox next to SASL/SCRAM authentication and IAM
f. Click Save changes
You can find more details about updating a cluster’s security configurations here.
Create a Symmetric Key
a. Now go to the AWS Key Management Service (AWS KMS) console
b. Click Create Key
c. Choose Symmetric and click Next
d. Give the key an alias and click Next
e. Under Administrative permissions, check the checkbox next to the AWSServiceRoleForKafka and click Next
f. Under Key usage permissions, again check the checkbox next to the AWSServiceRoleForKafka and click Next
g. Review the details and click Finish
You can find more details about creating a symmetric key here.
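If you prefer the CLI, the same key creation looks roughly like this (the alias name is illustrative):
$ aws kms create-key --description "MSK SASL/SCRAM secret encryption key"
$ aws kms create-alias --alias-name alias/msk-scram-key --target-key-id <key-id-from-create-key-output>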
Store a new Secret
a. Go to the AWS Secrets Manager console
b. Click Store a new secret
c. Choose Other type of secret (e.g. API key) for the secret type
d. Under Key/value pairs, click on Plaintext
e. Paste the following in the space below it, replacing the placeholder values with your own username and password
{
  "username": "<your-username>",
  "password": "<your-password>"
}
f. On the next page, give a Secret name that starts with AmazonMSK_
g. Under Encryption Key, select the symmetric key you just created in the previous sub-section from the dropdown
h. Go forward to the next steps and finish creating the secret. Once created, record the ARN (Amazon Resource Name) value for your secret
You can find more details about creating a secret using AWS Secrets Manager here.
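A minimal CLI sketch of the same thing, assuming the KMS alias created above (the secret name is a placeholder - note the mandatory AmazonMSK_ prefix and the customer-managed key):
$ aws secretsmanager create-secret \
    --name AmazonMSK_mySecret \
    --kms-key-id alias/msk-scram-key \
    --secret-string '{"username":"<your-username>","password":"<your-password>"}'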
Associate secrets with MSK cluster
a. Navigate back to the AWS MSK console and click on the cluster you created in Step 1
b. Click on the Properties tab
c. In the Security settings section, under SASL/SCRAM authentication, click on Associate secrets
d. Paste the ARN you recorded in the previous subsection and click Associate secrets
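The same association can be scripted (both ARNs are placeholders):
$ aws kafka batch-associate-scram-secret \
    --cluster-arn <your-cluster-arn> \
    --secret-arn-list <your-secret-arn>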
Create the cluster’s configuration
a. Go to the AWS CloudShell console
b. Create a file (eg. msk-config.txt) with the following line
allow.everyone.if.no.acl.found = false
c. Run the following AWS CLI command, replacing <config-file-path> with the path to the file you just created
aws kafka create-configuration --name "MakePublic" \
--description "Set allow.everyone.if.no.acl.found = false" \
--kafka-versions "2.6.2" \
--server-properties fileb://<config-file-path>/msk-config.txt
You can find more information about making your cluster public here.
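Note that creating the configuration doesn't apply it to the cluster; a sketch of attaching it (ARNs, revision and version are placeholders):
$ aws kafka update-cluster-configuration \
    --cluster-arn <your-cluster-arn> \
    --configuration-info Arn=<configuration-arn>,Revision=1 \
    --current-version <current-cluster-version>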
3. Create a client machine (using IAM)
If you already have a client machine set up that can interact with your cluster, then you can skip this step. If not, you can create an EC2 client machine and then add the security group of the client to the inbound rules of the cluster’s security group from the VPC console. You can find more details about how to do that here.
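For example, a sketch of adding that inbound rule from the CLI (group ids are placeholders; 9096 is the broker port MSK uses for SASL/SCRAM):
$ aws ec2 authorize-security-group-ingress \
    --group-id <cluster-security-group-id> \
    --protocol tcp --port 9096 \
    --source-group <client-security-group-id>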
4. Install Apache Kafka on Client machine
We install Apache Kafka to test and check ACLs, Topic creation etc. You can find more information about how to do that here.
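A minimal sketch, assuming you want the Kafka distribution matching the 2.6.2 cluster version used above:
$ wget https://archive.apache.org/dist/kafka/2.6.2/kafka_2.12-2.6.2.tgz
$ tar -xzf kafka_2.12-2.6.2.tgz
$ cd kafka_2.12-2.6.2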
5. Create the domain/user secret (principal) for use with SASL/SCRAM
SpecMesh specs have an 'id' - this 'id' is used as the principal (user). This way SpecMesh will automatically generate and apply ACLs that govern which 'id's can access which topics.
Create the SASL/SCRAM secrets for each id and associate them with the cluster.
https://catalog.workshops.aws/msk-labs/en-US/securityencryption/saslscram/authorization
With SASL/SCRAM the username/secret is stored within Secrets Manager. The username is also called the principal; it and the secret are provided to clients (via properties files). Each secret is associated with your cluster. When the client connects to MSK, the broker retrieves the credentials from Secrets Manager and applies the associated ACLs.
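For example, a sketch for the spec shown earlier (secret name and password are hypothetical; the assumption is that the username matches the spec's id minus the urn: prefix):
$ aws secretsmanager create-secret \
    --name AmazonMSK_acme.simple_range.life_enhancer \
    --kms-key-id alias/msk-scram-key \
    --secret-string '{"username":"acme.simple_range.life_enhancer","password":"<generated-password>"}'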
6. Create client properties files with security credentials
Configure Kafka (Java) clients with the appropriate SASL/SCRAM credentials as shown on the following pages
https://catalog.workshops.aws/msk-labs/en-US/securityencryption/saslscram/authorization#client-setup
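A minimal client.properties sketch for SASL/SCRAM (the username/password must match the secret associated with the cluster):
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="acme.simple_range.life_enhancer" \
  password="<your-password>";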
7. Provision Specs
Run the SpecMesh CLI provision command against each Spec. This will create topics, publish schemas and configure ACLs according to the Spec. The 'id' of the spec matches the principal.
- Log onto the client machine
- Checkout/Pull the PR
- Execute SpecMesh CLI provision via docker - as shown here
% docker run --rm -v "$(pwd)/resources:/app" ghcr.io/specmesh/specmesh-build-cli provision -bs kafka:9092 -sr http://schema-registry:8081 -spec /app/simple_schema_demo-api.yaml -schemaPath /app
Note: ensure the docker runtime can access the broker and schema registry URLs
Use the SpecMesh CLI provision command - learn more here: https://github.com/specmesh/specmesh-build/blob/main/cli/README.md
8. Verify Topics and ACLs exist
From the client machine (it has super-user permissions via IAM, MSK service role):
- list topics:
$ bin/kafka-topics.sh --list --zookeeper localhost:2181
- list ACLs:
$ bin/kafka-acls.sh --authorizer kafka.security.auth.SimpleAclAuthorizer --authorizer-properties zookeeper.connect=localhost:2181 --list --topic SomeTopic
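If ZooKeeper isn't reachable from your client, the same listing can usually be done via the brokers instead (reusing the client.properties from step 6; host and port are placeholders):
$ bin/kafka-acls.sh --bootstrap-server <broker-host>:9096 --command-config client.properties --list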
Conclusion
Kafka ACLs, RBAC and IAM are challenging to manage in a large-scale deployment. SpecMesh simplifies ACLs using a familiar model of hierarchies. Do you really need RBAC when you have private/public/protected? This model is much simpler.
Resources:
MSK setup: https://materialize.com/docs/ingest-data/amazon-msk/
Acls: https://docs.aiven.io/docs/products/kafka/concepts/acl
Quirks: https://medium.com/dev-genius/amazon-msk-tips-quirks-7b1e56d53296
Debugging: https://docs.confluent.io/platform/current/kafka/authorization.html#debug-using-authorizer-logs
Event streaming maturity: https://www.confluent.io/en-gb/blog/event-streaming-benefits-increase-with-greater-maturity
Bounded contexts: https://www.infoq.com/news/2019/06/bounded-context-eric-evans/