<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Konstantin Troshin</title>
    <description>The latest articles on Forem by Konstantin Troshin (@konstl000).</description>
    <link>https://forem.com/konstl000</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F901431%2Fe17035de-750e-4a79-81eb-f5937252cc36.png</url>
      <title>Forem: Konstantin Troshin</title>
      <link>https://forem.com/konstl000</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/konstl000"/>
    <language>en</language>
    <item>
      <title>Fortifying federated access to AWS via OIDC</title>
      <dc:creator>Konstantin Troshin</dc:creator>
      <pubDate>Fri, 12 Aug 2022 22:02:00 +0000</pubDate>
      <link>https://forem.com/fmegroup/fortifying-federated-access-to-aws-via-oidc-2995</link>
      <guid>https://forem.com/fmegroup/fortifying-federated-access-to-aws-via-oidc-2995</guid>
      <description>&lt;p&gt;To avoid managing numerous long-term IAM users, AWS provides federated access options that include &lt;a href="https://wiki.oasis-open.org/security/FrontPage"&gt;SAML2.0&lt;/a&gt; and &lt;a href="https://openid.net/connect/"&gt;OIDC&lt;/a&gt; &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers.html"&gt;&lt;strong&gt;identity providers&lt;/strong&gt; (IDP)&lt;/a&gt;. Whereas the SAML option is used by many of our customers and there are &lt;a href="https://www.google.com/search?q=aws+saml+access+keycloak"&gt;numerous examples&lt;/a&gt; of how to set it up, the &lt;a href="https://www.google.com/search?q=aws+oidc+access+keycloak"&gt;examples of OIDC use&lt;/a&gt; are much scarcer. Thus, while selecting our own method of access federation, we decided to try OIDC out to get a better understanding of its limits and advantages and to be able to advise our customers better.&lt;/p&gt;
&lt;h2&gt;
  
  
  Differences between SAML and OIDC identity federation
&lt;/h2&gt;

&lt;p&gt;To demonstrate the key differences between OIDC and SAML, I have created a small &lt;a href="https://github.com/fmeAG/keycloak-aws"&gt;repo&lt;/a&gt; that lets you deploy &lt;a href="https://www.keycloak.org/"&gt;Keycloak&lt;/a&gt; on an EC2 instance and then configure the SAML and OIDC clients to use with AWS. &lt;br&gt;
For those unfamiliar with Keycloak, it is an open source Identity and Access Management tool sponsored by Red Hat and widely used by many of our customers and ourselves as an &lt;strong&gt;identity provider&lt;/strong&gt;. Among other features, Keycloak supports the SAML and OIDC protocols for identity management and provides user federation via LDAP, which lets it work with an existing user base from an Active Directory. After &lt;a href="https://github.com/fmeAG/keycloak-aws#deployment"&gt;deploying&lt;/a&gt; Keycloak and configuring the &lt;a href="https://github.com/fmeAG/keycloak-aws#access-to-aws-with-saml"&gt;SAML&lt;/a&gt; and &lt;a href="https://github.com/fmeAG/keycloak-aws#access-to-aws-via-oidc"&gt;OIDC&lt;/a&gt; clients, we can use Keycloak to log in to AWS. &lt;br&gt;
The SAML login can be performed by going to &lt;code&gt;https://auth.${TF_VAR_root_dn}/realms/awsfed/protocol/saml/clients/amazon-aws&lt;/code&gt; where &lt;code&gt;${TF_VAR_root_dn}&lt;/code&gt; is the subdomain you need to create before the deployment. After entering the credentials for the user &lt;code&gt;testuser&lt;/code&gt; that is created by the deployment scripts, we get redirected to the AWS console for the AWS account to which Keycloak has been deployed. If we had assigned multiple roles to the same Keycloak group (or multiple groups to &lt;code&gt;testuser&lt;/code&gt;), a page like the one below would appear (which will look familiar to everyone who has already used SAML federation with AWS).&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cjYIQC1z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/phgbuaer4bluucn6z0ls.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cjYIQC1z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/phgbuaer4bluucn6z0ls.png" alt="SAML choice window" width="880" height="613"&gt;&lt;/a&gt;&lt;br&gt;
If you like to experiment and have deployed everything from the &lt;a href="https://github.com/fmeAG/keycloak-aws"&gt;repo&lt;/a&gt;, you can go to the network tab of the development tools of the browser, find the &lt;code&gt;saml&lt;/code&gt; document there and copy its contents.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--abOJYkft--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xnuwav514xouap5ag2ot.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--abOJYkft--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xnuwav514xouap5ag2ot.png" alt="SAML assertion" width="880" height="1215"&gt;&lt;/a&gt;&lt;br&gt;
Save the contents as &lt;code&gt;aws-saml/assertion&lt;/code&gt; and run &lt;code&gt;saml.sh&lt;/code&gt; from the same folder. If you are fast enough (by default, the SAML assertion for AWS is valid for only 5 minutes), assuming the role should work for the first role but fail for the second. If you look at the trust policies for the corresponding roles (whose names should end with &lt;code&gt;_Federated_Admin-SAML&lt;/code&gt; and &lt;code&gt;_Federated_Admin-SAML2&lt;/code&gt;, respectively), you will see that they are identical and allow the &lt;em&gt;AssumeRoleWithSAML&lt;/em&gt; operation for the same SAML provider. So, why is access granted for the first role and denied for the second? This is because AWS actually checks the SAML assertion itself for the presence of the role that you try to assume. Looking at the &lt;a href="https://github.com/fmeAG/keycloak-aws/blob/main/aws-saml/kccommands.sh"&gt;script&lt;/a&gt; we ran to configure Keycloak, we can see these two lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kcadm create "clients/$clientId/roles" -r ${REALM_NAME} -s "name=$(terraform output -raw role_arn),$(terraform output -raw provider_arn)" -s 'description=AWS Access'
kcadm add-roles -r ${REALM_NAME} --gname "${GROUP_NAME}" --cclientid 'urn:amazon:webservices'  --rolename "$(terraform output -raw role_arn),$(terraform output -raw provider_arn)"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These lines create an entry for the first role (the one without 2) in Keycloak and map this role to a group &lt;code&gt;aws_access&lt;/code&gt; that is later assigned to our &lt;code&gt;testuser&lt;/code&gt;. Thus, this role shows up in the SAML assertion and can be assumed. Since the same thing does not happen for the second role, the access to it is denied to testuser (of course, this would change if you created the corresponding entry and mapping in Keycloak for this one too).&lt;/p&gt;

&lt;p&gt;But what about OIDC? Running the &lt;code&gt;./oidc.sh&lt;/code&gt; script from the &lt;code&gt;aws-oidc&lt;/code&gt; folder, we can see that our &lt;code&gt;testuser&lt;/code&gt; can assume the role for which our OIDC provider is listed in the trust policy. A closer look at this &lt;a href="https://github.com/fmeAG/keycloak-aws/blob/main/aws-oidc/iam.tf#L5"&gt;policy&lt;/a&gt; shows that it contains only two things: the ARN of the OIDC provider and the &lt;strong&gt;client ID&lt;/strong&gt; as &lt;em&gt;aud&lt;/em&gt;. This corresponds to what the AWS Console does when a role is created there.&lt;/p&gt;
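
&lt;p&gt;As a rough illustration, a rendered trust policy of this kind could look like the sketch below. The account ID, the host name &lt;code&gt;auth.example.com&lt;/code&gt; and the client ID &lt;code&gt;example-client-id&lt;/code&gt; are placeholders assumed for this example and are not taken from the repo; only the realm name &lt;code&gt;awsfed&lt;/code&gt; comes from the setup above. The condition key is the provider URL without &lt;code&gt;https://&lt;/code&gt;, followed by &lt;code&gt;:aud&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/auth.example.com/realms/awsfed"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "auth.example.com/realms/awsfed:aud": "example-client-id"
        }
      }
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Nothing in such a policy refers to a user, group or role assignment, which is why the choice of who may obtain a token for this client matters so much.&lt;/p&gt;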

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--usFKT02M--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ciifa7jrokexmsszdhtr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--usFKT02M--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ciifa7jrokexmsszdhtr.png" alt="AWS Console IAM Role TP OIDC" width="880" height="290"&gt;&lt;/a&gt;&lt;br&gt;
Also note that (as opposed to the SAML case), there was no need to do anything in Keycloak after running terraform scripts in the &lt;code&gt;aws-oidc&lt;/code&gt; folder. What does this mean? Well, in the case of OIDC, AWS does not check for any role or group assignments in the &lt;strong&gt;ID token&lt;/strong&gt;. The only two things that matter with the default settings are the &lt;strong&gt;IDP&lt;/strong&gt; itself (which is defined by the URL and the thumbprint as you can clearly see from the &lt;a href="https://github.com/fmeAG/keycloak-aws/blob/main/aws-oidc/openid.tf"&gt;openid.tf&lt;/a&gt; file) and the &lt;strong&gt;client ID&lt;/strong&gt; (defined in the &lt;code&gt;aud&lt;/code&gt; section of the trust policy).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "exp": 1657326250,
  "iat": 1657322650,
  "auth_time": 0,
  "jti": "a valid id must be here",
  "iss": "https://our.domain/realms/somerealm",
  "aud": "THISISWHATMATTERS",
  "sub": "typically_this_is_the_user_id",
  "typ": "ID",
  "azp": "the_same_as_aud",
  "session_state": "another id is here",
  "at_hash": "some stuff",
  "sid": "and yet another id",
  "email_verified": true,
  "groups": [
    "group1",
    "group2",
    "group3",
    "group4",
    "group5"
  ],
  "preferred_username": "some_user",
  "email": "some_user@our.domain",
  "username": "some_user"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This all means that any user who has access to the corresponding Keycloak realm can assume any role that trusts the &lt;strong&gt;IDP&lt;/strong&gt;, which is not very granular or secure and way inferior to SAML, right? Well, that would be so if not for a very important thing - the way I used OIDC in this example is not how it is supposed to be used. Let's look at the &lt;code&gt;oidc.sh&lt;/code&gt; script more closely.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function getClientSecret(){
  kcadm get -r ${REALM_NAME} "clients/$(getClientId ${1})/client-secret" | jq -r '.value'
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, I use &lt;code&gt;kcadm.sh&lt;/code&gt; (which is containerized and kind of hidden behind source &lt;code&gt;../kcadm.sh&lt;/code&gt;) to get the &lt;strong&gt;client secret&lt;/strong&gt; for the Keycloak OIDC client. This operation requires admin rights and, in a regular context, would be equivalent to a Keycloak administrator handing a &lt;strong&gt;client secret&lt;/strong&gt; to a user. This secret is then used together with the username and password for &lt;code&gt;testuser&lt;/code&gt; to directly get the &lt;strong&gt;ID token&lt;/strong&gt; that is in turn submitted to AWS STS. Of course, as a Keycloak admin I would never do this in a non-test environment because the &lt;strong&gt;client secret&lt;/strong&gt; (which is bound to the &lt;strong&gt;client ID&lt;/strong&gt; that is, in turn, specified in the IAM trust policy) is not meant to be available to the users. But what is it for then? Looking at the &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_create_oidc.html"&gt;AWS documentation on the OIDC topic&lt;/a&gt;, we can see that it mentions an &lt;strong&gt;identity broker&lt;/strong&gt;. And this &lt;strong&gt;identity broker&lt;/strong&gt; (which, unlike in the SAML case, is not provided by AWS) is actually what the &lt;strong&gt;client ID and secret&lt;/strong&gt; are destined for.&lt;br&gt;
So, what is an &lt;strong&gt;identity broker&lt;/strong&gt; anyway? An &lt;strong&gt;identity broker (IB)&lt;/strong&gt; is an application that should function as a link between AWS and Keycloak and take over the management of user rights (it should know which user should be able to assume which role). A proper OIDC login flow should be started by the &lt;strong&gt;IB&lt;/strong&gt;, which redirects the user to the &lt;strong&gt;IDP&lt;/strong&gt; (Keycloak in our case), which, after verifying the user credentials, provides the &lt;strong&gt;ID Token&lt;/strong&gt; for that user to the &lt;strong&gt;IB&lt;/strong&gt;. The &lt;strong&gt;IB&lt;/strong&gt; uses the &lt;strong&gt;client ID and secret&lt;/strong&gt; to authenticate itself against the &lt;strong&gt;IDP&lt;/strong&gt;. As you can also see from the &lt;code&gt;oidc.sh&lt;/code&gt; script, it would be a bad idea to provide the &lt;strong&gt;ID token&lt;/strong&gt; to the user because a combination of the role ARN and the &lt;strong&gt;ID token&lt;/strong&gt; is all you need to assume a role with OIDC.&lt;br&gt;
Instead, the &lt;strong&gt;IB&lt;/strong&gt; should check if the user has access to a requested role and then use the &lt;strong&gt;ID token&lt;/strong&gt; to get the temporary AWS credentials (by using the &lt;em&gt;AssumeRoleWithWebIdentity&lt;/em&gt; operation) and then return these credentials to the user (or use them to get the login URL for the AWS console). In my demo above, I use cURL as an &lt;strong&gt;IB&lt;/strong&gt; which is obviously a very poor choice for a production environment since it grants access to any role to any user.&lt;/p&gt;
&lt;h2&gt;
  
  
  Hardening the OIDC-based roles
&lt;/h2&gt;

&lt;p&gt;Whereas the use of a proper &lt;strong&gt;identity broker&lt;/strong&gt; minimizes the risk of the OIDC access to AWS being misused, the experiments above brought me to the question of whether it is possible to get AWS STS to look at the user attributes from the &lt;strong&gt;ID token&lt;/strong&gt; and not only at the &lt;strong&gt;client ID&lt;/strong&gt; (&lt;em&gt;aud&lt;/em&gt;) and the &lt;strong&gt;IDP&lt;/strong&gt; itself. Looking at &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-idp_oidc.html"&gt;the documentation for GitHub&lt;/a&gt; as an IDP (which also uses OIDC), I saw that there is another attribute - &lt;em&gt;sub&lt;/em&gt; - that is used in trust policies. For Keycloak, the default value of &lt;em&gt;sub&lt;/em&gt; is the user ID, which is not very useful, but Keycloak has mappers that can be assigned to clients and can override the defaults. Experimenting with mappers, I discovered that it is indeed possible to get Keycloak to provide any LDAP user attribute (we use LDAP user federation in our environment) as &lt;em&gt;sub&lt;/em&gt; to AWS. The only caveat here is that this attribute needs to be a string, so it is not directly possible to use group memberships (which would be arrays) to additionally secure the trust policies. It is, however, possible to use the &lt;em&gt;StringLike&lt;/em&gt; operator to match substrings. Using this operator, AWS STS can check for LDAP groups as long as those are stringified. For instance, the following trust policy checks for a certain group (provided by terraform as &lt;code&gt;${var.group}&lt;/code&gt;) in a group string looking like this:&lt;br&gt;
&lt;code&gt;-group1-group2-group3-...&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": aws_iam_openid_connect_provider.oidc.arn
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${var.oidc_provider}:aud": var.client_id
        },
        "StringLike":{
          "${var.oidc_provider}:sub": ["*-${var.group}-*"]
        }
      }
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, where could this group string come from? One option would be to write a custom plugin for Keycloak and another would be to let the &lt;strong&gt;IB&lt;/strong&gt; (which is a custom app) handle this. My &lt;a href="https://github.com/fmeAG/keycloak-aws"&gt;repo&lt;/a&gt; actually contains such a custom mapper (the next section discusses it in a bit more detail) that should be active in your Keycloak if you deployed it as described above. To see the mapper in action, we can run the &lt;code&gt;./oidc_protected.sh&lt;/code&gt; script from the &lt;code&gt;aws-oidc&lt;/code&gt; folder. As you will see, you will be able to assume the first role but not the second one.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hzVg3WCV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yr9d2fnwzqw08xvx3z89.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hzVg3WCV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yr9d2fnwzqw08xvx3z89.png" alt="Protected Role test" width="880" height="167"&gt;&lt;/a&gt;&lt;br&gt;
Why? Let's take a look at the &lt;a href="https://github.com/fmeAG/keycloak-aws/blob/main/aws-oidc/openid.tf"&gt;trust policies&lt;/a&gt;: the one for the first role contains the &lt;code&gt;aws_access&lt;/code&gt; group, which, as we know, is assigned to our &lt;code&gt;testuser&lt;/code&gt;, whereas the one for the second role refers to the &lt;code&gt;aws_access_exclusive&lt;/code&gt; group, which does not even exist in Keycloak yet. So, even though our user had a valid &lt;strong&gt;ID Token&lt;/strong&gt;, it was not possible to assume the second, protected role because this token did not contain the correct group. If you want to verify that the access will be granted once you create the corresponding group and assign it to the &lt;code&gt;testuser&lt;/code&gt;, and also to look at the new Keycloak UI (which is in preview for Keycloak 18.0.2), you can do so at &lt;code&gt;https://auth.${TF_VAR_root_dn}&lt;/code&gt;.&lt;br&gt;
In this case, you would need to use the admin credentials (admin and &lt;code&gt;${TF_VAR_keycloak_password}&lt;/code&gt; defined in &lt;code&gt;export.sh&lt;/code&gt;). Once the group is created and assigned, the access works as expected. Sweet!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nIzriMTI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4whckmy5wfop44g6zwwf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nIzriMTI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4whckmy5wfop44g6zwwf.png" alt="Keycloak Admin UI" width="880" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--M5mVsAJD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/21apods1ohzvjx8b3tku.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--M5mVsAJD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/21apods1ohzvjx8b3tku.png" alt="Assuming Role with OIDC" width="880" height="459"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Developing custom mappers for Keycloak
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.keycloak.org/docs/latest/server_development/#_script_providers"&gt;Keycloak documentation&lt;/a&gt; mentions "JavaScript" providers which can be used to create custom mappers. As I read JavaScript, I was expecting to write something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function stringifyGroups(groups){ 
    return groups.reduce((current, element)=&amp;gt;{ 
        return current+"-"+element; 
    }, "")+"-"; 
} 
token.setOtherClaims("sub",stringifyGroups(token.groups));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and then place this into a .jar file as described in the documentation.&lt;br&gt;
It turned out that it does not work like that at all. Firstly, the custom scripts are disabled by default, as I found out by looking at the Keycloak logs. To fix this, one needs either to activate the preview features or to enable only the &lt;em&gt;scripts&lt;/em&gt; feature as described &lt;a href="https://github.com/keycloak/keycloak-documentation/blob/main/server_installation/topics/profiles.adoc"&gt;here&lt;/a&gt;.&lt;br&gt;
Secondly, "JavaScript" turned out to be very Java-based: the script needs to call methods of the corresponding Keycloak Java classes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;var res="";
var forEach = Array.prototype.forEach;
forEach.call(user.getGroupsStream().toArray(), function (group) {
  res=res+"-"+group.getName();
});
res=res+"-";
exports=res;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
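
&lt;p&gt;For a script like this to be picked up from a .jar file, the Keycloak documentation linked above describes a descriptor file, &lt;code&gt;META-INF/keycloak-scripts.json&lt;/code&gt;, that lists the scripts contained in the archive. A minimal sketch for a single mapper could look as follows; the mapper name, the description and the file name &lt;code&gt;groups-to-sub.js&lt;/code&gt; are assumptions made up for this illustration and may differ from what the repo uses.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "mappers": [
    {
      "name": "Groups to sub",
      "fileName": "groups-to-sub.js",
      "description": "Joins the group names of the user into one string"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The script file itself is placed in the archive next to the &lt;code&gt;META-INF&lt;/code&gt; folder, under the name given in &lt;code&gt;fileName&lt;/code&gt;.&lt;/p&gt;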



&lt;p&gt;The &lt;a href="https://github.com/fmeAG/keycloak-aws"&gt;repo&lt;/a&gt; shows how it all comes together.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;In conclusion, both SAML2 and OIDC are great options for access federation with AWS, and each has its advantages and drawbacks. If you decide to use OIDC like us, you need an &lt;strong&gt;identity broker (IB)&lt;/strong&gt; that provides a link between an &lt;strong&gt;IDP&lt;/strong&gt; (such as Keycloak) and AWS. It would be unwise and potentially dangerous to provide &lt;strong&gt;ID tokens&lt;/strong&gt; directly to the federated users, because a combination of such a token with a role ARN is usually enough to assume that role. Of course, it would be even more unwise to provide the secret of an AWS-trusted &lt;strong&gt;client ID&lt;/strong&gt; to the users. A combination of the StringLike operator and the Keycloak mappers can be used to increase the security of OIDC-federated AWS accounts by restricting access to the roles based on user attributes such as group membership, similarly to how the SAML2 federation works.&lt;/p&gt;

</description>
      <category>oidc</category>
      <category>aws</category>
      <category>saml</category>
      <category>keycloak</category>
    </item>
    <item>
      <title>Certbot as an init container for AWS ECS.</title>
      <dc:creator>Konstantin Troshin</dc:creator>
      <pubDate>Mon, 08 Aug 2022 17:00:00 +0000</pubDate>
      <link>https://forem.com/fmegroup/certbot-as-an-init-container-for-aws-ecs-2n5p</link>
      <guid>https://forem.com/fmegroup/certbot-as-an-init-container-for-aws-ecs-2n5p</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.tourl"&gt;&lt;/a&gt;&lt;br&gt;
Encryption in transit has become a security standard for most&lt;br&gt;
network-based applications and is requested by the majority of our&lt;br&gt;
customers for all applications we help them to build or manage. Most of the modern applications support TLS out of the box but require the certificate and the corresponding private key to be provided externally.&lt;br&gt;
In some cases (for example, for intranet apps), self-signed certificates (or certificates signed by an internal CA) are sufficient, but if the application is internet-facing and needs to be used without additional steps on the client side, a certificate signed by a commonly trusted certificate authority (CA) is required. For AWS-based applications (as you may have guessed from the title, AWS are a main focus of this post), AWS Certificate Manager (ACM) can be used in combination with a load balancer to provide an amazon-signed certificate. This simple and efficient method is not applicable, however, if the certificate and the corresponding private key need to be provided to the application directly instead of an AWS-managed load balancer. This can be the case if the application is using TLS in combination with its own protocol which would make TLS termination on the load balancer impossible. &lt;a href="https://letsencrypt.org/"&gt;Let's Encrypt&lt;/a&gt; is an open CA that provides trusted certificates which can be acquired by using a tool that supports the ACME protocol. In this case, the certificate and private key can then be provided to the application directly and used also for custom TLS-based protocols. &lt;a href="https://certbot.eff.org/"&gt;Certbot&lt;/a&gt; is one of such tools and can be used to obtain the TLS credentials.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--s2eDXALi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wvv45rzg3eijv9jvym1t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--s2eDXALi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wvv45rzg3eijv9jvym1t.png" alt="le" width="880" height="648"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  The use case
&lt;/h2&gt;

&lt;p&gt;Recently, I was asked to provide a publicly accessible &lt;a href="https://neo4j.com/product/neo4j-graph-database"&gt;Neo4j database&lt;/a&gt; to use for development purposes. Since a Neo4j installation is available as a docker container, I chose to use AWS ECS to run it (a Kubernetes-based solution such as EKS would be quite an overkill for such a simple use case). To start things up, I deployed a &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html"&gt;Network Load Balancer&lt;/a&gt; (NLB) with three listeners and the corresponding target groups for ports 7474 (HTTP), 7473 (HTTPS), and 7687 (bolt). To improve the security of the database, I decided to activate TLS for the bolt and HTTPS endpoints.&lt;br&gt;
Neo4j provides support for both &lt;a href="https://neo4j.com/developer/kb/setting-up-ssl-with-docker/"&gt;out of the box&lt;/a&gt;, but requires the certificates to be provided externally. My initial approach was to use TLS listeners in combination with an Amazon-signed ACM certificate and TLS target groups to talk to the Neo4j container. I used openssl to create a self-signed certificate and provided it via an ECS mount point to Neo4j. This worked just fine for the HTTPS endpoint but not for bolt, which is, however, crucial for the Neo4j clients. It became clear that TLS termination would not work for this use case and that I needed to use TCP listeners and target groups and to provide the publicly facing certificate directly to Neo4j. Since the request of the customer included a wish that the database can be easily accessed by the clients without much configuration on their side, I also wanted this certificate to be publicly trusted. In many of our k8s-based projects, we use cert-manager, which can directly obtain Let's Encrypt (LE) certificates, and this gave me the idea of using LE for my current task. Thinking about k8s and init containers, I also remembered reading some stuff about &lt;a href="https://aws.amazon.com/about-aws/whats-new/2019/03/amazon-ecs-introduces-enhanced-container-dependency-management/?nc1=h_ls"&gt;container dependencies in ECS&lt;/a&gt;, so I came up with the idea of using a certbot docker container as such an "init container" for my Neo4j database. The schematic architecture is depicted below and includes an EC2 ECS host on which three containers run in sequence. First, the &lt;strong&gt;certbot&lt;/strong&gt; container is started and requests the certificate for the corresponding domain. Once the certificate and the private key are there, the certbot container exits successfully, upon which the second container (copier) is started.
This container just needs a shell (I decided to use &lt;strong&gt;debian:latest&lt;/strong&gt; for this purpose), and its only job is to copy the certificate and the private key into the folders and under the file names Neo4j expects. Upon the successful exit of this container, the &lt;strong&gt;Neo4j&lt;/strong&gt; container is finally started.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3n57HaoX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/59rczy688es0v8w9w9zq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3n57HaoX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/59rczy688es0v8w9w9zq.png" alt="Architechture" width="792" height="626"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To achieve the correct order of the containers, AWS ECS supports the &lt;strong&gt;dependsOn&lt;/strong&gt; attribute - a list of &lt;a href="https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_ContainerDependency.html"&gt;&lt;em&gt;ContainerDependency&lt;/em&gt; objects&lt;/a&gt; that in turn consist of &lt;strong&gt;containerName&lt;/strong&gt; and &lt;strong&gt;condition&lt;/strong&gt;. The condition attribute specifies whether the previous container should have started (START), exited (COMPLETE), run successfully (SUCCESS) or be passing docker health checks (HEALTHY). In the present use case, SUCCESS is the correct &lt;strong&gt;condition&lt;/strong&gt;, since both certificate retrieval and copy are crucial for the Neo4j container to work properly (the copier container is called debian here):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  {
    "dependsOn": [
      {
        "containerName": "certbot",
        "condition": "SUCCESS"
      },
      {
        "containerName": "debian",
        "condition": "SUCCESS"
      }
    ],
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
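
&lt;p&gt;Put together, the container chain described above could be sketched as the following fragment of a task definition. This is only an illustration under assumptions: the image names for certbot and Neo4j are guesses rather than the images used in the original setup, and commands, mount points and port mappings are left out. One detail that is easy to miss is that ECS does not allow the SUCCESS (or COMPLETE) condition on an essential container, so the two short-lived containers have to be marked as non-essential.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"containerDefinitions": [
  {
    "name": "certbot",
    "image": "certbot/certbot",
    "essential": false
  },
  {
    "name": "debian",
    "image": "debian:latest",
    "essential": false,
    "dependsOn": [
      { "containerName": "certbot", "condition": "SUCCESS" }
    ]
  },
  {
    "name": "neo4j",
    "image": "neo4j",
    "essential": true,
    "dependsOn": [
      { "containerName": "certbot", "condition": "SUCCESS" },
      { "containerName": "debian", "condition": "SUCCESS" }
    ]
  }
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;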



&lt;h2&gt;
  
  
  Routing to certbot
&lt;/h2&gt;

&lt;p&gt;A small challenge for the architecture above is to ensure that certbot can solve the HTTP challenge of Let's Encrypt, which is a part of the ACME protocol (this is needed to verify that the domain for which the certificate is requested is indeed controlled by us). The problem here is that if targets of type &lt;em&gt;instance&lt;/em&gt; are used with the load balancer (which makes sense for an ECS EC2 host), health checks are mandatory. On the other hand, since certbot runs only for a short time, it cannot itself be used for health checks on port 80. Also, LE expects the domain to be already routable to the certbot instance requesting the certificate, which means that the typical registration delay of load balancers is not acceptable in this case. As a result, the instance should be registered at the corresponding target group of the NLB and already healthy &lt;em&gt;before&lt;/em&gt; the certbot container is even started. To address this issue, I decided to use a simple trick based on a small &lt;a href="https://github.com/konstl000/handshaker"&gt;handshaker app&lt;/a&gt;. This app provides a golang-based http server listening on a specified port that simply replies "OK" to any request and can be deployed as a scratch-based docker container (or a binary). Since the app must not block port 80 (which is required by certbot once it is ready to accept the HTTP challenge), I configured the corresponding target group (TG80) to forward to port 80 but to run its health check on another port (6666), which I then assigned to handshaker. To ensure the correct timing, I included starting the app in the &lt;em&gt;user data&lt;/em&gt; script of the ECS EC2 instance and made Terraform (with which the whole infrastructure is built) register the auto scaling group that deploys these instances at TG80.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -d -e HEALTH_CHECK_PORT=6666 -p 6666:6666 \
${SOME_ACCOUNT_ID}.dkr.ecr.eu-central-1.amazonaws.com/handshaker:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
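
&lt;p&gt;The target group side of this trick can be sketched with the AWS CLI. This is only an illustration and not the terraform code used for the actual setup: the name tg80 and the variables VPC_ID and ASG_NAME are placeholders, and the HTTP health check on the path / is an assumption based on handshaker replying "OK" to any request.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/bin/bash
set -euo pipefail

# VPC_ID and ASG_NAME are hypothetical placeholders that must be set by the caller

# Create a TCP target group that forwards to port 80 but is health-checked on port 6666
TG_ARN="$(aws elbv2 create-target-group \
  --name tg80 \
  --protocol TCP \
  --port 80 \
  --vpc-id "${VPC_ID}" \
  --target-type instance \
  --health-check-protocol HTTP \
  --health-check-port 6666 \
  --health-check-path / \
  --query 'TargetGroups[0].TargetGroupArn' \
  --output text)"

# Attach the target group to the auto scaling group, so that its instances get registered
aws autoscaling attach-load-balancer-target-groups \
  --auto-scaling-group-name "${ASG_NAME}" \
  --target-group-arns "${TG_ARN}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;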



&lt;p&gt;As expected, the instance was registered at TG80 and became healthy shortly after terraform apply. After this, I used the AWS CLI to scale the ECS service to 1 task (I usually deploy ECS services with an initial task count of 0, so that the whole infrastructure, such as load balancers, instances and Route53 entries, is available before the containers are even started).&lt;/p&gt;
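
&lt;p&gt;Such a scaling step could look roughly like the following. CLUSTER and SERVICE are placeholder variables for the names of the ECS cluster and service, which are not given here, and the wait command is an optional addition.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# CLUSTER and SERVICE are hypothetical placeholders for the ECS cluster and service names
aws ecs update-service --cluster "${CLUSTER}" --service "${SERVICE}" --desired-count 1

# Optionally block until the service has reached a steady state
aws ecs wait services-stable --cluster "${CLUSTER}" --services "${SERVICE}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;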

&lt;p&gt;To my delight, certbot successfully requested the certificate, passed the HTTP challenge and stored the results in a shared folder mounted via a mount point. After this, the following script ran in the copier container followed by the successful start of Neo4j.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/bin/bash
# Abort on the first error, so that a failed copy yields a non-zero exit code
set -euo pipefail

# The le folder is mounted from the host and filled by certbot
cp /le/live/"${DOMAIN}"/cert.pem /home/neo4j/certificates/bolt/public.crt
cp /le/live/"${DOMAIN}"/privkey.pem /home/neo4j/certificates/bolt/private.key
# From here, we just need to create some more copies
cp /home/neo4j/certificates/bolt/private.key /home/neo4j/certificates/https/
cp /home/neo4j/certificates/bolt/public.crt /home/neo4j/certificates/https/
cp /home/neo4j/certificates/bolt/public.crt /home/neo4j/certificates/bolt/trusted/
cp /home/neo4j/certificates/bolt/public.crt /home/neo4j/certificates/https/trusted/

# Hand the files over to UID 7474, so that the neo4j user can read them
chown -R 7474 /home/neo4j/certificates
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jZgTnNZR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3vqsxblsseglxzmh21ui.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jZgTnNZR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3vqsxblsseglxzmh21ui.png" alt="cert view" width="880" height="1089"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Alternatives
&lt;/h2&gt;

&lt;p&gt;Of course, the described approach is not the only way to get a certificate from LE and provide it to a Neo4j container (or another application). Some simple alternatives I can immediately think of would be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Run certbot directly on the EC2 host instead of a container&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use k8s/k3s/k0s in combination with cert-manager&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Build a custom container that has certbot inside of it&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That being said, I think that the init container approach shows a way of using ECS similar to k8s pods and can be successfully applied to other ECS-based solutions. It also allows using the upstream containers as they are, which makes upgrades seamless as opposed to the "one custom container" approach. In case you wonder how the hell I could run a custom bash script within an upstream debian container: I just used a mount point to mount a folder from the host that had been created and filled by the &lt;em&gt;user data&lt;/em&gt; script during the EC2 deployment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
mkdir -p /home/xtra #prepare the xtra folder that will be attached to the debian container
echo "${CERT_SCRIPT}" | base64 -d &amp;gt;/home/xtra/copy_certs.sh
chmod +x /home/xtra/copy_certs.sh
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Scaling
&lt;/h2&gt;

&lt;p&gt;In the current example, I used an auto scaling group with just one instance in it, which allows all the mount points to be folders on this instance. Of course, such local folders would not scale beyond a single host. In that case, however, EFS can be used instead, so that the certificate and the key are requested just once by one of the certbots (certbot exits automatically if a valid certificate is already present) and can then be used by all of the corresponding Neo4j containers. All other services used in the infrastructure above (NLB, NAT Gateway, ECS) support horizontal scaling, so a solution based on this approach can be scaled out with ease.&lt;/p&gt;
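
&lt;p&gt;As a rough sketch of the EFS variant, the &lt;em&gt;user data&lt;/em&gt; script could mount the shared file system on each host before any container starts. This has not been tried in the setup described here and rests on several assumptions: an Amazon Linux 2 host, a placeholder variable EFS_ID holding the file system ID, and a hypothetical host folder /home/le that the containers would mount as their le folder.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
# Assumption: Amazon Linux 2, where the EFS mount helper is provided by amazon-efs-utils
yum install -y amazon-efs-utils
# /home/le is a hypothetical host folder shared by certbot and the copier container
mkdir -p /home/le
# EFS_ID is a placeholder for the ID of the EFS file system
mount -t efs -o tls "${EFS_ID}":/ /home/le
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;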

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;In conclusion, AWS ECS provides a nice option to include k8s-like "init containers" by using container dependencies and non-essential containers. These can be employed for a variety of purposes, including the generation of TLS certificates with a certbot container. The TLS credentials can then be immediately provided to an application running in the essential container on the same host, resulting in a publicly trusted, secured app.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>containers</category>
      <category>ecs</category>
      <category>docker</category>
    </item>
  </channel>
</rss>
