readme-update
zacharyblasczyk committed Oct 9, 2024
2 parents 291fe9d + 5c9bdcd commit 8494997
Showing 18 changed files with 63 additions and 339 deletions.
32 changes: 27 additions & 5 deletions README.md
@@ -140,7 +140,7 @@ Upgrades must be executed in step-wise fashion from one version to the next. You

| Name | Version |
|------|---------|
| <a name="provider_aws"></a> [aws](#provider\_aws) | ~> 4.0 |
| <a name="provider_aws"></a> [aws](#provider\_aws) | 4.67.0 |

## Modules

@@ -180,12 +180,14 @@ Upgrades must be executed in step-wise fashion from one version to the next. You
| <a name="input_bucket_kms_key_arn"></a> [bucket\_kms\_key\_arn](#input\_bucket\_kms\_key\_arn) | n/a | `string` | `""` | no |
| <a name="input_bucket_name"></a> [bucket\_name](#input\_bucket\_name) | n/a | `string` | `""` | no |
| <a name="input_bucket_path"></a> [bucket\_path](#input\_bucket\_path) | path of where to store data for the instance-level bucket | `string` | `""` | no |
| <a name="input_clickhouse_endpoint_service_id"></a> [clickhouse\_endpoint\_service\_id](#input\_clickhouse\_endpoint\_service\_id) | The service ID of the VPC endpoint service for Clickhouse | `string` | `""` | no |
| <a name="input_controller_image_tag"></a> [controller\_image\_tag](#input\_controller\_image\_tag) | Tag of the controller image to deploy | `string` | `"1.14.0"` | no |
| <a name="input_create_bucket"></a> [create\_bucket](#input\_create\_bucket) | ######################################### External Bucket # ######################################### Most users will not need these settings. They are ment for users who want a bucket and sqs that are in a different account. | `bool` | `true` | no |
| <a name="input_create_elasticache"></a> [create\_elasticache](#input\_create\_elasticache) | Boolean indicating whether to provision an elasticache instance (true) or not (false). | `bool` | `true` | no |
| <a name="input_create_vpc"></a> [create\_vpc](#input\_create\_vpc) | Boolean indicating whether to deploy a VPC (true) or not (false). | `bool` | `true` | no |
| <a name="input_custom_domain_filter"></a> [custom\_domain\_filter](#input\_custom\_domain\_filter) | A custom domain filter to be used by external-dns instead of the default FQDN. If not set, the local FQDN is used. | `string` | `null` | no |
| <a name="input_database_binlog_format"></a> [database\_binlog\_format](#input\_database\_binlog\_format) | Specifies the binlog\_format value to set for the database | `string` | `"ROW"` | no |
| <a name="input_database_engine_version"></a> [database\_engine\_version](#input\_database\_engine\_version) | Version for MySQL Auora | `string` | `"8.0.mysql_aurora.3.05.2"` | no |
| <a name="input_database_engine_version"></a> [database\_engine\_version](#input\_database\_engine\_version) | Version for MySQL Aurora | `string` | `"8.0.mysql_aurora.3.07.1"` | no |
| <a name="input_database_innodb_lru_scan_depth"></a> [database\_innodb\_lru\_scan\_depth](#input\_database\_innodb\_lru\_scan\_depth) | Specifies the innodb\_lru\_scan\_depth value to set for the database | `number` | `128` | no |
| <a name="input_database_instance_class"></a> [database\_instance\_class](#input\_database\_instance\_class) | Instance type to use by database master instance. | `string` | `"db.r5.large"` | no |
| <a name="input_database_kms_key_arn"></a> [database\_kms\_key\_arn](#input\_database\_kms\_key\_arn) | n/a | `string` | `""` | no |
@@ -199,14 +201,16 @@ Upgrades must be executed in step-wise fashion from one version to the next. You
| <a name="input_eks_cluster_version"></a> [eks\_cluster\_version](#input\_eks\_cluster\_version) | EKS cluster kubernetes version | `string` | n/a | yes |
| <a name="input_eks_policy_arns"></a> [eks\_policy\_arns](#input\_eks\_policy\_arns) | Additional IAM policy to apply to the EKS cluster | `list(string)` | `[]` | no |
| <a name="input_elasticache_node_type"></a> [elasticache\_node\_type](#input\_elasticache\_node\_type) | The type of the redis cache node to deploy | `string` | `"cache.t2.medium"` | no |
| <a name="input_enable_dummy_dns"></a> [enable\_dummy\_dns](#input\_enable\_dummy\_dns) | Boolean indicating whether or not to enable dummy DNS for the old alb | `bool` | `false` | no |
| <a name="input_enable_operator_alb"></a> [enable\_operator\_alb](#input\_enable\_operator\_alb) | Boolean indicating whether to use operatore ALB (true) or not (false). | `bool` | `false` | no |
| <a name="input_enable_clickhouse"></a> [enable\_clickhouse](#input\_enable\_clickhouse) | Provision clickhouse resources | `bool` | `false` | no |
| <a name="input_enable_yace"></a> [enable\_yace](#input\_enable\_yace) | deploy yet another cloudwatch exporter to fetch aws resources metrics | `bool` | `true` | no |
| <a name="input_external_dns"></a> [external\_dns](#input\_external\_dns) | Using external DNS. A `subdomain` must also be specified if this value is true. | `bool` | `false` | no |
| <a name="input_extra_fqdn"></a> [extra\_fqdn](#input\_extra\_fqdn) | Additional fqdn's must be in the same hosted zone as `domain_name`. | `list(string)` | `[]` | no |
| <a name="input_kms_clickhouse_key_alias"></a> [kms\_clickhouse\_key\_alias](#input\_kms\_clickhouse\_key\_alias) | KMS key alias for AWS KMS Customer managed key used by Clickhouse CMEK. | `string` | `null` | no |
| <a name="input_kms_clickhouse_key_policy"></a> [kms\_clickhouse\_key\_policy](#input\_kms\_clickhouse\_key\_policy) | The policy that will define the permissions for the clickhouse kms key. | `string` | `""` | no |
| <a name="input_kms_key_alias"></a> [kms\_key\_alias](#input\_kms\_key\_alias) | KMS key alias for AWS KMS Customer managed key. | `string` | `null` | no |
| <a name="input_kms_key_deletion_window"></a> [kms\_key\_deletion\_window](#input\_kms\_key\_deletion\_window) | Duration in days to destroy the key after it is deleted. Must be between 7 and 30 days. | `number` | `7` | no |
| <a name="input_kms_key_policy"></a> [kms\_key\_policy](#input\_kms\_key\_policy) | The policy that will define the permissions for the kms key. | `string` | `""` | no |
| <a name="input_kms_key_policy_administrator_arn"></a> [kms\_key\_policy\_administrator\_arn](#input\_kms\_key\_policy\_administrator\_arn) | The principal that will be allowed to manage the kms key. | `string` | `""` | no |
| <a name="input_kubernetes_alb_internet_facing"></a> [kubernetes\_alb\_internet\_facing](#input\_kubernetes\_alb\_internet\_facing) | Indicates whether or not the ALB controlled by the Amazon ALB ingress controller is internet-facing or internal. | `bool` | `true` | no |
| <a name="input_kubernetes_alb_subnets"></a> [kubernetes\_alb\_subnets](#input\_kubernetes\_alb\_subnets) | List of subnet ID's the ALB will use for ingress traffic. | `list(string)` | `[]` | no |
| <a name="input_kubernetes_instance_types"></a> [kubernetes\_instance\_types](#input\_kubernetes\_instance\_types) | EC2 Instance type for primary node group. | `list(string)` | <pre>[<br> "m5.large"<br>]</pre> | no |
@@ -228,6 +232,7 @@ Upgrades must be executed in step-wise fashion from one version to the next. You
| <a name="input_network_private_subnets"></a> [network\_private\_subnets](#input\_network\_private\_subnets) | A list of the identities of the private subnetworks in which resources will be deployed. | `list(string)` | `[]` | no |
| <a name="input_network_public_subnet_cidrs"></a> [network\_public\_subnet\_cidrs](#input\_network\_public\_subnet\_cidrs) | List of private subnet CIDR ranges to create in VPC. | `list(string)` | <pre>[<br> "10.10.0.0/24",<br> "10.10.1.0/24"<br>]</pre> | no |
| <a name="input_network_public_subnets"></a> [network\_public\_subnets](#input\_network\_public\_subnets) | A list of the identities of the public subnetworks in which resources will be deployed. | `list(string)` | `[]` | no |
| <a name="input_operator_chart_version"></a> [operator\_chart\_version](#input\_operator\_chart\_version) | Version of the operator chart to deploy | `string` | `"1.3.4"` | no |
| <a name="input_other_wandb_env"></a> [other\_wandb\_env](#input\_other\_wandb\_env) | Extra environment variables for W&B | `map(any)` | `{}` | no |
| <a name="input_parquet_wandb_env"></a> [parquet\_wandb\_env](#input\_parquet\_wandb\_env) | Extra environment variables for W&B | `map(string)` | `{}` | no |
| <a name="input_private_link_allowed_account_ids"></a> [private\_link\_allowed\_account\_ids](#input\_private\_link\_allowed\_account\_ids) | List of AWS account IDs allowed to access the VPC Endpoint Service | `list(string)` | `[]` | no |
@@ -262,7 +267,7 @@ Upgrades must be executed in step-wise fashion from one version to the next. You
| <a name="output_eks_node_count"></a> [eks\_node\_count](#output\_eks\_node\_count) | n/a |
| <a name="output_eks_node_instance_type"></a> [eks\_node\_instance\_type](#output\_eks\_node\_instance\_type) | n/a |
| <a name="output_elasticache_connection_string"></a> [elasticache\_connection\_string](#output\_elasticache\_connection\_string) | n/a |
| <a name="output_internal_app_port"></a> [internal\_app\_port](#output\_internal\_app\_port) | n/a |
| <a name="output_kms_clickhouse_key_arn"></a> [kms\_clickhouse\_key\_arn](#output\_kms\_clickhouse\_key\_arn) | The Amazon Resource Name of the KMS key used to encrypt Weave data at rest in Clickhouse. |
| <a name="output_kms_key_arn"></a> [kms\_key\_arn](#output\_kms\_key\_arn) | The Amazon Resource Name of the KMS key used to encrypt data at rest. |
| <a name="output_network_id"></a> [network\_id](#output\_network\_id) | The identity of the VPC in which resources are deployed. |
| <a name="output_network_private_subnets"></a> [network\_private\_subnets](#output\_network\_private\_subnets) | The identities of the private subnetworks deployed within the VPC. |
@@ -301,6 +306,23 @@ For more information on the available sizes, see the [Cluster Sizing](#cluster-s
If having the cluster scale nodes in and out is not desired, the `kubernetes_min_nodes_per_az` and
`kubernetes_max_nodes_per_az` can be set to the same value to prevent the cluster from scaling.
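
As a minimal sketch of the fixed-size configuration described above: the module source address, the node count of `2`, and the omitted inputs are assumptions for illustration, and only the two `kubernetes_*_nodes_per_az` input names come from this README.

```hcl
module "wandb_infra" {
  # Assumed registry address; this repository's examples use a relative path instead.
  source = "wandb/wandb/aws"

  # ... other required inputs omitted for brevity ...

  # Setting min and max to the same value leaves the node group no room
  # to scale in or out. The value 2 is illustrative only.
  kubernetes_min_nodes_per_az = 2
  kubernetes_max_nodes_per_az = 2
}
```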

This upgrade is also intended to be used when upgrading eks to 1.29.

We have also upgraded the following Kubernetes addons:

- MySQL Aurora (8.0.mysql_aurora.3.07.1)
- redis (7.1)
- external-dns helm chart (v1.15.0)
- aws-efs-csi-driver (v2.0.7-eksbuild.1)
- aws-ebs-csi-driver (v1.35.0-eksbuild.1)
- coredns (v1.11.3-eksbuild.1)
- kube-proxy (v1.29.7-eksbuild.9)
- vpc-cni (v1.18.3-eksbuild.3)
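
A hedged sketch of selecting the EKS version this upgrade targets. The `eks_cluster_version` input is listed as required in the inputs table; the module source address shown here is an assumption. EKS upgrades the control plane one minor version at a time, so a cluster on an older version would likely need to pass through the intermediate versions before reaching 1.29.

```hcl
module "wandb_infra" {
  # Assumed registry address; adjust to however you consume the module.
  source = "wandb/wandb/aws"

  # ... other required inputs omitted for brevity ...

  # Target Kubernetes version for this release of the module.
  eks_cluster_version = "1.29"
}
```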

> :warning: Please remove the `enable_dummy_dns` and `enable_operator_alb` variables
> as they are no longer valid flags. They were provided to support older versions of
> the module that relied on an alb not created by the ingress controller.
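
For illustration, removing the flags from a module call might look like the sketch below. It mirrors the `examples/byo-vpc/main.tf` change in this commit, with all other inputs omitted. Terraform rejects arguments that a module no longer declares, so the matching `variable` blocks in your own `variables.tf` can be dropped at the same time.

```hcl
module "wandb_infra" {
  source = "../../" # relative path as used in this repository's examples

  public_access = true
  external_dns  = true

  # No longer valid; delete these lines from your configuration:
  # enable_dummy_dns    = var.enable_dummy_dns
  # enable_operator_alb = var.enable_operator_alb

  deletion_protection = true
  create_vpc          = false
}
```
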
### Upgrading from 3.x -> 4.x

- If egress access for retrieving the wandb/controller image is not available, Terraform apply may experience failures.
2 changes: 0 additions & 2 deletions examples/byo-vpc-eks/variables.tf
@@ -70,14 +70,12 @@ variable "bucket_kms_key_arn" {
default = ""
}


variable "allowed_inbound_cidr" {
default = ["0.0.0.0/0"]
nullable = false
type = list(string)
}


variable "allowed_inbound_ipv6_cidr" {
default = ["::/0"]
nullable = false
3 changes: 0 additions & 3 deletions examples/byo-vpc/main.tf
@@ -19,9 +19,6 @@ module "wandb_infra" {
public_access = true
external_dns = true

enable_dummy_dns = var.enable_dummy_dns
enable_operator_alb = var.enable_operator_alb

deletion_protection = true

create_vpc = false
12 changes: 0 additions & 12 deletions examples/byo-vpc/variables.tf
@@ -103,18 +103,6 @@ variable "other_wandb_env" {
default = {}
}

variable "enable_operator_alb" {
type = bool
default = false
description = "Boolean indicating whether to use operatore ALB (true) or not (false)."
}

variable "enable_dummy_dns" {
type = bool
default = false
description = "Boolean indicating whether or not to enable dummy DNS for the old alb"
}

variable "vpc_id" {
type = string
description = "VPC network ID"
31 changes: 1 addition & 30 deletions examples/public-dns-external/main.tf
@@ -28,7 +28,7 @@ module "wandb_infra" {
allowed_inbound_cidr = var.allowed_inbound_cidr
allowed_inbound_ipv6_cidr = ["::/0"]

eks_cluster_version = "1.26"
eks_cluster_version = "1.29"
kubernetes_public_access = true
kubernetes_public_access_cidrs = ["0.0.0.0/0"]

@@ -84,35 +84,6 @@ provider "helm" {
}
}

module "wandb_app" {
source = "wandb/wandb/kubernetes"
version = "1.12.0"

license = var.wandb_license

host = module.wandb_infra.url
bucket = "s3://${module.wandb_infra.bucket_name}"
bucket_path = var.bucket_path
bucket_aws_region = module.wandb_infra.bucket_region
bucket_queue = "internal://"
bucket_kms_key_arn = module.wandb_infra.kms_key_arn
database_connection_string = "mysql://${module.wandb_infra.database_connection_string}"
redis_connection_string = "redis://${module.wandb_infra.elasticache_connection_string}?tls=true&ttlInSeconds=604800"

wandb_image = var.wandb_image
wandb_version = var.wandb_version

service_port = module.wandb_infra.internal_app_port

# If we dont wait, tf will start trying to deploy while the work group is
# still spinning up
depends_on = [module.wandb_infra]

other_wandb_env = merge({
"GORILLA_CUSTOMER_SECRET_STORE_SOURCE" = "aws-secretmanager://${var.namespace}?namespace=${var.namespace}"
}, var.other_wandb_env)
}

output "bucket_name" {
value = module.wandb_infra.bucket_name
}
1 change: 0 additions & 1 deletion examples/standard/main.tf
@@ -73,7 +73,6 @@ provider "helm" {
}
}


output "bucket_name" {
value = module.wandb_infra.bucket_name
}
30 changes: 4 additions & 26 deletions main.tf
@@ -180,27 +180,15 @@ module "app_eks" {
aws_loadbalancer_controller_tags = var.aws_loadbalancer_controller_tags
}

locals {
full_fqdn = var.enable_dummy_dns ? "old.${local.fqdn}" : local.fqdn
extra_fqdn = var.enable_dummy_dns ? [for fqdn in var.extra_fqdn : "old.${fqdn}"] : var.extra_fqdn
}

module "app_lb" {
source = "./modules/app_lb"

namespace = var.namespace
load_balancing_scheme = var.public_access ? "PUBLIC" : "PRIVATE"
acm_certificate_arn = local.acm_certificate_arn
zone_id = var.zone_id
namespace = var.namespace

fqdn = local.full_fqdn
extra_fqdn = local.extra_fqdn
allowed_inbound_cidr = var.allowed_inbound_cidr
allowed_inbound_ipv6_cidr = var.allowed_inbound_ipv6_cidr
target_port = local.internal_app_port
network_id = local.network_id
network_private_subnets = local.network_private_subnets
network_public_subnets = local.network_public_subnets
enable_private_only_traffic = var.private_only_traffic
private_endpoint_cidr = var.allowed_private_endpoint_cidr

@@ -224,12 +212,6 @@ module "private_link" {
]
}

resource "aws_autoscaling_attachment" "autoscaling_attachment" {
for_each = module.app_eks.autoscaling_group_names
autoscaling_group_name = each.value
lb_target_group_arn = module.app_lb.tg_app_arn
}

locals {
network_elasticache_subnets = var.create_vpc ? module.networking.elasticache_subnets : var.network_elasticache_subnets
network_elasticache_subnet_cidrs = var.create_vpc ? module.networking.elasticache_subnet_cidrs : var.network_elasticache_subnet_cidrs
@@ -323,12 +305,12 @@ module "wandb" {
"alb.ingress.kubernetes.io/listen-ports" = "[{\\\"HTTPS\\\": 443}]"
"alb.ingress.kubernetes.io/certificate-arn" = local.acm_certificate_arn
},
length(var.extra_fqdn) > 0 && var.enable_dummy_dns ? {
length(var.extra_fqdn) > 0 ? {
"external-dns.alpha.kubernetes.io/hostname" = <<-EOF
${local.fqdn}\,${join("\\,", var.extra_fqdn)}\,${local.fqdn}
EOF
} : {
"external-dns.alpha.kubernetes.io/hostname" = var.enable_operator_alb ? local.fqdn : ""
"external-dns.alpha.kubernetes.io/hostname" = local.fqdn
},
length(var.kubernetes_alb_subnets) > 0 ? {
"alb.ingress.kubernetes.io/subnets" = <<-EOF
@@ -338,11 +320,7 @@ module "wandb" {

}

app = var.enable_operator_alb ? {} : {
extraEnv = merge({
"GORILLA_GLUE_LIST" = "true"
}, var.app_wandb_env)
}
app = {}

# To support otel rds and redis metrics, we need operator-wandb chart min version 0.13.8 (yace subchart)
yace = var.enable_yace ? {
45 changes: 21 additions & 24 deletions modules/app_eks/add-ons.tf
@@ -27,48 +27,45 @@ resource "aws_iam_role" "oidc" {
assume_role_policy = data.aws_iam_policy_document.oidc_assume_role.json
}



### add-ons for eks version 1.28

### add-ons for eks version 1.29
resource "aws_eks_addon" "aws_efs_csi_driver" {
depends_on = [
aws_eks_addon.vpc_cni
]
cluster_name = var.namespace
addon_name = "aws-efs-csi-driver"
addon_version = "v2.0.4-eksbuild.1"
resolve_conflicts = "OVERWRITE"
depends_on = [
aws_eks_addon.vpc_cni
]
cluster_name = var.namespace
addon_name = "aws-efs-csi-driver"
addon_version = "v2.0.7-eksbuild.1"
resolve_conflicts = "OVERWRITE"
}

resource "aws_eks_addon" "aws_ebs_csi_driver" {
depends_on = [
aws_eks_addon.vpc_cni
]
cluster_name = var.namespace
addon_name = "aws-ebs-csi-driver"
addon_version = "v1.31.0-eksbuild.1"
resolve_conflicts = "OVERWRITE"
cluster_name = var.namespace
addon_name = "aws-ebs-csi-driver"
addon_version = "v1.35.0-eksbuild.1"
resolve_conflicts = "OVERWRITE"
}

resource "aws_eks_addon" "coredns" {
depends_on = [
aws_eks_addon.vpc_cni
]
cluster_name = var.namespace
addon_name = "coredns"
addon_version = "v1.10.1-eksbuild.11"
resolve_conflicts = "OVERWRITE"
cluster_name = var.namespace
addon_name = "coredns"
addon_version = "v1.11.3-eksbuild.1"
resolve_conflicts = "OVERWRITE"
}

resource "aws_eks_addon" "kube_proxy" {
depends_on = [
aws_eks_addon.vpc_cni
]
cluster_name = var.namespace
addon_name = "kube-proxy"
addon_version = "v1.28.8-eksbuild.5"
resolve_conflicts = "OVERWRITE"
cluster_name = var.namespace
addon_name = "kube-proxy"
addon_version = "v1.29.7-eksbuild.9"
resolve_conflicts = "OVERWRITE"
}

resource "aws_eks_addon" "vpc_cni" {
@@ -77,7 +74,7 @@ resource "aws_eks_addon" "vpc_cni" {
]
cluster_name = var.namespace
addon_name = "vpc-cni"
addon_version = "v1.18.2-eksbuild.1"
addon_version = "v1.18.3-eksbuild.3"
resolve_conflicts = "OVERWRITE"
service_account_role_arn = aws_iam_role.oidc.arn
}
2 changes: 1 addition & 1 deletion modules/app_eks/external_dns/external_dns.tf
@@ -2,7 +2,7 @@ resource "helm_release" "external_dns" {
name = "external-dns"
namespace = "kube-system"
chart = "external-dns"
version = "1.14.1"
version = "1.15.0"
repository = "https://kubernetes-sigs.github.io/external-dns"

set {