Huge Microsoft Outage Linked to CrowdStrike Takes Down Computers Around the World
How To Run The Cheapest Kubernetes Cluster at $1 Per Day
Welcome to the 56th edition of CloudPro! Today, we’ll talk about:
⭐Masterclass:
How To Run The Cheapest Kubernetes Cluster at $1 Per Day
My recommended Kubernetes resources for Newbies
Why Fast Feedback Loops Matter When Working with Kubernetes
How to create a pipeline for hardening Amazon EKS nodes and automate updates
🔍Secret Knowledge:
A Beginner's Guide to Using Deno with Docker and Docker Compose
Highly Available k3s Kubernetes cluster with keepalived, Galera and Longhorn
Tales from the cloud trenches: Raiding for AWS vaults, buckets and secrets
⚡Techwave:
Huge Microsoft Outage Linked to CrowdStrike Takes Down Computers Around the World
CI/CD observability: A rich, new opportunity for OpenTelemetry
Grafana Loki and unintended data write attempts to Amazon S3 buckets
Open, Interoperable Storage with Iceberg Tables Now Generally Available
🛠️HackHub: Best Tools for the Cloud
Collect client metrics from the Kafka broker using OpenTelemetry
Telegram channels & groups about DevOps, SRE, and Platform Engineering
Cloud-native, AI-powered, document processing pipelines on AWS
Docker images for running Amazon MWAA's Airflow v2.9.2 locally
Cheers,
Editor-in-Chief
Forwarded this email? Sign up here
⭐MasterClass: Tutorials & Guides
⭐How To Run The Cheapest Kubernetes Cluster at $1 Per Day
George Paw describes how to run a Kubernetes cluster for as low as $1 per day by leveraging Azure AKS. His app, Fakes.io, generates AI-powered fake data, and he sought a cost-efficient, scalable, and self-healing solution for its backend. Kubernetes, with its ability to handle unpredictable workloads, emerged as the ideal choice.
Traditionally, managed Kubernetes services are perceived as expensive, but Paw found that AKS's Free tier charges nothing for the managed control plane. By using spot instances for compute, which can offer savings of up to 90%, along with other cost-conscious configuration choices, he keeps the bill minimal. AKS's pricing and free load balancer make it the cheapest option he evaluated, providing a fully managed, scalable Kubernetes cluster for approximately $1.25 a day.
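As a rough sketch of the approach (resource and cluster names here are placeholders, and flags may vary by Azure CLI version), the setup boils down to a Free-tier AKS cluster plus a spot node pool:

```shell
# Create a Free-tier AKS cluster (no charge for the managed control plane)
az aks create \
  --resource-group rg-fakes --name aks-cheap \
  --tier free \
  --node-count 1 --node-vm-size Standard_B2s

# Add a spot node pool; spot VMs can be evicted, but cost far less
az aks nodepool add \
  --resource-group rg-fakes --cluster-name aks-cheap \
  --name spotpool \
  --priority Spot --eviction-policy Delete \
  --spot-max-price -1 \
  --node-count 1
```

`--spot-max-price -1` means "pay up to the current on-demand price", which avoids evictions triggered purely by price spikes.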
⭐Boosting engineer productivity in Kubernetes management
Before DevOps automation tools emerged, managing Kubernetes infrastructure required significant manual input. Engineers spent countless hours making decisions about instances, availability zones, and resource sizes. Despite their efforts, understanding and projecting the costs of Kubernetes clusters remained complex and unclear, often leaving DevOps teams disconnected from the financial aspects of their projects.
With the advent of automation tools like CAST AI, the management of Kubernetes has been revolutionized. These tools handle intricate details such as resource provisioning, scaling, and decommissioning automatically, freeing engineers to focus on critical initiatives. This shift not only saves time but also enhances efficiency by allowing automated systems to analyze and adjust variables in real time, ensuring cost-effective growth and enabling engineers to innovate and solve higher-order problems.
⭐My recommended Kubernetes resources for Newbies
Marcus recommended several essential reads, including "The Kubernetes Book" and "Quick Start Kubernetes" by Nigel Poulton, and "The Illustrated Children’s Guide to Kubernetes," which offers surprising depth despite its format. He also suggested tools like Kind for local Kubernetes testing and k9s for an interactive terminal experience.
For online resources, Marcus pointed to official Kubernetes tutorials, Civo Learn, and Kube Academy for their excellent tutorials and workshops. Video content from Rawkode Academy and Anaïs Urlichs, known for their accessible beginner-friendly videos, was also highlighted. Additionally, Marcus emphasized the value of attending conferences like KubeCon and Kubernetes Community Days for networking and further learning. These curated resources provide a solid foundation for anyone starting their Kubernetes journey.
⭐Why Fast Feedback Loops Matter When Working with Kubernetes
In software development, fast feedback loops are crucial for efficient prototyping and innovation. These loops, known as dev loops, allow developers to quickly test and refine their code. However, working with Kubernetes can complicate this process due to the need to create containers, push images, and deploy them, which can be time-consuming and disruptive. The key to maintaining rapid development cycles is to minimize feedback time, enabling faster testing, debugging, and iteration.
Mirrord is a tool designed to address these challenges by allowing developers to run local processes in the context of their remote Kubernetes environment. This means developers can test their code on a cloud environment without the hassle of Dockerization, CI, or deployment. Mirrord mirrors the traffic, environment variables, and file operations from the remote environment to the local process, ensuring that the local environment closely matches production. This approach not only accelerates the development cycle but also reduces the risk of issues during deployment, making Kubernetes development faster and more efficient.
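In practice (the pod and binary names below are illustrative), running a local process in the context of a remote pod looks roughly like:

```shell
# Run the local binary with the remote pod's env vars, incoming
# traffic, and file operations mirrored from the cluster
mirrord exec --target pod/my-api-6d4cf56db6-abcde -- ./my-api
```

No image build, push, or deploy is needed between iterations; you just re-run the command after each code change.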
⭐How to create a pipeline for hardening Amazon EKS nodes and automate updates
To create a pipeline for hardening Amazon EKS nodes and automating updates, you use Amazon EC2 Image Builder to apply the Center for Internet Security (CIS) Amazon Linux Benchmark to an Amazon EKS-optimized Amazon Machine Image (AMI). This process involves setting up an Image Builder pipeline, which applies security controls from the CIS benchmarks to the base AMI and then publishes the hardened AMI. The pipeline also runs Amazon Inspector to scan for vulnerabilities, ensuring the AMI adheres to security standards.
Once the hardened AMI is published, a series of AWS Lambda functions and AWS Step Functions update the EKS node groups with the new AMI. These functions create a new launch template version with the hardened AMI, then initiate an update for the node groups. Additionally, a weekly scheduled rule checks for updates to the base AMI and notifies you if a new version is available, prompting you to update the image recipe and rerun the pipeline. This automated workflow ensures that your EKS nodes remain secure and up-to-date, aligning with organizational and regulatory security standards.
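The node-group update step can be sketched with the AWS CLI (the IDs and names shown are placeholders; in the pipeline described above this is driven by Lambda and Step Functions rather than run by hand):

```shell
# Register the hardened AMI in a new launch template version
aws ec2 create-launch-template-version \
  --launch-template-id lt-0abc123def456 \
  --source-version '$Latest' \
  --launch-template-data '{"ImageId":"ami-0123456789abcdef0"}'

# Roll the managed node group onto the new version
aws eks update-nodegroup-version \
  --cluster-name my-cluster \
  --nodegroup-name hardened-nodes \
  --launch-template name=hardened-lt,version=2
```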
🔍Secret Knowledge: Learning Resources
🔍A Beginner's Guide to Using Deno with Docker and Docker Compose
Deno is a modern runtime environment for JavaScript and TypeScript, designed by the creator of Node.js to address its limitations. It emphasizes security, using a permissions model that restricts access to the filesystem and network by default. Deno also supports TypeScript natively, allowing developers to write type-safe code without additional configuration, and it favors the use of ES modules for better modularity.
To run a Deno application using Docker, you first need to create a Deno Fresh app, which is a simple web framework for Deno. After setting up the app, you create a Dockerfile to define the environment and instructions for building the Docker image. The Dockerfile uses a Deno image, sets up the working directory, copies the app files, caches dependencies, and specifies the command to run the Deno app. Once the Dockerfile is ready, you can build the Docker image and run it as a container. For managing multiple containers and setting up more complex environments, you can use Docker Compose, defining services and configurations in a `docker-compose.yml` file. This approach simplifies the process of building, running, and maintaining Deno applications in a containerized environment.
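A minimal Dockerfile along these lines (the entrypoint file, port, and permission flags are assumptions for illustration, not taken from the article) might look like:

```dockerfile
FROM denoland/deno:latest

WORKDIR /app

# Copy the app and pre-cache its dependencies into the image
COPY . .
RUN deno cache main.ts

EXPOSE 8000

# Deno denies network and file access unless explicitly granted
CMD ["deno", "run", "--allow-net", "--allow-read", "main.ts"]
```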
🔍Highly Available k3s Kubernetes cluster with keepalived, Galera and Longhorn
To set up a highly available k3s Kubernetes cluster on a small 3-node ARM setup, you'll first need to address the single control-plane node issue. If this node fails, the cluster becomes unmanageable. To achieve high availability, you can use keepalived for a high-availability IP, Galera for a MySQL database cluster, and Longhorn for distributed block storage.
Begin by configuring keepalived on all three nodes to provide a highly available virtual IP, which will be used by the Galera database and k3s. Then, set up a Galera cluster with MariaDB to handle the k3s datastore, keeping the database available across all nodes. Finally, install Longhorn within k3s to manage persistent volumes, so that storage is replicated and remains accessible even if a node fails. This setup allows the cluster to keep running even if one of the three nodes goes offline.
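A keepalived instance along these lines (interface name, VIP, and password are placeholders) would run on each node, with one node given a higher priority so it normally holds the virtual IP:

```
vrrp_instance VI_1 {
    state MASTER            # BACKUP on the other two nodes
    interface eth0
    virtual_router_id 51
    priority 150            # lower (e.g. 100) on the backups
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass s3cret
    }
    virtual_ipaddress {
        192.168.1.100/24    # the shared VIP used by Galera and k3s
    }
}
```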
🔍Java memory management in containerized environments
Java memory management in Kubernetes can be challenging due to Java's traditional memory handling methods. Java applications use a Java Virtual Machine (JVM) that preallocates a significant portion of system memory. This preallocation doesn't align well with container environments, which often have strict memory limits, causing containers to hit their memory cap and crash. Additionally, the JVM's garbage collection process, particularly with the default G1 garbage collector, can be resource-intensive and doesn't always release unused memory back to the host system.
To mitigate these issues, one can use alternative garbage collectors like Shenandoah or ZGC, which are designed to handle memory more efficiently in containerized environments. Adjusting JVM settings, such as using -XX:GCTimeRatio or -XX:MaxHeapFreeRatio, can also improve memory management. Adopting cloud-native strategies, like running multiple replicas and ensuring failover mechanisms, can help Java applications better tolerate the pauses caused by garbage collection, leading to more stable and efficient operations in Kubernetes.
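For example (heap percentages and ratios here are arbitrary starting points, not recommendations from the article), a container's JVM options might be set along these lines:

```shell
# Cap the heap relative to the container's memory limit and pick a
# low-pause collector (ZGC here; Shenandoah via -XX:+UseShenandoahGC)
JAVA_TOOL_OPTIONS="-XX:MaxRAMPercentage=75.0 -XX:+UseZGC"

# With the default G1 collector, these ratios instead encourage the
# JVM to give unused heap back to the OS between collections
# JAVA_TOOL_OPTIONS="-XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=30"
```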
🔍Multi-node Kubernetes Cluster with Minikube
To set up a multi-node Kubernetes cluster locally using Minikube, start by installing Docker and Minikube on your local machine. Use the Minikube CLI to start a cluster with multiple nodes by specifying parameters like memory, CPU, and the number of nodes. Once the cluster is running, use kubectl to manage and monitor it.
For practical application, deploy a sample Spring Boot app that performs file operations. The app is containerized using the Jib Maven plugin and deployed in a Kubernetes namespace with a persistent volume claim. Additional tools and add-ons, such as Prometheus for monitoring and Gluster for storage, can be installed using Helm to enhance the cluster's functionality.
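The cluster bootstrap itself is only a couple of commands (node count and sizes are examples):

```shell
# Start a three-node cluster using the Docker driver,
# with 4 GiB of memory and 2 CPUs per node
minikube start --driver=docker --nodes=3 --memory=4096 --cpus=2

# Verify that all nodes joined the cluster
kubectl get nodes -o wide
```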
🔍Tales from the cloud trenches: Raiding for AWS vaults, buckets and secrets
The post discusses a recent campaign where attackers targeted various AWS services, such as AWS Secrets Manager, S3, and S3 Glacier. They used automated tools to try and list secrets and data stored in these services. The attackers employed residential proxies and the Cloudflare WARP VPN to obscure their location, and they used a specific Python library to manually sign their AWS API requests, which is atypical compared to common tools like the AWS CLI.
Interestingly, while they successfully enumerated some S3 buckets and secrets, they did not attempt to exfiltrate the data. This suggests they might be conducting a broad campaign to evaluate potential targets or assess access levels before taking further action. The use of S3 Glacier as a target is particularly notable since it's typically used for backup data, and this is the first instance observed of attackers probing these vaults.
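Manually signing requests is unusual because common tooling handles it automatically, which is exactly what makes it a useful detection signature. As a generic sketch of the SigV4 scheme itself (not the attackers' actual code), the key derivation and signing need nothing beyond the standard library:

```python
import hashlib
import hmac


def derive_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Derive the AWS SigV4 signing key from a secret access key.

    Follows the standard derivation chain from the SigV4 spec:
    kDate -> kRegion -> kService -> kSigning.
    """
    def _hmac(key: bytes, msg: str) -> bytes:
        return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date)   # date as YYYYMMDD
    k_region = _hmac(k_date, region)                              # e.g. "us-east-1"
    k_service = _hmac(k_region, service)                          # e.g. "s3"
    return _hmac(k_service, "aws4_request")


def sign(string_to_sign: str, signing_key: bytes) -> str:
    """Produce the hex signature that goes into the Authorization header."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

Requests signed this way are valid API calls, so detection has to rely on secondary signals such as atypical user agents and proxy infrastructure.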
⚡ TechWave: Cloud News & Analysis
⚡Huge Microsoft Outage Linked to CrowdStrike Takes Down Computers Around the World
A recent major outage affecting Windows computers worldwide seems to be linked to a software update from CrowdStrike, a cybersecurity company. This issue has caused various critical services, like banks, airlines, and TV stations, to go offline, showing errors known as Blue Screens of Death (BSODs).
CrowdStrike's update, related to their Falcon Sensor product, appears to have disrupted these systems, though it’s not believed to be due to a malicious attack. The company is working on a fix and has provided a workaround. The problem is currently only affecting Windows devices, and it’s unclear how long it will take to resolve or how widespread the issue is.
⚡Valkey GLIDE: an open-source client library for Valkey and Redis
Valkey GLIDE is a new open-source client library designed to interface with Valkey and Redis databases. It supports all commands for both Valkey and Redis and aims to provide a reliable and consistent client experience across different programming languages. Built with a core engine in Rust, GLIDE offers features like automatic cluster topology adjustments and optimized error handling, drawing from AWS's extensive experience in managing Redis-compatible services.
By integrating Valkey GLIDE, developers can achieve better performance and reliability in their database operations. The library is available for Java and Python, with plans to support more languages in the future. It simplifies application development by ensuring consistent behavior and best practices across different environments.
⚡CentOS Linux 7 has reached its End of Life
CentOS Linux 7 reached its End of Life (EOL) on June 30, 2024, meaning it will no longer receive updates or security patches. This impacts many organizations still using CentOS 7, which is prevalent in IT environments. Without updates, systems running CentOS 7 face increased security risks and need to migrate to another Linux distribution or implement alternative security measures.
⚡CI/CD observability: A rich, new opportunity for OpenTelemetry
CI/CD pipelines are crucial for modern software delivery, but historically, we haven't had much visibility into their earlier stages like building and testing. Traditionally, observability focused more on the latter stages of deployment, leaving gaps in understanding and optimizing earlier phases. With OpenTelemetry (OTel), there’s a new opportunity to enhance observability throughout the entire CI/CD process.
OpenTelemetry is gaining traction for its ability to standardize and unify observability data across different systems, including CI/CD tools. By leveraging OTel, teams can better track and analyze data from their CI pipelines, gaining insights into issues earlier and improving overall efficiency. This shift allows for proactive problem-solving, reducing downtime and enhancing the robustness of the software delivery process.
⚡Grafana Loki and unintended data write attempts to Amazon S3 buckets
Grafana recently addressed a security issue involving its Loki log aggregation tool. The Loki Helm chart shipped with placeholder bucket names in its default S3 storage configuration; deployments that never overrode them would send writes to those generically named buckets, which any AWS customer could own. The owner of such a bucket would then receive a stream of unauthorized write attempts, potentially resulting in unexpected charges.
To mitigate this, Grafana updated the Helm chart to remove these default bucket names, ensuring they are only applied when using MinIO. Users are advised to upgrade to the latest versions of the Loki Helm chart or adjust their bucket names to avoid similar issues. AWS has also made changes to its billing policy to address unauthorized S3 write attempts, which now do not incur charges.
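The practical takeaway is to set bucket names explicitly in the chart's values rather than relying on defaults; for example (bucket names below are placeholders):

```yaml
loki:
  storage:
    type: s3
    bucketNames:
      chunks: my-org-loki-chunks
      ruler: my-org-loki-ruler
      admin: my-org-loki-admin
```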
⚡Open, Interoperable Storage with Iceberg Tables Now Generally Available
Snowflake has introduced general availability for Iceberg tables, allowing users to manage their data more flexibly and efficiently. Iceberg tables, built on the open-source Apache Iceberg format, enable interoperability across different compute engines, making it easier to integrate and manage data in various storage systems. This update enhances Snowflake's platform by providing better support for open data architectures like data lakes and lakehouses, allowing customers to leverage their existing data without needing to ingest it into Snowflake.
The new functionality includes improved security and governance features, cross-cloud data sharing, and better performance optimizations. It also supports complex data operations, such as metadata evolution and schema flexibility, which are essential for adapting to changing business needs. Snowflake's integration with Iceberg tables aims to simplify data management and enhance the overall data handling experience across different environments.
🛠️HackHub: Best Tools for the Cloud
🛠️zalando/spilo
Spilo is a Docker image that combines PostgreSQL and Patroni to create highly available PostgreSQL clusters with automatic failover.
🛠️riferrei/kafka-client-metrics-to-cloudwatch-with-kip-714
KIP-714 lets Apache Kafka clients push their metrics to the broker; this repository demonstrates collecting them from there and forwarding them to Amazon CloudWatch using OpenTelemetry, with a Docker Compose setup that requires Kafka 3.7.0+ running in KRaft mode.
🛠️palark/awesome-devops-telegram
The `awesome-devops-telegram` repository lists curated Telegram channels and groups focused on DevOps, SRE, and Platform Engineering.
🛠️awslabs/project-lakechain
Project Lakechain is a cloud-native, AI-powered framework for creating and scaling document processing pipelines on AWS with modularity and cost-efficiency.
🛠️aws/amazon-mwaa-docker-images
The `amazon-mwaa-docker-images` repository provides Docker images for running Amazon MWAA's Airflow v2.9.2 locally, supporting various build types for development, debugging, and experimentation.
📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.
If you have any comments or feedback, just reply back to this email.
Thanks for reading and have a great day!