Largest migration in Kubernetes history
And: Kubernetes has completed its largest ever code cleanup
CloudPro #48: Largest migration in Kubernetes history
Welcome to a brand new edition of CloudPro! Today, we’ll talk about:
MasterClass:
How Ahrefs Gets a Billion Dollar-Worth Infrastructure With a 90% Discount
AWS Event Bridge Scheduler as a workaround to SQS Fifo queue limitations
Secret Knowledge:
AWS EC2 Access: Unlocking Seamless Control with Session Manager
Adding flexibility to your deployments with Lambda Web Adapter
TechWave:
Amazon S3 will no longer charge for several HTTP error codes
New models added to the Phi-3 family, available on Microsoft Azure
HackHub: Best Tools for the Cloud
Cheers,
Editor-in-Chief
MasterClass: Tutorials & Guides
Upgrading Hundreds of Kubernetes Clusters
Automating upgrades for hundreds of Kubernetes clusters is a big challenge, but Pierre Mavro, co-founder and CTO at Qovery, and his team have done it for both public and private clouds. Bart Farrell interviewed Pierre to learn how they achieved this without spending too much money.
How Ahrefs Gets a Billion Dollar-Worth Infrastructure With a 90% Discount
Ahrefs saved around $1 billion over six years by using colocation instead of a cloud provider such as AWS. It spent $122 million on its own servers and data centers, compared with an estimated cost of over $1 billion on AWS. Server utilization is high (86-92%), and a small team of 11 people manages the infrastructure. Colocation provided the necessary capacity at a much lower cost while maintaining performance and growth.
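The "90% discount" in the headline can be sanity-checked from the two figures quoted above. A quick sketch, treating the AWS estimate as exactly $1 billion (an assumption: the summary only says "over $1 billion", so the real discount would be somewhat larger):

```python
# Figures from the summary above, in millions of US dollars.
COLOCATION_COST_M = 122
AWS_ESTIMATE_M = 1_000  # assumption: the lower bound of "over $1 billion"


def discount_percent(actual: float, alternative: float) -> float:
    """Percentage saved by paying `actual` instead of `alternative`."""
    return (1 - actual / alternative) * 100


savings_m = AWS_ESTIMATE_M - COLOCATION_COST_M
discount = discount_percent(COLOCATION_COST_M, AWS_ESTIMATE_M)
print(f"Saved at least ${savings_m}M, a discount of about {discount:.0f}%")
```

With this lower bound the discount comes out just under 90%; any AWS estimate above $1 billion pushes it closer to the headline figure.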
Our Source Control Best Practices
Use clear, human-readable branch names (e.g., name/feature).
Follow specific guidelines for writing commit messages.
Keep commits atomic and descriptive.
Use imperatives in commit messages (e.g., "Add validation").
Use GitHub Code Owners for PR reviews.
Authors decide when to merge after approvals.
Create a ticket for each PR.
Use comments to guide changes; prefer commit-based reviews.
Maintain a clean git history by squashing commits during review.
Best practices for Kubernetes Pod IP allocation in GKE
Kubernetes assigns a unique IP address to every Pod, and in a large cluster those addresses can run out quickly. One solution is to allocate Pod IPs from a separate address range instead of the range used by the rest of the network: keep the main network in one range and carve out additional space for Pods in another. This leaves enough room for growth. In addition, IP masquerading, which rewrites a Pod's source address to its node's address for selected traffic, keeps communication working with parts of the network that don't know about the Pod range.
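To get a feel for the numbers, here is a minimal sketch using Python's standard `ipaddress` module. Both CIDR ranges are hypothetical, and the /24 block of Pod addresses per node is an assumption made for illustration; check your own cluster's settings before relying on it.

```python
import ipaddress

# Hypothetical ranges, chosen only for illustration.
node_range = ipaddress.ip_network("10.0.0.0/22")   # main network (nodes)
pod_range = ipaddress.ip_network("172.16.0.0/16")  # separate range reserved for Pods

# Assumption: each node reserves a /24 block of Pod addresses.
PER_NODE_PREFIX = 24


def max_nodes(pods: ipaddress.IPv4Network, per_node_prefix: int) -> int:
    """Number of per-node blocks of the given prefix length that fit in the Pod range."""
    return 2 ** (per_node_prefix - pods.prefixlen)


# The two ranges must not overlap, otherwise Pod and node addresses could collide.
assert not node_range.overlaps(pod_range), "Pod range must not overlap the node range"

print(f"Node addresses: {node_range.num_addresses}")
print(f"Pod addresses:  {pod_range.num_addresses}")
print(f"Nodes supported by the Pod range: {max_nodes(pod_range, PER_NODE_PREFIX)}")
```

Under these assumptions the Pod range, not the node range, decides how far the cluster can grow, which is why giving Pods their own larger range helps.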
AWS Event Bridge Scheduler as a workaround to SQS Fifo queue limitations
Trustpilot uses Amazon SQS (Simple Queue Service) for asynchronous processing. The team needed to delay processing of some messages in a FIFO (First In, First Out) queue but couldn't do so directly, because FIFO queues don't support per-message delays.
Their workaround uses Amazon EventBridge Scheduler: instead of sending a message straight away, they create a schedule that delivers it to the FIFO queue at a specific time, so the message only enters the queue when it is due.
EventBridge Scheduler was a good fit because they only needed to delay messages occasionally.
They set up a ScheduleGroup to manage these schedules. Because the use case is rare and schedules are cleaned up after use, exceeding AWS quotas was not a concern.
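The summary doesn't show Trustpilot's code, so the following is only a sketch of what such a one-time schedule could look like. It builds the keyword arguments for EventBridge Scheduler's `create_schedule` call in boto3; the schedule name, group name, ARNs, and message body are hypothetical, and deleting the schedule after it fires is just one possible clean-up strategy.

```python
import json
from datetime import datetime, timedelta, timezone


def build_delayed_fifo_schedule(name, group_name, queue_arn, role_arn,
                                body, message_group_id, delay, now=None):
    """Build keyword arguments for EventBridge Scheduler's create_schedule call."""
    now = now or datetime.now(timezone.utc)
    run_at = now + delay
    return {
        "Name": name,
        "GroupName": group_name,
        # One-time schedule; the timestamp is interpreted as UTC by default.
        "ScheduleExpression": f"at({run_at.strftime('%Y-%m-%dT%H:%M:%S')})",
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            "Arn": queue_arn,
            "RoleArn": role_arn,  # role the scheduler assumes to send the message
            "Input": json.dumps(body),
            # FIFO queues require a message group ID.
            "SqsParameters": {"MessageGroupId": message_group_id},
        },
        # Assumption: remove the schedule automatically once it has run.
        "ActionAfterCompletion": "DELETE",
    }


# All names and ARNs below are hypothetical placeholders.
kwargs = build_delayed_fifo_schedule(
    name="delay-order-42",
    group_name="delayed-fifo-messages",
    queue_arn="arn:aws:sqs:eu-west-1:123456789012:orders.fifo",
    role_arn="arn:aws:iam::123456789012:role/scheduler-to-sqs",
    body={"orderId": 42},
    message_group_id="orders",
    delay=timedelta(minutes=15),
)
print(json.dumps(kwargs, indent=2))
# To create the schedule for real (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("scheduler").create_schedule(**kwargs)
```

Keeping the request construction separate from the AWS call makes the timing logic easy to test without touching an AWS account.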
Secret Knowledge: Learning Resources
Automatic Image Update to Git using Flux and GitHub Actions
Flux automates image updates with:
Image Reflector Controller: Scans repositories and updates metadata.
Image Automation Controller: Updates Git with new image tags and triggers deployments.
Steps to Implement with Flux and GitHub Actions
Bootstrap Flux with `flux bootstrap`.
Define policies for managing image tags.
Automate workflows to build, push images, and trigger Flux deployments.
Use `flux create secret git` for GitHub repository access.
Use `kubectl` to monitor deployment status.
Scalable Web Scraping with Serverless
This guide explains how to create a scalable web scraping infrastructure using the Serverless Framework and AWS services. Serverless computing eliminates the need for managing infrastructure, automatically scales based on demand, and is cost-effective as you only pay for the compute time used.
Explore Metrics | Grafana documentation
Explore Metrics is a Grafana feature for finding and analyzing metrics from Prometheus or similar sources without writing complex queries. Here's how it works:
You can browse metrics without writing any queries.
Easily slice and dice metrics based on their labels to spot anomalies and issues.
It automatically chooses the right visualization for your metric, so you don't have to build it yourself.
It shows other metrics related to the one you're looking at.
You can expand a drawer over a dashboard to see more content without losing your place.
It keeps track of the steps you've taken while exploring metrics.
AWS EC2 Access: Unlocking Seamless Control with Session Manager
AWS Session Manager replaces traditional SSH key-based access to EC2 instances with a secure, centralized, and user-friendly approach. It leverages IAM roles for access control, encrypts sessions by default, logs user actions for auditability, and eliminates the need for open inbound ports. By deploying the SSM Agent on instances, it establishes secure connections, simplifying workflows and enhancing security.
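As a small illustration, opening a shell through Session Manager comes down to one AWS CLI command instead of an SSH invocation with a key file. The sketch below only builds that command; the instance ID and region are hypothetical, and actually running it requires the AWS CLI with the Session Manager plugin installed.

```python
import shlex


def start_session_command(instance_id, profile=None, region=None):
    """Build the AWS CLI argv that opens a Session Manager session on an instance."""
    cmd = ["aws", "ssm", "start-session", "--target", instance_id]
    if profile:
        cmd += ["--profile", profile]
    if region:
        cmd += ["--region", region]
    return cmd


# Hypothetical instance ID and region.
cmd = start_session_command("i-0123456789abcdef0", region="eu-west-1")
print(shlex.join(cmd))
# To actually connect:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

No key pair or inbound port appears anywhere in the command: access is decided by the IAM permissions of the caller.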
Adding flexibility to your deployments with Lambda Web Adapter
Lambda Web Adapter (LWA) simplifies deploying web apps on Lambda functions without code changes. It adds flexibility by allowing easy transitions between Lambda and ECS Fargate deployments. LWA works as a Lambda extension, forwarding events to your HTTP server. It supports various triggers and packaging formats, making it versatile. With CDK, you can deploy web apps with LWA and transition seamlessly between Lambda and ECS Fargate, ensuring flexibility and ease of deployment.
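The "no code changes" point is easiest to see from the application's side: it is just an ordinary HTTP server. Below is a minimal sketch using only Python's standard library. It assumes the adapter is configured to forward requests to the port given in a `PORT` environment variable, falling back to 8080; check the adapter's documentation for the exact port and readiness settings.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    """Plain HTTP handler with nothing Lambda-specific in it."""

    def do_GET(self):
        body = b"hello from a plain HTTP server\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep request logging quiet in this sketch


def make_server(port=None):
    # Assumption: the listening port comes from PORT, defaulting to 8080.
    if port is None:
        port = int(os.environ.get("PORT", "8080"))
    return ThreadingHTTPServer(("0.0.0.0", port), Handler)


def main():
    """Entry point for the container image; call this to start serving."""
    make_server().serve_forever()
```

Because the server knows nothing about Lambda events, the same image could in principle run behind the adapter on Lambda or directly as an ECS Fargate task, which is the flexibility the article describes.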
TechWave: Cloud News & Analysis
Largest migration in Kubernetes history
The Kubernetes project has successfully moved all cloud provider integrations from its core repository to external plugins. This shift was driven by the need to simplify maintenance and establish Kubernetes as a vendor-neutral platform. Approximately 1.5 million lines of code were removed, and core component binary sizes were reduced by about 40%.
Now that this migration is complete, efforts will focus on enhancing Kubernetes' integration with cloud providers, especially in hybrid environments, and on improving testing frameworks to ensure compatibility across providers. If you are still running an older Kubernetes version with an in-tree cloud provider, you should migrate to an external cloud provider, because in-tree cloud providers are permanently disabled starting in v1.31.
CVE alert: If you are using Flux with Azure Storage you may be affected.
A bug exposed credentials in error logs when connecting to Azure Blob Storage, and an attacker who obtained those logs could use them to access the storage. The issue is fixed in version 1.2.5. There is no workaround other than changing the authentication method. It was discovered and fixed by Jagpreet Singh Tamber from the Azure Arc team.
GitLab Readies Enterprise Edition of AI Tools for DevOps
GitLab is introducing an enterprise edition of its AI tools for DevOps. This includes features like automated security vulnerability detection and fixing, issue summarization, and collaboration enhancements. The platform also offers a dashboard for tracking AI feature usage and impact. GitLab Duo Enterprise supports self-hosted model deployments for secure environments.
The update also includes a CI/CD catalog for discovering and reusing pre-built components, observability tools, project planning features, and a secrets manager. GitLab is also releasing a version for Google Cloud Platform's private cloud for compliance needs. This move reflects a growing trend of using AI in DevOps workflows to manage increasing codebase sizes efficiently.
Amazon S3 will no longer charge for several HTTP error codes
Amazon S3 is changing its billing policy for certain HTTP error codes. Customers will no longer be charged for unauthorized requests they didn't initiate. This means bucket owners won't incur charges for HTTP 403 (Access Denied) errors from outside their AWS account. The change applies to all S3 buckets and regions.
New models added to the Phi-3 family, available on Microsoft Azure
Microsoft has added new models to the Phi-3 family on Azure. Phi-3-vision, the first multimodal model in the family, can reason over text and images; it joins the previously announced Phi-3-small and Phi-3-medium. The models are optimized for generative AI applications, offering strong reasoning with limited compute requirements, and are available through Azure AI's models-as-a-service offering. They are designed to run on a variety of hardware and platforms, including mobile and web deployments, and have already been used in applications such as assisting Indian farmers and improving math tutoring.
HackHub: Best Tools for the Cloud
Prevent cloud misconfigurations and find vulnerabilities during build-time in infrastructure as code, container images and open source packages with Checkov by Bridgecrew.
Vulnerability scanner for container images and filesystems
Cloudflare Tunnel client (formerly Argo Tunnel)
Debugging tool for Kubernetes which tests and displays connectivity between nodes in the cluster.
AliyunContainerService/log-pilot: Collect logs for Docker containers