Welcome to a brand-new edition of CloudPro! Today, we’ll talk about:
Masterclass:
Argo CD and Flux CD are not the only GitOps Tools for Kubernetes
How to handle execution timeouts in AWS Step Functions
Secret Knowledge:
I built a WordPress AI plugin to make authors more productive
Rightsizing Your Lambdas: Lambda Power Tuning & Compute
Techwave:
Introducing GPT-4o: OpenAI’s new flagship multimodal model now in preview on Azure
How Slack adopted Karpenter to increase Operational and Cost Efficiency
HackHub: Best Tools for the Cloud
Condenses cloud events and maps them to user input actions in the management console UI
Wait for multiple services to become available with zero dependencies
Cheers,
Editor-in-Chief
Forwarded this email? Sign up here
MasterClass: Tutorials & Guides
Argo CD and Flux CD are not the only GitOps Tools for Kubernetes
This blog post explains that Sveltos is a strong tool for managing Kubernetes applications, especially when used with Flux CD. It compares Sveltos to Argo CD, noting that Sveltos is more efficient with resources, supports multiple tenants, and is flexible in deployment and templating. Although Argo CD is more popular and has more contributors, Sveltos has unique features and growth potential.
Forgetting the Customer: Monitoring technical metrics without understanding their impact on customer experience is ineffective. Always prioritize tracking customer outcomes.
Environment Inconsistency: Ensure observability tools and configurations are consistent across all environments to identify issues early and prevent production incidents.
Not Understanding Your Ecosystem: Know your dependencies and consumers to avoid blind spots and focus on critical components.
No Consistent Trace ID: Use a unique trace ID across all components to track customer interactions in complex systems.
The Big Dumb Metric: Avoid overly aggregated metrics that lack actionable insights. Track specific, meaningful interactions.
Hacking on PostgreSQL is Really Hard
Hacking on PostgreSQL is challenging for several reasons, both technical and social. The technical difficulty of writing correct patches is high, as even experienced developers struggle to catch all bugs before committing. For example, the author describes their own experience working on incremental backup, which, despite thorough testing, still required numerous fixes after being committed.
The complexity of the codebase, the need for precise and exhaustive testing, and the high standards for patch acceptance make development demanding. This environment creates a bottleneck for contributions, as committers must invest substantial time in reviewing and fixing patches, making them cautious about accepting new contributions.
Additionally, the pressure to avoid mistakes affects community interactions, leading to frustration among contributors and reviewers. As a result, becoming and remaining a committer requires significant ongoing effort, limiting the pool of active committers and slowing down the rate at which new features are added and stabilized.
Serverless technology, like AWS Lambda introduced in 2014, abstracts away infrastructure management, letting developers focus on writing code rather than handling servers. However, this abstraction can be deceptive, creating an illusion of simplicity while hiding the complexities of distributed systems.
In a serverless environment, functions are distributed, invoked via queues, and subject to constraints like cold starts and throttling. Cold starts cause delays when no instances are available, while throttling limits the number of concurrent function executions, resulting in potential errors and increased latency. Developers must also manage various components and configurations, handle out-of-order or duplicate messages, and implement retry logic.
While serverless reduces the need for traditional operational skills, it increases the demand for expertise in distributed system design. This shift can complicate development, as fine-grained, asynchronous applications require careful handling of run-time behaviors and configurations, which are often complex and platform-specific.
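The duplicate-message and retry concerns above can be sketched in a few lines. This is an illustrative Python sketch, not any platform's API: the `seen_ids` set stands in for a durable idempotency store (e.g. a DynamoDB table) that a real serverless app would use.

```python
import time

def process_batch(messages, handler, seen_ids, max_retries=3):
    """Process queue messages idempotently: skip duplicate deliveries by
    message id and retry transient handler failures with exponential
    backoff. 'seen_ids' is a stand-in for a durable store."""
    results = []
    for msg in messages:
        if msg["id"] in seen_ids:  # duplicate delivery: skip
            continue
        for attempt in range(max_retries):
            try:
                results.append(handler(msg["body"]))
                seen_ids.add(msg["id"])  # mark processed only on success
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise
                time.sleep(0.01 * 2 ** attempt)  # back off, then retry
    return results
```

Marking a message as seen only after the handler succeeds is what keeps a crash mid-batch safe to retry.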
How to handle execution timeouts in AWS Step Functions
Task State Timeouts:
Use the Catch clause to handle States.Timeout errors and perform automated remediation.
Execution Timeouts:
EventBridge: For Standard Workflows, use EventBridge rules to catch TIMED_OUT events and trigger a Lambda function. This is not applicable for Express Workflows.
CloudWatch Logs: For both Standard and Express Workflows, use CloudWatch log subscriptions to send timeout events to a Lambda function. Extract details from the logs.
Nested Workflows: Nest the state machine inside a parent Standard Workflow. This works for both Standard and Express Workflows and provides access to input and output.
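The task-level pattern from the first bullet looks roughly like this in Amazon States Language (the function ARN and state names are placeholders, not the article's example):

```json
{
  "Comment": "Sketch: task-level timeout with a Catch on States.Timeout",
  "StartAt": "DoWork",
  "States": {
    "DoWork": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:Work",
      "TimeoutSeconds": 30,
      "Catch": [
        { "ErrorEquals": ["States.Timeout"], "Next": "Remediate" }
      ],
      "Next": "Done"
    },
    "Remediate": { "Type": "Pass", "End": true },
    "Done": { "Type": "Pass", "End": true }
  }
}
```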
Secret Knowledge: Learning Resources
I built a WordPress AI plugin to make authors more productive
The author developed a WordPress plugin that integrates AI content generation using Amazon Bedrock. This plugin aims to boost authors' productivity by streamlining content creation within the WordPress editor.
To utilize the plugin, users need:
An AWS account with access to Amazon Bedrock
Docker installed on their development machine
Familiarity with basic JavaScript
The plugin setup involves several steps:
Setting up an IAM user with appropriate permissions and obtaining access keys.
Enabling foundation models in the desired AWS region.
Setting up a WordPress environment using Docker.
Writing the initial PHP file for the plugin.
Adding the AWS SDK for PHP v3 to the plugin using Composer.
Throughout the process, security best practices are emphasized, such as least-privilege IAM policies and encryption of credentials (though plaintext storage is used for simplicity in this tutorial). Additionally, caching is implemented for improved performance, and error handling ensures a smooth user experience.
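The tutorial's plugin is written in PHP with the AWS SDK, but the Bedrock request shape is language-agnostic. As a sketch, here is how the JSON body for an `invoke_model` call might be built in Python, assuming an Anthropic Claude model on Bedrock (the post may use a different foundation model); actually sending it would use boto3's `bedrock-runtime` client and AWS credentials, so this stops at constructing the request:

```python
import json

def build_bedrock_request(prompt, max_tokens=512):
    """Build the JSON body for a Bedrock invoke_model call against an
    Anthropic Claude model (Messages API schema)."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(body)
```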
Rightsizing Your Lambdas: Lambda Power Tuning & Compute
Rightsizing your Lambdas means optimizing them for both performance and cost efficiency. Cloud computing and serverless technology have changed how we manage infrastructure, but they bring challenges, especially in cost management. To manage costs effectively, we must understand how each resource contributes to costs and find ways to optimize them. Two tools from AWS, Lambda Power Tuning and Cost Optimizer, can help with this.
Lambda pricing depends on three main factors: the number of executions, time per execution, and allocated memory. You pay per execution, with longer execution times costing more. Allocating more memory than needed leads to unnecessary charges, while too little memory causes execution failure. So, finding the right balance is crucial.
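Those three factors multiply together, which a small estimator makes concrete. The rates below are illustrative (roughly x86 in us-east-1 at the time of writing); check current AWS pricing before relying on them.

```python
def lambda_monthly_cost(invocations, avg_duration_ms, memory_mb,
                        gb_second_rate=0.0000166667,
                        request_rate_per_million=0.20):
    """Estimate Lambda cost from the three pricing factors: execution
    count, duration, and allocated memory. Rates are illustrative."""
    # Compute is billed in GB-seconds: duration (s) x memory (GB) per call.
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute = gb_seconds * gb_second_rate
    requests = (invocations / 1_000_000) * request_rate_per_million
    return compute + requests
```

For example, one million 200 ms invocations at 512 MB come to about $1.87 a month at these rates; doubling the memory roughly doubles the compute portion, which is why rightsizing matters.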
Building a GitOps CI/CD Pipeline with GitHub Actions
This guide explains how to set up a GitOps-based CI/CD pipeline using GitHub Actions, focusing on SOC 2 compliance. The pipeline involves two git repositories: one for the application code and artifacts, and another for defining infrastructure resources and configurations. Changes go through pull requests for controlled deployment.
The publish flow involves opening a pull request for changes, running automated tests, merging changes to the main branch, and publishing artifacts. A workflow triggers these steps on pull requests and pushes to the main branch.
The deploy flow in the infrastructure repository updates environments based on configurations. Changes also go through pull requests and automated tests. Multiple environments are managed through directories, not branches. Push-based deployment is used, but pull-based deployment is an option. Rollback involves reverting changes via pull requests. Hotfixes follow a similar process with a dedicated workflow.
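The publish flow above can be sketched as a single GitHub Actions workflow: tests run on pull requests, and artifacts publish only after a merge to main. Job names and the `make` targets are placeholders, not the article's exact pipeline:

```yaml
name: publish
on:
  pull_request:
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test
  publish:
    # Only publish from main, and only after tests pass.
    if: github.ref == 'refs/heads/main'
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make publish
```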
How OpenTelemetry records errors
Errors are unexpected issues hindering program execution, while exceptions disrupt the normal flow of a program. OTel's specification provides guidelines for error handling across languages, standardizing implementation.
OTel records errors on spans, the building blocks of distributed traces, and logs, structured, timestamped messages emitted by services. Spans can be enhanced with metadata and span events, providing descriptive information about errors. Logs can also capture errors and correlate them with traces.
Different observability backends may visualize OTel errors differently, but trace and log correlation should enable navigation between errors and associated traces/logs.
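Conceptually, recording an error means setting the span's status and attaching the exception as a timestamped span event with standard attributes. The sketch below uses plain dicts to show that shape; the real opentelemetry-api calls are `span.set_status(...)` and `span.record_exception(...)`.

```python
import time

def record_error_on_span(span, exc):
    """Conceptual model of OTel error recording: set an error status and
    append the exception as a span event. Dicts stand in for the real
    opentelemetry-api Span object."""
    span["status"] = {"code": "ERROR", "description": str(exc)}
    span["events"].append({
        "name": "exception",
        "timestamp": time.time_ns(),
        "attributes": {
            "exception.type": type(exc).__name__,
            "exception.message": str(exc),
        },
    })
    return span
```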
Autoscaling Kubernetes workloads with KEDA
This article explains how to autoscale Kubernetes workloads on Amazon EKS using KEDA and Amazon Managed Service for Prometheus metrics. It guides through setup, deployment, and testing, showing how to leverage KEDA to scale based on real-time demand signals like metrics. The integration enables efficient, automated scaling tailored to application needs, promoting dynamic resource allocation and cost optimization.
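A KEDA setup like the one described centers on a ScaledObject that ties a workload to a Prometheus query. This is a sketch, not the article's manifest: the deployment name, Prometheus endpoint, and query are placeholders.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-scaler
spec:
  scaleTargetRef:
    name: orders          # the Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(rate(http_requests_total{app="orders"}[2m]))
        threshold: "100"   # target value per replica
```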
TechWave: Cloud News & Analysis
Join Kubernetes as a SIG Docs reviewer
Joining SIG Docs means using your knack for clarity to make Kubernetes more accessible. As a reviewer, you'll help maintain top-notch documentation, expand your Kubernetes know-how, and grow your network. No prior experience needed; just dive in, learn, and contribute. Start by familiarizing yourself with SIG Docs, join the community on Slack, review pull requests, attend meetings, and shadow experienced reviewers. Your journey doesn't stop there; progress through the contributor ladder and make a lasting impact on the Kubernetes community.
Introducing GPT-4o: OpenAI’s new flagship multimodal model now in preview on Azure
GPT-4o combines text, vision, and audio capabilities, offering a more immersive AI experience. It is accessible through Azure OpenAI Service, focusing initially on text and vision inputs with plans to expand to audio and video. It's designed for efficiency and cost-effectiveness, and opens up various possibilities for customer service, analytics, and content creation.
How Slack adopted Karpenter to increase Operational and Cost Efficiency
Slack adopted Karpenter, an open-source cluster autoscaler, to boost operational efficiency and cut costs in their Amazon EKS environment. Before Karpenter, managing multiple Auto Scaling groups (ASGs) had become cumbersome, slowing down upgrades and increasing complexity. Karpenter automates node provisioning based on pod requirements, optimizing instance types and eliminating idle instances.
With Karpenter, Slack saw faster node provisioning, improved upgrade procedures, and reduced costs. They achieved 12% savings on their EKS compute cost and are working on further enhancements like managed Karpenter and customizing kubelets.
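Instead of maintaining many ASGs, Karpenter lets you declare flexible constraints and picks instance types to fit pending pods. A sketch of a NodePool using the v1beta1 API (values are illustrative, not Slack's configuration):

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
      nodeClassRef:
        name: default
  disruption:
    consolidationPolicy: WhenUnderutilized  # reclaim idle capacity
```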
What OpenTofu 1.7 Means for DevSecOps
OpenTofu 1.7.0 brings significant advancements for DevSecOps practitioners. Key highlights include end-to-end state encryption, allowing secure storage of state files across any storage backend. This release also offers flexible encryption passphrase management, supporting environment variables or integration with robust key management systems like AWS KMS or GCP KMS.
While OpenTofu lacks built-in policy enforcement like HashiCorp Sentinel, it supports integration with third-party tools like Open Policy Agent (OPA) for embedding security policies alongside configurations. Additionally, OpenTofu 1.7.0 introduces dynamic provider-defined functions, enabling providers to offer resources and native functions directly within OpenTofu code.
Other notable features include the ability to mark specific resources for removal from state files and loopable import blocks for streamlined resource migration. With detailed migration guides and comprehensive documentation, OpenTofu 1.7.0 promises an easy upgrade path from Terraform for DevSecOps practitioners.
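State encryption is configured declaratively. A minimal sketch using a passphrase-derived key (block labels like `passphrase_key` are our own; a production setup would swap the `pbkdf2` key provider for AWS KMS or GCP KMS, as the release supports):

```hcl
terraform {
  encryption {
    key_provider "pbkdf2" "passphrase_key" {
      passphrase = var.state_passphrase  # e.g. from an environment variable
    }
    method "aes_gcm" "default" {
      keys = key_provider.pbkdf2.passphrase_key
    }
    state {
      method = method.aes_gcm.default
    }
  }
}
```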
Embrace Brings OpenTelemetry to Mobile Developers
Embrace has integrated OpenTelemetry into its iOS and Android SDKs, offering mobile developers a powerful tool for comprehensive mobile observability. This collaboration combines Embrace's insights into user experiences with OpenTelemetry's transparent and extensible data collection capabilities.
With this integration, engineers gain full visibility into mobile user sessions, enabling them to quickly identify and address performance issues. Embrace's SDKs now allow for sending logs and spans to any OTLP-capable tracing and logging backend, with metrics support coming soon. This partnership enhances mobile telemetry, providing context-aware data critical for maintaining highly performant mobile apps.
AI-powered insights for continuous profiling
Flame graph AI in Grafana Cloud uses artificial intelligence to simplify and improve the analysis of flame graphs, which are complex visualizations of profiling data. This AI tool helps developers quickly identify performance bottlenecks, understand their root causes, and suggests fixes. In tests, the AI outperformed most human users in interpreting flame graphs accurately.
This feature has now been integrated into Grafana Cloud Profiles, making it easier for developers to optimize their code and improve performance. Additionally, a GitHub integration allows for even more detailed analysis by linking flame graph nodes directly to source code, enabling targeted optimizations.
HackHub: Best Tools for the Cloud
Glasskube is a new Kubernetes package manager currently in beta. It simplifies the installation, upgrading, configuration, and management of Kubernetes cluster packages, reducing complexity and streamlining maintenance tasks. With a user-friendly UI and CLI, Glasskube offers automated updates, dependency awareness, and integration with GitOps tools like ArgoCD or Flux. It also features a central package repository and planned additions like Cluster Scan for insights into cluster packages and Version Pinning for package version control.
Permiso-io-tools/CloudConsoleCartographer
Cloud Console Cartographer is an open-source tool introduced at Black Hat Asia. It condenses cloud events like CloudTrail logs, mapping them to user input actions in the management console UI for easier analysis. This tool helps defenders by parsing relevant data from numerous events generated by a single user action, providing context and visibility into the user's activity.
devkit-io/serverless-lambda-cron-cdk
The AWS Lambda Cron Jobs Starter Kit is a ready-to-use package for setting up cron jobs with AWS Lambda. It includes deployment code using AWS CDK, a CI/CD pipeline, and the Lambda function's source code.
Once deployed, the Lambda function will run based on the schedule defined in the CDK code. You can adjust the schedule by editing the code. The source code for the Lambda function can be found in the source folder and can be modified as needed.
Talm is a tool similar to Helm, but specifically for Talos Linux. It simplifies managing Talos using GitOps principles. Key features include automatic discovery of node information, customization through templates, GitOps-friendly patch generation, and simplicity of use. Talm offers commands for upgrading nodes, showing differences, and re-templating files. It also supports talosctl commands with added file options.
is_ready is a program designed to wait until multiple addresses become accessible, useful for coordinating the startup of interconnected services like Docker containers. It stands out for its ability to wait for multiple addresses and being a self-contained binary, meaning it doesn't rely on external dependencies.
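The core idea — poll a set of TCP addresses until all accept connections — fits in a short sketch. The real tool is a dependency-free binary; this Python version is just an illustration of the technique, not its implementation.

```python
import socket
import time

def wait_for(addresses, timeout=10.0, interval=0.1):
    """Wait until every (host, port) pair accepts a TCP connection.
    Returns True once all are reachable, False on overall timeout."""
    deadline = time.monotonic() + timeout
    pending = list(addresses)
    while pending and time.monotonic() < deadline:
        still_pending = []
        for host, port in pending:
            try:
                with socket.create_connection((host, port), timeout=1.0):
                    pass  # reachable: drop it from the pending list
            except OSError:
                still_pending.append((host, port))
        pending = still_pending
        if pending:
            time.sleep(interval)
    return not pending
```

This is handy in entrypoint scripts: block until the database and message broker ports answer, then start the service.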
If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.
If you have any comments or feedback, just reply back to this email.
Thanks for reading and have a great day!