CloudPro #21: The biggest data migration in the history of the planet: Gmail’s migration to Spanner

Get smarter about cloud and security

Packt

Nov 03, 2023

Welcome to a brand new edition of CloudPro! Today, we’ll talk about:

Masterclass:

Secret Knowledge:

Techwave:

From the Cloud World:

HackHub:

Eraser: clean and secure Kubernetes nodes for containerized applications
permify: open-source authorization service inspired by Google Zanzibar
oneinfra: Kubernetes as a Service
vagrant: tool for building and distributing development environments
client-go: Go client for Kubernetes

Cheers,

Shreyans Singh

Editor-in-Chief

PS: I hope you will enjoy today's newsletter! I’m all ears for your thoughts – the good, the great, and the "meh." Share your feedback and snag a free Packt eBook. It's a win-win. Can't wait to hear what you think!

SAFe® for DevOps Practitioners ($37.99 Value) FREE for a limited time!

Discover how the DevOps approach with Scaled Agile Framework helps you develop and deliver high-quality, secured solutions with a reduced risk of production failures with this step-by-step guide.

Download this limited-time offer before it expires on November 10th.

⭐ MasterClass: Tutorials & Guides

⭐The biggest data migration in the history of the planet: Gmail’s migration to Spanner

The Gmail team had to keep Gmail working smoothly throughout the transition. To do this, they built a system that switched each user's email account from the old storage system to Spanner without interruption. This meant carefully managing the data transfer so that users could access their emails at all times.
During the migration, they encountered some unusual and unexpected issues, which they nicknamed "Spannacles." These were rare and unique problems, like emails with very long attachment names or unusual characters in emails. They had to come up with custom solutions for each of these issues to ensure a smooth transition.
The benefits of this migration included reducing the amount of work required to maintain the email system, allowing the team to focus on improving Gmail's features rather than managing the database.
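
To make the per-account switch concrete, here is a minimal Python sketch of routing each user's reads to whichever backend currently owns their mailbox. It is an illustration only, not Gmail's actual system: `MailRouter`, the in-memory dictionaries standing in for the two stores, and the copy-verify-flip order are all assumptions.

```python
class MailRouter:
    """Routes each user to the store that currently owns their mailbox (hypothetical sketch)."""

    def __init__(self, legacy_store, spanner_store):
        self.legacy = legacy_store    # dict: user -> messages (stand-in for the old system)
        self.spanner = spanner_store  # dict: user -> messages (stand-in for Spanner)
        self.migrated = set()         # users whose mailbox now lives in the new store

    def read(self, user):
        # Reads always go to exactly one store, chosen per user.
        store = self.spanner if user in self.migrated else self.legacy
        return list(store.get(user, []))

    def migrate(self, user):
        # Copy first, verify the copy, and only then flip the per-user flag.
        source = self.legacy.get(user, [])
        self.spanner[user] = list(source)
        if self.spanner[user] != source:
            raise RuntimeError(f"copy mismatch for {user}")
        self.migrated.add(user)


router = MailRouter({"alice": ["hello"], "bob": ["hi"]}, {})
before = router.read("alice")
router.migrate("alice")
after = router.read("alice")
```

In this sketch "alice" sees the same messages before and after her account is switched, while "bob" keeps reading from the old store until his own migration runs.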

⭐DevOps is Bullsh*t. A Critique of How We've Fooled Ourselves for Years.

The article is critical of the way DevOps is often implemented in organizations. It argues that DevOps has devolved into a bureaucratic and ineffective system, where engineers are forced to do operations work they don't want to do. The author suggests that operations tasks should be handled by experts in a separate team rather than by individual engineers. The article also discusses the challenges of planning, coding, building, testing, and deploying in the context of DevOps and proposes that organizations should focus on platform engineering and enabling developer self-service. The author emphasizes the need for expertise and the development of a great internal developer platform.

⭐He contributed the same fix to Terraform and OpenTofu. What happened next:

The author encountered a bug in Terraform related to the use of KMS key aliases in the S3 backend configuration. The bug was introduced in the 1.6.0 version of Terraform. They decided to contribute a fix for this issue to both Terraform and OpenTofu.
The article highlights the different experiences the author had when contributing to these projects. OpenTofu responded positively and engaged in a friendly and insightful discussion about the issue. The change was quickly accepted as a necessary fix for the regression.
In contrast, Terraform's response was described as odd and frustrating. The author felt that their contribution was not properly acknowledged, and a Terraform team member merged the fix without proper communication, taking credit for the change.

⭐The Future of Kubernetes: Rancher RKE2 and Cilium CNI:

This article discusses the future of Kubernetes with a focus on Rancher RKE2 and Cilium CNI.

RKE2 Selection: Choosing the right Kubernetes distribution depends on your specific use case. RKE2 is highlighted for its robustness in configuring clusters with a strong emphasis on security and compliance.
Cilium CNI Significance: Cilium is not just another Container Networking Interface. It offers various features, including load balancing and egress gateways.
The article focuses on deploying RKE2 with Cilium CNI, aiming for kube-proxyless operation. Cilium employs eBPF to streamline packet routing, eliminating the need for time-consuming iptables rule evaluations. This results in improved network performance and a reduced host load, capable of handling higher packet volumes.
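
The performance argument can be illustrated with a toy Python comparison between walking an ordered rule list and doing a single hash-map lookup. This is only a conceptual sketch, not how iptables, the kernel, or Cilium are implemented; the function names, addresses, and backend labels are made up for illustration.

```python
def lookup_linear(rules, service_ip):
    """Walk an ordered rule list until one matches, counting comparisons (iptables-style)."""
    for checked, (ip, backend) in enumerate(rules, start=1):
        if ip == service_ip:
            return backend, checked
    return None, len(rules)


def lookup_hashed(table, service_ip):
    """Resolve the backend with one hash-map lookup (in the spirit of an eBPF map)."""
    return table.get(service_ip)


# 5,000 hypothetical service addresses, each mapped to one backend.
rules = [(f"10.0.{i // 256}.{i % 256}", f"backend-{i}") for i in range(5000)]
table = dict(rules)

# The last rule in the list is the worst case for the linear walk.
backend, comparisons = lookup_linear(rules, "10.0.19.135")
hashed_backend = lookup_hashed(table, "10.0.19.135")
```

Both lookups return the same backend, but the linear walk has to compare against every rule before it reaches the last one, whereas the hash lookup cost does not grow with the number of services.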

⭐Security considerations for running containers on Amazon ECS:

This article provides best practices for enhancing security when running containers on Amazon Elastic Container Service (ECS), grouped into the following areas.

Shared Responsibility Model: AWS and customers share security responsibilities, with AWS managing infrastructure and the customer managing container security.
Access Management: Use IAM policies, roles, and automated pipelines for access control.
Network Security: Isolate containers with network segmentation, use encryption, separate VPCs, and AWS PrivateLink for security.
Secrets Management: Store secrets in AWS Secrets Manager or Parameter Store and mount them using sidecar containers.
Task and Runtime Security: Secure container images, enable ECR tag immutability, avoid privileged mode, and disable ECS Exec for production environments.
Logging and Monitoring: Monitor root-user activities, task changes, and container activity metrics. Utilize VPC Flow Logs for network traffic analysis.
Security Compliance: Ensure compliance with data sensitivity, company objectives, and regulations, using AWS Security Hub for monitoring.
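
Two of the checks above (avoiding privileged mode and keeping secrets out of plain configuration) can be sketched as a small audit over an ECS task definition. The field names `containerDefinitions`, `privileged`, `environment`, and `secrets` with `valueFrom` are real task definition fields; the function name, the name-matching heuristic, and the sample ARN are assumptions made for illustration, and the sketch is not an AWS tool.

```python
import re

# Heuristic only: variable names that usually indicate a credential.
SECRET_NAME = re.compile(r"(password|secret|token|api_?key)", re.IGNORECASE)


def audit_task_definition(task_def):
    """Return a list of findings for one ECS task definition given as a dict."""
    findings = []
    for container in task_def.get("containerDefinitions", []):
        name = container.get("name", "<unnamed>")
        if container.get("privileged", False):
            findings.append(f"{name}: runs in privileged mode")
        for env in container.get("environment", []):
            if SECRET_NAME.search(env.get("name", "")):
                findings.append(
                    f"{name}: {env['name']} is a plaintext environment variable"
                )
    return findings


task_def = {
    "containerDefinitions": [
        {
            "name": "web",
            "privileged": True,
            "environment": [{"name": "DB_PASSWORD", "value": "hunter2"}],
        },
        {
            "name": "worker",
            # Hypothetical ARN: the value is resolved from Secrets Manager at launch.
            "secrets": [
                {
                    "name": "DB_PASSWORD",
                    "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:db-password",
                }
            ],
        },
    ]
}
findings = audit_task_definition(task_def)
```

Here the "web" container is flagged twice, while "worker" passes because it references the secret through `secrets` instead of embedding the value.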

🔍Secret Knowledge: Learning Resources

🔍How to upgrade hundreds of Kubernetes clusters:

This content explains how to upgrade multiple Kubernetes clusters without causing disruptions. Pierre shares insights on the process, tools, and testing strategies for large-scale upgrades. Key takeaways include staying informed about Kubernetes changes, safe Helm chart upgrades, handling Custom Resource Definitions (CRDs), testing for API deprecations, and the role of automation. Pierre also shares his experience managing stateful applications on thousands of nodes.
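
Testing for API deprecations can be as simple as scanning manifests for API versions that no longer exist in the target release. The sketch below is a minimal illustration, not the tooling described in the talk: the function names and sample manifests are hypothetical, and the table lists only three well-known removals rather than the full set.

```python
# (apiVersion, kind) -> Kubernetes release in which that API was removed.
REMOVED_APIS = {
    ("extensions/v1beta1", "Ingress"): "1.22",
    ("batch/v1beta1", "CronJob"): "1.25",
    ("policy/v1beta1", "PodSecurityPolicy"): "1.25",
}


def _version_tuple(version):
    """Turn '1.25' into (1, 25) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def find_removed_apis(manifests, target_version):
    """List manifests whose API is no longer served in target_version."""
    hits = []
    for manifest in manifests:
        key = (manifest.get("apiVersion"), manifest.get("kind"))
        removed_in = REMOVED_APIS.get(key)
        if removed_in and _version_tuple(target_version) >= _version_tuple(removed_in):
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            hits.append((name, key[0], key[1], removed_in))
    return hits


manifests = [
    {"apiVersion": "batch/v1beta1", "kind": "CronJob", "metadata": {"name": "nightly"}},
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "web"}},
]
blocked = find_removed_apis(manifests, "1.25")
safe = find_removed_apis(manifests, "1.24")
```

Running a check like this in CI before each cluster upgrade surfaces the "nightly" CronJob as a blocker for 1.25 while leaving a 1.24 upgrade clear.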

🔍Systems that support WAY more revenue than they should:

An interesting discussion on Reddit where people share examples of systems that generated significant revenue despite being quite makeshift. Here’s one:

This company had a marketing analytics system that evaluated the effectiveness of ads using data from ad vendors.
Their entire tech stack ran on a single, relatively large AWS server (4xlarge).
This server hosted:
1. The sole MySQL database.
2. The complete ETL (Extract, Transform, Load) pipeline.
3. A client SFTP dropbox.
4. An nginx server with a few ingestion API apps.
Shockingly, they had no backups for their MySQL database, and deployments were done by FTPing into the server.
With just this single server, they managed to pull in $8 million in revenue during the first year, despite a highly unconventional setup.

🔍K8s is not the platform – or is it and we all misunderstood?

This article discusses how Kubernetes (K8s) has evolved to become a central component in building developer platforms. Initially seen as a low-level tool, K8s now plays a key role in platform development. Custom Resource Definitions (CRDs) in K8s have enabled the creation of custom APIs and resources, making it possible to build higher-level abstractions without writing extensive code. The article highlights the Crossplane project as an example of how K8s can orchestrate cloud resources without coding. Overall, K8s provides a foundation for creating developer platforms by leveraging its API language, ecosystem, and controller capabilities. It has transitioned from being a container orchestration tool to a crucial tool for building versatile developer platforms.
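
The controller capability mentioned above boils down to a reconcile loop: compare the declared state with the observed state and act on the difference. The following Python sketch shows that idea in isolation. It is not Kubernetes or Crossplane code, and the resource names and specs are invented for the example.

```python
def reconcile(desired, actual):
    """Compute the actions that move `actual` toward `desired` (dicts of name -> spec)."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Desired state as a platform user might declare it through a custom resource.
desired = {"db": {"size": "small"}, "cache": {"size": "large"}}
# Observed state of the (hypothetical) cloud resources.
actual = {"db": {"size": "large"}, "queue": {"size": "small"}}

actions = reconcile(desired, actual)
```

A real controller runs this comparison continuously and applies the resulting actions through the cloud provider's API, which is what lets a custom resource behave like a higher-level abstraction.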

🔍Why Container Runtimes Still Matter:

This article discusses the significance of container runtimes, specifically focusing on the CRI-O container runtime project in the context of Kubernetes. While Kubernetes manages container orchestration, it relies on container runtimes to handle container execution. Two major container runtimes, CRI-O and containerd, have matured, providing stability and flexibility for Kubernetes users. These runtimes are essential in improving cloud flexibility, security, and performance. The article also mentions ongoing developments in container runtimes, such as managing devices and plugins, securing the supply chain, and rewriting components in languages like Rust.

🔍Running the top open source Vector Database on AWS: What They Don't Tell You in the Quickstart Guide

This article discusses how to set up the Milvus vector database on AWS using EKS (Elastic Kubernetes Service) in a more advanced way than described in the Milvus documentation. The article provides steps for creating an EKS cluster, customizing the configuration, managing access permissions, and deploying Milvus using Helm charts with a values.yaml file. It emphasizes the need for security and efficiency considerations during the setup process. After following these steps, you'll have a Milvus vector database running on AWS EKS, ready for further scaling and usage.

⚡ TechWave: Cloud News & Analysis

⚡Fast, Easy and Secure LLM App Development With Snowflake Cortex

Snowflake is introducing new tools to make it easier for users to work with generative AI (GenAI) and large language models (LLMs) securely. Here's a simplified breakdown of what they're offering:

Snowflake Cortex: An intelligent, fully managed service that provides access to AI models and LLMs, allowing organizations to quickly analyze data and build AI applications within Snowflake's secure environment.
Snowpark Container Services: This runtime environment enables developers to deploy and manage custom containerized workloads and models within Snowflake's secure boundary, including building custom UIs and fine-tuning open-source LLMs.
Use AI in Seconds: Snowflake is making it easy for users to utilize cutting-edge LLMs without the need for custom integrations or frontend development.
Build LLM Apps in Minutes: Developers can create LLM applications customized with their data in just minutes, thanks to Snowflake Cortex Functions.
Streamlit in Snowflake: Teams can rapidly develop LLM apps with minimal Python code and no frontend expertise.

⚡Google launches AI search for retailers powered by Vertex AI and Elasticsearch:

Retailers aim to improve their website and app search functionality to provide a better customer experience. Traditional search systems often fall short as they return basic keyword-based results. To address this, Google Cloud and Elasticsearch have collaborated to introduce a more advanced, generative AI-powered search experience.

Here's a brief overview of the solution:

Enhanced Search Experience: This new search approach goes beyond simple keyword matching and offers a conversational experience. It helps customers who may not have a clear idea of what they want and tailors responses based on their unique needs.
Architecture: The system combines Google Cloud's generative AI services and Elasticsearch's search capabilities. It consists of two main phases: initial query processing and response generation.
Contextual Information: Elasticsearch is used to gather rich context information from a retailer's domain-specific customer data. This data includes real-time updates from various internal sources, embedding creation, vector search, and result ranking.
Generative AI: Vertex AI Conversation, powered by Dialogflow CX, is used to create conversational responses. The system selects a generative model, fine-tunes it as needed, deploys it on Google Cloud, and integrates it into the search process. Security and privacy measures are in place to ensure responsible AI use.
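
The vector search and ranking step can be pictured with a few lines of plain Python: embed the query, score every catalog item by cosine similarity, and keep the best matches as context for the generative model. This is a toy stand-in for what Elasticsearch does at scale; the three-dimensional vectors and product names are invented, and real embeddings come from a model rather than being written by hand.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, catalog, k=2):
    """Return the names of the k catalog items most similar to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in catalog.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical product embeddings.
catalog = {
    "rain jacket": [0.9, 0.1, 0.0],
    "hiking boots": [0.7, 0.6, 0.1],
    "beach towel": [0.0, 0.2, 0.9],
}
# Hypothetical embedding of a query such as "something for a wet trail walk".
results = top_k([1.0, 0.2, 0.0], catalog)
```

The two closest products are returned first, and in the architecture described above those results would be passed to the conversational layer to phrase a response.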

⚡Deploy and fine-tune foundation models in Amazon SageMaker JumpStart with two lines of code:

This article introduces a simplified version of the Amazon SageMaker JumpStart SDK, which makes it easy to build, train, and deploy foundation models. The article demonstrates how to use this simplified SDK to get started with foundation models in just a few lines of code.

Amazon SageMaker JumpStart provides pre-trained models for various problem types in machine learning.
You can easily deploy and fine-tune these models using the simplified SageMaker JumpStart SDK.
The article provides code examples for deploying a foundation model (Flan T5 XL) and invoking it for text summarization.
It also showcases how to fine-tune and deploy a model using the JumpStartEstimator class.
The SDK allows customization based on specific requirements, such as changing instance types and input payload formats.
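
A rough sketch of the deployment flow, based on the `JumpStartModel` class in the SageMaker Python SDK, might look like the following. Treat it as an assumption-laden outline rather than the article's exact code: the wrapper function, the prompt wording, and the plain-string payload are illustrative, and the payload shape accepted by `predict` can differ by model and SDK version.

```python
MODEL_ID = "huggingface-text2text-flan-t5-xl"  # JumpStart model ID for Flan T5 XL


def deploy_and_summarize(text):
    """Deploy the pretrained model to an endpoint, summarize `text`, then clean up.

    Requires the `sagemaker` package, AWS credentials, and an execution role.
    Calling this creates a billable endpoint for the duration of the request.
    """
    # Imported lazily so this module loads even where the SDK is not installed.
    from sagemaker.jumpstart.model import JumpStartModel

    model = JumpStartModel(model_id=MODEL_ID)
    predictor = model.deploy()  # default instance type chosen by the SDK
    try:
        # Assumed payload: a plain prompt string (hypothetical prompt wording).
        return predictor.predict(f"Summarize: {text}")
    finally:
        predictor.delete_endpoint()  # avoid leaving the endpoint running
```

The two lines that matter are the `JumpStartModel(...)` constructor and the `deploy()` call; the rest of the function exists only to make the cost and cleanup explicit.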

🌐From the Cloud World:

🌐Gateway API v1.0: GA Release:

The Gateway API v1.0 has been released, marking a significant milestone for Kubernetes. Key features like Gateway, GatewayClass, and HTTPRoute have graduated to GA, making this API more stable. The release focuses on enhancing the API's existing features and preparing for future improvements. Over 170 contributors have helped develop this collaborative Kubernetes API, ensuring its continued growth. The Gateway API's future includes stabilizing and expanding experimental features, such as support for service mesh and additional route types like GRPCRoute and TCPRoute.
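
For readers who have not seen the GA resources yet, this Python sketch builds a minimal HTTPRoute manifest using the `gateway.networking.k8s.io/v1` API version and serializes it to JSON, which `kubectl apply -f` accepts alongside YAML. The helper function and all names (route, gateway, hostname, service) are hypothetical.

```python
import json


def http_route(name, gateway, hostname, path_prefix, service, port):
    """Build an HTTPRoute that sends a path prefix on one hostname to a Service."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name},
        "spec": {
            "parentRefs": [{"name": gateway}],  # the Gateway this route attaches to
            "hostnames": [hostname],
            "rules": [
                {
                    "matches": [{"path": {"type": "PathPrefix", "value": path_prefix}}],
                    "backendRefs": [{"name": service, "port": port}],
                }
            ],
        },
    }


route = http_route("shop-route", "shop-gateway", "shop.example.com", "/cart", "cart-svc", 8080)
manifest = json.dumps(route, indent=2)
```

The split between the Gateway (referenced in `parentRefs`) and the route itself is the main design difference from Ingress: cluster operators own the former, application teams own the latter.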

🌐Accelerate your CI/CD with Arm-based hosted runners in GitHub Actions

GitHub Actions is introducing Arm-based hosted runners, powered by Arm-based Ampere Altra processors, to accelerate software development across domains such as embedded edge, IoT, and cloud infrastructure. The runners promise efficiency and sustainability benefits, with a private beta set for January 2024.

🌐OpenCost Expands Its Horizon: Introducing Multi-Cloud Cost Monitoring!

OpenCost is expanding to offer an open-source solution for monitoring and managing costs across multiple cloud platforms, simplifying expense tracking and optimization for businesses using services like Google Cloud, AWS, and Azure, with plans to support more platforms in the future. This expansion provides a unified view of cloud expenses and aligns with industry standards, making it easier for organizations to control and optimize their cloud costs.

🌐Karpenter graduates to beta:

Karpenter, a Kubernetes node manager by AWS, has moved from alpha to beta, signaling a more stable and mature project. The upgrade renames some APIs and changes their functionality to improve usability and compatibility, making it easier for users to manage their Kubernetes clusters.

🌐Introducing Azure Bastion Developer: Secure and cost-effective access to your Azure Virtual Machines:

Azure Bastion Developer is a new low-cost service from Microsoft that provides secure and hassle-free access to your Azure Virtual Machines. It addresses common developer pain points such as discovery, usability, and cost.

🛠️HackHub: Best Tools for Cloud

🛠️Eraser is designed to help maintain clean and secure Kubernetes nodes for containerized applications. Kubernetes is a popular tool for managing containers, but as clusters grow, unused or vulnerable container images can become a problem. Eraser automates the process of cleaning up these unnecessary or outdated images, improving security, performance, and cost efficiency.
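
The core idea of image cleanup can be sketched as a set difference: images present on a node, minus images used by running containers, optionally narrowed to those a scanner has flagged. The Python below is a conceptual sketch under those assumptions and is not Eraser's implementation; the function name, image tags, and data shapes are hypothetical.

```python
def removable_images(images_on_node, running_containers, vulnerable=None):
    """Return images that are safe to remove from a node, sorted by name.

    An image is a candidate only if no running container uses it. When a set of
    vulnerable images is supplied, only flagged candidates are returned.
    """
    in_use = {container["image"] for container in running_containers}
    candidates = set(images_on_node) - in_use
    if vulnerable is not None:
        candidates &= set(vulnerable)
    return sorted(candidates)


images = ["nginx:1.25", "redis:6.0", "busybox:1.30"]
running = [{"name": "web", "image": "nginx:1.25"}]

unused = removable_images(images, running)
flagged = removable_images(images, running, vulnerable={"redis:6.0", "nginx:1.25"})
```

Note that the in-use nginx image is never returned, even when it appears in the vulnerable set, because removing an image from under a running container is not the goal.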

🛠️Permify is an open-source authorization service that helps manage fine-grained permissions in applications and services. It offers a flexible authorization language, configurable database storage, testing, multi-tenancy support, and performance analysis. Typical use cases include unified access control, future-proofing, microservice infrastructure, and complex authorization requirements.

🛠️oneinfra is a Kubernetes as a Service platform that lets you create or use Kubernetes clusters at scale on any platform or service provider with support for various Kubernetes versions and easy setup on Linux environments.

🛠️Vagrant is a tool for creating and managing development environments that can run on local virtualized platforms, cloud services, or containers, making it easy to set up and share development environments across different operating systems.

🛠️client-go is a set of Go clients for communicating with a Kubernetes cluster, allowing developers to interact with the Kubernetes API and manage resources in a cluster. It's used to create and manage applications in Kubernetes.