Recovery semantics - IT文库: free programming e-books and documents for download


Approaches to Achieving High SLOs on Large-Scale Kubernetes Clusters

Healthy, Terminating, Pod Number, Daily Report, Validation, Housekeeping, High Availability, Fast Recovery, Display Board, Alert, Analysis Platform, Weekly Report. SLO: indicates whether the cluster is healthy or there … • orphaned pod directories/volumes • orphaned cgroups • orphaned net devices, and so on; the node recovery system can clean up this dirty data or alert cluster admins to process it manually. Unhealthy …ntoller, failedPodController, Detector, Strategy, Unhealthy node list, Fast, Taint, Weight Adjust, Recovery, Manual Handling, Improve Auto, Human experience, Improvement of strategy … 1. Collect data from

0 credits | 11 pages | 4.01 MB | 1 year ago
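The first snippet describes a housekeeping step. It finds orphaned pod directories, then either cleans them up or alerts a cluster admin. A minimal sketch of that idea follows, under stated assumptions; it is not the method from the slides. It assumes one subdirectory per pod UID under a root such as the kubelet's `/var/lib/kubelet/pods/<pod-uid>`. It also assumes the set of pod UIDs currently bound to the node is already known; in practice that set would come from the API server. The function names `find_orphaned_pod_dirs` and `handle_orphans` are hypothetical.

```python
import os
import shutil
import tempfile


def find_orphaned_pod_dirs(pods_root, active_pod_uids):
    """Return sorted paths of pod directories whose UID is not an active pod.

    Hypothetical layout: one subdirectory per pod UID under pods_root,
    mirroring the kubelet's /var/lib/kubelet/pods/<pod-uid> convention.
    """
    orphaned = []
    for entry in os.listdir(pods_root):
        path = os.path.join(pods_root, entry)
        if os.path.isdir(path) and entry not in active_pod_uids:
            orphaned.append(path)
    return sorted(orphaned)


def handle_orphans(orphaned, auto_cleanup):
    """Delete orphaned directories, or return them for manual handling.

    Returns (removed, needs_admin), two lists of paths.
    NOTE: a real cleanup must first confirm no volumes are still mounted
    under these paths; this sketch does not check for mounts.
    """
    if not auto_cleanup:
        # Hypothetical alert path: hand the list to a cluster admin instead.
        return [], list(orphaned)
    for path in orphaned:
        shutil.rmtree(path)
    return list(orphaned), []


if __name__ == "__main__":
    # Build a throwaway directory tree so the demo never touches real data.
    root = tempfile.mkdtemp()
    for uid in ("uid-a", "uid-b", "uid-c"):
        os.mkdir(os.path.join(root, uid))
    orphans = find_orphaned_pod_dirs(root, active_pod_uids={"uid-a"})
    removed, pending = handle_orphans(orphans, auto_cleanup=False)
    print([os.path.basename(p) for p in pending])  # ['uid-b', 'uid-c']
    shutil.rmtree(root)
```

The `auto_cleanup` flag mirrors the snippet's choice between automatic cleanup and manual handling by an admin.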
QCon Beijing 2017 / Intelligent Operations / Self Hosted Infrastructure: Taking Automated Operation of Kubernetes as an Example

… member, Kubelet, Pods, API Server, Scheduler, Controller Manager, etcd operator, etcd. Disaster Recovery: Node failure in HA deployments (Kubernetes); Partial loss of control plane components (Kubernetes); … the entire control plane (Kubernetes); Permanent loss of control plane (External tool). Disaster Recovery, permanent loss of control plane: ● Similar situation to initial node bootstrap, but utilizing … start a temporary replacement api-server ○ Could be binary, static pod, new tool, bootkube, etc. ● Recovery once etcd+api is available can be done via kubectl (as seen previously). Self-Driving Kubernetes

0 credits | 73 pages | 1.58 MB | 1 year ago
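The disaster-recovery tiers in the second snippet can be restated as a lookup from failure scenario to the party responsible for recovery. The sketch below is hypothetical. The scenario keys and step strings are paraphrased from the slide text, and the `entire_control_plane` key keeps the snippet's truncated wording rather than guessing the missing qualifier. `RecoveryPlan` and `recovery_plan` are illustrative names, not part of any Kubernetes tool.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RecoveryPlan:
    handler: str                 # who drives recovery: "kubernetes" or "external tool"
    steps: Tuple[str, ...] = ()  # manual steps, only listed for the external-tool tier


# Hypothetical scenario keys paraphrasing the slide's recovery tiers.
RECOVERY_TIERS = {
    "node_failure_ha": RecoveryPlan("kubernetes"),
    "partial_control_plane_loss": RecoveryPlan("kubernetes"),
    "entire_control_plane": RecoveryPlan("kubernetes"),
    "permanent_control_plane_loss": RecoveryPlan(
        "external tool",
        (
            "start a temporary replacement api-server (binary, static pod, bootkube, ...)",
            "wait until etcd and the api-server are available",
            "recover the remaining control plane via kubectl",
        ),
    ),
}


def recovery_plan(scenario: str) -> RecoveryPlan:
    """Look up the recovery tier for a failure scenario."""
    try:
        return RECOVERY_TIERS[scenario]
    except KeyError:
        raise ValueError(f"unknown failure scenario: {scenario!r}") from None


if __name__ == "__main__":
    plan = recovery_plan("permanent_control_plane_loss")
    print(plan.handler)  # external tool
    for step in plan.steps:
        print("-", step)
```

The point of the table is the one the slide makes. Only permanent loss of the control plane falls outside what a self-hosted cluster can repair by itself.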
Operator Pattern: Best Practices for Extending Kubernetes with Go

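Observability: collection and analysis of logs, system metrics, etc.; monitoring configuration and alerting; collection and analysis of performance metrics, and so on. Backup & Restore: backup policies, backup methods, restore methods, backup management, and so on. Disaster Recovery & High Availability: failover/switchover, multiple availability zones, data recovery, and so on. Security & Compliance: access control, auditing, secure connections, encrypted storage, and so on.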

0 credits | 21 pages | 3.06 MB | 9 months ago
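The "Failover/Switchover" item in the third snippet can be made concrete with a hypothetical reconcile step. An operator managing a primary/replica database might run it to compare observed member health with the desired state, then promote a healthy replica when the primary has failed. The talk's title suggests Go; this sketch uses Python for consistency with the other examples. The `Member` type, its fields, and the promotion rule (lowest replication lag wins) are assumptions, not taken from the talk.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Member:
    name: str
    role: str           # "primary" or "replica"
    healthy: bool
    lag_seconds: float  # replication lag; 0 for the primary


def reconcile_failover(members: List[Member]) -> List[Member]:
    """Return the desired member list after one reconcile pass.

    If a healthy primary exists, nothing changes, so repeated passes are
    idempotent. Otherwise the healthy replica with the smallest lag is
    promoted and any failed primary is demoted to replica.
    """
    if any(m.role == "primary" and m.healthy for m in members):
        return list(members)
    candidates = [m for m in members if m.role == "replica" and m.healthy]
    if not candidates:
        # Nothing safe to promote; a real operator would raise an alert here.
        return list(members)
    new_primary = min(candidates, key=lambda m: m.lag_seconds)
    result = []
    for m in members:
        if m.name == new_primary.name:
            result.append(replace(m, role="primary", lag_seconds=0.0))
        elif m.role == "primary":
            result.append(replace(m, role="replica"))
        else:
            result.append(m)
    return result


if __name__ == "__main__":
    cluster = [
        Member("db-0", "primary", healthy=False, lag_seconds=0.0),
        Member("db-1", "replica", healthy=True, lag_seconds=2.5),
        Member("db-2", "replica", healthy=True, lag_seconds=0.4),
    ]
    for m in reconcile_failover(cluster):
        print(m.name, m.role)
    # db-0 replica
    # db-1 replica
    # db-2 primary
```

The function returns the desired state instead of mutating members in place. This matches the operator habit of computing a target and then applying it, and it makes the step easy to run repeatedly.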
