GPU Resource Management On JDOS
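GPU Resource Management On JDOS 梁永清 (Liang Yongqing) liangyongqing1@jd.com Services provided: 1. GPU containers for experimentation 2. A machine learning training service based on Kubeflow 3. Model management and model serving services. Experiment, Training, and Serving are all container-based; physical GPU machines are not offered directly to business teams. GPU experiments: the regular JDOS container service
0 credits | 11 pages | 13.40 MB | 1 year ago

Node Operator: Kubernetes Node Management Made Simple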
Node Operator: Kubernetes Node Management Made Simple 陈俊 (Joe), Ant Financial Agenda • Background and Motivation • Introduction of Operators • Node-Operator • Advanced Topic: • Upgrade Master & Node Components reliably • Canary Rollout • Master & Node Component Versions Management Motivation: Work Order Deployment Worker Order • Upgrade Nodes Versions • Upgrade Node 10.10 Complicated architecture Work order deployment system cannot meet the requirements of resource management. Operator Observe Action Analyze • Observe: watch desired resource and actual resource
0 credits | 18 pages | 11.70 MB | 1 year ago

VMware group: Kubernetes on vSphere Deep Dive KubeCon China VMware SIG
placement options, for both control plane and worker nodes. 2 levels of scheduling and resource management are active. Currently no automatic scheduling integration occurs, that is, Kubernetes is not to solve potential issues with CPU and memory intensive workloads Kubernetes default resource management How it works Extending the functionality of Kubernetes Using vSphere DRS with Kubernetes Memory Architecture 12 Why should you care about NUMA? Memory intensive workloads Nearly all database servers (e.g. Oracle, MongoDB), present a workload which will attempt to detect and consume as
0 credits | 25 pages | 2.22 MB | 1 year ago

VMware SIG Deep Dive into Kubernetes Scheduling
placement options, for both control plane and worker nodes. 2 levels of scheduling and resource management are active. Currently no automatic scheduling integration occurs, that is, Kubernetes is not to solve potential issues with CPU and memory intensive workloads Kubernetes default resource management How it works Extending the functionality of Kubernetes Using vSphere DRS with Kubernetes High Memory Architecture 12 Why should you care about NUMA? Memory intensive workloads Nearly all database servers (e.g. Oracle, MongoDB), present a workload which will attempt to detect and consume as
0 credits | 28 pages | 1.85 MB | 1 year ago

Model and Operate Datacenter by Kubernetes at eBay (submitted version)
2010 2015- Now 2010- Now 2012- Now Bare metals Way to Kubernetes Search Grid Hadoop PoP Database Frontend VM Kubernetes plays magic api etcd kind: metadata: spec: control loop control loop Kubernetes You have your compute node now, all you need is to configure it by a configuration management orchestration. We use SaltStack. Let’s model a datacenter running Kubernetes Onboard Provision KafkaCluster, HadoopCluster, MongoDB, ESCluster …… Fleet (Compute, Network, Storage) Configuration Management Infrastructure Service Application Service Recap We are hiring! xnxin@ebay.com cmei@ebay.com
0 credits | 25 pages | 3.60 MB | 1 year ago

Kubernetes Open Source Book - 周立 (Zhou Li)
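General information about the Node, such as the kernel version, Kubernetes version (kubelet and kube-proxy versions), Docker version (if Docker is used), and OS name. The information is gathered by the Kubelet from the Node. Management: Unlike pods and services, a Node is not created by Kubernetes: it is created externally by a cloud provider such as Google Compute Engine, or exists on physical or virtual ... unless web-0 is already Running and Ready. Pod Management Policies: In Kubernetes 1.7 and later, StatefulSet allows you to relax its ordering guarantees while preserving its uniqueness and identity guarantees via the .spec.podManagementPolicy field. OrderedReady Pod Management: OrderedReady pod management is the default for StatefulSets. It implements the behavior described above. Parallel Pod Management: Parallel pod management tells the StatefulSet controller to launch or terminate all Pods in parallel, and not to wait for Pods to become "Running" and "Ready" or completely terminated before launching or terminating another Pod. Update Strategies
0 credits | 135 pages | 21.02 MB | 1 year ago

Key management: Turtles all the way down - Securely managing Kubernetes Secrets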
Accessible by users who shouldn’t have access, e.g., CEO ○ Stored in public storage buckets Secret management requirements Identity Require strong identities and least privilege Auditing Verify the use security against penetration. Similarly, poor key management may easily compromise strong algorithms.” NIST SP 800-57, Recommendation for Key Management Keys get old Key rotation ● Key rotation is meant stored cardholder data against disclosure and misuse. 3.6 Fully document and implement all key-management processes and procedures for cryptographic keys used for encryption of cardholder data, including
0 credits | 52 pages | 2.84 MB | 1 year ago

QCon Beijing 2017 / Intelligent Operations / Self Hosted Infrastructure: Automated Operation of Kubernetes as an Example
distributed system Self driving infrastructure Topics ● Cluster management systems ● Today’s problems with operating cluster management systems ● A self-driving approach Motivation: microservices components ○ dynamic dependencies ○ fast deployment iteration ● Solution: automation Cluster management system ● Automation ○ Scheduling ○ Deployment ○ Healing ○ Discovery/load balancing ○ Scaling Kubernetes? ● Operational expertise around app management in k8s extends to k8s itself ○ E.g. scaling ● Bootstrapping simplified ● Simplify cluster life cycle management ○ E.g. updates ● Upstream improvements
0 credits | 73 pages | 1.58 MB | 1 year ago

KubeCon 2020 / Technical Practice of Using Kubernetes at Scale in Tencent Meeting
way to release stateful service • Advanced scheduling to improve service stability • Quota management to optimize resource orchestration efficiency • High performance and comprehensive autoscaling systems like Route System, CMDB, CI, Security Platform, etc. • Declarative application lifecycle management. • Support big data and AI jobs. • Optimize the isolation of resources, and improve resource Service Mesh. • Large-scale and high-performance autoscaling capabilities. • Multi-tenant and quota management. • etc. TKEx Architecture EKS (Elastic Kubernetes Service) TKE (Tencent Kubernetes Engine)
0 credits | 19 pages | 10.94 MB | 1 year ago

k8s Operations Manual 2.3
③ Create a Secret. Secret resources are namespace-scoped. ★ Creating a secret from the command line. Create a username/password authentication secret: # kubectl create secret generic database-auth --from-literal=username=root --from-literal=password=passwd123 Create an authentication secret stored in a file: # kubectl ... kubernetes.io/tls 2 7s database-auth Opaque 2 10m myssh-key-secret Opaque 2 3m43s # kubectl get secret database-auth -oyaml # the secret's text values are all base64-encoded "2023-12-05T21:19:48Z" name: database-auth namespace: default resourceVersion: "1167" uid: 38790a7e-30ee-4b75-8132-18bca66ca512 type: Opaque ★ Creating a secret from a manifest file # cat > database-auth2.yaml <
0 credits | 126 pages | 4.33 MB | 1 year ago
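The k8s manual excerpt above notes that Secret values are stored base64-encoded. As a minimal sketch of what that means, the Python below encodes and decodes the two literal values from the excerpt's `kubectl create secret` command. The helper names `encode_secret_value` and `decode_secret_value` are hypothetical, introduced only for illustration; they are not part of kubectl or any Kubernetes client.

```python
import base64


def encode_secret_value(plain: str) -> str:
    """Base64-encode a plaintext value, as it would appear under `data:` in a Secret."""
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Decode a base64 value read back from a Secret's `data:` field."""
    return base64.b64decode(encoded).decode("utf-8")


# Literal values taken from the excerpt's `kubectl create secret` command.
data = {"username": "root", "password": "passwd123"}

# Hypothetical round trip: encode every value, then decode it again.
encoded = {key: encode_secret_value(value) for key, value in data.items()}
decoded = {key: decode_secret_value(value) for key, value in encoded.items()}

for key in data:
    print(f"{key}: {encoded[key]} -> {decoded[key]}")
```

Base64 is an encoding, not encryption: anyone who can read the Secret object can recover the plaintext this way.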