GPU Resource Management On JDOS
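梁永清 (liangyongqing1@jd.com). Services provided: 1. GPU containers for experiments 2. A machine learning training service based on Kubeflow 3. Model management and model serving. Experiment, Training, and Serving are all container-based; GPU physical machines are not provided directly to business teams. GPU experiments: JDOS regular container service
0 码力 | 11 pages | 13.40 MB | 1 year ago
Operator Pattern: Best Practices for Extending Kubernetes with Go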
user leveraging the status block of the Custom Resource Configuration of the workload • Operator provides configuration via the spec section of the Custom Resource • Operator reconciles configuration and performance metrics about the Operand Alerting and Events • Operand sends useful alerts • Custom Resources emit custom events Auto-scaling • Operator scales the Operand up under increased load based on project structure 2. Understand how the K8s Declarative API is designed 3. Understand how to obtain CR (custom resource) related events 4. Understand how the Operator control loop (that is, the Reconcile function) is implemented 5. Understand how to generate secondary resources (Managed Resource) 6. Understand how to write unit tests 7. Understand how to build a Helm Chart. Post-class questions: 1
0 码力 | 21 pages | 3.06 MB | 9 months ago
Kubernetes Open Source Book - 周立
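Installing Kubernetes On-premises/Cloud Providers with Kubespray: https://kubernetes.io/docs/setup/custom-cloud/kubespray/ 03-Deploying a production-ready Kubernetes cluster with Kubespray (1.11.2) 14 K8s components. This article gives an overview of the components required in a Kubernetes cluster. (Dashboard) Dashboard is a general-purpose, web-based UI for Kubernetes clusters. It lets users manage and troubleshoot the applications in the cluster as well as the cluster itself. Container Resource Monitoring: Container Resource Monitoring records generic time-series metrics about containers in a central database and provides a UI for browsing that data. Cluster-level Logging $GROUP_NAME/$VERSION (for example, apiVersion: batch/v1). For the full list of supported API groups, see the Kubernetes API reference. There are two supported paths for extending the API with custom resources: 1. CustomResourceDefinition, for users with very basic CRUD needs. 2. Coming soon: users who need the full Kubernetes
0 码力 | 135 pages | 21.02 MB | 1 year ago
User Interface - State of the UI: Leveraging Kubernetes Dashboard and Shaping its Future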
pod ● Global search ● Login mechanism ● Settings page ● Support for Cron Jobs ● Redesigned resource creation ● ...and much much more. github.com/kubernetes/dashboard/releases In-progress work it should follow philosophy of K8s, and should be [the foundation] on which we can build our custom command center.” → Survey response → Cluster Operator, running Kubernetes on-prem and in the cloud a huge win.” → Survey response → Cluster Operator, running Kubernetes in GCP and on-prem ● Custom Resource Definitions support ● Service topology view ● Mobile device support ● Cost estimates ● CI/CD
0 码力 | 41 pages | 5.09 MB | 1 year ago
Kubernetes + OAM: Making Things Simpler for Developers
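Kubernetes controllers. All kinds of controllers (Controller): containers, virtual machines, load balancing, databases, security services, networking, storage. Pod, Deployment, Service, Node, Custom Resource: a group of containers, a group of Pod replicas, the access entry point for Pods, a node, a custom object. Declarative API objects. Infrastructure-layer capabilities. Business operations, platform engineers, business developers. Scaling policy, release policy, batching policy, access control. The ideal application management platform. Goal 1: a user-facing, application-centric CI/CD pipeline, application, scaling policy, release policy, batching policy, access control, traffic configuration. Pod, Deployment, Service, Node, Custom Resource. Business operations, business developers, bind on demand. Keywords: user-friendly, application-level semantics and abstractions. Platform engineers, Controller. Goal 2: a highly extensible application management platform. Keywords: pluggable, extensible, modular, no lock-in to a level of abstraction
0 码力 | 22 pages | 10.58 MB | 1 year ago
k8s Operations Manual 2.3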
githubusercontent.com/projectcalico/calico/v3.26.1/manifests/custom-resources.yaml # create the controller: Install the Tigera Calico operator and custom resource definitions # kubectl create -f tigera-operator.yaml -i 's/192.168.0.0\/16/10.244.0.0\/16/g' custom-resources.yaml # create the resources: Install Calico by creating the necessary custom resource # kubectl create -f custom-resources.yaml # if you need to change the image, you can only deploy
0 码力 | 126 pages | 4.33 MB | 1 year ago
Kubernetes Native DevOps Practice
Environment variable [] VolumeMounts - Files to be shared or persisted [] Resources - Resource requirement ActiveDeadlineSeconds Timeout of build task Lifecycle - Actions defined Node group of build nodes Node group of user applications Scheduling customization Cluster Resource Auto Scaling kubelet can do image GC DevOps Service DevOps Operator DevOps Operator DevOps ElasticSearch ElasticSearch Prometheus Push Gateway push metric data • Build task can also expose custom metric data • Ephemeral build task can push metric to gateway if needed • Cluster autoscaler
0 码力 | 21 pages | 6.39 MB | 1 year ago
Go Programming Pattern in Kubernetes Philosophy
Deployment name: nginx-deployment minReplicas: 1 maxReplicas: 10 metrics: - type: Resource resource: name: cpu targetAverageUtilization: 50 • API Object Oriented Programming Core kube-apiserver types register AstaXie Controller astaxie1 OnDelete OnUpdate OnAdd Kubernetes Custom Controller User operation A Real World Example • I want to have a Network object into k8s API Code Generator • client-gen: generate typed Kubernetes API client for type • client.Pod.Get().Resource(…).Do() • conversion-gen: seamless upgrades between API versions • apiVersion: k8s.io/v1alpha1
0 码力 | 29 pages | 2.12 MB | 1 year ago
A Day in the Life of a Data Scientist Conquer Machine Learning Lifecycle on Kubernetes
scalable • Training controllers – simplify and manage the deployment of training jobs • TFJob – custom resource to handle drivers and config • TensorFlow, PyTorch, MXNet, Chainer, and more • JupyterHub to
0 码力 | 21 pages | 68.69 MB | 1 year ago
A Kubernetes-Based Container + AI Platform
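K8s - a single "control cluster", multiple "user clusters" • Image registry - a single "default registry", integration with multiple registries. Managing clusters and nodes • Technical overview • cloud provider • custom resource • ansible. Managing image registries • Cargo (internal project) - a production-grade image registry solution, based on • One-click high-availability deployment and maintenance • Enhanced for multi-tenancy and complex permission integration
0 码力 | 19 pages | 3.55 MB | 1 year ago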
36 results in total