Bypassing conntrack: Optimizing K8s Network Performance by Enhancing IPVS with eBPF
Jianmingfan (kenieevan@github) Zhiguohong (honkiko@github) Bypassing conntrack: Optimizing K8s Service By Enhancing IPVS with eBPF Agenda 01 Problems with K8s Service How to optimize 02 Comparison Iptables rules are difficult to debug IPVS mode • Services are organized in a hash table • IPVS DNAT • conntrack/iptables SNAT • Pros • O(1) time complexity in control/data plane • Has run stably for two decades • Supports rich scheduling algorithms • Cons • Performance cost caused by conntrack • Some bugs How to optimize • Guidelines • Use fewer CPU instructions to process each packet • Don’t monopolize
0 credits | 24 pages | 1.90 MB | 1 year ago

Tencent Cloud Kubernetes High-Performance Networking Unveiled: Using eBPF to Enhance IPVS and Optimize K8s Network Performance (Fan Jianming)
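manages services • IPVS provides only DNAT and still relies on iptables + conntrack for SNAT • Control-plane and data-plane complexity are both O(1) • In production for more than two decades; stable and mature • Supports multiple scheduling algorithms Pros IPVS mode Cons • Does not bypass conntrack, which incurs performance overhead • Some bugs remain in real-world K8s use 02 How to optimize Compile to eBPF bytecode • Inject into the kernel • Attach to network traffic control • Packets trigger the eBPF code Technical innovation 1 • IPVS's functional dependency on conntrack • Iptables SNAT • How exactly is conntrack bypassed? • Incoming packets • Move the request-handling hook forward from nf local-in to nf pre-routing • The skb's route pointer is NULL • Handle fragments ip_output -> NF postrouting -> ip_finish_output • Changed to: • Hacked the kernel to call ip_finish_output directly IPVS bypassing conntrack Technical innovation 2 • Attach an eBPF program to Linux traffic control to perform SNAT before packets leave the NIC • Keep as much code as possible in eBPF for easier upgrades and maintenance • eBPF loader
0 credits | 27 pages | 1.19 MB | 9 months ago

North-South Load Balancing of Kubernetes Services with eBPF/XDP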
3:30000 httpd httpd -A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP -A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT -A KUBE-FORWARD -d 10.217.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT -A KUBE-SERVICES -d 10.99.38.155/32 -p tcp -m comment --comment "default/nginx-59: has no endpoints" -m tcp --dport 80 -j REJECT --reject-with
0 credits | 11 pages | 444.46 KB | 1 year ago

Cilium v1.5 Documentation
mark --mark KUBE-MARK-MASQ -j ACCEPT -s 10.233.64.0/18 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT -d 10.233.64.0/18 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT KUBE-SERVICES ! -s synchronization kicked in or until pods were restarted. Upgrading from >=1.4.0 to 1.5.y In v1.4, the TCP conntrack table size ct-global-max-entries-tcp ConfigMap parameter was ineffective due to a bug and thus, table utilization below 25%. If needed, the interval can be set to a static interval with the option --conntrack-gc-interval. If connectivity fails and cilium monitor --type drop shows xx drop (CT: Map insertion
0 credits | 740 pages | 12.52 MB | 1 year ago

Cilium v1.6 Documentation
mark --mark KUBE-MARK-MASQ -j ACCEPT -s 10.233.64.0/18 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT -d 10.233.64.0/18 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT KUBE-SERVICES ! -s policy_l7_total instead. 1.5 Upgrade Notes Upgrading from >=1.4.0 to 1.5.y 1. In v1.4, the TCP conntrack table size ct-global-max-entries-tcp ConfigMap parameter was ineffective due to a bug and thus utilization below 25%. If needed, the interval can be set to a static interval with the option --conntrack-gc-interval. If connectivity fails and cilium monitor --type drop shows xx drop (CT: Map insertion
0 credits | 734 pages | 11.45 MB | 1 year ago

Cilium v1.10 Documentation
mark --mark KUBE-MARK-MASQ -j ACCEPT -s 10.233.64.0/18 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT -d 10.233.64.0/18 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT KUBE-SERVICES ! -s cilium_policy_import_errors_total instead. cilium_datapath_errors_total is removed. Please use cilium_datapath_conntrack_dump_resets_total instead. Label mapName in cilium_bpf_map_ops_total is removed. Please use label ... label subnet_id and availability_zone instead. New Metrics cilium_datapath_conntrack_dump_resets_total Number of conntrack dump resets. Happens when a BPF entry gets removed while dumping the map is in
0 credits | 1307 pages | 19.26 MB | 1 year ago

Cilium v1.7 Documentation
mark --mark KUBE-MARK-MASQ -j ACCEPT -s 10.233.64.0/18 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT -d 10.233.64.0/18 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT KUBE-SERVICES ! -s policy_l7_total instead. 1.5 Upgrade Notes Upgrading from >=1.4.0 to 1.5.y 1. In v1.4, the TCP conntrack table size ct-global-max-entries-tcp ConfigMap parameter was ineffective due to a bug and thus utilization below 25%. If needed, the interval can be set to a static interval with the option --conntrack-gc-interval. If connectivity fails and cilium monitor --type drop shows xx drop (CT: Map insertion
0 credits | 885 pages | 12.41 MB | 1 year ago

Cilium v1.8 Documentation
mark --mark KUBE-MARK-MASQ -j ACCEPT -s 10.233.64.0/18 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT -d 10.233.64.0/18 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT KUBE-SERVICES ! -s required the following command can be used to check the currently configured maximum number of TCP conntrack entries: sudo grep -R CT_MAP_SIZE_TCP /var/run/cilium/state/templates/ If the maximum number is
0 credits | 1124 pages | 21.33 MB | 1 year ago

K8S Installation, Deployment, and Service Exposure
C. Install ipvs [Note] ipvs will serve as the kube-proxy proxy mode Step1: Install yum install ipvsadm ipset sysstat conntrack libseccomp -y Step2: Load cat > /etc/sysconfig/modules/ipvs.modules <conntrack modprobe -- ip_tables modprobe -- ip_set modprobe -- xt_set modprobe -- ipt_set modprobe /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack D. Install docker-ce and k8s See https://cloud.tencent.com/developer/article/1627330 Step1: Install docker-ce
0 credits | 54 pages | 1.23 MB | 1 year ago

CentOS 7 Operation Commands: Basics 1.2
tcp --syn -j DROP or: iptables -A INPUT -i ens33 -m conntrack --ctstate NEW,INVALID -j DROP 2. Allow packets that belong to established or related connections #iptables -A INPUT -i ens33 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT 3. Allow access to specific ports #iptables -A INPUT -i ens33 -p tcp --dport 22 -m conntrack --ctstate NEW -j ACCEPT -m multiport --dports 20,21,22 -m conntrack ...... 4. Allow, limit, or reject icmp #iptables -A INPUT -p icmp --icmp-type echo-request
0 credits | 115 pages | 8.68 MB | 1 year ago
58 results in total