Working with Asynchrony Generically: A Tour of C++ Executors
extern unifex::static_thread_pool low_latency;
extern unifex::static_thread_pool workers;
ex::sender auto accept_and_process_requests() {
  return ex::on(low_latency.get_scheduler(), accept_request())
       […] request) { process_request(request); })
       | unifex::repeat_effect();
}
Accept requests on low-latency threads. Process the requests on the worker threads.
EXAMPLE: TRANSITIONING EXECUTION CONTEXT
unifex::static_thread_pool low_latency;
extern unifex::static_thread_pool workers;
unifex::task<…> accept_and_process_requests() {
  while (true) {
    auto request = co_await ex::on(low_latency.get_scheduler(), …
0 credits | 121 pages | 7.73 MB | 5 months ago
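For context, a minimal sketch of the coroutine variant that the truncated snippet appears to show, assuming libunifex (unifex::task, unifex::schedule, unifex::static_thread_pool) with `ex` as the talk's namespace alias. request_t, accept_request and process_request are stand-ins, and the hop to the worker pool via ex::schedule is an assumption about the elided part of the slide, not the talk's exact code.

#include <unifex/task.hpp>
#include <unifex/on.hpp>
#include <unifex/just.hpp>
#include <unifex/scheduler_concepts.hpp>   // unifex::schedule
#include <unifex/static_thread_pool.hpp>

namespace ex = unifex;

struct request_t {};                                          // stand-in request type
ex::static_thread_pool low_latency;                           // accepts connections
ex::static_thread_pool workers;                               // does the heavy processing

auto accept_request() { return ex::just(request_t{}); }       // placeholder sender
void process_request(request_t) { /* handle one request */ }

ex::task<void> accept_and_process_requests() {
  while (true) {
    // Accept the next request on the low-latency pool...
    auto request = co_await ex::on(low_latency.get_scheduler(), accept_request());
    // ...then transition execution context to the worker pool before processing it.
    co_await ex::schedule(workers.get_scheduler());
    process_request(request);
  }
}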
Lock-Free Atomic Shared Pointers Without a Split Reference Count? It Can Be Done!
multithreaded/lock-free code is hard… There are many factors to consider:
• Measurement: throughput vs latency?
• Workload: proportion of reads vs writes
• Hotness: does the data fit in cache?
• Contention: …
… reclamation to get performance that is always competitive with both? More work on optimizing for low latency (see the GitHub, there is some preliminary work!). Thanks to Guy Blelloch and Hao Wei for their collaboration on some of this work. (Daniel Anderson -- danielanderson.net)
Bonus content, Latency: void retire(T* p) indicates that an object has been removed…
0 credits | 45 pages | 5.12 MB | 5 months ago
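For orientation, the building block the title refers to is an atomically updatable shared pointer. Below is a minimal usage sketch with the standard C++20 std::atomic<std::shared_ptr<T>> specialization, which is typically implemented with a lock; the talk's point is that a lock-free version is possible without a split reference count. The names are illustrative, not from the slides.

#include <atomic>
#include <memory>
#include <thread>

struct Config { int max_connections = 8; };

// Readers and writers share one mutable pointer to immutable data.
std::atomic<std::shared_ptr<const Config>> current_config{std::make_shared<const Config>()};

void reader() {
    // load() takes a snapshot; the object stays alive for as long as the snapshot is held.
    std::shared_ptr<const Config> snapshot = current_config.load();
    (void)snapshot->max_connections;
}

void writer() {
    // store() publishes a new version; the old one is reclaimed when its last reader drops it.
    current_config.store(std::make_shared<const Config>(Config{16}));
}

int main() {
    std::thread r(reader), w(writer);
    r.join();
    w.join();
}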
Accelerating Tokio with Hardware - Dai Xiang (戴翔)
[Diagram] Software enqueue (one producer feeding multiple consumers) pays for synchronization latency, memory/cache latency, and CPU-cycle latency. DLB (Dynamic Load Balance) moves the enqueue logic, head/tail pointers, and load balancing into hardware: no synchronization latency, no memory/cache latency, no CPU cycles spent. DLB-assisted channel intro: hardware-assisted senders and receivers.
0 credits | 17 pages | 1.66 MB | 1 year ago
C++ High-Performance Parallel Programming and Optimization - Lecture Slides - 08 GPU Programming with CUDA
… bytes). https://developer.download.nvidia.cn/CUDA/training/register_spilling.pdf
Too few threads in a block: latency hiding breaks down.
• As noted earlier, an SM executes only one warp of a block at a time, i.e. 32 threads.
• When a warp stalls waiting on memory, the SM can switch to another warp and keep computing; this way a … (bank conflict)
Summary of GPU optimization techniques:
• Warp divergence: keep all 32 threads of a warp on the same branch whenever possible, otherwise both branches get executed.
• Latency hiding: blockDim must be large enough that the SM has other warps to schedule while one is stalled on memory.
• Register spill: if a kernel uses too many local variables (registers), then …
0 credits | 142 pages | 13.52 MB | 1 year ago
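To make the latency-hiding bullet concrete, a back-of-the-envelope C++ calculation; the per-SM limits (64 resident warps, 16 resident blocks) are illustrative assumptions, not numbers from the slides. With 32-thread blocks the SM caps out at 16 resident warps, leaving it little to switch to when a warp stalls, while 256-thread blocks fill all 64 warp slots.

#include <algorithm>
#include <cstdio>

// How many warps can one SM keep resident, given the block size?
// max_warps_per_sm and max_blocks_per_sm are assumed, illustrative limits.
int resident_warps(int block_dim, int max_warps_per_sm = 64, int max_blocks_per_sm = 16) {
    int warps_per_block = (block_dim + 31) / 32;
    int blocks = std::min(max_blocks_per_sm, max_warps_per_sm / warps_per_block);
    return blocks * warps_per_block;
}

int main() {
    std::printf("blockDim = 32  -> %d resident warps\n", resident_warps(32));   // 16: little latency hiding
    std::printf("blockDim = 256 -> %d resident warps\n", resident_warps(256));  // 64: plenty of warps to switch to
}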
Bypassing conntrack: Enhancing IPVS with eBPF to Optimize Kubernetes Network Performance
Measurement: test topology and test results.
Service type   Short-connection CPS   Short-connection P99 latency   Long-connection PPS
ClusterIP      +40%                   -31%                           not available
NodePort       +64%                   -47%                           +22%
0 credits | 24 pages | 1.90 MB | 1 year ago
hazard pointer synchronous reclamation
… Michael. Watch the CppCon 2021 talk on Concurrency TS2: "The Upcoming Concurrency TS Version 2 for Low-Latency and Lockless Synchronization" (with Paul McKenney and Michael Wong).
0 credits | 31 pages | 856.38 KB | 5 months ago
C++ High-Performance Parallel Programming and Optimization - Lecture Slides - 07 Memory Access Optimization Explained
… cache miss • false sharing • prefetching • write-through / streaming stores • latency hiding • stalling (spinning idle) while waiting on memory • loop tiling / loop blocking • register blocking (unroll-and-jam)
0 credits | 147 pages | 18.88 MB | 1 year ago
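As a concrete illustration of one of these terms, a small self-contained C++ sketch of the textbook fix for false sharing: giving each thread's counter its own cache line. The 64-byte line size is an assumption, and the example is illustrative rather than taken from the slides.

#include <cstdint>
#include <thread>

// Without padding, both counters would sit on one cache line and every increment
// would bounce that line between the two cores (false sharing). alignas(64) gives
// each counter its own line; 64 bytes is the assumed cache-line size.
struct PaddedCounter {
    alignas(64) std::uint64_t value = 0;
};

PaddedCounter counters[2];

void work(int id) {
    for (int i = 0; i < 10'000'000; ++i)
        counters[id].value += 1;   // touches only this thread's cache line
}

int main() {
    std::thread a(work, 0), b(work, 1);
    a.join();
    b.join();
}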
7 results in total