Heterogeneous Modern C++ with SYCL 2020
Michael Wong, Distinguished Engineer ● Chair of SYCL Heterogeneous Programming Language ● ISO C++ Directions Group past Chair ● Past CEO OpenMP ● ISOCPP.org Director, VP http://isocpp.org/wiki/ Leading team developing HIP & CUDA backends for DPC++. Background in C++ programming models for heterogeneous systems. Worked on ComputeCpp (SYCL) since its inception. Contributor to the Khronos SYCL standard. AI/Tensor HW. Other Backends. SYCL 2020 is here! Open Standard for Single Source C++ Parallel Heterogeneous Programming. SYCL 2020 is released after 3 years of intense work. Significant adoption in Embedded
114 pages | 7.94 MB | 5 months ago

Sender Patterns to Wrangle Concurrency in Embedded Devices
Sender Patterns to Wrangle Concurrency in Embedded Devices. Michael Caisse, michael.caisse@intel.com
106 pages | 26.36 MB | 5 months ago

2.4 Go 1.4 runtime
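Go 1.4 runtime. Gopher China 2015. 1. Memory Allocator 2. Garbage Collector 3. Goroutine Scheduler. 1. Memory Allocator: based on tcmalloc. Built on a mature design with excellent performance; refined with each release to cooperate better with the garbage collector. Core ideas: self-managed memory, cache reuse, lock-free allocation. Threshold-triggered, parallel mark, concurrent sweep. Periodic forced collection releases physical memory. Across releases, garbage collection efficiency is always the core concern. gogc: threshold check, or forced collection. [Diagram: malloc, next_gc, gogc, runtime.gc(), forcegc (2m); phases: stop, start, mark, sweep] mark: pause user logic, mark in parallel. scheduler: thread, processor, goroutine. max: system limits, adjustable. runtime.GOMAXPROCS adjusts the number of Ps, which causes the G task queues to be redistributed. [Diagram: M, G, P, scheduler; max = 10000, max = 256] runtime/debug.SetMaxThreads: exceeding the limit crashes the process. newproc: create a new concurrent task.
29 pages | 608.57 KB | 1 year ago

Khronos APIs for Heterogeneous Compute and Safety: SYCL and SYCL SC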
FPGAs. AMD GPUs. Any CPU. SYCL enables Khronos to influence ISO C++ to (eventually) support heterogeneous compute. SYCL, OpenCL and SPIR-V, as open industry standards, enable flexible integration and ... Experimental. EXPLORER https://godbolt.org/z/jdhKr7e5r Expressiveness and simplicity for heterogeneous programming in modern C++. New Features: Unified Shared Memory | Parallel Reductions | Subgroup Operations | Class template
82 pages | 3.35 MB | 5 months ago

Code Generation from Unified Robot Description Format for Accelerated Robotics
Performance improvements of more than 500x over the state of the art. Compiler takes in standard Unified Robot Description Format (URDF) files and generates optimized code. Setup data structure to optimize
93 pages | 9.29 MB | 5 months ago

Leveraging the Power of C++ for Efficient Machine Learning on Embedded Devices
Leveraging the power of C++ for efficient machine learning on embedded devices. Adrian Stanciu, adrian.stanciu.pub@gmail.com. CppCon, 2023. About me ◮ I am a software engineer from Romania ◮ I have ... predictions ◮ Applications: ◮ Computer vision ◮ Medicine ◮ Search engines. Embedded devices ◮ Computing devices designed to perform specific tasks within larger systems ◮ Applications: ◮ Consumer ... power consumption ◮ May have real-time performance constraints. Machine learning on embedded devices ◮ Alternative to cloud-based machine learning ◮ Advantages: ◮ Real-time processing ◮ Low latency
51 pages | 1.78 MB | 5 months ago

A Compatibility Layer for Rust Async Runtimes - 施继成
A Compatibility Layer for Rust Async Runtimes. 施继成 @ DatenLord. 1. Introduce what a Rust async runtime is (# Rust async runtime). 2. Analyze the reasons for runtime isolation (# Async runtime binding). 3. Create a wheel (# Compatible layer). # Rust async runtime: Light-weight task • Language and compiler define tasks • How to run it? • When to run it? • How does it deal with the I/O? Runtime responsibilities ... its multi-thread model. Available Runtimes • Tokio • Async-std • Smol • Monoio. # Async runtime binding: Which runtime to choose? • More adopters • Rich
22 pages | 957.41 KB | 1 year ago

Designing an ultra low-overhead multithreading runtime for Nim
Designing an ultra low-overhead multithreading runtime for Nim. Mamy Ratsimbazafy, mamy@numforge.co. Weave: https://github.com/mratsim/weave Hello! I am Mamy Ratsimbazafy. During the day blockchain/Ethereum ... multithreading: definitions and use-cases ◇ Parallel APIs ◇ Sources of overhead and runtime design ◇ Minimum viable runtime plan in a weekend. Understanding the design space: Concurrency vs parallelism ... hardware threads. The same distinctions can be made at a multithreaded language or multithreading runtime level. The problem: How to schedule M tasks on N hardware threads? Latency vs Throughput
37 pages | 556.64 KB | 1 year ago

Real-Time Unified Data Layers: A New Era for Scalable Analytics, Search, and AI
Real-Time Unified Data Layers: A New Era for Scalable Analytics, Search, and AI, v 1.1. Table of Contents: 1. Introduction 2. The Interconnection of Analytics, Search, and AI 3. What is a Real-Time Unified Data Layer? 4. Why Do You Need a Real-Time Unified Data Layer? 5. CrateDB: A Modern Real-Time Unified Data Layer. 1. Introduction: Data teams are facing more challenges than ever. As applications generate ... and architecture teams must rethink traditional data infrastructures. The future lies in Real-Time Unified Data Layers: platforms that seamlessly support analytics, search, and AI workloads at scale. These
10 pages | 2.82 MB | 5 months ago

2.1.4 PingCAP Go runtime related problems in TiDB production environment
Go runtime related problems in TiDB production environment. About me ● Arthur Mao (毛康力), Senior Engineer @ PingCAP ● TiDB core developer (top 3 contributor) ● GitBook about golang internals (@tiancaiamao). IO is ready => goroutine wake up == 4.3ms ○ Sometimes even 10ms+ latency here! ○ The time spent on runtime scheduling is not negligible ● When the CPU is overloaded, which goroutine should be given priority? Analysis ... longer to be scheduled ● The runtime scheduling does not consider priority ● CPU-intensive workloads can affect IO latency. Conclusion. Part II - Memory control ● Go Runtime ○ Allocated from OS (mmapped)
56 pages | 50.15 MB | 5 months ago