Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
data streams: Systems, Algorithms, Architecture and design, Scheduling and load management, Scalability and elasticity, Fault-tolerance and guarantees, State management, Operator semantics, Window ... end-to-end, scalable, and reliable streaming applications • have a solid understanding of how stream processing systems work and what factors affect their performance • be aware of the challenges and trade-offs one needs to consider when designing and deploying streaming applications Vasiliki Kalavri | Boston University 2020 Grading Scheme (1) • No Exam • 5 in-class quizzes (10%): • Each ...
34 pages | 2.53 MB | 1 year ago
Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020 • The JobManager is a single point of failure ... Flink applications • It keeps metadata about application execution, such as pointers to completed checkpoints. ... software version. Reconfiguration cases • Streaming applications are long-running • Workload will change • Conditions might change • State is accumulated ... re-partitioning and migration • minimize communication • keep duration short • minimize performance disruption, e.g. latency spikes • avoid introducing load imbalance • Resource management ...
41 pages | 4.09 MB | 1 year ago
Monitoring Apache Flink Applications (Getting Started)
Original article: https://www.ververica.com/blog/monitoring-apache-flink-applications-101 • This blog post introduces Apache Flink's built-in monitoring and metrics system, which lets developers monitor their Flink jobs effectively. Typically, for someone who has just started using Apache Flink for stream processing ... metrics.latency.granularity: subtask), enabling latency tracking can significantly impact the performance of the cluster. It is recommended to enable it only to locate sources of latency during debugging. ... hand, if your job’s performance is starting to degrade, among the first metrics you want to look at are memory consumption and ...
23 pages | 148.62 KB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
placement decisions • different algorithms, e.g. hash-based vs. broadcast join • What does performance depend on? • input data, intermediate data • operator properties • How can we estimate the ... • Profitability: under what conditions does the optimization improve performance? • can the decision be automatic? • Safety: under what conditions does the optimization preserve ... Vasiliki Kalavri | Boston University 2020 Profitability • Running two applications together on a single core, one with operators B and C, the other with operators B and D. Redundancy ...
54 pages | 2.83 MB | 1 year ago
Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Elasticity policies and state migration • Vasiliki Kalavri | Boston University 2020 Streaming applications are long-running • Workload will change • Conditions might change • State is accumulated ... to apply the re-configuration? • Detect environment changes: external workload and system performance • Identify bottleneck operators, straggler workers, skew • Enumerate scaling actions, predict requirements ... ▸ Accuracy: no over/under-provisioning ▸ Stability: no oscillations ▸ Performance: fast convergence • Figure: scaling controller (detect symptoms, decide whether to scale, decide ...)
93 pages | 2.42 MB | 1 year ago
High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020
minimal downtime and fast recovery? • how can we hide recovery side-effects from downstream applications? Vasiliki Kalavri | Boston University 2020 What is a failure? 1. receive an event 2. store ... Fault-tolerance trade-offs • Steady-state overhead • How is performance affected by the fault-tolerance mechanism under normal, failure-free operation? • How much ... been checkpointed, i.e. the user’s non-deterministic code is not re-executed • Bloom filters for performance • Maintaining a catalog of all IDs ever seen and checking it for de-duplication is expensive
49 pages | 2.08 MB | 1 year ago
Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Selectively drop records: • Temporarily trades off result accuracy for sustainable performance. • Suitable for applications with strict latency constraints that can tolerate approximate results. Slow ...
43 pages | 2.42 MB | 1 year ago
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020 What is a stream? • In traditional data processing applications, we know the entire dataset in advance, e.g. tables stored in a database. A data stream is ... the total packets exchanged between two IP addresses • the collection of IP addresses accessing a web server ... With some practical value for use-cases with append-only data • It preserves all history ... Summary: Today you learned: • stream representations, stream processing models • streaming applications and use-cases • different approaches to data management • the relational streaming model vs ...
45 pages | 1.22 MB | 1 year ago
Scalable Stream Processing - Spark Streaming and Flink
Amir H. Payberah payberah@kth.se 05/10/2018 • The Course Web Page: https://id2221kth.github.io • Where Are We? • Stream Processing Systems Design ... • The performance of these operations is proportional to the size of the state. ▶ mapWithState • It is executed only on the set of keys that are available in the last micro batch. • The performance is proportional ...
113 pages | 1.22 MB | 1 year ago
Apache Flink: Past, Present, and Future
Figure: Batch Processing, Continuous Processing & Streaming Analytics, and Event-driven Applications on an offline-to-real-time axis (one checked) • Present: architecture changes in Flink 1.9 • Figure: Runtime, Distributed Streaming Dataflow, Query Processor, DAG & StreamOperator; the same scenario chart with two checked • Future: Micro Services • Figure: Order, Inventory, and Payment services with nodes O_0, O_1, I_0, I_1, I_2, P_0, P_1, P_2, S_0, S_1; the same scenario chart with all three checked • Scan the QR code to join the community and Code Up together with like-minded developers • Alibaba Cloud Developer Community • Apache Flink China Group 2 • Thank you!
33 pages | 3.36 MB | 1 year ago