Scalable Stream Processing - Spark Streaming and Flink
• The performance of these operations is proportional to the size of the state. ▶ mapWithState • It is executed only on the set of keys that are available in the last micro batch. • The performance is proportional
0 points | 113 pages | 1.22 MB | 1 year ago

Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
re-partitioning and migration • minimize communication • keep duration short • minimize performance disruption, e.g. latency spikes • avoid introducing load imbalance • Resource management ... Kalavri | Boston University 2020 ... When and how much to adapt? • Detect environment changes: external workload and system performance • Identify bottleneck operators, straggler workers, skew • Enumerate scaling actions, predict
0 points | 41 pages | 4.09 MB | 1 year ago

Monitoring Apache Flink Applications (Introduction)
metrics.latency.granularity: subtask), enabling latency tracking can significantly impact the performance of the cluster. It is recommended to only enable it to locate sources of latency during debugging ... hand, if your job’s performance is starting to degrade, among the first metrics you want to look at are memory consumption and ... your TaskManagers are constantly under very high load, you might be able to improve the overall performance by decreasing the number of task slots per TaskManager (in case of a Standalone setup), by providing
0 points | 23 pages | 148.62 KB | 1 year ago

Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020
to apply the re-configuration? • Detect environment changes: external workload and system performance • Identify bottleneck operators, straggler workers, skew • Enumerate scaling actions, predict ... requirements ▸ Accuracy ▸ no over/under-provisioning ▸ Stability ▸ no oscillations ▸ Performance ▸ fast convergence ... scaling controller: detect symptoms, decide whether to scale, decide ... MIMO too complex • Action • predictive, dataflow-wide ... The output signal is the delay time. Performance depends on parameter selection, e.g. pole placement, sampling period, damping. Cannot identify
0 points | 93 pages | 2.42 MB | 1 year ago

Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020 ... Performance implications: How may checkpointing affect application performance? How often to checkpoint? Do we need to checkpoint the complete
0 points | 81 pages | 13.18 MB | 1 year ago

High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Fault-tolerance trade-offs: Steady-state overhead • How is performance affected by the fault-tolerance mechanism under normal, failure-free operation? • How much ... been checkpointed, i.e. the user’s non-deterministic code is not re-executed ... Bloom filters for performance • Maintaining a catalog of all IDs ever seen and checking it for de-duplication is expensive
0 points | 49 pages | 2.08 MB | 1 year ago

Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
placement decisions • different algorithms, e.g. hash-based vs. broadcast join • What does performance depend on? • input data, intermediate data • operator properties • How can we estimate the ... • Profitability: under what conditions does the optimization improve performance? • can the decision be automatic? • Safety: under what conditions does the optimization preserve
0 points | 54 pages | 2.83 MB | 1 year ago

Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
have a solid understanding of how stream processing systems work and what factors affect their performance • be aware of the challenges and trade-offs one needs to consider when designing and deploying
0 points | 34 pages | 2.53 MB | 1 year ago

Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Elasticity ... Selectively drop records: • Temporarily trades off result accuracy for sustainable performance. • Suitable for applications with strict latency constraints that can tolerate approximate
0 points | 43 pages | 2.42 MB | 1 year ago
9 results in total