High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics
…availability, recovery semantics, and guarantees. Vasiliki Kalavri | Boston University 2020. Today’s topics: • High availability and fault tolerance in distributed stream processing • Recovery semantics and guarantees … against failures and guarantee correct results after recovery? • How can we ensure minimal downtime and fast recovery? • How can we hide recovery side effects from downstream applications? … failures. Recovery types: • Precise recovery (exactly-once) • It hides the effects of a failure …
(49 pages, 2.08 MB)

Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020
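This lecture's excerpt says that defining bounded portions of a stream lets blocking operators, such as aggregations, keep producing output on unbounded input. These bounded portions are presumably windows, the construct that is "almost universally supported" under various names. Here is a minimal sketch of a tumbling (fixed-size, non-overlapping) window sum. It assumes in-order event timestamps and a hypothetical `size` parameter in arbitrary time units.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Tuple

def tumbling_window_sums(
    events: Iterable[Tuple[int, str, int]], size: int
) -> Iterator[Tuple[int, str, int]]:
    """Sum values per key over non-overlapping windows of `size` time units.

    Assumes events arrive in timestamp order, so a window can be closed (and its
    otherwise blocking aggregate emitted) once an event from a later window arrives.
    Yields (window_start, key, sum) tuples.
    """
    current_start: Optional[int] = None
    sums: Dict[str, int] = defaultdict(int)
    for ts, key, value in events:
        start = ts - ts % size  # start of the window this event falls into
        if current_start is not None and start != current_start:
            # The previous window is complete: emit its per-key results.
            for k in sorted(sums):
                yield (current_start, k, sums[k])
            sums.clear()
        current_start = start
        sums[key] += value
    # End of a finite input: flush the last open window.
    if current_start is not None:
        for k in sorted(sums):
            yield (current_start, k, sums[k])

events = [(1, "a", 2), (3, "a", 5), (4, "b", 1), (11, "a", 7), (12, "b", 3)]
print(list(tumbling_window_sums(events, size=10)))
# [(0, 'a', 7), (0, 'b', 1), (10, 'a', 7), (10, 'b', 3)]
```

With out-of-order input, a real system would need something like watermarks before closing a window. This sketch simply closes a window when the first event of a later window arrives.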
…K1: Data Stream Processing and Analytics, Spring 2020. 2/04: Streaming languages and operator semantics. Vasiliki Kalavri | Boston University 2020 … Languages … Almost universally supported across streaming systems and languages, albeit with various names and semantics • Allow unblocking the processing of blocking operators by defining bounded portions of the … Continuous queries on data streams • New (derived) streams are defined as virtual views in SQL • Semantics are equivalent to having an append-only table to which new tuples are continuously added. …
(53 pages, 532.37 KB)

Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
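This lecture's excerpt describes base streams that update relation streams. Its example is a sum over (src, dest, bytes) tuples. Here is a minimal sketch of that idea under three assumptions: the sum is grouped by the (src, dest) pair, 20K is read as 20,000 bytes, and each update is emitted as the pair's new (src, dest, total) row. The function name `running_sum` is illustrative only.

```python
from typing import Dict, Iterable, Iterator, Tuple

def running_sum(base: Iterable[Tuple[int, int, int]]) -> Iterator[Tuple[int, int, int]]:
    """Consume a base stream of (src, dest, bytes) tuples and emit a relation
    stream of (src, dest, total) rows. Each input tuple updates the per-pair sum
    and emits that pair's new row of the changing view."""
    totals: Dict[Tuple[int, int], int] = {}
    for src, dest, nbytes in base:
        key = (src, dest)
        totals[key] = totals.get(key, 0) + nbytes
        yield (src, dest, totals[key])

base = [(1, 2, 20_000), (2, 5, 32_000)]
print(list(running_sum(base)))
# [(1, 2, 20000), (2, 5, 32000)]
print(list(running_sum(base + [(1, 2, 5_000)])))
# [(1, 2, 20000), (2, 5, 32000), (1, 2, 25000)]
```

Feeding the two example tuples reproduces the example's result rows. A further tuple (1, 2, 5,000) emits (1, 2, 25,000), the updated view row for that pair. Emitting retraction/insertion pairs is another common update encoding.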
…missing, out-of-order, delayed data 4. Guarantee deterministic (on replay) and correct results (on recovery) 5. Combine batch (historical) and stream processing 6. Ensure availability despite failures … • Base streams update relation streams that describe the changing view computed over the input stream according to the relational semantics of the operator. Example: base-stream tuples (src, dest, bytes) = (1, 2, 20K) and (2, 5, 32K) pass through a sum operator, which produces result tuples (src, dest, total) = (1, 2, 20K) and (2, 5, 32K). …
(45 pages, 1.22 MB)

Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
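The correctness conditions in this lecture's excerpt include preserving state semantics and preserving result and selectivity semantics. Operator reordering, one classic streaming optimization chosen here for illustration, shows what these conditions mean. Here is a minimal sketch that models stateless, side-effect-free operators as Python generator stages on a linear dataflow. The stage helpers are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List

Stage = Callable[[Iterable[int]], Iterator[int]]

def filter_stage(pred: Callable[[int], bool]) -> Stage:
    # Stateless filter: emits surviving elements in arrival (FIFO) order.
    return lambda xs: (x for x in xs if pred(x))

def map_stage(fn: Callable[[int], int]) -> Stage:
    # Stateless one-to-one transformation.
    return lambda xs: (fn(x) for x in xs)

def run(source: Iterable[int], stages: List[Stage]) -> List[int]:
    # Chain the stages into a linear dataflow and drain it.
    stream: Iterable[int] = source
    for stage in stages:
        stream = stage(stream)
    return list(stream)

data = list(range(10))
is_even = filter_stage(lambda x: x % 2 == 0)
above_three = filter_stage(lambda x: x > 3)
double = map_stage(lambda x: 2 * x)

# Swapping two pure filters preserves the result (and the combined selectivity).
print(run(data, [is_even, above_three]))  # [4, 6, 8]
print(run(data, [above_three, is_even]))  # [4, 6, 8]
# Swapping a filter with a map that changes the value it tests does not.
print(run(data, [above_three, double]))   # [8, 10, 12, 14, 16, 18]
print(run(data, [double, above_three]))   # [4, 6, 8, 10, 12, 14, 16, 18]
```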
…alternatives • Runtime optimizations • Load management, scheduling, state management • Optimization semantics, correctness, profitability. Topics covered in this lecture … basics. Dataflow graph • Operators are nodes, data channels are edges • Channels have FIFO semantics • Streams of data elements flow continuously along edges. Operators • receive one or more … conditions: does the optimization preserve correctness? • Maintain state semantics • Maintain result and selectivity semantics • Dynamism: can the optimization be applied during runtime or does it …
(54 pages, 2.83 MB)

Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020
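This lecture's excerpt frames duplicate removal on recovery with upstream backup as a membership problem. It also notes that an exact hash table costs O(log n) bits per element. A Bloom filter is the classic compact alternative, though the excerpt does not name the structure, so that choice is an assumption here. A Bloom filter never misses a stored item but may report false positives. For a false-positive rate p, it uses about 1.44·log2(1/p) bits per element, which is roughly 9.6 bits for p = 0.01. Here is a minimal sketch. The class name, the SHA-256-based double hashing, and the string tuple IDs are all illustrative choices.

```python
import hashlib
import math
from typing import Iterable, Iterator, Tuple

class BloomFilter:
    """Approximate set: no false negatives, false positives at roughly rate p."""

    def __init__(self, capacity: int, fp_rate: float) -> None:
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        self.m = max(1, math.ceil(-capacity * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions h1 + i*h2 from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

def drop_replayed(
    tuples: Iterable[Tuple[str, int]], seen: BloomFilter
) -> Iterator[Tuple[str, int]]:
    # Forward a tuple only if its ID has not (apparently) been seen before.
    for tuple_id, payload in tuples:
        if tuple_id in seen:
            continue  # most likely a duplicate replayed from the upstream backup
        seen.add(tuple_id)
        yield tuple_id, payload

seen = BloomFilter(capacity=1000, fp_rate=0.01)
replayed = [("t1", 5), ("t2", 7), ("t1", 5)]  # "t1" is replayed after recovery
print(list(drop_replayed(replayed, seen)))    # [('t1', 5), ('t2', 7)]
```

In this deduplication setting, a false positive means a tuple that was never delivered gets dropped. The filter therefore trades a small, tunable loss rate for memory. Exact deduplication would need exact IDs or sequence numbers instead.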
…URLs that contain malware? • Filter out all compromised passwords? • Remove duplicate tuples on recovery when using upstream backup? The membership problem. A hash table requires O(log n) bits per element …
(74 pages, 1.06 MB)

Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020
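This lecture's excerpt shows only the first step of the recovery process. It also notes that checkpointing resets internal state only. The toy simulation below is not Flink's API. It assumes a replayable source, modelled as a list with an offset that stands in for something like a log. It also assumes the commonly described recovery sequence: restart with empty state, reset operator state and source offsets to the latest completed checkpoint, then resume. `CountingJob` and `Checkpoint` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Checkpoint:
    source_offset: int        # position in the replayable input
    counts: Dict[str, int]    # snapshot of the operator's state

@dataclass
class CountingJob:
    """Toy single-operator job that counts words read from a replayable log."""
    log: List[str]
    offset: int = 0
    counts: Dict[str, int] = field(default_factory=dict)
    sink: List[Tuple[str, int]] = field(default_factory=list)  # non-transactional output

    def process(self, n: int) -> None:
        for word in self.log[self.offset:self.offset + n]:
            self.counts[word] = self.counts.get(word, 0) + 1
            self.sink.append((word, self.counts[word]))
        self.offset = min(self.offset + n, len(self.log))

    def checkpoint(self) -> Checkpoint:
        return Checkpoint(self.offset, dict(self.counts))

    def recover(self, cp: Checkpoint) -> None:
        # After a restart every operator has empty state; reset it to the
        # latest completed checkpoint instead of starting from nothing.
        self.counts = dict(cp.counts)
        # Rewind the source to the checkpointed offset; processing resumes there.
        self.offset = cp.source_offset

job = CountingJob(["a", "b", "a", "c", "a"])
job.process(2)          # reads "a", "b"
cp = job.checkpoint()   # offset 2, counts {"a": 1, "b": 1}
job.process(2)          # reads "a", "c", then the job "fails"
job.recover(cp)         # state and offset rolled back to the checkpoint
job.process(10)         # replays "a", "c" and reads the final "a"
print(job.counts)       # {'a': 3, 'b': 1, 'c': 1}
```

After recovery the counts reflect each input exactly once. However, the non-transactional `sink` receives the replayed results ("a", 2) and ("c", 1) a second time. This is the end-to-end gap the excerpt points at, and closing it requires idempotent or transactional sinks.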
… Recovery process: 1. Stop and restart the application. All operators have empty state. … End-to-end exactly once • Flink’s checkpointing and recovery mechanism only resets the internal state of a streaming application • Some result records might …
(81 pages, 13.18 MB)

Monitoring Apache Flink Applications (Getting Started)
…-release-1.7/monitoring/metrics.html#latency-tracking 2. During periods of high load or during recovery, events might spend some time in the message queue until they are processed by Flink (see previous …
(23 pages, 148.62 KB)

Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
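This lecture's excerpt contrasts two partitioning strategies. The first balances items perfectly but scatters a key's values across workers. Hash-based partitioning keeps each key on one worker but lets popular keys overload it. The excerpt then turns to the power of two choices under key semantics. Here is a minimal sketch of all three, under two assumptions. The first strategy is taken to be round-robin. The third is a key-splitting variant in the spirit of partial key grouping, where each key may go to either of two hash-chosen workers and the router tracks only the load it has sent itself. MD5 serves purely as a deterministic hash, and all function names are illustrative.

```python
import hashlib
from typing import List

def _hash(key: str, seed: int, n: int) -> int:
    # Deterministic hash of (seed, key) into range(n).
    digest = hashlib.md5(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n

def round_robin(keys: List[str], n: int) -> List[int]:
    # Perfect balance, but values of one key land on many workers.
    return [i % n for i in range(len(keys))]

def hash_partition(keys: List[str], n: int) -> List[int]:
    # Key semantics preserved; a hot key overloads its single worker.
    return [_hash(k, 0, n) for k in keys]

def two_choices(keys: List[str], n: int) -> List[int]:
    # Each key has two distinct candidate workers; route each item to the
    # candidate that this router has sent fewer items to so far.
    if n == 1:
        return [0] * len(keys)
    sent = [0] * n
    out = []
    for k in keys:
        first = _hash(k, 1, n)
        second = (first + 1 + _hash(k, 2, n - 1)) % n  # never equal to first
        w = first if sent[first] <= sent[second] else second
        sent[w] += 1
        out.append(w)
    return out

def loads(assignment: List[int], n: int) -> List[int]:
    return [assignment.count(w) for w in range(n)]

keys = ["hot"] * 90 + [f"k{i}" for i in range(10)]
print(loads(round_robin(keys, 4), 4))          # [25, 25, 25, 25]
print(max(loads(hash_partition(keys, 4), 4)))  # at least 90: the hot key's worker
print(max(loads(two_choices(keys, 4), 4)))     # at most 50: the hot key is split
```

The price of splitting is that a key's state now lives on two workers, so per-key results need a final merge step downstream.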
…hash-based • Items are perfectly balanced among workers • No routing table required • Key semantics are not preserved: values of the same key might be routed to different workers • Workers are responsible for roughly the same number of keys • No routing table is required • Key semantics preserved: values of the same key are always processed by the same worker • Popular keys cause … preserve the key semantics? … The power of both choices • Applying the power of two choices in a streaming setting and preserving key semantics would require …
(31 pages, 1.47 MB)

Scalable Stream Processing - Spark Streaming and Flink
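This lecture's excerpt credits Flink's fault tolerance to consistent global snapshots inspired by Chandy-Lamport. A common way to realize this is barrier alignment. Checkpoint barriers flow with the data. An operator with several inputs holds back post-barrier records from an input until the barrier has arrived on all inputs, and then snapshots its state. Here is a minimal single-operator sketch under that assumption, since the excerpt does not show these details. `BARRIER`, `AligningSum`, and the channel names are hypothetical.

```python
from typing import Dict, List, Set, Union

BARRIER = "BARRIER"  # marker injected into every input channel by the sources
Item = Union[int, str]

class AligningSum:
    """Sums integers from several input channels and takes aligned snapshots."""

    def __init__(self, channels: List[str]) -> None:
        self.channels = channels
        self.total = 0
        self.snapshots: List[int] = []
        self.aligned: Set[str] = set()  # channels whose barrier has arrived
        self.held: Dict[str, List[Item]] = {c: [] for c in channels}

    def receive(self, channel: str, item: Item) -> None:
        if channel in self.aligned:
            # Barrier already seen on this channel: hold post-barrier items.
            self.held[channel].append(item)
        elif item == BARRIER:
            self.aligned.add(channel)
            if self.aligned == set(self.channels):
                self._snapshot_and_release()
        else:
            self.total += item

    def _snapshot_and_release(self) -> None:
        # Every pre-barrier item has been counted and no post-barrier item has,
        # so this copy of the state is consistent with the checkpoint.
        self.snapshots.append(self.total)
        # (A real system would also forward the barrier downstream here.)
        self.aligned.clear()
        held, self.held = self.held, {c: [] for c in self.channels}
        for channel, items in held.items():
            for item in items:
                self.receive(channel, item)

op = AligningSum(["a", "b"])
op.receive("a", 1)
op.receive("a", BARRIER)
op.receive("a", 10)            # held: it arrived after the barrier on channel "a"
op.receive("b", 2)
op.receive("b", BARRIER)       # alignment complete: snapshot, then release 10
print(op.snapshots, op.total)  # [3] 13
```

The original Chandy-Lamport algorithm records in-flight channel messages as part of the snapshot. Alignment instead blocks inputs briefly, so no channel state needs recording.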
…consistent global snapshots (inspired by Chandy-Lamport). • Low runtime overhead, stateful exactly-once semantics. Fault Tolerance (1/2) ▶ Fault tolerance in Spark • RDD re-computation ▶ Fault tolerance … Fault Tolerance (2/2) ▶ Acks sequences of records instead of individual records. ▶ …
(113 pages, 1.22 MB)

Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
…management • Scalability and elasticity • Fault tolerance and guarantees • State management • Operator semantics • Window optimizations • Filtering, counting, sampling • Graph streaming algorithms …
(34 pages, 2.53 MB)