Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
along edges Operators • receive one or more input streams • perform tuple-at-a-time, window, logic, pattern matching transformations • output one or more streams of possibly different type A streams in Stream SQL, Scala, Python, Rust, Java… ??? Vasiliki Kalavri | Boston University 2020 Logic State<#Brexit, 521> <#WorldCup, 480> <#StarWars, 300> <#Brexit> <#Brexit, 521> Stateful • if A is a projection on multiple attributes • if A is an idempotent aggregation Operator separation A A2 A1 Separate operators into smaller computational steps • beneficial if it enables other 0 码力 | 54 页 | 2.83 MB | 1 年前3Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
• operators can accumulate state, have multiple inputs, express event- time custom window-based logic • some systems, like Timely Dataflow support cyclic dataflows and iterations on streams • Operators w8 w7 Twitter source Extract hashtags Count topics Trends sink w1 w2 w3 w4 w5 w6 w7 w8 Logic Query Plan Deployment 39 Vasiliki Kalavri | Boston University 2020 source sink input port output along edges Operators • receive one or more input streams • perform tuple-at-a-time, window, logic, pattern matching transformations • output one or more streams of possibly different type A0 码力 | 45 页 | 1.22 MB | 1 年前3State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020
vkalavri@bu.edu Spring 2020 2/25: State Management Vasiliki Kalavri | Boston University 2020 Logic State<#Brexit, 520> <#WorldCup, 480> <#StarWars, 300> <#Brexit> <#Brexit, 521> <#WorldCup and used to compute results: a local or instance variable that is accessed by a task’s business logic Operator state is scoped to an operator task, i.e. records processed by the same parallel task flatMap2(TaxiFare fare, Collector > out) throws Exception { // similar logic for processing fare events } } } Java example (cont.) 21 Vasiliki Kalavri | Boston 0 码力 | 24 页 | 914.13 KB | 1 年前3High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020
processing in Apache Beam / Google Cloud Dataflow 2 Vasiliki Kalavri | Boston University 2020 Logic State<#Brexit, 520> <#WorldCup, 480> <#StarWars, 300> Any non-trivial streaming computation machine learning models State in dataflow computations 3 Vasiliki Kalavri | Boston University 2020 Logic State <#Brexit, 520> <#WorldCup, 480> <#StarWars, 300> <#Brexit> Any non-trivial streaming machine learning models State in dataflow computations 3 Vasiliki Kalavri | Boston University 2020 Logic State <#Brexit, 521> <#WorldCup, 480> <#StarWars, 300> <#Brexit, 521> Any non-trivial 0 码力 | 49 页 | 2.08 MB | 1 年前3Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020
list of all collected elements when evaluated: • They require more space but support more complex logic. • ProcessWindowFunction Window functions 14 Vasiliki Kalavri | Boston University 2020 val minTempPerWindow: Vasiliki Kalavri | Boston University 2020 Advanced transformation functions used to implement custom logic for which predefined windows and transformations might not be suitable: • they provide access to0 码力 | 35 页 | 444.84 KB | 1 年前3Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Operators process stream elements one-by-one. • selection, filtering, projection, renaming. • Logic Operators define rules for complex pattern detection without order constraints. • conjunction of of an item I is satisfied when I is not detected. 10 Vasiliki Kalavri | Boston University 2020 Logic Operators Example Select IStream(S1.A, S2.B) From S1 [Rows 50], S2 [Rows 50] (A & B) || (C & D)0 码力 | 53 页 | 532.37 KB | 1 年前3Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
scale out to process increased load • scale in to save resources • Fix bugs or change business logic • Optimize execution plan • Change operator placement • skew and straggler mitigation • Migrate0 码力 | 41 页 | 4.09 MB | 1 年前3Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020
??? Vasiliki Kalavri | Boston University 2020 Algorithm Generalizations The marker-forwarding logic itself guarantees validity: Each local snapshotting action produces markers that separate pre-snapshot0 码力 | 81 页 | 13.18 MB | 1 年前3
共 8 条
- 1