Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 2/11: Windows and Triggers Vasiliki Kalavri | Boston University 2020 • Practical way to perform operations on API, you can use the time characteristic to tell Flink how to define time when you are creating windows. The time characteristic is a property of the StreamExecutionEnvironment: Configuring a time characteristic be applied on a keyed or a non-keyed stream: • Window operators on keyed windows are evaluated in parallel • Non-keyed windows are processed in a single thread To create a window operator, you need0 码力 | 35 页 | 444.84 KB | 1 年前3Streaming languages and operator semantics - CS 591 K1: Data Stream Processing and Analytics Spring 2020
(I) • Time-based (logical) windows define their contents as a function of time. • average price of items bought within the last 5 minutes • Count-based (physical) windows define their contents according Boston University 2020 Window types (II) • Fixed windows have bound which don’t move • events received between 1/1/2019 and 12/1/2019 • Landmark windows have a fixed lower bound and the upper bound advances events since 1/1/2019 • Sliding windows have fixed size but both their bounds advance for new events • last 10 events or event in the last minute • Tumble windows are non-overlapping fixed-size0 码力 | 53 页 | 532.37 KB | 1 年前3Scalable Stream Processing - Spark Streaming and Flink
joinedStream = stream1.join(stream2) 27 / 79 Join Operation (2/3) ▶ Stream-stream joins ▶ Joins over windows of the streams. val windowedStream1 = stream1.window(Seconds(20)) val windowedStream2 = stream2 ▶ Use groupBy() and window() to express windowed aggregations. // count words within 10 minute windows, updating every 5 minutes. // streaming DataFrame of schema {time: Timestamp, word: String} val calls format("console").start() query.awaitTermination() 66 / 79 Late Data (3/3) // count words within 10 minute windows, updating every 5 minutes. // streaming DataFrame of schema {timestamp: Timestamp, word: String}0 码力 | 113 页 | 1.22 MB | 1 年前3Streaming in Apache Flink
extractTimestamp(MyEvent event) { return element.getCreationTime(); } } Windows (Not the OS) Global Vs Keyed Windows stream. .keyBy() .window( ) .reduce|a max)); } } Precombine Produce final result Lateness • By default, when using event-time windows, late events are dropped. stream. .keyBy(...) .window(...) .allowedLateness(Time.seconds(10)) 0 码力 | 45 页 | 3.00 MB | 1 年前3Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020
properties 14 Vasiliki Kalavri | Boston University 2020 Watermarks are essential to both event-time windows and operators handling out-of-order events: • When an operator receives a watermark with time • It can then either trigger computation or order received events. 15 Evaluation of event-time windows Vasiliki Kalavri | Boston University 2020 16 http://streamingbook.net/fig/3-2 14 Vasiliki Kalavri0 码力 | 22 页 | 2.22 MB | 1 年前3Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Software requirements • All assignments assume a UNIX-based setup. • If you are a Windows user, you are advised to use Windows subsystem for Linux (WSL), Cygwin, or a Linux virtual machine to run Flink in0 码力 | 34 页 | 2.53 MB | 1 年前3Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
University 2020 Which tuples to drop? • Window-aware load shedding applies shedding to entire windows instead of individual tuples • When discarding tuples at the sources or another point in a query a window-based concept drift. • The metric is defined by computing a similarity metric across windows. 18 ??? Vasiliki Kalavri | Boston University 2020 How many tuples to drop? • The amount of tuples0 码力 | 43 页 | 2.42 MB | 1 年前3Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
• Stateful operators maintain state that reflect part of the stream history they have seen • windows, continuous aggregations, distinct… • State is commonly partitioned by key • State can be cleared0 码力 | 45 页 | 1.22 MB | 1 年前3Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Stateful operators maintain state that reflect part of the stream history they have seen • windows, continuous aggregations, distinct… • State is commonly partitioned by key • State can be0 码力 | 54 页 | 2.83 MB | 1 年前3Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
less than (δ-ε)*N. 5 ??? Vasiliki Kalavri | Boston University 2020 Notation (II) • We define windows of size w = 1/ε with increasing numeric ids, starting from 1. • e.g., if ε=0.2, w=5 (5 items0 码力 | 31 页 | 1.47 MB | 1 年前3
共 13 条
- 1
- 2