Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki (Vasia) Kalavri, vkalavri@bu.edu. CS 591 K1: Data Stream Processing and Analytics. Spring 2020. 2/06: Notions of time and progress. Vasiliki Kalavri | Boston University 2020
Mobile game application
• Processing time
  • the time of the local clock where an event is being processed
  • a processing-time window wouldn’t account for game activity while in the tunnel
  • results depend on the processing speed and aren’t deterministic
• Event time
  • the time when an event actually happened
  • an event-time window would give you the extra life
0 credits | 22 pages | 2.22 MB | 1 year ago
Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Boston University 2020. CS 591 K1: Data Stream Processing and Analytics. Vasiliki (Vasia) Kalavri, vkalavri@bu.edu. Spring 2020. 1/23: Stream Processing Fundamentals
traditional data processing applications, we know the entire dataset in advance, e.g. tables stored in a database. A data stream is a data set that is produced incrementally over time, rather than being available in full before its processing begins.
• Data streams are high-volume, real-time data that might be unbounded
  • we cannot store the entire stream in an accessible way
  • we have to
0 credits | 45 pages | 1.22 MB | 1 year ago
Scalable Stream Processing - Spark Streaming and Flink
Amir H. Payberah, payberah@kth.se, 05/10/2018. The Course Web Page: https://id2221kth.github.io
Where Are We?
Stream Processing Systems Design Issues
▶ Continuous vs. micro-batch processing
▶ Record-at-a-Time vs. declarative APIs
Outline
▶ Spark streaming
▶ Flink
Spark Streaming
Contribution
▶ Design issues
  • Continuous vs. micro-batch processing
  • Record-at-a-Time vs. declarative APIs
Spark Streaming
▶ Run a streaming computation as a series of very small, deterministic batch jobs.
  • Chops
0 credits | 113 pages | 1.22 MB | 1 year ago
【04 RocketMQ 王鑫】Stream Processing with Apache RocketMQ and Apache Flink
0 credits | 30 pages | 24.22 MB | 1 year ago
Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020. CS 591 K1: Data Stream Processing and Analytics. Vasiliki (Vasia) Kalavri, vkalavri@bu.edu. Spring 2020. 4/16: Skew mitigation
• Uddin Nasir et al. The power of both choices: Practical load balancing for distributed stream processing engines. ICDE 2015.
• Mitzenmacher, Michael. The power of two choices in randomized load balancing
0 credits | 31 pages | 1.47 MB | 1 year ago
State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020. CS 591 K1: Data Stream Processing and Analytics. Vasiliki (Vasia) Kalavri, vkalavri@bu.edu. Spring 2020. 2/25: State Management
single key-value into the DB
• Iterator/RangeScan: seek to a specified key and then scan one key at a time from that point (keys are sorted)
• Merge: a lazy read-modify-write
RocksDB
operator. Keyed state can only be used by functions that are applied on a KeyedStream:
• When the processing method of a function with keyed input is called, Flink’s runtime automatically puts all keyed
0 credits | 24 pages | 914.13 KB | 1 year ago
Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Boston University 2020. CS 591 K1: Data Stream Processing and Analytics. Vasiliki (Vasia) Kalavri, vkalavri@bu.edu. Spring 2020. 4/14: Stream processing optimizations
continuously along edges
Operators
• receive one or more input streams
• perform tuple-at-a-time, window, logic, pattern matching transformations
• output one or more streams of possibly different
1
• a filter operator typically has selectivity < 1
Is selectivity always known at development time?
Types of Parallelism
0 credits | 54 pages | 2.83 MB | 1 year ago
Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020. CS 591 K1: Data Stream Processing and Analytics. Vasiliki (Vasia) Kalavri, vkalavri@bu.edu. Spring 2020. 2/11: Windows and Triggers
• e.g. joins, holistic aggregates
• Compute on most recent events only
  • when providing real-time traffic information, you probably don't care about an accident that happened 2 hours ago
• Recent
    val maxTemp = sensorData
      .map(r => Reading(r.id, r.time, (r.temp - 32) * (5.0 / 9.0)))
      .keyBy(_.id)
      .timeWindow(Time.minutes(1))
      .max("temp")
  }
}
Example: Window sensor
0 credits | 35 pages | 444.84 KB | 1 year ago
Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020. CS 591 K1: Data Stream Processing and Analytics. Vasiliki (Vasia) Kalavri, vkalavri@bu.edu. Spring 2020. 1/21: Introduction
Information
• Instructor: Vasiliki Kalavri
• Office: MCS 206
• Contact: vkalavri@bu.edu
• Course Time & Location: Tue, Thu 9:30-10:45, MCS B33
• Office Hours: Tue, Thu 11:00-12:30, MCS 206
course, you will hopefully:
• know when to use stream processing vs other technology
• be able to comprehensively compare features and processing guarantees of streaming systems
• be proficient in using
0 credits | 34 pages | 2.53 MB | 1 year ago
Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020. CS 591 K1: Data Stream Processing and Analytics. Vasiliki (Vasia) Kalavri, vkalavri@bu.edu. Spring 2020. 4/23: Cardinality and frequency estimation
corresponding to each of the m bits in the filter:
• Increment the corresponding counter every time an element is added
• To delete an element, decrease its corresponding counters and unset the corresponding
recommended number of counters is ϵ δ ϵ ⋅ n 1 − δ
p = ⌈ln(1/δ)⌉
m = ⌈2.71828/ϵ⌉
Error and space/time trade-offs
Space requirements
0 credits | 69 pages | 630.01 KB | 1 year ago
406 results in total