Stream processing fundamentals - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 1/23: Stream Processing Fundamentals Vasiliki Kalavri | Boston University 2020 What is a stream? • In In traditional data processing applications, we know the entire dataset in advance, e.g. tables stored in a database. A data stream is a data set that is produced incrementally over time, rather than before its processing begins. • Data streams are high-volume, real-time data that might be unbounded • we cannot store the entire stream in an accessible way • we have to process stream elements on-the-fly0 码力 | 45 页 | 1.22 MB | 1 年前3Scalable Stream Processing - Spark Streaming and Flink
Scalable Stream Processing - Spark Streaming and Flink Amir H. Payberah payberah@kth.se 05/10/2018 The Course Web Page https://id2221kth.github.io 1 / 79 Where Are We? 2 / 79 Stream Processing Systems Systems Design Issues ▶ Continuous vs. micro-batch processing ▶ Record-at-a-Time vs. declarative APIs 3 / 79 Outline ▶ Spark streaming ▶ Flink 4 / 79 Spark Streaming 5 / 79 Contribution ▶ Design micro-batch processing • Record-at-a-Time vs. declarative APIs 6 / 79 Spark Streaming ▶ Run a streaming computation as a series of very small, deterministic batch jobs. • Chops up the live stream into batches0 码力 | 113 页 | 1.22 MB | 1 年前3【04 RocketMQ 王鑫】Stream Processing with Apache RocketMQ and Apache Flink
0 码力 | 30 页 | 24.22 MB | 1 年前3Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 1/28: Stream ingestion and pub/sub systems Streaming sources Sockets IoT devices and sensors Databases and KV stores Message queues and brokers Where do stream processors read data from? 2 Challenges • can be distributed • out-of-sync sources may produce unpredictable delays • might be producing too fast • stream processor needs to keep up and not shed load • might be producing too slow or become idle • stream processor should be able to make progress •0 码力 | 33 页 | 700.14 KB | 1 年前3Skew mitigation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
??? Vasiliki Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 4/16: Skew mitigation ??? Vasiliki Kalavri | Boston University 2020 Lossy Counting • Find all items x in a data stream such that: • freq(x) > δ*N, where N is the number of stream elements • The solution will not contain any item y with frequency: Boston University 2020 Notation (I) Input: a stream of items N: number of items in the stream fe: true frequency of the item e in the input stream f: estimated frequency of item δ: user-defined0 码力 | 31 页 | 1.47 MB | 1 年前3State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 2/25: State Management Vasiliki Kalavri | Boston key the stream on the sensor ID val keyedData: KeyedStream[Reading, String] = sensorData .keyBy(_.id) // apply a stateful FlatMapFunction on the keyed stream val alerts: operator. Keyed state can only be used by functions that are applied on a KeyedStream: • When the processing method of a function with keyed input is called, Flink’s runtime automatically puts all keyed0 码力 | 24 页 | 914.13 KB | 1 年前3Streaming optimizations - CS 591 K1: Data Stream Processing and Analytics Spring 2020
| Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 4/14: Stream processing optimizations ??? Vasiliki Kalavri | Boston output one or more streams of possibly different type A series of transformations on streams in Stream SQL, Scala, Python, Rust, Java… ??? Vasiliki Kalavri | Boston University 2020 Logic StateStateful operators 5 • Stateful operators maintain state that reflect part of the stream history they have seen • windows, continuous aggregations, distinct… • State is commonly partitioned 0 码力 | 54 页 | 2.83 MB | 1 年前3Windows and triggers - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 2/11: Windows and Triggers Vasiliki Kalavri | Boston application env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime) // ingest sensor stream val sensorData: DataStream[SensorReading] = env.addSource(...) } } Or ProcessingTIme or Vasiliki Kalavri | Boston University 2020 Window operators can be applied on a keyed or a non-keyed stream: • Window operators on keyed windows are evaluated in parallel • Non-keyed windows are processed0 码力 | 35 页 | 444.84 KB | 1 年前3Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Vasiliki Kalavri | Boston University 2020 CS 591 K1: Data Stream Processing and Analytics Vasiliki (Vasia) Kalavri vkalavri@bu.edu Spring 2020 1/21: Introduction Vasiliki Kalavri | Boston University the course, you will hopefully: • know when to use stream processing vs other technology • be able to comprehensively compare features and processing guarantees of streaming systems • be proficient end-to-end, scalable, and reliable streaming applications • have a solid understanding of how stream processing systems work and what factors affect their performance • be aware of the challenges and trade-offs0 码力 | 34 页 | 2.53 MB | 1 年前3Notions of time and progress - CS 591 K1: Data Stream Processing and Analytics Spring 2020
CS 591 K1: Data Stream Processing and Analytics Spring 2020 2/06: Notions of time and progress Vasiliki Kalavri | Boston University 2020 Mobile game application • input stream: user activity minute? 4 Vasiliki Kalavri | Boston University 2020 • Processing time • the time of the local clock where an event is being processed • a processing-time window wouldn’t account for game activity while while the train is in the tunnel • results depend on the processing speed and aren’t deterministic • Event time • the time when an event actually happened • an event-time window would give you the0 码力 | 22 页 | 2.22 MB | 1 年前3
共 1000 条
- 1
- 2
- 3
- 4
- 5
- 6
- 100