Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Time Abnormal System Behavior Detection”, USENIX Security '18 12 Interested in a more research-oriented project? Let’s discuss it during office hours. Vasiliki Kalavri | Boston University 2020 Dataset0 码力 | 34 页 | 2.53 MB | 1 年前3Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
sources Files, e.g. transaction logs Sockets IoT devices and sensors Databases and KV stores Message queues and brokers Where do stream processors read data from? 2 Challenges • can be distributed on the network • Failure handling: application needs to be aware of message loss, producers and consumers always online 5 Message queues • Asynchronous point-to-point communication • Lightweight buffer guarantees • Each message is processed only once, by a single consumer • Event retrieval is not defined by content / structure but its order • FIFO, priority producer consumer queue 6 Message brokers Message0 码力 | 33 页 | 700.14 KB | 1 年前3Scalable Stream Processing - Spark Streaming and Flink
} 51 / 79 updateStateByKey vs. mapWithState Example (1/3) ▶ The first micro batch contains a message a. ▶ updateStateByKey • updateFunc = (values: Seq[Int], state: Option[Int]) => Some(sum) • Input: 1 52 / 79 updateStateByKey vs. mapWithState Example (1/3) ▶ The first micro batch contains a message a. ▶ updateStateByKey • updateFunc = (values: Seq[Int], state: Option[Int]) => Some(sum) • Input: 1 52 / 79 updateStateByKey vs. mapWithState Example (1/3) ▶ The first micro batch contains a message a. ▶ updateStateByKey • updateFunc = (values: Seq[Int], state: Option[Int]) => Some(sum) • Input:0 码力 | 113 页 | 1.22 MB | 1 年前3监控Apache Flink应用程序(入门)
based on this event become visible. Once the event is created it is usually stored in a persistent message queue, before it is processed by Apache Flink, which then writes the results to a database or calls including the following: 1. It might take a varying amount of time until events are persisted in the message queue. caolei – 监控Apache Flink应用程序(入门) 进度和吞吐量监控 – 15 4 https://ci.apache.org/projects/flin html#latency-tracking 2. During periods of high load or during recovery, events might spend some time in the message queue until they are processed by Flink (see previous section). 3. Some operators in a streaming0 码力 | 23 页 | 148.62 KB | 1 年前3Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Dataflow worker activities worker 1 worker 2 worker 3 receive message deserialization processing serialization send message waiting waiting 13 ??? Vasiliki Kalavri | Boston University 20200 码力 | 93 页 | 2.42 MB | 1 年前3High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020
O1 O2 N’i I’1 I’2 O’1 O’2 • The communication network ensures order-preserving, reliable message transport, e.g. TCP. • Failures are single-node and fail- stop, i.e. no network partitions or0 码力 | 49 页 | 2.08 MB | 1 年前3Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
operators, eventually reaching the data stream sources. • To ensure no data loss, a persistent input message queue, such as Kafka, and enough storage is required. 21 o1 src o2 back-pressure target: 400 码力 | 43 页 | 2.42 MB | 1 年前3Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020
own state. 2. Sends a marker out on each of its outgoing channels. a. The marker is a special message that is not recorded in the snapshot but enforces the causal consistency. 3. Starts recording event A happens causally before B and B is pre-snapshot, then A is also pre-snapshot When is a message included in the snapshot? Does the algorithm satisfy causality? ??? Vasiliki Kalavri | Boston0 码力 | 81 页 | 13.18 MB | 1 年前3
共 8 条
- 1