Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020
I1 O1 N’i I1 Vasiliki Kalavri | Boston University 2020 Passive Standby • Each primary periodically checkpoints its state and sends it to the secondary 6 Ni primary secondary I1 O1 N’i update IHnFGjZWaca9SdWvuDGSZeAWp QoFGr/LV7Scsi1EaJqjWHc9NTZBTZTgTOCl3M40pZSM6wI6lksaog3x26IScWqVPokTZkobM1N8TOY21Hseh7YypGepFbyr+53UyE10FOZdpZlCy+aIoE8QkZPo16XOFzIixJZQpbm8lbEgVZcZmU7YheIsvLxP/vHZd85oX1fpNk IHnFGjZWaca9SdWvuDGSZeAWp QoFGr/LV7Scsi1EaJqjWHc9NTZBTZTgTOCl3M40pZSM6wI6lksaog3x26IScWqVPokTZkobM1N8TOY21Hseh7YypGepFbyr+53UyE10FOZdpZlCy+aIoE8QkZPo16XOFzIixJZQpbm8lbEgVZcZmU7YheIsvLxP/vHZd85oX1fpNk0 码力 | 81 页 | 13.18 MB | 1 年前3Flink如何实时分析Iceberg数据湖的CDC数据
2、每次数据D致都要 MERGE 存量数据 。T+ 方GT新3R效性差。 3、不M持CR1ps+rt。 缺点 SCaDk + )=AFa IL()(数据 MER,E .NTO GE=DE US.N, chan>=E ON GE=DE.GE=D.< = chan>=E.GE=D.< WHEN MAT(HE) AN) +LA,=H)H THEN )ELETE WHEN MAT(HE) AN) +LA,<>H)H ebezium 1lHLI W生支持 ./. 数据消费 -- BPDaRDs a mysOl BCB RaAlD sMSPBD .R0,T0 T,-L0 mysOl_AHLlMF HC INT NOT N=LL, LamD ;TRING, CDsBPHNRHML ;TRING, UDHFGR /0.I6,L '0,() ) WIT3 'BMLLDBRMP' = 'mysOl-BCB', 'GMsRLamD' S4aps25t- S4aps25t-2 Meta Data 1NS/RT / UPDAT/ / D/2/T/ 写入 CR/AT/ TA,2/ D;ABl= ( id 1NT N5T NU22, d;E; 1NT N5T NU22, ( 1 (1,2 1 (1,2 D (1,2 1 (1,3 1 (1,2 D (1,2 1 (1,3 1 (3,5 1 (1,20 码力 | 36 页 | 781.69 KB | 1 年前3High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020
2020 A simple system model stream sources N1 NK N2 … input queue output queue primary nodes secondary nodes other apps I1 I2 O1 O2 N’1 N’K N’2 … I’1 I’2 O’1 O’2 6 Vasiliki Kalavri | Boston Boston University 2020 Assumptions Ni primary secondary I1 I2 O1 O2 N’i I’1 I’2 O’1 O’2 • The communication network ensures order-preserving, reliable message transport, e.g. TCP. • Failures much input do we need to re-play? How expensive is it to re-construct the state? How fast can we de-duplicate output? Vasiliki Kalavri | Boston University 2020 Gap Recovery • Restart the operator0 码力 | 49 页 | 2.08 MB | 1 年前3Graph streaming algorithms - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Stefani, Lorenzo De, et al. Triest: Counting local and global triangles in fully dynamic streams with fixed memory size. TKDD 2017. https://www.kdd.org/ kdd2016/papers/files/rfp0465-de-stefaniA.pdf Further0 码力 | 72 页 | 7.77 MB | 1 年前3Stream ingestion and pub/sub systems - CS 591 K1: Data Stream Processing and Analytics Spring 2020
unsubscribe advertise(): information reg. future events Publish/Subscribe Systems 17 Pub/Sub levels of de-coupling • Space: interacting parties do not need to know each other • Publishers do not know who0 码力 | 33 页 | 700.14 KB | 1 年前3State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020
University 2020 RocksDBStateBackend • Stores all state into embedded RocksDB instances • Accesses require de/serialization • Checkpoints state to a remote file system and supports incremental checkpoints •0 码力 | 24 页 | 914.13 KB | 1 年前3PyFlink 1.15 Documentation
in a sep- arate Flink cluster. Flink is responsible for talking with Kubernetes and allocating and de-allocating TaskManagers depending on the required resources. ./bin/flink run-application \ --target0 码力 | 36 页 | 266.77 KB | 1 年前3PyFlink 1.16 Documentation
in a sep- arate Flink cluster. Flink is responsible for talking with Kubernetes and allocating and de-allocating TaskManagers depending on the required resources. ./bin/flink run-application \ --target0 码力 | 36 页 | 266.80 KB | 1 年前3Filtering and sampling streams - CS 591 K1: Data Stream Processing and Analytics Spring 2020
What summary would let us compute the statistical variance of this series? 3 var = ∑ (xi − μ)2 N ??? Vasiliki Kalavri | Boston University 2020 A simple and efficient synopsis Suppose that our data values • the sum of the squares of the values • the number of observations var = ∑ (xi − μ)2 N ??? Vasiliki Kalavri | Boston University 2020 A simple and efficient synopsis Suppose that our data of observations • μ = sum / count • var = (sum of squares / count) - μ2 Then var = ∑ (xi − μ)2 N ??? Vasiliki Kalavri | Boston University 2020 A simple and efficient synopsis Suppose that our data0 码力 | 74 页 | 1.06 MB | 1 年前3Cardinality and frequency estimation - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Boston University 2020 Let h be a hash function that maps each stream element into M = log2N bits, where N is the domain of input elements: For each element x, let rank(x) be the number of 0s in 2020 5 Let n be the number of distinct elements in the input stream so far and let R be the maximum value of rank(.) seen so far. ??? Vasiliki Kalavri | Boston University 2020 5 Let n be the number stream so far and let R be the maximum value of rank(.) seen so far. ̂n = 2R Claim: The maximum observed rank is a good estimate of log2n. In other words, the estimated number of distinct elements is equal0 码力 | 69 页 | 630.01 KB | 1 年前3
共 18 条
- 1
- 2