1. 28 Aug, 2019 2 commits
  2. 14 Sep, 2018 1 commit
    • optimizing throughput in Pulsar Presto connector (#2564) · 6ef7acaf
      Boyang Jerry Peng committed
      ### Motivation
      
      1. Currently, the Presto Pulsar connector reads synchronously from BookKeeper once it has run out of entries to process.  Basically, we process a batch of entries and only then read more.  Ideally, we should be reading and processing in parallel to increase throughput.
      
      2. Each split initializes its own ManagedLedgerFactory/BookKeeper client.  We really just need one BookKeeper client shared among threads.
      
      ### Modifications
      1. Rewrote the logic in the Presto Pulsar connector to read asynchronously and process entries in parallel with the reads
      
      2. Cache ManagedLedgerFactory to be used across splits
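
The two modifications above can be illustrated with a minimal, JDK-only sketch. It is not the connector's actual code: `PrefetchSketch`, `SharedFactory`, `factoryFor`, and `readAndProcess` are hypothetical names, integers stand in for ledger entries, and `SharedFactory` stands in for an expensive client such as `ManagedLedgerFactory`. The sketch assumes a bounded queue between a background reader and the processing thread, and a process-wide cache keyed by a configuration string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

final class PrefetchSketch {
    // Hypothetical stand-in for an expensive, thread-safe client
    // (for example a ManagedLedgerFactory wrapping a BookKeeper client).
    static final class SharedFactory {
    }

    // One factory per configuration key, shared by every split in the process.
    private static final ConcurrentHashMap<String, SharedFactory> CACHE =
            new ConcurrentHashMap<>();

    static SharedFactory factoryFor(String configKey) {
        // computeIfAbsent creates the factory at most once per key.
        return CACHE.computeIfAbsent(configKey, k -> new SharedFactory());
    }

    // Sentinel that tells the consumer there is nothing left to read.
    private static final int END_OF_STREAM = -1;

    // Reads entries 0..totalEntries-1 on a background thread while the
    // calling thread processes them (here: doubles each value).
    static List<Integer> readAndProcess(int totalEntries, int queueCapacity) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(queueCapacity);

        // Reader: keeps the queue filled; put() blocks when the queue is
        // full, which bounds how far the reader can run ahead.
        CompletableFuture<Void> reader = CompletableFuture.runAsync(() -> {
            try {
                for (int i = 0; i < totalEntries; i++) {
                    queue.put(i);
                }
                queue.put(END_OF_STREAM);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Processor: consumes entries as soon as they are available
        // instead of waiting for a whole batch to be read first.
        List<Integer> processed = new ArrayList<>();
        try {
            for (int e = queue.take(); e != END_OF_STREAM; e = queue.take()) {
                processed.add(e * 2);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while processing", e);
        }
        reader.join();
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(readAndProcess(5, 2));
        System.out.println(factoryFor("zk-1") == factoryFor("zk-1"));
    }
}
```

The bounded queue is one possible design choice: it lets reads overlap with processing while capping the memory held by prefetched entries. A real reader would also need to handle read failures and cancellation, which this sketch omits.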
      
      ### Result
      
      I see about a 2X throughput improvement on a single node as well as on a cluster (2 brokers, 3 bookies, 4 Presto workers including the coordinator) on AWS.
  3. 07 Aug, 2018 1 commit