1. 30 Nov, 2021 2 commits
  2. 23 Nov, 2021 1 commit
    • W
      mem,mdp: use robIdx instead of sqIdx (#1242) · 980c1bc3
      William Wang committed
      * mdp: implement SSIT with sram
      
      * mdp: use robIdx instead of sqIdx
      
      The dispatch refactor moves lsq enq to dispatch2. As a result, mdp
      cannot get the correct sqIdx in dispatch. Unlike robIdx, it is hard to
      maintain a "speculatively assigned" sqIdx, because store insts are hard
      to track in the dispatch queue. We can still use a "speculatively
      assigned" robIdx for the memory dependency predictor.
      
      For now, the memory dependency predictor uses the "speculatively
      assigned" robIdx to track inflight stores.
      
      However, sqIdx is still used to track stores whose addr is valid but
      whose data is not yet valid. When a load inst tries to forward data from
      such a store, it gets that store's sqIdx and waits in RS. It is not
      woken up until the store data with that sqIdx is issued.
      
      * mdp: add track robIdx recover logic
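The sqIdx-based wait described in this commit message can be illustrated with a small behavioral model in Python (this is not the Chisel source). The names `WaitingLoad`, `wait_sq_idx` and `on_store_data_issue` are hypothetical; the sketch only shows the idea that a load parked in RS is released once the store data with the matching sqIdx issues.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WaitingLoad:
    """Hypothetical model of a load waiting in RS for store data."""
    name: str
    # sqIdx of the store whose addr is valid but whose data is not (None = not waiting)
    wait_sq_idx: Optional[int]

    def on_store_data_issue(self, sq_idx: int) -> None:
        # Wake up only when the issued store data carries the sqIdx we wait for.
        if self.wait_sq_idx == sq_idx:
            self.wait_sq_idx = None

    @property
    def can_issue(self) -> bool:
        return self.wait_sq_idx is None


ld = WaitingLoad("ld0", wait_sq_idx=5)
ld.on_store_data_issue(4)   # unrelated store data: the load keeps waiting
still_waiting = not ld.can_issue
ld.on_store_data_issue(5)   # matching sqIdx: the load is released
released = ld.can_issue
```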
  3. 22 Oct, 2021 1 commit
  4. 16 Oct, 2021 2 commits
    • Y
      core: use redirect ports for flush (#1121) · f4b2089a
      Yinan Xu committed
      This commit removes flush IO for every module. Flush now re-uses
      redirect ports to flush the instructions.
    • W
      Add strict mode to reduce mdp mispredict (#1113) · d1fe0262
      William Wang committed
      * storeset: fix waitForSqIdx generate logic
      
      Now the correct waitForSqIdx will be generated for an earlier store in
      the same dispatch bundle.
      
      * mdp: add strict wait mode
      
      When loadWaitStrict && loadWaitBit, a load will wait in rs until the
      addr calculations of all older stores are finished.
      
      * chore: add storeset_load_strict_wait counter
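The strict wait mode can be summarized as a small predicate, sketched here in Python rather than Chisel. The function name `load_may_issue` and the dictionary of older-store states are hypothetical. The non-strict branch (wait only for the store named by waitForSqIdx) is an assumption based on the Store Sets behavior described elsewhere in this history, not something this commit message states.

```python
from typing import Dict


def load_may_issue(load_wait_bit: bool,
                   load_wait_strict: bool,
                   wait_for_sq_idx: int,
                   older_store_addr_ready: Dict[int, bool]) -> bool:
    """older_store_addr_ready maps sqIdx -> 'addr calculation finished'
    for every store older than the load (hypothetical representation)."""
    if not load_wait_bit:
        # No predicted dependency: the load does not wait.
        return True
    if load_wait_strict:
        # Strict mode: wait until ALL older store addrs are calculated.
        return all(older_store_addr_ready.values())
    # Assumed non-strict mode: wait only for the predicted store.
    return older_store_addr_ready.get(wait_for_sq_idx, True)


older = {3: True, 4: False}
relaxed = load_may_issue(True, False, 3, older)  # store 3 is ready
strict = load_may_issue(True, True, 3, older)    # store 4 still blocks
```

In this example the relaxed load may issue while the strict one keeps waiting, which is the trade-off the strict mode makes to reduce mdp mispredictions.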
  5. 12 Oct, 2021 2 commits
    • Y
      rs: add IOs for performance counters (#1109) · 485648fa
      Yinan Xu committed
      This commit adds IOs for performance counters in reservation stations.
      Only `full` is included for now.
    • W
      mem: update block load logic (#1035) · c7160cd3
      William Wang committed
      * mem: update block load logic
      
      Now a load will be selected as soon as the store it depends on, as
      predicted by Store Sets, is ready.
      
      * mem: opt block load logic
      
      A load blocked by an invalid std will wait for that std to issue.
      A load blocked by a load violation will wait for that sta to issue.
      
      * csr: add 2 extra storeset config bits
      
      The following bits were added to slvpredctl:
      - storeset_wait_store
      - storeset_no_fast_wakeup
      
      * storeset: fix waitForSqIdx generate logic
      
      Now the correct waitForSqIdx will be generated for an earlier store in
      the same dispatch bundle.
  6. 11 Oct, 2021 1 commit
  7. 01 Oct, 2021 1 commit
    • Y
      core: update parameters and module organizations (#1080) · 2b4e8253
      Yinan Xu committed
      This commit moves load/store reservation stations into the first
      ExuBlock (also called IntegerBlock). The unnecessary dispatch module
      is also removed from CtrlBlock.
      
      Now the module organization becomes:
      * ExuBlock: Int RS, Load/Store RS, Int RF, Int FUs
      * ExuBlock_1: Fp RS, Fp RF, Fp FUs
      * MemBlock: Load/Store FUs
      
      Besides, load queue has 80 entries and store queue has 64 entries now.
  8. 28 Sep, 2021 2 commits
    • Y
      rs: latch jump pc when deq is blocked (#1076) · 085b0af8
      Yinan Xu committed
      This commit fixes a bug that causes a wrong pc value when a jump is
      blocked for issue and a new jump instruction enters the reservation
      station. When the jump for issue is blocked, we should latch its pc value
      because the entry has been deallocated from rs (and pc no longer exists in
      the pc mem).
    • Y
      misc: code clean up (#1073) · 9aca92b9
      Yinan Xu committed
      * rename Roq to Rob
      
      * remove trailing whitespaces
      
      * remove unused parameters
  9. 27 Sep, 2021 1 commit
    • Y
      rs: add pcMem to store pc for jalr instructions (#1064) · 1d83ceee
      Yinan Xu committed
      This commit adds storage for PC in the JUMP reservation station. Jalr
      needs four operands now, including rs1, pc, jalr_target and imm. Since
      Jump currently stores two operands and imm, we have to allocate extra
      space to store the extra operand for jalr.
      
      It should be optimized later (possibly by reading jalr_target when
      issuing the instruction).
      
      This commit also adds a regression check for PC usage. PC should not
      enter the decode stage.
  10. 20 Sep, 2021 1 commit
    • Y
      rs, fma: separate fadd and fmul issue (#1042) · 65e2f311
      Yinan Xu committed
      This commit splits FMA instructions into FMUL and FADD for execution.
      
      When the first two operands are ready, an FMA instruction can be issued
      and the intermediate result will be written back to RS after two cycles.
      Since RS currently has DataArray to store the operands, we reuse it to
      store the intermediate FMUL result.
      
      When an FMA enters deq stage and leaves RS with only two operands, we
      mark it as midState ready at this clock cycle T0.
      
      If the instruction's third operand becomes ready at T0, it can be
      selected at T1 and issued at T2, when FMUL is also finished. The
      intermediate result will be sent to FADD instead of writing back to RS.
      If the instruction's third operand becomes ready later, we have the data
      in DataArray or at DataArray's write port. Thus, it's ok to set midState
      ready at clock cycle T0.
      
      The separation of FMA instructions will increase issue pressure since RS
      needs to issue more times. However, it largely reduces FMA latency if many
      FMA instructions are waiting for the third operand.
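The T0/T1/T2 timing above can be checked with a small cycle calculation, written in Python as a rough sketch. The function names and the two constants are hypothetical. The sketch assumes that an FMA whose third operand arrives after T0 goes through the same select-then-issue sequence (ready, selected one cycle later, issued one cycle after that), which the commit message only states for the T0 case.

```python
FMUL_DONE = 2        # intermediate FMUL result exists two cycles after T0
SELECT_TO_ISSUE = 2  # assumed: ready at t, selected at t+1, issued at t+2


def fadd_issue_cycle(t0: int, third_ready: int) -> int:
    """Cycle at which the FADD half is issued.

    t0: cycle the FMA leaves RS with only two operands (midState ready).
    third_ready: cycle the third operand becomes ready (must be >= t0).
    """
    if third_ready < t0:
        raise ValueError("split issue only applies when third_ready >= t0")
    return third_ready + SELECT_TO_ISSUE


def fmul_result_source(t0: int, third_ready: int) -> str:
    """Where FADD takes the intermediate FMUL result from."""
    if fadd_issue_cycle(t0, third_ready) == t0 + FMUL_DONE:
        return "forwarded from FMUL"
    return "DataArray (or its write port)"


back_to_back = fadd_issue_cycle(10, 10), fmul_result_source(10, 10)
late_operand = fadd_issue_cycle(10, 13), fmul_result_source(10, 13)
```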
  11. 19 Sep, 2021 2 commits
    • Y
      backend,rs: load balance for issue selection (#1048) · 7bb7bf3d
      Yinan Xu committed
      This commit adds a load-balancing strategy to the issue selection logic
      for reservation stations.
      
      Previously we had a load balance option in ExuBlock, but it cannot work
      if the function units have feedback to RS. In this commit it is
      removed.
      
      This commit adds a victim index option for oldestFirst. For LOAD, the
      first issue port has better performance and thus we set the victim index
      to 0. For other function units, we use the last issue port.
    • Y
      core: add timer counters for important stages (#1045) · ebb8ebf8
      Yinan Xu committed
      This commit adds timer counters for some important pipeline stages,
      including rename, dispatch, dispatch2, select, issue, execute, commit.
      We add performance counters for different types of instructions to see
      the latency in different pipeline stages.
  12. 12 Sep, 2021 1 commit
    • Y
      backend,rs: move select logic to stage 0 (#1023) · 64056bed
      Yinan Xu committed
      This commit moves issue select logic in reservation stations to stage 0
      from stage 1. It helps timing of stage 1, which load-to-load requires.
      
      Now, reservation stations have the following stages:
      
      * S0: enqueue and wakeup, select. Selection results are RegNext-ed.
      * S1: data/uop read and data bypass. Bypassed results are RegNext-ed.
      * S2: issue instructions to function units.
  13. 11 Sep, 2021 1 commit
    • Y
      rs,status: simplify logic to optimize timing (#1020) · c9ebdf90
      Yinan Xu committed
      This commit simplifies status logic in reservation stations. Module
      StatusArray is mostly rewritten.
      
      The following optimizations are applied:
      
      * Wakeup now has higher priority than enqueue. This reduces the length
      of the critical path of ALU back-to-back wakeup.
      
      * Don't compare fpWen/rfWen if the reservation station does not have
      float/int operands.
      
      * Ignore status.valid or redirect for srcState update. For data capture,
      these are necessary and not changed.
      
      * Remove blocked and scheduled conditions in issue logic when the
      reservation station does not have loadWait bit and feedback.
  14. 10 Sep, 2021 1 commit
    • Y
      backend, rs: parallelize selection and data read (#1018) · 66c2a07b
      Yinan Xu committed
      This commit changes how uop and data are read in reservation stations.
      It helps the issue timing.
      
      Previously, we accessed the payload array and data array after deciding
      which instructions to issue. This serialized issue selection and array
      access and lengthened the critical path.
      
      In this commit, we add one more read port to payload array and data
      array. This extra read port is for the oldest instruction. We decide
      whether to issue the oldest instruction and read uop/data
      simultaneously. This change reduces the critical path to each selection
      logic + read + Mux (previously it's selection + arbitration + read).
      
      Variable oldestOverride indicates whether we choose the oldest ready
      instruction instead of the normal selection. An oldestFirst option is
      added to RSParams to parameterize whether we need the age logic. By
      default, it is set to true unless the RS is for ALU. If the aged ALU rs
      meets timing, we will enable it later.
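The parallel read described above can be pictured as two array reads followed by a final mux. The Python sketch below is a behavioral illustration only. `read_for_issue` and its arguments are hypothetical names, and it assumes oldestOverride is simply "the oldest entry is ready", which may be simpler than the actual condition in the RS.

```python
from typing import Sequence


def read_for_issue(payload: Sequence[str],
                   normal_sel: int,
                   oldest_sel: int,
                   oldest_ready: bool) -> str:
    # Both reads use their own read port, so neither waits for the decision.
    normal_uop = payload[normal_sel]   # port driven by the normal selection
    oldest_uop = payload[oldest_sel]   # extra port for the oldest instruction
    # Assumed override condition: the oldest entry is ready to issue.
    oldest_override = oldest_ready
    # Final mux: selection + read + Mux instead of selection + arbitration + read.
    return oldest_uop if oldest_override else normal_uop


payload = ["uop_a", "uop_b", "uop_c"]
picked_oldest = read_for_issue(payload, normal_sel=1, oldest_sel=2, oldest_ready=True)
picked_normal = read_for_issue(payload, normal_sel=1, oldest_sel=2, oldest_ready=False)
```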
  15. 02 Sep, 2021 1 commit
    • Y
      rs,mem: support fast load-to-load wakeup and issue (#984) · 718f8a60
      Yinan Xu committed
      This PR adds support for fast load-to-load wakeup and issue, which reduces load-to-load latency to 2 cycles.
      
      Now a load instruction can wake up another load instruction at LOAD stage 1. When the producer load instruction arrives at stage 2, the consumer load instruction is issued to load stage 0 and uses data from the producer to generate the load address.
      
      In the reservation station, a load can be dequeued from stage 1 when stage 2 does not have a valid instruction. If the fast load is not accepted, the load will dequeue as normal from the next cycle on.
      
      Timing in the reservation station (for imm read) and load unit (for writeback data selection) is to be optimized later.
      
      * backend,rs: issue load one cycle earlier when possible
      
      This commit adds support for issuing load instructions one cycle
      earlier if the load instruction is woken up by another load. An extra
      2-bit UInt is added to IO.
      
      * mem: add load to load addr fastpath framework
      
      * mem: enable load to load forward
      
      * mem: add load-load forward counter
      Co-authored-by: William Wang <zeweiwang@outlook.com>
  16. 29 Aug, 2021 1 commit
    • Y
      rs,bypass: add left and right bypass strategy (#971) · 605f31fc
      Yinan Xu committed
      * rs,bypass: remove optBuf for valid bits
      
      * rs,bypass: add left and right bypass strategy
      
      This commit adds another bypass network implementation to optimize timing of the first stage of function units.
      
      In BypassNetworkLeft, we bypass data at the same cycle that function units write data back. This increases the length of the critical path of the last stage of function units but reduces the length of the critical path of the first stage of function units. Some function units that require a shorter stage zero, like LOAD, may use BypassNetworkLeft.
      
      In this commit, we set all bypass networks to the left style, but we will make it configurable depending on different function units in the future.
  17. 27 Aug, 2021 1 commit
    • Y
      backend,fu: allow early arbitration via fastUopOut (#962) · f83b578a
      Yinan Xu committed
      This commit adds a fastUopOut option to function units. This allows the
      function units to give valid and uop one cycle before its output data is
      ready. FastUopOut lets writeback arbitration happen one cycle before
      data is ready and helps optimize the timing.
      
      Since some function units are not ready for this new feature, this
      commit adds a fastImplemented option to allow function units to have
      fastUopOut but the data is still at the same cycle as uop. This option
      will delay the data for one cycle and may cause performance degradation.
      FastImplemented should be true after function units support fastUopOut.
  18. 25 Aug, 2021 1 commit
  19. 24 Aug, 2021 1 commit
    • Y
      backend, rs: add an age matrix to find the oldest instruction (#937) · 90923bd3
      Yinan Xu committed
      * backend, rs: add an age matrix to find the oldest instruction
      
      This commit adds an age matrix to reservation station to find
      the oldest instruction. This enables the RS to schedule the oldest
      instruction first.
      
      This commit also adds a performance counter for the oldest inst.
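For readers unfamiliar with age matrices, the general technique can be sketched in Python as follows. This is a generic textbook-style model, not the Chisel module from this commit. The class `AgeMatrix` and its methods are hypothetical names. Entry `older[i][j]` is true when entry i was enqueued before entry j.

```python
from typing import List, Optional


class AgeMatrix:
    """Generic age matrix over n reservation station entries (hypothetical model)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.valid = [False] * n
        # older[i][j] is True when entry i is older than entry j
        self.older = [[False] * n for _ in range(n)]

    def enqueue(self, i: int) -> None:
        for j in range(self.n):
            self.older[i][j] = False                       # new entry is older than nobody
            self.older[j][i] = self.valid[j] and j != i    # every valid entry is older
        self.valid[i] = True

    def dequeue(self, i: int) -> None:
        self.valid[i] = False
        for j in range(self.n):
            self.older[i][j] = False
            self.older[j][i] = False

    def oldest_ready(self, ready: List[bool]) -> Optional[int]:
        # An entry is the oldest ready one if no other ready entry is older.
        for i in range(self.n):
            if self.valid[i] and ready[i] and not any(
                self.valid[j] and ready[j] and self.older[j][i]
                for j in range(self.n)
            ):
                return i
        return None


m = AgeMatrix(3)
for idx in (2, 0, 1):              # enqueue order: entry 2 is the oldest
    m.enqueue(idx)
first = m.oldest_ready([True, True, True])
second = m.oldest_ready([True, True, False])   # entry 2 not ready
```

In hardware the inner `any(...)` corresponds to one AND-OR reduction per entry, so all entries can be evaluated in parallel instead of being scanned.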
  20. 22 Aug, 2021 1 commit
  21. 21 Aug, 2021 1 commit
    • Y
      backend: separate store address and data (#921) · 85b4cd54
      Yinan Xu committed
      This commit separates store address and store data in the backend, including both reservation stations and function units. This commit also changes how stIssuePtr is updated: it should only be updated when both store data and address have issued.
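The stIssuePtr rule can be illustrated with a short Python sketch. The function `advance_st_issue_ptr` and the two flag lists are hypothetical. The sketch models the store queue as a plain list without wrap-around and lets the pointer skip over every consecutive store whose address and data have both issued.

```python
from typing import List


def advance_st_issue_ptr(ptr: int,
                         addr_issued: List[bool],
                         data_issued: List[bool]) -> int:
    """Move ptr past consecutive stores that issued BOTH address and data."""
    while ptr < len(addr_issued) and addr_issued[ptr] and data_issued[ptr]:
        ptr += 1
    return ptr


addr = [True, True, True, False]
data = [True, False, True, False]
ptr_before = advance_st_issue_ptr(0, addr, data)   # stops at store 1: data missing
data[1] = True                                     # store 1 data issues
ptr_after = advance_st_issue_ptr(ptr_before, addr, data)
```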
  22. 04 Aug, 2021 1 commit
  23. 25 Jul, 2021 1 commit
  24. 24 Jul, 2021 2 commits
  25. 18 Jul, 2021 1 commit
  26. 17 Jul, 2021 3 commits
  27. 16 Jul, 2021 1 commit
  28. 14 Jul, 2021 1 commit
    • Y
      backend: wrap all RS into a larger scheduler module (#880) · 66220144
      Yinan Xu committed
      This commit adds a non-parameterized scheduler containing all reservation stations.
      Now IntegerBlock, FloatBlock, MemBlock contain only function units.
      The Scheduler connects dispatch with all function units.
      Parameterization to be added later.
  29. 08 Jul, 2021 1 commit
    • Y
      backend: optimize dispatch and issue timing (#821) · c84ff7ef
      Yinan Xu committed
      * better select policy timing
      * unified RS enqueue ports for 4 ALUs
      * wrap imm extractor into a module
      * backend,rs: wrap dataArray in RawDataModuleTemplate
      * should only bypass data between the same addr when allocate.valid
  30. 04 Jun, 2021 1 commit
  31. 27 May, 2021 1 commit
  32. 15 May, 2021 1 commit
    • Y
      backend,RS: rewrite RS to optimize timing (#812) · 5c7674fe
      Yinan Xu committed
      * test,vcs: call $finish when difftest fails
      
      * backend,RS: refactor with more submodules
      
      This commit rewrites the reservation station in a more configurable style.
      
      The new RS is not finished yet:
      - Only integer instructions are supported
      - Feedback from load/store instructions is not supported
      - Fast wakeup for multi-cycle instructions is not supported
      - Submodules will be refined later
      
      * RS: use wakeup signals from arbiter.out
      
      * RS: support feedback and re-schedule when needed
      
      For load and store reservation stations, the instructions that left RS before may be
      replayed later.
      
      * test,vcs: check difftest_state and return on nemu trap instructions
      
      * backend,RS: support floating-point operands and delayed regfile read for store RS
      
      This commit adds support for floating-point instructions in reservation stations.
      Besides, fp data for store operands currently comes a cycle later than int data. This
      feature is also supported.
      
      Currently the RS should be ready for any circumstances.
      
      * rs,status: don't trigger assertions when !status.valid
      
      * test,vcs: add +workload option to specify the ram init file
      
      * backend,rs: don't enqueue when redirect.valid or flush.valid
      
      * backend,rs: support wait bit that instruction waits until store issues
      
      This commit adds support for the wait bit, which is mainly used in load and
      store reservation stations to delay instruction issue until the corresponding
      store instruction is issued.
      
      * backend,RS: optimize timing
      
      This commit optimizes BypassNetwork and PayloadArray timing.
      
      - duplicate bypass mask to avoid too many FO4
      - use one-hot vec to get read data
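The one-hot read mentioned in the last item can be sketched in Python as an AND-OR selection. The helper name `mux1h` is hypothetical and the sketch is a behavioral model, not the Chisel code. With a one-hot select vector, the read data is the OR of all entries masked by their select bit, so no binary index has to be decoded first.

```python
from typing import Sequence


def mux1h(sel: Sequence[bool], data: Sequence[int]) -> int:
    """Return the data word whose select bit is set (at most one may be set)."""
    if sum(bool(s) for s in sel) > 1:
        raise ValueError("select vector must be one-hot (or all zero)")
    out = 0
    for s, d in zip(sel, data):
        out |= d if s else 0   # mask each word with its select bit, then OR
    return out


entries = [0x11, 0x22, 0x33]
read_data = mux1h([False, True, False], entries)
no_select = mux1h([False, False, False], entries)
```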