1. 28 Jun, 2022 1 commit
  2. 22 Jun, 2022 1 commit
      core: add buffers for function units across int/fp (#1590) · 5010f3fb
      Committed by Yinan Xu
      This commit adds a buffer after the function units that operate across
      the integer block and the floating-point block, such as f2i and i2f.
      
      For example, the out.ready of f2i previously depended on whether
      mul/div/csr/jump had a valid instruction out, since f2i has lower
      priority than them. This ready back-propagated from the integer function
      units to the floating-point function units, and finally to the
      floating-point reservation stations (since f2i is fully pipelined).
      
      We add a buffer after the function unit to break this ready
      back-propagation. It incurs one more cycle of execution latency, but we
      leave it not-fully-optimized for now.
      
      Timing can be further optimized if we separate the int writeback and fp
      writeback in function units. In the current version, the ready of f2i
      affects the ready of f2f pipelines, which is unnecessary. This is left
      as future work.
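
The effect of such a buffer can be illustrated with a small cycle-level model. This is only a sketch in Python, not the Chisel code from the commit: the class name `OneEntryBuffer`, its methods, and the single-entry depth are assumptions made for illustration. It shows that the upstream ready is derived from registered state alone, so it no longer depends on the downstream ready within the same cycle, at the cost of one extra cycle of latency.

```python
class OneEntryBuffer:
    """Hypothetical one-entry registered buffer between a function unit
    (e.g. f2i) and the writeback arbiter."""

    def __init__(self):
        self.data = None  # None means the buffer is empty

    def in_ready(self):
        # Depends only on registered state, never on this cycle's out_ready,
        # so ready does not back-propagate combinationally.
        return self.data is None

    def step(self, in_valid, in_data, out_ready):
        """Simulate one clock cycle; return the item dequeued, or None."""
        accept = in_valid and self.in_ready()  # decided from the old state
        out = None
        if self.data is not None and out_ready:
            out = self.data
            self.data = None
        if accept:
            self.data = in_data
        return out
```

In this simplified model a full buffer rejects new input even in a cycle where it drains, which is the price of keeping `in_ready` independent of `out_ready`.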
  3. 09 Dec, 2021 1 commit
      core: refactor writeback parameters (#1327) · 6ab6918f
      Committed by Yinan Xu
      This commit adds WritebackSink and WritebackSource parameters for
      multiple modules. These traits hide implementation details from
      other modules by defining IO-related functions in modules.
      
      By using WritebackSink, ROB is able to choose the writeback sources.
      Now fflags and exceptions are connected from exe units to reduce write
      ports and optimize timing.
      
      Further optimizations on write-back to RS and better coding style will
      be added later.
  4. 16 Oct, 2021 1 commit
  5. 01 Oct, 2021 1 commit
      core: update parameters and module organizations (#1080) · 2b4e8253
      Committed by Yinan Xu
      This commit moves load/store reservation stations into the first
      ExuBlock (which can be thought of as the IntegerBlock). The unnecessary
      dispatch module is also removed from CtrlBlock.
      
      Now the module organization becomes:
      * ExuBlock: Int RS, Load/Store RS, Int RF, Int FUs
      * ExuBlock_1: Fp RS, Fp RF, Fp FUs
      * MemBlock: Load/Store FUs
      
      In addition, the load queue now has 80 entries and the store queue has 64 entries.
  6. 27 Sep, 2021 1 commit
  7. 20 Sep, 2021 1 commit
      rs, fma: separate fadd and fmul issue (#1042) · 65e2f311
      Committed by Yinan Xu
      This commit splits FMA instructions into FMUL and FADD for execution.
      
      When the first two operands are ready, an FMA instruction can be issued
      and the intermediate result will be written back to RS after two cycles.
      Since RS currently has DataArray to store the operands, we reuse it to
      store the intermediate FMUL result.
      
      When an FMA enters deq stage and leaves RS with only two operands, we
      mark it as midState ready at this clock cycle T0.
      
      If the instruction's third operand becomes ready at T0, it can be
      selected at T1 and issued at T2, when FMUL is also finished. The
      intermediate result will be sent to FADD instead of writing back to RS.
      If the instruction's third operand becomes ready later, we have the data
      in DataArray or at DataArray's write port. Thus, it's ok to set midState
      ready at clock cycle T0.
      
      The separation of FMA instructions will increase issue pressure since RS
      needs to issue more times. However, it largely reduces FMA latency if many
      FMA instructions are waiting for the third operand.
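
The timing described above can be worked through with a small Python helper. It is a sketch of the schedule as stated in the message, not the RS implementation: the function name `fadd_issue`, the string labels, and the assumption that select and issue each take exactly one cycle are illustrative.

```python
FMUL_LATENCY = 2  # cycles from deq (T0) until the FMUL result is available


def fadd_issue(t0, third_operand_ready):
    """Return (issue_cycle, source_of_fmul_result) for the FADD half.

    t0: cycle in which the FMA left the RS as an FMUL with two operands.
    third_operand_ready: cycle in which the third operand becomes ready
    (assumed to be >= t0, otherwise the FMA would not have been split).
    """
    if third_operand_ready < t0:
        raise ValueError("third operand was ready before the FMUL issued")
    select = third_operand_ready + 1  # selected one cycle after wakeup
    issue = select + 1                # issued one cycle after selection
    fmul_done = t0 + FMUL_LATENCY
    # If FADD issues exactly when FMUL finishes, the intermediate result is
    # sent to FADD directly; otherwise it comes from the DataArray (or from
    # the DataArray's write port).
    source = "forwarded" if issue == fmul_done else "data_array"
    return issue, source
```

For example, with T0 = 0 and the third operand ready at cycle 0, the FADD issues at cycle 2 and takes the forwarded FMUL result; if the operand arrives at cycle 3, the FADD issues at cycle 5 and reads the intermediate result from the DataArray.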
  8. 19 Sep, 2021 1 commit
      backend,rs: load balance for issue selection (#1048) · 7bb7bf3d
      Committed by Yinan Xu
      This commit adds a load-balancing strategy to the issue selection logic
      of reservation stations.
      
      Previously we had a load balance option in ExuBlock, but it could not
      work if the function units had feedback to RS. It is removed in this
      commit.
      
      This commit adds a victim index option for oldestFirst. For LOAD, the
      first issue port has better performance and thus we set the victim index
      to 0. For other function units, we use the last issue port.
  9. 13 Sep, 2021 1 commit
      backend: clean up exception vector usages (#1026) · c88c3a2a
      Committed by Yinan Xu
      This commit cleans up exception vector usages in backend.
      
      Previously the exception vector went through the pipeline with the
      uop. However, instructions with exceptions enter the ROB when they are
      dispatched. Thus, we don't actually need the exception vector when an
      instruction enters a function unit.
      
      * exceptionVec, flushPipe, replayInst are reset when an instruction
      enters function units.
      
      * For execution units that don't have exceptions, we reset their output
      exception vectors so that the ROB does not record them.
      
      * Move replayInst to CtrlSignals.
  10. 05 Sep, 2021 1 commit
      backend,exu: load balance between issue ports (#947) · bd278897
      Committed by Yinan Xu
      This commit adds support for load balancing between different issue ports
      when the function unit is not pipelined and the reservation station has
      more than one issue port.
      
      We use a ping-pong bit to decide which port issues the instruction. The
      bit is flipped at every clock cycle.
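
A minimal Python model of the ping-pong selection is sketched below. It assumes exactly two issue ports, and the class and method names are hypothetical; the commit itself implements this in hardware.

```python
class PingPongIssue:
    """Toy model of a two-port RS feeding non-pipelined function units."""

    def __init__(self):
        self.bit = 0  # selects port 0 or port 1

    def step(self, has_ready_inst):
        """Advance one clock cycle; return the chosen port, or None."""
        port = self.bit if has_ready_inst else None
        self.bit ^= 1  # flipped every cycle, whether or not anything issued
        return port
```

Because the bit flips unconditionally, a steady stream of ready instructions alternates between the two ports, while a gap in the stream does not stall the alternation.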
  11. 03 Sep, 2021 1 commit
  12. 01 Sep, 2021 2 commits
  13. 31 Aug, 2021 1 commit
      backend,exu: connect writeback when possible (#977) · dd381594
      Committed by Yinan Xu
      This commit optimizes ExuBlock timing by connecting writeback when
      possible.
      
      The timing priorities are RegNext(rs.fastUopOut) > fu.writeback >
      arbiter.out (--> io.rfWriteback --> rs.writeback). The higher the
      priority, the better the timing.
      
      (1) When function units have exclusive writeback ports, their
      wakeup ports for reservation stations can be connected directly from
      function units' writeback ports. Special case: when the function unit
      has fastUopOut, valid and uop should be RegNext.
      
      (2) If the reservation station has fastUopOut for all instructions
      in this exu, we should replace io.fuWriteback with RegNext(fastUopOut).
      In this case, the corresponding execution units must have exclusive
      writeback ports. Otherwise the rs cannot ensure that the instruction
      is able to write the regfile.
      
      (3) If the reservation station has fastUopOut for all instructions in
      this exu, we should replace io.rfWriteback (rs.writeback) with
      RegNext(rs.wakeupOut).
  14. 27 Aug, 2021 1 commit
      backend,fu: allow early arbitration via fastUopOut (#962) · f83b578a
      Committed by Yinan Xu
      This commit adds a fastUopOut option to function units. This allows the
      function units to give valid and uop one cycle before their output data
      is ready. FastUopOut lets writeback arbitration happen one cycle before
      the data is ready and helps optimize the timing.
      
      Since some function units are not ready for this new feature, this
      commit adds a fastImplemented option. It lets a function unit expose
      fastUopOut even though its data is still produced in the same cycle as
      the uop. In that case the data is delayed by one cycle, which may cause
      performance degradation.
      FastImplemented should be true after function units support fastUopOut.
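
One way to read the two modes is as a pair of cycle offsets. The Python sketch below is an interpretation of the message, not code from the commit: the function name `writeback_timing`, the parameter names, and the exact cycle numbering (relative to issue) are assumptions.

```python
def writeback_timing(fu_latency, fast_implemented):
    """Return (uop_cycle, data_cycle) relative to the issue cycle.

    fu_latency: cycle in which the function unit produces its data.
    fast_implemented: True if the unit natively provides valid/uop one
    cycle ahead of its data.
    """
    if fast_implemented:
        # valid/uop arrive one cycle early, data arrives on time.
        return fu_latency - 1, fu_latency
    # The unit still produces uop and data together, so the data is
    # registered for one more cycle to keep the one-cycle gap.
    return fu_latency, fu_latency + 1
```

In both modes the uop precedes the data by exactly one cycle, which is what lets arbitration run early. The difference is that the non-native mode pays one extra cycle of latency.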
  15. 25 Aug, 2021 1 commit
  16. 23 Aug, 2021 1 commit
  17. 21 Aug, 2021 1 commit
      backend: separate store address and data (#921) · 85b4cd54
      Committed by Yinan Xu
      This commit separates store address and store data in backend, including both reservation stations and function units. This commit also changes how stIssuePtr is updated. stIssuePtr should only be updated when both store data and address issue. 
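
The stIssuePtr rule can be sketched as follows. This is a simplified Python model: the function name, the per-entry flag lists, and the non-wrapping pointer are assumptions for illustration, whereas the real store queue is a circular structure.

```python
def advance_st_issue_ptr(ptr, addr_issued, data_issued):
    """Advance the pointer past stores whose address AND data have issued.

    addr_issued / data_issued: per-store booleans in program order.
    Returns the new pointer, i.e. the index of the oldest store that has
    not yet issued both halves.
    """
    n = len(addr_issued)
    while ptr < n and addr_issued[ptr] and data_issued[ptr]:
        ptr += 1
    return ptr
```

A store whose address has issued but whose data has not still holds the pointer back, which is the behavioral change the message describes.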
  18. 04 Aug, 2021 1 commit
  19. 24 Jul, 2021 1 commit
  20. 17 Jul, 2021 1 commit
  21. 16 Jul, 2021 1 commit
  22. 04 Jun, 2021 1 commit
  23. 15 May, 2021 1 commit
      backend,RS: rewrite RS to optimize timing (#812) · 5c7674fe
      Committed by Yinan Xu
      * test,vcs: call $finish when difftest fails
      
      * backend,RS: refactor with more submodules
      
      This commit rewrites the reservation station in a more configurable style.
      
      The new RS is not finished yet.
      - Only integer instructions are supported
      - Feedback from load/store instructions is not supported
      - Fast wakeup for multi-cycle instructions is not supported
      - Submodules will be refined later
      
      * RS: use wakeup signals from arbiter.out
      
      * RS: support feedback and re-schedule when needed
      
      For load and store reservation stations, the instructions that left RS before may be
      replayed later.
      
      * test,vcs: check difftest_state and return on nemu trap instructions
      
      * backend,RS: support floating-point operands and delayed regfile read for store RS
      
      This commit adds support for floating-point instructions in reservation stations.
      Besides, fp data for store operands currently arrives one cycle later than int data.
      This case is also supported.
      
      The RS should now be ready for all circumstances.
      
      * rs,status: don't trigger assertions when !status.valid
      
      * test,vcs: add +workload option to specify the ram init file
      
      * backend,rs: don't enqueue when redirect.valid or flush.valid
      
      * backend,rs: support wait bit that instruction waits until store issues
      
      This commit adds support for the wait bit, which is mainly used in load and
      store reservation stations to delay instruction issue until the corresponding
      store instruction has issued.
      
      * backend,RS: optimize timing
      
      This commit optimizes BypassNetwork and PayloadArray timing.
      
      - duplicate bypass mask to avoid too many FO4
      - use one-hot vec to get read data
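
The one-hot read mentioned in the last item can be illustrated in Python. This is a behavioral sketch of the general AND-OR technique, not the PayloadArray code, and the function name `onehot_read` is hypothetical.

```python
def onehot_read(onehot, entries):
    """Select one entry using a one-hot vector instead of a binary index.

    Each entry is ANDed with its (replicated) select bit and the results
    are ORed together, so no index decoder sits on the read path.
    Entries are assumed to be non-negative integers.
    """
    result = 0
    for sel, entry in zip(onehot, entries):
        mask = -1 if sel else 0  # all ones or all zeros
        result |= entry & mask
    return result
```

For instance, `onehot_read([0, 1, 0], [0x11, 0x22, 0x33])` returns `0x22`. In hardware the benefit is that the select bits are already decoded when the read happens.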
  24. 09 May, 2021 1 commit
  25. 06 May, 2021 1 commit
      Backend: add mul to fast wakeup (#769) · 22deac3a
      Committed by Lemover
      * [WIP] Backend: add mul to fast wake-up
      
      * Backend: handle mul wb priority and fix wrong delay
      
      * RS: divide fastwakeup and nonBlocked (they were bound together)
  26. 29 Apr, 2021 1 commit
  27. 19 Apr, 2021 1 commit
      Refactor parameters, SimTop and difftest (#753) · 2225d46e
      Committed by Jiawei Lin
      * difftest: use DPI-C to refactor difftest
      
      In this commit, difftest is refactored with DPI-C calls.
      There are a few reasons:
      (1) According to Verilator's manual, DPI-C calls should be more efficient than accessing from dut_ptr.
      (2) DPI-C is cross-platform (Verilator, VCS, ...)
      (3) difftest APIs are split from emu.cpp to possibly support more backend platforms
      (NEMU, Spike, ...)
      
      The performance at this commit is considerably slower than the original emu.
      Performance issues will be fixed later.
      
      * [WIP] SimTop: try to use 'XSTop' as soc
      
      * CircularQueuePtr: use F-bounded polymorphism instead of implicit helper
      
      * Refactor parameters & Clean up code
      
      * difftest: support basic difftest
      
      * Support difftest in new sim top
      
      * Difftest: convert recoded format to IEEE 754 when comparing fp regs
      
      * Difftest: pass sign-ext pc to dpic functions && fix exception pc
      
      * Debug: add int/exc inst wb to debug queue
      
      * Difftest: pass sign-ext pc to dpic functions && fix exception pc
      
      * Difftest: fix naive commit num limit
      Co-authored-by: Yinan Xu <xuyinan1997@gmail.com>
      Co-authored-by: William Wang <zeweiwang@outlook.com>
  28. 26 Feb, 2021 1 commit
  29. 22 Feb, 2021 1 commit
  30. 26 Jan, 2021 1 commit
  31. 25 Jan, 2021 2 commits
  32. 24 Jan, 2021 1 commit
  33. 22 Jan, 2021 1 commit
  34. 21 Jan, 2021 1 commit
  35. 17 Jan, 2021 1 commit
  36. 14 Jan, 2021 2 commits
  37. 13 Jan, 2021 1 commit