1. 17 Sep, 2021 1 commit
    • Y
      regfile: manually reset every registers (#1038) · 93b61a80
      Yinan Xu committed
      This commit adds manual reset for every register in Regfile. Previously,
      the reset was done by adding reset values to the registers. However,
      a physical general-purpose register file does not have reset values.
      
      Since all the regfiles always have the same writeback data, we don't need
      to explicitly assign reset data.
      93b61a80
  2. 16 Sep, 2021 1 commit
    • Y
      backend,rs: add counters for critical wakeup sources (#1027) · b6c0697a
      Yinan Xu committed
      This commit adds critical_wakeup_*_* counters to indicate which function
      units wake up the instructions in RS. Previously we had wait_for_src_*
      counters, but they cannot represent where the critical operand (the last
      waiting operand) comes from.
      
      We need these counters to optimize fast wakeup logic. If some
      instructions critically depend on some other instructions, we can think
      of how we can optimize the wakeup process.
      
      Furthermore, this commit also adds a specific counter for FMAs that
      wake up other FMAs' third operand. This helps us decide which strategy
      to use for FMA fast issue.
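
As a rough behavioural illustration of what a critical-wakeup counter measures (this is not the Chisel implementation), the sketch below attributes each instruction to the source that delivered its last-arriving operand. The function name, the `(ready_cycle, source)` tuple layout, and the source labels are hypothetical.

```python
from collections import Counter

def count_critical_wakeups(instructions):
    """Count, per wakeup source, how often it delivered the critical operand.

    Each instruction is a list of (ready_cycle, source) pairs, one per
    operand. The critical operand is the one that becomes ready last.
    On a tie, the earlier operand in the list is chosen (an arbitrary
    choice made for this sketch).
    """
    counters = Counter()
    for operands in instructions:
        if not operands:
            continue  # no register operands, nothing to wait for
        _, source = max(operands, key=lambda op: op[0])
        counters[source] += 1
    return counters

# Three instructions with hypothetical operand arrival times.
insts = [
    [(3, "alu"), (5, "load")],  # critical operand comes from a load
    [(7, "fma"), (2, "alu")],   # critical operand comes from an FMA
    [(4, "alu")],               # single operand, trivially critical
]
counts = count_critical_wakeups(insts)
```

A counter such as wait_for_src_* would only say that an operand was waited for; the attribution above additionally says which producer ended the wait.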
      b6c0697a
  3. 15 Sep, 2021 1 commit
    • L
      mmu.tlb: ptw resp will refill both ld & st tlb (#1029) · bf08468c
      Lemover committed
      No functional change; this only adds one parameter to control whether ldtlb and sttlb are the same.
      There are now two similar parameters:
      
      outReplace: when this is true, the two ldtlb are the 'same' and the two sttlb are the 'same'
      refillBothTlb: when this is true, all four tlb are the same (requires outReplace to be true)
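
The dependency between the two parameters can be sketched as a small configuration check. This is a Python model, not the project's parameter class; the class name, the field names, and the `ld0`/`ld1`/`st0`/`st1` labels are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TlbShareParams:
    out_replace: bool = False      # models outReplace
    refill_both_tlb: bool = False  # models refillBothTlb

    def __post_init__(self):
        # The commit message states refillBothTlb requires outReplace.
        if self.refill_both_tlb and not self.out_replace:
            raise ValueError("refillBothTlb requires outReplace to be true")

    def same_groups(self):
        """Return the groups of TLBs that are kept identical."""
        if self.refill_both_tlb:
            return [["ld0", "ld1", "st0", "st1"]]
        if self.out_replace:
            return [["ld0", "ld1"], ["st0", "st1"]]
        return [["ld0"], ["ld1"], ["st0"], ["st1"]]

default_groups = TlbShareParams().same_groups()
paired_groups = TlbShareParams(out_replace=True).same_groups()
shared_groups = TlbShareParams(out_replace=True, refill_both_tlb=True).same_groups()
```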
      
      * mmu.tlb: add param refillBothTlb to refill both ld & st tlb
      
      * mmu.tlb: set param refillBothTlb to false
      bf08468c
  4. 13 Sep, 2021 1 commit
    • Y
      backend: clean up exception vector usages (#1026) · c88c3a2a
      Yinan Xu committed
      This commit cleans up exception vector usages in backend.
      
      Previously, the exception vector went through the pipeline with the
      uop. However, instructions with exceptions enter the ROB when they are
      dispatched. Thus, we don't actually need the exception vector when an
      instruction enters a function unit.
      
      * exceptionVec, flushPipe, replayInst are reset when an instruction
      enters function units.
      
      * For execution units that don't have exceptions, we reset their output
      exception vectors to prevent the ROB from recording them.
      
      * Move replayInst to CtrlSignals.
      c88c3a2a
  5. 12 Sep, 2021 2 commits
    • Y
      backend,rs: move select logic to stage 0 (#1023) · 64056bed
      Yinan Xu committed
      This commit moves issue select logic in reservation stations to stage 0
      from stage 1. It helps timing of stage 1, which load-to-load requires.
      
      Now, reservation stations have the following stages:
      
      * S0: enqueue and wakeup, select. Selection results are RegNext-ed.
      * S1: data/uop read and data bypass. Bypassed results are RegNext-ed.
      * S2: issue instructions to function units.
      64056bed
    • Y
      backend: add 3-bit shift fused instructions (#1022) · a792bcf1
      Yinan Xu committed
      This commit adds 3-bit shift fused instructions. These may be used
      when the program adds an index scaled by 8 bytes.
      
      List of fused instructions added in this commit:
      
      * szewl3: `slli r1, r0, 32` + `srli r1, r0, 29`
      
      * sr29add: `srli r1, r0, 29` + `add r1, r1, r2`
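
The arithmetic behind these two fused operations can be checked with a small Python model of 64-bit registers. This is a sketch of the semantics only, and it assumes the second instruction of each pair consumes the result of the first (the `srli` in the szewl3 pair is read as operating on `r1`). The helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # RV64 registers are 64 bits wide

def slli(x, shamt):
    """Logical left shift, truncated to 64 bits."""
    return (x << shamt) & MASK64

def srli(x, shamt):
    """Logical right shift of a 64-bit value."""
    return (x & MASK64) >> shamt

def add(a, b):
    """64-bit wrapping addition."""
    return (a + b) & MASK64

def szewl3(r0):
    """Fused form: zero-extend the low 32-bit word, then shift left by 3."""
    return (r0 & 0xFFFF_FFFF) << 3

def sr29add(r0, r2):
    """Fused form: logical right shift by 29, then add."""
    return add(srli(r0, 29), r2)

r0 = 0xDEAD_BEEF_8000_0001
unfused = srli(slli(r0, 32), 29)  # the two-instruction sequence
fused = szewl3(r0)                # the single fused operation
```

The upper 32 bits of `r0` are discarded by the left shift, so the pair behaves like a zero-extended word multiplied by 8.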
      a792bcf1
  6. 11 Sep, 2021 1 commit
    • Y
      rs,status: simplify logic to optimize timing (#1020) · c9ebdf90
      Yinan Xu committed
      This commit simplifies status logic in reservation stations. Module
      StatusArray is mostly rewritten.
      
      The following optimizations are applied:
      
      * Wakeup now has higher priority than enqueue. This reduces the length
      of the critical path of ALU back-to-back wakeup.
      
      * Don't compare fpWen/rfWen if the reservation station does not have
      float/int operands.
      
      * Ignore status.valid or redirect for srcState update. For data capture,
      these are necessary and not changed.
      
      * Remove blocked and scheduled conditions in issue logic when the
      reservation station does not have loadWait bit and feedback.
      c9ebdf90
  7. 10 Sep, 2021 1 commit
    • Y
      backend, rs: parallelize selection and data read (#1018) · 66c2a07b
      Yinan Xu committed
      This commit changes how uop and data are read in reservation stations.
      It helps the issue timing.
      
      Previously, we accessed the payload array and data array after deciding
      which instructions to issue. This serialized issue selection and array
      access and lengthened the critical path.
      
      In this commit, we add one more read port to payload array and data
      array. This extra read port is for the oldest instruction. We decide
      whether to issue the oldest instruction and read uop/data
      simultaneously. This change reduces the critical path to each selection
      logic + read + Mux (previously it's selection + arbitration + read).
      
      Variable oldestOverride indicates whether we choose the oldest ready
      instruction instead of the normal selection. An oldestFirst option is
      added to RSParams to parameterize whether we need the age logic. By
      default, it is set to true unless the RS is for ALU. If the age-based ALU
      RS meets timing, we will enable it later.
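
The idea of reading the oldest entry in parallel with the normal selection can be sketched behaviourally as follows. This is an illustrative Python model, not the RTL: the function name, the lowest-index policy for the normal selection, and the integer age encoding are assumptions.

```python
def select_and_read(ready, age, payload, oldest_first=True):
    """Return (issued_payload, oldest_override).

    ready:   list of bools, one per RS entry
    age:     list of ints, smaller means older (assumed encoding)
    payload: list of per-entry uop/data values
    """
    ready_idx = [i for i, r in enumerate(ready) if r]
    if not ready_idx:
        return None, False

    # Normal selection: lowest-index ready entry (assumed policy).
    normal_idx = ready_idx[0]
    # Age logic: oldest ready entry, computed independently of the above.
    oldest_idx = min(ready_idx, key=lambda i: age[i])

    # Both read ports are accessed without waiting for the override decision.
    normal_data = payload[normal_idx]
    oldest_data = payload[oldest_idx]

    # The final Mux picks between the two already-read values.
    oldest_override = oldest_first and oldest_idx != normal_idx
    return (oldest_data if oldest_override else normal_data), oldest_override

with_age = select_and_read([False, True, True], [0, 5, 2], ["a", "b", "c"])
without_age = select_and_read([False, True, True], [0, 5, 2], ["a", "b", "c"],
                              oldest_first=False)
```

In this model the override only changes which already-read value is forwarded, which mirrors the selection + read + Mux path described in the commit message.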
      66c2a07b
  8. 09 Sep, 2021 1 commit
    • Y
      backend: support instruction fusion cases (#1011) · 88825c5c
      Yinan Xu committed
      This commit adds some simple instruction fusion cases in decode stage.
      Currently we only implement instruction pairs that can be fused into
      RV64GCB instructions.
      
      Instruction fusions are detected in the decode stage by FusionDecoder.
      The decoder checks every two instructions and marks the first
      instruction fused if they can be fused into one instruction. The second
      instruction is removed by setting the valid field to false.
      
      Simple fusion cases include sh1add, sh2add, sh3add, sexth, zexth, etc.
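
The pairwise check described above can be sketched in Python. This is a behavioural illustration rather than the FusionDecoder itself: the `Inst` record, the function names, and the `slli` + `add` matching rule for sh1add/sh2add/sh3add are assumptions, the last one chosen by analogy with the sh4add pair listed in this commit.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Inst:
    op: str
    rd: int
    rs1: int
    rs2: Optional[int] = None
    imm: Optional[int] = None
    valid: bool = True
    fused_op: Optional[str] = None  # set on the first instruction of a fused pair

def try_fuse(first: Inst, second: Inst) -> Optional[str]:
    """Return the fused opcode name if the pair matches, else None."""
    if (first.op == "slli" and first.imm in (1, 2, 3)
            and second.op == "add"
            and second.rd == first.rd      # the intermediate value is overwritten
            and second.rs1 == first.rd     # the add consumes the shifted value
            and second.rs2 != first.rd):
        return f"sh{first.imm}add"
    return None

def fusion_decode(window: List[Inst]) -> List[Inst]:
    """Check adjacent pairs; mark the first as fused and invalidate the second."""
    i = 0
    while i + 1 < len(window):
        first, second = window[i], window[i + 1]
        name = try_fuse(first, second) if first.valid and second.valid else None
        if name is not None:
            first.fused_op = name
            first.rs2 = second.rs2   # the fused uop takes the add's other source
            second.valid = False     # the second instruction leaves the pipeline
            i += 2                   # a removed instruction cannot start a new pair
        else:
            i += 1
    return window

window = fusion_decode([
    Inst("slli", rd=5, rs1=6, imm=2),
    Inst("add", rd=5, rs1=5, rs2=7),
    Inst("add", rd=8, rs1=5, rs2=5),
])
```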
      
      Currently, ftq in frontend needs every instruction to commit. However,
      the second instruction is removed from the pipeline and will not commit.
      To solve this issue, we temporarily add more bits to isFused to indicate
      the offset diff of the two fused instructions. There are four
      possibilities now. This feature may be removed later.
      
      This commit also adds more instruction fusion cases that need changes
      in both the decode stage and the function units. In this commit, we add
      some opcodes to the function units and fuse the new instruction pairs
      into these new internal uops.
      
      The list of opcodes we add in this commit is shown below:
      - szewl1: `slli r1, r0, 32` + `srli r1, r0, 31`
      - szewl2: `slli r1, r0, 32` + `srli r1, r0, 30`
      - byte2: `srli r1, r0, 8` + `andi r1, r1, 255`
      - sh4add: `slli r1, r0, 4` + `add r1, r1, r2`
      - sr30add: `srli r1, r0, 30` + `add r1, r1, r2`
      - sr31add: `srli r1, r0, 31` + `add r1, r1, r2`
      - sr32add: `srli r1, r0, 32` + `add r1, r1, r2`
      - oddadd: `andi r1, r0, 1` + `add r1, r1, r2`
      - oddaddw: `andi r1, r0, 1` + `addw r1, r1, r2`
      - orh48: mask off the first 16 bits and or with another operand
               (`andi r1, r0, -256` + `or r1, r1, r2`)
      
      Furthermore, this commit adds some complex instruction fusion cases to
      the decode stage and function units. The complex instruction fusion cases
      are detected after the instructions are decoded into uop and their
      CtrlSignals are used for instruction fusion detection.
      
      We add the following complex instruction fusion cases:
      - addwbyte: addw and mask it with 0xff (extract the first byte)
      - addwbit: addw and mask it with 0x1 (extract the first bit)
      - logiclsb: logic operation and mask it with 0x1 (extract the first bit)
      - mulw7: andi 127 and mulw instructions.
              Input to mul is AND with 0x7f if mulw7 bit is set to true.
      88825c5c
  9. 08 Sep, 2021 1 commit
  10. 06 Sep, 2021 2 commits
  11. 05 Sep, 2021 3 commits
  12. 04 Sep, 2021 1 commit
  13. 03 Sep, 2021 3 commits
  14. 02 Sep, 2021 5 commits
    • L
      l0tlb: add a new level tlb, a load tlb and a store tlb (#961) · a0301c0d
      Lemover committed
      * Revert "Revert "l0tlb: add a new level tlb to each mem pipeline (#936)" (#945)"
      
      This reverts commit b052b972.
      
      * fu: remove unused import
      
      * mmu.tlb: 2 load/store pipeline has 1 dtlb
      
      * mmu: remove btlb, the l1-tlb
      
      * mmu: set split-tlb to 32 to check perf effect
      
      * mmu: wrap tlb's param with TLBParameters
      
      * mmu: add params 'useBTlb'
      
      dtlb size is small: normal 8, super 2
      
      * mmu.tlb: add Bundle TlbEntry, simplify tlb hit logic(coding)
      
      * mmu.tlb: separate tlb's storage, related hit/sfence logic
      
      tlb now supports fully-associative, set-associative, and direct-mapped.
      more: change tlb's parameter usage, change util.Random to support
      the case where mod is 1.
      
      * mmu.tlb: support normalAsVictim, super(fa) -> normal(sa/da)
      
      be careful when using tlb's parameters; only some parameter combinations
      are supported
      
      * mmu.tlb: fix bug of hit method and victim write
      
      * mmu.tlb: add tlb storage's perf counter
      
      * mmu.tlb: rewrite replace part, support set or non-set
      
      * mmu.tlb: add param outReplace to receive out replace index
      
      * mmu.tlb: change param superSize to superNWays
      
      add param superNSets, which should always be 1
      
      * mmu.tlb: change some perf counter's name and change some params
      
      * mmu.tlb: fix bug of replace io bundle
      
      * mmu.tlb: remove unused signal wayIdx in tlbstorageio
      
      * mmu.tlb: separate tlb_ld/st into two 'same' tlb
      
      * mmu.tlb: when nWays is 1, replace returns 0.U
      
      previously, replace returned 1.U, which did not affect refill but was bad
      for the perf counter
      
      * mmu.tlb: give tlb_ld and tlb_st a name (in waveform)
      a0301c0d
    • W
      chore: fix frontend / memblock merge conflict · 588e93e0
      William Wang committed
      588e93e0
    • W
      chore: fix frontend / memblock merge conflict · 154904ce
      William Wang committed
      154904ce
    • Y
      rs,mem: support fast load-to-load wakeup and issue (#984) · 718f8a60
      Yinan Xu committed
      This PR adds support for fast load-to-load wakeup and issue. With fast load-to-load wakeup and issue, load-to-load latency is reduced to 2 cycles.
      
      Now a load instruction can wake up another load instruction at load stage 1. When the producer load instruction arrives at stage 2, the consumer load instruction is issued to load stage 0 and uses data from the producer to generate the load address.
      
      In the reservation station, a load can be dequeued from stage 1 when stage 2 does not have a valid instruction. If the fast load is not accepted, from the next cycle on, the load will dequeue as normal.
      
      Timing in reservation station (for imm read) and load unit (for writeback data selection) to be optimized later.
      
      * backend,rs: issue load one cycle earlier when possible
      
      This commit adds support for issuing load instructions one cycle
      earlier if the load instruction is woken up by another load. An extra
      2-bit UInt is added to IO.
      
      * mem: add load to load addr fastpath framework
      
      * mem: enable load to load forward
      
      * mem: add load-load forward counter
      Co-authored-by: William Wang <zeweiwang@outlook.com>
      718f8a60
    • Y
      Rename: fix doAllocate logic in refactored version · 4efb89cb
      YikeZhou committed
      MEFreeList: remove useless code + give specified
      (instead of DontCare) value to phy reg allocated port
      4efb89cb
  15. 01 Sep, 2021 4 commits
  16. 31 Aug, 2021 3 commits
    • J
      fudian: The new floating-point lib to replace hardfloat (#975) · dc597826
      Jiawei Lin committed
      * Add submodule 'fudian'
      
      * IntToFP: use fudian
      
      * FMA: use fudian.CMA
      
      * FPToInt: remove recode format
      dc597826
    • Z
      Alu: optimize timing for bitmanip (#979) · 28c18878
      zfw committed
      * Alu: optimize timing
      
      This pull request optimizes timing by adding a 32-bit adder for addw and changing the encoding.
      28c18878
    • Y
      backend,exu: connect writeback when possible (#977) · dd381594
      Yinan Xu committed
      This commit optimizes ExuBlock timing by connecting writeback when
      possible.
      
      The timing priorities are RegNext(rs.fastUopOut) > fu.writeback >
      arbiter.out(--> io.rfWriteback --> rs.writeback). The higher the priority,
      the better the timing.
      
      (1) When function units have exclusive writeback ports, their
      wakeup ports for reservation stations can be connected directly from
      function units' writeback ports. Special case: when the function unit
      has fastUopOut, valid and uop should be RegNext.
      
      (2) If the reservation station has fastUopOut for all instructions
      in this exu, we should replace io.fuWriteback with RegNext(fastUopOut).
      In this case, the corresponding execution units must have exclusive
      writeback ports, unless it's impossible that rs can ensure the
      instruction is able to write the regfile.
      
      (3) If the reservation station has fastUopOut for all instructions in
      this exu, we should replace io.rfWriteback (rs.writeback) with
      RegNext(rs.wakeupOut).
      dd381594
  17. 30 Aug, 2021 2 commits
  18. 29 Aug, 2021 1 commit
    • Y
      rs,bypass: add left and right bypass strategy (#971) · 605f31fc
      Yinan Xu committed
      * rs,bypass: remove optBuf for valid bits
      
      * rs,bypass: add left and right bypass strategy
      
      This commit adds another bypass network implementation to optimize timing of the first stage of function units.
      
      In BypassNetworkLeft, we bypass data at the same cycle that function units write data back. This increases the length of the critical path of the last stage of function units but reduces the length of the critical path of the first stage of function units. Some function units that require a shorter stage zero, like LOAD, may use BypassNetworkLeft.
      
      In this commit, we set all bypass networks to the left style, but we will make it configurable depending on different function units in the future.
      605f31fc
  19. 28 Aug, 2021 1 commit
    • Y
      rs,age: optimize timing for output (#970) · 9bc8f3e1
      Yinan Xu committed
      This commit changes how io.out is computed for age detector. We use a
      register to keep track of the position of the oldest instruction. Since
      the updating information has better timing than issue, this could
      optimize the timing of issue logic.
      9bc8f3e1
  20. 27 Aug, 2021 2 commits
    • Y
      rs,age: use less registers for age matrix (#964) · 38683dba
      Yinan Xu committed
      This commit reduces register usage in the age detector by using only
      the upper triangle of the matrix. Since the age matrix is antisymmetric,
      age(i)(j) equals !age(j)(i). Besides, age(i)(i) is the same as valid(i).
      Thus, we also remove validVec in this commit.
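
A minimal Python model of an age matrix that stores only the upper triangle (including the diagonal) is sketched below. It illustrates the two identities stated above and is not the Chisel age detector; the class and method names are hypothetical, and `older(i, j)` is assumed to mean that entry i was enqueued before entry j.

```python
class AgeMatrix:
    def __init__(self, n):
        self.n = n
        # Only bits with i <= j are stored; the diagonal doubles as valid(i).
        self.upper = {(i, j): False for i in range(n) for j in range(i, n)}

    def valid(self, i):
        return self.upper[(i, i)]

    def older(self, i, j):
        """True if entry i is older than entry j (both valid, i != j)."""
        if i < j:
            return self.upper[(i, j)]
        return not self.upper[(j, i)]  # age(i)(j) == !age(j)(i)

    def _set_older(self, i, j):
        """Record that entry i is older than entry j."""
        if i < j:
            self.upper[(i, j)] = True
        else:
            self.upper[(j, i)] = False

    def enqueue(self, k):
        # Every entry that is already valid is older than the new one.
        for i in range(self.n):
            if i != k and self.valid(i):
                self._set_older(i, k)
        self.upper[(k, k)] = True

    def dequeue(self, k):
        self.upper[(k, k)] = False

    def oldest(self, candidates):
        """Return the oldest valid entry among candidates, or None."""
        live = [i for i in candidates if self.valid(i)]
        for i in live:
            if all(self.older(i, j) for j in live if j != i):
                return i
        return None

m = AgeMatrix(3)
for entry in (2, 0, 1):  # enqueue order: 2 first, then 0, then 1
    m.enqueue(entry)
first_oldest = m.oldest([0, 1, 2])
```

For n entries this stores n(n+1)/2 bits instead of n² bits plus a separate valid vector.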
      38683dba
    • Y
      backend,fu: allow early arbitration via fastUopOut (#962) · f83b578a
      Yinan Xu committed
      This commit adds a fastUopOut option to function units. This allows the
      function units to give valid and uop one cycle before their output data is
      ready. FastUopOut lets writeback arbitration happen one cycle before
      data is ready and helps optimize the timing.
      
      Since some function units are not ready for this new feature, this
      commit adds a fastImplemented option to allow function units to have
      fastUopOut but the data is still at the same cycle as uop. This option
      will delay the data for one cycle and may cause performance degradation.
      FastImplemented should be true after function units support fastUopOut.
      f83b578a
  21. 26 Aug, 2021 2 commits
  22. 25 Aug, 2021 1 commit