1. 09 Oct, 2021 2 commits
    • scheduler: support reading fp state from others (#1096) · 023cdb1e
      Yinan Xu committed
      This commit adds fpStateReadOut and fpStateReadIn ports to Scheduler to
      support reading fp reg states from other schedulers.
      
      This should improve timing, because ExuBlock(0) now has only the int
      regfile and busytable and no longer needs fp writeback.
    • Srt16div Bug Fix (#1089) · f7e0356a
      Li Qianruo committed
      * Fix a div 1 bug
      * Fix a typo
  2. 04 Oct, 2021 1 commit
  3. 01 Oct, 2021 1 commit
    • core: update parameters and module organizations (#1080) · 2b4e8253
      Yinan Xu committed
      This commit moves the load/store reservation stations into the first
      ExuBlock (also called IntegerBlock). The unnecessary dispatch module
      is also removed from CtrlBlock.
      
      Now the module organization becomes:
      * ExuBlock: Int RS, Load/Store RS, Int RF, Int FUs
      * ExuBlock_1: Fp RS, Fp RF, Fp FUs
      * MemBlock: Load/Store FUs
      
      In addition, the load queue now has 80 entries and the store queue has 64 entries.
  4. 28 Sep, 2021 3 commits
    • rs: latch jump pc when deq is blocked (#1076) · 085b0af8
      Yinan Xu committed
      This commit fixes a bug that caused the pc to take wrong values when a
      jump was blocked at issue and a new jump instruction entered the
      reservation station. When a jump is blocked at issue, we must latch its
      pc value, because its entry has already been deallocated from the RS
      (and its pc no longer exists in the pc mem).
    • configs, core: update some parameters (#1072) · 7154d65e
      Yinan Xu committed
      * change ROB to 256 entries
      * change physical register file to 192 entries
      * re-organize reservation stations, function units and regfile
    • misc: code clean up (#1073) · 9aca92b9
      Yinan Xu committed
      * rename Roq to Rob
      
      * remove trailing whitespace
      
      * remove unused parameters
  5. 27 Sep, 2021 2 commits
  6. 25 Sep, 2021 2 commits
    • backend: optimize aluOpType to 7 bits (#1061) · 675acc68
      Yinan Xu committed
      This commit reduces ALUOpType to 7 bits. ALU timing will be checked
      later.
      
      We also apply some misc changes including:
      
      * Move REVB, PACK, PACKH, PACKW to ALU
      
      * Add fused logicZexth, addwZext, addwSexth
      
      * Add instruction fusion test cases to CI
    • Bmu: support zbk* instruction (#1059) · 07596dc6
      zfw committed
      * Bmu: support zbk* instructions
      
      * ci: add zbk* instruction test
  7. 23 Sep, 2021 1 commit
    • Integer SRT16 Divider (#1019) · a58e3351
      Li Qianruo committed
      * New SRT4 divider that may improve timing
      
      See "Digit-recurrence dividers with reduced logical depth"
      
      * SRT16 Int Divider that is working properly
      
      * Fix bug related to div 1
      
      * Timing improved version of SRT16 int divider
      
      * Add copyright and make some minor changes
      
      * Fix bugs related to div 0
      
      * Fix another div 0 bug
      
      * Fix another special case bug
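The divider commits above repeatedly fix divide-by-0 and divide-by-1 corner cases. The RISC-V M extension fixes the expected results for these inputs, so any divider (SRT16 or otherwise) can be checked against a plain reference model. The Python sketch below is such a model, written for illustration: it is not the SRT algorithm and not code from this repository, and the function names are hypothetical.

```python
# Reference model of RV64 DIV/DIVU/REM/REMU results, including the special
# cases (divide by zero, signed overflow) a hardware divider must reproduce.
XLEN = 64
MASK = (1 << XLEN) - 1


def to_signed(x: int) -> int:
    """Interpret a 64-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x


def div(a: int, b: int) -> int:
    """Signed DIV: quotient rounded toward zero, as a 64-bit pattern."""
    sa, sb = to_signed(a), to_signed(b)
    if sb == 0:
        return MASK                          # x / 0 = -1 (all ones)
    if sa == -(1 << (XLEN - 1)) and sb == -1:
        return sa & MASK                     # overflow: quotient = dividend
    q = abs(sa) // abs(sb)
    return (q if (sa < 0) == (sb < 0) else -q) & MASK


def rem(a: int, b: int) -> int:
    """Signed REM: the remainder takes the sign of the dividend."""
    sa, sb = to_signed(a), to_signed(b)
    if sb == 0:
        return sa & MASK                     # x % 0 = x
    if sa == -(1 << (XLEN - 1)) and sb == -1:
        return 0                             # overflow: remainder = 0
    r = abs(sa) % abs(sb)
    return (r if sa >= 0 else -r) & MASK


def divu(a: int, b: int) -> int:
    a, b = a & MASK, b & MASK
    return MASK if b == 0 else a // b       # x / 0 = 2^64 - 1


def remu(a: int, b: int) -> int:
    a, b = a & MASK, b & MASK
    return a if b == 0 else a % b           # x % 0 = x
```

Comparing a divider against a model like this on random operands plus the values `0`, `1`, `-1`, and the most negative integer is one way to catch the class of bugs fixed in #1089 and #1019.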
  8. 22 Sep, 2021 2 commits
  9. 21 Sep, 2021 1 commit
  10. 20 Sep, 2021 1 commit
    • rs, fma: separate fadd and fmul issue (#1042) · 65e2f311
      Yinan Xu committed
      This commit splits FMA instructions into FMUL and FADD for execution.
      
      When the first two operands are ready, an FMA instruction can be issued
      and the intermediate result will be written back to RS after two cycles.
      Since RS currently has DataArray to store the operands, we reuse it to
      store the intermediate FMUL result.
      
      When an FMA enters the deq stage and leaves the RS with only two
      operands, we mark it as midState ready at this clock cycle, T0.
      
      If the instruction's third operand becomes ready at T0, it can be
      selected at T1 and issued at T2, when FMUL has also finished. The
      intermediate result is then sent to FADD instead of being written back
      to RS. If the instruction's third operand becomes ready later, the
      intermediate result is already in DataArray or at DataArray's write
      port. Thus, it is safe to set midState ready at clock cycle T0.
      
      Splitting FMA instructions increases issue pressure, since the RS
      needs to issue more times. However, it largely reduces FMA latency if
      many FMA instructions are waiting for the third operand.
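The FMA split can be pictured with a small Python model. The commit does not say how the intermediate FMUL result is represented; this sketch keeps it exact with `fractions.Fraction`, which is what IEEE 754 fused multiply-add semantics require (the product is not rounded before the addition). `data_array`, `slot`, and the issue-cycle rule for a third operand that arrives after T0 are assumptions extrapolated from the T0/T1/T2 description above.

```python
from fractions import Fraction
from typing import Dict


def fmul_half(a, b) -> Fraction:
    """First half (FMUL): needs only the first two operands.
    Kept exact, since a fused multiply-add must not round the product."""
    return Fraction(a) * Fraction(b)


def fadd_half(product: Fraction, c) -> Fraction:
    """Second half (FADD): adds the third operand to the stored product."""
    return product + Fraction(c)


def split_fma(a, b, c, data_array: Dict[int, Fraction], slot: int) -> Fraction:
    # Hypothetical DataArray-like storage: the product waits here for c.
    data_array[slot] = fmul_half(a, b)
    return fadd_half(data_array[slot], c)


def fadd_issue_cycle(t0: int, third_ready: int) -> int:
    """Earliest issue cycle of the FADD half (assumed rule): selected one
    cycle after the third operand is ready, but no earlier than T0 + 1,
    then issued one cycle after selection."""
    select = max(t0, third_ready) + 1
    return select + 1
```

With `t0 = 0` and the third operand ready at T0, `fadd_issue_cycle` returns 2, matching "selected at T1 and issued at T2".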
  11. 19 Sep, 2021 4 commits
    • backend,rs: load balance for issue selection (#1048) · 7bb7bf3d
      Yinan Xu committed
      This commit adds a load-balancing strategy to the issue selection logic
      of reservation stations.
      
      Previously we had a load-balance option in ExuBlock, but it did not
      work when function units send feedback to the RS. This commit removes
      it.
      
      This commit adds a victim index option for oldestFirst. For LOAD, the
      first issue port has better performance and thus we set the victim index
      to 0. For other function units, we use the last issue port.
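The victim-index policy can be sketched as a behavioral model, assuming a simple lowest-index-first selection fills the issue ports and the oldest ready entry (smallest `age`) replaces the choice on the victim port whenever the normal selection missed it, which is the oldestOverride case described in #1018. The function and argument names are hypothetical; this is not the RS select logic itself.

```python
from typing import List, Optional


def select_with_victim(ready: List[bool], age: List[int], num_ports: int,
                       victim: int) -> List[Optional[int]]:
    """Return the RS entry chosen for each issue port (None = idle port)."""
    ready_idx = [i for i, r in enumerate(ready) if r]
    # Normal selection (assumed): lowest-index ready entries, one per port.
    ports: List[Optional[int]] = list(ready_idx[:num_ports])
    ports += [None] * (num_ports - len(ports))
    if ready_idx:
        oldest = min(ready_idx, key=lambda i: age[i])  # smaller age = older
        oldest_override = oldest not in ports
        if oldest_override:
            ports[victim] = oldest  # the oldest entry takes the victim port
    return ports
```

With `victim=0` the oldest instruction goes to the first port, as chosen for LOAD above; other function units would pass `victim=num_ports - 1`.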
    • backend, freelist: remove unused log & assertions · 20acd4ae
      YikeZhou committed
    • backend, freelist: modify free list allocatePhyReg logic · 8949e3b0
      YikeZhou committed
      1) generate ptr and preg in a vec first
      2) use renameEnable to replace common parts in allocating logic
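One way to read "generate ptr and preg in a vec first" is that every rename slot computes its free-list pointer up front, as the head pointer plus the number of enabled slots before it, and only then reads its physical register. The Python sketch below assumes a circular free list and uses hypothetical names (`allocate`, `free_list`, `head`); it is not the allocatePhyReg implementation.

```python
from typing import List, Optional, Tuple


def allocate(free_list: List[int], head: int, rename_enable: List[bool]
             ) -> Tuple[List[int], List[Optional[int]], int]:
    """Return (ptr per slot, preg per slot, new head) for one rename group."""
    size = len(free_list)
    ptrs: List[int] = []
    pregs: List[Optional[int]] = []
    enabled_before = 0                     # count of enabled earlier slots
    for enable in rename_enable:
        ptr = (head + enabled_before) % size
        ptrs.append(ptr)
        pregs.append(free_list[ptr] if enable else None)
        enabled_before += int(enable)
    # The head advances by the number of slots that actually allocated.
    return ptrs, pregs, (head + enabled_before) % size
```

Because each pointer depends only on a prefix count of `rename_enable`, all slots can compute their pointers in parallel in hardware.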
    • core: add timer counters for important stages (#1045) · ebb8ebf8
      Yinan Xu committed
      This commit adds timer counters for some important pipeline stages,
      including rename, dispatch, dispatch2, select, issue, execute, and
      commit. We add performance counters for different types of instructions
      to see their latency in each pipeline stage.
  12. 18 Sep, 2021 1 commit
  13. 17 Sep, 2021 1 commit
    • regfile: manually reset every registers (#1038) · 93b61a80
      Yinan Xu committed
      This commit adds a manual reset for every register in Regfile.
      Previously, reset was done by adding reset values to the registers.
      However, a physical general-purpose register file does not have reset
      values.

      Since all registers in the regfile always share the same writeback
      data, we don't need to explicitly assign reset data.
  14. 16 Sep, 2021 1 commit
    • backend,rs: add counters for critical wakeup sources (#1027) · b6c0697a
      Yinan Xu committed
      This commit adds critical_wakeup_*_* counters to indicate which function
      units wake up the instructions in RS. Previously we had wait_for_src_*
      counters, but they cannot show where the critical operand (the last
      waiting operand) comes from.
      
      We need these counters to optimize fast wakeup logic. If some
      instructions critically depend on some other instructions, we can think
      of how we can optimize the wakeup process.
      
      Furthermore, this commit also adds a specific counter for FMAs that
      wake up other FMAs' third operand. This helps us decide which strategy
      to use for FMA fast issue.
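The difference between the critical-wakeup counters and per-operand wait counters can be shown with a small Python model: only the operand that became ready last is attributed to the function unit that produced it. The data layout and the FU labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One instruction's waits: (cycle the operand became ready, waking FU label).
Waits = List[Tuple[int, str]]


def critical_wakeup_counts(instrs: List[Waits]) -> Dict[str, int]:
    """Count, per function unit, how often it produced the critical operand,
    i.e. the last operand an instruction was waiting for."""
    counts: Counter = Counter()
    for waits in instrs:
        if waits:  # an instruction that waited for nothing has no critical source
            _cycle, fu = max(waits, key=lambda w: w[0])
            counts[fu] += 1
    return dict(counts)
```

For instance, an instruction waiting on an ALU result ready at cycle 3 and a load result ready at cycle 5 is counted once, under the load unit.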
  15. 15 Sep, 2021 1 commit
    • mmu.tlb: ptw resp will refill both ld & st tlb (#1029) · bf08468c
      Lemover committed
      Nothing else changes; this commit adds one parameter to control whether
      ldtlb and sttlb are the same. There are now two similar parameters:

      outReplace: when true, the two ldtlbs are 'same' and the two sttlbs are 'same'
      refillBothTlb: when true, all four TLBs are the same (requires outReplace to be true)
      
      * mmu.tlb: add param refillBothTlb to refill both ld & st tlb
      
      * mmu.tlb: set param refillBothTlb to false
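The relationship between the two parameters can be sketched in Python, interpreting "same" as "receives the same refills". The class, the snake_case field names, and the TLB labels `ld0`/`ld1`/`st0`/`st1` are hypothetical stand-ins for outReplace, refillBothTlb, and the two load and two store TLBs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TlbRefillParams:
    out_replace: bool = False        # stands in for outReplace
    refill_both_tlb: bool = False    # stands in for refillBothTlb

    def __post_init__(self) -> None:
        # refillBothTlb requires outReplace to be true.
        if self.refill_both_tlb and not self.out_replace:
            raise ValueError("refillBothTlb requires outReplace to be true")

    def refill_targets(self, requester: str) -> List[str]:
        """TLBs that take the PTW response requested by `requester`."""
        if self.refill_both_tlb:
            return ["ld0", "ld1", "st0", "st1"]  # all four TLBs kept the same
        if self.out_replace:
            kind = requester[:2]                 # "ld" or "st"
            return [kind + "0", kind + "1"]      # both TLBs of that kind
        return [requester]
```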
  16. 13 Sep, 2021 2 commits
    • backend, rename: elimination psrc directly from intRat · 0153cd55
      YikeZhou committed
    • backend: clean up exception vector usages (#1026) · c88c3a2a
      Yinan Xu committed
      This commit cleans up exception vector usages in backend.
      
      Previously, the exception vector went through the pipeline with the
      uop. However, instructions with exceptions enter the ROB when they are
      dispatched, so we don't actually need the exception vector once an
      instruction enters a function unit.
      
      * exceptionVec, flushPipe, replayInst are reset when an instruction
      enters function units.
      
      * For execution units that don't raise exceptions, we reset their output
      exception vectors to prevent the ROB from recording them.
      
      * Move replayInst to CtrlSignals.
  17. 12 Sep, 2021 3 commits
    • backend, rename: optimize MEFreeList free logic · 62d2a04b
      YikeZhou committed
    • backend,rs: move select logic to stage 0 (#1023) · 64056bed
      Yinan Xu committed
      This commit moves the issue select logic in reservation stations from
      stage 1 to stage 0. This improves the timing of stage 1, which the
      load-to-load path requires.
      
      Now, reservation stations have the following stages:
      
      * S0: enqueue and wakeup, select. Selection results are RegNext-ed.
      * S1: data/uop read and data bypass. Bypassed results are RegNext-ed.
      * S2: issue instructions to function units.
    • backend: add 3-bit shift fused instructions (#1022) · a792bcf1
      Yinan Xu committed
      This commit adds 3-bit shift fused instructions. These may be used when
      a program adds an index scaled by 8 bytes.
      
      List of fused instructions added in this commit:
      
      * szewl3: `slli r1, r0, 32` + `srli r1, r1, 29`
      
      * sr29add: `srli r1, r0, 29` + `add r1, r1, r2`
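The fused forms can be checked against the original instruction pairs with a small Python model of RV64 shifts. This sketch assumes the second instruction of each szewl pair reads `r1`, the result of the first; `szewl`, `sr_add`, and the helpers are hypothetical names, and n = 1, 2 correspond to the szewl1/szewl2 cases added in #1011.

```python
MASK64 = (1 << 64) - 1


def slli(x: int, shamt: int) -> int:
    """RV64 SLLI: shift left, keep the low 64 bits."""
    return (x << shamt) & MASK64


def srli(x: int, shamt: int) -> int:
    """RV64 SRLI: logical shift right of a 64-bit value."""
    return (x & MASK64) >> shamt


def add(x: int, y: int) -> int:
    return (x + y) & MASK64


def szewl(x: int, n: int) -> int:
    """Fused `slli r1, r0, 32` + `srli r1, r1, 32 - n`: zext.w(x) << n."""
    return (x & 0xFFFF_FFFF) << n


def sr_add(x: int, y: int, n: int) -> int:
    """Fused `srli r1, r0, n` + `add r1, r1, r2` (n = 29 for sr29add)."""
    return add(srli(x, n), y)
```

For example, `szewl(i, 3)` is the byte offset of element `i` (zero-extended from 32 bits) in an array of 8-byte elements, which is the indexing pattern this commit targets.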
  18. 11 Sep, 2021 1 commit
    • rs,status: simplify logic to optimize timing (#1020) · c9ebdf90
      Yinan Xu committed
      This commit simplifies the status logic in reservation stations. Module
      StatusArray is mostly rewritten.
      
      The following optimizations are applied:
      
      * Wakeup now has higher priority than enqueue. This reduces the length
      of the critical path of ALU back-to-back wakeup.
      
      * Don't compare fpWen/rfWen if the reservation station does not have
      float/int operands.
      
      * Ignore status.valid or redirect for srcState update. For data capture,
      these are necessary and not changed.
      
      * Remove blocked and scheduled conditions in issue logic when the
      reservation station does not have loadWait bit and feedback.
  19. 10 Sep, 2021 1 commit
    • backend, rs: parallelize selection and data read (#1018) · 66c2a07b
      Yinan Xu committed
      This commit changes how uop and data are read in reservation stations,
      which improves issue timing.

      Previously, we accessed the payload array and data array after deciding
      which instructions to issue. This serialized issue selection and array
      access and created a critical path.
      
      In this commit, we add one more read port to the payload array and the
      data array. This extra read port is for the oldest instruction. We decide
      whether to issue the oldest instruction and read its uop/data
      simultaneously. This change reduces the critical path to selection
      logic + read + Mux (previously it was selection + arbitration + read).
      
      The variable oldestOverride indicates whether we choose the oldest ready
      instruction instead of the normal selection. An oldestFirst option is
      added to RSParams to parameterize whether we need the age logic. By
      default, it is set to true unless the RS is for ALU. If timing for the
      aged ALU RS is met, we will enable it later.
  20. 09 Sep, 2021 1 commit
    • backend: support instruction fusion cases (#1011) · 88825c5c
      Yinan Xu committed
      This commit adds some simple instruction fusion cases in decode stage.
      Currently we only implement instruction pairs that can be fused into
      RV64GCB instructions.
      
      Instruction fusions are detected in the decode stage by FusionDecoder.
      The decoder checks every two instructions and marks the first
      instruction fused if they can be fused into one instruction. The second
      instruction is removed by setting the valid field to false.
      
      Simple fusion cases include sh1add, sh2add, sh3add, sexth, zexth, etc.
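The pairwise check performed by FusionDecoder can be modeled in Python for a single case: `slli` + `add` fused into `sh1add`/`sh2add`/`sh3add`. The instruction tuple format and the function names are hypothetical, and only one operand order of `add` is matched. The check requires the `add` to overwrite the `slli` destination and to read it only once, so the fused uop sees the same inputs as the original pair.

```python
from typing import List, Optional, Tuple

# Hypothetical decoded instruction: (opcode, rd, rs1, rs2 or immediate).
Inst = Tuple[str, int, int, int]


def match_shnadd(first: Inst, second: Inst) -> Optional[str]:
    """`slli rd, rs, N` + `add rd, rd, rs2` -> shNadd when N is 1..3."""
    op1, rd1, _rs, shamt = first
    op2, rd2, src_a, src_b = second
    if (op1 == "slli" and 1 <= shamt <= 3 and op2 == "add"
            and rd2 == rd1 and src_a == rd1 and src_b != rd1):
        return f"sh{shamt}add"
    return None


def fusion_decode(insts: List[Inst]) -> List[Tuple[Inst, Optional[str], bool]]:
    """Check every two adjacent instructions; mark the first one as fused
    and remove the second one by clearing its valid bit."""
    fused: List[Optional[str]] = [None] * len(insts)
    valid = [True] * len(insts)
    for i in range(len(insts) - 1):
        if not valid[i]:
            continue  # already consumed as the second half of the previous pair
        op = match_shnadd(insts[i], insts[i + 1])
        if op is not None:
            fused[i] = op
            valid[i + 1] = False
    return list(zip(insts, fused, valid))
```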
      
      Currently, the ftq in the frontend needs every instruction to commit.
      However, the second instruction is removed from the pipeline and will not
      commit. To solve this issue, we temporarily add more bits to isFused to
      indicate the offset difference between the two fused instructions. There
      are four possibilities now. This feature may be removed later.
      
      This commit also adds more instruction fusion cases that need changes
      in both the decode stage and the function units. In this commit, we add
      some opcodes to the function units and fuse the new instruction pairs
      into these new internal uops.
      
      The list of opcodes we add in this commit is shown below:
      - szewl1: `slli r1, r0, 32` + `srli r1, r1, 31`
      - szewl2: `slli r1, r0, 32` + `srli r1, r1, 30`
      - byte2: `srli r1, r0, 8` + `andi r1, r1, 255`
      - sh4add: `slli r1, r0, 4` + `add r1, r1, r2`
      - sr30add: `srli r1, r0, 30` + `add r1, r1, r2`
      - sr31add: `srli r1, r0, 31` + `add r1, r1, r2`
      - sr32add: `srli r1, r0, 32` + `add r1, r1, r2`
      - oddadd: `andi r1, r0, 1` + `add r1, r1, r2`
      - oddaddw: `andi r1, r0, 1` + `addw r1, r1, r2`
      - orh48: mask off the low 8 bits and OR with another operand
               (`andi r1, r0, -256` + `or r1, r1, r2`)
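The semantics of the simpler pairs can be written down in Python, with each I-type immediate sign-extended from 12 bits as RV64 does. This makes the masks explicit: for example, the `-256` in orh48 becomes `0xFFFF_FFFF_FFFF_FF00`, which clears the low 8 bits. This is an illustrative model with hypothetical helper names, not the ALU implementation.

```python
MASK64 = (1 << 64) - 1


def sext(x: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of x to a 64-bit pattern."""
    x &= (1 << bits) - 1
    return (x - (1 << bits)) & MASK64 if x >> (bits - 1) else x


def imm12(v: int) -> int:
    """Value the ALU sees for a 12-bit I-type immediate."""
    return sext(v, 12)


def addw(a: int, b: int) -> int:
    """ADDW: 32-bit add, result sign-extended to 64 bits."""
    return sext(a + b, 32)


# Reference semantics of each fused pair (r0 = a, r2 = b).
def byte2(a: int, b: int) -> int:      # srli 8; andi 255 (b unused)
    return (a >> 8) & imm12(255)


def sh4add(a: int, b: int) -> int:     # slli 4; add
    return (((a << 4) & MASK64) + b) & MASK64


def oddadd(a: int, b: int) -> int:     # andi 1; add
    return ((a & imm12(1)) + b) & MASK64


def oddaddw(a: int, b: int) -> int:    # andi 1; addw
    return addw(a & imm12(1), b)


def orh48(a: int, b: int) -> int:      # andi -256; or
    return (a & imm12(-256)) | b
```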
      
      Furthermore, this commit adds some complex instruction fusion cases to
      the decode stage and function units. The complex instruction fusion cases
      are detected after the instructions are decoded into uops, and their
      CtrlSignals are used for instruction fusion detection.
      
      We add the following complex instruction fusion cases:
      - addwbyte: addw and mask it with 0xff (extract the first byte)
      - addwbit: addw and mask it with 0x1 (extract the first bit)
      - logiclsb: logic operation and mask it with 0x1 (extract the first bit)
      - mulw7: andi 127 and mulw instructions.
              Input to mul is ANDed with 0x7f if the mulw7 bit is set to true.
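For the complex cases, masking after ADDW makes the 32-bit sign extension irrelevant, so a fused uop only needs the low bits of a plain addition. The Python sketch below shows this for addwbyte, addwbit, and logiclsb, and models mulw7 as a multiplier whose first input is masked with 0x7f when a flag is set. All names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sext32(x: int) -> int:
    """Sign-extend the low 32 bits to a 64-bit pattern."""
    x &= 0xFFFF_FFFF
    return (x - (1 << 32)) & MASK64 if x >> 31 else x


def addw(a: int, b: int) -> int:
    return sext32(a + b)


def mulw(a: int, b: int) -> int:
    return sext32(a * b)


# Unfused references: the ALU op followed by a mask (andi).
def addwbyte_ref(a: int, b: int) -> int:
    return addw(a, b) & 0xFF


def addwbit_ref(a: int, b: int) -> int:
    return addw(a, b) & 0x1


# Fused forms: only the low bits survive the mask, so no sign extension.
def addwbyte(a: int, b: int) -> int:
    return (a + b) & 0xFF


def addwbit(a: int, b: int) -> int:
    return (a + b) & 0x1


def logiclsb(a: int, b: int, op: str) -> int:
    """Logic op (and/or/xor) whose result is masked with 0x1."""
    result = {"and": a & b, "or": a | b, "xor": a ^ b}[op]
    return result & 0x1


def mul_unit(a: int, b: int, mulw7: bool) -> int:
    """MULW whose first input is ANDed with 0x7f when the mulw7 bit is set."""
    if mulw7:
        a &= 0x7F
    return mulw(a, b)
```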
  21. 08 Sep, 2021 1 commit
  22. 06 Sep, 2021 3 commits
  23. 05 Sep, 2021 3 commits
  24. 04 Sep, 2021 1 commit