1. 19 Sep, 2021 · 3 commits
    • backend,rs: load balance for issue selection (#1048) · 7bb7bf3d
      Committed by Yinan Xu
      This commit adds a load-balance strategy to the issue selection logic of
      reservation stations.
      
      Previously we had a load-balance option in ExuBlock, but it does not work
      if the function units send feedback to the RS. This commit removes it.
      
      This commit adds a victim index option for oldestFirst. For LOAD, the
      first issue port has better performance and thus we set the victim index
      to 0. For other function units, we use the last issue port.
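A minimal Python sketch of the victim index idea, not the Chisel implementation. It assumes that the "victim" is the issue port whose normal selection is replaced when the oldest instruction is ready, and that the override is skipped if a port already selected that entry. The function names `victim_index_for` and `apply_oldest_override` are hypothetical.

```python
def victim_index_for(fu_type, num_ports):
    """Pick the victim port: port 0 for LOAD, the last port otherwise."""
    return 0 if fu_type == "LOAD" else num_ports - 1


def apply_oldest_override(port_selects, oldest, oldest_ready, victim_index):
    """Return the per-port selection after the oldest-first override.

    port_selects: entry index (or None) chosen by normal selection per port.
    oldest:       index of the oldest entry (assumption: tracked separately).
    oldest_ready: whether that oldest entry can issue this cycle.
    """
    result = list(port_selects)
    # Assumption: do not issue the same entry twice if it was already picked.
    if oldest_ready and oldest not in result:
        result[victim_index] = oldest
    return result


# Two issue ports; entry 7 is the oldest and ready.
print(apply_oldest_override([3, 5], 7, True, victim_index_for("LOAD", 2)))
print(apply_oldest_override([3, 5], 7, True, victim_index_for("ALU", 2)))
```

With LOAD the oldest entry takes the first port; with any other function unit type it takes the last one.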
    • core: add timer counters for important stages (#1045) · ebb8ebf8
      Committed by Yinan Xu
      This commit adds timer counters for some important pipeline stages,
      including rename, dispatch, dispatch2, select, issue, execute, commit.
      We add performance counters for different types of instructions to see
      the latency in different pipeline stages.
    • ci: update RV64GCB workloads (#1047) · 5092a298
      Committed by zfw
      This PR replaces coremark, microbench, and all performance test workloads with the corresponding RV64GCB workloads.
  2. 18 Sep, 2021 · 1 commit
  3. 17 Sep, 2021 · 4 commits
  4. 16 Sep, 2021 · 2 commits
    • c33a770f
    • backend,rs: add counters for critical wakeup sources (#1027) · b6c0697a
      Committed by Yinan Xu
      This commit adds critical_wakeup_*_* counters to indicate which function
      units wake up the instructions in RS. Previously we have wait_for_src_*
      counters but they cannot represent where the critical operand (the last
      waiting operand) comes from.
      
      We need these counters to optimize fast wakeup logic. If some
      instructions critically depend on some other instructions, we can think
      of how we can optimize the wakeup process.
      
      Furthermore, this commit also adds a specific counter for FMAs that
      wake up other FMAs' third operand. This helps us decide which strategy
      to use for FMA fast issue.
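To illustrate what "critical operand" means here, the following Python sketch counts, per function unit, how often that unit delivered the last operand an instruction was waiting for. It is a toy model under stated assumptions, not the RS logic itself: the input format (a list of `(cycle, fu_name)` wakeups per instruction) and the function names are hypothetical.

```python
from collections import Counter


def critical_source(wakeups):
    """Return the function unit that delivered the last waiting operand.

    wakeups: list of (cycle, fu_name), one per source operand that had to
    wait in the RS. Returns None if no operand had to wait.
    """
    if not wakeups:
        return None
    # The critical operand is the one that arrives latest.
    return max(wakeups, key=lambda w: w[0])[1]


def count_critical_wakeups(instructions):
    """Accumulate one counter per critical wakeup source."""
    counters = Counter()
    for wakeups in instructions:
        src = critical_source(wakeups)
        if src is not None:
            counters[src] += 1
    return counters


trace = [
    [(3, "alu"), (7, "load")],  # the load result arrives last
    [(5, "mul")],               # single waiting operand
    [],                         # all operands ready at enqueue
]
print(count_critical_wakeups(trace))
```

A plain "waiting for source" counter would count both `alu` and `load` for the first instruction; the critical counter attributes it only to `load`.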
  5. 15 Sep, 2021 · 2 commits
  6. 14 Sep, 2021 · 1 commit
  7. 13 Sep, 2021 · 2 commits
    • MissQueue: fix bug in miss-merge logic (#1028) · ef90f6bd
      Committed by zhanglinjuan
    • backend: clean up exception vector usages (#1026) · c88c3a2a
      Committed by Yinan Xu
      This commit cleans up exception vector usages in backend.
      
      Previously the exception vector travelled through the pipeline with the
      uop. However, instructions with exceptions enter the ROB when they are
      dispatched, so we do not actually need the exception vector when an
      instruction enters a function unit.
      
      * exceptionVec, flushPipe, replayInst are reset when an instruction
      enters function units.
      
      * For execution units that don't have exceptions, we reset their output
      exception vectors so that the ROB does not record them.
      
      * Move replayInst to CtrlSignals.
  8. 12 Sep, 2021 · 3 commits
    • Merge pull request #1025 from OpenXiangShan/false_hit_fix · 42ba7d8c
      Committed by Steve Gou
      BPU: Fix bug and significantly reduce false_hit 
    • backend,rs: move select logic to stage 0 (#1023) · 64056bed
      Committed by Yinan Xu
      This commit moves issue select logic in reservation stations to stage 0
      from stage 1. It helps timing of stage 1, which load-to-load requires.
      
      Now, reservation stations have the following stages:
      
      * S0: enqueue and wakeup, select. Selection results are RegNext-ed.
      * S1: data/uop read and data bypass. Bypassed results are RegNext-ed.
      * S2: issue instructions to function units.
    • backend: add 3-bit shift fused instructions (#1022) · a792bcf1
      Committed by Yinan Xu
      This commit adds 3-bit shift fused instructions. They may be used when
      the program adds an index scaled by 8 bytes.
      
      List of fused instructions added in this commit:
      
      * szewl3: `slli r1, r0, 32` + `srli r1, r0, 29`
      
      * sr29add: `srli r1, r0, 29` + `add r1, r1, r2`
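A small Python model of what these two fused operations compute on 64-bit registers. It is a sketch, not the function unit implementation, and it assumes that the second instruction of each pair consumes the result of the first. The helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def slli(x, sh):
    """Logical shift left, truncated to 64 bits."""
    return (x << sh) & MASK64


def srli(x, sh):
    """Logical shift right on a 64-bit value."""
    return (x & MASK64) >> sh


def szewl3(r0):
    """Fused form: zero-extend the low 32 bits of r0, then shift left by 3."""
    return (r0 & 0xFFFFFFFF) << 3


def sr29add(r0, r2):
    """Fused form: (r0 >> 29) + r2, wrapped to 64 bits."""
    return (srli(r0, 29) + r2) & MASK64


# The unfused pair slli 32 / srli 29 gives the same value as szewl3.
x = 0xFFFFFFFF00000005
assert srli(slli(x, 32), 29) == szewl3(x) == 5 * 8

# Address of element `index` in an array of 8-byte elements, with a
# 32-bit unsigned index: slli 32, then srli 29 fused with the add.
base, index = 0x80000000, 5
print(hex(sr29add(slli(index, 32), base)))  # 0x80000028
```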
  9. 11 Sep, 2021 · 4 commits
    • 3ad99c7f
    • MissQueue: send GrantAck immediately after first beat of GrantData (#1013) · 59a7cc92
      Committed by zhanglinjuan
      * MissQueue: send GrantAck immediately after first beat of GrantData
      
      * MissQueue: add perf cnts
      
      * MissQueue: fix assertion failure in perf cnt
      
      * MissQueue: add perf cnts for proportion of load merge / load reject
      
      * MissQueue: add perf cnt
      
      * MissQueue: fix merge-conflict error
    • mmu.l2tlb: add TimeOutAssert & cut down mem resp data buffer (#1021) · 9bd9cdfa
      Committed by Lemover
      * mmu.l2tlb: add object TimeOutAssert
      
      * mmu.l2tlb: add TimeOutAssert to Repeater
      
      * mmu.l2tlb: cut down mem req buffer from 8 ptes to 1 pte each
      
      * util: move some utils from MMUBundle to utils
    • rs,status: simplify logic to optimize timing (#1020) · c9ebdf90
      Committed by Yinan Xu
      This commit simplifies status logic in reservation stations. Module
      StatusArray is mostly rewritten.
      
      The following optimizations are applied:
      
      * Wakeup now has higher priority than enqueue. This reduces the length
      of the critical path of ALU back-to-back wakeup.
      
      * Don't compare fpWen/rfWen if the reservation station does not have
      float/int operands.
      
      * Ignore status.valid or redirect for srcState update. For data capture,
      these are necessary and not changed.
      
      * Remove blocked and scheduled conditions in issue logic when the
      reservation station does not have loadWait bit and feedback.
  10. 10 Sep, 2021 · 3 commits
    • BPU: Fix bug that false hit in coremark 10 · 7f36ad77
      Committed by zoujr
    • Use HuanCun instead of block-inclusive-cache (#1016) · a1ea7f76
      Committed by Jiawei Lin
      * misc: add submodule huancun
      
      * huancun: integrate huancun to SoC as L3
      
      * remove l2prefetcher
      
      * update huancun
      
      * Bump HuanCun
      
      * Use HuanCun instead of old L2/L3
      
      * bump huancun
      
      * bump huancun
      
      * Set L3NBanks to 4
      
      * Update rocketchip
      
      * Bump huancun
      
      * Bump HuanCun
      
      * Optimize debug configs
      
      * Configs: fix L3 bug
      
      * Add TLLogger
      
      * TLLogger: fix release ack address
      
      * Support write prefix into database
      
      * Recording more TileLink info
      
      * Add a database output format converter
      
      * missqueue: add difftest port for memory difftest during refill
      
      * misc: bump difftest
      
      * misc: bump difftest & huancun
      
      * missqueue: do not check refill data when get Grant
      
      * Add directory debug tool
      
      * config: increase client dir size for non-inclusive cache
      
      * Bump difftest and huancun
      
      * Update l2/l3 cache configs
      
      * Remove deprecated fpga/*
      
      * Remove cache test
      
      * Remove L2 prefetcher
      
      * bump huancun
      
      * Params: turn on l2 prefetch by default
      
      * misc: remove duplicate chisel-tester2
      
      * misc: remove sifive inclusive cache
      
      * bump difftest
      
      * bump huancun
      
      * config: use 4MB L3 cache
      
      * bump huancun
      
      * bump difftest
      
      * bump difftest
      Co-authored-by: wangkaifan <wangkaifan@ict.ac.cn>
      Co-authored-by: TangDan <tangdan@ict.ac.cn>
    • backend, rs: parallelize selection and data read (#1018) · 66c2a07b
      Committed by Yinan Xu
      This commit changes how uop and data are read in reservation stations.
      It helps the issue timing.
      
      Previously, we accessed the payload array and data array after deciding
      which instructions to issue. This serialized issue selection and array
      access and created a critical path.
      
      In this commit, we add one more read port to payload array and data
      array. This extra read port is for the oldest instruction. We decide
      whether to issue the oldest instruction and read uop/data
      simultaneously. This change reduces the critical path to each selection
      logic + read + Mux (previously it's selection + arbitration + read).
      
      Variable oldestOverride indicates whether we choose the oldest ready
      instruction instead of the normal selection. An oldestFirst option is
      added to RSParams to parameterize whether we need the age logic. By
      default, it is set to true unless the RS is for ALU. If the aged ALU RS
      meets timing, we will enable it later.
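The parallel read can be pictured with a small Python model. This is a sketch under assumptions, not the Chisel code: it assumes normal selection picks the first ready entry, that the extra read port always reads the oldest valid entry, and that `oldestOverride` is set when that entry is ready. The `Entry` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    valid: bool
    ready: bool
    age: int       # larger means older (hypothetical encoding)
    payload: str   # stands in for the uop/data read from the arrays


def normal_select(entries):
    """Index of the first valid and ready entry, or None."""
    for i, e in enumerate(entries):
        if e.valid and e.ready:
            return i
    return None


def oldest_index(entries):
    """Index of the oldest valid entry, or None."""
    valid = [i for i, e in enumerate(entries) if e.valid]
    return max(valid, key=lambda i: entries[i].age) if valid else None


def issue(entries):
    """Return (payload, oldest_override) for one issue port."""
    sel = normal_select(entries)
    oldest = oldest_index(entries)
    # Both array reads are independent of the override decision,
    # so in hardware they can proceed in parallel with it.
    normal_data = entries[sel].payload if sel is not None else None
    oldest_data = entries[oldest].payload if oldest is not None else None
    oldest_override = oldest is not None and entries[oldest].ready
    # Final Mux between the two read results.
    return (oldest_data if oldest_override else normal_data, oldest_override)


rs = [Entry(True, True, 1, "young"), Entry(True, True, 5, "old")]
print(issue(rs))  # ('old', True)
```

The point of the structure is that the override decision only steers the final Mux; neither read waits for it.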
  11. 09 Sep, 2021 · 3 commits
    • mmu.l2tlb: partially rewrite fsm and miss queue for bug and optimization (#1007) · cc5a5f22
      Committed by Lemover
      * mmu.l2tlb: l2tlb now supports multiple parallel mem accesses
      
      8 missqueue entry and 1 page table worker
      mq entry only supports page leaf entry
      ptw supports all the three level entries
      
      * mmu.tlb: fix bug of mq.refill_vpn and out.ready
      
      * mmu.tlb: fix bug of perf counter
      
      * mmu.tlb: l2tlb's l3 now 128 sets and 4 ways
      
      * mmu.tlb: miss queue now will 'merge' same mem req addr
      
      * mmu.l2tlb: ptw doesn't access last level pte
      
      * mmu.l2tlb: add mem req mask into ptw
      
      func block_decoupled doesn't work well and has bug in signal ready
      
      * mmu.l2tlb: fix bug of sfence to fsm
      
      add a new state s_check_pte to ptw
      fsm now take memPte from outside, doesn't store it inside
      mem_resp_valid will arrive a cycle before mem_resp_data
      
      * mmu.l2tlb: rm some state in fsm
      
      * mmu.tlb: set itlb default size
      
      * mmu.l2tlb: unknown mq wait bug, change code style to avoid it
      
      * mmu.l2tlb: opt, mq's entry with cache_l3 would not be blocked
      
      * mmu.l2tlb: add many time out assert
      
      * mmu.l2tlb: fix bug of mq enq state change & wait_id
      
      * Revert "mmu.tlb: l2tlb's l3 now 128 sets and 4 ways"
      
      This reverts commit 216e4192e4b01e68ce5502135318bc2473434907.
      
      * Revert "mmu.tlb: set itlb default size"
      
      This reverts commit 670bf1e408384964c601c0a55defbc767eb80698.
      
      * mmu.l2tlb: set miss queue size to 9 and set filter size to 8
      
      if they are equal, itlb may lose its req
    • backend: support instruction fusion cases (#1011) · 88825c5c
      Committed by Yinan Xu
      This commit adds some simple instruction fusion cases in decode stage.
      Currently we only implement instruction pairs that can be fused into
      RV64GCB instructions.
      
      Instruction fusions are detected in the decode stage by FusionDecoder.
      The decoder checks every two instructions and marks the first
      instruction fused if they can be fused into one instruction. The second
      instruction is removed by setting the valid field to false.
      
      Simple fusion cases include sh1add, sh2add, sh3add, sexth, zexth, etc.
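
The pairwise detection described above can be sketched in Python. This toy decoder only recognizes the `slli` + `add` pattern that maps to sh1add/sh2add/sh3add; the `Inst` fields and function names are hypothetical and do not mirror FusionDecoder's actual interface.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class Inst:
    op: str
    rd: int
    rs1: int
    rs2: Optional[int] = None
    imm: Optional[int] = None
    valid: bool = True
    fused_op: Optional[str] = None


def try_fuse(a, b):
    """Return the fused opcode name for the pair (a, b), or None."""
    if (a.op == "slli" and a.imm in (1, 2, 3)
            and b.op == "add"
            and b.rd == a.rd      # the intermediate value is overwritten
            and b.rs1 == a.rd     # second instruction consumes the first
            and b.rs2 != a.rd):
        return f"sh{a.imm}add"
    return None


def fusion_decode(insts):
    """Check each adjacent pair; mark the first fused, invalidate the second."""
    out = [replace(inst) for inst in insts]  # work on copies
    for i in range(len(out) - 1):
        a, b = out[i], out[i + 1]
        if not (a.valid and b.valid):
            continue
        name = try_fuse(a, b)
        if name is not None:
            a.fused_op = name
            a.rs2 = b.rs2     # fused uop reads rs1 of slli and rs2 of add
            b.valid = False   # second instruction leaves the pipeline
    return out


decoded = fusion_decode([
    Inst("slli", rd=5, rs1=6, imm=2),
    Inst("add", rd=5, rs1=5, rs2=7),
    Inst("add", rd=8, rs1=5, rs2=5),
])
print([(d.op, d.fused_op, d.valid) for d in decoded])
```

The first pair becomes one sh2add uop; the third instruction is left alone because its predecessor was already consumed by the fusion.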
      
      Currently, ftq in frontend needs every instruction to commit. However,
      the second instruction is removed from the pipeline and will not commit.
      To solve this issue, we temporarily add more bits to isFused to indicate
      the offset difference between the two fused instructions. There are four
      possibilities now. This feature may be removed later.
      
      This commit also adds more instruction fusion cases that need changes
      in both the decode stage and the function units. In this commit, we add
      some opcodes to the function units and fuse the new instruction pairs
      into these new internal uops.
      
      The list of opcodes we add in this commit is shown below:
      - szewl1: `slli r1, r0, 32` + `srli r1, r0, 31`
      - szewl2: `slli r1, r0, 32` + `srli r1, r0, 30`
      - byte2: `srli r1, r0, 8` + `andi r1, r1, 255`
      - sh4add: `slli r1, r0, 4` + `add r1, r1, r2`
      - sr30add: `srli r1, r0, 30` + `add r1, r1, r2`
      - sr31add: `srli r1, r0, 31` + `add r1, r1, r2`
      - sr32add: `srli r1, r0, 32` + `add r1, r1, r2`
      - oddadd: `andi r1, r0, 1` + `add r1, r1, r2`
      - oddaddw: `andi r1, r0, 1` + `addw r1, r1, r2`
      - orh48: mask off the first 16 bits and or with another operand
               (`andi r1, r0, -256` + `or r1, r1, r2`)
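
As a sanity check on a few of the pairs listed above, the following Python sketch runs each two-instruction sequence through a tiny register-level interpreter and compares the result with a single closed-form expression for the fused uop. It is illustrative only: the interpreter, the tables `PAIRS` and `FUSED`, and the closed forms are assumptions derived from the instruction pairs, not code from the function units.

```python
MASK64 = (1 << 64) - 1


def sext32(x):
    """Sign-extend the low 32 bits of x to a 64-bit value."""
    x &= 0xFFFFFFFF
    if x & 0x80000000:
        x -= 1 << 32
    return x & MASK64


def run(program, regs):
    """Execute (op, rd, rs1, imm_or_rs2) tuples on a register dict."""
    regs = dict(regs)
    for op, rd, rs1, x in program:
        a = regs[rs1]
        if op == "slli":
            v = (a << x) & MASK64
        elif op == "srli":
            v = a >> x
        elif op == "andi":
            v = a & (x & MASK64)
        elif op == "add":
            v = (a + regs[x]) & MASK64
        elif op == "addw":
            v = sext32(a + regs[x])
        else:
            raise ValueError(op)
        regs[rd] = v
    return regs


PAIRS = {
    "byte2": [("srli", "r1", "r0", 8), ("andi", "r1", "r1", 255)],
    "sh4add": [("slli", "r1", "r0", 4), ("add", "r1", "r1", "r2")],
    "sr32add": [("srli", "r1", "r0", 32), ("add", "r1", "r1", "r2")],
    "oddadd": [("andi", "r1", "r0", 1), ("add", "r1", "r1", "r2")],
    "oddaddw": [("andi", "r1", "r0", 1), ("addw", "r1", "r1", "r2")],
}

# Assumed single-step semantics of each fused uop.
FUSED = {
    "byte2": lambda r0, r2: (r0 >> 8) & 0xFF,
    "sh4add": lambda r0, r2: ((r0 << 4) + r2) & MASK64,
    "sr32add": lambda r0, r2: ((r0 >> 32) + r2) & MASK64,
    "oddadd": lambda r0, r2: ((r0 & 1) + r2) & MASK64,
    "oddaddw": lambda r0, r2: sext32((r0 & 1) + r2),
}


def check(name, r0, r2):
    """True if the unfused pair and the fused form agree on r1."""
    regs = run(PAIRS[name], {"r0": r0, "r1": 0, "r2": r2})
    return regs["r1"] == FUSED[name](r0, r2)


print(all(check(n, 0x123456789ABCDEF1, 0x7FFFFFFF) for n in PAIRS))
```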
      
      Furthermore, this commit adds some complex instruction fusion cases to
      the decode stage and function units. The complex instruction fusion cases
      are detected after the instructions are decoded into uop and their
      CtrlSignals are used for instruction fusion detection.
      
      We add the following complex instruction fusion cases:
      - addwbyte: addw and mask it with 0xff (extract the first byte)
      - addwbit: addw and mask it with 0x1 (extract the first bit)
      - logiclsb: logic operation and mask it with 0x1 (extract the first bit)
      - mulw7: andi 127 and mulw instructions.
              Input to mul is AND with 0x7f if mulw7 bit is set to true.
    • mmu.tlb: set itlb's and l2tlb's size (#1014) · fa086d5e
      Committed by Lemover
      * mmu.tlb: l2tlb's l3 now 128 sets and 4 ways
      
      * mmu.tlb: set itlb default size
  12. 08 Sep, 2021 · 1 commit
  13. 06 Sep, 2021 · 4 commits
  14. 05 Sep, 2021 · 5 commits
  15. 04 Sep, 2021 · 2 commits