1. 04 Jan, 2022 1 commit
  2. 01 Jan, 2022 1 commit
    • bpu: timing optimizations · cb4f77ce
      Lingrui98 committed
      * move statistical corrector to stage 3
      * add recover path in stage 3 for ras in case stage 2 falsely pushes or pops
      * give stage 2 the highest physical priority in bpu
      * leave ras broken for the next commit to fix
  3. 30 Dec, 2021 1 commit
    • ubtb: timing and performance optimizations · edc18578
      Lingrui98 committed
      * timing: use single-ported SRAMs, invalidating read responses on write
      * performance:
      -- shorten the history length to accelerate training
      -- use a predictor to reduce s2_redirects when the FTB does not hit
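
The timing bullet above can be illustrated with a small behavioural model. This is only a sketch in Python, not the hardware description from the commit: the class name `SinglePortSram` and its `cycle` method are hypothetical, and it assumes that a write takes priority over a read requested in the same cycle and that the colliding read response is simply marked invalid.

```python
from typing import List, Optional, Tuple


class SinglePortSram:
    """Behavioural model of a single-ported SRAM (hypothetical, for illustration).

    One port means one access per cycle. Assumption: when a read and a write
    are requested together, the write wins and the read response is invalid.
    """

    def __init__(self, depth: int) -> None:
        self.mem: List[int] = [0] * depth

    def cycle(
        self,
        read_addr: Optional[int] = None,
        write: Optional[Tuple[int, int]] = None,
    ) -> Tuple[bool, int]:
        """Run one cycle and return (read_valid, read_data)."""
        if write is not None:
            addr, data = write
            self.mem[addr] = data
            # The port was used by the write, so any read response is invalidated.
            return (False, 0)
        if read_addr is not None:
            return (True, self.mem[read_addr])
        return (False, 0)


sram = SinglePortSram(depth=4)
print(sram.cycle(write=(1, 42)))              # write only
print(sram.cycle(read_addr=1))                # read only
print(sram.cycle(read_addr=1, write=(2, 7)))  # collision: read is invalidated
```

A consumer of such a memory has to treat an invalid response as a miss or retry the read, which is the cost paid for dropping the second port.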
  4. 24 Dec, 2021 1 commit
    • IPrefetch: fix prefetchPtr stop problem (#1387) · e30430c2
      Jay committed
      * IPrefetch: fix prefetchPtr stop problem

      * This problem happens because prefetchPtr still exists when IPrefetch is closed

      * Fix the PMP req port still being occupied even when the ICache misses

      * Shut down IPrefetch

      * IPrefetch: fix Hint not setting the PreferCache bit

      * bump HuanCun
  5. 23 Dec, 2021 3 commits
  6. 22 Dec, 2021 1 commit
    • IPrefetch: fix prefetchPtr stop problem · ca4df9c2
      JinYue committed
      * This problem happens because prefetchPtr still exists when IPrefetch is closed

      * Fix the PMP req port still being occupied even when the ICache misses
  7. 21 Dec, 2021 1 commit
  8. 20 Dec, 2021 1 commit
  9. 18 Dec, 2021 2 commits
  10. 17 Dec, 2021 1 commit
  11. 11 Dec, 2021 1 commit
    • core: delay csrCtrl for two cycles (#1336) · 6f688dac
      Yinan Xu committed
      This commit adds DelayN(2) to some CSR-related signals, including
      control bits to ITLB, DTLB, PTW, etc.
      
      To avoid accessing the ITLB before control bits change, we also need
      to delay the flush for two cycles. We assume branch misprediction or
      memory violation does not cause csrCtrl to change.
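
The reasoning above (delay the flush by the same two cycles as the control bits) can be checked with a small software model. This is a sketch under stated assumptions, not the `DelayN` utility itself: the `DelayLine` class is a hypothetical Python stand-in for a chain of n registers, and the input waveforms are made up for illustration.

```python
from collections import deque


class DelayLine:
    """Software model of an n-cycle register chain (hypothetical helper)."""

    def __init__(self, n: int, init=0) -> None:
        # The deque holds the last n inputs, oldest first.
        self.regs = deque([init] * n, maxlen=n)

    def step(self, value):
        """Advance one cycle: return the input from n cycles ago, then shift."""
        out = self.regs[0]
        self.regs.append(value)  # maxlen drops the oldest entry automatically
        return out


csr_ctrl = DelayLine(2, init=0)
flush = DelayLine(2, init=False)

# Assumed stimulus: the control bit changes in the same cycle as the flush.
ctrl_in = [0, 1, 1, 1, 1]
flush_in = [False, True, False, False, False]

trace = [(csr_ctrl.step(c), flush.step(f)) for c, f in zip(ctrl_in, flush_in)]
print(trace)
```

Because both signals pass through the same number of registers, the delayed flush arrives in the same cycle as the new control value rather than two cycles before it.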
  12. 10 Dec, 2021 1 commit
  13. 08 Dec, 2021 1 commit
  14. 03 Dec, 2021 1 commit
    • bpu: timing optimizations · a229ab6c
      Lingrui98 committed
      * let ubtb store full targets and fall-through addresses
      * add some fields in BranchPrediction so that ifu requests can be solely derived from it
  15. 26 Nov, 2021 1 commit
    • bpu: timing optimizations · 1ccea249
      Lingrui98 committed
      * decouple the fall-through address calculation logic from the pftAddr interface
      * give the ghr update from s1 the highest priority
      * fix the physical priority of PhyPriorityMuxGenerator
  16. 25 Nov, 2021 1 commit
  17. 18 Nov, 2021 2 commits
  18. 12 Nov, 2021 1 commit
  19. 11 Nov, 2021 1 commit
  20. 05 Nov, 2021 1 commit
  21. 23 Oct, 2021 1 commit
  22. 22 Oct, 2021 2 commits
    • ftq: fix bugs when shareTailSlot is false · 710a8720
      Lingrui98 committed
    • rob: optimize bits width in storage (#1155) · c3abb8b6
      Yinan Xu committed
      This PR optimizes out isFused and crossPageIPFFix usages in Rob's DispatchData. They will not be stored in ROB. Now DispatchData has only 38 bits.
      
      * isFused is merged with commitType (2 bits reduced)
      * crossPageIPFFix is used only in ExceptionGen (1 bit reduced)
      * rename: reduce ldest usages
      * decode: set isMove to false if ldest is zero
  23. 20 Oct, 2021 1 commit
  24. 18 Oct, 2021 3 commits
  25. 16 Oct, 2021 2 commits
  26. 14 Oct, 2021 1 commit
  27. 28 Sep, 2021 1 commit
  28. 15 Sep, 2021 1 commit
  29. 09 Sep, 2021 1 commit
    • backend: support instruction fusion cases (#1011) · 88825c5c
      Yinan Xu committed
      This commit adds some simple instruction fusion cases in the decode stage.
      Currently we only implement instruction pairs that can be fused into
      RV64GCB instructions.
      
      Instruction fusions are detected in the decode stage by FusionDecoder.
      The decoder checks every two instructions and marks the first
      instruction fused if they can be fused into one instruction. The second
      instruction is removed by setting the valid field to false.
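
The detection step described above can be sketched as a pass over adjacent instruction pairs. This is an illustrative Python model, not the FusionDecoder implementation: `DecodedInst`, `mark_fusions`, and the name-only `can_fuse` predicate are hypothetical, and the rule that an instruction already removed by a fusion cannot start another one is an assumption of this sketch. A real check would also compare registers and immediates.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DecodedInst:
    name: str
    valid: bool = True   # cleared when the instruction is removed by fusion
    fused: bool = False  # set on the first instruction of a fused pair


def mark_fusions(
    insts: List[DecodedInst],
    can_fuse: Callable[[DecodedInst, DecodedInst], bool],
) -> List[DecodedInst]:
    """Check every adjacent pair; mark the first as fused, invalidate the second."""
    for i in range(len(insts) - 1):
        first, second = insts[i], insts[i + 1]
        # Assumption: a removed instruction cannot take part in another fusion.
        if not (first.valid and second.valid):
            continue
        if can_fuse(first, second):
            first.fused = True
            second.valid = False
    return insts


# Toy predicate keyed on mnemonics only (hypothetical).
FUSABLE = {("slli", "add"), ("srli", "andi")}


def can_fuse(a: DecodedInst, b: DecodedInst) -> bool:
    return (a.name, b.name) in FUSABLE


window = [DecodedInst(n) for n in ["slli", "add", "srli", "andi", "sub"]]
for inst in mark_fusions(window, can_fuse):
    print(inst)
```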
      
      Simple fusion cases include sh1add, sh2add, sh3add, sexth, zexth, etc.
      
      Currently, the ftq in the frontend needs every instruction to commit.
      However, the second instruction is removed from the pipeline and will
      not commit. To solve this issue, we temporarily add more bits to isFused
      to indicate the offset difference of the two fused instructions. There
      are four possibilities now. This feature may be removed later.
      
      This commit also adds more instruction fusion cases that need changes
      in both the decode stage and the function units. In this commit, we add
      some opcodes to the function units and fuse the new instruction pairs
      into these new internal uops.
      
      The list of opcodes we add in this commit is shown below:
      - szewl1: `slli r1, r0, 32` + `srli r1, r0, 31`
      - szewl2: `slli r1, r0, 32` + `srli r1, r0, 30`
      - byte2: `srli r1, r0, 8` + `andi r1, r1, 255`
      - sh4add: `slli r1, r0, 4` + `add r1, r1, r2`
      - sr30add: `srli r1, r0, 30` + `add r1, r1, r2`
      - sr31add: `srli r1, r0, 31` + `add r1, r1, r2`
      - sr32add: `srli r1, r0, 32` + `add r1, r1, r2`
      - oddadd: `andi r1, r0, 1` + `add r1, r1, r2`
      - oddaddw: `andi r1, r0, 1` + `addw r1, r1, r2`
      - orh48: mask off the first 16 bits and or with another operand
               (`andi r1, r0, -256` + `or r1, r1, r2`)
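
For a few of the pairs listed above, the claim that one internal uop can replace two instructions can be checked numerically. The Python sketch below is not taken from the commit: the single-expression forms in `FUSED` are this sketch's reading of what each fused uop would compute, registers are modelled as unsigned 64-bit integers, and only `byte2`, `sh4add`, `sr32add`, `oddadd`, and `oddaddw` are covered.

```python
MASK64 = (1 << 64) - 1  # registers are modelled as unsigned 64-bit integers


def slli(x: int, sh: int) -> int:
    return (x << sh) & MASK64


def srli(x: int, sh: int) -> int:
    return (x & MASK64) >> sh


def andi(x: int, imm: int) -> int:
    return x & (imm & MASK64)


def add(a: int, b: int) -> int:
    return (a + b) & MASK64


def addw(a: int, b: int) -> int:
    """32-bit add whose result is sign-extended to 64 bits."""
    low = (a + b) & 0xFFFFFFFF
    return (low | 0xFFFFFFFF00000000) if (low & 0x80000000) else low


# Two-instruction sequences as written in the list (r0 and r2 are inputs).
PAIRS = {
    "byte2": lambda r0, r2: andi(srli(r0, 8), 255),
    "sh4add": lambda r0, r2: add(slli(r0, 4), r2),
    "sr32add": lambda r0, r2: add(srli(r0, 32), r2),
    "oddadd": lambda r0, r2: add(andi(r0, 1), r2),
    "oddaddw": lambda r0, r2: addw(andi(r0, 1), r2),
}

# Assumed single-uop semantics (hypothetical, for illustration).
FUSED = {
    "byte2": lambda r0, r2: (r0 >> 8) & 0xFF,
    "sh4add": lambda r0, r2: ((r0 << 4) + r2) & MASK64,
    "sr32add": lambda r0, r2: ((r0 >> 32) + r2) & MASK64,
    "oddadd": lambda r0, r2: ((r0 & 1) + r2) & MASK64,
    "oddaddw": lambda r0, r2: addw(r0 & 1, r2),
}


def fused_matches_pair(name: str, r0: int, r2: int) -> bool:
    return FUSED[name](r0, r2) == PAIRS[name](r0, r2)


SAMPLES = [(0x0123456789ABCDEF, 0xFFFFFFFFFFFFFFF0), (1, 0x7FFFFFFF), (0, 0)]
for name in FUSED:
    print(name, all(fused_matches_pair(name, a, b) for a, b in SAMPLES))
```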
      
      Furthermore, this commit adds some complex instruction fusion cases to
      the decode stage and function units. The complex instruction fusion cases
      are detected after the instructions are decoded into uop and their
      CtrlSignals are used for instruction fusion detection.
      
      We add the following complex instruction fusion cases:
      - addwbyte: addw and mask it with 0xff (extract the first byte)
      - addwbit: addw and mask it with 0x1 (extract the first bit)
      - logiclsb: logic operation and mask it with 0x1 (extract the first bit)
      - mulw7: andi 127 and mulw instructions.
              The input to mul is ANDed with 0x7f if the mulw7 bit is set to true.
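
The first three complex cases above all reduce to "compute, then keep the low bits". A minimal Python sketch of that reading follows; the function names mirror the case names but the bodies are assumptions of this sketch, not code from the commit, and `mulw7` is left out because the text does not say which multiplier input is masked.

```python
import operator
from typing import Callable


def addw(a: int, b: int) -> int:
    """32-bit add whose result is sign-extended to 64 bits (unsigned model)."""
    low = (a + b) & 0xFFFFFFFF
    return (low | 0xFFFFFFFF00000000) if (low & 0x80000000) else low


def addwbyte(a: int, b: int) -> int:
    """addw followed by a 0xff mask: keeps the lowest byte of the sum."""
    return addw(a, b) & 0xFF


def addwbit(a: int, b: int) -> int:
    """addw followed by a 0x1 mask: keeps the lowest bit of the sum."""
    return addw(a, b) & 0x1


def logiclsb(op: Callable[[int, int], int], a: int, b: int) -> int:
    """A logic operation followed by a 0x1 mask."""
    return op(a, b) & 0x1


print(addwbyte(0xF0, 0x1F))
print(addwbit(1, 2))
print(logiclsb(operator.xor, 3, 1))
```

Since the mask only keeps low bits, the sign extension performed by addw never affects the result, so `addwbyte(a, b)` equals `(a + b) & 0xFF` in this model.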
  30. 03 Sep, 2021 3 commits