1. 14 Jul, 2022 1 commit
    • Y
      rob: optimize timing for commit and walk (#1644) · 6474c47f
      Yinan Xu committed
      * rob: separate walk and commit valid bits
      
      * rob: optimize instrCnt timing
      
      * rob: fix blockCommit condition when flushPipe
      
      When flushPipe is enabled, it will block commits in ROB. However,
      in the deqPtrModule, the commit is not blocked. This commit fixes
      the issue.
  2. 13 Jul, 2022 1 commit
  3. 09 Jul, 2022 1 commit
    • Y
      decode: move fusion decoder result Mux to rename (#1631) · 0febc381
      Yinan Xu committed
      This commit moves the fusion decoder into both the decode and rename stages.
      
      In the decode stage, fusion decoder determines whether the instruction
      pairs can be fused. Valid bits of decode are not affected by fusion
      decoder. This should fix the timing issues of rename.valid.
      
      In the rename stage, some fields are updated according to the result of
      the fusion decoder. This adds a minor timing path to both valid and
      other fields in uop in the rename stage. However, since freelist and
      rat have worse timing, this should not cause timing issues.
  4. 06 Jul, 2022 1 commit
  5. 20 Jun, 2022 1 commit
  6. 20 Dec, 2021 1 commit
  7. 16 Dec, 2021 1 commit
  8. 15 Dec, 2021 2 commits
    • L
      Debug Mode: support difftest with spike (#1363) · f1c56d6c
      Li Qianruo committed
      * Debug Mode: support basic difftest with spike
      
      * Debug Mode: fix some bugs
      
      Bugs fixed are:
      1. All interrupts and exceptions cause debug mode to enter park loop
      2. Debug interrupt ignored due to flushPipe
    • Y
      rename: add fused lui and load (#1356) · fd7603d9
      Yinan Xu committed
      This commit adds fused load support by bypassing LUI results to load.
      
      For better timing, detection is done at the rename stage. Imm is stored
      in psrc(1), psrc(0) and imm.
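
As a rough illustration of what bypassing LUI results to a load computes, the sketch below checks that a fused address equals the address produced by running `lui` and then the load's address generation. It models only standard RV64 semantics; the function names are hypothetical, and it does not model how the immediate is split across psrc(1), psrc(0) and imm.

```python
MASK64 = (1 << 64) - 1

def sext(value, bits):
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def lui(imm20):
    """RV64 lui: place imm20 in bits 31..12, then sign-extend to 64 bits."""
    return sext(imm20 << 12, 32) & MASK64

def load_addr(base, imm12):
    """Address generation of a load: base register plus sign-extended imm12."""
    return (base + sext(imm12, 12)) & MASK64

def fused_load_addr(imm20, imm12):
    """Hypothetical fused form: the address is built from the two immediates only."""
    return (sext(imm20 << 12, 32) + sext(imm12, 12)) & MASK64
```

For example, `fused_load_addr(0x12345, 0x678)` gives the same value as `load_addr(lui(0x12345), 0x678)`, so the load no longer has to wait for the LUI result.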
  9. 14 Dec, 2021 1 commit
  10. 10 Dec, 2021 1 commit
  11. 26 Nov, 2021 1 commit
  12. 23 Nov, 2021 1 commit
    • W
      mem,mdp: use robIdx instead of sqIdx (#1242) · 980c1bc3
      William Wang committed
      * mdp: implement SSIT with sram
      
      * mdp: use robIdx instead of sqIdx
      
      The dispatch refactor moves lsq enq to dispatch2; as a result, mdp cannot
      get the correct sqIdx in dispatch. Unlike robIdx, it is hard to maintain a
      "speculatively assigned" sqIdx, as it is hard to track store insts in the
      dispatch queue. Yet we can still use the "speculatively assigned" robIdx
      for the memory dependency predictor.
      
      For now, the memory dependency predictor uses the "speculatively assigned"
      robIdx to track inflight stores.
      
      However, sqIdx is still used to track stores whose addr is valid
      but whose data is not valid. When load insts try to get forward data from
      such a store, they get that store's sqIdx and wait in RS.
      They will not be woken up until the store data with that sqIdx is issued.
      
      * mdp: add track robIdx recover logic
  13. 12 Nov, 2021 1 commit
    • Y
      difftest: add basic difftest features for releases (#1219) · cbe9a847
      Yinan Xu committed
      * difftest: add basic difftest features for releases
      
      This commit adds basic difftest features for every release, no matter
      it's for simulation or physical design. The macro SYNTHESIS is used to
      skip these logics when synthesizing the design. This commit aims at
      allowing designs for physical design to be verified.
      
      * bump ready-to-run
      
      * difftest: add int and fp writeback data
  14. 24 Oct, 2021 1 commit
  15. 23 Oct, 2021 1 commit
  16. 22 Oct, 2021 1 commit
    • Y
      rob: optimize bits width in storage (#1155) · c3abb8b6
      Yinan Xu committed
      This PR optimizes out isFused and crossPageIPFFix usages in Rob's DispatchData. They will not be stored in ROB. Now DispatchData has only 38 bits.
      
      * isFused is merged with commitType (2 bits reduced)
      * crossPageIPFFix is used only in ExceptionGen (1 bit reduced)
      * rename: reduce ldest usages
      * decode: set isMove to false if ldest is zero
  17. 17 Oct, 2021 1 commit
    • Y
      backend: remove lsrc usages after rename (#1124) · a020ce37
      Yinan Xu committed
      This commit removes lsrc usages in the fence unit and lsrc is no longer
      needed after an instruction is renamed. It helps timing and area.
      
      lsrc is placed in imm at rename stage (the last stage we need lsrc).
      They are extracted in the fence unit. Imm needs to go through the
      pipelines because Jump needs it (and we re-use it for lsrc).
  18. 16 Oct, 2021 2 commits
    • Y
      rename: support full-featured move elimination (#1123) · 70224bf6
      Yinan Xu committed
      This commit optimizes the move elimination implementation.
      
      A reference count is recorded for every physical register. Initially,
      registers 0-31 have counters of one. Every time the physical register
      is allocated or deallocated, the counter is increased or decreased by
      one. When the counter becomes zero from a non-zero value, the register
      is freed and released to freelist.
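
The counting scheme described above can be sketched as a small behavioral model. This is a toy illustration under stated assumptions, not the hardware implementation: the class name, the method names and the register counts are hypothetical, and redirects and walks are not modeled.

```python
class RefCountFreeList:
    """Toy model of a reference-counted physical register free list."""

    def __init__(self, num_pregs=64, num_arch=32):
        # Registers 0..num_arch-1 start out mapped, so their counters are one.
        self.refcnt = [1 if i < num_arch else 0 for i in range(num_pregs)]
        self.free = list(range(num_arch, num_pregs))

    def allocate(self):
        """Normal rename: take a fresh register and count its first reference."""
        preg = self.free.pop(0)
        self.refcnt[preg] += 1
        return preg

    def share(self, preg):
        """Eliminated move: a new logical register maps onto an existing preg."""
        self.refcnt[preg] += 1
        return preg

    def release(self, preg):
        """Drop one reference; free the register on a non-zero to zero transition."""
        assert self.refcnt[preg] > 0, "releasing a register that is not in use"
        self.refcnt[preg] -= 1
        if self.refcnt[preg] == 0:
            self.free.append(preg)
```

A register shared by an eliminated move only returns to the free list after both of its mappings have been released, which is the property that lets move elimination skip allocating a new register.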
    • Y
      core: use redirect ports for flush (#1121) · f4b2089a
      Yinan Xu committed
      This commit removes flush IO for every module. Flush now re-uses
      redirect ports to flush the instructions.
  19. 11 Oct, 2021 1 commit
    • Y
      bump chisel and code clean up (#1104) · aef67050
      Yinan Xu committed
      * bump chisel to 3.5.0-RC1
      
      We don't want to use SNAPSHOT version any more because we don't know
      what will happen when we wake up in the morning.
      
      * misc: remove TMA_* to avoid conflicts
  20. 10 Oct, 2021 1 commit
    • Y
      renameTable: optimize read and write timing (#1101) · 7fa2c198
      Yinan Xu committed
      This commit optimizes RenameTable's timing.
      
      Read addresses come directly from the instruction buffer and have the
      best timing. So we read the data at the decode stage and bypass write
      data from this clock cycle to the read data at the next cycle.
      
      For write, we latch the write request and process it at the next cycle.
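
One possible reading of this read and write timing is sketched below as a cycle-stepped model. It is an assumption-laden illustration, not the actual RenameTable: the class and field names are hypothetical, there is a single read port and a single write port, and the exact cycle in which the bypass applies is a guess based on the description above.

```python
class RenameTableModel:
    """Toy model: registered read with a bypass from the latched write."""

    def __init__(self, num_lregs=32):
        self.table = list(range(num_lregs))  # assumption: identity mapping at reset
        self.pending_write = None            # write latched in the previous cycle: (addr, data)
        self.read_data = None                # registered result of the previous read

    def step(self, read_addr, write=None):
        """Advance one cycle; return the data for the read issued one cycle earlier."""
        out = self.read_data
        w = self.pending_write
        # The latched write is processed in this cycle. Bypass its data so the
        # read issued now already sees it when its result appears next cycle.
        if w is not None and w[0] == read_addr:
            self.read_data = w[1]
        else:
            self.read_data = self.table[read_addr]
        if w is not None:
            self.table[w[0]] = w[1]
        self.pending_write = write           # latch the new write for the next cycle
        return out
```

In this model a write presented in cycle N is visible to a read issued in cycle N+1, whose data is returned in cycle N+2.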
  21. 28 Sep, 2021 1 commit
  22. 19 Sep, 2021 1 commit
    • Y
      core: add timer counters for important stages (#1045) · ebb8ebf8
      Yinan Xu committed
      This commit adds timer counters for some important pipeline stages,
      including rename, dispatch, dispatch2, select, issue, execute, commit.
      We add performance counters for different types of instructions to see
      the latency in different pipeline stages.
  23. 13 Sep, 2021 1 commit
  24. 09 Sep, 2021 1 commit
    • Y
      backend: support instruction fusion cases (#1011) · 88825c5c
      Yinan Xu committed
      This commit adds some simple instruction fusion cases in decode stage.
      Currently we only implement instruction pairs that can be fused into
      RV64GCB instructions.
      
      Instruction fusions are detected in the decode stage by FusionDecoder.
      The decoder checks every two instructions and marks the first
      instruction fused if they can be fused into one instruction. The second
      instruction is removed by setting the valid field to false.
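
The pairwise check can be sketched as follows for the shNadd family only. This is a simplified software model, not the FusionDecoder itself: the `Inst` fields, `fused_op` and `fused_rs2` are hypothetical names, and skipping past an already-fused pair is an assumption about how overlapping pairs are handled.

```python
from dataclasses import dataclass

@dataclass
class Inst:
    op: str
    rd: int
    rs1: int
    src2: int            # second source register, or the immediate for slli
    valid: bool = True
    fused_op: str = ""   # hypothetical: name of the fused operation, if any
    fused_rs2: int = -1  # hypothetical: extra source taken from the second inst

def fuse_pairs(insts):
    """Mark `slli rd, rs, n` + `add rd, rd, rs2` (n in 1..3) as shNadd."""
    i = 0
    while i + 1 < len(insts):
        a, b = insts[i], insts[i + 1]
        is_shift = a.op == "slli" and a.src2 in (1, 2, 3)
        # The add must consume the shift result as rs1, overwrite it,
        # and take its other operand from a different register.
        is_add = (b.op == "add" and b.rd == a.rd
                  and b.rs1 == a.rd and b.src2 != a.rd)
        if a.valid and b.valid and is_shift and is_add:
            a.fused_op = f"sh{a.src2}add"
            a.fused_rs2 = b.src2
            b.valid = False  # the second instruction is removed
            i += 2           # assumption: a removed instruction cannot start a pair
        else:
            i += 1
    return insts
```

For instance, `slli x1, x10, 2` followed by `add x1, x1, x3` becomes a single `sh2add` carried by the first instruction, while an unrelated third instruction stays valid.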
      
      Simple fusion cases include sh1add, sh2add, sh3add, sexth, zexth, etc.
      
      Currently, ftq in frontend needs every instruction to commit. However,
      the second instruction is removed from the pipeline and will not commit.
      To solve this issue, we temporarily add more bits to isFused to indicate
      the offset diff of the two fused instructions. There are four
      possibilities now. This feature may be removed later.
      
      This commit also adds more instruction fusion cases that need changes
      in both the decode stage and the function units. In this commit, we add
      some opcodes to the function units and fuse the new instruction pairs
      into these new internal uops.
      
      The list of opcodes we add in this commit is shown below:
      - szewl1: `slli r1, r0, 32` + `srli r1, r0, 31`
      - szewl2: `slli r1, r0, 32` + `srli r1, r0, 30`
      - byte2: `srli r1, r0, 8` + `andi r1, r1, 255`
      - sh4add: `slli r1, r0, 4` + `add r1, r1, r2`
      - sr30add: `srli r1, r0, 30` + `add r1, r1, r2`
      - sr31add: `srli r1, r0, 31` + `add r1, r1, r2`
      - sr32add: `srli r1, r0, 32` + `add r1, r1, r2`
      - oddadd: `andi r1, r0, 1` + `add r1, r1, r2`
      - oddaddw: `andi r1, r0, 1` + `addw r1, r1, r2`
      - orh48: mask off the first 16 bits and or with another operand
               (`andi r1, r0, -256` + `or r1, r1, r2`)
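
To make the intent of some of these pairs concrete, the sketch below models a few of them on 64-bit values and compares the two-instruction sequence with a single fused expression. The Python helpers and the `FUSIONS` table are illustrative assumptions; the fused expressions are written here for the comparison and are not taken from the function unit code.

```python
MASK64 = (1 << 64) - 1

def sext32(x):
    """Sign-extend the low 32 bits of x into a 64-bit register value."""
    x &= 0xFFFFFFFF
    return (x - (1 << 32) if x & 0x80000000 else x) & MASK64

def slli(x, n): return (x << n) & MASK64
def srli(x, n): return (x & MASK64) >> n
def andi(x, imm): return x & imm           # non-negative immediates only
def add(a, b): return (a + b) & MASK64
def addw(a, b): return sext32(a + b)

# name -> (two-instruction sequence, single fused expression)
FUSIONS = {
    "byte2":   (lambda r0, r2: andi(srli(r0, 8), 255),
                lambda r0, r2: (r0 >> 8) & 0xFF),
    "sh4add":  (lambda r0, r2: add(slli(r0, 4), r2),
                lambda r0, r2: ((r0 << 4) + r2) & MASK64),
    "sr30add": (lambda r0, r2: add(srli(r0, 30), r2),
                lambda r0, r2: ((r0 >> 30) + r2) & MASK64),
    "sr31add": (lambda r0, r2: add(srli(r0, 31), r2),
                lambda r0, r2: ((r0 >> 31) + r2) & MASK64),
    "sr32add": (lambda r0, r2: add(srli(r0, 32), r2),
                lambda r0, r2: ((r0 >> 32) + r2) & MASK64),
    "oddadd":  (lambda r0, r2: add(andi(r0, 1), r2),
                lambda r0, r2: ((r0 & 1) + r2) & MASK64),
    "oddaddw": (lambda r0, r2: addw(andi(r0, 1), r2),
                lambda r0, r2: sext32((r0 & 1) + r2)),
}

def check(samples):
    """Return True if every fused form matches its sequence on all samples."""
    return all(seq(a, b) == fused(a, b)
               for seq, fused in FUSIONS.values()
               for a in samples for b in samples)
```

Calling `check` with a handful of 64-bit corner values (zero, all ones, values around the 32-bit boundary) is a cheap sanity test before adding a new pair to such a table.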
      
      Furthermore, this commit adds some complex instruction fusion cases to
      the decode stage and function units. The complex instruction fusion cases
      are detected after the instructions are decoded into uop and their
      CtrlSignals are used for instruction fusion detection.
      
      We add the following complex instruction fusion cases:
      - addwbyte: addw and mask it with 0xff (extract the first byte)
      - addwbit: addw and mask it with 0x1 (extract the first bit)
      - logiclsb: logic operation and mask it with 0x1 (extract the first bit)
      - mulw7: andi 127 and mulw instructions.
              The input to mul is ANDed with 0x7f if the mulw7 bit is set to true.
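
A few of the complex cases can be modeled the same way. The sketch below is a hypothetical software model that covers only addwbyte, addwbit and mulw7; it relies on standard RV64 semantics for `addw`, `mulw` and `andi`, and the fused function names simply reuse the case names from the list above.

```python
MASK64 = (1 << 64) - 1

def sext32(x):
    """Sign-extend the low 32 bits of x into a 64-bit register value."""
    x &= 0xFFFFFFFF
    return (x - (1 << 32) if x & 0x80000000 else x) & MASK64

def addw(a, b): return sext32(a + b)
def mulw(a, b): return sext32(a * b)
def andi(a, imm): return a & imm   # non-negative immediates only

# Fused forms, written as single expressions for comparison.
def addwbyte(a, b): return (a + b) & 0xFF
def addwbit(a, b): return (a + b) & 0x1
def mulw7(a, b): return mulw(a & 0x7F, b)

def pairs_match(a, b):
    """Compare each fused form with its two-instruction sequence."""
    return (andi(addw(a, b), 0xFF) == addwbyte(a, b)
            and andi(addw(a, b), 0x1) == addwbit(a, b)
            and mulw(andi(a, 127), b) == mulw7(a, b))
```

Masking after `addw` discards the sign-extended upper bits, which is why the fused forms can work on the low bits of the sum directly.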
  25. 06 Sep, 2021 1 commit
  26. 02 Sep, 2021 1 commit
  27. 25 Aug, 2021 2 commits
  28. 23 Aug, 2021 1 commit
  29. 22 Aug, 2021 2 commits
    • Y
      rename: [refactor] move free list into 'freelist' package · 39d3280e
      YikeZhou committed
              "trait" was used to improve code style
      parameters: use EnableIntMoveElim to control code generation
      [WIP] EnableIntMoveElim=false hasn't been tested
    • Y
      backend, rename: performance bug fixed in move elimination process (#934) · d3975bec
      YikeZhou committed
      * Rename: add perf counter for move elimination
      [NOTE] There are three reasons why one ME is cancelled:
        1. counter reaching max value
        2. RAW dependency with former instruction
        3. 2 move instructions with the same psrc in 1 cycle
      
      * Rename: add debug log + fix perf bug for move elim cancelation
      
      * AlternativeFreeList: parameterize width of counter
      
      * Rename:[bug fix] RAW conflict in meEnable decision
      (suppose former inst=i while latter inst=j, i does
      not have to be move instruction)
  30. 21 Aug, 2021 1 commit
    • Y
      backend, rename: support move elimination (#920) · 8b8e745d
      YikeZhou committed
      * Bundle, Rename: Add some comments
      FreeList, RenameTable: Comment out unused variables
      
      * refcnt: Implement AdderTree for reference counter
      
      * build.sc: add testOne method for unit test
      
      * AdderTest: add testbench for Adder (passed)
      
      * AdderTree: Add testbench for AdderTree (passed)
      
      * ReferenceCounter: implement a 2-bit counter
      
      * Rename: remove redundant code
      
      * Rename: prepared for move elimination [WIP]
      
      * Roq: add eliminated move bit in roq entry;
        label elim move inst as writebacked
      AlternativeFreeList: new impl for int free list
      Rename: change io of free list
      Dispatch1: (todo) not send move to intDq
      Bundle: add eliminatedMove bit in roqCommitInfo, uop and debugio
      ReferenceCounter: add debug print msg
      
      * Dispatch1: [BUG FIX] not send move inst to IntDq
      
      * DecodeUnit: [BUG FIX] differentiate li from mv
      
      * Bug fix:
        1. Dispatch1: should not label pdest of move as busy in busy table
        2. Rename: use psrc0 to index bit vec isMax
        3. AlternativeFreeList: fix maxVec calculation logic and ref counter
           increment logic
      Besides, more debug info and assertions were added.
      
      * AlternativeFreeList Bug Fix:
        1. add redirect input - shouldn't allocate reg when redirect is
           valid
        2. handle duplicate preg in roqCommits in int free list
      
      * AlternativeFreeList: Fix value assignment race condition
      
      * Rename: Fix value assignment race condition too
      
      * RenameTable: refactor spec/arch table write process
      
      * Roq: Fix debug_exuData of move(addi) instruction
        (it was trash data before because move needn't enter exu)
      
      * Rename: change intFreeList's redirect process
        (by setting headPtr back) and flush process
      
      * ME: microbench & coremark & linux-hello passed
        1. DecodeUnit: treat `mv x,x` inst as non-move
        2. AlternativeFreeList: handle duplicate walk req correctly
        3. Roq: fix debug_exuData bug (make sure writeback that updates
      debug_exuData happens before ME instruction in program order)
      
      * AlternativeFreeList: License added
      build.sc: remove unused config
      Others: comments added
      
      * package rename: remove unused modules
      
      * Roq: Replace debug_prf with a cleaner fix method
      
      * Disp1/AltFL/Rename: del unnecessary white spaces
      
      * build.sc: change stack size
      AlternativeFreeList: turn off assertions
      
      * build.sc: change stack size for test
  31. 24 Jul, 2021 1 commit
  32. 04 Jun, 2021 1 commit
  33. 01 May, 2021 1 commit
  34. 19 Apr, 2021 1 commit
    • J
      Refactor parameters, SimTop and difftest (#753) · 2225d46e
      Jiawei Lin committed
      * difftest: use DPI-C to refactor difftest
      
      In this commit, difftest is refactored with DPI-C calls.
      There're a few reasons:
      (1) From Verilator's manual, DPI-C calls should be more efficient than accessing from dut_ptr.
      (2) DPI-C is cross-platform (Verilator, VCS, ...)
      (3) difftest APIs are split from emu.cpp to possibly support more backend platforms
      (NEMU, Spike, ...)
      
      The performance at this commit is considerably slower than the original emu.
      Performance issues will be fixed later.
      
      * [WIP] SimTop: try to use 'XSTop' as soc
      
      * CircularQueuePtr: use F-bounded polymorphism instead of implicit helper
      
      * Refactor parameters & Clean up code
      
      * difftest: support basic difftest
      
      * Support difftest in new sim top
      
      * Difftest: convert recode fmt to ieee754 when comparing fp regs
      
      * Difftest: pass sign-ext pc to dpic functions && fix exception pc
      
      * Debug: add int/exc inst wb to debug queue
      
      * Difftest: pass sign-ext pc to dpic functions && fix exception pc
      
      * Difftest: fix naive commit num limit
      Co-authored-by: Yinan Xu <xuyinan1997@gmail.com>
      Co-authored-by: William Wang <zeweiwang@outlook.com>
  35. 25 Mar, 2021 1 commit
    • A
      Refactor XSPerf, now we have three XSPerf Functions. · 408a32b7
      Allen committed
      XSPerfAccumulate: sum up performance values.
      XSPerfHistogram: count the occurrence of performance values, split them
      into bins, so that we can estimate their distribution.
      XSPerfMax: get max of performance values.
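
The three kinds of statistics can be illustrated with a small software sketch. These Python functions only mirror the descriptions above; their names and the `start`, `stop` and `step` parameters are assumptions and do not reflect the signatures of the hardware XSPerf functions.

```python
def perf_accumulate(samples):
    """Sum up the performance values."""
    return sum(samples)

def perf_max(samples):
    """Largest performance value seen (0 if there were no samples)."""
    return max(samples, default=0)

def perf_histogram(samples, start, stop, step):
    """Count how many samples fall into each bin of width `step` in [start, stop)."""
    bins = [0] * ((stop - start) // step)
    for s in samples:
        if start <= s < stop:
            bins[(s - start) // step] += 1
    return bins
```

For example, recording queue occupancy every cycle and passing the samples to `perf_histogram` gives a rough picture of how the occupancy is distributed rather than just its total or peak.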
  36. 11 Mar, 2021 1 commit
    • Y
      Add support for a simple version of move elimination (#682) · aac4464e
      Yinan Xu committed
      In this commit, we add support for a simpler version of move elimination.
      
      The original instruction sequences are:
      move r1, r0
      add r2, r1, r3
      
      The optimized sequences are:
      move pr1, pr0
      add pr2, pr0, pr3 # instead of add pr2, pr1, pr3
      
      In this way, add can be issued once r0 is ready and move seems to be eliminated.
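
The renaming effect in this example can be sketched with a toy rename function. It is a simplified model under stated assumptions: the `alias` map is a hypothetical structure, physical registers are handed out sequentially, and freeing of registers is not modeled.

```python
def rename(insts, rat, next_preg):
    """Rename (op, ldest, lsrc...) tuples into (op, pdest, psrc...) tuples.

    rat maps logical to physical registers. A move still gets its own pdest,
    but later readers of that pdest are redirected to the move's source.
    """
    alias = {}  # hypothetical: pdest of a move -> the psrc it copies
    out = []
    for op, ldest, *lsrcs in insts:
        psrcs = [rat[l] for l in lsrcs]
        psrcs = [alias.get(p, p) for p in psrcs]  # read through earlier moves
        pdest = next_preg
        next_preg += 1
        rat[ldest] = pdest
        if op == "move":
            alias[pdest] = psrcs[0]
        out.append((op, pdest, *psrcs))
    return out
```

With `rat = {0: 0, 1: 1, 2: 2, 3: 3}` and `next_preg = 4`, renaming `move r1, r0` followed by `add r2, r1, r3` makes the add read physical register 0 directly instead of the move's destination.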