1. 02 Jan 2023, 2 commits
  2. 28 Dec 2022, 1 commit
    • lq: Remove LQ data (#1862) · 683c1411
      Committed by happy-lx
      This PR removes the load data storage from the LQ.
      
      All cache-miss load instructions are now replayed from the LQ, and forward paths from the D channel
      and the MSHRs are added to the pipeline.
      Uncache loads are handled specially: their data is no longer stored in the data module
      but in a separate register. ldout is used only for uncache writeback, and only ldout0
      is used. Priority is adjusted so that replayed instructions have the highest priority in S0.
      
      Future work:
      1. fix `milc` perf loss
      2. remove data from MSHRs
      
      * difftest: monitor cache miss latency
      
      * lq, ldu, dcache: remove lq's data
      
      * lq's data is no longer used
      * replay cache miss load from lq (use counter to delay)
      * if dcache's mshr gets refill data, wake up lq's missed load
      * uncache load will writeback to ldu using ldout_0
      * ldout_1 is no longer used
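      The counter-delayed replay with MSHR wakeup described above can be modeled as follows. This is a minimal software sketch, not XiangShan's RTL; the class and method names are illustrative:

```scala
// Software model of a missed load waiting in the LQ: it becomes
// replay-ready after a fixed delay counted down cycle by cycle, or
// immediately when the DCache MSHR refill wakes it up.
class ReplayEntry(delayCycles: Int) {
  private var counter = delayCycles
  private var woken = false

  // Advance one simulated cycle.
  def tick(): Unit = if (counter > 0) counter -= 1

  // Called when refill data for this load's cache line reaches the MSHR.
  def wakeup(): Unit = woken = true

  // Eligible for selection in load pipeline S0.
  def readyToReplay: Boolean = woken || counter == 0
}
```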
      
      * lq, ldu: add forward port
      
      * forward D and mshr in load S1, get result in S2
      * remove useless code logic in loadQueueData
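      The S1-forward / S2-result split can be pictured as S2 merging whatever S1 captured from the D channel and the MSHR. A hedged sketch (bytes held as Ints; the D-channel-over-MSHR priority is an assumption, not taken from the RTL):

```scala
// S2 merge of forward data captured in S1. Assumption: a byte forwarded
// from the TileLink D channel takes precedence over the MSHR copy.
def mergeForward(
    dChannel: Seq[Option[Int]], // per-byte data captured from the D channel in S1
    mshr: Seq[Option[Int]]      // per-byte data captured from the MSHR in S1
): Seq[Option[Int]] =
  dChannel.zip(mshr).map { case (d, m) => d.orElse(m) }
```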
      
      * misc: revert monitor
  3. 25 Dec 2022, 1 commit
  4. 21 Dec 2022, 2 commits
  5. 15 Dec 2022, 1 commit
  6. 11 Dec 2022, 1 commit
  7. 08 Dec 2022, 1 commit
  8. 07 Dec 2022, 1 commit
    • Uncache: optimize write operation (#1844) · 37225120
      Committed by sfencevma
      This commit adds an uncache write buffer to accelerate uncache writes.
      
      For uncacheable address ranges, we now use an atomic bit in the PMA to indicate that
      uncache writes in such a range must not use the uncache write buffer.
      
      Note that XiangShan does not support atomic instructions in uncacheable address ranges.
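      The PMA gating described above amounts to a simple predicate. A sketch with illustrative names (PmaAttr is not the real XiangShan type):

```scala
// An uncache write may use the new write buffer only when its address
// range is uncacheable and the PMA atomic bit is NOT set.
case class PmaAttr(uncacheable: Boolean, atomic: Boolean)

def useUncacheWriteBuffer(attr: PmaAttr): Boolean =
  attr.uncacheable && !attr.atomic
```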
      
      * uncache: optimize write operation
      
      * pma: add atomic config
      
      * uncache: assign hartId
      
      * remove some pma atomic
      
      * extend peripheral id width
      Co-authored-by: Lyn <lyn@Lyns-MacBook-Pro.local>
  9. 05 Dec 2022, 1 commit
  10. 02 Dec 2022, 2 commits
    • Replay all load instructions from LQ (#1838) · a760aeb0
      Committed by happy-lx
      This intermediate architecture replays all load instructions from LQ.
      An independent load replay queue will be added later.
      
      Performance loss caused by changing of load replay sequences will be
      analyzed in the future.
      
      * memblock: load queue based replay
      
      * replay load from load queue rather than RS
      * use counters to delay replay logic
      
      * memblock: refactor priority
      
      * lsq-replay has higher priority than try pointchasing
      
      * RS: remove load store rs's feedback port
      
      * ld-replay: a new path for fast replay
      
      * when a fast replay is needed, it is wired to the load queue, selected in the same
      cycle, and replayed to load pipeline S0 in the next cycle
      
      * memblock: refactor load S0
      
      * move all the select logic from lsq to load S0
      * split a tlbReplayDelayCycleCtrl out of loadqueue to speed up
      generating emu
      
      * loadqueue: parameterize replay
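      The refactored S0 selection can be sketched as a fixed-priority pick, assuming the ordering stated above (LSQ replay ahead of pointer chasing, ahead of fresh issue); the names are illustrative, not XiangShan's:

```scala
sealed trait S0Req
case object LsqReplay extends S0Req      // replayed load from the load queue
case object PointerChasing extends S0Req // load issued via pointer chasing
case object IssueFromRs extends S0Req    // fresh issue from the reservation station

// Pick the highest-priority candidate present this cycle.
def selectS0(candidates: Seq[S0Req]): Option[S0Req] =
  Seq(LsqReplay, PointerChasing, IssueFromRs).find(candidates.contains)
```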
    • mmu: increase mmu timeout to 10000 (#1839) · 914b8455
      Committed by Haoyuan Feng
  11. 30 Nov 2022, 1 commit
  12. 22 Nov 2022, 1 commit
  13. 21 Nov 2022, 1 commit
  14. 19 Nov 2022, 20 commits
  15. 18 Nov 2022, 4 commits
    • l2tlb: add dup register & add blockhelper & llptw mem resp select timing optimization (#1752) · 7797f035
      Committed by bugGenerator
      This commit includes:
      1. timing optimization: add dup registers and optimize the llptw mem resp selection logic
      2. make l2tlb more FIFO-like: add a blockhelper to help l2tlb behave more like a FIFO to l1tlb, and fix some cases that caused the page cache to have duplicate entries (not all cases are covered).
      
      * l2tlb: add duplicate reg for better fanout (#1725)
      
      page cache has large fanout:
      1. addr_low -> sel data
      2. level
      3. sfence
      4. ecc error flush
      
      solution, add duplicate reg:
      1. sfence/csr reg
      2. ecc error reg
      3. memSelData
      4. one hot level code
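      The duplicate-register trick can be modeled in software as one logical value kept in several copies, so each consumer group loads only its local copy. A sketch (illustrative, not the RTL):

```scala
import scala.collection.mutable.ArrayBuffer

// One logical register kept as nDup copies: a write updates every copy,
// but each consumer group reads only its own, so no single register
// has to drive all consumers (the large fanout described above).
class DupReg[T](nDup: Int, init: T) {
  private val copies = ArrayBuffer.fill(nDup)(init)
  def update(next: T): Unit = for (i <- copies.indices) copies(i) = next
  def read(group: Int): T = copies(group)
}
```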
      
      * l2tlb: fix bug that wrongly chose req info from llptw
      
      * l2tlb.cache: move hitCheck into StageDelay
      
      * l2tlb: optimize mem resp data selection to ptw
      
      * l2tlb.llptw: optimize timing for pmp check of llptw
      
      * l2tlb.cache: move v-bits select into stageReq
      
      * l2tlb.llptw: req that miss mem should re-access cache
      
      * l2tlb.llptw: fix bug that mix mem_ptr and cache_ptr
      
      * l2tlb.llptw: fix bug that lost a case for merge
      
      * l2tlb.llptw: fix bug of state change priority
      
      * l2tlb.prefetch: add filter buffer and perf counter
      
      * mmu: change TimeOutThreshold to 3000
      
      * l2tlb: ptw has highest priority to enq llptw
      
      * l2tlb.cache: fix bug of bypassed logic
      
      * l2tlb.llptw: fix bug that flush failed to flush pmp check
      
      * l2tlb: add blockhelper to make l2tlb more fifo
      
      * mmu: change TimeOutThreshold to 5000
      
      * l2tlb: new l1tlb doesn't enter ptw directly
      
      a corner case complement to:
      commit(3158ab8f): "l2tlb: add blockhelper to make l2tlb more fifo"
    • dcache: rename `dups` to `dup` · 779109e3
      Committed by lixin
    • dcache: divide meta array into nWays banks (#1723) · 93f90faa
      Committed by William Wang
      It should reduce the dcache meta write fanout. A dcache meta write now
      actually takes 2 cycles.
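      The banking scheme can be modeled as nWays independent single-way arrays, so a write touches exactly one bank. A software sketch with illustrative types (meta entries as Ints):

```scala
// Meta storage split into nWays banks; a write drives only banks(way),
// instead of one wide array written across all ways.
class BankedMeta(nWays: Int, nSets: Int) {
  private val banks = Array.fill(nWays)(Array.fill(nSets)(0))
  def write(set: Int, way: Int, meta: Int): Unit = banks(way)(set) = meta
  // A lookup still reads the given set from every way-bank in parallel.
  def read(set: Int): Seq[Int] = banks.map(_(set)).toSeq
}
```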
    • sbuffer: opt mask clean fanout (#1720) · 8b1251e1
      Committed by William Wang
      We used to clean the mask in the sbuffer in 1 cycle at sbuffer enqueue,
      which introduced a 64*16 fanout.
      
      To reduce the fanout, the mask in the sbuffer is now cleaned when the dcache hit
      response comes. Cleaning the mask for a line in the sbuffer takes 2 cycles.

      Meanwhile, the dcache reqIdWidth is also reduced from 64 to
      log2Up(nEntries) max log2Up(StoreBufferSize).
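      The new width works out as below; log2Up here mirrors Chisel's helper (ceiling of log2, floored at 1 bit), which is an assumption rather than the exact XiangShan code:

```scala
// Ceil(log2(n)), but at least 1 bit, like Chisel's log2Up.
def log2Up(n: Int): Int = math.max(1, 32 - Integer.numberOfLeadingZeros(n - 1))

// reqIdWidth = log2Up(nEntries) max log2Up(StoreBufferSize), instead of 64.
def reqIdWidth(nEntries: Int, storeBufferSize: Int): Int =
  math.max(log2Up(nEntries), log2Up(storeBufferSize))
```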
      
      This commit will not cause perf change.