1. 21 Apr, 2023 1 commit
  2. 16 Apr, 2023 2 commits
  3. 12 Apr, 2023 1 commit
  4. 10 Apr, 2023 1 commit
  5. 05 Apr, 2023 1 commit
    • X
      backend: refactor regfile rw parameters · 351e22f2
      Xuan Hu committed
      * support float memory load/store
      * refactor regfile read parameters
        * replace `numSrc` with `numRegSrc` to indicate that the source data comes from the regfile
      * refactor BusyTable read port
        * give the int and vf BusyTables the same number of read ports to simplify connections in Dispatch2Iq
        * unused read ports will be optimized away
      * regularize IQSize parameters
      * split the scheduler's writeback ports into two kinds by register type
  6. 27 Mar, 2023 1 commit
  7. 19 Mar, 2023 1 commit
    • H
      Fix replay logic in unified load queue (#1966) · 62dfd6c3
      happy-lx committed
      * difftest: monitor cache miss latency
      
      * lq, ldu, dcache: remove lq's data
      
      * lq's data is no longer used
      * replay cache miss load from lq (use counter to delay)
      * if dcache's mshr gets refill data, wake up lq's missed load
      * uncache load will writeback to ldu using ldout_0
      * ldout_1 is no longer used
      
      * lq, ldu: add forward port
      
      * forward D and mshr in load S1, get result in S2
      * remove useless code logic in loadQueueData
      
      * misc: revert monitor
      
      * lq: change replay cycle
      
      * lq: change replay cycle
      * change cycle to 11 36 10 10
      
      * Revert "lq: change replay cycle"
      
      This reverts commit 3ca74b63.
      And change replay cycles
      
      * lq: change replay cycle according to dramsim
      
      * change Reselectlen to 7
      * change replay cycle to (11, 18, 127, 17) to fit refill delay (14, 36,
      188)
      
      * lq: change replay cycle
      
      * change block_cycles_cache to (7, 0, 32, 51)
      
      * lq: change replay cycle
      
      * change block_cycles_cache to (7, 0, 126, 95)
      
      * lq: fix replay ptr update logic
      
      * fix priority of updating ptr
      * revert block_cycles_cache
      
      * lq: change tlb replay cycle
      
      * change tlbReplayDelayCycleCtrl to (15, 0, 126, 0)
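
The delay tuples tuned in this commit can be read as per-attempt wait tables. The sketch below is a toy model, assuming each load-queue entry keeps a saturating pointer into the table plus a down-counter; that indexing scheme and the names `ReplayEntry` and `waits` are illustrative assumptions, not taken from the commit.

```python
# Delay table taken from the commit message; how it is indexed is an assumption.
TLB_REPLAY_DELAY = (15, 0, 126, 0)


class ReplayEntry:
    """Hypothetical load-queue entry that waits before replaying a blocked load."""

    def __init__(self, delays):
        self.delays = delays
        self.ptr = 0      # saturating index: how often this load was blocked
        self.credit = 0   # cycles left before the load may replay

    def block(self):
        """The load was blocked again: load the counter, advance the pointer."""
        self.credit = self.delays[self.ptr]
        self.ptr = min(self.ptr + 1, len(self.delays) - 1)

    def tick(self):
        """One clock cycle passes."""
        if self.credit > 0:
            self.credit -= 1

    def can_replay(self):
        return self.credit == 0


def waits(delays, n_blocks):
    """Cycles waited after each of n_blocks consecutive blocks of one load."""
    entry, out = ReplayEntry(delays), []
    for _ in range(n_blocks):
        entry.block()
        cycles = 0
        while not entry.can_replay():
            entry.tick()
            cycles += 1
        out.append(cycles)
    return out
```

Under these assumptions, a load that keeps missing the TLB alternates between a short wait, an immediate retry, and one long wait, and then retries immediately once the pointer saturates.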
  8. 06 Mar, 2023 1 commit
  9. 17 Feb, 2023 1 commit
  10. 13 Feb, 2023 1 commit
    • B
      param: set EnableUncacheWriteOutstanding to false (#1913) · e32bafba
      bugGenerator committed
      Here is a bug caused by EnableUncacheWriteOutstanding:
      The failing case is extintr in Nexus-AM.
      The test has three steps:
        clear intrGen's intr: stop passing the interrupt. An MMIO write.
        clear plic claim: complete the interrupt. An MMIO write.
        read plic claim to check: claim should be 0. An MMIO read.
      The corner case:
        intrGen's MMIO write is too slow. The instructions after it execute,
      and the plic claim's MMIO write and read complete before it. On the
      core-to-plic side, the claim is cleared. But on the intrGen-to-plic side,
      the interrupt source is still enabled and triggers an interrupt.
      So the "read plic claim to check" step gets a valid claim and fails.
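
The race described above can be reproduced in a toy timing model. This is only a sketch: the latencies (10 cycles for intrGen, 2 for the plic), the operation names, and the `run` function are assumptions chosen to make the reordering visible, and the model does not reflect XiangShan's actual uncache unit.

```python
def run(outstanding: bool) -> int:
    """Return the value seen by the final 'read plic claim' step.

    outstanding=True models EnableUncacheWriteOutstanding: the core issues
    the next MMIO operation one cycle after a write, without waiting for
    that write to complete. Latencies are made-up illustration values.
    """
    ops = [
        ("write", "intrgen_clear", 10),   # slow device
        ("write", "plic_complete", 2),
        ("read", "plic_claim_read", 2),
    ]
    t = 0
    events = []
    for kind, name, latency in ops:
        done = t + latency
        events.append((done, name))
        # Only writes may be left outstanding; otherwise wait for completion.
        t = t + 1 if (outstanding and kind == "write") else done

    source_asserted = True   # intrGen is still driving the interrupt line
    pending = False          # the plic has a claimable interrupt
    result = None
    for _, name in sorted(events):       # apply effects in completion order
        if name == "intrgen_clear":
            source_asserted = False
        elif name == "plic_complete":
            # After completion, a still-asserted source raises a new request.
            pending = source_asserted
        else:
            result = 1 if pending else 0
    return result
```

With `outstanding=True` the plic operations finish at cycles 3 and 4 while the intrGen write lands at cycle 10, so the check reads a nonzero claim. With `outstanding=False` the three operations complete in program order and the check reads 0.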
  11. 05 Feb, 2023 1 commit
  12. 30 Jan, 2023 1 commit
  13. 28 Jan, 2023 1 commit
  14. 04 Jan, 2023 1 commit
    • M
      dcache: setup way predictor framework (#1857) · 144422dc
      Maxpicca committed
      This commit sets up a basic dcache way predictor framework and a dummy predictor.
      A Way Predictor Unit (WPU) module has been added to dcache. Dcache data SRAMs
      have been reorganized for that. 
      
      The dummy predictor is disabled by default. 
      
      In addition, the dcache bank conflict check has been optimized. This may cause timing problems,
      which will be fixed in the future.
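
As a rough illustration of what a way predictor does, here is a sketch in which the cache first checks only the predicted way and falls back to a full tag compare plus replay on a misprediction. The most-recently-used policy and the names `MruWayPredictor` and `load` are assumptions for illustration; the commit does not say which prediction algorithm the WPU uses.

```python
class MruWayPredictor:
    """Hypothetical predictor: guess the way that hit last time in this set."""

    def __init__(self, n_sets):
        self.pred = [0] * n_sets

    def predict(self, set_idx):
        return self.pred[set_idx]

    def update(self, set_idx, way):
        self.pred[set_idx] = way


def load(tags, wpu, set_idx, tag):
    """Look up `tag` in tags[set_idx]; return (hit, way, replayed)."""
    guess = wpu.predict(set_idx)
    if tags[set_idx][guess] == tag:
        return True, guess, False        # only the predicted way was read
    # Misprediction: compare all ways, train the predictor, replay the load.
    for way, stored in enumerate(tags[set_idx]):
        if stored == tag:
            wpu.update(set_idx, way)
            return True, way, True
    return False, None, False            # real cache miss


# Usage: one set with four ways.
tags = [[0xA, 0xB, 0xC, 0xD]]
wpu = MruWayPredictor(n_sets=1)
first = load(tags, wpu, 0, 0xC)    # mispredicts way 0, replays
second = load(tags, wpu, 0, 0xC)   # predictor now points at way 2
```

The appeal of the scheme is that a correct prediction reads a single data way, while the cost is the replay on a wrong guess.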
      
      * ideal wpu
      
      * BankedDataArray: change architecture to reduce bank_conflict
      
      * BankedDataArray: add db analysis
      
      * Merge: the rest
      
      * BankedDataArray: change the logic of rrl_bank_conflict, at the cost of increasing the number of rw_bank_conflict
      
      * Load Logic: changed to be as expected
      
      reading data will be delayed by one cycle to make the selection
      writing data will also be delayed by one cycle to perform the write operation
      
      * fix: ecc check error
      
      * update the gitignore
      
      * WPU: add regular wpu and change the replay mechanism
      
      * WPU: fix refill fail bug, but a new addiw fail bug appears
      
      * WPU: temporarily turn off for the PR
      
      * WPU: fix all bugs
      
      * loadqueue: fix the initialization of replayCarry
      
      * bankeddataarray: fix the bug
      
      * DCacheWrapper: fix bug
      
      * ready-to-run: correct the version
      
      * WayPredictor: comments clean
      
      * BankedDataArray: fix ecc_bank bug
      
      * Parameter: set the enable signal of wpu
  15. 23 Dec, 2022 1 commit
  16. 22 Dec, 2022 1 commit
  17. 21 Dec, 2022 1 commit
  18. 14 Dec, 2022 1 commit
  19. 13 Dec, 2022 1 commit
  20. 11 Dec, 2022 1 commit
  21. 07 Dec, 2022 1 commit
    • S
      Uncache: optimize write operation (#1844) · 37225120
      sfencevma committed
      This commit adds an uncache write buffer to accelerate uncache writes.
      
      For uncacheable address ranges, the atomic bit in the PMA now indicates
      that uncache writes in that range must not use the uncache write buffer.
      
      Note that XiangShan does not support atomic insts in uncacheable address ranges.
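
The routing rule described above can be sketched as a small lookup. The `PmaRegion` type, the example address ranges, and the returned path names are hypothetical and only illustrate how the atomic bit steers a write away from the buffer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PmaRegion:
    base: int
    size: int
    cacheable: bool
    atomic: bool   # reused here to mean: do not buffer uncache writes


def uncache_write_path(addr, regions):
    """Pick the path a store to `addr` takes under the rule above."""
    for r in regions:
        if r.base <= addr < r.base + r.size:
            if r.cacheable:
                return "dcache"
            return "blocking_uncache" if r.atomic else "uncache_write_buffer"
    raise ValueError(f"no PMA region covers {addr:#x}")


# Made-up memory map for illustration only.
REGIONS = [
    PmaRegion(base=0x3000_0000, size=0x1000_0000, cacheable=False, atomic=True),
    PmaRegion(base=0x4000_0000, size=0x4000_0000, cacheable=False, atomic=False),
    PmaRegion(base=0x8000_0000, size=0x8000_0000, cacheable=True, atomic=True),
]
```

Marking a device range as atomic in this model keeps its writes ordered with respect to later accesses, at the price of losing the write buffer's speedup.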
      
      * uncache: optimize write operation
      
      * pma: add atomic config
      
      * uncache: assign hartId
      
      * remove some pma atomic
      
      * extend peripheral id width
      
      Co-authored-by: Lyn <lyn@Lyns-MacBook-Pro.local>
  22. 02 Dec, 2022 1 commit
    • H
      Replay all load instructions from LQ (#1838) · a760aeb0
      happy-lx committed
      This intermediate architecture replays all load instructions from LQ.
      An independent load replay queue will be added later.
      
      Performance loss caused by the change in load replay order will be
      analyzed in the future.
      
      * memblock: load queue based replay
      
      * replay load from load queue rather than RS
      * use counters to delay replay logic
      
      * memblock: refactor priority
      
      * lsq-replay has higher priority than try pointchasing
      
      * RS: remove load store rs's feedback port
      
      * ld-replay: a new path for fast replay
      
      * when a fast replay is needed, wire it to loadqueue; it will be selected
      this cycle and replayed to load pipeline s0 in the next cycle
      
      * memblock: refactor load S0
      
      * move all the select logic from lsq to load S0
      * split a tlbReplayDelayCycleCtrl out of loadqueue to speed up
      generating emu
      
      * loadqueue: parameterize replay
  23. 18 Nov, 2022 1 commit
  24. 17 Nov, 2022 1 commit
    • H
      top-down: introduce top-down counters and scripts (#1803) · eb163ef0
      Haojin Tang committed
      * top-down: add initial top-down features
      
      * rob600: enlarge queue/buffer size
      
      * 🎨 After git pull
      
      * Add BranchResteers->CtrlBlock
      
      * Change BranchResteers after pending
      
      * Add robflush_bubble & ldReplay_bubble
      
      * 🚑 Fix loadReplay->loadReplay.valid
      
      * 🎨 Delete printf
      
      * Add stage2_redirect_cycles->CtrlBlock
      
      * :sparkles: CtrlBlock: Add s2Redirect_when_pending
      
      * ID: Add ifu2id_allNO_cycle
      
      * Add ifu2ibuffer_validCnt
      
      * Add ibuffer_IDWidth_hvButNotFull
      
      * Fix ifu2ibuffer_validCnt
      
      * 🚑 Fix ibuffer_IDWidth_hvButNotFull
      
      * Fix ifu2ibuffer_validCnt->stop
      
      * feat(buggy): parameterize load/store pipeline, etc.
      
      * fix: use LoadPipelineWidth rather than LoadQueueSize
      
      * fix: parameterize `rdataPtrExtNext`
      
      * fix(SBuffer): fix idx update logic
      
      * fix(Sbuffer): use `&&` to generate flushMask instead of `||`
      
      * fix(atomic): parameterize atomic logic in `MemBlock`
      
      * fix(StoreQueue): update allow-enqueue requirement
      
      * chore: update comments, requirements and assertions
      
      * chore: refactor some Mux to meet original logic
      
      * feat: reduce `LsMaxRsDeq` to 2 and delete it
      
      * feat: support one load/store pipeline
      
      * feat: parameterize `EnsbufferWidth`
      
      * chore: reshape code for better generated names
      
      * top-down: add initial top-down features
      
      * rob600: enlarge queue/buffer size
      
      * top-down: add l1, l2, l3 and ddr loads bound perf counters
      
      * top-down: dig into l1d loads bound
      
      * top-down: move memory related counters to `Scheduler`
      
      * top-down: add 2 Ldus and 2 Stus
      
      * top-down: v1.0
      
      * huancun: bump HuanCun to a version with top-down
      
      * chore: restore parameters and update `build.sc`
      
      * top-down: use ExcitingUtils instead of BoringUtils
      
      * top-down: add switch of top-down counters
      
      * top-down: add top-down scripts
      
      * difftest: enlarge stuck limit cycles again
      
      Co-authored-by: gaozeyu <gaozeyu18@mails.ucas.ac.cn>
  25. 09 Nov, 2022 3 commits
  26. 18 Jul, 2022 2 commits
    • L
      dtlb: change volume from s128f8 to s64f16 (#1662) · 06082082
      Lemover committed
      DTLB volume configuration:
      old: normal page 128 direct-mapped + super page 8 fully-associative
      new: normal page 64 direct-mapped + super page 16 fully-associative
      Better timing and better driver now.
      
      For Spec06, some benchmark scores increase slightly, while others decrease slightly.
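
To make the two organizations concrete, here is a sketch of a TLB with a direct-mapped array for normal pages and a small fully-associative store for super pages. The `HybridTlb` class, the `vpn % entries` index, the FIFO replacement, and the level encoding (0 = 4KB, 1 = 2MB, 2 = 1GB) are assumptions for illustration and are not taken from the commit.

```python
from collections import OrderedDict


class HybridTlb:
    """Toy TLB: direct-mapped normal pages + fully-associative super pages."""

    def __init__(self, normal_entries=64, super_entries=16):
        self.normal = [None] * normal_entries   # slot -> (vpn, ppn)
        self.super = OrderedDict()              # (tag, level) -> ppn, FIFO order
        self.super_entries = super_entries

    def refill(self, vpn, ppn, level):
        if level == 0:
            # Direct-mapped: exactly one candidate slot per vpn.
            self.normal[vpn % len(self.normal)] = (vpn, ppn)
            return
        key = (vpn >> (9 * level), level)       # drop the in-page vpn bits
        if key not in self.super and len(self.super) >= self.super_entries:
            self.super.popitem(last=False)      # evict the oldest entry
        self.super[key] = ppn

    def lookup(self, vpn):
        """Return (ppn, level) on a hit, or None on a miss."""
        slot = self.normal[vpn % len(self.normal)]
        if slot is not None and slot[0] == vpn:
            return slot[1], 0
        # Fully-associative: every super page entry is compared.
        for (tag, level), ppn in self.super.items():
            if vpn >> (9 * level) == tag:
                return ppn, level
        return None
```

In this model, `HybridTlb(128, 8)` corresponds to the old s128f8 shape and `HybridTlb(64, 16)` to the new s64f16 shape: the smaller direct-mapped array sees more index conflicts, while the larger fully-associative part holds more super pages.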
    • L
      l1tlb: tlb's req port can be configured to be block or non-blocked (#1656) · f1fe8698
      Lemover committed
      Each tlb port can be configured as blocked or non-blocked.
      For a blocked port, a req miss slot is stored in the tlb but belongs to the
      core pipeline, which means only a core pipeline flush will invalidate it.
      
      In addition, itlb also uses the PTW Filter, but with only 4 entries.
      Finally, the svinval extension is kept as before and still works.
      
      
      * tlb: add blocked-tlb support, frontend changes still missing
      
      * tlb: remove tlb's sameCycle support, result will return at next cycle
      
      * tlb: remove param ShouldBlock, move block method into TLB module
      
      * tlb: fix handle_block's miss_req logic
      
      * mmu.filter: change filter's req.ready to canEnqueue
      
      when the filter can't let all the reqs enqueue, set req.ready to false.
      canEnqueue after filtering has long latency, so we use **_fake
      without filtering, but the filter will still receive the reqs if
      it can (after filtering).
      
      * mmu.tlb: change name from BTlbPtwIO to VectorTlbPtwIO
      
      * mmu: replace itlb's repeater to filter&repeaternb
      
      * mmu.tlb: add TlbStorageWrapper to make TLB cleaner
      
      more: BlockTlbRequestorIO is the same as TlbRequestorIO, rm it
      
      * mmu.tlb: rm unused param in function r_req_apply, fix syntax bug
      
      * [WIP]icache: itlb usage from non-blocked to blocked
      
      * mmu.tlb: change parameter NBWidth to Seq of boolean
      
      * icache.mainpipe: fix itlb's resp.ready, not always true
      
      * mmu.tlb: add kill signal to blocked req that needs sync but fails
      
      in the frontend, icache, itlb and the next pipe may not be able to sync.
      a blocked tlb will store the miss req and block further reqs, which
      prevents itlb from working. So add kill logic to stop itlb from storing reqs.
      
      One more thing: fix icache's blocked tlb handling logic
      
      * icache.mainpipe: fix tlb's ready_recv logic
      
      icache mainpipe has two ports, but these two ports may not both be valid
      at the same time. So add a new signal tlb_need_recv to record whether
      stage s1 should wait for the tlb.
      
      * tlb: when flush, just set resp.valid and pf, with pf set so that the resp is not used
      
      * tlb: flush should account for satp.changed (for blocked io now)
      
      * mmu.tlb: add new flush that doesn't flush reqs
      
      Sfence.vma will flush inflight reqs and flushPipe,
      but some other sfence (svinval...) will not. So add a new flush to
      distinguish these two kinds of sfence signal.
      
      more: forgot to assign resp result when ptw back, fix it
      
      * mmu.tlb: beautify miss_req_v and miss_v related logic
      
      * mmu.tlb: fix bug, when ptw back and bypass, consider level in genPPN
      
      bug: when ptw back and bypass, forgot to consider level (1GB/2MB/4KB)
      in genPPN.
      
      by the way: some functions need ": Unit = ", add it.
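
The level-dependent PPN generation mentioned in the genPPN fix follows the Sv39 rule that, for a super page, the low PPN bits are taken from the virtual page number. A minimal sketch follows; the function name `gen_ppn` and the level encoding (0 = 4KB, 1 = 2MB, 2 = 1GB) are assumptions here, and XiangShan's own encoding may differ.

```python
def gen_ppn(entry_ppn: int, vpn: int, level: int) -> int:
    """Combine a TLB entry's PPN with the request VPN for a given page size.

    level 0 = 4KB, 1 = 2MB, 2 = 1GB (assumed encoding). Each Sv39 level
    covers 9 VPN bits, so a level-n page passes the low 9*n VPN bits
    straight through to the physical page number.
    """
    low_bits = 9 * level
    mask = (1 << low_bits) - 1
    return (entry_ppn & ~mask) | (vpn & mask)
```

Ignoring the level, as in the bug described above, would return the super page's base PPN for every address inside it, so all accesses within a 2MB or 1GB page would alias to its first 4KB.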
      
      * mmu.filter: fix bug of canEnqueue, mixed with tlb_req and tlb.req
      
      * icache.mainpipe: fix bug of tlbExcp's usage, & with tlb_need_back
      
      Icache's mainpipe has two ports, but sometimes only port 0 is valid.
      When a port is invalid, its tlbexcp should be false (actually, it should
      be ignored).
      So & it with tlb_need_back to fix this bug.
      
      * sfence: instr in svinval ext will also flush pipe
      
      A difficult problem to handle:
      Sfence and Svinval will flush the MMU, but only Sfence (and some svinval)
        will flush the pipe. For itlb, where some requestors are blocked and
        icache doesn't receive the flush for simplicity, itlb's blocked ptw req
        should not be flushed.
      It's a huge problem for the MMU to handle, with good or bad solutions. But
        svinval is seldom used, so sacrifice its efficiency.
      
      * mmu: add parameter to control mmu's sfence delay latency
      
      Difficult problem:
        itlb's blocked req should not be abandoned, but sfence will flush
        all inflight reqs. When the delays of itlb and itlb repeater differ (itlb
        is flushed, two cycles later, itlb repeater is flushed), itlb's
        ptw req sent after flushing will also be flushed silently.
      So add one parameter to make the flush delays the same.
      
      * mmu.tlb: fix bug of csr.priv's delay & sfence valid when req fire
      
      1. csr.priv's delay
      csr.priv should not be delayed, but csr.satp should be.
      This is because excep/intr will change csr.priv, which takes effect at an
      instruction's (commit?), but csrrw satp will not, so satp can be delayed
      by more cycles.
      2. sfence
      when sfence is valid but a blocked req fires, resp should still fire.
      3. satp in TlbCsrBundle
      set the high bits of satp.ppn to 0.U
      
      * tlb&icache.mainpipe: rm commented codes
      
      * mmu: move method genPPN to entry bundle
      
      * l1tlb: divide l1tlb flush into flush_mmu and flush_pipe
      
      Problem:
      For l1tlb, there are blocked and non-blocked req ports.
      For blocked ports, there are req slots to store missed reqs.
      Some mmu flushes like Sfence should not flush miss slots, because the
      outside may still need to get the tlb resp, whether it is wrong or correct.
      For example, sfence will flush mmu and flush pipe, but won't flush
      reqs inside icache, which are waiting for the tlb resp.
      For example, a svinval instr will flush mmu, but not flush pipe, so
      tlb should return the correct resp, although the ptw req is flushed
      when tlb misses.
      
      Solution:
      divide l1tlb flush into flush_mmu and flush_pipe.
      The req slot is considered to be a part of the core pipeline and should
      only be flushed by flush_pipe.
      flush_mmu will flush mmu entries and inflight ptw reqs.
      When a req misses but sfence flushed its ptw req, re-send it.
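
The split between the two flushes can be sketched as a small state model of one blocked port. The `BlockedTlbPort` class, its single miss slot, and the `tick` re-send step are simplifying assumptions made for illustration; the real module handles several ports and more state.

```python
class BlockedTlbPort:
    """Toy model of one blocked l1tlb port with a single miss slot."""

    def __init__(self):
        self.entries = {}          # vpn -> ppn (mmu state)
        self.miss_slot = None      # vpn of the blocked req (pipeline state)
        self.ptw_inflight = False  # a page walk for the slot is outstanding

    def request(self, vpn):
        if vpn in self.entries:
            return self.entries[vpn]
        self.miss_slot = vpn       # block the port and start a walk
        self.ptw_inflight = True
        return None

    def flush_mmu(self):
        """Drop mmu entries and the inflight walk, but keep the miss slot."""
        self.entries.clear()
        self.ptw_inflight = False

    def flush_pipe(self):
        """Only a pipeline flush abandons the blocked req."""
        self.miss_slot = None

    def tick(self):
        """Re-send the walk if the slot is waiting but its walk was flushed."""
        if self.miss_slot is not None and not self.ptw_inflight:
            self.ptw_inflight = True
            return "ptw_resend"
        return None

    def ptw_response(self, vpn, ppn):
        if not self.ptw_inflight:
            return None            # stale response for a flushed walk
        self.ptw_inflight = False
        self.entries[vpn] = ppn
        if self.miss_slot == vpn:
            self.miss_slot = None
            return ppn             # the blocked req finally gets its resp
        return None
```

In this model a `flush_mmu` during a miss leaves the requester waiting and triggers a re-sent walk, so the requester still receives a response, while `flush_pipe` simply frees the slot.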
      
      * l1tlb: code clean, correct comments and rm unused codes
      
      * l2tlb: divide filterSize into ifilterSize and dfilterSize
      
      * l2tlb: prefetch req won't enter miss queue. Rename MSHR to missqueue
      
      * l1tlb: when vm is disabled, ptw back should not bypass tlb and should let miss req go ahead
  27. 14 Jul, 2022 1 commit
    • L
      dtlb: merge duplicated tlb together: one ld-tlb and one st-tlb. (#1654) · 53b8f1a7
      Lemover committed
      Old version:
      2 ld tlbs with the same entries, and 2 st tlbs with the same entries.
      The duplication was a timing optimization so that each tlb could
      be placed close to its mem access pipeline unit.
      
      Problem:
      The duplicated tlbs take more power/area.
      
      New version:
      Only 1 ld tlb and 1 st tlb now.
      If the area is still not acceptable, ld and st may be merged together.
      
      Fix: fix some syntax bugs when changing parameters
  28. 12 Jul, 2022 1 commit
    • W
      ldu: set load to use latency to 4 (#1623) · c837faaa
      William Wang committed
      This commit adds an extra cycle to the load pipeline. It should fix the timing problem caused by the load pipeline.
      A large perf loss is expected. The load data result is now sent to rs in load_s3, and the load-may-hit hint
      (fastUop.valid) is sent to rs in load_s2.
      
      We add a 3-cycle load-to-load fast forward data path. There should be enough time to forward
      data inside the memory block.
      
      We will refactor the code and add a load_s3 module in the future.
      
      BREAKING CHANGE: load pipeline reorganized
  29. 28 Jun, 2022 1 commit
  30. 06 May, 2022 1 commit
    • H
      feat: parameterize load store (#1527) · 46f74b57
      Haojin Tang committed
      * feat: parameterize load/store pipeline, etc.
      
      * fix: use LoadPipelineWidth rather than LoadQueueSize
      
      * fix: parameterize `rdataPtrExtNext`
      
      * SBuffer: fix idx update logic
      
      * atomic: parameterize atomic logic in `MemBlock`
      
      * StoreQueue: update allow-enqueue requirement
      
      * feat: support one load/store pipeline
      
      * feat: parameterize `EnsbufferWidth`
      
      * chore: reshape code for better generated names
  31. 28 Jan, 2022 1 commit
  32. 26 Jan, 2022 1 commit
  33. 20 Jan, 2022 1 commit
  34. 18 Jan, 2022 1 commit
  35. 13 Jan, 2022 1 commit
  36. 07 Jan, 2022 1 commit