- 25 Dec 2022, 1 commit

Committed by wakafa:
* misc: add utility submodule
* misc: adjust to new utility framework
* bump utility: revert resetgen
* bump huancun
-
- 21 Dec 2022, 2 commits

Committed by Haoyuan Feng:
* L2TLB: Fix a bug of Prefetcher
* MMU: Add ChiselDB
* MMU: Add Fake PTW
* MMU: Fix ChiselDB for dual core
-
Committed by bugGenerator
-
- 15 Dec 2022, 1 commit

Committed by Xiaokun-Pei:
* modified ptw and kept performance from dropping
* fixed a bug in ptw
* fixed the bug in ptw
* fixed ptw: the bug where emu goes wrong at the third cycle, and the bug that sfence causes in the MC test
-
- 11 Dec 2022, 1 commit

Committed by William Wang
-
- 08 Dec 2022, 1 commit

Committed by sfencevma:
* lsu: add st-ld violation re-execute
* misc: update vio check comments in LQ

Co-authored-by: Lyn <lyn@Lyns-MacBook-Pro.local>
Co-authored-by: William Wang <zeweiwang@outlook.com>
-
- 07 Dec 2022, 1 commit

Committed by sfencevma:
This commit adds an uncache write buffer to accelerate uncache writes. For the uncacheable address range, we now use the atomic bit in PMA to indicate that uncache writes in this range should not use the uncache write buffer. Note that XiangShan does not support atomic instructions in the uncacheable address range.
* uncache: optimize write operation
* pma: add atomic config
* uncache: assign hartId
* remove some pma atomic
* extend peripheral id width

Co-authored-by: Lyn <lyn@Lyns-MacBook-Pro.local>
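The routing rule described in this commit can be sketched as a small behavioral model. This is an illustrative Python sketch only; `PmaAttr` and `use_write_buffer` are hypothetical names, not XiangShan's actual PMA interface:

```python
# Hypothetical model of the routing rule above: an uncache write may use
# the uncache write buffer only when the PMA entry for its address is NOT
# marked atomic. All names are illustrative, not XiangShan's actual API.
from dataclasses import dataclass

@dataclass
class PmaAttr:
    uncacheable: bool
    atomic: bool

def use_write_buffer(attr: PmaAttr) -> bool:
    """True if the write can be accelerated through the uncache write buffer."""
    return attr.uncacheable and not attr.atomic
```

Under this rule, an uncacheable range configured with the atomic bit bypasses the buffer and is issued as a plain uncache write.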
-
- 05 Dec 2022, 1 commit

Committed by happy-lx:
* bump difftest and wire extra signals (robidx, lqidx, sqidx, etc.) from ROB to difftest
-
- 02 Dec 2022, 2 commits

Committed by happy-lx:
This intermediate architecture replays all load instructions from the LQ. An independent load replay queue will be added later. The performance loss caused by changing the load replay order will be analyzed in the future.
* memblock: load queue based replay
  * replay loads from the load queue rather than the RS
  * use counters to delay replay logic
* memblock: refactor priority
  * lsq-replay has higher priority than try pointchasing
* RS: remove load/store RS's feedback port
* ld-replay: a new path for fast replay
  * when fast replay is needed, wire it to the load queue; it will be selected this cycle and replayed to load pipeline s0 in the next cycle
* memblock: refactor load S0
  * move all the select logic from lsq to load S0
* split a tlbReplayDelayCycleCtrl out of loadqueue to speed up generating emu
* loadqueue: parameterize replay
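The priority refactor above ("lsq-replay has higher priority than try pointchasing") can be illustrated as a simple priority select for load pipeline stage 0. The full ordering with a fresh issue as lowest priority is an assumption for illustration, and the names are hypothetical:

```python
from typing import Optional

# Illustrative priority select for load s0 (hypothetical names): replay
# from the load queue outranks pointer chasing, which is assumed here to
# outrank a fresh issue. Not XiangShan's actual select logic.
def select_load_s0(replay_valid: bool, pointchase_valid: bool,
                   issue_valid: bool) -> Optional[str]:
    """Pick the highest-priority valid source for load pipeline stage 0."""
    if replay_valid:
        return "lsq-replay"
    if pointchase_valid:
        return "pointchasing"
    if issue_valid:
        return "issue"
    return None
```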
-
Committed by Haoyuan Feng
-
- 30 Nov 2022, 1 commit

Committed by Haoyuan Feng
Co-authored-by: Yinan Xu <xuyinan@ict.ac.cn>
-
- 19 Nov 2022, 19 commits

Committed by William Wang
-
Committed by William Wang
-
Committed by William Wang:
According to the RISC-V manual, exception code 14 is reserved. See https://github.com/OpenXiangShan/NEMU/commit/9800da6a5e660dae5411c9b303833bc84bc04db4
-
Committed by William Wang:
* atom: fix atom inst storeAccessFault gen logic
* atom, pmp: an atom access to a non-readable (!r) addr should raise SAF
* atom: lr should raise load access fault
-
Committed by William Wang:
* chore: fix WBQEntryReleaseUpdate bundle naming (no real hardware change)
* dcache: fix replace & probeAck TtoB perm problem

When dcache replaces a cacheline, it moves that cacheline's data to the writeback queue and waits until the refill data comes. When the refill data arrives, it writes the dcache data array, updates the meta for that cacheline, then wakes up the cacheline release req and writes the data to the l2 cache.

In the previous design, if a probe request came before the real l1-to-l2 release req, it could be merged into the same writeback queue entry. The probe req updates dcache meta in mainpipe s3, then is merged into the writeback queue. However, for a probe TtoB req, the following problem may happen:

1) a replace req waits for refill in writeback queue entry X
2) a probe TtoB req enters mainpipe s3 and sets the cacheline coh to B
3) the probe TtoB req is merged into writeback queue entry X
4) writeback queue entry X is woken up and does probeack immediately (TtoN)
5) refill data for the replace req comes from l2; a refill req enters the mainpipe and updates dcache meta (sets the cacheline being replaced to coh N)

Between 4) and 5), l2 thinks that l1's coh is N, but l1's coh is actually B; here comes the problem.

Temp patch for nanhu: now all probe reqs do an extra check. If a req is a TtoB probe and the corresponding cacheline release req is already in the writeback queue, we set dcache meta coh to N. As we do set-blocking in the dcache mainpipe, we can do that check safely while the probe req is in the mainpipe.
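The patched probe behavior reduces to one extra condition on the coherence state written by a TtoB probe. A toy model, not the actual dcache code:

```python
# Toy model of the nanhu patch above: a TtoB probe normally leaves the
# line in state B, but if the line's release req is already pending in
# the writeback queue, meta is downgraded to N so that L1 and L2 agree
# after the (TtoN) probeack. Names are illustrative.
def coh_after_probe_ttob(release_pending_in_wbq: bool) -> str:
    """Coherence state written to dcache meta when a TtoB probe is handled."""
    return "N" if release_pending_in_wbq else "B"
```

This matches the scenario above: in step 2), a line with a pending release in entry X is set to N rather than B, so the TtoN probeack in step 4) no longer disagrees with L1's actual state.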
-
Committed by William Wang
-
Committed by William Wang:
When writing back a missed load, io.ldout.bits.uop.ctrl.replayInst should not be overwritten by the load pipeline replay check result `s3_need_replay_from_fetch`.
-
Committed by William Wang:
* dcache: remove data read resp data_dup_0
* dcache: do not use mp s2_ready to gen data_read.valid
-
Committed by zhanglinjuan
-
Committed by Yinan Xu:
Move imm addition to stage 0.
-
Committed by William Wang:
forwardData for the load queue does not need data from dcache sram. In this way, we remove the load queue data wdata fanin from all dcache data srams.
-
Committed by Yinan Xu:
* remove 2 buffers from l1i to l2
* add 1 buffer between l2 and xbar

Latency changes:
* L1D to L2: +1
* L1I to L2: -1
* PTW to L2: +1
-
Committed by William Wang:
Report an error if sc fails too many times while lrsc_addr === get_block_addr(s3_req.addr).
-
Committed by William Wang
-
Committed by William Wang:
rdataVec (i.e., the sram read result merged with the forward result) is still generated in load_s2. It will be written to the load queue in load_s2.
-
Committed by zhanglinjuan

Committed by zhanglinjuan

Committed by zhanglinjuan
-
Committed by William Wang:
It removes fanout from mem_release.valid related logic.
-
- 18 Nov 2022, 10 commits

Committed by bugGenerator:
This commit includes:
1. timing optimization: add dup registers and optimize the llptw mem resp select related logic
2. l2tlb more fifo: add a blockhelper to help l2tlb behave more like a fifo to l1tlb, and fix some cases that caused the page cache to have duplicate entries (does not cover all cases)

* l2tlb: add duplicate reg for better fanout (#1725)
  The page cache has large fanout: 1. addr_low -> sel data; 2. level; 3. sfence; 4. ecc error flush. Solution: add duplicate regs: 1. sfence/csr reg; 2. ecc error reg; 3. memSelData; 4. one-hot level code
* l2tlb: fix bug that wrongly chose req info from llptw
* l2tlb.cache: move hitCheck into StageDelay
* l2tlb: optimize mem resp data selection to ptw
* l2tlb.llptw: optimize timing for pmp check of llptw
* l2tlb.cache: move v-bits select into stageReq
* l2tlb.llptw: a req that misses mem should re-access the cache
* l2tlb.llptw: fix bug that mixed mem_ptr and cache_ptr
* l2tlb.llptw: fix bug that lost a case for merge
* l2tlb.llptw: fix bug of state change priority
* l2tlb.prefetch: add filter buffer and perf counter
* mmu: change TimeOutThreshold to 3000
* l2tlb: ptw has highest priority to enq llptw
* l2tlb.cache: fix bug of bypassed logic
* l2tlb.llptw: fix bug that flush failed to flush pmp check
* l2tlb: add blockhelper to make l2tlb more fifo
* mmu: change TimeOutThreshold to 5000
* l2tlb: new l1tlb doesn't enter ptw directly (a corner-case complement to commit 3158ab8f, "l2tlb: add blockhelper to make l2tlb more fifo")
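The "duplicate reg for better fanout" idea above is a standard RTL pattern: instead of one register driving every consumer, each consumer group gets its own registered copy, all updated from the single source each cycle. A behavioral sketch (illustrative only; the real fix is in XiangShan's Chisel RTL):

```python
# Behavioral sketch of the duplicate-register fanout fix: one logical
# value, n registered copies, one per consumer group. Each consumer reads
# only its nearby copy, cutting wire fanout from the source register.
class DupReg:
    def __init__(self, n: int, init=0):
        self.copies = [init] * n       # one copy per consumer group

    def update(self, value) -> None:
        # The single source write fans out to n local copies each cycle.
        self.copies = [value] * len(self.copies)

    def read(self, i: int):
        # Consumer i reads its local copy; all copies always agree.
        return self.copies[i]
```

The cost is n-1 extra registers; the benefit is shorter, lower-fanout wires on the timing-critical consumers (here: sfence/csr, ecc error, memSelData, and the one-hot level code).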
-
Committed by lixin
-
Committed by William Wang:
It should reduce dcache meta write fanout. Now a dcache meta write actually takes 2 cycles.
-
Committed by William Wang:
We used to clear the mask in the sbuffer in 1 cycle on sbuffer enq, which introduced a 64*16 fanout. To reduce fanout, the mask in the sbuffer is now cleared when the dcache hit resp comes; clearing the mask for a line in the sbuffer takes 2 cycles. Meanwhile, dcache reqIdWidth is also reduced from 64 to log2Up(nEntries) max log2Up(StoreBufferSize). This commit does not cause a perf change.
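The 2-cycle mask clear can be modeled behaviorally. This is a toy sketch; the names and the half-per-cycle split are assumptions for illustration, not the commit's actual implementation:

```python
# Toy model of the change above: the byte mask of a sbuffer line is no
# longer cleared in the enq cycle (wide fanout); it is cleared when the
# dcache hit resp arrives, spread over 2 cycles. The half/half split per
# cycle is an assumption for illustration.
class SbufferLineMask:
    def __init__(self, nbytes: int = 64):
        self.mask = [True] * nbytes
        self.clear_step = None         # None: no clear in progress

    def on_dcache_hit_resp(self) -> None:
        self.clear_step = 0            # start the 2-cycle clear

    def tick(self) -> None:
        if self.clear_step is not None and self.clear_step < 2:
            half = len(self.mask) // 2
            lo = self.clear_step * half
            self.mask[lo:lo + half] = [False] * half
            self.clear_step += 1

    @property
    def cleared(self) -> bool:
        return not any(self.mask)
```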
-
Committed by lixin
-
Committed by zhanglinjuan
-
Committed by William Wang:
Now we use 2 cycles to update paddr in the LQ. In this way, paddr in the LQ is still valid in load_s3.
-
Committed by lixin
-
Committed by lixin
-
Committed by William Wang:
LQ data is now divided into 8 banks by default. A write to LQ data takes 2 cycles to finish. LQ data will not be read for at least 2 cycles after a write, so it is ok to add this delay. For example:
T0: update LQ meta, LQ data write req starts
T1: LQ data write finishes, new wbidx selected
T2: read LQ data according to the newly selected wbidx
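The T0/T1/T2 timeline above can be captured in a minimal model. Names are illustrative, not XiangShan's actual LQ code:

```python
# Minimal model of the timeline above: a banked LQ data write takes 2
# cycles (request at T0, data lands at T1), and a read issued at T2 or
# later always observes the written value, so the delay is safe.
class BankedLqData:
    def __init__(self, banks: int = 8):
        self.data = [0] * banks
        self.pending = None            # (bank, value) requested this cycle

    def write_req(self, bank: int, value: int) -> None:  # T0
        self.pending = (bank, value)

    def tick(self) -> None:            # cycle boundary: write lands (T1)
        if self.pending is not None:
            bank, value = self.pending
            self.data[bank] = value
            self.pending = None

    def read(self, bank: int) -> int:  # T2 or later: write already landed
        return self.data[bank]
```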
-