- 02 Nov 2022: 2 commits

Committed by Jay
* IFU <bug-fix>: deal with itlb miss for resend
* IFU <bug fix>: enable crossPageFault for resend-pf

Co-authored-by: DeltaZero <lacrosseelis@gmail.com>

Committed by Lingrui98
- 01 Nov 2022: 1 commit

Committed by Haojin Tang
* freelist & refcounter: implement arch states
* walk: restore and walk again when redirecting
* ROB: optimize invalidation of `valid`
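The first two bullets keep architectural state next to the speculative free-list state and then "restore and walk again" on a redirect. Below is a generic Python model of that idea, not XiangShan's Chisel code. It assumes a circular free list with two heads: a speculative head advanced at rename and an architectural head advanced at commit. The class and method names are hypothetical, and the refcounter is not modeled.

```python
class FreeList:
    """Hypothetical model: a circular free list of physical register numbers."""

    def __init__(self, free_regs):
        self.buf = list(free_regs)
        self.size = len(self.buf)
        self.spec_head = 0     # next entry handed out at rename (speculative)
        self.arch_head = 0     # entries before this belong to committed instructions
        self.tail = self.size  # next slot to refill; counters grow monotonically

    def allocate(self):
        """Rename: hand out a free physical register speculatively."""
        if self.spec_head == self.tail:
            raise RuntimeError("free list empty")
        preg = self.buf[self.spec_head % self.size]
        self.spec_head += 1
        return preg

    def commit(self, old_preg=None):
        """Commit: the oldest allocation becomes architectural; its old mapping is freed."""
        assert self.arch_head < self.spec_head, "nothing to commit"
        self.arch_head += 1
        if old_preg is not None:
            # The refill slot always lies before arch_head, so no live entry is overwritten.
            self.buf[self.tail % self.size] = old_preg
            self.tail += 1

    def redirect(self, surviving):
        """Restore the speculative head from the architectural head, then walk
        forward again over the `surviving` in-flight allocations older than the redirect."""
        assert self.arch_head + surviving <= self.spec_head
        self.spec_head = self.arch_head  # restore
        for _ in range(surviving):       # walk again
            self.spec_head += 1
```

Keeping the architectural head makes the restore a single pointer copy. The walk only has to replay the allocations that survive the redirect. In a real design, the ROB would presumably supply that count.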
- 31 Oct 2022: 1 commit

Committed by wakafa
* config: minimalconfig uses a non-inclusive L3 cache
* config: make simulation config dependent on FPGAPlatform
- 29 Oct 2022: 1 commit

Committed by Haojin Tang
- 21 Oct 2022: 1 commit

Committed by Yinan Xu
* axi4,mem: fix typo for pending_write_resp_id
* axi4,mem: fix has_write_resp condition
- 15 Oct 2022: 1 commit

Committed by Yinan Xu
- 13 Oct 2022: 1 commit

Committed by happy-lx
The data fields (fwd data, uop) in the load queue are now updated whenever load_s2 is valid, which helps with the lq wen fanout problem. State flags are treated differently: they are still updated accurately according to loadIn.valid.

Co-authored-by: William Wang <zeweiwang@outlook.com>
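Below is a minimal Python sketch of the split write enable described in the 13 Oct commit. It assumes a load-queue entry can be modeled as a plain dict. The field names `fwd_data`, `uop` and `writebacked` are illustrative, not the actual Chisel signal names. Data fields are written whenever load_s2 is valid, while the state flag still waits for loadIn.valid.

```python
def update_lq_entry(entry, load_s2_valid, load_in_valid, fwd_data, uop):
    """Hypothetical model of one load-queue entry update in a single cycle."""
    # Data fields: written whenever load_s2 is valid. Dropping the loadIn.valid
    # term from this write enable reduces its fanout. An extra write is presumably
    # harmless as long as readers consult the state flags first.
    if load_s2_valid:
        entry["fwd_data"] = fwd_data
        entry["uop"] = uop
    # State flags: still updated accurately according to loadIn.valid.
    if load_in_valid:
        entry["writebacked"] = True
    return entry
```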
- 30 Sep 2022: 2 commits

Committed by happy-lx
* ldu: optimize dcache hitvec wiring
  In the previous design, hitvec was generated in load_s1 and then sent separately to the dcache and to the lsu (rs) side. Because the dcache and the lsu (rs side) are far apart on a real chip, this caused a severe wiring problem. Now two hitvecs are generated in parallel:
  * hitvec 1 is generated near the dcache. To generate it, paddr from the dtlb is sent to the dcache in load_s1. This hitvec is then used in the dcache to generate the data array read_way_en.
  * hitvec 2 is generated near the lsu and rs in load_s2. The tag read result from the dcache, together with coh_state, is sent to the lsu in load_s1 and is then used to calculate hitvec in load_s2. hitvec 2 generates the hit/miss signal used by the lsu.
  This should fix the wiring problem caused by hitvec.
* ldu: opt loadViolationQuery.resp.ready timing
  An extra release addr register is added near the lsu to speed up the generation of loadViolationQuery.resp.ready.
* l1tlb: replace NormalPage data module and add duplicate resp result
  data module: add BankedSyncDataMoudleWithDup. The data array is divided into banks that are read asynchronously, with write data bypassed. The read result of each of the #banks banks is registered with RegNext, and the output is selected from the chosen bank's data.
  duplicate: duplicate the chosen data and return it to the outside (tlb). The tlb returns (ppn+perm) × #DUP to the outside (for the load unit only).
  TODO: the load unit should use different tlb resp results for different modules, one for lsq and one for dcache.
* l1tlb: fix wrong vidx_bypass logic after using the duplicate data module
  We use BankedSyncDataMoudleWithDup instead of SyncDataModuleTemplate, whose write ports are not Vec.

Co-authored-by: William Wang <zeweiwang@outlook.com>
Co-authored-by: ZhangZifei <1773908404@qq.com>
Co-authored-by: good-circle <fenghaoyuan19@mails.ucas.ac.cn>
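A hitvec is a per-way tag comparison qualified by the coherence state. The Python sketch below is a simplified model that assumes 64 sets of 64-byte lines, so the tag is `paddr >> 12`. It also assumes a state named `NOTHING` means the line is absent; the state names are loosely borrowed from TileLink client states. Neither assumption comes from the commit. Both copies are computed from the same tag and state values, so they produce identical vectors. The change moves where the wires go, not what the logic computes.

```python
OFFSET_BITS = 6  # assumed: 64-byte cache lines
INDEX_BITS = 6   # assumed: 64 sets
TAG_SHIFT = OFFSET_BITS + INDEX_BITS


def hit_vec(way_tags, way_states, paddr):
    """One bit per way: the stored tag matches and the line is present."""
    tag = paddr >> TAG_SHIFT
    return [t == tag and s != "NOTHING" for t, s in zip(way_tags, way_states)]


def read_way_en(hv):
    """Near the dcache (hitvec 1): select which data-array way to read."""
    return [int(h) for h in hv]


def hit(hv):
    """Near the lsu (hitvec 2): hit/miss is the OR of the hitvec."""
    return any(hv)
```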
Committed by happy-lx
* AtomicsUnit: refactor FSM in AtomicsUnit
  * send tlb req and sbuffer flush req at the same time
  * remove s_cache_resp_latch state
  * change `data_valid` logic: do not send dcache req until `data_valid` is true
* AtomicsUnit: add `s_cache_resp_latch` state back
- 18 Sep 2022: 2 commits

Committed by happy-lx
* lq: fix load-to-load check logic
  * when a load instruction that missed in the dcache has been refilled and is waiting to be written back, it must also be marked as released if the dcache releases the block
* lq: refix load-load violation check logic

Committed by happy-lx
* dcache, atomicUnit: remove Atomicsreplayunit
  move the functions and the replay feature in Atomicsreplayunit to AtomicsUnit
* AtomicsUnit: fix difftest check signals
- 15 Sep 2022: 1 commit

Committed by Lemover
- 04 Sep 2022: 1 commit

Committed by Yinan Xu
To reduce the fanout of in.valid and the address, the write is delayed by one clock cycle. We should check carefully whether this introduces bugs.
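A delayed write needs care because the array still holds the old value during the cycle between accepting a write and performing it. The Python model below registers the write request and applies it on the next tick. The optional same-address read bypass is an assumption about how that window could be covered; the commit does not say. The class name is hypothetical.

```python
class DelayedWriteArray:
    """Hypothetical model: writes are registered and applied one cycle later."""

    def __init__(self, n):
        self.data = [0] * n
        self.pending = None  # (addr, value) accepted last cycle, not yet written

    def tick(self, wen, addr=0, value=0):
        """Advance one clock: perform the registered write, then register the new one."""
        if self.pending is not None:
            a, v = self.pending
            self.data[a] = v
        self.pending = (addr, value) if wen else None

    def read(self, addr, bypass=True):
        """Read; with bypass, a pending write to the same address is forwarded."""
        if bypass and self.pending is not None and self.pending[0] == addr:
            return self.pending[1]
        return self.data[addr]
```

Without the bypass, a read in the cycle right after the write returns stale data. This is the kind of corner case the commit message warns about.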
- 03 Sep 2022: 1 commit

Committed by Yinan Xu
- 02 Sep 2022: 2 commits
- 01 Sep 2022: 6 commits
- 31 Aug 2022: 1 commit

Committed by Yinan Xu
This commit fixes a bug that occurs when an FMA partially issues but is flushed just after it is issued. In this case, a new instruction enters the RS and writes the data array. However, the midResult from the FMA was previously written into the data array two cycles after issue, which may cause wrong data to be written into the data array. This is a rare case because instructions usually enter the RS in order, unless dispatch2 is blocked.
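The race in the 31 Aug commit can be reproduced with a three-cycle Python model of a single RS data-array entry. The commit does not describe the fix itself. The `gate` option is one plausible fix, assumed here: it drops the late midResult write when the entry no longer belongs to the FMA that issued. All names are hypothetical.

```python
def run_fma_race(gate):
    """Model one RS data-array entry across cycles 0..2 (hypothetical)."""
    data = ["fma_src"]  # contents of data-array entry 0
    owner = ["fma"]     # instruction currently allocated to entry 0
    pending = []        # scheduled midResult writes: (cycle, entry, issuer, value)
    for cycle in range(3):
        # Apply the midResult writes that land in this cycle.
        for wr in [p for p in pending if p[0] == cycle]:
            pending.remove(wr)
            _, entry, issuer, value = wr
            if not gate or owner[entry] == issuer:
                data[entry] = value
        if cycle == 0:
            # The FMA partially issues; its midResult returns two cycles later.
            pending.append((cycle + 2, 0, "fma", "fma_mid"))
        elif cycle == 1:
            # The FMA is flushed and a new instruction reuses entry 0.
            owner[0] = "new"
            data[0] = "new_src"
    return data[0]
```

Without the gate, the flushed FMA's midResult overwrites the new instruction's operand in cycle 2.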
- 29 Aug 2022: 2 commits
- 23 Aug 2022: 1 commit

Committed by Yinan Xu
- 22 Aug 2022: 1 commit

Committed by Yinan Xu
This commit optimizes the timing of load-load forwarding by making it speculatively issue requests to the TLB/dcache. When load_s0 does not have a valid instruction and load_s3 writes a valid instruction back, the writeback data is speculatively bypassed to load_s0, on the assumption that a pointer-chasing instruction will follow. A pointer-chasing instruction has a base address that comes from a previous instruction, plus a small offset. To avoid timing issues, its latency is now reduced by speculative issue only when the offset does not change the cache set index.
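The condition "the offset does not change the cache set index" amounts to a comparison of index bits. The Python sketch below assumes 64-byte lines and 64 sets purely for illustration; the real dcache geometry is a configuration parameter not stated in the commit. When the condition holds, the set index can be taken from the bypassed base directly. Presumably this keeps the full address add off the critical path.

```python
OFFSET_BITS = 6  # assumed: 64-byte cache lines
INDEX_BITS = 6   # assumed: 64 sets


def set_index(addr):
    """Cache set index: the bits just above the line offset."""
    return (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)


def can_issue_speculatively(base, offset):
    """True if base + offset falls in the same cache set as base."""
    return set_index(base + offset) == set_index(base)
```

For example, `can_issue_speculatively(0x1000, 8)` holds. With base `0x103C`, the same offset crosses into the next line and set, so that load would not be issued speculatively.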
- 17 Aug 2022: 2 commits

Committed by Yinan Xu

Committed by zhanglinjuan
- 16 Aug 2022: 8 commits

Committed by Yinan Xu
Move selection to stage 1. This should benefit the timing of the function units.

Committed by Yinan Xu
Separate deqResp for selectPtr/allocatePtr/oldestPtr.

Committed by Yinan Xu

Committed by Yinan Xu
Separate selection into dispatch/issueSelect/oldestSelect.

Committed by Yinan Xu

Committed by Yinan Xu

Committed by Yinan Xu

Committed by Yinan Xu
- 12 Aug 2022: 1 commit

Committed by Lemover
- 09 Aug 2022: 1 commit

Committed by Yinan Xu
* rs,status: simplify deqRespSucc condition
  This commit optimizes the logic of deqResp in the StatusArray of the RS. We use ParallelMux instead of Mux1H to ensure that deqRespSucc is asserted only when deq.valid is set. This removes one logic level of AND.
* rs,select: optimize update logic of age matrix
* fdivSqrt: add separate registers for data selection
  Optimize the fanout of sel valid bits.
* fu: reduce fanout of emptyVec in InputBuffer
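An age matrix, as named in the rs,select bullet, is a common way to find the oldest requesting entry without a priority encoder over a circular queue. The Python model below is a generic textbook version, not XiangShan's implementation. Bit `older[i][j]` means entry i is older than entry j. A newly enqueued entry is marked younger than every other entry, and the oldest requester is the one that is older than all other requesters.

```python
class AgeMatrix:
    """Generic age matrix: older[i][j] is True when entry i is older than entry j."""

    def __init__(self, n):
        self.n = n
        self.older = [[False] * n for _ in range(n)]

    def enqueue(self, k):
        """A newly allocated entry k is younger than every other entry."""
        for j in range(self.n):
            if j != k:
                self.older[k][j] = False
                self.older[j][k] = True

    def oldest(self, req):
        """Return the requesting entry that is older than all other requesters."""
        for i in range(self.n):
            if req[i] and all(self.older[i][j]
                              for j in range(self.n) if j != i and req[j]):
                return i
        return None
```

In hardware, each row check is an AND across that row, so all entries are evaluated in parallel. The loops here run sequentially only because this is Python.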