Commits · b35479a0bc7bed88e2778d007e139d5db3f8452b · OpenXiangShan / XiangShan

28 Jan, 2023 · 1 commit
- dcache: add hardware prefetch interface · b52348ae
  Committed by William Wang on Oct 13, 2022

  b52348ae

25 Dec, 2022 · 1 commit

Separate Utility submodule from XiangShan (#1861) · 3c02ee8f

Committed by wakafa on Dec 25, 2022

* misc: add utility submodule

* misc: adjust to new utility framework

* bump utility: revert resetgen

* bump huancun

3c02ee8f

17 Nov, 2022 · 1 commit

top-down: introduce top-down counters and scripts (#1803) · eb163ef0

Committed by Haojin Tang on Nov 17, 2022

* top-down: add initial top-down features

* rob600: enlarge queue/buffer size

* 🎨 After git pull

* ✨ Add BranchResteers->CtrlBlock

* ✨ Cg BranchResteers after pending

* ✨ Add robflush_bubble & ldReplay_bubble

* 🚑 Fix loadReplay->loadReplay.valid

* 🎨 Dlt printf

* ✨ Add stage2_redirect_cycles->CtrlBlock

* ✨ CtrlBlock:Add s2Redirect_when_pending

* ✨ ID:Add ifu2id_allNO_cycle

* ✨ Add ifu2ibuffer_validCnt

* ✨ Add ibuffer_IDWidth_hvButNotFull

* ✨ Fix ifu2ibuffer_validCnt

* 🚑 Fix ibuffer_IDWidth_hvButNotFull

* ✨ Fix ifu2ibuffer_validCnt->stop

* feat(buggy): parameterize load/store pipeline, etc.

* fix: use LoadPipelineWidth rather than LoadQueueSize

* fix: parameterize `rdataPtrExtNext`

* fix(SBuffer): fix idx update logic

* fix(Sbuffer): use `&&` to generate flushMask instead of `||`

* fix(atomic): parameterize atomic logic in `MemBlock`

* fix(StoreQueue): update allow enque requirement

* chore: update comments, requirements and assertions

* chore: refactor some Mux to meet original logic

* feat: reduce `LsMaxRsDeq` to 2 and delete it

* feat: support one load/store pipeline

* feat: parameterize `EnsbufferWidth`

* chore: resharp codes for better generated name

* top-down: add initial top-down features

* rob600: enlarge queue/buffer size

* top-down: add l1, l2, l3 and ddr loads bound perf counters

* top-down: dig into l1d loads bound

* top-down: move memory related counters to `Scheduler`

* top-down: add 2 Ldus and 2 Stus

* top-down: v1.0

* huancun: bump HuanCun to a version with top-down

* chore: restore parameters and update `build.sc`

* top-down: use ExcitingUtils instead of BoringUtils

* top-down: add switch of top-down counters

* top-down: add top-down scripts

* difftest: enlarge stuck limit cycles again

Co-authored-by: gaozeyu <gaozeyu18@mails.ucas.ac.cn>

eb163ef0

09 Nov, 2022 · 10 commits

<verifi>:ICache add condition for multiple-hit · ff1018c6

Committed by Jenius on Oct 10, 2022

ff1018c6

Optimize ICache s2_hit_reg and Ftq timing · dc270d3b

Committed by Jenius on Jul 26, 2022

* copy Ftq to ICache read valid signal

* move sram read data and miss data selection to IFU (after predecode)

dc270d3b

ftq: optimize to itlb and to prefetch timing · f56177cb

Committed by Jenius on Jul 25, 2022

* copy address select signal for every copied port
* add 1 more copy for itlb request use
* add 1 cycle latency for ftq_pc_mem read before sending to IPrefetch

f56177cb

<bug-fix> ICacheMainPipe: fix pmp af condition · a61aefd2

Committed by Jenius on Jul 25, 2022
```
* this bug is caused by triggering wait_state for a hit pmp af req
```
a61aefd2

ftq: move toICache copied registers in ftq · b004fa13

Committed by Jenius on Jul 23, 2022

b004fa13

ICache: only separate dataArray to 4 × 2-way banks · afed18b5

Committed by Jenius on Jul 20, 2022

afed18b5

ftq, icache: fix compilation errors · fd0ecf27

Committed by Lingrui98 on Nov 09, 2022

fd0ecf27

ftq: copy bpu bypass write registers · f22cf846

Committed by Jenius on Jul 19, 2022
```
* FtqToICache add bypass write signal and use bypass signal
```
f22cf846

IFU/IPrefetch/ReplacePipe: adjust meta/data access · 2da4ac8c

Committed by Jenius on Jul 19, 2022

* IFU: ignore ICache access bundle

* ICacheMainPipe: expand meta/data access output to 4 identical vector
outputs, each connected to a copied register triggered by FTQ
requests

* IPrefetch/ReplacePipe: expand meta/data access output to 4 identical
vector outputs, each triggered by the same signal group

2da4ac8c

[WIP]FTQ: add icache req port · c5c5edae

Committed by Jenius on Jul 16, 2022

* separate ifu req and icache req for timing optimization

* both ifu ftq_req_ready and icache ftq_req_ready depend on each other

* ifu and icache has pc_mem register

[WIP]ICacheMainPipe: add copied registers

[WIP]ftq: read ftq_pc_mem one cycle ahead, reqs to be copied

[WIP] FTQ:  delete outside bypass

c5c5edae

02 Nov, 2022 · 11 commits
- <bug-fix>: add s2_valid for pmp access fault · 2f12ee53
  Committed by Jenius on Jul 14, 2022
```
* without s2_valid, an invalid pmp_af will cause wait_state to turn into
wait_pmp_except and produce incorrect read data
```
  2f12ee53
- <bug-fix> fix page fault cause fetch finish bug · 4a9944cb
  Committed by Jenius on Jul 06, 2022
  
  4a9944cb
- <timing>: optimize ICacheMainPipe s2 timing · 227f2b93
  Committed by Jenius on Jul 05, 2022
```
- Move tag and idx compare to s1 in secondary miss

- Delay 1 cycle when PMP reports an access fault and ICache misses
```
  227f2b93
- <bug-fix> fix mmio signal mismatch · 3c40eee8
  Committed by Jenius on Jul 05, 2022
```
using RegNext causes a memory fetch req to be incorrectly perceived as
an mmio req
```
  3c40eee8
- Revert "<bug-fix> fix mmio signal mismatch" · e81c8021
  Committed by Jenius on Jul 06, 2022
```
This reverts commit 99529e48.
```
  e81c8021
- Revert "<timing>: optimize ICacheMainPipe s2 timing" · a8fabd82
  Committed by Jenius on Jul 06, 2022
```
This reverts commit 33b74280.
```
  a8fabd82
- <timing>: optimize ICacheMainPipe s2 timing · 8e7999dd
  Committed by Jenius on Jul 05, 2022
```
- Move tag and idx compare to s1 in secondary miss

- Delay 1 cycle when PMP reports an access fault and ICache misses
```
  8e7999dd
- <bug-fix> fix mmio signal mismatch · 10dc1cf2
  Committed by Jenius on Jul 05, 2022
```
using RegNext causes a memory fetch req to be incorrectly perceived as
an mmio req
```
  10dc1cf2
- <timing> : send mmio response in next cycle · 425af251
  Committed by Jenius on Jun 28, 2022
  
  425af251
- <timing> icache: move data select logic to s2 · 3fbf8eaf
  Committed by Jenius on Jun 24, 2022
  
  3fbf8eaf
- delete 500 cycle wait · bbf46584
  Committed by Jenius on Jun 06, 2022
```
* add SRAM ready (resetfinish) condition for *Array (metaArray/dataArray)
req.ready
```
  bbf46584
30 Sep, 2022 · 1 commit

Sync timing modification of #1681 and #1793 (#1793) · 03efd994

Committed by happy-lx on Sep 30, 2022

* ldu: optimize dcache hitvec wiring

In the previous design, hitvec was generated in load s1, then sent to the
dcache and lsu (rs) sides separately. As dcache and lsu (rs side) are far
apart on the real chip, this caused a severe wiring problem.

Now we generate 2 hitvecs in parallel:

* hitvec 1 is generated near dcache.
To generate that signal, paddr from dtlb is sent to dcache in load_s1
to generate hitvec. The hitvec is then sent to dcache to generate
data array read_way_en.

* hitvec 2 is generated near lsu and rs in load_s2. The tag read result
from dcache, as well as coh_state, is sent to lsu in load_s1,
then used to calculate hitvec in load_s2. hitvec 2 is used
to generate the hit/miss signal used by lsu.

This should fix the wiring problem caused by hitvec.
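
The duplicated hit vector described above can be sketched behaviourally in Python (not Chisel). The 4-way set, the tag values, and the rule that a way hits when its tag matches and its coherence state is valid are assumptions for illustration; the commit message does not specify them.

```python
from typing import List

def hit_vector(paddr_tag: int, way_tags: List[int], way_valid: List[bool]) -> List[bool]:
    """One hit bit per way: tag matches and the line is valid (assumed rule)."""
    return [valid and tag == paddr_tag for tag, valid in zip(way_tags, way_valid)]

# Hypothetical 4-way set: tags read from the tag array plus a valid bit
# standing in for coh_state.
way_tags = [0x1A, 0x2B, 0x3C, 0x2B]
way_valid = [True, True, False, False]

# Copy 1, computed near dcache in load_s1: drives the data array read_way_en.
hitvec_dcache = hit_vector(0x2B, way_tags, way_valid)
read_way_en = hitvec_dcache

# Copy 2, computed near lsu/rs in load_s2 from the tags and coh_state
# that dcache sent over in load_s1: drives the hit/miss signal.
hitvec_lsu = hit_vector(0x2B, way_tags, way_valid)
hit = any(hitvec_lsu)
```

Both copies evaluate the same function on the same inputs, so they agree by construction; the point of the duplication is only that each consumer has a copy computed physically close to it.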

* ldu: opt loadViolationQuery.resp.ready timing

An extra release addr register is added near lsu to speed up the
generation of loadViolationQuery.resp.ready

* l1tlb: replace NormalPage data module and add duplicate resp result

data module:
add the BankedSyncDataMoudleWithDup data module:
divide the data array into banks and read them as Async, bypassing write data.
RegNext the data result of each bank (* #banks), then choose from the registered data.

duplicate:
duplicate the chosen data and return it to the outside (tlb).
tlb returns (ppn+perm) * #DUP to the outside (for load unit only)

TODO: load unit should use different tlb resp results for different modules:
one for lsq, one for dcache.

* l1tlb: Fix wrong vidx_bypass logic after using duplicate data module

We use BankedSyncDataMoudleWithDup instead of SyncDataModuleTemplate,
whose write ports are not Vec.

Co-authored-by: William Wang <zeweiwang@outlook.com>
Co-authored-by: ZhangZifei <1773908404@qq.com>
Co-authored-by: good-circle <fenghaoyuan19@mails.ucas.ac.cn>

03efd994

22 Aug, 2022 · 1 commit

rs,mem: optimize load-load forwarding timing (#1742) · c3b763d0

Committed by Yinan Xu on Aug 22, 2022

This commit optimizes the timing of load-load forwarding by making
it speculatively issue requests to TLB/dcache.

When load_s0 does not have a valid instruction and load_s3 writes
a valid instruction back, we speculatively bypass the writeback
data to load_s0 and assume there will be a pointer chasing instruction
following it. A pointer chasing instruction has a base address that
comes from a previous instruction with a small offset. To avoid timing
issues, now only when the offset does not change the cache set index,
we reduce its latency by speculatively issuing it.
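
The set-index condition above can be illustrated with a small Python sketch. The cache geometry (64-byte lines, 64 sets) and the function names are hypothetical, and the full addition used here is a simplification; the commit does not say how the hardware performs the check.

```python
LINE_OFFSET_BITS = 6   # assumed 64-byte cache lines
SET_INDEX_BITS = 6     # assumed 64 sets (hypothetical geometry)

def set_index(addr: int) -> int:
    """Extract the cache set index bits from an address."""
    return (addr >> LINE_OFFSET_BITS) & ((1 << SET_INDEX_BITS) - 1)

def can_issue_speculatively(base: int, offset: int) -> bool:
    """True when base + offset maps to the same cache set as base."""
    return set_index(base + offset) == set_index(base)

# A small offset that stays inside the same line keeps the set index.
same_set = can_issue_speculatively(0x1000, 8)
# An offset that crosses into the next line changes the set index.
crosses_set = can_issue_speculatively(0x1038, 16)
```

When the index is unchanged, the forwarded base address alone is enough to start the cache access, which is why only this case is issued early.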

c3b763d0

18 Jul, 2022 · 1 commit

l1tlb: tlb's req port can be configured to be blocked or non-blocked (#1656) · f1fe8698

Committed by Lemover on Jul 18, 2022

Each tlb port can be configured to be blocked or non-blocked.
For a blocked port, a req miss slot is stored in the tlb but belongs to
the core pipeline, which means only a core pipeline flush will invalidate it.

In addition, itlb also uses a PTW Filter, but with only 4 entries.
Finally, the svinval extension is kept as before and still works.


* tlb: add blocked-tlb support, miss frontend changes

* tlb: remove tlb's sameCycle support, result will return at next cycle

* tlb: remove param ShouldBlock, move block method into TLB module

* tlb: fix handle_block's miss_req logic

* mmu.filter: change filter's req.ready to canEnqueue

when the filter can't let all the reqs enqueue, set req.ready to false.
canEnqueue after filtering has long latency, so we use **_fake
without filtering, but the filter will still receive the reqs if
it can (after filtering).
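
One possible reading of the conservative ready signal, sketched in Python. The function names, the use of `None` for an invalid req, and the duplicate-removal rule are all assumptions; the commit message only says that the fast check skips filtering.

```python
from typing import List, Optional, Set

def can_enqueue_fake(free_entries: int, reqs: List[Optional[int]]) -> bool:
    """Fast, conservative check: count every valid req, duplicates included."""
    return sum(r is not None for r in reqs) <= free_entries

def can_enqueue_filtered(free_entries: int, reqs: List[Optional[int]],
                         pending: Set[int]) -> bool:
    """Exact check: duplicate or already-pending vpns need no new entry."""
    new_vpns = {r for r in reqs if r is not None} - pending
    return len(new_vpns) <= free_entries

# Two reqs for the same vpn and one free entry: the fast check refuses,
# although the exact check shows the reqs would fit after filtering.
reqs = [5, 5, None]
fast = can_enqueue_fake(1, reqs)
exact = can_enqueue_filtered(1, reqs, pending={7})
```

Under these assumptions the fast check can only be more pessimistic than the exact one, so using it for `req.ready` trades a few missed enqueue opportunities for a shorter timing path.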

* mmu.tlb: change name from BTlbPtwIO to VectorTlbPtwIO

* mmu: replace itlb's repeater to filter&repeaternb

* mmu.tlb: add TlbStorageWrapper to make TLB cleaner

more: BlockTlbRequestorIO is same with TlbRequestorIO, rm it

* mmu.tlb: rm unused param in function r_req_apply, fix syntax bug

* [WIP]icache: itlb usage from non-blocked to blocked

* mmu.tlb: change parameter NBWidth to Seq of boolean

* icache.mainpipe: fix itlb's resp.ready, not always true

* mmu.tlb: add kill signal to blocked req that needs sync but fails

in frontend, icache, itlb and the next pipe may not be able to sync.
blocked tlb will store the miss req and block reqs, which makes itlb
unable to work. So add kill logic to let itlb not store reqs.

One more thing: fix icache's blocked tlb handling logic

* icache.mainpipe: fix tlb's ready_recv logic

icache mainpipe has two ports, but these two ports may not both be valid
at the same time. So add new signals tlb_need_recv to record whether
stage s1 should wait for the tlb.

* tlb: on flush, just set resp.valid and pf; pf signals that the result should not be used

* tlb: flush should concern satp.changed(for blocked io now)

* mmu.tlb: add new flush that doesn't flush reqs

Sfence.vma will flush inflight reqs and flushPipe,
but some other sfence (svinval...) will not. So add a new flush to
distinguish these two kinds of sfence signal.

more: forgot to assign resp result when ptw back, fix it

* mmu.tlb: beautify miss_req_v and miss_v relative logic

* mmu.tlb: fix bug, when ptw back and bypass, concern level to genPPN

bug: when ptw back and bypass, forgot to take level (1GB/2MB/4KB) into
account when genPPN.

by the way: some functions need ": Unit = ", add it.
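
Why the page level matters for genPPN can be sketched as follows. In Sv39, the low PPN fields of a superpage translation come from the virtual page number rather than from the page-table entry. The Python function below is an illustrative stand-in, not the Chisel method from the commit, and the level encoding (0 = 1GB, 1 = 2MB, 2 = 4KB) is an assumption.

```python
PPN_FIELD_BITS = 9  # Sv39: each of the two low VPN/PPN fields is 9 bits wide

def gen_ppn(entry_ppn: int, vpn: int, level: int) -> int:
    """Combine the entry's PPN with the VPN according to the page size.

    level: 0 = 1GB page, 1 = 2MB page, 2 = 4KB page (assumed encoding).
    """
    low_bits = PPN_FIELD_BITS * (2 - level)   # PPN bits that come from the VPN
    mask = (1 << low_bits) - 1
    return (entry_ppn & ~mask) | (vpn & mask)

ppn_4k = gen_ppn(0xABCDE, 0x12345, 2)   # 4KB page: the entry's PPN is used whole
ppn_2m = gen_ppn(0x40000, 0x00123, 1)   # 2MB page: low 9 bits come from the VPN
ppn_1g = gen_ppn(0x40000, 0x2ABCD, 0)   # 1GB page: low 18 bits come from the VPN
```

Ignoring the level is equivalent to always taking the 4KB branch, which yields a wrong physical address for any access that is not at the start of a superpage.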

* mmu.filter: fix bug of canEnqueue, mixed with tlb_req and tlb.req

* icache.mainpipe: fix bug of tlbExcp's usage, & with tlb_need_back

Icache's mainpipe has two ports, but sometimes only port 0 is valid.
When a port is invalid, its tlbexcp should be false (actually, it should
be ignored).
So & tlb_need_back to fix this bug.

* sfence: instr in svinval ext will also flush pipe

A difficult problem to handle:
Sfence and Svinval will flush MMU, but only Sfence (and some svinval)
  will flush the pipe. For itlb, where some requestors are blocked and
  icache doesn't receive the flush for simplicity, itlb's blocked ptw req
  should not be flushed.
It's a huge problem for MMU to handle, with good or bad solutions. But
  svinval is seldom used, so give up its efficiency.

* mmu: add parameter to control mmu's sfence delay latency

Difficult problem:
  itlb's blocked req should not be abandoned, but sfence will flush
  all inflight reqs. When the delays of itlb and itlb repeater are not the
  same (itlb is flushed, and two cycles later itlb repeater is flushed),
  itlb's ptw req sent after flushing will also be flushed silently.
So add one parameter to control the flush delay to be the same.

* mmu.tlb: fix bug of csr.priv's delay & sfence valid when req fire

1. csr.priv's delay
csr.priv should not be delayed, csr.satp should be delayed.
excep/intr will change csr.priv, which will be changed at one
instruction's (commit?), but csrrw satp will not, so satp has more
cycles to delay.
2. sfence
when sfence is valid but a blocked req fires, resp should still fire.
3. satp in TlbCsrBundle
let the high bits of satp.ppn be 0.U

* tlb&icache.mainpipe: rm commented codes

* mmu: move method genPPN to entry bundle

* l1tlb: divide l1tlb flush into flush_mmu and flush_pipe

Problem:
For l1tlb, there are blocked and non-blocked req ports.
For blocked ports, there are req slots to store missed reqs.
Some mmu flushes like Sfence should not flush miss slots, because the
outside may still need to get the tlb resp, whether wrong or correct.
For example, sfence will flush mmu and flush pipe, but won't flush
reqs inside icache, which are waiting for the tlb resp.
For example, an svinval instr will flush mmu, but not flush pipe, so
tlb should return the correct resp, although the ptw req is flushed
when tlb misses.

Solution:
divide l1tlb flush into flush_mmu and flush_pipe.
The req slot is considered to be a part of the core pipeline and should
only be flushed by flush_pipe.
flush_mmu will flush mmu entries and inflight ptw reqs.
When a req misses but sfence flushed its ptw req, re-send it.
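
The split between the two flush kinds can be modelled with a toy Python class. Every name here (`BlockedTlbPort`, `tick`, `ptw_response`) is hypothetical, and the model covers a single blocked port with a single miss slot; it only illustrates which state each flush touches, as described in the solution above.

```python
class BlockedTlbPort:
    """Toy model of one blocked TLB port; not the actual Chisel module."""

    def __init__(self):
        self.entries = {}          # vpn -> ppn, the cached translations
        self.miss_slot = None      # vpn of the missed req, owned by the core pipeline
        self.ptw_inflight = False  # a page-table-walk req is outstanding

    def request(self, vpn):
        if vpn in self.entries:
            return self.entries[vpn]
        self.miss_slot = vpn       # remember the miss and start a walk
        self.ptw_inflight = True
        return None

    def flush_mmu(self):
        # Drops mmu entries and inflight ptw reqs, but keeps the miss slot.
        self.entries.clear()
        self.ptw_inflight = False

    def flush_pipe(self):
        # Only a pipeline flush discards the miss slot.
        self.miss_slot = None

    def tick(self):
        # A miss slot whose ptw req was flushed re-sends the req.
        if self.miss_slot is not None and not self.ptw_inflight:
            self.ptw_inflight = True

    def ptw_response(self, vpn, ppn):
        self.entries[vpn] = ppn
        self.ptw_inflight = False
        if self.miss_slot == vpn:
            self.miss_slot = None
            return ppn             # the waiting requestor finally gets its resp
        return None
```

In this model a miss followed by `flush_mmu()` leaves the slot occupied, the next `tick()` re-sends the walk, and the requestor still receives a response, whereas `flush_pipe()` simply drops the slot.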

* l1tlb: code clean, correct comments and rm unused codes

* l2tlb: divide filterSize into ifiterSize and dfilterSize

* l2tlb: prefetch req won't enter miss queue. Rename MSHR to missqueue

* l1tlb: when disable vm, ptw back should not bypass tlb and should let miss req go ahead

f1fe8698

06 Jun, 2022 · 1 commit

delete 500 cycle wait · 19d62fa1

Committed by Jenius on Jun 06, 2022

* add SRAM ready (resetfinish) condition for *Array (metaArray/dataArray)
req.ready

19d62fa1

26 May, 2022 · 1 commit
- fix for chipsalliance/chisel3#2496 (#1563) · 005e809b
  Committed by Jiuyang Liu on May 26, 2022
  
  005e809b
25 Apr, 2022 · 1 commit
- fix some typos (#1537) · 1c746d3a
  Committed by cui fliter on Apr 25, 2022
```
* fix some typos

Signed-off-by: cuishuang <imcusg@gmail.com>
```
  1c746d3a
16 Feb, 2022 · 1 commit
- ICacheMainPipe <bug-fix>: allow tlb req when cache miss (#1467) · b127c1ed
  Committed by Jay on Feb 16, 2022
  
  b127c1ed
13 Feb, 2022 · 1 commit

ITLB <timing>: delay miss and flush req for ITLB (#1457) · 91df15e5

Committed by Jay on Feb 13, 2022

* ITLB <timing>: delay miss and flush req for ITLB

* add 2 ITLB requestors and delete tlb_arb

* Bump huancun

* ICacheMainPipe <bug-fix>: fix slot invalid condition

* ITLB <timing>: add port to 6

* ICacheMainPipe <bug-fix>: stop pipe when tlb miss

* ICacheMainPipe <bug-fix>: fix illegal flush

Co-authored-by: LinJiawei <linjiawei20s@ict.ac.cn>

91df15e5

01 Feb, 2022 · 1 commit
- ICache <bug-fix>: fix meta error when reset (#1447) · e8e4462c
  Committed by Jay on Feb 01, 2022
  
  e8e4462c
28 Jan, 2022 · 1 commit

ICache <timing>: move parity decode to pipeline (#1443) · 79b191f7

Committed by Jay on Jan 28, 2022

* ICache <timing>: move parity decode to pipe

* ICacheMainPipe <timing>: remove parity af

* ReplacePipe <timing>: delay error generating

79b191f7

26 Jan, 2022 · 1 commit

ICache : fix 2 potential rule violations according to TL specification (#1444) · 00240ba6

Committed by Jay on Jan 26, 2022

* ReplacePipe: block miss until get ReleaseAck

* IPrefetch: cancel prefetch req when it meets an MSHR

* Fetch <perf>: add fetch bubble performance counters

00240ba6

23 Jan, 2022 · 1 commit

Fetch: optimization timing for IFU/ICache/IPrefetch (#1432) · 61e1db30

Committed by Jay on Jan 23, 2022

* IFU <timing>: f2_data select signal optimization

* ICacheMainPipe <timing>: latch fetch req when tlb miss

* Frontend <timing>: add additional PMP checker

* Ftq <timing>: delete flush condition for prefetch.req

* ICacheMainPipe <timing>: move hit state change to s2

* ICache <bug-fix> delete PMP check assertion

* ICache <bug-fix> fix parity error condition

* ICacheMainPipe <bug-fix>: fix tlb resp condition

* when TLB req has been latched into tlb_slot, the
tlb_all_resp condition, which affects s0_fire should
depend on the slot result.

61e1db30

22 Jan, 2022 · 4 commits
- ICacheMainPipe <bug-fix>: fix tlb resp condition · a11ea8d0
  Committed by JinYue on Jan 19, 2022
```
* when TLB req has been latched into tlb_slot, the
tlb_all_resp condition, which affects s0_fire should
depend on the slot result.
```
  a11ea8d0
- ICacheMainPipe <timing>: move hit state change to s2 · 30aee68a
  Committed by JinYue on Jan 18, 2022
  
  30aee68a
- ICacheMainPipe <timing>: latch fetch req when tlb miss · 71e336ff
  Committed by JinYue on Jan 14, 2022
  
  71e336ff
- IFU <timing>: f2_data select signal optimization · 0bca1ccb
  Committed by JinYue on Jan 14, 2022
  
  0bca1ccb

OpenXiangShan / XiangShan · synced successfully 10 months ago