Commits · 16c3b0b7e4c458adc29470b068f799266aa34fab · OpenXiangShan / XiangShan

08 Dec, 2022 1 commit

ldu: add st-ld violation re-execute (#1849) · 16c3b0b7

Committed by sfencevma on Dec 08, 2022

* lsu: add st-ld violation re-execute

* misc: update vio check comments in LQ

Co-authored-by: Lyn <lyn@Lyns-MacBook-Pro.local>
Co-authored-by: William Wang <zeweiwang@outlook.com>

16c3b0b7

07 Dec, 2022 1 commit

Uncache: optimize write operation (#1844) · 37225120

Committed by sfencevma on Dec 07, 2022

This commit adds an uncache write buffer to accelerate uncache writes.

For uncacheable address ranges, the atomic bit in the PMA now indicates
that uncache writes in that range should not use the uncache write buffer.

Note that XiangShan does not support atomic insts in uncacheable address ranges.
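
As a rough behavioural illustration of the policy described above (not the Chisel implementation), the Python sketch below routes an uncache write either into a write buffer or straight to the bus, depending on a PMA atomic bit. The region table, the class and method names, and the choice to drain older buffered writes before a direct write are all assumptions made for this example.

```python
from collections import deque

# Hypothetical PMA table: (base, size, atomic). Addresses are made up.
PMA_REGIONS = [
    (0x1000_0000, 0x1000, True),   # atomic set: writes bypass the buffer
    (0x2000_0000, 0x1000, False),  # atomic clear: writes may be buffered
]

def pma_atomic(addr):
    """Return the atomic bit of the region containing addr (False if unmapped)."""
    for base, size, atomic in PMA_REGIONS:
        if base <= addr < base + size:
            return atomic
    return False

class UncacheWritePath:
    def __init__(self):
        self.buffer = deque()  # pending buffered writes, oldest first
        self.bus_log = []      # writes that reached the bus, in order

    def write(self, addr, data):
        """Return True if the write was buffered, False if it went out directly."""
        if pma_atomic(addr):
            # Assumption: flush older buffered writes first to keep program order.
            self.drain()
            self.bus_log.append((addr, data))
            return False
        self.buffer.append((addr, data))
        return True

    def drain(self):
        """Send every buffered write to the bus, oldest first."""
        while self.buffer:
            self.bus_log.append(self.buffer.popleft())
```

A write to the second region stays in the buffer until something drains it, while a write to the first region always reaches the bus immediately.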

* uncache: optimize write operation

* pma: add atomic config

* uncache: assign hartId

* remove some pma atomic

* extend peripheral id width

Co-authored-by: Lyn <lyn@Lyns-MacBook-Pro.local>

37225120

05 Dec, 2022 1 commit

ROB, difftest: add robidx support (#1845) · b211808b

Committed by happy-lx on Dec 05, 2022

* bump difftest and wire extra signals (robidx, lqidx, sqidx etc)
from ROB to difftest

b211808b

02 Dec, 2022 2 commits

Replay all load instructions from LQ (#1838) · a760aeb0

Committed by happy-lx on Dec 02, 2022

This intermediate architecture replays all load instructions from LQ.
An independent load replay queue will be added later.

The performance loss caused by the changed load replay order will be
analyzed in the future.

* memblock: load queue based replay

* replay load from load queue rather than RS
* use counters to delay replay logic

* memblock: refactor priority

* lsq-replay has higher priority than try pointchasing

* RS: remove load store rs's feedback port

* ld-replay: a new path for fast replay

* when fast replay is needed, wire it to loadqueue; it will be selected
this cycle and replayed to load pipeline s0 in the next cycle

* memblock: refactor load S0

* move all the select logic from lsq to load S0
* split a tlbReplayDelayCycleCtrl out of loadqueue to speed up
generating emu

* loadqueue: parameterize replay
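
The idea of replaying loads from the load queue with a per-entry delay counter can be sketched in a few lines of Python. This is only a behavioural model under stated assumptions: the class and field names are hypothetical, and the oldest-first selection policy is an assumption, not something the commit message states.

```python
from dataclasses import dataclass

@dataclass
class LoadEntry:
    rob_idx: int               # age of the load; smaller means older
    need_replay: bool = False
    delay: int = 0             # cycles to wait before the replay may be selected

class LoadQueue:
    def __init__(self):
        self.entries = []

    def mark_replay(self, entry, delay_cycles):
        """Flag an entry for replay after delay_cycles cycles."""
        entry.need_replay = True
        entry.delay = delay_cycles

    def tick(self):
        """Advance one cycle; return the load sent to pipeline s0, if any."""
        # Entries whose counter already reached zero compete this cycle.
        ready = [e for e in self.entries if e.need_replay and e.delay == 0]
        # Count down the remaining entries.
        for e in self.entries:
            if e.need_replay and e.delay > 0:
                e.delay -= 1
        if not ready:
            return None
        chosen = min(ready, key=lambda e: e.rob_idx)  # assumption: oldest first
        chosen.need_replay = False
        return chosen
```

An entry marked with a delay of 0 models the fast replay path: it is selected in the very next `tick()`.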

a760aeb0

mmu: increase mmu timeout to 10000 (#1839) · 914b8455

Committed by Haoyuan Feng on Dec 02, 2022

914b8455

30 Nov, 2022 1 commit

rob, mmu: fix bug of not specifying signal width (#1840) · f3034303

Committed by Haoyuan Feng on Nov 30, 2022

Co-authored-by: Yinan Xu <xuyinan@ict.ac.cn>

f3034303

22 Nov, 2022 1 commit

Merge pull request #1831 from OpenXiangShan/nanhu-lsu-timing-to-master · 5da19fb3

Committed by William Wang on Nov 22, 2022

Rebase nanhu lsu timing opt to master

5da19fb3

21 Nov, 2022 1 commit

ci: bump ready-to-run nemu · 688bb537

Committed by William Wang on Nov 21, 2022

688bb537

19 Nov, 2022 20 commits

lsu: fix nanhu cherry-pick conflict · 34ffc2fb

Committed by William Wang on Nov 05, 2022

34ffc2fb

atom: lr should raise load misalign exception · 8c343485

Committed by William Wang on Oct 31, 2022

8c343485

ci: add extra pmp test · b4edc553

Committed by William Wang on Oct 31, 2022

b4edc553

csr: medeleg write should have 0xb3ff mask · 5e4ec482

Committed by William Wang on Oct 29, 2022

According to the RISC-V manual, exception code 14 is reserved.

See https://github.com/OpenXiangShan/NEMU/commit/9800da6a5e660dae5411c9b303833bc84bc04db4
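
The effect of the 0xb3ff mask can be checked with a small Python model. The function names are hypothetical; the only facts used are the mask value from the commit title and plain bit arithmetic.

```python
MEDELEG_MASK = 0xB3FF  # bits 10, 11 and 14 are clear, so they are not writable

def write_medeleg(wdata):
    """Return the value held in medeleg after a CSR write of wdata."""
    return wdata & MEDELEG_MASK

def delegable_codes():
    """List the exception codes (0..15) whose medeleg bit can be set."""
    return [code for code in range(16) if (MEDELEG_MASK >> code) & 1]
```

Writing all ones therefore leaves bit 14 at zero, which matches the commit's note that exception code 14 is reserved.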

5e4ec482

Fix atom inst pmp implementation (#1813) · 0fedb24c

Committed by William Wang on Nov 19, 2022

* atom: fix atom inst storeAccessFault gen logic

* atom, pmp: atom access !r addr should raise SAF

* atom: lr should raise load access fault

0fedb24c

dcache: fix replace & probeAck TtoB perm problem (#1791) · b8f6ff86

Committed by William Wang on Sep 26, 2022

* chore: fix WBQEntryReleaseUpdate bundle naming

There is no real hardware change

* dcache: fix replace & probeAck TtoB perm problem

When dcache replaces a cacheline, it moves that cacheline's data to the
writeback queue and waits until the refill data comes. When the refill data
comes, it writes the dcache data array and updates meta for that cacheline,
then wakes up the cacheline release req and writes the data to the l2 cache.

In the previous design, if a probe request comes before the real l1 to l2
release req, it can be merged into the same writeback queue entry. The probe
req updates dcache meta in mainpipe s3 and is then merged into the writeback
queue. However, for a probe TtoB req, the following problem may happen:

1) a replace req waits for refill in writeback queue entry X
2) a probe TtoB req enters mainpipe s3 and sets the cacheline coh to B
3) the probe TtoB req is merged into writeback queue entry X
4) writeback queue entry X is woken up and does probeack immediately (TtoN)
5) refill data for the replace req comes from l2; a refill req enters mainpipe
and updates dcache meta (sets the coh of the cacheline being replaced to N)

Between 4) and 5), l2 thinks that the l1 coh is N, but the l1 coh is actually B,
which is the problem.

Temp patch for nanhu:

Now we let every probe req do an extra check. If it is a TtoB probe req and the
corresponding cacheline release req is already in the writeback queue, we set
dcache meta coh to N. As we do set block in dcache mainpipe, we can do
that check safely when the probe req is in mainpipe.
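
The mismatch and the temporary patch can be summarised with a tiny state model in Python. It is a deliberate simplification: the function names are hypothetical, states are plain strings, and only the single scenario from the commit message (a TtoB probe that may merge into a pending release entry) is modelled.

```python
def l1_coh_after_probe_ttob(release_in_wbq, patched):
    """State that L1 meta holds after a TtoB probe passes mainpipe s3."""
    if patched and release_in_wbq:
        # Temp patch: the merged ProbeAck will report TtoN, so meta is set to N.
        return "N"
    return "B"

def l2_view_after_probeack(release_in_wbq):
    """State that L2 assumes for L1 once the ProbeAck arrives."""
    # A probe merged into a pending release entry is acknowledged as TtoN.
    return "N" if release_in_wbq else "B"

def coherent(release_in_wbq, patched):
    """True if L1 meta and L2's view of L1 agree after the probe."""
    l1 = l1_coh_after_probe_ttob(release_in_wbq, patched)
    return l1 == l2_view_after_probeack(release_in_wbq)
```

Only the combination of a pending release entry and the unpatched design leaves the two views out of sync.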

b8f6ff86

dcache: optimize data sram read fanout (#1784) · a19ae480

Committed by William Wang on Sep 22, 2022

a19ae480

ldu: fix replay from fetch signal for missed load (#1780) · 4b7b4cc9

Committed by William Wang on Sep 12, 2022

When writing back a missed load, io.ldout.bits.uop.ctrl.replayInst
should not be overwritten by the load pipeline replay check result
`s3_need_replay_from_fetch`.

4b7b4cc9

dcache: do not use mp s2_ready to gen data_read.valid (#1756) · 774f100a

Committed by William Wang on Sep 03, 2022

* dcache: remove data read resp data_dup_0

* dcache: do not use mp s2_ready to gen data_read.valid

774f100a

MemBlock: add pipeline for reqs between lsq and uncache (#1760) · a86e4de7

Committed by zhanglinjuan on Sep 01, 2022

a86e4de7

ld,rs: optimize load-load forward timing (#1762) · 74fe3640

Committed by Yinan Xu on Sep 01, 2022

Move imm addition to stage 0.

74fe3640

ldu: remove dcache sram data from forwardData (#1754) · cc24c304

Committed by William Wang on Aug 31, 2022

forwardData for the load queue does not need data from dcache sram.
In this way, we remove the load queue data wdata fanin from all dcache
data srams.

cc24c304

Optimize buffers between L1 and L2 · 2fd089ae

Committed by Yinan Xu on Aug 30, 2022

* remove 2 buffers from l1i to l2
* add 1 buffer between l2 and xbar

Latency changes:
* L1D to L2: +1
* L1I to L2: -1
* PTW to L2: +1

2fd089ae

dcache: update sc fail assert (#1745) · dc6f6b7b

Committed by William Wang on Aug 24, 2022

Report error if sc fails too many times while
lrsc_addr === get_block_addr(s3_req.addr)

dc6f6b7b

ldu: opt dcache tag match hit for ldu timing (#1744) · 27dc8a4d

Committed by William Wang on Aug 24, 2022

27dc8a4d

ldu: select data in load_s3 (#1743) · cb9c18dc

Committed by William Wang on Aug 24, 2022

rdataVec (i.e. the sram read result merged with the forward result) is still
generated in load_s2. It will be written to the load queue in load_s2.

cb9c18dc

BankedDataArray: delay 1 cycle for writing for timing reason (#1747) · ea329fc7

Committed by zhanglinjuan on Aug 24, 2022

ea329fc7

MainPipe: fix bug in lrsc_count · 1bb97764

Committed by zhanglinjuan on Aug 16, 2022

1bb97764

MainPipe: fix fanout (#1735) · 6c7e5e86

Committed by zhanglinjuan on Aug 13, 2022

6c7e5e86

dcache: only update wbq addr when allocate (#1731) · 84026448

Committed by William Wang on Aug 11, 2022

It will remove fanout from mem_release.valid related logic

84026448

18 Nov, 2022 12 commits

l2tlb: add dup register & add blockhelper & llptw mem resp select timing optimization (#1752) · 7797f035

Committed by bugGenerator on Nov 18, 2022

This commit includes:
1. timing optimization: add dup registers and optimize the llptw mem resp select logic
2. l2tlb more fifo: add a blockhelper to help l2tlb behave more like a fifo to l1tlb, and fix some cases that cause the page cache to have duplicate entries (not all cases are covered).

* l2tlb: add duplicate reg for better fanout (#1725)

page cache has large fanout:
1. addr_low -> sel data
2. level
3. sfence
4. ecc error flush

solution, add duplicate reg:
1. sfence/csr reg
2. ecc error reg
3. memSelData
4. one hot level code

* l2tlb: fix bug of wrongly chosen req info from llptw

* l2tlb.cache: move hitCheck into StageDelay

* l2tlb: optimize mem resp data selection to ptw

* l2tlb.llptw: optimize timing for pmp check of llptw

* l2tlb.cache: move v-bits select into stageReq

* l2tlb.llptw: req that miss mem should re-access cache

* l2tlb.llptw: fix bug that mix mem_ptr and cache_ptr

* l2tlb.llptw: fix bug that lost a case for merge

* l2tlb.llptw: fix bug of state change priority

* l2tlb.prefetch: add filter buffer and perf counter

* mmu: change TimeOutThreshold to 3000

* l2tlb: ptw has highest priority to enq llptw

* l2tlb.cache: fix bug of bypassed logic

* l2tlb.llptw: fix bug that flush failed to flush pmp check

* l2tlb: add blockhelper to make l2tlb more fifo

* mmu: change TimeOutThreshold to 5000

* l2tlb: new l1tlb doesn't enter ptw directly

a corner case complement to:
commit(3158ab8f): "l2tlb: add blockhelper to make l2tlb more fifo"

7797f035

dcache: rename `dups` to `dup` · 779109e3

Committed by lixin on Aug 10, 2022

779109e3

dcache: divide meta array into nWays banks (#1723) · 93f90faa

Committed by William Wang on Aug 10, 2022

It should reduce dcache meta write fanout. Now dcache meta write
actually takes 2 cycles

93f90faa

sbuffer: opt mask clean fanout (#1720) · 8b1251e1

Committed by William Wang on Aug 10, 2022

We used to clean the mask in sbuffer in 1 cycle on sbuffer enq,
which introduced 64*16 fanout.

To reduce fanout, the mask in sbuffer is now cleaned when the dcache hit resp
comes. Cleaning the mask for a line in sbuffer takes 2 cycles.

Meanwhile, dcache reqIdWidth is also reduced from 64 to
log2Up(nEntries) max log2Up(StoreBufferSize).
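
For a feel of what the new width expression evaluates to, here is a small Python model. `log2_up` mirrors the behaviour of Chisel's `log2Up` (ceiling of log2 with a minimum of 1); the entry counts passed in the usage note are illustrative values, not the configuration used by the commit.

```python
def log2_up(x):
    """Model of Chisel's log2Up: bits needed to index x items, at least 1."""
    return max(1, (x - 1).bit_length())

def req_id_width(n_entries, store_buffer_size):
    """Width that can hold an id from either source, per the expression above."""
    return max(log2_up(n_entries), log2_up(store_buffer_size))
```

With hypothetical values of 16 entries on both sides, `req_id_width(16, 16)` is 4 bits, far narrower than the previous 64.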

This commit will not cause perf change.

8b1251e1

dcache: duplicate 3 more regs in cacheOpDecoder · 476e71e5

Committed by lixin on Aug 10, 2022

476e71e5

MainPipe: fix fanout of regs in stage 3 (#1718) · ca18e2c6

Committed by zhanglinjuan on Aug 09, 2022

ca18e2c6

lq: update paddr in lq in load_s1 and load_s2 (#1707) · 0a47e4a1

Committed by William Wang on Aug 09, 2022

Now we use 2 cycles to update paddr in lq. In this way,
paddr in lq is still valid in load_s3

0a47e4a1

dcache: duplicate cache_req_valid · 72e3aa13

Committed by lixin on Aug 09, 2022

72e3aa13

dcache: duplicate regs in cacheOpDecoder · e47fc57c

Committed by lixin on Aug 09, 2022

e47fc57c

lq: add 1 extra stage for lq data write (#1705) · 39f2ec76

Committed by William Wang on Aug 09, 2022

Now lq data is divided into 8 banks by default. A write to lq
data takes 2 cycles to finish.

Lq data will not be read for at least 2 cycles after a write, so it is ok
to add this delay. For example:
T0: update lq meta, lq data write req start
T1: lq data write finish, new wbidx selected
T2: read lq data according to new wbidx selected
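
The T0/T1/T2 timeline above can be replayed with a minimal cycle model in Python. The class is hypothetical and ignores banking; it only shows that a write requested in T0 is not visible in T1 but is visible to a read in T2.

```python
class LqData:
    WRITE_LATENCY = 2  # cycles from write request to visible data

    def __init__(self, size):
        self.data = [None] * size
        self.pending = []  # items are [remaining_cycles, index, value]

    def write(self, idx, value):
        """Start a write; the data becomes visible WRITE_LATENCY cycles later."""
        self.pending.append([self.WRITE_LATENCY, idx, value])

    def tick(self):
        """End of a cycle: age pending writes and commit the finished ones."""
        still_pending = []
        for item in self.pending:
            item[0] -= 1
            if item[0] == 0:
                self.data[item[1]] = item[2]
            else:
                still_pending.append(item)
        self.pending = still_pending

    def read(self, idx):
        return self.data[idx]
```

Calling `write` in T0 followed by two `tick()` calls brings the model to T2, where `read` returns the new value.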

39f2ec76

misc: fix nanhu lsu cherry-pick conflict · c047ef9c

Committed by William Wang on Nov 18, 2022

c047ef9c

std: add an extra pipe stage for std (#1704) · 0a992150

Committed by William Wang on Aug 06, 2022

0a992150

OpenXiangShan / XiangShan: last synced successfully 12 months ago
