- 21 Nov 2022, 1 commit
-
-
Committed by William Wang
-
- 19 Nov 2022, 20 commits
-
-
Committed by William Wang
-
Committed by William Wang
-
Committed by William Wang
-
Committed by William Wang
According to the RISC-V manual, exception code 14 is reserved. See https://github.com/OpenXiangShan/NEMU/commit/9800da6a5e660dae5411c9b303833bc84bc04db4
-
Committed by William Wang
* atom: fix atom inst storeAccessFault gen logic
* atom, pmp: atom access !r addr should raise SAF
* atom: lr should raise load access fault
-
Committed by William Wang
* chore: fix WBQEntryReleaseUpdate bundle naming

  There is no real hardware change.

* dcache: fix replace & probeAck TtoB perm problem

  When dcache replaces a cacheline, it moves that cacheline's data to the writeback queue and waits until refill data comes. When refill data arrives, it writes the dcache data array and updates meta for that cacheline, then wakes up the cacheline release req and writes the data to the L2 cache.

  In the previous design, if a probe request arrived before the real L1-to-L2 release req, it could be merged into the same writeback queue entry. The probe req would update dcache meta in mainpipe s3, then be merged into the writeback queue. However, for a probe TtoB req, the following problem may happen:

  1) a replace req waits for refill in writeback queue entry X
  2) probe TtoB req enters mainpipe s3, sets cacheline coh to B
  3) probe TtoB req is merged into writeback queue entry X
  4) writeback queue entry X is woken up and probeacks immediately (TtoN)
  5) refill data for the replace req comes from L2; a refill req enters mainpipe and updates dcache meta (sets the replaced cacheline's coh to N)

  Between 4) and 5), L2 thinks that L1's coh is N, but L1's coh is actually B: here comes the problem.

  Temp patch for nanhu: now we let all probe reqs do an extra check. If a req is a TtoB probe and the corresponding cacheline release req is already in the writeback queue, we set the dcache meta coh to N. As we do set-block in the dcache mainpipe, this check can be done safely while the probe req is in the mainpipe.
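The race and the temp patch above can be sketched as a tiny state model. This is illustrative only (not XiangShan RTL): coherence permissions are modeled as T > B > N in TileLink style, and all names here are assumptions.

```python
# Toy model of the probe TtoB race described in the commit message.
# T (trunk) > B (branch) > N (none); numbers only encode the ordering.
T, B, N = 2, 1, 0

def coh_after_probeack(patched: bool):
    l1_coh = T                  # cacheline being replaced still reads T in meta
    l2_view = T                 # what L2 believes L1 holds
    pending_release = True      # 1) replace req waits for refill in WBQ entry X
    l1_coh = B                  # 2) probe TtoB in mainpipe s3 sets coh to B
    if patched and pending_release:
        l1_coh = N              # temp patch: TtoB probe + pending release -> N
    l2_view = N                 # 3)+4) probe merged into entry X, entry woken
                                #       up, probeack sent immediately (TtoN)
    # 5) (refill later sets the replaced line's meta to N)
    return l1_coh, l2_view      # state in the window between 4) and 5)
```

Without the patch the window shows L1 at B while L2 believes N; with the patch both sides agree on N before the probeack leaves.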
-
Committed by William Wang
-
Committed by William Wang
When writing back a missed load, io.ldout.bits.uop.ctrl.replayInst should not be overwritten by the load pipeline replay check result `s3_need_replay_from_fetch`.
-
Committed by William Wang
* dcache: remove data read resp data_dup_0
* dcache: do not use mp s2_ready to gen data_read.valid
-
Committed by zhanglinjuan
-
Committed by Yinan Xu
Move imm addition to stage 0.
-
Committed by William Wang
forwardData for the load queue does not need data from the dcache SRAM. In this way, we remove the load queue data wdata fanin from all dcache data SRAMs.
-
Committed by Yinan Xu
* remove 2 buffers from l1i to l2
* add 1 buffer between l2 and xbar

Latency changes:
* L1D to L2: +1
* L1I to L2: -1
* PTW to L2: +1
-
Committed by William Wang
Report an error if SC fails too many times while lrsc_addr === get_block_addr(s3_req.addr).
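A minimal sketch of such an SC-failure watchdog, not the real RTL: the threshold, the 64-byte block size, and all names here are assumptions for illustration.

```python
BLOCK_OFFSET_BITS = 6                  # assumed 64B cache blocks

def get_block_addr(addr: int) -> int:
    return addr >> BLOCK_OFFSET_BITS

class ScFailWatchdog:
    """Counts consecutive SC failures to the currently reserved block."""
    def __init__(self, threshold: int = 16):   # threshold is an assumption
        self.threshold = threshold
        self.fail_count = 0

    def on_sc(self, ok: bool, lrsc_addr: int, req_addr: int) -> bool:
        # Only count failures while the reservation covers this block
        if ok or lrsc_addr != get_block_addr(req_addr):
            self.fail_count = 0
            return False
        self.fail_count += 1
        return self.fail_count >= self.threshold   # True => report error

w = ScFailWatchdog(threshold=3)
addr = 0x80001000
results = [w.on_sc(False, get_block_addr(addr), addr) for _ in range(3)]
```

With a threshold of 3, the third consecutive failure to the reserved block trips the error report.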
-
Committed by William Wang
-
Committed by William Wang
rdataVec (i.e. the SRAM read result merged with the forward result) is still generated in load_s2. It will be written to the load queue in load_s2.
-
Committed by zhanglinjuan
-
Committed by zhanglinjuan
-
Committed by zhanglinjuan
-
Committed by William Wang
It removes fanout from mem_release.valid related logic.
-
- 18 Nov 2022, 19 commits
-
-
Committed by lixin
-
Committed by William Wang
It should reduce dcache meta write fanout. dcache meta write now takes 2 cycles.
-
Committed by William Wang
We used to clear the mask in the sbuffer in 1 cycle during sbuffer enqueue, which introduced a 64*16 fanout. To reduce fanout, the mask in the sbuffer is now cleared when the dcache hit resp comes. Clearing the mask for a line in the sbuffer takes 2 cycles. Meanwhile, dcache reqIdWidth is also reduced from 64 to log2Up(nEntries) max log2Up(StoreBufferSize). This commit should not cause a perf change.
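The arithmetic behind both numbers can be checked directly. This is a sketch under assumed parameter values (16 sbuffer lines, 64 mask bits per line, and illustrative nEntries/StoreBufferSize of 16 each); `log2up` mimics Chisel's `log2Up`.

```python
from math import ceil, log2

def log2up(n: int) -> int:
    """Bits needed to index n entries, like Chisel's log2Up (min 1)."""
    return max(1, ceil(log2(n)))

# Clearing every mask bit in one cycle drives 64 bits x 16 lines
# from one piece of control logic: the "64*16" fanout above.
lines, mask_bits_per_line = 16, 64
one_cycle_fanout = lines * mask_bits_per_line

# reqIdWidth shrinks from 64 bits to just enough to name a request.
n_entries = 16          # illustrative value, not the real config
store_buffer_size = 16  # illustrative value
new_req_id_width = max(log2up(n_entries), log2up(store_buffer_size))
```

With these assumed sizes the req id drops from 64 bits to 4, a 16x reduction in that field alone.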
-
Committed by lixin
-
Committed by zhanglinjuan
-
Committed by William Wang
Now we use 2 cycles to update paddr in the lq. In this way, paddr in the lq is still valid in load_s3.
-
Committed by lixin
-
Committed by lixin
-
Committed by William Wang
Now lq data is divided into 8 banks by default. A write to lq data takes 2 cycles to finish. Lq data will not be read within at least 2 cycles after a write, so it is ok to add this delay. For example:

T0: update lq meta, lq data write req starts
T1: lq data write finishes, new wbidx selected
T2: read lq data according to the newly selected wbidx
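A hypothetical bank-select for the 8-banked layout, as a sketch only: the real RTL may bank by a different index or hash, and `bank_of` is not a XiangShan name.

```python
N_BANKS = 8   # default bank count from the message above

def bank_of(lq_idx: int) -> int:
    # Assumed mapping: low bits of the load queue index pick the bank,
    # so consecutive entries land in different banks and a 2-cycle
    # write in one bank does not block writes to the others.
    return lq_idx % N_BANKS
```

Consecutive lq indices thus rotate through all 8 banks before reusing one.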
-
Committed by William Wang
-
Committed by William Wang
-
Committed by zhanglinjuan
-
Committed by happy-lx
-
Committed by lixin
* pipelineReg in miss queue
* translated_cache_req_opCode and io_cache_req_valid_reg in cacheOpDecoder
* r_way_en_reg in bankedDataArray
-
Committed by William Wang
This commit adds an extra cycle for writeback queue data and mask writes. For now, there are 18 writeback queue entries. Each entry has a 512-bit data reg and a 64-bit mask reg. If we updated writeback queue data in 1 cycle, the fanout would be at least 18 x (512+64) = 10368. Now the writeback queue req meta update is unchanged; however, data and mask updates happen 1 cycle after req fire or release update fire (T0). In T0, data and mask are written to a buffer in the writeback queue. In T1, s_data_merge or s_data_override in each writeback queue entry is used as the data and mask wen.
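The fanout figure quoted above follows directly from the entry count and register widths:

```python
# Writeback queue sizes as stated in the commit message.
entries, data_bits, mask_bits = 18, 512, 64

# Driving every entry's data+mask register from one write port in a
# single cycle loads that port with all of these bits at once.
one_cycle_fanout = entries * (data_bits + mask_bits)
```

Buffering data and mask for one cycle means only the per-entry write enables (s_data_merge / s_data_override) fan out in the critical cycle instead.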
-
Committed by William Wang
-
Committed by William Wang
-
Committed by William Wang
This commit adds an extra cycle for miss queue store data and mask writes. For now, there are 16 miss queue entries. Each entry has a 512-bit store data reg and a 64-bit store mask reg. If we updated miss queue data in 1 cycle, the fanout would be at least 16 x (512+64) = 9216. Now the miss queue req meta update is unchanged; however, store data and mask updates happen 1 cycle after primary fire or secondary fire (T0). In T0, store data and mask are written to a buffer in the miss queue. In T1, s_write_storedata in each miss queue entry is used as the store data and mask wen. Miss queue entry data organization is also optimized: the 512-bit req.store_data is removed from each miss queue entry, which should save 8192 bits in total.
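Both figures quoted above (the single-cycle fanout and the register savings) check out from the stated sizes:

```python
# Miss queue sizes as stated in the commit message.
entries, data_bits, mask_bits = 16, 512, 64

# Single-cycle update of every entry's store data + mask registers.
one_cycle_fanout = entries * (data_bits + mask_bits)

# Removing the per-entry 512-bit req.store_data register saves:
bits_saved = entries * data_bits
```

The saving comes from holding one shared store-data buffer instead of a full 512-bit copy per entry.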
-
Committed by William Wang
-