Commits · 31ebfb1dd00f6473404eca1f5d9153cc53e133a7 · OpenXiangShan / XiangShan

06 Sep, 2021 1 commit

backend, rename: support elimination of move instruction whose lsrc is 0 + bug fix (#1008) · 31ebfb1d

Committed by YikeZhou on Sep 06, 2021

* backend, rename: support elimination of mv inst whose lsrc=0
[known bug] instr page fault not properly raised after sfence.vma

* backend, roq: [bug fix] don't label move-eliminated (me) instructions with exceptions as written back

31ebfb1d

05 Sep, 2021 4 commits

FPToFP: fix precision width && reuse fcmp to compute min/max (#1005) · 842f7991

Committed by Jiawei Lin on Sep 05, 2021

842f7991

backend,exu: load balance between issue ports (#947) · bd278897

Committed by Yinan Xu on Sep 05, 2021

This commit adds support for load balancing between different issue ports
when the function unit is not pipelined and the reservation station has
more than one issue port.

We use a ping-pong bit to decide which port an instruction is issued to.
The bit is flipped every clock cycle.
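
The ping-pong selection described above can be sketched as a small behavioural model. This is only an illustration, not the Chisel code from the commit: the class name `PingPongIssue`, the two-port default, and the `step` interface are assumptions made for the example.

```python
class PingPongIssue:
    """Toy model of a ping-pong bit that picks the issue port (hypothetical names)."""

    def __init__(self, num_ports=2):
        self.num_ports = num_ports
        self.bit = 0  # the ping-pong bit

    def step(self, has_instr):
        """Advance one clock cycle and return the chosen port, or None if idle."""
        port = self.bit if has_instr else None
        # The bit flips every cycle, whether or not an instruction was issued.
        self.bit = (self.bit + 1) % self.num_ports
        return port


def simulate(pattern):
    """Run the selector over a list of per-cycle 'instruction ready' flags."""
    selector = PingPongIssue()
    return [selector.step(ready) for ready in pattern]
```

Because the bit flips unconditionally, back-to-back instructions alternate between the two ports, which spreads work across non-pipelined function units.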

bd278897

mmu.l2tlb: l2tlb now supports multiple mem access at the same time (#1003) · b848eea5

Committed by Lemover on Sep 05, 2021

* mmu.l2tlb: l2tlb now supports multiple parallel mem accesses

8 miss queue (mq) entries and 1 page table walker (ptw)
each mq entry only supports leaf page table entries
ptw supports entries at all three levels

* mmu.tlb: fix bug of mq.refill_vpn and out.ready

b848eea5

utils,MaskData: assert wmask is wider than data (#1001) · 5dabf2df

Committed by Yinan Xu on Sep 05, 2021

This commit adds an assertion in MaskData to check the width of mask
and data. When the width of mask is smaller than the width of data,
(~mask & data) and (mask & data) will always clear the upper bits
of the data. This usually causes unexpected behavior.

This commit adds explicit width declarations where MaskData is used.
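
A small software model may help show why a narrow mask is dangerous. The sketch below assumes MaskData merges as `(new & mask) | (old & ~mask)` and that `~mask` is computed at the mask's own width, as described above. The helper names (`bits`, `mask_data`) and the explicit width arguments are hypothetical and exist only for this illustration.

```python
def bits(value, width):
    """Truncate value to the given bit width."""
    return value & ((1 << width) - 1)


def mask_data(old, new, mask, data_width, mask_width):
    """Merge new into old under mask, modelling fixed hardware widths."""
    mask = bits(mask, mask_width)
    # Inversion happens at the mask's width, so bits above it stay zero.
    inv_mask = bits(~mask, mask_width)
    merged = (new & mask) | (old & inv_mask)
    return bits(merged, data_width)


# Narrow 8-bit mask on 16-bit data: the upper byte of old data is lost.
narrow = mask_data(0xAABB, 0x1122, 0x0F, data_width=16, mask_width=8)
# Full-width 16-bit mask: the upper byte of old data is preserved.
full = mask_data(0xAABB, 0x1122, 0x000F, data_width=16, mask_width=16)
```

With the narrow mask the result is `0x00B2`, while the full-width mask gives `0xAAB2`, which is the kind of silent data loss the assertion is meant to catch.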

5dabf2df

04 Sep, 2021 1 commit

FMA: separate fmul/fadd/fma (#996) · 4b65fc7e

Committed by Jiawei Lin on Sep 04, 2021

* FMA: separate fadd/fmul/fma

* exu: enable fast uop out from fmacExeUnit
Co-authored-by: Yinan Xu <xuyinan@ict.ac.cn>

4b65fc7e

03 Sep, 2021 3 commits
- use ExtModule instead of Chisel3.BlackBox. (#988) · 510ae4ee
  Committed by Jiuyang Liu on Sep 03, 2021
  
  510ae4ee
- Multiplier: adjust pipeline (#993) · c3d7991b
  Committed by Jiawei Lin on Sep 03, 2021
```
* Multiplier: adjust pipeline
```
  c3d7991b
- backend,fu: add InputBuffer for fdivSqrt (#990) · 6cdd85d9
  Committed by Yinan Xu on Sep 03, 2021
```
This commit adds an 8-entry buffer for fdivSqrt function unit input.
Set hasInputBuffer to true to enable input buffers for other function
units.
```
  6cdd85d9

02 Sep, 2021 4 commits

l0tlb: add a new level tlb, a load tlb and a store tlb (#961) · a0301c0d

Committed by Lemover on Sep 02, 2021

* Revert "Revert "l0tlb: add a new level tlb to each mem pipeline (#936)" (#945)"

This reverts commit b052b972.

* fu: remove unused import

* mmu.tlb: 2 load/store pipeline has 1 dtlb

* mmu: remove btlb, the l1-tlb

* mmu: set split-tlb to 32 to check perf effect

* mmu: wrap tlb's param with TLBParameters

* mmu: add params 'useBTlb'

dtlb size is small: normal 8, super 2

* mmu.tlb: add Bundle TlbEntry, simplify tlb hit logic(coding)

* mmu.tlb: separate tlb's storage, relative hit/sfence logic

tlb now supports fully-associative, set-associative and direct-mapped storage.
more: change tlb's parameter usage, change util.Random to support
the case where mod is 1.

* mmu.tlb: support normalAsVictim, super(fa) -> normal(sa/da)

be careful when using tlb's parameters: only some parameter combinations
are supported

* mmu.tlb: fix bug of hit method and victim write

* mmu.tlb: add tlb storage's perf counter

* mmu.tlb: rewrite replace part, support set or non-set

* mmu.tlb: add param outReplace to receive out replace index

* mmu.tlb: change param superSize to superNWays

add param superNSets, which should always be 1

* mmu.tlb: change some perf counter's name and change some params

* mmu.tlb: fix bug of replace io bundle

* mmu.tlb: remove unused signal wayIdx in tlbstorageio

* mmu.tlb: separate tlb_ld/st into two 'same' tlb

* mmu.tlb: when nWays is 1, replace returns 0.U

previously, replace returned 1.U, which had no influence on refill but
was bad for the perf counters

* mmu.tlb: give tlb_ld and tlb_st a name (in waveform)
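
The three storage organisations named in this commit (fully-associative, set-associative, direct-mapped) can be viewed as one structure parameterised by the number of sets and ways. The sketch below is a generic illustration under that view, not the XiangShan implementation: the function names are hypothetical, and indexing the set by the low bits of the VPN is an assumption.

```python
def tlb_slot_candidates(vpn, n_sets, n_ways):
    """Return (set_index, ways) where an entry for this VPN may live.

    fully-associative: n_sets == 1, any way
    direct-mapped:     n_ways == 1, exactly one slot
    set-associative:   everything in between
    """
    set_index = vpn % n_sets  # assumption: index by the low VPN bits
    return set_index, list(range(n_ways))


def pick_victim(n_ways, random_value):
    """Pick a way to replace; with a single way this is always way 0."""
    return random_value % n_ways
```

Note that `pick_victim(1, x)` is 0 for any `x`, which mirrors the commit's point that a 1-way configuration should report way 0 as the replaced entry.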

a0301c0d

chore: fix frontend / memblock merge conflict · 154904ce

Committed by William Wang on Sep 02, 2021

154904ce

rs,mem: support fast load-to-load wakeup and issue (#984) · 718f8a60

Committed by Yinan Xu on Sep 02, 2021

This PR adds support for fast load-to-load wakeup and issue, which reduces load-to-load latency to 2 cycles.

A load instruction can now wake up another load instruction at load stage 1. When the producer load arrives at stage 2, the consumer load is issued to load stage 0 and uses the data from the producer to generate its load address.

In the reservation station, a load can be dequeued from stage 1 when stage 2 does not hold a valid instruction. If the fast load is not accepted, the load dequeues as normal from the next cycle on.

Timing in the reservation station (for imm read) and the load unit (for writeback data selection) is to be optimized later.

* backend,rs: issue load one cycle earlier when possible

This commit adds support for issuing load instructions one cycle
earlier if the load instruction is woken up by another load. An extra
2-bit UInt is added to IO.

* mem: add load to load addr fastpath framework

* mem: enable load to load forward

* mem: add load-load forward counter
Co-authored-by: William Wang <zeweiwang@outlook.com>

718f8a60

Rename: fix doAllocate logic in refactored version · 4efb89cb
Committed by YikeZhou on Sep 02, 2021
```
MEFreeList: remove useless code + give specified
(instead of DontCare) value to phy reg allocated port
```
4efb89cb

01 Sep, 2021 10 commits
- frontend: code clean ups · 09c6f1dd
  Committed by Lingrui98 on Sep 01, 2021
  
  09c6f1dd
- icache: add license · 290c77af
  Committed by Lingrui98 on Sep 01, 2021
```
config: remove MinimalSimConfigForFetch

bundle: code clean ups

bundle, xscore: code clean ups
```
  290c77af
- ftq: fix bpuInfo csr perf counters · 142e964c
  Committed by Lingrui98 on Sep 01, 2021
  
  142e964c
- frontend: remove deprecated code · 0659cc94
  Committed by Lingrui98 on Sep 01, 2021
  
  0659cc94
- IntToFP: support fully pipelined work mode (#983) · e174d629
  Committed by Jiawei Lin on Sep 01, 2021
```
* IntToFP: support fully pipelined mode
```
  e174d629
- Revert "mem: add load to load addr fastpath framework" · ea04bf23
  Committed by William Wang on Sep 01, 2021
```
This reverts commit e3f759ae.
```
  ea04bf23
- sbuffer: fix full eviction trigger logic · 86d8a1ad
  Committed by William Wang on Sep 01, 2021
  
  86d8a1ad
- sbuffer: add perf counter · f5aff2a7
  Committed by William Wang on Sep 01, 2021
  
  f5aff2a7
- expand ICache to 8-way 128KB. · 845af832
  Committed by JinYue on Sep 01, 2021
  
  845af832
- backend, fu: support fastUopOut for pipelined fu (#966) · b2482bc1
  Committed by Yinan Xu on Sep 01, 2021
```
This commit adds fastUopOut support for pipelined function units via
implementing fastUopOut in trait HasPipelineReg.

The following function units now support fastUopOut:
- MUL
- FMA
- F2I
- F2F
```
  b2482bc1

31 Aug, 2021 4 commits

fudian: The new floating-point lib to replace hardfloat (#975) · dc597826
Committed by Jiawei Lin on Aug 31, 2021
```
* Add submodule 'fudian'

* IntToFP: use fudian

* FMA: use fudian.CMA

* FPToInt: remove recode format
```
dc597826
ftq: fix a bug of modifying entry_hit_status too early when ifu stalls · b58d2039

Committed by Lingrui98 on Aug 31, 2021

b58d2039

Alu: optimize timing for bitmanip (#979) · 28c18878

Committed by zfw on Aug 31, 2021

* Alu: optimize timing

This pull request optimizes timing by adding a 32-bit adder for addw and changing the encoding.
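
As background for why a dedicated 32-bit adder is enough for addw: in RV64, addw adds the operands, keeps the low 32 bits, and sign-extends them to 64 bits, so the upper halves of the operands never influence the result. The sketch below checks that equivalence in software. It is only an illustration of the instruction semantics; the function names are hypothetical and it says nothing about how the Alu in this commit is actually wired.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def sext32(x):
    """Sign-extend the low 32 bits of x to a 64-bit value."""
    x &= MASK32
    return (x | ~MASK32) & MASK64 if x >> 31 else x


def addw_via_64bit(a, b):
    """addw computed from a full 64-bit sum."""
    return sext32((a + b) & MASK64)


def addw_via_32bit(a, b):
    """addw computed from a sum of the low 32 bits only."""
    return sext32((a & MASK32) + (b & MASK32))
```

Both functions agree for every pair of 64-bit inputs, because carries only propagate upward and the result discards everything above bit 31 before sign extension.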

28c18878

backend,exu: connect writeback when possible (#977) · dd381594

Committed by Yinan Xu on Aug 31, 2021

This commit optimizes ExuBlock timing by connecting writeback when
possible.

The timing priorities are RegNext(rs.fastUopOut) > fu.writeback >
arbiter.out(--> io.rfWriteback --> rs.writeback). The higher the priority,
the better the timing.

(1) When function units have exclusive writeback ports, their
wakeup ports for reservation stations can be connected directly from
function units' writeback ports. Special case: when the function unit
has fastUopOut, valid and uop should be RegNext.

(2) If the reservation station has fastUopOut for all instructions
in this exu, we should replace io.fuWriteback with RegNext(fastUopOut).
In this case, the corresponding execution units must have exclusive
writeback ports; otherwise the rs cannot ensure that the instruction
is able to write the regfile.

(3) If the reservation station has fastUopOut for all instructions in
this exu, we should replace io.rfWriteback (rs.writeback) with
RegNext(rs.wakeupOut).

dd381594

30 Aug, 2021 3 commits
- update base table update logic, update pred table and alt_pred table update logic · 9aee2f1b
  Committed by rvcoesjw on Aug 30, 2021
  
  9aee2f1b
- MEFreeList: replace "+" with "+&" in reduceTree · 90f13a3a
  Committed by YikeZhou on Aug 30, 2021
  
  90f13a3a
- Bump chisel to 3.5 (#974) · c21bff99
  Committed by Jiawei Lin on Aug 30, 2021
```
* bump chisel to 3.5

* Remove deprecated 'toBool' && disable tl monitor

* Update RocketChip / Re-enable TLMonitor

* Makefile: remove '--infer-rw'
```
  c21bff99
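
For the MEFreeList commit above, it may help to recall the difference between the two Chisel operators: `+` keeps the width of the wider operand and drops the carry, while `+&` widens the result by one bit. The software model below shows the effect in an adder tree. It assumes the tree sums 1-bit values (a population count), which is an assumption for illustration and not something the commit message states; `reduce_tree` here is a hypothetical Python stand-in, not the Chisel method.

```python
def reduce_tree(bit_values, expanding):
    """Pairwise-add values in a tree, tracking the hardware width of each node.

    expanding=False models `+`  (result width = max operand width, carry dropped)
    expanding=True  models `+&` (result width = max operand width + 1)
    """
    level = [(b, 1) for b in bit_values]  # (value, width) pairs, 1 bit each
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            (a, wa), (b, wb) = level[i], level[i + 1]
            w = max(wa, wb)
            if expanding:
                nxt.append((a + b, w + 1))
            else:
                nxt.append(((a + b) & ((1 << w) - 1), w))
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through unchanged
        level = nxt
    return level[0][0]
```

With four set bits, the truncating tree returns 0 while the expanding tree returns 4, so any count derived from a `+` tree can silently wrap.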

29 Aug, 2021 2 commits

l2tlb: mem access now takes 512 bits, 8 ptes (#973) · 5854c1ed

Committed by Lemover on Aug 29, 2021

* mmu: wrap l2tlb's param with L2TLBParameters

* mmu.l2tlb: add param blockBytes: 64, 8 ptes

* mmu.l2tlb: set l2tlb cache size to l2:256, l3:4096

* mmu.l2tlb: add config print

* mmu.l2tlb: fix bug in choosing resp data indices and optimize coding style
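
The numbers in this commit fit together as follows: a 64-byte (512-bit) memory response holding 8 PTEs means each PTE is 64 bits wide, and one of the 8 must be selected from the response. The sketch below illustrates that arithmetic. Selecting by the low 3 bits of the VPN field is an assumption for illustration, and the function names are hypothetical.

```python
BLOCK_BYTES = 64                      # blockBytes from the commit message
PTES_PER_BLOCK = 8                    # "8 ptes" from the commit message
PTE_BITS = BLOCK_BYTES * 8 // PTES_PER_BLOCK  # 512 / 8 = 64 bits per PTE


def split_block(block):
    """Split a 512-bit response into 8 PTEs, least significant first."""
    mask = (1 << PTE_BITS) - 1
    return [(block >> (PTE_BITS * i)) & mask for i in range(PTES_PER_BLOCK)]


def select_pte(block, vpn_field):
    """Pick one PTE from the block using the low bits of the VPN field.

    Assumption: the 8 PTEs are consecutive entries of one page table,
    so the low log2(8) = 3 bits of the VPN field index into the block.
    """
    return split_block(block)[vpn_field % PTES_PER_BLOCK]
```

For example, a VPN field of `0b101101` selects entry 5 of the block.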

5854c1ed

rs,bypass: add left and right bypass strategy (#971) · 605f31fc

Committed by Yinan Xu on Aug 29, 2021

* rs,bypass: remove optBuf for valid bits

* rs,bypass: add left and right bypass strategy

This commit adds another bypass network implementation to optimize timing of the first stage of function units.

In BypassNetworkLeft, we bypass data at the same cycle that function units write data back. This increases the length of the critical path of the last stage of function units but reduces the length of the critical path of the first stage of function units. Some function units that require a shorter stage zero, like LOAD, may use BypassNetworkLeft.

In this commit, we set all bypass networks to the left style, but we will make it configurable depending on different function units in the future.

605f31fc

28 Aug, 2021 4 commits

rs,age: optimize timing for output (#970) · 9bc8f3e1

Committed by Yinan Xu on Aug 28, 2021

This commit changes how io.out is computed for age detector. We use a
register to keep track of the position of the oldest instruction. Since
the updating information has better timing than issue, this could
optimize the timing of issue logic.

9bc8f3e1

tage-sc: fix performance bugs · f2a26b84

Committed by Lingrui98 on Aug 28, 2021

* modify UBitPeriod to one-eighth of the previous value to adapt
  to nRows enlarged by eight times
* fix a bug assigning sc update mask

f2a26b84

bpu: add redirect logic between stages for circumstances where directions... · c14b8e27
Committed by Lingrui98 on Aug 28, 2021
```
bpu: add redirect logic between stages for circumstances where directions differ but targets remain the same
```
c14b8e27
ubtb: add update bypass reg to avoid multiple hits at prediction · 72751938

Committed by Lingrui98 on Aug 27, 2021

72751938

27 Aug, 2021 4 commits

ftq: add perf counter for predecode redirect · c92646b5

Committed by Lingrui98 on Aug 27, 2021

c92646b5

rs,age: use less registers for age matrix (#964) · 38683dba

Committed by Yinan Xu on Aug 27, 2021

This commit reduces register usage in the age detector by using only
the upper triangle of the matrix. Since the age matrix is antisymmetric,
age(i)(j) equals !age(j)(i). Besides, age(i)(i) is the same as valid(i).
Thus, we also remove validVec in this commit.
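
The idea can be sketched in software as a matrix that stores only entries with row <= column. This is a behavioural illustration, not the Chisel code: the class and method names are hypothetical, and the convention that `older(i, j)` being true means entry i is older than entry j is an assumption.

```python
class UpperAgeMatrix:
    """Age matrix that keeps only the upper triangle plus the diagonal."""

    def __init__(self, n):
        self.n = n
        # Only m[i][j] with i <= j is ever read or written.
        self.m = [[False] * n for _ in range(n)]

    def valid(self, i):
        return self.m[i][i]  # the diagonal doubles as the valid bit

    def older(self, i, j):
        """True if entry i is older than entry j (i != j)."""
        return self.m[i][j] if i < j else not self.m[j][i]

    def _set_older(self, i, j, value):
        if i < j:
            self.m[i][j] = value
        else:
            self.m[j][i] = not value  # lower triangle is the negated mirror

    def enqueue(self, i):
        for j in range(self.n):
            if j != i:
                self._set_older(i, j, False)  # a newcomer is younger than all
        self.m[i][i] = True

    def dequeue(self, i):
        self.m[i][i] = False

    def oldest(self):
        """Index of the oldest valid entry, or None if the queue is empty."""
        for i in range(self.n):
            if self.valid(i) and all(
                j == i or not self.valid(j) or self.older(i, j)
                for j in range(self.n)
            ):
                return i
        return None
```

This stores n(n+1)/2 bits instead of a full n-by-n matrix plus a separate valid vector, which is the saving the commit describes.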

38683dba

backend,fu: allow early arbitration via fastUopOut (#962) · f83b578a

Committed by Yinan Xu on Aug 27, 2021

This commit adds a fastUopOut option to function units. This allows the
function units to give valid and uop one cycle before their output data is
ready. FastUopOut lets writeback arbitration happen one cycle before
data is ready and helps optimize the timing.

Since some function units are not ready for this new feature, this
commit adds a fastImplemented option to allow function units to have
fastUopOut but the data is still at the same cycle as uop. This option
will delay the data for one cycle and may cause performance degradation.
FastImplemented should be true after function units support fastUopOut.

f83b578a

ftb, ubtb: only store lower bits of target · e6231032

Committed by Lingrui98 on Aug 27, 2021

e6231032

OpenXiangShan / XiangShan synced successfully 10 months ago