Commits · 3ad99c7ff9c928dc40cc7d4b0ec7ca0c83ec26eb · OpenXiangShan / XiangShan

11 Sep, 2021 1 commit
- BPU: Remove the false_hit_fix branch from the list of auto-run ci · 3ad99c7f
  Committed by zoujr on Sep 11, 2021
10 Sep, 2021 3 commits

BPU: Fix bug that false hit in coremark 10 · 7f36ad77
Committed by zoujr on Sep 10, 2021

Use HuanCun instead of block-inclusive-cache (#1016) · a1ea7f76

Committed by Jiawei Lin on Sep 10, 2021

* misc: add submodule huancun

* huancun: integrate huancun to SoC as L3

* remove l2prefetcher

* update huancun

* Bump HuanCun

* Use HuanCun instead of old L2/L3

* bump huancun

* bump huancun

* Set L3NBanks to 4

* Update rocketchip

* Bump huancun

* Bump HuanCun

* Optimize debug configs

* Configs: fix L3 bug

* Add TLLogger

* TLLogger: fix release ack address

* Support write prefix into database

* Recording more tilelink info

* Add a database output format converter

* missqueue: add difftest port for memory difftest during refill

* misc: bump difftest

* misc: bump difftest & huancun

* missqueue: do not check refill data when get Grant

* Add directory debug tool

* config: increase client dir size for non-inclusive cache

* Bump difftest and huancun

* Update l2/l3 cache configs

* Remove deprecated fpga/*

* Remove cache test

* Remove L2 prefetcher

* bump huancun

* Params: turn on l2 prefetch by default

* misc: remove duplicate chisel-tester2

* misc: remove sifive inclusive cache

* bump difftest

* bump huancun

* config: use 4MB L3 cache

* bump huancun

* bump difftest

* bump difftest
Co-authored-by: wangkaifan <wangkaifan@ict.ac.cn>
Co-authored-by: TangDan <tangdan@ict.ac.cn>

backend, rs: parallelize selection and data read (#1018) · 66c2a07b

Committed by Yinan Xu on Sep 10, 2021

This commit changes how uop and data are read in reservation stations.
It helps the issue timing.

Previously, we accessed the payload array and data array only after deciding
which instructions to issue. This serialized issue selection and array
access and put both on the critical path.

In this commit, we add one more read port to payload array and data
array. This extra read port is for the oldest instruction. We decide
whether to issue the oldest instruction and read uop/data
simultaneously. This change reduces the critical path to each selection
logic + read + Mux (previously it's selection + arbitration + read).

Variable oldestOverride indicates whether we choose the oldest ready
instruction instead of the normal selection. An oldestFirst option is
added to RSParams to parameterize whether we need the age logic. By
default, it is set to true unless the RS is for the ALU. If the ALU RS meets
timing with the age logic enabled, we will enable it later.
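
The parallel read described above can be sketched as a small behavioral model. This is a Python illustration, not the Chisel code from the commit: `Entry`, `select_normal`, `oldest_index`, and `issue_one` are hypothetical names, and the "first ready entry" policy for normal selection is an assumption. Only `oldest_override` and `oldest_first` mirror names from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    age: int       # assumed encoding: smaller value means older
    ready: bool
    payload: str   # stands in for the uop and its operand data


def select_normal(entries: List[Entry]) -> Optional[int]:
    """Hypothetical normal selection: the first ready entry by index."""
    for i, e in enumerate(entries):
        if e.ready:
            return i
    return None


def oldest_index(entries: List[Entry]) -> Optional[int]:
    """Index of the oldest entry, whether or not it is ready."""
    if not entries:
        return None
    return min(range(len(entries)), key=lambda i: entries[i].age)


def issue_one(entries: List[Entry], oldest_first: bool = True) -> Optional[str]:
    # Read port 1: payload of the normally selected entry.
    normal = select_normal(entries)
    normal_read = entries[normal].payload if normal is not None else None

    # Extra read port: payload of the oldest entry, read in parallel with
    # the decision on whether the oldest entry may issue.
    oldest = oldest_index(entries)
    oldest_read = entries[oldest].payload if oldest is not None else None
    oldest_override = (
        oldest_first and oldest is not None and entries[oldest].ready
    )

    # Final Mux between the two values that were already read.
    return oldest_read if oldest_override else normal_read
```

With `oldest_first=False` the model degenerates to the normal selection only, which corresponds to the default the message describes for the ALU RS.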

09 Sep, 2021 3 commits

mmu.l2tlb: partially rewrite fsm and miss queue for bug and optimization (#1007) · cc5a5f22

Committed by Lemover on Sep 09, 2021

* mmu.l2tlb: l2tlb now support multiple parallel mem accesses

8 missqueue entry and 1 page table worker
mq entry only supports page leaf entry
ptw supports all the three level entries

* mmu.tlb: fix bug of mq.refill_vpn and out.ready

* mmu.tlb: fix bug of perf counter

* mmu.tlb: l2tlb's l3 now 128 sets and 4 ways

* mmu.tlb: miss queue now will 'merge' same mem req addr

* mmu.l2tlb: ptw doesn't access last level pte

* mmu.l2tlb: add mem req mask into ptw

func block_decoupled doesn't work well and has bug in signal ready

* mmu.l2tlb: fix bug of sfence to fsm

add a new state s_check_pte to ptw
fsm now take memPte from outside, doesn't store it inside
mem_resp_valid will arrive a cycle before mem_resp_data

* mmu.l2tlb: rm some state in fsm

* mmu.tlb: set itlb default size

* mmu.l2tlb: unknown mq wait bug, change code style to avoid it

* mmu.l2tlb: opt, mq's entry with cache_l3 would not be blocked

* mmu.l2tlb: add many time out assert

* mmu.l2tlb: fix bug of mq enq state change & wait_id

* Revert "mmu.tlb: l2tlb's l3 now 128 sets and 4 ways"

This reverts commit 216e4192e4b01e68ce5502135318bc2473434907.

* Revert "mmu.tlb: set itlb default size"

This reverts commit 670bf1e408384964c601c0a55defbc767eb80698.

* mmu.l2tlb: set miss queue size to 9 and set filter size to 8

if they are equal, itlb may lose its req


backend: support instruction fusion cases (#1011) · 88825c5c

Committed by Yinan Xu on Sep 09, 2021

This commit adds some simple instruction fusion cases in decode stage.
Currently we only implement instruction pairs that can be fused into
RV64GCB instructions.

Instruction fusions are detected in the decode stage by FusionDecoder.
The decoder checks every two instructions and marks the first
instruction fused if they can be fused into one instruction. The second
instruction is removed by setting the valid field to false.

Simple fusion cases include sh1add, sh2add, sh3add, sexth, zexth, etc.

Currently, ftq in frontend needs every instruction to commit. However,
the second instruction is removed from the pipeline and will not commit.
To solve this issue, we temporarily add more bits to isFused to indicate
the offset diff of the two fused instructions. There are four
possibilities now. This feature may be removed later.

This commit also adds more instruction fusion cases that need changes
in both the decode stage and the function units. In this commit, we add
some opcodes to the function units and fuse the new instruction pairs
into these new internal uops.

The list of opcodes we add in this commit is shown below:
- szewl1: `slli r1, r0, 32` + `srli r1, r0, 31`
- szewl2: `slli r1, r0, 32` + `srli r1, r0, 30`
- byte2: `srli r1, r0, 8` + `andi r1, r1, 255`
- sh4add: `slli r1, r0, 4` + `add r1, r1, r2`
- sr30add: `srli r1, r0, 30` + `add r1, r1, r2`
- sr31add: `srli r1, r0, 31` + `add r1, r1, r2`
- sr32add: `srli r1, r0, 32` + `add r1, r1, r2`
- oddadd: `andi r1, r0, 1` + `add r1, r1, r2`
- oddaddw: `andi r1, r0, 1` + `addw r1, r1, r2`
- orh48: mask off the first 16 bits and or with another operand
         (`andi r1, r0, -256` + `or r1, r1, r2`)
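
To illustrate what fusing a pair into one internal uop means, the sketch below checks in Python that a few of the listed two-instruction sequences compute the same value as a single combined operation. It is a software model on unsigned 64-bit values, not the decoder or function-unit code; the helper names (`slli`, `seq_sh4add`, and so on) are made up for this example. The szewl and orh48 cases are left out because their exact semantics are not spelled out here.

```python
MASK64 = (1 << 64) - 1  # RV64 registers modelled as unsigned 64-bit values


def slli(x, sh):
    return (x << sh) & MASK64


def srli(x, sh):
    return (x & MASK64) >> sh


def andi(x, imm):
    return x & (imm & MASK64)


def add(a, b):
    return (a + b) & MASK64


# The original two-instruction sequences from the list above.
def seq_sh4add(r0, r2):
    return add(slli(r0, 4), r2)


def seq_sr32add(r0, r2):
    return add(srli(r0, 32), r2)


def seq_oddadd(r0, r2):
    return add(andi(r0, 1), r2)


def seq_byte2(r0):
    return andi(srli(r0, 8), 255)


# The same results computed as one fused operation.
def sh4add(r0, r2):
    return ((r0 << 4) + r2) & MASK64


def sr32add(r0, r2):
    return ((r0 >> 32) + r2) & MASK64


def oddadd(r0, r2):
    return ((r0 & 1) + r2) & MASK64


def byte2(r0):
    return (r0 >> 8) & 0xFF
```

For example, `sh4add(3, 5)` and `seq_sh4add(3, 5)` both give 53.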

Furthermore, this commit adds some complex instruction fusion cases to
the decode stage and function units. The complex instruction fusion cases
are detected after the instructions are decoded into uop and their
CtrlSignals are used for instruction fusion detection.

We add the following complex instruction fusion cases:
- addwbyte: addw and mask it with 0xff (extract the first byte)
- addwbit: addw and mask it with 0x1 (extract the first bit)
- logiclsb: logic operation and mask it with 0x1 (extract the first bit)
- mulw7: andi 127 and mulw instructions.
        The input to mul is ANDed with 0x7f if the mulw7 bit is set to true.
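
The complex cases can be read as an existing operation followed or preceded by a mask. The Python sketch below models that reading on 64-bit values. It is an interpretation of the descriptions above, not the function-unit code: the helper names are hypothetical, and applying the 0x7f mask to the first operand of `mulw7` is an assumption, since the message does not say which input is masked.

```python
import operator

MASK64 = (1 << 64) - 1


def sext32(x):
    """Sign-extend the low 32 bits of x to a Python int."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def addw(a, b):
    # RV64 addw: 32-bit add, result sign-extended to 64 bits.
    return sext32(a + b) & MASK64


def mulw(a, b):
    # RV64 mulw: low 32 bits of the product, sign-extended to 64 bits.
    return sext32(a * b) & MASK64


def addwbyte(a, b):
    return addw(a, b) & 0xFF   # keep the lowest byte


def addwbit(a, b):
    return addw(a, b) & 0x1    # keep the lowest bit


def logiclsb(op, a, b):
    return op(a, b) & 0x1      # any logic op, then keep the lowest bit


def mulw7(a, b):
    # Assumption: the mask applies to the first operand.
    return mulw(a & 0x7F, b)
```

As a usage example, `logiclsb(operator.xor, 3, 2)` evaluates to 1.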

mmu.tlb: set itlb's and l2tlb's size (#1014) · fa086d5e
Committed by Lemover on Sep 09, 2021
```
* mmu.tlb: l2tlb's l3 now 128 sets and 4 ways

* mmu.tlb: set itlb default size
```

08 Sep, 2021 1 commit
- alu, decode: fix alu instruction and change instruction name (#1012) · 0a6fa50e
  Committed by zfw on Sep 08, 2021
```
* Alu: fix andn, orn, xnor

* Decode: change instruction name
```
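
For reference, the three instructions named in this fix are defined in the RISC-V bit-manipulation extension as plain bitwise operations with one inverted term. The Python sketch below states those definitions on 64-bit values; it does not reproduce the bug, which the commit message does not describe.

```python
MASK64 = (1 << 64) - 1


def andn(rs1, rs2):
    """rs1 AND (NOT rs2)."""
    return rs1 & ~rs2 & MASK64


def orn(rs1, rs2):
    """rs1 OR (NOT rs2)."""
    return (rs1 | ~rs2) & MASK64


def xnor(rs1, rs2):
    """NOT (rs1 XOR rs2)."""
    return ~(rs1 ^ rs2) & MASK64
```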
06 Sep, 2021 4 commits

Merge pull request #1002 from OpenXiangShan/decoupled-frontend · 31e152ef
Committed by Steve Gou on Sep 06, 2021
```
add new ittage indirect target predictor
```
Merge pull request #987 from OpenXiangShan/fast-refill · 0292440a
Committed by William Wang on Sep 06, 2021
```
dcache,lq: make dcache to lq refill faster
```

exu: select RegNext(fflags) if fastNotImplemented (#1006) · 698b404a

Committed by Yinan Xu on Sep 06, 2021

This commit assigns exu.io.out.fflags to RegNext(fu.io.fflags) if the
function unit has fastUopOut but has not implemented it. Previously, this
case caused a bug where fflags could arrive one cycle earlier than expected.

This commit also removes the extra logic in FmacExeUnit and
FmiscExeUnit. They are exactly the same as ExeUnit now.


backend, rename: support elimination of move instruction whose lsrc is 0 + bug fix (#1008) · 31ebfb1d

Committed by YikeZhou on Sep 06, 2021

* backend, rename: support elimination of mv inst whose lsrc=0
[known bug] instr page fault not properly raised after sfence.vma

* backend, roq: [bug fix] won't label me with exception as writebacked

05 Sep, 2021 5 commits

FPToFP: fix precision width && reuse fcmp to compute min/max (#1005) · 842f7991
Committed by Jiawei Lin on Sep 05, 2021

Merge remote-tracking branch 'origin/master' into decoupled-frontend · d392ebe5
Committed by Lingrui98 on Sep 05, 2021

backend,exu: load balance between issue ports (#947) · bd278897

Committed by Yinan Xu on Sep 05, 2021

This commit adds support for load balancing between different issue ports
when the function unit is not pipelined and the reservation station has
more than one issue port.

We use a ping pong bit to decide which port to issue the instruction. At
every clock cycle, the bit is flipped.
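
A minimal Python model of the ping-pong scheme, assuming two issue ports: the bit picks the port for whatever issues in the current cycle and is flipped every cycle, whether or not anything issued. The class and method names are hypothetical and only illustrate the behavior described above.

```python
class PingPongIssue:
    """Behavioral model of a two-port ping-pong selector."""

    def __init__(self):
        self.bit = 0

    def cycle(self, has_instruction):
        """Advance one clock cycle; return the chosen port or None."""
        port = self.bit if has_instruction else None
        self.bit ^= 1  # flipped at every clock cycle, even when idle
        return port
```

Because the bit flips unconditionally, the port an instruction lands on depends on the cycle in which it issues rather than on how many instructions each port has received so far.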


mmu.l2tlb: l2tlb now supports multiple mem access at the same time (#1003) · b848eea5

Committed by Lemover on Sep 05, 2021

* mmu.l2tlb: l2tlb now support multiple parallel mem accesses

8 missqueue entry and 1 page table worker
mq entry only supports page leaf entry
ptw supports all the three level entries

* mmu.tlb: fix bug of mq.refill_vpn and out.ready


utils,MaskData: assert wmask is wider than data (#1001) · 5dabf2df

Committed by Yinan Xu on Sep 05, 2021

This commit adds assertion in MaskData to check the width of mask
and data. When the width of mask is smaller than the width of data,
(~mask & data) and (mask & data) will always clear the upper bits
of the data. This usually causes unexpected behavior.

This commit adds explicit width declarations where MaskData is used.
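
The failure mode can be reproduced with plain integers. The sketch below is a Python model, not the Chisel utility: it assumes the usual merge form `(new & mask) | (old & ~mask)` and models the inversion at the mask's own width, which is what makes a narrow mask drop the upper data bits. The function names and the `data_width`/`mask_width` parameters are hypothetical.

```python
def mask_data_unchecked(old, new, mask, mask_width):
    """Merge new into old under mask, with no width check."""
    full = (1 << mask_width) - 1
    m = mask & full
    not_m = ~m & full  # inversion happens at the mask's width
    return (new & m) | (old & not_m)


def mask_data(old, new, mask, data_width, mask_width):
    """Same merge, guarded by a width check like the one the commit adds."""
    assert mask_width >= data_width, "mask is narrower than data"
    return mask_data_unchecked(old, new, mask, mask_width)
```

With 16-bit data `old = 0xABCD`, `new = 0x1234` and an 8-bit mask `0x0F`, the unchecked version returns `0xC4`: the upper byte `0xAB` of the old data is lost. With a 16-bit mask `0x000F` the result is `0xABC4`.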

04 Sep, 2021 3 commits
- Makefile: add '--gen-mem-verilog' (#1000) · dfc810ae
  Committed by Jiawei Lin on Sep 04, 2021
```
* Makefile: add '--gen-mem-verilog'
```
- FMA: separate fmul/fadd/fma (#996) · 4b65fc7e
  Committed by Jiawei Lin on Sep 04, 2021
```
* FMA: separate fadd/fmul/fma

* exu: enable fast uop out from fmacExeUnit

Co-authored-by: Yinan Xu <xuyinan@ict.ac.cn>
```
- Merge remote-tracking branch 'origin/master' into decoupled-frontend · 9eb7e915
  Committed by Lingrui98 on Sep 04, 2021
03 Sep, 2021 15 commits
- use ExtModule instead of Chisel3.BlackBox. (#988) · 510ae4ee
  Committed by Jiuyang Liu on Sep 03, 2021
- Merge remote-tracking branch 'origin/gen-sram-conf' into decoupled-frontend · 03ebac49
  Committed by Lingrui98 on Sep 03, 2021
- Makefile: add '--gen-mem-verilog' · a9bb1d5a
  Committed by LinJiawei on Sep 03, 2021
- parameters: ras size 32, btb size 4096 · ba4cf515
  Committed by Lingrui98 on Sep 03, 2021
- Merge remote-tracking branch 'origin/master' into fast-refill · b460b7e4
  Committed by William Wang on Sep 03, 2021
- Merge pull request #992 from OpenXiangShan/decoupled-frontend-indirect · 1c8d55c9
  Committed by Steve Gou on Sep 03, 2021
```
frontend: add ittage indirect predictor
```
- frontend: ittage: switch to full length jmp target · e5d060c1
  Committed by Guokai Chen on Sep 03, 2021
- bundle: add a full target in update bundle · abdbe4b7
  Committed by Lingrui98 on Sep 03, 2021
- frontend: ittage fix update valid condition · b0ac2a69
  Committed by Guokai Chen on Sep 03, 2021
- bundle: add a full target in update bundle · 8ffcd86a
  Committed by Lingrui98 on Sep 03, 2021
- Multiplier: adjust pipeline (#993) · c3d7991b
  Committed by Jiawei Lin on Sep 03, 2021
```
* Multiplier: adjust pipeline
```
- Merge pull request #923 from OpenXiangShan/vaddr-fwd · 12233653
  Committed by William Wang on Sep 03, 2021
```
mem: use vaddr based store to load forward for better timing
```
- backend,fu: add InputBuffer for fdivSqrt (#990) · 6cdd85d9
  Committed by Yinan Xu on Sep 03, 2021
```
This commit adds an 8-entry buffer for fdivSqrt function unit input.
Set hasInputBuffer to true to enable input buffers for other function
units.
```
- frontend: add ittage indirect predictor · 60f966c8
  Committed by Guokai Chen on Sep 03, 2021
- ftq: modify jmpTarget in FtbEntry whenever jalr target changes · 3bcae573
  Committed by Lingrui98 on Sep 03, 2021
```
* Previously we only modified jmpTarget on misprediction, because we only
  used the ftb to predict jalr targets. However, with an indirect branch
  predictor present, an indirect branch can be correctly predicted even
  when the target in the ftb entry is wrong.
```
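
The difference between the two update policies can be shown with a small Python model. It is only a sketch of the reasoning in the message: `FtbEntry` here is a stand-in dataclass with a single field, and `update_on_mispredict` and `update_on_change` are hypothetical names for the old and new behavior.

```python
from dataclasses import dataclass


@dataclass
class FtbEntry:
    jmp_target: int


def update_on_mispredict(entry, actual_target, mispredicted):
    """Old policy: refresh the stored target only after a misprediction."""
    if mispredicted:
        entry.jmp_target = actual_target


def update_on_change(entry, actual_target):
    """New policy: refresh whenever the resolved jalr target differs."""
    if entry.jmp_target != actual_target:
        entry.jmp_target = actual_target
```

Suppose the entry holds a stale target 0x1000 while the indirect predictor correctly predicts 0x2000. There is no misprediction, so the old policy leaves 0x1000 in place, while the new policy updates the entry to 0x2000.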
02 Sep, 2021 5 commits

l0tlb: add a new level tlb, a load tlb and a store tlb (#961) · a0301c0d

Committed by Lemover on Sep 02, 2021

* Revert "Revert "l0tlb: add a new level tlb to each mem pipeline (#936)" (#945)"

This reverts commit b052b972.

* fu: remove unused import

* mmu.tlb: 2 load/store pipeline has 1 dtlb

* mmu: remove btlb, the l1-tlb

* mmu: set split-tlb to 32 to check perf effect

* mmu: wrap tlb's param with TLBParameters

* mmu: add params 'useBTlb'

dtlb size is small: normal 8, super 2

* mmu.tlb: add Bundle TlbEntry, simplify tlb hit logic(coding)

* mmu.tlb: separate tlb's storage, relative hit/sfence logic

tlb now supports fully associative, set-associative, and direct-mapped storage.
more: change tlb's parameter usage, change util.Random to support
the case where mod is 1.

* mmu.tlb: support normalAsVictim, super(fa) -> normal(sa/da)

be careful when using the tlb's parameters: only some parameter combinations
are supported

* mmu.tlb: fix bug of hit method and victim write

* mmu.tlb: add tlb storage's perf counter

* mmu.tlb: rewrite replace part, support set or non-set

* mmu.tlb: add param outReplace to receive out replace index

* mmu.tlb: change param superSize to superNWays

add param superNSets, which should always be 1

* mmu.tlb: change some perf counter's name and change some params

* mmu.tlb: fix bug of replace io bundle

* mmu.tlb: remove unused signal wayIdx in tlbstorageio

* mmu.tlb: separate tlb_ld/st into two 'same' tlb

* mmu.tlb: when nWays is 1, replace returns 0.U

before, replace will return 1.U, no influence for refill but bad
for perf counter

* mmu.tlb: give tlb_ld and tlb_st a name (in waveform)

chore: fix frontend / memblock merge conflict · 588e93e0
Committed by William Wang on Sep 02, 2021

chore: fix frontend / memblock merge conflict · 154904ce
Committed by William Wang on Sep 02, 2021

Merge remote-tracking branch 'origin/master' into fast-refill · b603de60
Committed by William Wang on Sep 02, 2021

Merge branch 'master' into vaddr-fwd · b9ec0501
Committed by William Wang on Sep 02, 2021

OpenXiangShan / XiangShan synced successfully 11 months ago