Commits · 6474c47fd580c2adf74e4bf4adac858188f021fe · OpenXiangShan / XiangShan

14 Jul, 2022 1 commit

rob: optimize timing for commit and walk (#1644) · 6474c47f

Committed by Yinan Xu on Jul 14, 2022

* rob: separate walk and commit valid bits

* rob: optimize instrCnt timing

* rob: fix blockCommit condition when flushPipe

When flushPipe is enabled, it will block commits in ROB. However,
in the deqPtrModule, the commit is not blocked. This commit fixes
the issue.

6474c47f

13 Jul, 2022 1 commit

decode: move the soft-prefetch decoder to rename (#1646) · f025d715

Committed by Yinan Xu on Jul 13, 2022

This commit moves the decoder of software prefetch instructions to
the rename stage.

Previously, decoding software prefetch instructions affected the imm gen
and caused a long critical path.

f025d715

12 Jul, 2022 1 commit

ctrl: optimize freelist timing (#1633) · 66b2c4a4

Committed by Yinan Xu on Jul 12, 2022

* rat: map all arch registers to zero when init

* freelist: fix stepBack width

* freelist: fix timing of free offset

66b2c4a4

09 Jul, 2022 1 commit

decode: move fusion decoder result Mux to rename (#1631) · 0febc381

Committed by Yinan Xu on Jul 09, 2022

This commit splits the fusion decoder across the decode and rename stages.

In the decode stage, fusion decoder determines whether the instruction
pairs can be fused. Valid bits of decode are not affected by fusion
decoder. This should fix the timing issues of rename.valid.

In the rename stage, some fields are updated according to the result of
the fusion decoder. This adds a minor timing path to both valid and
other fields in uop in the rename stage. However, since freelist and
rat have worse timing, this should not cause timing issues.

0febc381

06 Jul, 2022 2 commits

rob: add separated optimized walk valid bits (#1614) · c51eab43

Committed by Yinan Xu on Jul 06, 2022

Some modules rely on the walk valid bits of ROB. This commit
optimizes the timing by providing separate walk valid bits, whose
timing is far better than that of the commit valid bits.

c51eab43

dpq: optimize read and write timing of data module (#1610) · 00210c34

Committed by Yinan Xu on Jul 06, 2022

This commit changes the data modules in Dispatch Queue. We use one-hot
indices to read and write the data array.
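
The one-hot access described above can be illustrated with a small behavioral sketch in Python. This is not the Dispatch Queue's Chisel code; the helper names `one_hot`, `read_one_hot` and `write_one_hot` are hypothetical. It only shows the idea that a one-hot select vector picks an entry with a flat AND-OR structure instead of decoding a binary index first.

```python
from typing import List


def one_hot(index: int, size: int) -> List[int]:
    """Return a one-hot select vector of length `size` with bit `index` set."""
    return [1 if i == index else 0 for i in range(size)]


def read_one_hot(data: List[int], select: List[int]) -> int:
    """Read the entry picked by a one-hot select (AND each entry, OR the results)."""
    assert sum(select) == 1, "select must be one-hot"
    result = 0
    for entry, sel in zip(data, select):
        if sel:
            result |= entry
    return result


def write_one_hot(data: List[int], select: List[int], value: int) -> List[int]:
    """Return a copy of `data` with the selected entry replaced by `value`."""
    assert sum(select) == 1, "select must be one-hot"
    return [value if sel else entry for entry, sel in zip(data, select)]


array = [10, 20, 30, 40]
picked = read_one_hot(array, one_hot(2, 4))
updated = write_one_hot(array, one_hot(1, 4), 99)
```

In hardware, keeping the pointer in one-hot form removes the index decoder from the read and write paths, at the cost of wider pointer registers.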

00210c34

25 Jun, 2022 1 commit

freelist: optimize timing of read and writing (#1593) · 5ef86c38

Committed by Yinan Xu on Jun 25, 2022

This commit optimizes the timing of freelist by changing the updating
function of headPtr and tailPtr.

We maintain a one-hot representation of headPtr and use it to read
the free registers from the list, which should be better than the
previous implementation, where headPtr was used to index into the queue.

The update of tailPtr and the freelist is delayed by one cycle to
optimize the timing. The freelist allocates new registers in the next
cycle iff there are more than RenameWidth free registers in this cycle,
so the registers freed in this cycle will never be used in the next
cycle. Thus, we can delay the update of the queue data to the next cycle.
We also move the update of tailPtr to the next cycle: since PopCount
has a long delay, we move the last adder to the next cycle. Now
the adder works in parallel with PopCount. That is, the update of
tailPtr is pipelined.
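
As a rough illustration of the pipelined tailPtr update, here is a cycle-level sketch in Python. The class name `PipelinedTailPtr`, the `pending` register and the queue size are assumptions for this example, not names from the freelist source. The PopCount of the freed registers is computed and registered in one cycle, and the adder applies it to the pointer in the next cycle.

```python
from typing import List


class PipelinedTailPtr:
    """Toy model: PopCount in cycle N, pointer addition in cycle N + 1."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.tail = 0      # current tail pointer
        self.pending = 0   # registered PopCount from the previous cycle

    def step(self, free_valid: List[int]) -> int:
        """Advance one clock cycle and return the tail pointer after the edge."""
        # Stage 2: the adder consumes the count registered last cycle.
        self.tail = (self.tail + self.pending) % self.size
        # Stage 1: PopCount of this cycle's freed registers, registered for later.
        self.pending = sum(1 for v in free_valid if v)
        return self.tail


ptr = PipelinedTailPtr(size=8)
trace = [ptr.step(mask) for mask in ([1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0])]
```

The two registers freed in the first cycle only move the pointer after the second clock edge, which matches the argument above that registers freed in this cycle are never needed in the next one.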

5ef86c38

21 Jun, 2022 1 commit

core,perf: optimize timing for some registers (#1589) · 0c2f5c4a

Committed by Yinan Xu on Jun 21, 2022

This commit adds some registers for performance counters to optimize
the timing. Pipelines are added.

0c2f5c4a

20 Jun, 2022 1 commit

decode: parallel fusion decoder and rat read (#1588) · a0db5a4b

Committed by Yinan Xu on Jun 20, 2022

a0db5a4b

20 Dec, 2021 1 commit

Merge branch 'master' into trigger · a4e57ea3

Committed by Li Qianruo on Dec 20, 2021

a4e57ea3

16 Dec, 2021 1 commit

rename: check valid condition for lui (#1368) · 89c0fb0a

Committed by Yinan Xu on Dec 16, 2021

89c0fb0a

15 Dec, 2021 2 commits

Debug Mode: support difftest with spike (#1363) · f1c56d6c

Committed by Li Qianruo on Dec 15, 2021

* Debug Mode: support basic difftest with spike

* Debug Mode: fix some bugs

Bugs fixed are:
1. All interrupts and exceptions cause debug mode to enter park loop
2. Debug interrupt ignored due to flushPipe

f1c56d6c

rename: add fused lui and load (#1356) · fd7603d9

Committed by Yinan Xu on Dec 15, 2021

This commit adds fused load support by bypassing LUI results to load.

For better timing, detection is done at the rename stage. Imm is stored
in psrc(1), psrc(0) and imm.

fd7603d9

14 Dec, 2021 1 commit

difftest: move sc_valid to AtomicsUnit (#1350) · e13d224a

Committed by Yinan Xu on Dec 14, 2021

e13d224a

10 Dec, 2021 1 commit

core: refactor hardware performance counters (#1335) · 1ca0e4f3

Committed by Yinan Xu on Dec 10, 2021

This commit optimizes the coding style and timing for hardware
performance counters.

By default, performance counters are RegNext(RegNext(_)).
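
To show what a `RegNext(RegNext(_))` chain does to a signal, the following Python sketch models two back-to-back registers. The class name `DelayTwo` is made up for this example. Each call to `step` represents one clock edge, and the value sampled at one edge appears at the output two edges later.

```python
class DelayTwo:
    """Behavioral model of two chained registers, i.e. a two-cycle delay."""

    def __init__(self, init: int = 0) -> None:
        self.stage1 = init  # output of the first register
        self.stage2 = init  # output of the second register

    def step(self, value: int) -> int:
        """Apply one clock edge and return the output visible after it."""
        # Both registers update simultaneously on the edge.
        self.stage2, self.stage1 = self.stage1, value
        return self.stage2


delay = DelayTwo()
outputs = [delay.step(v) for v in [5, 7, 9, 0, 0]]
```

Performance counters tolerate this extra latency because only the accumulated totals matter, so the added registers buy timing slack without changing the final counts.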

1ca0e4f3

26 Nov, 2021 1 commit

refCounter: optimize timing for freeRegs (#1255) · 459d1cae

Committed by Yinan Xu on Nov 26, 2021

This commit changes how isFreed is calculated. Instead of using
refCounter in the next cycle, we compute it in this cycle and RegNext it.

459d1cae

23 Nov, 2021 1 commit

mem,mdp: use robIdx instead of sqIdx (#1242) · 980c1bc3

Committed by William Wang on Nov 23, 2021

* mdp: implement SSIT with sram

* mdp: use robIdx instead of sqIdx

The dispatch refactor moves lsq enq to dispatch2; as a result, mdp cannot
get the correct sqIdx in dispatch. Unlike robIdx, it is hard to maintain a
"speculatively assigned" sqIdx, as it is hard to track store insts in
the dispatch queue. Yet we can still use the "speculatively assigned"
robIdx for the memory dependency predictor.

For now, the memory dependency predictor uses the "speculatively assigned"
robIdx to track inflight stores.

However, sqIdx is still used to track stores whose addr is valid
but whose data is not valid. When load insts try to get forwarded data
from such a store, they will get that store's sqIdx and wait in RS.
They will not be woken up until the store data with that sqIdx is issued.

* mdp: add track robIdx recover logic

980c1bc3

12 Nov, 2021 1 commit

difftest: add basic difftest features for releases (#1219) · cbe9a847

Committed by Yinan Xu on Nov 12, 2021

* difftest: add basic difftest features for releases

This commit adds basic difftest features for every release, whether
it is for simulation or physical design. The macro SYNTHESIS is used to
skip this logic when synthesizing the design. This commit aims to
allow designs for physical design to be verified.

* bump ready-to-run

* difftest: add int and fp writeback data

cbe9a847

24 Oct, 2021 1 commit

lsq: enqueue at dispatch2 stage (#1167) · 7057cff8

Committed by Yinan Xu on Oct 24, 2021

This commit changes when instructions enter load/store queue.
Now, at dispatch2, load/store instructions enter load/store queue.

7057cff8

23 Oct, 2021 1 commit

add performance counters at core and hauncun (#1156) · cd365d4c

Committed by rvcoresjw on Oct 23, 2021

* Add perf counters
* add reg from hpm counter source
* add print perfcounter enable

cd365d4c

22 Oct, 2021 1 commit

rob: optimize bits width in storage (#1155) · c3abb8b6

Committed by Yinan Xu on Oct 22, 2021

This PR optimizes out isFused and crossPageIPFFix usages in Rob's DispatchData. They will not be stored in ROB. Now DispatchData has only 38 bits.

* isFused is merged with commitType (2 bits reduced)
* crossPageIPFFix is used only in ExceptionGen (1 bit reduced)
* rename: reduce ldest usages
* decode: set isMove to false if ldest is zero

c3abb8b6

21 Oct, 2021 1 commit

refCounter: delay de-allocation for one more cycle (#1144) · 103fe42b

Committed by Yinan Xu on Oct 21, 2021

This commit changes how de-allocation is done in RefCounter. One cycle
after we update the reference counters, the free registers are released
to the freelist.

The previous version created a critical path, starting from the deallocate
ports and ending at the freelist registers. This commit adds one more cycle
in the allocation --> updating reference counters --> freeing physical
registers --> allocation loop.

103fe42b

17 Oct, 2021 2 commits

rename: don't update refCounter 0 (#1126) · ca1763c2

Committed by Yinan Xu on Oct 17, 2021

This commit removes the update logic for ref counter 0.

For simplicity, we don't count the number of references for physical
register 0. It should never be released to freelist.

Previously we tracked register 0's references. It worked fine, but it made
the performance counters confusing because the count could grow to a large
number. It never caused real issues.

ca1763c2

backend: remove lsrc usages after rename (#1124) · a020ce37

Committed by Yinan Xu on Oct 17, 2021

This commit removes lsrc usages in the fence unit and lsrc is no longer
needed after an instruction is renamed. It helps timing and area.

lsrc is placed in imm at rename stage (the last stage we need lsrc).
They are extracted in the fence unit. Imm needs to go through the
pipelines because Jump needs it (and we re-use it for lsrc).

a020ce37

16 Oct, 2021 2 commits

rename: support full-featured move elimination (#1123) · 70224bf6

Committed by Yinan Xu on Oct 16, 2021

This commit optimizes the move elimination implementation.

A reference count is recorded for every physical register. Initially,
registers 0-31 have a count of one. Every time a physical register
is allocated or deallocated, its counter is increased or decreased by
one. When the counter becomes zero from a non-zero value, the register
is freed and released to freelist.
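
The counting scheme above can be sketched in a few lines of Python. This is a software model only; the class name `RefCounter`, its method names and the register counts are assumptions for the example and do not mirror the Chisel module's interface.

```python
class RefCounter:
    """Toy reference counter for physical registers with move elimination."""

    def __init__(self, num_phys_regs: int, num_arch_regs: int = 32) -> None:
        # Physical registers 0..num_arch_regs-1 start with one reference each.
        self.count = [1 if i < num_arch_regs else 0 for i in range(num_phys_regs)]

    def allocate(self, preg: int) -> None:
        """A new mapping (fresh allocation or an eliminated move) points at preg."""
        self.count[preg] += 1

    def deallocate(self, preg: int) -> bool:
        """Drop one reference; return True when preg can go back to the freelist."""
        assert self.count[preg] > 0, "cannot deallocate an unreferenced register"
        self.count[preg] -= 1
        return self.count[preg] == 0


rc = RefCounter(num_phys_regs=64)
rc.allocate(40)                 # a normal instruction takes p40 as its destination
rc.allocate(40)                 # an eliminated move maps a second arch register to p40
first_free = rc.deallocate(40)  # one mapping is overwritten, p40 is still in use
second_free = rc.deallocate(40) # the last mapping is gone, p40 is released
```

Only the second deallocation releases the register, which is what lets an eliminated move share a physical register safely.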

70224bf6

core: use redirect ports for flush (#1121) · f4b2089a

Committed by Yinan Xu on Oct 16, 2021

This commit removes flush IO for every module. Flush now re-uses
redirect ports to flush the instructions.

f4b2089a

11 Oct, 2021 1 commit

bump chisel and code clean up (#1104) · aef67050

Committed by Yinan Xu on Oct 11, 2021

* bump chisel to 3.5.0-RC1

We don't want to use SNAPSHOT version any more because we don't know
what will happen when we wake up in the morning.

* misc: remove TMA_* to avoid conflicts

aef67050

10 Oct, 2021 1 commit

renameTable: optimize read and write timing (#1101) · 7fa2c198

Committed by Yinan Xu on Oct 10, 2021

This commit optimizes RenameTable's timing.

Read addresses come directly from the instruction buffer and have the best
timing. So we read the data at the decode stage and bypass write data
from this clock cycle to the read data at the next cycle.

For write, we latch the write request and process it at the next cycle.

7fa2c198

28 Sep, 2021 1 commit

misc: code clean up (#1073) · 9aca92b9

Committed by Yinan Xu on Sep 28, 2021

* rename Roq to Rob

* remove trailing whitespaces

* remove unused parameters

9aca92b9

22 Sep, 2021 1 commit

backend, freelist: shrink verilog size by using scala variable · c63125be

Committed by YikeZhou on Sep 22, 2021

instead of chisel var in MEFreeList.scala

c63125be

21 Sep, 2021 1 commit

backend, freelist: simplify walk logic · 802dc347

Committed by YikeZhou on Sep 21, 2021

802dc347

19 Sep, 2021 3 commits

backend, freelist: remove unused log & assertions · 20acd4ae

Committed by YikeZhou on Sep 19, 2021

20acd4ae

backend, freelist: modify free list allocatePhyReg logic · 8949e3b0

Committed by YikeZhou on Sep 19, 2021

1) generate ptr and preg in a vec first
2) use renameEnable to replace common parts in allocating logic

8949e3b0

core: add timer counters for important stages (#1045) · ebb8ebf8

Committed by Yinan Xu on Sep 19, 2021

This commit adds timer counters for some important pipeline stages,
including rename, dispatch, dispatch2, select, issue, execute, commit.
We add performance counters for different types of instructions to see
the latency in different pipeline stages.

ebb8ebf8

18 Sep, 2021 1 commit

backend, freelist: opt flush process in MEFreeList · 23304efd

Committed by YikeZhou on Sep 18, 2021

1) bug fix: updateArchRefCounter should be related with pdest, not
old_pdest
2) remove complicated logic of headPtr recovery when flushing

23304efd

13 Sep, 2021 1 commit

backend, rename: elimination psrc directly from intRat · 0153cd55

Committed by YikeZhou on Sep 13, 2021

0153cd55

12 Sep, 2021 1 commit

backend, rename: optimize MEFreeList free logic · 62d2a04b

Committed by YikeZhou on Sep 12, 2021

62d2a04b

09 Sep, 2021 1 commit

backend: support instruction fusion cases (#1011) · 88825c5c

Committed by Yinan Xu on Sep 09, 2021

This commit adds some simple instruction fusion cases in decode stage.
Currently we only implement instruction pairs that can be fused into
RV64GCB instructions.

Instruction fusions are detected in the decode stage by FusionDecoder.
The decoder checks every two instructions and marks the first
instruction fused if they can be fused into one instruction. The second
instruction is removed by setting the valid field to false.
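
A simplified software sketch of this pairwise detection is shown below. The `Inst` record, the `fuse_pairs` function and the toy `can_fuse` predicate are all hypothetical, and the sequential scan that skips past a fused pair is an assumption made for clarity; the real FusionDecoder examines the instruction pairs in hardware and its overlap handling is not described here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Inst:
    text: str
    valid: bool = True
    fused: bool = False


def fuse_pairs(insts: List[Inst], can_fuse: Callable[[Inst, Inst], bool]) -> List[Inst]:
    """Mark the first instruction of each fusable pair and invalidate the second."""
    i = 0
    while i + 1 < len(insts):
        first, second = insts[i], insts[i + 1]
        if first.valid and second.valid and can_fuse(first, second):
            first.fused = True    # the first instruction carries the fused operation
            second.valid = False  # the second instruction is removed from the pipeline
            i += 2                # assumption: a fused pair is not reconsidered
        else:
            i += 1
    return insts


def toy_can_fuse(a: Inst, b: Inst) -> bool:
    """Toy predicate: a shift-left followed by an add (e.g. the sh1add pattern)."""
    return a.text.startswith("slli") and b.text.startswith("add ")


window = fuse_pairs(
    [Inst("slli r1, r0, 1"), Inst("add r1, r1, r2"), Inst("sub r3, r1, r2")],
    toy_can_fuse,
)
```

A real predicate would also check that the register operands and shift amounts match the fusion pattern, which this toy check ignores.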

Simple fusion cases include sh1add, sh2add, sh3add, sexth, zexth, etc.

Currently, ftq in frontend needs every instruction to commit. However,
the second instruction is removed from the pipeline and will not commit.
To solve this issue, we temporarily add more bits to isFused to indicate
the offset diff of the two fused instructions. There are four
possibilities now. This feature may be removed later.

This commit also adds more instruction fusion cases that need changes
in both the decode stage and the function units. In this commit, we add
some opcode to the function units and fuse the new instruction pairs
into these new internal uops.

The list of opcodes we add in this commit is shown below:
- szewl1: `slli r1, r0, 32` + `srli r1, r0, 31`
- szewl2: `slli r1, r0, 32` + `srli r1, r0, 30`
- byte2: `srli r1, r0, 8` + `andi r1, r1, 255`
- sh4add: `slli r1, r0, 4` + `add r1, r1, r2`
- sr30add: `srli r1, r0, 30` + `add r1, r1, r2`
- sr31add: `srli r1, r0, 31` + `add r1, r1, r2`
- sr32add: `srli r1, r0, 32` + `add r1, r1, r2`
- oddadd: `andi r1, r0, 1` + `add r1, r1, r2`
- oddaddw: `andi r1, r0, 1` + `addw r1, r1, r2`
- orh48: mask off the first 16 bits and or with another operand
         (`andi r1, r0, -256` + `or r1, r1, r2`)
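
To make the effect of a few of these fused opcodes concrete, the following Python sketch evaluates the listed instruction pairs on 64-bit values. The helper functions are written for this example only and model just the base RV64 semantics of `slli`, `srli`, `andi` and `add`; they are not taken from the function-unit source.

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide


def slli(x: int, shamt: int) -> int:
    return (x << shamt) & MASK64


def srli(x: int, shamt: int) -> int:
    return (x & MASK64) >> shamt


def andi(x: int, imm: int) -> int:
    # A negative immediate is sign-extended to 64 bits before the AND.
    return x & (imm & MASK64)


def add(a: int, b: int) -> int:
    return (a + b) & MASK64


def sh4add(r0: int, r2: int) -> int:
    """slli r1, r0, 4 followed by add r1, r1, r2."""
    return add(slli(r0, 4), r2)


def sr32add(r0: int, r2: int) -> int:
    """srli r1, r0, 32 followed by add r1, r1, r2."""
    return add(srli(r0, 32), r2)


def oddadd(r0: int, r2: int) -> int:
    """andi r1, r0, 1 followed by add r1, r1, r2."""
    return add(andi(r0, 1), r2)


def byte2(r0: int) -> int:
    """srli r1, r0, 8 followed by andi r1, r1, 255."""
    return andi(srli(r0, 8), 255)
```

For example, `byte2(0x123456)` extracts the second byte `0x34`, and `sh4add(3, 5)` computes `3 * 16 + 5`.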

Furthermore, this commit adds some complex instruction fusion cases to
the decode stage and function units. The complex instruction fusion cases
are detected after the instructions are decoded into uop and their
CtrlSignals are used for instruction fusion detection.

We add the following complex instruction fusion cases:
- addwbyte: addw and mask it with 0xff (extract the first byte)
- addwbit: addw and mask it with 0x1 (extract the first bit)
- logiclsb: logic operation and mask it with 0x1 (extract the first bit)
- mulw7: andi 127 and mulw instructions.
        Input to mul is AND with 0x7f if mulw7 bit is set to true.

88825c5c

06 Sep, 2021 2 commits

MEFreeList: use tailPtr instead of tailPtrNext in free reg cnt · e92092e7

Committed by YikeZhou on Sep 06, 2021

e92092e7

backend, rename: support elimination of move instruction whose lsrc is 0 + bug fix (#1008) · 31ebfb1d

Committed by YikeZhou on Sep 06, 2021

* backend, rename: support elimination of mv inst whose lsrc=0
[known bug] instr page fault not properly raised after sfence.vma

* backend, roq: [bug fix] won't label me with exception as writebacked

31ebfb1d

OpenXiangShan / XiangShan synced successfully 11 months ago
