- 14 Jul, 2022 1 commit
-
-
Authored by Yinan Xu
* rob: separate walk and commit valid bits
* rob: optimize instrCnt timing
* rob: fix blockCommit condition when flushPipe

  When flushPipe is enabled, it blocks commits in the ROB. However, the commit was not blocked in the deqPtrModule. This commit fixes the issue.
-
- 13 Jul, 2022 1 commit
-
-
Authored by Yinan Xu
This commit moves the decoder of software prefetch instructions to the rename stage. Previously, decoding software prefetch instructions affected the imm gen and caused a long critical path.
-
- 12 Jul, 2022 1 commit
-
-
Authored by Yinan Xu
* rat: map all arch registers to zero when init
* freelist: fix stepBack width
* freelist: fix timing of free offset
-
- 09 Jul, 2022 1 commit
-
-
Authored by Yinan Xu
This commit moves the fusion decoder to both the decode and rename stages.

In the decode stage, the fusion decoder determines whether the instruction pairs can be fused. Valid bits of decode are not affected by the fusion decoder. This should fix the timing issues of rename.valid.

In the rename stage, some fields are updated according to the result of the fusion decoder. This adds a minor timing path to both valid and other fields of the uop in the rename stage. However, since freelist and rat have worse timing, this should not cause timing issues.
-
- 06 Jul, 2022 2 commits
-
-
Authored by Yinan Xu
Some modules rely on the walk valid bits of the ROB. This commit optimizes the timing by providing separate walk valid bits, whose timing is far better than that of the commit valid bits.
-
Authored by Yinan Xu
This commit changes the data modules in Dispatch Queue. We use one-hot indices to read and write the data array.
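As a rough illustration of what reading and writing a data array with one-hot indices means, here is a minimal Python behavioral sketch. It is not the Chisel code from the commit; the helper names `onehot`, `read_onehot`, and `write_onehot` are hypothetical and exist only for this example.

```python
def onehot(index, depth):
    """Return a one-hot list of `depth` bits with bit `index` set."""
    return [1 if i == index else 0 for i in range(depth)]


def read_onehot(data, select):
    """AND-OR style mux: OR together the entries whose select bit is set.

    With a one-hot select, no index decoder is needed in front of the mux.
    """
    assert sum(select) == 1, "select must be one-hot"
    result = 0
    for entry, bit in zip(data, select):
        if bit:
            result |= entry
    return result


def write_onehot(data, select, value):
    """Return a new array where the selected entry is replaced by `value`."""
    assert sum(select) == 1, "select must be one-hot"
    return [value if bit else entry for entry, bit in zip(data, select)]


queue = [0x10, 0x20, 0x30, 0x40]
sel = onehot(2, len(queue))
```

In hardware, the benefit is that the select bits drive the mux directly, so the binary-to-one-hot decode is taken off the read and write paths.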
-
- 25 Jun, 2022 1 commit
-
-
Authored by Yinan Xu
This commit optimizes the timing of freelist by changing how headPtr and tailPtr are updated.

We maintain a one-hot representation of headPtr and use it to read the free registers from the list, which should be better than the previous implementation where headPtr was used to index into the queue.

The update of tailPtr and the freelist is delayed by one cycle to optimize the timing. Because freelist allocates new registers in the next cycle iff there are more than RenameWidth free registers in this cycle, the registers freed in this cycle will never be used in the next cycle. Thus, we can delay the update of the queue data to the next cycle.

We also move the update of tailPtr to the next cycle. PopCount has a long timing path, so we move the last adder to the next cycle. Now the adder works in parallel with PopCount. That is, the update of tailPtr is pipelined.
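The scheme described above can be sketched as a cycle-level Python model. This is a behavioral illustration under assumptions, not the actual Chisel module: `FreeListModel`, `rotate`, `read_onehot`, and the `step` interface are hypothetical names, and the model tracks a free count instead of real head/tail pointer comparison.

```python
def rotate(onehot, n):
    """Rotate a one-hot vector by n positions (models pointer + n with wrap-around)."""
    size = len(onehot)
    return [onehot[(i - n) % size] for i in range(size)]


def read_onehot(queue, select):
    """Pick the single queue entry whose select bit is set."""
    assert sum(select) == 1, "select must be one-hot"
    return next(entry for entry, bit in zip(queue, select) if bit)


class FreeListModel:
    def __init__(self, size, rename_width, initial_free):
        assert len(initial_free) <= size
        self.size = size
        self.rename_width = rename_width
        self.queue = list(initial_free) + [None] * (size - len(initial_free))
        self.head_onehot = [i == 0 for i in range(size)]  # one-hot headPtr
        self.tail = len(initial_free) % size
        self.num_free = len(initial_free)
        self.pending = []  # registers freed last cycle, not yet written

    def can_allocate(self):
        # Per the commit message: allocate iff more than RenameWidth are free.
        return self.num_free > self.rename_width

    def step(self, want_alloc, freed_now):
        """Advance one cycle; return the registers allocated this cycle."""
        assert want_alloc <= self.rename_width
        allocated = []
        if want_alloc and self.can_allocate():
            for offset in range(want_alloc):
                allocated.append(read_onehot(self.queue, rotate(self.head_onehot, offset)))
            self.head_onehot = rotate(self.head_onehot, want_alloc)
            self.num_free -= want_alloc
        # Delayed update: write the registers freed in the PREVIOUS cycle.
        for preg in self.pending:
            self.queue[self.tail] = preg
            self.tail = (self.tail + 1) % self.size
        self.num_free += len(self.pending)
        self.pending = list(freed_now)
        return allocated
```

In this model a register freed in cycle N only becomes visible to allocation in cycle N+2, which is safe for the reason the commit message gives: allocation already requires a surplus of free registers.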
-
- 21 Jun, 2022 1 commit
-
-
Authored by Yinan Xu
This commit adds pipeline registers for the performance counters to optimize the timing.
-
- 20 Jun, 2022 1 commit
-
-
Authored by Yinan Xu
-
- 20 Dec, 2021 1 commit
-
-
Authored by Li Qianruo
-
- 16 Dec, 2021 1 commit
-
-
Authored by Yinan Xu
-
- 15 Dec, 2021 2 commits
-
-
Authored by Li Qianruo
* Debug Mode: support basic difftest with spike
* Debug Mode: fix some bugs

  Bugs fixed are:
  1. All interrupts and exceptions cause debug mode to enter the park loop
  2. Debug interrupt ignored due to flushPipe
-
Authored by Yinan Xu
This commit adds fused load support by bypassing LUI results to load. For better timing, detection is done at the rename stage. Imm is stored in psrc(1), psrc(0) and imm.
-
- 14 Dec, 2021 1 commit
-
-
Authored by Yinan Xu
-
- 10 Dec, 2021 1 commit
-
-
Authored by Yinan Xu
This commit optimizes the coding style and timing for hardware performance counters. By default, performance counters are RegNext(RegNext(_)).
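To make the effect of `RegNext(RegNext(_))` concrete, here is a small Python sketch that models each register as a one-cycle delay on a per-cycle event stream. The function names `reg_next` and `perf_counter` and the reset value of 0 are assumptions made for this illustration.

```python
def reg_next(samples, init=0):
    """Model one register stage: each sample appears one cycle later."""
    return [init] + list(samples[:-1])


def perf_counter(events):
    """Accumulate per-cycle event counts after a two-stage delay.

    Returns the counter value observed after each cycle.
    """
    delayed = reg_next(reg_next(events))
    total, trace = 0, []
    for count in delayed:
        total += count
        trace.append(total)
    return trace


events = [1, 0, 2, 1, 0, 0]   # events raised by some module, per cycle
trace = perf_counter(events)  # counter lags the events by two cycles
```

The final count is unchanged as long as the simulation runs two extra cycles; only the time at which each event is counted shifts, which is usually acceptable for performance counters.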
-
- 26 Nov, 2021 1 commit
-
-
Authored by Yinan Xu
This commit changes how isFreed is calculated. Instead of using refCounter in the next cycle, we compute it in this cycle and RegNext it.
-
- 23 Nov, 2021 1 commit
-
-
Authored by William Wang
* mdp: implement SSIT with sram
* mdp: use robIdx instead of sqIdx

  The dispatch refactor moves lsq enq to dispatch2. As a result, mdp cannot get the correct sqIdx in dispatch. Unlike robIdx, it is hard to maintain a "speculatively assigned" sqIdx, as it is hard to track store insts in the dispatch queue. Yet we can still use a "speculatively assigned" robIdx for the memory dependency predictor.

  For now, the memory dependency predictor uses the "speculatively assigned" robIdx to track inflight stores. However, sqIdx is still used to track those stores whose addr is valid but whose data is not valid. When load insts try to get forward data from those stores, they get the store's sqIdx and wait in RS. They are not woken up until the store data with that sqIdx is issued.
* mdp: add track robIdx recover logic
-
- 12 Nov, 2021 1 commit
-
-
Authored by Yinan Xu
* difftest: add basic difftest features for releases

  This commit adds basic difftest features for every release, whether it is for simulation or physical design. The macro SYNTHESIS is used to skip this logic when synthesizing the design. This commit aims to allow designs intended for physical design to be verified.
* bump ready-to-run
* difftest: add int and fp writeback data
-
- 24 Oct, 2021 1 commit
-
-
Authored by Yinan Xu
This commit changes when instructions enter load/store queue. Now, at dispatch2, load/store instructions enter load/store queue.
-
- 23 Oct, 2021 1 commit
-
-
Authored by rvcoresjw
* Add perf counters
* add reg from hpm counter source
* add print perfcounter enable
-
- 22 Oct, 2021 1 commit
-
-
Authored by Yinan Xu
This PR optimizes out isFused and crossPageIPFFix usages in Rob's DispatchData. They are no longer stored in the ROB. DispatchData now has only 38 bits.
* isFused is merged with commitType (2 bits reduced)
* crossPageIPFFix is used only in ExceptionGen (1 bit reduced)
* rename: reduce ldest usages
* decode: set isMove to false if ldest is zero
-
- 21 Oct, 2021 1 commit
-
-
Authored by Yinan Xu
This commit changes how de-allocation is done in RefCounter. One cycle after we update the reference counters, the free registers are released to the freelist. The previous version created a critical path, starting from the deallocate ports and ending at the freelist registers. This commit adds one more cycle to the allocation --> updating reference counters --> freeing physical registers --> allocation loop.
-
- 17 Oct, 2021 2 commits
-
-
Authored by Yinan Xu
This commit removes the update logic for ref counter 0. For simplicity, we don't count the number of references to physical register 0. It should never be released to the freelist. Previously we tracked register 0's references. That worked fine, but it made the performance counters confusing because the count could grow to a large number. It never caused real issues.
-
Authored by Yinan Xu
This commit removes lsrc usages in the fence unit, so lsrc is no longer needed after an instruction is renamed. This helps timing and area. lsrc is placed in imm at the rename stage (the last stage that needs lsrc) and is extracted from imm in the fence unit. Imm needs to go through the pipelines anyway because Jump needs it, so we re-use it for lsrc.
-
- 16 Oct, 2021 2 commits
-
-
Authored by Yinan Xu
This commit optimizes the move elimination implementation. A reference count is kept for every physical register. Initially, registers 0-31 have a counter value of one. Every time a physical register is allocated or deallocated, its counter is increased or decreased by one. When the counter becomes zero from a non-zero value, the register is freed and released to the freelist.
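The reference-counting rule above can be sketched in a few lines of Python. This is a minimal behavioral sketch; the class name `RefCounter`, its methods, and the register numbers used below are hypothetical and chosen only for illustration.

```python
class RefCounter:
    """Per-physical-register reference counts for move elimination."""

    def __init__(self, num_pregs, num_arch=32):
        # Registers 0..num_arch-1 start with one reference (the initial mapping).
        self.count = [1 if i < num_arch else 0 for i in range(num_pregs)]

    def allocate(self, preg):
        """A new mapping points at `preg` (fresh allocation or eliminated move)."""
        self.count[preg] += 1

    def deallocate(self, preg):
        """A mapping to `preg` is released; return True if `preg` becomes free."""
        assert self.count[preg] > 0, "deallocating an unreferenced register"
        self.count[preg] -= 1
        return self.count[preg] == 0


rc = RefCounter(num_pregs=64)
rc.allocate(40)                   # an instruction writes a fresh register
rc.allocate(40)                   # an eliminated move shares the same register
first_release = rc.deallocate(40)   # one mapping is overwritten and committed
second_release = rc.deallocate(40)  # the last mapping goes away
```

Only the second release returns the register to the freelist, which is what lets an eliminated move share a physical register safely.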
-
Authored by Yinan Xu
This commit removes flush IO for every module. Flush now re-uses redirect ports to flush the instructions.
-
- 11 Oct, 2021 1 commit
-
-
Authored by Yinan Xu
* bump chisel to 3.5.0-RC1

  We don't want to use the SNAPSHOT version any more because we don't know what will happen when we wake up in the morning.
* misc: remove TMA_* to avoid conflicts
-
- 10 Oct, 2021 1 commit
-
-
Authored by Yinan Xu
This commit optimizes RenameTable's timing.

Read addresses come directly from the instruction buffer and have the best timing. So we read the data at the decode stage and bypass the write data of this clock cycle to the read data in the next cycle.

For writes, we latch the write request and process it in the next cycle.
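One way to read the read/bypass/write timing above is the following cycle-level Python sketch. It is an interpretation under assumptions, not the actual RenameTable code: it models a single read port and a single write port, starts from an identity mapping, and the class and method names are hypothetical.

```python
class RenameTableModel:
    """Two-stage read with write bypass; writes are latched for one cycle."""

    def __init__(self, num_arch=32):
        self.table = list(range(num_arch))  # assumed initial mapping: arch i -> preg i
        self.latched_write = None           # (addr, data) requested last cycle
        self.latched_read = None            # (addr, data) read last cycle

    def cycle(self, read_addr, write):
        """Advance one cycle.

        read_addr: arch register presented at the decode stage, or None.
        write: (addr, data) write request of this cycle, or None.
        Returns the read data for the address presented in the previous cycle.
        """
        out = None
        if self.latched_read is not None:
            addr, data = self.latched_read
            # Bypass: last cycle's write request to the same address wins.
            if self.latched_write is not None and self.latched_write[0] == addr:
                data = self.latched_write[1]
            out = data
        # Process the write that was latched in the previous cycle.
        table_next = list(self.table)
        if self.latched_write is not None:
            table_next[self.latched_write[0]] = self.latched_write[1]
        # Read at the decode stage from the updated table and register the result.
        self.latched_read = None if read_addr is None else (read_addr, table_next[read_addr])
        self.table = table_next
        self.latched_write = write
        return out
```

For example, if arch register 5 is read and written with physical register 40 in the same cycle, the data returned in the following cycle is 40, even though the table itself is only updated one cycle after the request.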
-
- 28 Sep, 2021 1 commit
-
-
Authored by Yinan Xu
* rename Roq to Rob
* remove trailing whitespaces
* remove unused parameters
-
- 22 Sep, 2021 1 commit
-
-
Authored by YikeZhou
instead of chisel var in MEFreeList.scala
-
- 21 Sep, 2021 1 commit
-
-
Authored by YikeZhou
-
- 19 Sep, 2021 3 commits
-
-
Authored by YikeZhou
-
Authored by YikeZhou
1) generate ptr and preg in a vec first
2) use renameEnable to replace common parts in allocating logic
-
Authored by Yinan Xu
This commit adds timer counters for some important pipeline stages, including rename, dispatch, dispatch2, select, issue, execute, commit. We add performance counters for different types of instructions to see the latency in different pipeline stages.
-
- 18 Sep, 2021 1 commit
-
-
Authored by YikeZhou
1) bug fix: updateArchRefCounter should depend on pdest, not old_pdest
2) remove complicated logic of headPtr recovery when flushing
-
- 13 Sep, 2021 1 commit
-
-
Authored by YikeZhou
-
- 12 Sep, 2021 1 commit
-
-
Authored by YikeZhou
-
- 09 Sep, 2021 1 commit
-
-
Authored by Yinan Xu
This commit adds some simple instruction fusion cases in the decode stage. Currently we only implement instruction pairs that can be fused into RV64GCB instructions.

Instruction fusions are detected in the decode stage by FusionDecoder. The decoder checks every two instructions and marks the first instruction as fused if they can be fused into one instruction. The second instruction is removed by setting its valid field to false. Simple fusion cases include sh1add, sh2add, sh3add, sexth, zexth, etc.

Currently, the ftq in the frontend needs every instruction to commit. However, the second instruction is removed from the pipeline and will not commit. To solve this issue, we temporarily add more bits to isFused to indicate the offset difference between the two fused instructions. There are four possibilities now. This feature may be removed later.

This commit also adds more instruction fusion cases that need changes in both the decode stage and the function units. In this commit, we add some opcodes to the function units and fuse the new instruction pairs into these new internal uops. The list of opcodes we add in this commit is shown below:
- szewl1: `slli r1, r0, 32` + `srli r1, r0, 31`
- szewl2: `slli r1, r0, 32` + `srli r1, r0, 30`
- byte2: `srli r1, r0, 8` + `andi r1, r1, 255`
- sh4add: `slli r1, r0, 4` + `add r1, r1, r2`
- sr30add: `srli r1, r0, 30` + `add r1, r1, r2`
- sr31add: `srli r1, r0, 31` + `add r1, r1, r2`
- sr32add: `srli r1, r0, 32` + `add r1, r1, r2`
- oddadd: `andi r1, r0, 1` + `add r1, r1, r2`
- oddaddw: `andi r1, r0, 1` + `addw r1, r1, r2`
- orh48: mask off the first 16 bits and or with another operand (`andi r1, r0, -256` + `or r1, r1, r2`)

Furthermore, this commit adds some complex instruction fusion cases to the decode stage and function units. The complex instruction fusion cases are detected after the instructions are decoded into uops, and their CtrlSignals are used for instruction fusion detection.

We add the following complex instruction fusion cases:
- addwbyte: addw and mask it with 0xff (extract the first byte)
- addwbit: addw and mask it with 0x1 (extract the first bit)
- logiclsb: logic operation and mask it with 0x1 (extract the first bit)
- mulw7: andi 127 and mulw instructions. The input to mul is ANDed with 0x7f if the mulw7 bit is set to true.
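To illustrate what fusing a pair into one internal uop means, the Python sketch below checks that a single fused operation gives the same result as executing the two instructions of the pair in order. It covers only four of the listed opcodes (sh4add, sr30add, byte2, oddadd), assumes 64-bit registers and non-negative immediates, and uses hypothetical helper names; it does not model the decoder or the function units.

```python
MASK64 = (1 << 64) - 1


# Individual RV64 instructions (only the behaviour needed here).
def slli(x, sh):
    return (x << sh) & MASK64


def srli(x, sh):
    return (x & MASK64) >> sh


def add(a, b):
    return (a + b) & MASK64


def andi(x, imm):
    return x & imm  # sketch: non-negative immediates only


# The instruction pairs, executed one after the other.
PAIRS = {
    "sh4add": lambda r0, r2: add(slli(r0, 4), r2),
    "sr30add": lambda r0, r2: add(srli(r0, 30), r2),
    "byte2": lambda r0, r2: andi(srli(r0, 8), 255),
    "oddadd": lambda r0, r2: add(andi(r0, 1), r2),
}

# The same computations written as single fused operations.
FUSED = {
    "sh4add": lambda r0, r2: ((r0 << 4) + r2) & MASK64,
    "sr30add": lambda r0, r2: ((r0 >> 30) + r2) & MASK64,
    "byte2": lambda r0, r2: (r0 >> 8) & 0xFF,
    "oddadd": lambda r0, r2: ((r0 & 1) + r2) & MASK64,
}


def equivalent(name, r0, r2):
    """True if the fused uop matches the two-instruction sequence."""
    return PAIRS[name](r0, r2) == FUSED[name](r0, r2)
```

A quick check such as `all(equivalent(n, 0x123456789ABCDEF0, 7) for n in FUSED)` exercises every modelled pair, including the case where the shift overflows 64 bits.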
-
- 06 Sep, 2021 2 commits