- 28 Jun 2022, 1 commit
Committed by William Wang
This commit re-pipelines the ECC check logic in the data cache and the exception generation logic for better timing. The ECC error is now checked one cycle after the result is read from the data SRAM, and an extra cycle is added for load writeback to the ROB. Future work: move the pipeline to https://github.com/OpenXiangShan/XiangShan/blob/master/src/main/scala/xiangshan/backend/CtrlBlock.scala#L266-L277, which adds a RegNext.
* dcache: repipeline ecc check logic for timing
* chore: fix normal loadAccessFault logic
* wbu: delay load unit wb for 1 cycle
* dcache: add 1 extra cycle for beu error report
- 22 Jun 2022, 1 commit
Committed by Yinan Xu
This commit adds a buffer after the function units that operate across the integer block and the floating-point block, such as f2i and i2f. For example, the out.ready of f2i previously depended on whether mul/div/csr/jump had a valid instruction output, since f2i has lower priority than they do. This ready signal back-propagated from the integer function units to the floating-point function units, and finally to the floating-point reservation stations (since f2i is fully pipelined). The new buffer after the function unit breaks this ready back-propagation. It adds one cycle of execution latency, which we leave unoptimized for now. Timing can be optimized further by separating the int writeback and fp writeback in the function units: in the current version, the ready of f2i affects the ready of the f2f pipelines, which is unnecessary. This is left as future work.
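To make the ready-path argument concrete, here is a minimal cycle-level sketch in Python (not XiangShan's Chisel code). The commit does not say how deep the buffer is, so a 2-entry buffer is assumed, and the class and method names are hypothetical. The key property is that `in_ready()` is computed from the buffer's own occupancy only. A stalled consumer therefore never reaches the producer's ready in the same cycle, at the cost of one extra cycle of latency.

```python
from collections import deque


class ReadyCutBuffer:
    """Hypothetical 2-entry buffer placed after a function unit.

    in_ready() depends only on the buffer's own occupancy, never on the
    downstream out_ready, so the ready path is cut at this register.
    """

    def __init__(self, depth=2):
        self.depth = depth
        self.entries = deque()

    def in_ready(self):
        # Registered state only: no combinational dependency on out_ready.
        return len(self.entries) < self.depth

    def out_valid(self):
        return len(self.entries) > 0

    def cycle(self, in_valid, in_bits, out_ready):
        """Advance one clock edge and return the item dequeued in this cycle, or None."""
        accept = in_valid and self.in_ready()      # uses state before this edge
        fire_out = out_ready and self.out_valid()
        out = self.entries.popleft() if fire_out else None
        if accept:
            self.entries.append(in_bits)
        return out


if __name__ == "__main__":
    buf = ReadyCutBuffer()
    # Four back-to-back inputs, consumer always ready.
    print([buf.cycle(t < 4, t, True) for t in range(5)])  # [None, 0, 1, 2, 3]
```

With two entries the buffer still accepts one item per cycle when the consumer keeps up. Each item appears at the output one cycle after it enters, which matches the extra cycle of latency mentioned in the commit.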
- 09 Dec 2021, 1 commit
Committed by Yinan Xu
This commit adds the WritebackSink and WritebackSource traits to multiple modules. These traits hide implementation details from other modules by defining the IO-related functions inside each module. Using WritebackSink, the ROB can choose its writeback sources. fflags and exceptions are now connected from the execution units, which reduces write ports and improves timing. Further optimizations of writeback to the RS and better coding style will be added later.
- 16 Oct 2021, 1 commit
Committed by Yinan Xu
This commit removes the flush IO from every module. Flushes now reuse the redirect ports to flush instructions.
- 01 Oct 2021, 1 commit
Committed by Yinan Xu
This commit moves the load/store reservation stations into the first ExuBlock (also called IntegerBlock). The unnecessary dispatch module is also removed from CtrlBlock. The module organization is now:
* ExuBlock: Int RS, Load/Store RS, Int RF, Int FUs
* ExuBlock_1: Fp RS, Fp RF, Fp FUs
* MemBlock: Load/Store FUs
In addition, the load queue now has 80 entries and the store queue has 64 entries.
- 27 Sep 2021, 1 commit
Committed by Yinan Xu
This commit applies Definition and Instance to some modules. Refer to https://github.com/chipsalliance/chisel3/pull/2045.
- 20 Sep 2021, 1 commit
Committed by Yinan Xu
This commit splits FMA instructions into FMUL and FADD for execution. Once the first two operands are ready, an FMA instruction can be issued, and the intermediate result is written back to the RS two cycles later. Since the RS already has a DataArray to store operands, we reuse it to store the intermediate FMUL result. When an FMA enters the deq stage and leaves the RS with only two operands, we mark it as midState ready in this clock cycle, T0. If the instruction's third operand becomes ready at T0, the instruction can be selected at T1 and issued at T2, when the FMUL is also finished. The intermediate result is then sent to FADD instead of being written back to the RS. If the third operand becomes ready later, the data is already in DataArray or at DataArray's write port, so it is safe to set midState ready at T0. Splitting FMA instructions increases issue pressure, since the RS has to issue more times. However, it greatly reduces FMA latency when many FMA instructions are waiting for their third operand.
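The claim that it is safe to mark the entry midState ready at $T_0$ can be checked with a short timing argument. Let $t_3 \ge T_0$ be the cycle in which the third operand becomes ready. The one-cycle select and issue steps and the two-cycle FMUL come from the commit message above; the notation is ours.

$$
\begin{aligned}
t_{\text{select}} &= t_3 + 1, \qquad t_{\text{issue}} = t_3 + 2,\\
t_{\text{FMUL done}} &= T_0 + 2,\\
t_3 \ge T_0 \;&\Rightarrow\; t_{\text{issue}} = t_3 + 2 \;\ge\; T_0 + 2 = t_{\text{FMUL done}}.
\end{aligned}
$$

When $t_3 = T_0$, the two sides are equal and the FMUL result is forwarded straight to FADD in cycle $T_0 + 2$. When $t_3 > T_0$, the FMUL result is already in DataArray, or on its write port, by the time the FADD half issues.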
- 19 Sep 2021, 1 commit
Committed by Yinan Xu
This commit adds a load balance strategy to the issue selection logic of the reservation stations. Previously there was a load balance option in ExuBlock, but it did not work when the function units send feedback to the RS; this commit removes it. This commit also adds a victim index option for oldestFirst. For LOAD, the first issue port has better performance, so the victim index is set to 0. For other function units, the last issue port is used.
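The sketch below shows one way to read the victim-index rule, under our own assumptions. The normal selection logic proposes one RS entry per issue port. If the oldest ready entry is not among those proposals, it replaces the proposal on the victim port. The function name and the list-based interface are hypothetical; only the choice of port 0 for LOAD and the last port otherwise comes from the commit message.

```python
from typing import List, Optional


def apply_oldest_first(grants: List[Optional[int]],
                       oldest: Optional[int],
                       victim_index: int) -> List[Optional[int]]:
    """Return per-port grants after forcing the oldest ready entry onto one port.

    grants: RS entry index (or None) proposed for each issue port.
    oldest: index of the oldest ready entry, or None if nothing is ready.
    victim_index: port that gives up its proposal (Python index, so -1 is the last port).
    """
    result = list(grants)
    if oldest is not None and oldest not in result:
        result[victim_index] = oldest
    return result


LOAD_VICTIM = 0      # first load port has better performance
DEFAULT_VICTIM = -1  # last issue port for other function units
```

For example, `apply_oldest_first([3, 5], 7, LOAD_VICTIM)` gives `[7, 5]`, so the oldest load lands on the faster port 0. `DEFAULT_VICTIM` gives `[3, 7]` instead.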
- 13 Sep 2021, 1 commit
Committed by Yinan Xu
This commit cleans up exception vector usage in the backend. Previously, the exception vector went through the pipeline with the uop. However, instructions with exceptions enter the ROB when they are dispatched, so the exception vector is not needed once an instruction enters a function unit.
* exceptionVec, flushPipe and replayInst are reset when an instruction enters a function unit.
* For execution units that never raise exceptions, their output exception vectors are reset to prevent the ROB from recording them.
* replayInst is moved to CtrlSignals.
- 05 Sep 2021, 1 commit
Committed by Yinan Xu
This commit adds load balancing between issue ports when the function unit is not pipelined and the reservation station has more than one issue port. A ping-pong bit decides which port issues the instruction, and the bit is flipped every clock cycle.
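The ping-pong selection can be sketched as follows. This is a hypothetical Python model with two issue ports, not the RS code itself. The only behaviour taken from the commit is that one bit picks the port and is flipped every clock cycle.

```python
class PingPongPortSelect:
    """Hypothetical model: pick one of two issue ports with a ping-pong bit."""

    def __init__(self):
        self.bit = 0  # port used by an issue in the current cycle

    def cycle(self, can_issue):
        """Return the port (0 or 1) used in this cycle, or None if nothing issues."""
        port = self.bit if can_issue else None
        self.bit ^= 1  # flipped every clock cycle, whether or not anything issued
        return port
```

The bit toggles every cycle rather than on every issue, so the chosen port follows cycle parity. Two instructions separated by one idle cycle therefore go to the same port.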
- 03 Sep 2021, 1 commit
Committed by Yinan Xu
This commit adds an 8-entry buffer for the input of the fdivSqrt function unit. Set hasInputBuffer to true to enable input buffers for other function units.
- 01 Sep 2021, 2 commits
Committed by Jiawei Lin
* IntToFP: support fully pipelined mode
Committed by Yinan Xu
This commit adds fastUopOut support for pipelined function units by implementing fastUopOut in the HasPipelineReg trait. The following function units now support fastUopOut:
- MUL
- FMA
- F2I
- F2F
- 31 Aug 2021, 1 commit
Committed by Yinan Xu
This commit optimizes ExuBlock timing by connecting writeback directly where possible. The timing priorities are RegNext(rs.fastUopOut) > fu.writeback > arbiter.out (--> io.rfWriteback --> rs.writeback); the higher the priority, the better the timing.
(1) When function units have exclusive writeback ports, their wakeup ports for the reservation stations can be connected directly from the function units' writeback ports. Special case: when the function unit has fastUopOut, valid and uop should be delayed with RegNext.
(2) If the reservation station has fastUopOut for all instructions in this exu, io.fuWriteback should be replaced with RegNext(fastUopOut). In this case, the corresponding execution units must have exclusive writeback ports, unless the RS can ensure that the instruction is able to write the regfile.
(3) If the reservation station has fastUopOut for all instructions in this exu, io.rfWriteback (rs.writeback) should be replaced with RegNext(rs.wakeupOut).
- 27 Aug 2021, 1 commit
Committed by Yinan Xu
This commit adds a fastUopOut option to function units. It allows a function unit to output valid and uop one cycle before its output data is ready. FastUopOut lets writeback arbitration happen one cycle before the data is ready, which helps timing. Some function units are not ready for this new feature yet, so this commit also adds a fastImplemented option. It allows a function unit to have fastUopOut while its data is still produced in the same cycle as the uop. In that case the data is delayed by one cycle, which may degrade performance. fastImplemented should be set to true once a function unit supports fastUopOut.
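Below is a rough Python model of the timing contract described above; the function names are hypothetical. `output_cycles` gives the cycles in which the uop and the data become visible, with and without fastImplemented. `early_writeback_schedule` shows why an early uop helps: a fixed-priority writeback arbiter can decide in the uop cycle, and the winner's data uses the write port one cycle later. Latencies and priorities are illustrative assumptions. Losing requests are simply ignored here, whereas real hardware has to hold or retry them.

```python
from typing import Dict, List, Tuple


def output_cycles(issue_cycle: int, latency: int,
                  fast_implemented: bool = True) -> Tuple[int, int]:
    """Return (uop_cycle, data_cycle) for a function unit with fastUopOut.

    fast_implemented=True: the unit itself exposes the uop one cycle early.
    fast_implemented=False: uop and data are produced together, so the data
    is delayed one cycle to keep the uop-before-data contract.
    """
    data_cycle = issue_cycle + latency
    if not fast_implemented:
        data_cycle += 1  # extra register stage on the data path
    return data_cycle - 1, data_cycle


def early_writeback_schedule(requests: List[Tuple[str, int]],
                             priority: List[str]) -> Dict[int, str]:
    """Fixed-priority arbitration performed in the uop cycle.

    requests: (unit_name, uop_cycle) pairs.
    Returns {data_cycle: winner}; losers are ignored in this sketch.
    """
    by_cycle: Dict[int, List[str]] = {}
    for unit, uop_cycle in requests:
        by_cycle.setdefault(uop_cycle, []).append(unit)
    return {cycle + 1: min(units, key=priority.index)
            for cycle, units in by_cycle.items()}
```

For a 3-cycle unit issued at cycle 0, `output_cycles(0, 3)` is `(2, 3)`. Without fastImplemented it becomes `(3, 4)`, which is the one-cycle data delay the commit warns about.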
- 25 Aug 2021, 1 commit
Committed by Yinan Xu
* Refactor print control transform
* Add a TileLink bus PMU
* Add performance counters for dispatch, issue, execute stages
* Add more counters in bus PMU
* Insert BusPMU between L3 and L2
* Add some TMA perfcnt
Co-authored-by: LinJiawei <linjiawei20s@ict.ac.cn>
Co-authored-by: William Wang <zeweiwang@outlook.com>
Co-authored-by: wangkaifan <wangkaifan@ict.ac.cn>
- 23 Aug 2021, 1 commit
Committed by Yinan Xu
- 21 Aug 2021, 1 commit
Committed by Yinan Xu
This commit separates store address and store data in the backend, in both the reservation stations and the function units. It also changes how stIssuePtr is updated: stIssuePtr should only be updated when both the store data and the store address have issued.
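Here is a toy Python model of the new update rule. The commit only says that stIssuePtr moves when both halves of a store have issued. The in-order advance over consecutive stores, as well as the class and method names, are our own assumptions for illustration.

```python
from typing import List


class StoreIssueTracker:
    """Hypothetical model of stIssuePtr with split store address/data issue.

    Assumption: the pointer moves forward in program order, and only past
    stores whose address AND data have both issued.
    """

    def __init__(self, num_stores: int):
        self.addr_issued: List[bool] = [False] * num_stores
        self.data_issued: List[bool] = [False] * num_stores
        self.st_issue_ptr = 0

    def issue_addr(self, idx: int) -> None:
        self.addr_issued[idx] = True
        self._advance()

    def issue_data(self, idx: int) -> None:
        self.data_issued[idx] = True
        self._advance()

    def _advance(self) -> None:
        n = len(self.addr_issued)
        while (self.st_issue_ptr < n
               and self.addr_issued[self.st_issue_ptr]
               and self.data_issued[self.st_issue_ptr]):
            self.st_issue_ptr += 1
```

Issuing only the address of store 0 leaves the pointer at 0; once its data also issues, the pointer moves to 1.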
- 04 Aug 2021, 1 commit
Committed by Yinan Xu
Backend --> ExuBlock --> FuBlock --> Exu --> Function Units
                     --> Scheduler --> RS
- 24 Jul 2021, 1 commit
Committed by Yinan Xu
XiangShan is jointly released by ICT and PCL.
- 17 Jul 2021, 1 commit
Committed by Yinan Xu
This commit adds support for a parameterized scheduler. A scheduler can be parameterized via issue and dispatch ports. Note: other parameters have not been tested.
- 16 Jul 2021, 1 commit
Committed by Yinan Xu
This commit adds support for a parameterized scheduler. A scheduler can be parameterized via issue and dispatch ports. Note: other parameters have not been tested.
- 04 Jun 2021, 1 commit
Committed by Lemover
This commit adds a license to the XiangShan project.
- 15 May 2021, 1 commit
Committed by Yinan Xu
* test,vcs: call $finish when difftest fails
* backend,RS: refactor with more submodules
  This commit rewrites the reservation station in a more configurable style. The new RS is not finished yet:
  - Only integer instructions are supported
  - Feedback from load/store instructions is not supported
  - Fast wakeup for multi-cycle instructions is not supported
  - Submodules will be refined later
* RS: use wakeup signals from arbiter.out
* RS: support feedback and re-schedule when needed
  For load and store reservation stations, instructions that have already left the RS may be replayed later.
* test,vcs: check difftest_state and return on nemu trap instructions
* backend,RS: support floating-point operands and delayed regfile read for store RS
  This commit adds support for floating-point instructions in reservation stations. Besides, fp data for store operands currently arrives one cycle later than int data, and this is also supported. The RS should now be ready for any circumstances.
* rs,status: don't trigger assertions when !status.valid
* test,vcs: add +workload option to specify the ram init file
* backend,rs: don't enqueue when redirect.valid or flush.valid
* backend,rs: support wait bit that instruction waits until store issues
  This commit adds support for the wait bit, which is mainly used in load and store reservation stations to delay instruction issue until the corresponding store instruction has issued.
* backend,RS: optimize timing
  This commit optimizes BypassNetwork and PayloadArray timing.
  - duplicate bypass mask to avoid too many FO4
  - use one-hot vec to get read data
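The last bullet replaces an index-based read with a one-hot select. Below is a small Python model of the AND-OR structure this implies; in Chisel, this is the kind of logic `chisel3.util.Mux1H` builds. The function is hypothetical and ignores bit widths. Presumably the benefit is that the one-hot grant from selection can drive the read directly, without being encoded to a binary index and decoded again.

```python
from typing import Sequence


def onehot_read(entries: Sequence[int], select: Sequence[bool]) -> int:
    """AND-OR read: gate each entry with its own select bit and OR the results.

    The select vector is assumed to be one-hot (or all zero, giving 0).
    """
    assert sum(bool(s) for s in select) <= 1, "select must be one-hot"
    result = 0
    for sel, value in zip(select, entries):
        if sel:
            result |= value  # only the selected entry contributes
    return result
```

For example, `onehot_read([0xA, 0xB, 0xC], [False, True, False])` reads `0xB`.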
- 09 May 2021, 1 commit
Committed by Yinan Xu
This commit replaces src1, src2 and src3 in Bundle ExuInput with Vec(3, UInt), which should be easier for the RS to use.
- 06 May 2021, 1 commit
Committed by Lemover
* [WIP] Backend: add mul to fast wake-up
* Backend: handle mul wb priority and fix wrong delay
* RS: divide fastwakeup and nonBlocked (they were bound together)
- 29 Apr 2021, 1 commit
Committed by Lemover
- 19 Apr 2021, 1 commit
Committed by Jiawei Lin
* difftest: use DPI-C to refactor difftest
  In this commit, difftest is refactored with DPI-C calls, for a few reasons:
  (1) According to Verilator's manual, DPI-C calls should be more efficient than accessing signals through dut_ptr.
  (2) DPI-C is cross-platform (Verilator, VCS, ...).
  (3) difftest APIs are split from emu.cpp to possibly support more backend platforms (NEMU, Spike, ...).
  Performance at this commit is much slower than the original emu. Performance issues will be fixed later.
* [WIP] SimTop: try to use 'XSTop' as soc
* CircularQueuePtr: use F-bounded polymorphism instead of implicit helper
* Refactor parameters & Clean up code
* difftest: support basic difftest
* Support difftest in new sim top
* Difftest: convert recode fmt to ieee754 when comparing fp regs
* Difftest: pass sign-ext pc to dpic functions && fix exception pc
* Debug: add int/exc inst wb to debug queue
* Difftest: pass sign-ext pc to dpic functions && fix exception pc
* Difftest: fix naive commit num limit
Co-authored-by: Yinan Xu <xuyinan1997@gmail.com>
Co-authored-by: William Wang <zeweiwang@outlook.com>
- 26 Feb 2021, 1 commit
Committed by ljw
* Backend: fix some bugs related to exu write
* Roq: revert to previous version
* Fix fp write back bugs
- 22 Feb 2021, 1 commit
Committed by LinJiawei
- 26 Jan 2021, 1 commit
Committed by William Wang
- 25 Jan 2021, 2 commits
- 24 Jan 2021, 1 commit
Committed by Yinan Xu
- 22 Jan 2021, 1 commit
Committed by LinJiawei
- 21 Jan 2021, 1 commit
Committed by Yinan Xu
- 17 Jan 2021, 1 commit
Committed by LinJiawei
- 14 Jan 2021, 2 commits
Committed by wangkaifan
* values of hardware performance counters can hardly be emulated by NEMU
Committed by LinJiawei
- 13 Jan 2021, 1 commit
Committed by LinJiawei