Commit b3c864bd, authored by Yinan Xu

ram: use asynchronous ram and change dpi-c function prototype

Previously, the RAM was synchronous, which is incorrect.
However, due to a Verilator issue, this bug was hidden by a Buffer that contains flip-flops (FFs).

The buffer works as follows (simplified):
always @(posedge clk)
  data_out <= data_from_ram_helper;

data_from_ram_helper is produced by a synchronous read:
always @(posedge clk)
  ram_helper(raddr, data_from_ram_helper);

At every positive edge, data_out should be assigned the old value of data_from_ram_helper (the value from the previous cycle).
data_from_ram_helper should then be updated to its new value, which becomes visible during the next clock cycle.
However, Verilator evaluates data_from_ram_helper first and only then assigns it to data_out.
As a result, data_out incorrectly receives the new value of data_from_ram_helper computed at the same edge.

For example, Verilator produces the following sequence (one row per clock cycle):
 raddr    data_from_ram_helper     data_out
  0               X                    X
  1             data[0]             data[0]
  2             data[1]             data[1]

However, the correct values should be:
 raddr    data_from_ram_helper     data_out
  0               X                    X
  1             data[0]                X
  2             data[1]             data[0]
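A minimal C++ model of the two update orders may make the tables easier to follow. It is only a sketch, not code from this repository. The names ram_contents, simulate, and Row, and the rule data[i] = 100 + i, are hypothetical illustrations. std::nullopt plays the role of X.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// std::nullopt stands for the unknown value X.
using Value = std::optional<uint64_t>;

// One row of the tables above, sampled at the start of a cycle.
struct Row {
  uint64_t raddr;
  Value data_from_ram_helper;
  Value data_out;
};

// Hypothetical RAM contents for illustration only: data[i] = 100 + i.
uint64_t ram_contents(uint64_t i) { return 100 + i; }

// Simulates `cycles` cycles in which raddr equals the cycle index.
// correct_order == true: both flops sample their inputs before either one
//   updates (Verilog nonblocking-assignment semantics).
// correct_order == false: the RAM flop updates first and the buffer then
//   captures the NEW value, i.e. the misordering described above.
std::vector<Row> simulate(int cycles, bool correct_order) {
  assert(cycles >= 0);
  std::vector<Row> trace;
  Value ram_q;  // data_from_ram_helper, initially X
  Value buf_q;  // data_out, initially X
  for (int c = 0; c < cycles; c++) {
    uint64_t raddr = static_cast<uint64_t>(c);
    trace.push_back({raddr, ram_q, buf_q});
    // Rising clock edge at the end of this cycle.
    Value ram_next = ram_contents(raddr);  // synchronous read of raddr
    if (correct_order) {
      Value buf_next = ram_q;  // buffer samples the OLD RAM output
      ram_q = ram_next;
      buf_q = buf_next;
    } else {
      ram_q = ram_next;  // RAM output updated first...
      buf_q = ram_q;     // ...so the buffer captures the NEW value
    }
  }
  return trace;
}
```

Under these assumptions, simulate(3, true) reproduces the second (correct) table and simulate(3, false) the first, reading data[i] as ram_contents(i).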

Previously, the two bugs cancelled each other out, so the RAM appeared to work correctly.
With multi-threading, however, the RAM and the buffer may be placed in two different threads.
Verilator does not detect the dependency between raddr, data_from_ram_helper and data_out,
so these signals are evaluated in no particular order.
As a result, the multi-threaded emu randomly produces difftest errors.

To show that Verilator evaluates DPI-C functions and their related signals incorrectly
(although it is also possible that we were using DPI-C functions incorrectly),
one can change ram.v to:
  always @(posedge clk) begin
    rdata <= ram_read_helper(raddr);
    ram_write_helper(waddr, wdata);
  end
This should behave the same as the previous version of ram.v, but it produces difftest errors.

To solve the issue, this commit makes two modifications:
(1) Make the RAM asynchronous.
AXIWrapper requires the RAM to be asynchronous, so that after ar.fire() rdata[0-7] is available over eight cycles.
(2) Change the DPI-C function prototype to uint64_t ram_read_helper(uint64_t raddr).
In this form, Verilator detects the correct evaluation order between data_from_ram_helper and data_out.
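To illustrate modification (1), here is a similarly hypothetical C++ model. The read is combinational and only data_out is a flop. It reuses a made-up ram_contents rule and ignores AXIWrapper entirely. It only shows that data_out in cycle n+1 holds data[raddr of cycle n]. It also shows that, with a single flop in the path, there is no flop-to-flop update order to get wrong.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical RAM contents for illustration only: data[i] = 100 + i.
uint64_t ram_contents(uint64_t i) { return 100 + i; }

struct AsyncRow {
  uint64_t raddr;
  uint64_t data_from_ram_helper;     // combinational read of raddr
  std::optional<uint64_t> data_out;  // buffer flop; std::nullopt = X
};

// raddr equals the cycle index. The read is computed from the current raddr
// within the cycle, and data_out captures it at the next rising edge.
std::vector<AsyncRow> simulate_async(int cycles) {
  assert(cycles >= 0);
  std::vector<AsyncRow> trace;
  std::optional<uint64_t> buf_q;  // data_out, initially X
  for (int c = 0; c < cycles; c++) {
    uint64_t raddr = static_cast<uint64_t>(c);
    // Combinational phase, analogous to `assign rdata = ram_read_helper(rIdx);`
    uint64_t comb = ram_contents(raddr);
    trace.push_back({raddr, comb, buf_q});
    buf_q = comb;  // rising edge: the buffer registers this cycle's read
  }
  return trace;
}
```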
Parent 34317ece
......
@@ -115,16 +115,15 @@ void init_ram(const char *img) {
   //new end
 }
-extern "C" void ram_helper(
-  uint64_t rIdx, uint64_t *rdata, uint64_t wIdx, uint64_t wdata, uint64_t wmask, uint8_t wen) {
+extern "C" uint64_t ram_read_helper(uint64_t rIdx) {
   if (rIdx >= RAMSIZE / sizeof(uint64_t)) {
     printf("ERROR: ram idx = 0x%x out of bound!\n", rIdx);
     // TODO: don't allow out of bound when crossbar is ready
-    //assert(rIdx < RAMSIZE / sizeof(uint64_t));
-    *rdata = 0xabcd12345678dcbaUL;
-    return;
+    assert(rIdx < RAMSIZE / sizeof(uint64_t));
   }
-  *rdata = ram[rIdx];
+  return ram[rIdx];
+}
+extern "C" void ram_write_helper(uint64_t wIdx, uint64_t wdata, uint64_t wmask, uint8_t wen) {
   if (wen) {
     assert(wIdx < RAMSIZE / sizeof(uint64_t));
     ram[wIdx] = (ram[wIdx] & ~wmask) | (wdata & wmask);
......
-import "DPI-C" function void ram_helper
+import "DPI-C" function void ram_write_helper
 (
-  input longint rIdx,
-  output longint rdata,
   input longint wIdx,
   input longint wdata,
   input longint wmask,
   input bit wen
 );
+import "DPI-C" function longint ram_read_helper
+(
+  input longint rIdx
+);
 module RAMHelper(
   input clk,
   input [63:0] rIdx,
......
@@ -18,8 +21,10 @@ module RAMHelper(
   input wen
 );
+  assign rdata = ram_read_helper(rIdx);
   always @(posedge clk) begin
-    ram_helper(rIdx, rdata, wIdx, wdata, wmask, wen);
+    ram_write_helper(wIdx, wdata, wmask, wen);
   end
 endmodule
......
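For reference, the two helpers after this change can be written as a self-contained C++ translation unit. RAMSIZE, the ram array, and the `return 0` fallback are assumptions made only so the sketch compiles on its own. The real definitions live elsewhere in the emulator. The out-of-bound message uses PRIx64 because %x does not match a uint64_t argument.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>

// Hypothetical size for this sketch only (64 words).
#define RAMSIZE (64 * sizeof(uint64_t))
static uint64_t ram[RAMSIZE / sizeof(uint64_t)];

// Read side: the data is the return value, matching
// `import "DPI-C" function longint ram_read_helper(input longint rIdx)`.
extern "C" uint64_t ram_read_helper(uint64_t rIdx) {
  if (rIdx >= RAMSIZE / sizeof(uint64_t)) {
    printf("ERROR: ram idx = 0x%" PRIx64 " out of bound!\n", rIdx);
    assert(rIdx < RAMSIZE / sizeof(uint64_t));
    return 0;  // assumption: only reached when NDEBUG disables assert
  }
  return ram[rIdx];
}

// Write side: called from `always @(posedge clk)`; only bits set in wmask change.
extern "C" void ram_write_helper(uint64_t wIdx, uint64_t wdata, uint64_t wmask, uint8_t wen) {
  if (wen) {
    assert(wIdx < RAMSIZE / sizeof(uint64_t));
    ram[wIdx] = (ram[wIdx] & ~wmask) | (wdata & wmask);
  }
}
```

A write with wmask = 0xff, for example, replaces only the lowest byte of the addressed word, and a call with wen = 0 leaves the RAM unchanged.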