    rs, fma: separate fadd and fmul issue (#1042) · 65e2f311
    Yinan Xu committed
    This commit splits FMA instructions into FMUL and FADD for execution.
    
    When its first two operands are ready, an FMA instruction can be issued
    as an FMUL, and the intermediate result is written back to RS two cycles
    later. Since RS already has a DataArray that stores the operands, we
    reuse it to store the intermediate FMUL result.
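The split can be illustrated with a minimal behavioral sketch in plain Scala. This is not the Chisel code in StatusArray.scala. It assumes a hypothetical `RsEntry` with three DataArray slots and a `midState` flag modeled on the one described here. Reusing the src0 slot for the FMUL product is also an assumption made only for this sketch, because the commit does not say which slot holds the intermediate result.

```scala
// Hypothetical behavioral model of the FMA split (not the Chisel code).
// An FMA computes src0 * src1 + src2; values are only routed here, not computed.
final class RsEntry(val data: Array[Option[Double]]) {
  // Modeled on the commit's midState flag: set once the FMUL half has left RS.
  var midState: Boolean = false
}

sealed trait Issued
final case class FmulIssue(a: Double, b: Double) extends Issued
final case class FaddIssue(product: Double, c: Double) extends Issued

object FmaSplit {
  // First issue: only src0 and src1 are needed, so the FMUL half can go.
  def issueFirst(e: RsEntry): Option[Issued] =
    if (e.midState) None
    else {
      (e.data(0), e.data(1)) match {
        case (Some(a), Some(b)) =>
          e.midState = true
          e.data(0) = None // assumed: src0's slot is reused for the product
          Some(FmulIssue(a, b))
        case _ => None
      }
    }

  // FMUL write-back into the entry's DataArray slot (the entry stays allocated).
  def writeBackProduct(e: RsEntry, product: Double): Unit =
    e.data(0) = Some(product)

  // Second issue: needs the intermediate product and the third operand.
  def issueSecond(e: RsEntry): Option[Issued] =
    if (!e.midState) None
    else {
      (e.data(0), e.data(2)) match {
        case (Some(p), Some(c)) => Some(FaddIssue(p, c))
        case _ => None
      }
    }
}
```

The entry stays allocated after the FMUL half leaves. That is what allows the product to be written back into its own DataArray slot instead of needing a separate buffer.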
    
    When an FMA enters the deq stage and leaves RS with only two of its
    operands, we mark it as midState-ready at this clock cycle, T0.
    
    If the instruction's third operand becomes ready at T0, it can be
    selected at T1 and issued at T2, when the FMUL also finishes. The
    intermediate result is then sent to FADD instead of being written back
    to RS. If the third operand becomes ready later, the intermediate result
    is already in DataArray or at DataArray's write port. Thus, it is safe
    to set midState ready at clock cycle T0.
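The timing argument can be checked with a small cycle-count sketch. It takes two figures from the description above: a 2-cycle FMUL latency and a two-cycle delay from ready to select to issue. The names `MidStateTiming`, `faddIssue`, and the `ProductSource` cases are hypothetical. Setting midState at T0 is safe as long as the FADD half can never issue before the FMUL product exists.

```scala
// Hypothetical cycle model of the midState timing argument (not RTL).
sealed trait ProductSource
case object BypassFromFmul extends ProductSource // FMUL result forwarded to FADD
case object FromDataArray extends ProductSource  // read from DataArray or its write port

object MidStateTiming {
  val FmulLatency = 2  // FMUL result written back two cycles after T0
  val ReadyToIssue = 2 // ready at t, selected at t + 1, issued at t + 2

  /** Returns the FADD issue cycle and where its FMUL product comes from.
    * t0: cycle the FMA leaves RS with two operands and midState is set.
    * src2Ready: cycle the third operand becomes ready (assumed >= t0).
    */
  def faddIssue(t0: Int, src2Ready: Int): (Int, ProductSource) = {
    val readyCycle = math.max(t0, src2Ready) // midState itself is ready at t0
    val issueCycle = readyCycle + ReadyToIssue
    val productCycle = t0 + FmulLatency
    // The invariant that makes setting midState at t0 safe.
    require(issueCycle >= productCycle, "FADD would issue before the FMUL product exists")
    val source: ProductSource =
      if (issueCycle == productCycle) BypassFromFmul else FromDataArray
    (issueCycle, source)
  }
}
```

The second-half ready cycle is never earlier than T0, and in this model the ready-to-issue delay equals the FMUL latency. The `require` therefore always holds. A longer FMUL pipeline would break that invariant.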
    
    Splitting FMA instructions increases issue pressure, since RS needs to
    issue each FMA twice. However, it greatly reduces FMA latency when many
    FMA instructions are waiting for their third operand.
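To make the trade-off concrete, the sketch below counts issues and the completion cycle for one FMA before and after the split. Only two figures come from this commit: the 2-cycle FMUL latency and the select/issue timing. Two further inputs are hypothetical and chosen for illustration: `FaddLatency = 3`, and the premise that a fused FMA takes FMUL + FADD cycles.

```scala
// Hypothetical latency model comparing fused vs. split FMA issue.
// FaddLatency and "fused latency = FMUL + FADD" are assumptions.
object FmaTradeoff {
  val FmulLatency = 2
  val FaddLatency = 3  // assumed
  val ReadyToIssue = 2 // ready at t, selected at t + 1, issued at t + 2

  final case class Outcome(doneCycle: Int, issues: Int)

  /** Before the split: wait for all three operands, then issue once. */
  def fused(srcABReady: Int, srcCReady: Int): Outcome = {
    val issue = math.max(srcABReady, srcCReady) + ReadyToIssue
    Outcome(issue + FmulLatency + FaddLatency, issues = 1)
  }

  /** After the split: FMUL issues on src0/src1, FADD on product + src2. */
  def split(srcABReady: Int, srcCReady: Int): Outcome = {
    val t0 = srcABReady + ReadyToIssue // FMUL half leaves RS, midState set
    val faddIssue = math.max(t0, srcCReady) + ReadyToIssue
    Outcome(faddIssue + FaddLatency, issues = 2)
  }
}
```

With these assumed numbers, an FMA whose third operand arrives at cycle 10 finishes at cycle 15 instead of 17, at the cost of a second issue. When all three operands arrive together, both schemes finish at cycle 7.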
StatusArray.scala 13.2 KB