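I am uploading my new book 《自己动手写CPU》 (Write Your Own CPU) in installments. Today's post is the 35th, and I aim to publish four posts a week.
A review giveaway is under way: the first ten readers who post a review of 《自己动手写CPU》 on any of the three major book sites (Amazon, JD.com, Dangdang) will each receive a copy of 《步步惊芯——软核处理器内部设计分析》. Please join in! Event period: 2014-9-11 to 2014-10-20.
Implementing the transfer instructions (jumps and branches) takes a while to explain, so it is split into two posts. This is the first.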
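8.4 Modifying OpenMIPS to Implement Transfer Instructions
8.4.1 Modifying the PC Module in the Fetch Stage
As Figure 8-6 shows, the PC module needs additional ports, which are listed in Table 8-1. The modified PC module is shown below. There is one main change: if branch_flag_i is Branch, the new PC value is set to branch_target_address_i. The complete code is in pc_reg.v under Code\Chapter8 on the CD that accompanies this book.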
module pc_reg(
  input wire clk,
  input wire rst,

  // Information from the control module
  input wire[5:0] stall,

  // Information from the ID module in the decode stage
  input wire branch_flag_i,
  input wire[`RegBus] branch_target_address_i,

  output reg[`InstAddrBus] pc,
  output reg ce
);

  ......

  always @ (posedge clk) begin
    if (ce == `ChipDisable) begin
      pc <= 32'h00000000;
    end else if (stall[0] == `NoStop) begin
      if (branch_flag_i == `Branch) begin
        pc <= branch_target_address_i;
      end else begin
        pc <= pc + 4'h4;
      end
    end
  end

endmodule

Here Branch is a macro defined in defines.v:

`define Branch 1'b1      // transfer taken
`define NotBranch 1'b0   // transfer not taken
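To make the priority of the three cases explicit, here is a minimal Python sketch of the PC update rule, written as a software model rather than hardware. The function name next_pc is hypothetical, and the numeric values chosen for ChipDisable, NoStop and Stop are assumptions, since this section only shows the values of Branch and NotBranch.

```python
# Behavioral model of the pc_reg update on one rising clock edge.
# Assumed encodings (not shown in this section): ChipDisable = 0, NoStop = 0, Stop = 1.
CHIP_DISABLE, CHIP_ENABLE = 0, 1
NO_STOP, STOP = 0, 1
NOT_BRANCH, BRANCH = 0, 1   # matches `Branch / `NotBranch in defines.v

MASK32 = 0xFFFFFFFF


def next_pc(pc, ce, stall0, branch_flag, branch_target):
    """Return the PC value after one rising clock edge."""
    if ce == CHIP_DISABLE:
        return 0x00000000              # chip disabled: PC is cleared
    if stall0 == STOP:
        return pc                      # fetch stage stalled: PC holds its value
    if branch_flag == BRANCH:
        return branch_target & MASK32  # transfer taken: load the target address
    return (pc + 4) & MASK32           # normal sequential fetch
```

Note that a stall takes precedence over a taken transfer in this model, because the branch check in the listing sits inside the stall[0] == NoStop branch.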
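8.4.2 Modifying the Decode Stage
1. Modifying the ID module
As Figure 8-6 shows, the ID module needs some additional ports, which are described in Table 8-2.
The ID module must now also recognize transfer instructions. Based on the instruction formats given in Figures 8-3 and 8-4, the process for identifying a transfer instruction is shown in Figure 8-7.
The macros involved are listed below. They can be found in defines.v under Code\Chapter8 on the CD that accompanies this book.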
`define EXE_J 6'b000010
`define EXE_JAL 6'b000011
`define EXE_JALR 6'b001001
`define EXE_JR 6'b001000
`define EXE_BEQ 6'b000100
`define EXE_BGEZ 5'b00001
`define EXE_BGEZAL 5'b10001
`define EXE_BGTZ 6'b000111
`define EXE_BLEZ 6'b000110
`define EXE_BLTZ 5'b00000
`define EXE_BLTZAL 5'b10000
`define EXE_BNE 6'b000101

In addition, the following macros are introduced and are used when implementing the transfer instructions:

`define InDelaySlot 1'b1      // in a delay slot
`define NotInDelaySlot 1'b0   // not in a delay slot
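The widths of the macros above hint at which instruction field each one is compared against: the 6-bit values are matched against the opcode (bits 31:26) or, for jr and jalr, the function code (bits 5:0), while the 5-bit values are matched against bits 20:16 of a REGIMM instruction. The following Python sketch illustrates that lookup. The function classify and the table names are hypothetical, and the SPECIAL and REGIMM opcode values (000000 and 000001) are taken from the standard MIPS32 encoding rather than from this section.

```python
# Software sketch of how the macro values select a transfer instruction.
EXE_SPECIAL_INST = 0b000000   # standard MIPS32 SPECIAL opcode (assumed)
EXE_REGIMM_INST = 0b000001    # standard MIPS32 REGIMM opcode (assumed)

OPCODES = {                   # compared against inst[31:26]
    0b000010: "j", 0b000011: "jal", 0b000100: "beq",
    0b000101: "bne", 0b000110: "blez", 0b000111: "bgtz",
}
SPECIAL_FUNCTS = {            # compared against inst[5:0]
    0b001000: "jr", 0b001001: "jalr",
}
REGIMM_RT = {                 # compared against inst[20:16]
    0b00000: "bltz", 0b00001: "bgez",
    0b10000: "bltzal", 0b10001: "bgezal",
}


def classify(inst):
    """Return the transfer instruction mnemonic, or None if inst is not one."""
    op = (inst >> 26) & 0x3F
    if op == EXE_SPECIAL_INST:
        # The listing only decodes jr/jalr when inst[10:6] is zero.
        if ((inst >> 6) & 0x1F) != 0:
            return None
        return SPECIAL_FUNCTS.get(inst & 0x3F)
    if op == EXE_REGIMM_INST:
        return REGIMM_RT.get((inst >> 16) & 0x1F)
    return OPCODES.get(op)
```

For example, classify(0x03E00008) recognizes the encoding of jr $31, and an ori instruction such as 0x34011100 yields None.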
The modified ID module of the decode stage is shown below. For the complete code, see id.v under Code\Chapter8 on the CD that accompanies this book.
module id(
  ......
  // If the previous instruction is a transfer instruction, then when the next
  // instruction enters the decode stage the input is_in_delayslot_i is true,
  // marking it as a delay slot instruction; otherwise it is false
  input wire is_in_delayslot_i,
  ......
  output reg next_inst_in_delayslot_o,
  output reg branch_flag_o,
  output reg[`RegBus] branch_target_address_o,
  output reg[`RegBus] link_addr_o,
  output reg is_in_delayslot_o,
  ......
);
  ......
  wire[`RegBus] pc_plus_8;
  wire[`RegBus] pc_plus_4;
  wire[`RegBus] imm_sll2_signedext;

  assign pc_plus_8 = pc_i + 8;  // address of the second instruction after the one being decoded
  assign pc_plus_4 = pc_i + 4;  // address of the instruction immediately after the one being decoded

  // imm_sll2_signedext is the offset field of a branch instruction shifted
  // left by two bits and sign-extended to 32 bits
  assign imm_sll2_signedext = {{14{inst_i[15]}}, inst_i[15:0], 2'b00};

  always @ (*) begin
    if (rst == `RstEnable) begin
      ......
      link_addr_o <= `ZeroWord;
      branch_target_address_o <= `ZeroWord;
      branch_flag_o <= `NotBranch;
      next_inst_in_delayslot_o <= `NotInDelaySlot;
    end else begin
      ......
      aluop_o <= `EXE_NOP_OP;
      alusel_o <= `EXE_RES_NOP;
      wd_o <= inst_i[15:11];          // default destination register address wd_o
      wreg_o <= `WriteDisable;
      instvalid <= `InstInvalid;
      reg1_read_o <= 1'b0;
      reg2_read_o <= 1'b0;
      reg1_addr_o <= inst_i[25:21];   // default reg1_addr_o
      reg2_addr_o <= inst_i[20:16];   // default reg2_addr_o
      imm <= `ZeroWord;
      link_addr_o <= `ZeroWord;
      branch_target_address_o <= `ZeroWord;
      branch_flag_o <= `NotBranch;
      next_inst_in_delayslot_o <= `NotInDelaySlot;
      case (op)
        `EXE_SPECIAL_INST: begin
          case (op2)
            5'b00000: begin
              case (op3)
                ......
                `EXE_JR: begin           // jr instruction
                  wreg_o <= `WriteDisable;
                  aluop_o <= `EXE_JR_OP;
                  alusel_o <= `EXE_RES_JUMP_BRANCH;
                  reg1_read_o <= 1'b1;
                  reg2_read_o <= 1'b0;
                  link_addr_o <= `ZeroWord;
                  branch_target_address_o <= reg1_o;
                  branch_flag_o <= `Branch;
                  next_inst_in_delayslot_o <= `InDelaySlot;
                  instvalid <= `InstValid;
                end
                `EXE_JALR: begin         // jalr instruction
                  wreg_o <= `WriteEnable;
                  aluop_o <= `EXE_JALR_OP;
                  alusel_o <= `EXE_RES_JUMP_BRANCH;
                  reg1_read_o <= 1'b1;
                  reg2_read_o <= 1'b0;
                  wd_o <= inst_i[15:11];
                  link_addr_o <= pc_plus_8;
                  branch_target_address_o <= reg1_o;
                  branch_flag_o <= `Branch;
                  next_inst_in_delayslot_o <= `InDelaySlot;
                  instvalid <= `InstValid;
                end
                default: begin
                end
              endcase
            end
            default: begin
            end
          endcase
        end
        ......
        `EXE_J: begin                    // j instruction
          wreg_o <= `WriteDisable;
          aluop_o <= `EXE_J_OP;
          alusel_o <= `EXE_RES_JUMP_BRANCH;
          reg1_read_o <= 1'b0;
          reg2_read_o <= 1'b0;
          link_addr_o <= `ZeroWord;
          branch_flag_o <= `Branch;
          next_inst_in_delayslot_o <= `InDelaySlot;
          instvalid <= `InstValid;
          branch_target_address_o <= {pc_plus_4[31:28], inst_i[25:0], 2'b00};
        end
        `EXE_JAL: begin                  // jal instruction
          wreg_o <= `WriteEnable;
          aluop_o <= `EXE_JAL_OP;
          alusel_o <= `EXE_RES_JUMP_BRANCH;
          reg1_read_o <= 1'b0;
          reg2_read_o <= 1'b0;
          wd_o <= 5'b11111;
          link_addr_o <= pc_plus_8;
          branch_flag_o <= `Branch;
          next_inst_in_delayslot_o <= `InDelaySlot;
          instvalid <= `InstValid;
          branch_target_address_o <= {pc_plus_4[31:28], inst_i[25:0], 2'b00};
        end
        `EXE_BEQ: begin                  // beq instruction
          wreg_o <= `WriteDisable;
          aluop_o <= `EXE_BEQ_OP;
          alusel_o <= `EXE_RES_JUMP_BRANCH;
          reg1_read_o <= 1'b1;
          reg2_read_o <= 1'b1;
          instvalid <= `InstValid;
          if (reg1_o == reg2_o) begin
            branch_target_address_o <= pc_plus_4 + imm_sll2_signedext;
            branch_flag_o <= `Branch;
            next_inst_in_delayslot_o <= `InDelaySlot;
          end
        end
        `EXE_BGTZ: begin                 // bgtz instruction
          wreg_o <= `WriteDisable;
          aluop_o <= `EXE_BGTZ_OP;
          alusel_o <= `EXE_RES_JUMP_BRANCH;
          reg1_read_o <= 1'b1;
          reg2_read_o <= 1'b0;
          instvalid <= `InstValid;
          if ((reg1_o[31] == 1'b0) && (reg1_o != `ZeroWord)) begin
            branch_target_address_o <= pc_plus_4 + imm_sll2_signedext;
            branch_flag_o <= `Branch;
            next_inst_in_delayslot_o <= `InDelaySlot;
          end
        end
        `EXE_BLEZ: begin                 // blez instruction
          wreg_o <= `WriteDisable;
          aluop_o <= `EXE_BLEZ_OP;
          alusel_o <= `EXE_RES_JUMP_BRANCH;
          reg1_read_o <= 1'b1;
          reg2_read_o <= 1'b0;
          instvalid <= `InstValid;
          if ((reg1_o[31] == 1'b1) || (reg1_o == `ZeroWord)) begin
            branch_target_address_o <= pc_plus_4 + imm_sll2_signedext;
            branch_flag_o <= `Branch;
            next_inst_in_delayslot_o <= `InDelaySlot;
          end
        end
        `EXE_BNE: begin                  // bne instruction
          wreg_o <= `WriteDisable;
          aluop_o <= `EXE_BNE_OP;        // assumes defines.v defines `EXE_BNE_OP
          alusel_o <= `EXE_RES_JUMP_BRANCH;
          reg1_read_o <= 1'b1;
          reg2_read_o <= 1'b1;
          instvalid <= `InstValid;
          if (reg1_o != reg2_o) begin
            branch_target_address_o <= pc_plus_4 + imm_sll2_signedext;
            branch_flag_o <= `Branch;
            next_inst_in_delayslot_o <= `InDelaySlot;
          end
        end
        `EXE_REGIMM_INST: begin
          case (op4)
            `EXE_BGEZ: begin             // bgez instruction
              wreg_o <= `WriteDisable;
              aluop_o <= `EXE_BGEZ_OP;
              alusel_o <= `EXE_RES_JUMP_BRANCH;
              reg1_read_o <= 1'b1;
              reg2_read_o <= 1'b0;
              instvalid <= `InstValid;
              if (reg1_o[31] == 1'b0) begin
                branch_target_address_o <= pc_plus_4 + imm_sll2_signedext;
                branch_flag_o <= `Branch;
                next_inst_in_delayslot_o <= `InDelaySlot;
              end
            end
            `EXE_BGEZAL: begin           // bgezal instruction
              wreg_o <= `WriteEnable;
              aluop_o <= `EXE_BGEZAL_OP;
              alusel_o <= `EXE_RES_JUMP_BRANCH;
              reg1_read_o <= 1'b1;
              reg2_read_o <= 1'b0;
              link_addr_o <= pc_plus_8;
              wd_o <= 5'b11111;
              instvalid <= `InstValid;
              if (reg1_o[31] == 1'b0) begin
                branch_target_address_o <= pc_plus_4 + imm_sll2_signedext;
                branch_flag_o <= `Branch;
                next_inst_in_delayslot_o <= `InDelaySlot;
              end
            end
            `EXE_BLTZ: begin             // bltz instruction
              wreg_o <= `WriteDisable;
              aluop_o <= `EXE_BLTZ_OP;   // assumes defines.v defines `EXE_BLTZ_OP
              alusel_o <= `EXE_RES_JUMP_BRANCH;
              reg1_read_o <= 1'b1;
              reg2_read_o <= 1'b0;
              instvalid <= `InstValid;
              if (reg1_o[31] == 1'b1) begin
                branch_target_address_o <= pc_plus_4 + imm_sll2_signedext;
                branch_flag_o <= `Branch;
                next_inst_in_delayslot_o <= `InDelaySlot;
              end
            end
            `EXE_BLTZAL: begin           // bltzal instruction
              wreg_o <= `WriteEnable;
              aluop_o <= `EXE_BLTZAL_OP; // assumes defines.v defines `EXE_BLTZAL_OP
              alusel_o <= `EXE_RES_JUMP_BRANCH;
              reg1_read_o <= 1'b1;
              reg2_read_o <= 1'b0;
              link_addr_o <= pc_plus_8;
              wd_o <= 5'b11111;
              instvalid <= `InstValid;
              if (reg1_o[31] == 1'b1) begin
                branch_target_address_o <= pc_plus_4 + imm_sll2_signedext;
                branch_flag_o <= `Branch;
                next_inst_in_delayslot_o <= `InDelaySlot;
              end
            end
            default: begin
            end
          endcase
  ......

  // The output is_in_delayslot_o indicates whether the instruction currently
  // in the decode stage is a delay slot instruction
  always @ (*) begin
    if (rst == `RstEnable) begin
      is_in_delayslot_o <= `NotInDelaySlot;
    end else begin
      // simply equal to is_in_delayslot_i
      is_in_delayslot_o <= is_in_delayslot_i;
    end
  end

endmodule
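The decoding of several representative instructions is explained below.
(1) The jr instruction
- jr does not save a return address, so wreg_o is set to WriteDisable and the return address link_addr_o is set to 0. In the listing, aluop_o is set to EXE_JR_OP and alusel_o to EXE_RES_JUMP_BRANCH.
- The target of jr is the value of general-purpose register rs, so reg1_read_o is set to 1 to read a register through read port 1 of the Regfile module. The address read is the rs field of the instruction, so the decode stage output reg1_o ends up holding the value of register rs.
- jr is an unconditional transfer, so branch_flag_o is set to Branch.
- The transfer target address branch_target_address_o is set to reg1_o, that is, the value read from general-purpose register rs.
- The next instruction is a delay slot instruction, so next_inst_in_delayslot_o is set to InDelaySlot.
The j instruction is similar to jr, except that the target address no longer comes from a general-purpose register. No register needs to be read, so reg1_read_o is set to 0, and the target address is formed as follows.
{pc_plus_4[31:28], inst_i[25:0], 2'b00}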
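As a worked illustration of this concatenation, the Python sketch below computes the same 32-bit value: the upper 4 bits come from the address of the delay slot instruction (pc + 4), the middle 26 bits from the instruction, and the low 2 bits are zero. The function name jump_target is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def jump_target(pc, inst):
    """Model of {pc_plus_4[31:28], inst_i[25:0], 2'b00} for j and jal."""
    pc_plus_4 = (pc + 4) & MASK32
    instr_index = inst & 0x03FFFFFF          # inst_i[25:0]
    return (pc_plus_4 & 0xF0000000) | (instr_index << 2)
```

With pc = 0x00000020 and the encoding 0x08000010 the result is 0x00000040. The case pc = 0x9FFFFFFC shows why pc_plus_4 is used rather than pc: the upper four bits are taken from 0xA0000000, so the encoding 0x08000001 gives 0xA0000004.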
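(2) The jalr instruction
- jalr saves a return address, so wreg_o is set to WriteEnable and the return address link_addr_o is set to the address of the second instruction after the transfer instruction, namely pc_plus_8. In addition, alusel_o is set to EXE_RES_JUMP_BRANCH, and the destination register address wd_o is set to bits 15:11 of the instruction, which is the rd field in Figure 8-3.
- The target of jalr is the value of general-purpose register rs, so reg1_read_o is set to 1 to read a register through read port 1 of the Regfile module. The address read is the rs field of the instruction, so the decode stage output reg1_o ends up holding the value of register rs.
- jalr is an unconditional transfer, so branch_flag_o is set to Branch.
- The transfer target address branch_target_address_o is set to reg1_o, that is, the value read from general-purpose register rs.
- The next instruction is a delay slot instruction, so next_inst_in_delayslot_o is set to InDelaySlot.
The jal instruction is similar to jalr, except that jal writes the return address to register $31, so wd_o is set directly to 5'b11111. Also, the target address no longer comes from a general-purpose register, so no register needs to be read and reg1_read_o is set to 0. The target address is formed as follows.
{pc_plus_4[31:28], inst_i[25:0], 2'b00}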
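(3) The beq instruction
- beq does not save a return address, so wreg_o is set to WriteDisable and link_addr_o keeps its default value of 0. In the listing, aluop_o is set to EXE_BEQ_OP and alusel_o to EXE_RES_JUMP_BRANCH.
- beq is a conditional transfer whose condition is that two general-purpose registers hold equal values. Both registers must be read, so reg1_read_o and reg2_read_o are set to 1 to read through read ports 1 and 2 of the Regfile module. The addresses read are the rs and rt fields of the instruction, so the decode stage output reg1_o holds the value of register rs and reg2_o holds the value of register rt.
- If the two values are equal (reg1_o equals reg2_o), the transfer is taken: branch_flag_o is set to Branch and the target address branch_target_address_o is set to pc_plus_4 + imm_sll2_signedext. The next instruction is then a delay slot instruction, so next_inst_in_delayslot_o is set to InDelaySlot.
The bne instruction is similar to beq, except that the condition is that the two register values are not equal.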
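The target computation pc_plus_4 + imm_sll2_signedext can be checked with a small Python model. This is only a sketch of the arithmetic: the function names are hypothetical, and the model returns 0 as the target when the branch is not taken, mirroring the default ZeroWord in the listing.

```python
MASK32 = 0xFFFFFFFF


def imm_sll2_signedext(inst):
    """Model of {{14{inst_i[15]}}, inst_i[15:0], 2'b00}."""
    offset = inst & 0xFFFF
    value = offset << 2                 # 18-bit value, low two bits are zero
    if offset & 0x8000:
        value |= 0xFFFC0000             # copy the sign bit into the upper 14 bits
    return value


def decode_beq(pc, inst, reg1, reg2):
    """Return (branch_flag, branch_target_address) for a beq at address pc."""
    if reg1 == reg2:
        return True, (pc + 4 + imm_sll2_signedext(inst)) & MASK32
    return False, 0
```

For a beq at 0x100 with offset 3, the target is 0x104 + 12 = 0x110. With offset 0xFFFF (that is, -1) the target is 0x104 - 4 = 0x100, so the instruction branches to itself.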
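(4) The bgtz instruction
- bgtz does not save a return address, so wreg_o is set to WriteDisable and link_addr_o keeps its default value of 0. In the listing, aluop_o is set to EXE_BGTZ_OP and alusel_o to EXE_RES_JUMP_BRANCH.
- bgtz is a conditional transfer whose condition is that the value of general-purpose register rs is greater than 0, so reg1_read_o is set to 1 to read a register through read port 1 of the Regfile module. The address read is the rs field of the instruction, so the decode stage output reg1_o holds the value of register rs.
- If the value of register rs is greater than 0 (reg1_o is greater than 0, which the listing tests as sign bit 0 and value nonzero), the transfer is taken: branch_flag_o is set to Branch and the target address branch_target_address_o is set to pc_plus_4 + imm_sll2_signedext. The next instruction is then a delay slot instruction, so next_inst_in_delayslot_o is set to InDelaySlot.
The blez, bgez, and bltz instructions are similar to bgtz and differ only in the transfer condition.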
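Because reg1_o is an unsigned 32-bit bus, the listing does not use a signed comparison. Each condition is built from the sign bit reg1_o[31] and, where needed, a comparison with zero. The Python sketch below restates the four conditions in that form; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_bit(value):
    """Bit 31 of a 32-bit register value (1 means negative in two's complement)."""
    return (value >> 31) & 1


def bgtz_taken(reg1):
    return sign_bit(reg1) == 0 and (reg1 & MASK32) != 0   # rs > 0


def blez_taken(reg1):
    return sign_bit(reg1) == 1 or (reg1 & MASK32) == 0    # rs <= 0


def bgez_taken(reg1):
    return sign_bit(reg1) == 0                            # rs >= 0


def bltz_taken(reg1):
    return sign_bit(reg1) == 1                            # rs < 0
```

For any register value, exactly one of bgtz_taken and blez_taken holds, and likewise exactly one of bgez_taken and bltz_taken.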
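(5) The bgezal instruction
- bgezal saves a return address, so wreg_o is set to WriteEnable, the return address link_addr_o is set to pc_plus_8, and alusel_o is set to EXE_RES_JUMP_BRANCH. The return address is saved to register $31, so wd_o is set to 5'b11111.
- bgezal is a conditional transfer whose condition is that the value of general-purpose register rs is greater than or equal to 0, so reg1_read_o is set to 1 to read a register through read port 1 of the Regfile module. The address read is the rs field of the instruction, so the decode stage output reg1_o holds the value of register rs.
- If the value of register rs is greater than or equal to 0 (reg1_o is greater than or equal to 0), the transfer is taken: branch_flag_o is set to Branch and the target address branch_target_address_o is set to pc_plus_4 + imm_sll2_signedext. The next instruction is then a delay slot instruction, so next_inst_in_delayslot_o is set to InDelaySlot.
The bltzal instruction is similar to bgezal, except that the condition is that the value of general-purpose register rs is less than 0.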
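One detail of the listing is easy to miss: for bgezal and bltzal, wreg_o, wd_o and link_addr_o are assigned outside the if that tests the condition, so the return address is written to $31 whether or not the branch is taken. The Python sketch below models the decode outputs for bgezal to show this. The function name and the dictionary layout are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def decode_bgezal(pc, inst, reg1):
    """Model of the ID outputs for a bgezal instruction at address pc."""
    taken = ((reg1 >> 31) & 1) == 0                  # rs >= 0
    offset = inst & 0xFFFF
    ext = (offset << 2) | (0xFFFC0000 if offset & 0x8000 else 0)
    return {
        "wreg": True,                                # always WriteEnable
        "wd": 31,                                    # always register $31
        "link_addr": (pc + 8) & MASK32,              # always pc_plus_8
        "branch_flag": taken,
        "branch_target": (pc + 4 + ext) & MASK32 if taken else 0,
        "next_inst_in_delayslot": taken,
    }
```

For a bgezal at 0x200 with offset 4, a non-negative rs gives the target 0x214 and link address 0x208. A negative rs leaves branch_flag false, yet link_addr is still 0x208 and wreg is still set.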
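2. Modifying the ID/EX module
As Figure 8-6 shows, the ID/EX module needs some additional ports, which are described in Table 8-3.
The main changes to the ID/EX module are shown below and are straightforward: when the decode stage of the pipeline is not stalled, the ID/EX module passes the newly added inputs to the corresponding outputs on the rising clock edge. The complete code is in id_ex.v under Code\Chapter8 on the CD that accompanies this book.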
module id_ex(
  ......
  input wire[`RegBus] id_link_address,
  input wire id_is_in_delayslot,
  input wire next_inst_in_delayslot_i,
  ......
  output reg[`RegBus] ex_link_address,
  output reg ex_is_in_delayslot,
  output reg is_in_delayslot_o
);

  always @ (posedge clk) begin
    if (rst == `RstEnable) begin
      ......
      ex_link_address <= `ZeroWord;
      ex_is_in_delayslot <= `NotInDelaySlot;
      is_in_delayslot_o <= `NotInDelaySlot;
    end else if (stall[2] == `Stop && stall[3] == `NoStop) begin
      ......
      ex_link_address <= `ZeroWord;
      ex_is_in_delayslot <= `NotInDelaySlot;
    end else if (stall[2] == `NoStop) begin
      ......
      ex_link_address <= id_link_address;
      ex_is_in_delayslot <= id_is_in_delayslot;
      is_in_delayslot_o <= next_inst_in_delayslot_i;
    end
  end
  ......
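The way is_in_delayslot_o loops back to the ID module can be hard to follow from the listing alone, so here is a small Python model of the three new ID/EX registers. It is a sketch under assumptions: the class IdEx and the helper run are hypothetical, stall bits are modeled as booleans (True for Stop), and the connection from is_in_delayslot_o back to the ID input is_in_delayslot_i is taken from the comment in the ID listing.

```python
class IdEx:
    """Model of the three registers added to the ID/EX module."""

    def __init__(self):
        self.ex_link_address = 0
        self.ex_is_in_delayslot = False
        self.is_in_delayslot_o = False

    def tick(self, stall2, stall3, id_link_address,
             id_is_in_delayslot, next_inst_in_delayslot_i):
        """One rising clock edge with reset inactive."""
        if stall2 and not stall3:
            # Decode stalled, execute not stalled: a bubble enters execute.
            self.ex_link_address = 0
            self.ex_is_in_delayslot = False
        elif not stall2:
            self.ex_link_address = id_link_address
            self.ex_is_in_delayslot = id_is_in_delayslot
            self.is_in_delayslot_o = next_inst_in_delayslot_i
        # Otherwise all three registers hold their values.


def run(branch_taken_flags):
    """Feed a sequence of decoded instructions through ID/EX without stalls.

    Returns, for each instruction, the ex_is_in_delayslot value it carries
    into the execute stage.
    """
    reg = IdEx()
    seen = []
    for taken in branch_taken_flags:
        # ID forwards is_in_delayslot_i (driven by reg.is_in_delayslot_o) unchanged.
        id_is_in_delayslot = reg.is_in_delayslot_o
        reg.tick(False, False, 0, id_is_in_delayslot, taken)
        seen.append(reg.ex_is_in_delayslot)
    return seen
```

Calling run([True, False, False]) models a taken branch followed by two ordinary instructions: only the second instruction, the one right after the branch, reaches the execute stage marked as a delay slot instruction.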
To be continued!