CLA (Carry-Lookahead Adder)


by 근성 2022. 1. 5. 15:50

Before designing the CLA, I recommend reading the previous post on the RCA.

https://baseballgrammer.tistory.com/44


Theory

Carry-lookahead (CLA)

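The CLA is a type of adder. In my university's Digital Logic Circuits 2 course, I learned that it is faster than the ripple carry adder designed in the previous lab when the operands are 16 bits or wider.

Working through the delay expressions under the assumption of 4-bit full-adder blocks, the CLA comes out faster when the operand width is 16 bits or more.

Because all carries are computed at the same time rather than rippling bit by bit, the computation takes less time than with an RCA.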

To compute the carry-out values ahead of time, the CLA needs two signals to be defined: the generate signal (Gi) and the propagate signal (Pi).

With A and B as the inputs, they are defined as generate signal = AiBi and propagate signal = Ai+Bi.

Substituting Gi and Pi into the full adder's carry equation gives Ci+1 = AiBi + (Ai+Bi)Ci = Gi + PiCi.
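As a quick sanity check of that identity, the sketch below compares Gi + PiCi against the usual full-adder carry expression for all eight input combinations. The module name tb_gp_identity is made up for this example and is not part of the original design.

```verilog
// Exhaustive check that G + P*C matches the full-adder carry.
// tb_gp_identity is a hypothetical name used only for this sketch.
module tb_gp_identity;
  reg a, b, c;
  wire g = a & b;                               // generate: both inputs are 1
  wire p = a | b;                               // propagate: at least one input is 1
  wire carry_gp = g | (p & c);                  // Ci+1 = Gi + Pi*Ci
  wire carry_fa = (a & b) | (a & c) | (b & c);  // standard full-adder carry
  integer i;

  initial begin
    for (i = 0; i < 8; i = i + 1) begin
      {a, b, c} = i;   // low three bits of i drive a, b, c
      #1;              // let the combinational wires settle
      if (carry_gp !== carry_fa)
        $display("MISMATCH a=%b b=%b c=%b", a, b, c);
    end
    $display("done");
    $finish;
  end
endmodule
```

If the two expressions agree, the simulation prints no MISMATCH lines before it finishes.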

Computing the carries of a 4-bit CLA this way gives:

C1=G0+P0C0
C2=G1+P1G0+P1P0C0
C3=G2+P2G1+P2P1G0+P2P1P0C0
Cout=G3+P3G2+P3P2G1+P3P2P1G0+P3P2P1P0C0
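To see where these expanded forms come from, each carry can be substituted into the next one using Ci+1 = Gi + PiCi. A short derivation in the same notation:

```latex
\begin{aligned}
C_1 &= G_0 + P_0 C_0 \\
C_2 &= G_1 + P_1 C_1 = G_1 + P_1 (G_0 + P_0 C_0)
     = G_1 + P_1 G_0 + P_1 P_0 C_0 \\
C_3 &= G_2 + P_2 C_2
     = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \\
C_{out} &= G_3 + P_3 C_3
     = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0
\end{aligned}
```

After the substitution every carry depends only on the Gi, Pi, and C0, so none of them has to wait for a lower carry to be computed.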

Implementing these 4-bit carry equations requires 3-input, 4-input, and 5-input AND and OR gates.

Design

-CLA (Carry Lookahead)

Building a 32-bit CLA requires a full adder, a CLA block, and a 4-bit CLA.

For the gates, the only change from the previous lab is that 3-, 4-, and 5-input OR and AND gates are added.
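The gate definitions themselves are not shown in this post, so here is one way they might look. The module and port names (a, b, c, d, e, y) follow the instantiations in the code further down; the assign-based bodies are my assumption, not the original gates file.

```verilog
// Possible definitions for the gate modules instantiated in this post.
// Names and ports match the instantiations; the bodies are an assumption.
module _and2(a, b, y);
  input a, b; output y;
  assign y = a & b;
endmodule

module _or2(a, b, y);
  input a, b; output y;
  assign y = a | b;
endmodule

module _xor2(a, b, y);
  input a, b; output y;
  assign y = a ^ b;
endmodule

module _and3(a, b, c, y);
  input a, b, c; output y;
  assign y = a & b & c;
endmodule

module _and4(a, b, c, d, y);
  input a, b, c, d; output y;
  assign y = a & b & c & d;
endmodule

module _and5(a, b, c, d, e, y);
  input a, b, c, d, e; output y;
  assign y = a & b & c & d & e;
endmodule

module _or3(a, b, c, y);
  input a, b, c; output y;
  assign y = a | b | c;
endmodule

module _or4(a, b, c, d, y);
  input a, b, c, d; output y;
  assign y = a | b | c | d;
endmodule

module _or5(a, b, c, d, e, y);
  input a, b, c, d, e; output y;
  assign y = a | b | c | d | e;
endmodule
```

A structural version built from 2-input gates would also work; the single-assign form is used here only to keep the sketch short.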

Unlike in the RCA, the full adder here does not need a carry out, so it only outputs the sum.

The CLA block produces the C1, C2, C3, and CO computed earlier.

In the Generate part, 2-input AND gates turn (A3,B3), (A2,B2), (A1,B1), (A0,B0) into g3, g2, g1, g0. In the Propagate part, 2-input OR gates turn the same pairs into p3, p2, p1, p0.

Following the equations given above,

C2=G1+P1G0+P1P0C0

C3=G2+P2G1+P2P1G0+P2P1P0C0

Cout=G3+P3G2+P3P2G1+P3P2P1G0+P3P2P1P0C0

the carries are written as the expressions below.

C2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & ci)

C3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & ci)

CO = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & ci)

This logic is packaged as the carry look ahead block (clb).
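The expressions above can also be written directly with assign statements and checked against ordinary addition. The sketch below does that; clb4_behav and tb_clb4_behav are hypothetical names for this example, and the gate-level clb4 later in the post is the actual design.

```verilog
// Behavioral sketch of the carry look ahead block with a self-checking testbench.
// clb4_behav and tb_clb4_behav are hypothetical names for this example only.
module clb4_behav(a, b, ci, c1, c2, c3, co);
  input  [3:0] a, b;
  input        ci;
  output       c1, c2, c3, co;

  wire [3:0] g = a & b;   // generate
  wire [3:0] p = a | b;   // propagate

  assign c1 = g[0] | (p[0] & ci);
  assign c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & ci);
  assign c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
            | (p[2] & p[1] & p[0] & ci);
  assign co = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
            | (p[3] & p[2] & p[1] & g[0])
            | (p[3] & p[2] & p[1] & p[0] & ci);
endmodule

module tb_clb4_behav;
  reg  [3:0] a, b;
  reg        ci;
  wire       c1, c2, c3, co;
  wire [3:0] s;
  wire [4:0] expected = a + b + ci;   // reference result, 5 bits wide
  integer i, errors;

  clb4_behav dut(.a(a), .b(b), .ci(ci), .c1(c1), .c2(c2), .c3(c3), .co(co));

  // sum bit k = a[k] ^ b[k] ^ (carry into bit k)
  assign s = a ^ b ^ {c3, c2, c1, ci};

  initial begin
    errors = 0;
    for (i = 0; i < 512; i = i + 1) begin
      {a, b, ci} = i;   // low nine bits of i cover every input combination
      #1;               // let the combinational logic settle
      if ({co, s} !== expected) errors = errors + 1;
    end
    $display("mismatches: %0d", errors);
    $finish;
  end
endmodule
```

The testbench forms the sum bits the same way the carry-less full adder does, so it exercises the carries in the role they play inside the 4-bit CLA.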

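The 4-bit CLA uses 4-bit a, b, and sum, a 3-bit c that holds the internal carries, and 1-bit cin and cout. The inputs go to four full adder instances, and the CLA block computes the carries (c1, c2, c3) that feed those full adders, along with cout.

The 32-bit CLA uses 32-bit a, b, and sum, seven 1-bit carry wires (c1 to c7) that link the eight blocks, and 1-bit co and ci. It instantiates the 4-bit CLA eight times, with each block's co driving the next block's ci.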

Designing it this way gives the following code:

module cla32(a,b,ci,s,co);//cla32 module
input [31:0] a,b;//input a,b
input ci;//input cin
output [31:0] s;//output sum
output co;//output cout

wire c1, c2, c3, c4, c5, c6, c7;

cla4 U0_cla4(.a(a[3:0]), .b(b[3:0]), .ci(ci), .s(s[3:0]), .co(c1));
cla4 U1_cla4(.a(a[7:4]), .b(b[7:4]), .ci(c1), .s(s[7:4]), .co(c2));
cla4 U2_cla4(.a(a[11:8]), .b(b[11:8]), .ci(c2), .s(s[11:8]), .co(c3));
cla4 U3_cla4(.a(a[15:12]), .b(b[15:12]), .ci(c3), .s(s[15:12]), .co(c4));
cla4 U4_cla4(.a(a[19:16]), .b(b[19:16]), .ci(c4), .s(s[19:16]), .co(c5));
cla4 U5_cla4(.a(a[23:20]), .b(b[23:20]), .ci(c5), .s(s[23:20]), .co(c6));
cla4 U6_cla4(.a(a[27:24]), .b(b[27:24]), .ci(c6), .s(s[27:24]), .co(c7));
cla4 U7_cla4(.a(a[31:28]), .b(b[31:28]), .ci(c7), .s(s[31:28]), .co(co));//instance cla4

endmodule//end

module cla4(a,b,ci,s,co);//cla4 module
input [3:0] a, b;//input a, b
input ci;//input cin
output [3:0] s;//output sum
output co;//output cout

wire [2:0] c;//internal carries c1, c2, c3 from clb4

fa_v2 U0_fa(.a(a[0]), .b(b[0]), .ci(ci), .s(s[0]));//fa_v2(no carry out) instance
fa_v2 U1_fa(.a(a[1]), .b(b[1]), .ci(c[0]), .s(s[1]));//fa_v2(no carry out) instance
fa_v2 U2_fa(.a(a[2]), .b(b[2]), .ci(c[1]), .s(s[2]));//fa_v2(no carry out) instance
fa_v2 U3_fa(.a(a[3]), .b(b[3]), .ci(c[2]), .s(s[3]));//fa_v2(no carry out) instance
clb4 U4_clb4(.a(a),.b(b), .ci(ci), .c1(c[0]), .c2(c[1]), .c3(c[2]), .co(co));//carry look block instance
endmodule//end

module fa_v2(a,b,ci,s); //full adder version2(no carry out) module
input a, b, ci;//input a, b, cin
output s;//output s
wire w0;
_xor2 U0_xor2(.a(a), .b(b), .y(w0));
_xor2 U1_xor2(.a(w0), .b(ci), .y(s));//instance xor gate
endmodule //end

module clb4(a,b,ci,c1,c2,c3,co);//clb4 module
input [3:0] a, b;//input a, b
input ci;//input cin
output c1, c2, c3, co;//output carries
wire [3:0] g, p;  //generate, propagate
wire w0_c1;
wire w0_c2, w1_c2;
wire w0_c3, w1_c3, w2_c3;
wire w0_co, w1_co, w2_co, w3_co;

// Generate
_and2 U0_and2(.a(a[0]),.b(b[0]),.y(g[0]));
_and2 U1_and2(.a(a[1]),.b(b[1]),.y(g[1]));
_and2 U2_and2(.a(a[2]),.b(b[2]),.y(g[2]));
_and2 U3_and2(.a(a[3]),.b(b[3]),.y(g[3]));//instance and gate

// Propagate
_or2 U4_or2(.a(a[0]),.b(b[0]),.y(p[0]));
_or2 U5_or2(.a(a[1]),.b(b[1]),.y(p[1]));
_or2 U6_or2(.a(a[2]),.b(b[2]),.y(p[2]));
_or2 U7_or2(.a(a[3]),.b(b[3]),.y(p[3]));//instance or gate

// c1 = g[0] | (p[0] & ci)
_and2 U8_and2(.a(p[0]),.b(ci),.y(w0_c1));
_or2 U9_or2(.a(g[0]),.b(w0_c1),.y(c1));


// c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & ci)
_and2 U10_and2(.a(p[1]),.b(g[0]),.y(w0_c2));
_and3 U11_and3(.a(p[1]),.b(p[0]),.c(ci),.y(w1_c2));
_or3	U12_or3(.a(g[1]),.b(w0_c2),.c(w1_c2),.y(c2));

// c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & ci)
_and2 U13_and2(.a(p[2]),.b(g[1]),.y(w0_c3));
_and3 U14_and3(.a(p[2]),.b(p[1]),.c(g[0]),.y(w1_c3));
_and4 U15_and4(.a(p[2]),.b(p[1]),.c(p[0]),.d(ci),.y(w2_c3));
_or4	U16_or4(.a(g[2]),.b(w0_c3),.c(w1_c3),.d(w2_c3),.y(c3));

// co = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & ci)
_and2 U17_and2(.a(p[3]),.b(g[2]),.y(w0_co));
_and3 U18_and3(.a(p[3]),.b(p[2]),.c(g[1]),.y(w1_co));
_and4 U19_and4(.a(p[3]),.b(p[2]),.c(p[1]),.d(g[0]),.y(w2_co));
_and5 U20_and5(.a(p[3]),.b(p[2]),.c(p[1]),.d(p[0]),.e(ci),.y(w3_co));
_or5	U21_or5(.a(g[3]),.b(w0_co),.c(w1_co),.d(w2_co),.e(w3_co),.y(co));

endmodule //end

-CLA with Clock

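Because this wraps the 32-bit CLA, it uses 32-bit registers for a, b, and s and 1-bit registers for ci and co, plus the most important signal, clk (clock). An always block triggered on the rising edge uses non-blocking assignments, which we covered in the Digital Logic Circuits course, so that all registers update at the same time.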

With this approach the only addition is clk, so the code is:

module cla_clk(clk, a, b, ci, s, co);//cla clk module
input clk;//clock
input [31:0] a,b;//input a,b
input ci;//input cin
output [31:0] s;//output sum
output co;//output cout

reg [31:0] reg_a, reg_b;//register a, b
reg reg_ci;//register cin
reg [31:0] reg_s;//register sum
reg reg_co;//register cout

wire [31:0] wire_s;//wire to carry sum
wire wire_co;//wire to carry cout

always @ (posedge clk)//nonblocking assignments
begin
reg_a	<=a;
reg_b	<=b;
reg_ci <=ci;
reg_s <=wire_s;
reg_co <=wire_co;

end

cla32 U0_cla32(.a(reg_a), .b(reg_b), .ci(reg_ci), .s(wire_s), .co(wire_co));//instance cla32

assign s=reg_s;//sum==register sum
assign co=reg_co;//cout==register cout

endmodule
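One consequence of registering both the inputs and the outputs is that a result appears on s two rising edges after its operands are applied. The sketch below shows this with the same register structure as cla_clk, but with the "+" operator standing in for cla32 so that it compiles on its own. The names reg_add_sketch and tb_reg_add_sketch are hypothetical.

```verilog
// Latency sketch: same register structure as cla_clk, with "+" standing in
// for cla32. reg_add_sketch and tb_reg_add_sketch are hypothetical names.
module reg_add_sketch(clk, a, b, ci, s, co);
  input         clk;
  input  [31:0] a, b;
  input         ci;
  output [31:0] s;
  output        co;

  reg  [31:0] reg_a, reg_b, reg_s;
  reg         reg_ci, reg_co;
  wire [31:0] wire_s;
  wire        wire_co;

  // stand-in for the cla32 instance (assumption for this sketch)
  assign {wire_co, wire_s} = reg_a + reg_b + reg_ci;

  always @(posedge clk) begin
    reg_a  <= a;        // input registers sample on this edge
    reg_b  <= b;
    reg_ci <= ci;
    reg_s  <= wire_s;   // output registers capture the sum of the
    reg_co <= wire_co;  // operands latched on the previous edge
  end

  assign s  = reg_s;
  assign co = reg_co;
endmodule

module tb_reg_add_sketch;
  reg         clk;
  reg  [31:0] a, b;
  reg         ci;
  wire [31:0] s;
  wire        co;

  reg_add_sketch dut(.clk(clk), .a(a), .b(b), .ci(ci), .s(s), .co(co));

  always #5 clk = ~clk;   // free-running clock, period 10

  initial begin
    clk = 0; a = 32'd3; b = 32'd4; ci = 1'b1;
    @(posedge clk); #1;
    // operands are latched, but the sum has not been registered yet
    $display("after edge 1: s=%0d co=%b", s, co);
    @(posedge clk); #1;
    // the registered sum of 3 + 4 + 1 is now visible
    $display("after edge 2: s=%0d co=%b", s, co);
    $finish;
  end
endmodule
```

Because the assignments are non-blocking, reg_s on a given edge sees the old reg_a and reg_b, which is exactly why the extra cycle of latency exists.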

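Through this design I learned that a CLA can be faster than an RCA in some cases, that there is a modified CLA that can be faster still, and that there are a great many kinds of adders.

Next time I will post about the difference between blocking and non-blocking assignments and about timing analysis.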

License: CC BY-NC-SA (Attribution-NonCommercial-ShareAlike)
