This is the "base" adder scheme, and I think it is the one best suited to being sped up by pipelining. It is a set of full adders in which each full adder's Carry_in input is the Carry_out of the previous one, so the full adders are chained in sequence by the carry line. Example:
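    // One-bit full adder: sum and carry-out of a + b + cin.
    module full_adder (
        input  a,
        input  b,
        input  cin,
        output s,
        output cout
    );
        assign s    = a ^ b ^ cin;
        // Carry-out is the majority function of the three inputs.
        assign cout = (a & b) | (a & cin) | (cin & b);
    endmodule

    // WIDTH-bit ripple-carry adder built from chained full adders.
    module ripple_carry_adder #(
        parameter WIDTH = 8
    ) (
        input  [WIDTH-1:0] a,
        input  [WIDTH-1:0] b,
        input              cin,
        output [WIDTH-1:0] s,
        output             cout
    );
        genvar i;
        // carry_line[i] is the carry into bit i; carry_line[WIDTH] is the final carry-out.
        wire [WIDTH:0] carry_line;

        assign carry_line[0] = cin;
        assign cout          = carry_line[WIDTH];

        generate
            // The block is named because Verilog-2001 requires a label on generate loops.
            for (i = 0; i < WIDTH; i = i + 1) begin : gen_full_adder
                full_adder full_adder_inst (
                    .a    (a[i]),
                    .b    (b[i]),
                    .cin  (carry_line[i]),
                    .s    (s[i]),
                    .cout (carry_line[i+1])
                );
            end
        endgenerate
    endmodule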
I have used pipelined ripple-carry adders (as well as carry-save adders) for addition, subtraction, and multiplication when I needed a high clock frequency in an ASIC design.
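To make the pipelining idea concrete, here is a minimal sketch of one way to cut the carry chain in two with a register stage. It reuses the ripple_carry_adder module from the example; the module name pipelined_adder_2stage, the clk port, the LO/HI split, and the *_q register names are my own assumptions for illustration, not a description of any particular production design. It has no reset and assumes WIDTH is at least 2.

```verilog
// Hypothetical two-stage pipelined adder (illustrative sketch only).
// Stage 1 adds the low LO bits; stage 2 adds the high HI bits using the
// registered carry from stage 1, so each stage sees a shorter carry chain.
module pipelined_adder_2stage #(
    parameter WIDTH = 16               // assumed >= 2
) (
    input                  clk,
    input      [WIDTH-1:0] a,
    input      [WIDTH-1:0] b,
    input                  cin,
    output reg [WIDTH-1:0] s,
    output reg             cout
);
    localparam LO = WIDTH / 2;         // bits handled in stage 1
    localparam HI = WIDTH - LO;        // bits handled in stage 2

    // Stage 1: combinational add of the low halves.
    wire [LO-1:0] lo_sum;
    wire          lo_cout;
    ripple_carry_adder #(.WIDTH(LO)) lo_adder (
        .a(a[LO-1:0]), .b(b[LO-1:0]), .cin(cin),
        .s(lo_sum), .cout(lo_cout)
    );

    // Pipeline registers between the two stages.
    reg [LO-1:0] lo_sum_q;
    reg          lo_cout_q;
    reg [HI-1:0] a_hi_q, b_hi_q;

    // Stage 2: combinational add of the delayed high halves.
    wire [HI-1:0] hi_sum;
    wire          hi_cout;
    ripple_carry_adder #(.WIDTH(HI)) hi_adder (
        .a(a_hi_q), .b(b_hi_q), .cin(lo_cout_q),
        .s(hi_sum), .cout(hi_cout)
    );

    always @(posedge clk) begin
        // Capture stage-1 results and delay the high operand halves.
        lo_sum_q  <= lo_sum;
        lo_cout_q <= lo_cout;
        a_hi_q    <= a[WIDTH-1:LO];
        b_hi_q    <= b[WIDTH-1:LO];
        // Output register: the low sum has been delayed one cycle,
        // so both halves belong to the same pair of operands.
        s    <= {hi_sum, lo_sum_q};
        cout <= hi_cout;
    end
endmodule
```

In this sketch the result appears two clock edges after the operands are presented, and a new pair of operands can be accepted every cycle. The longest combinational path is roughly half of the original carry chain; splitting into more, narrower stages shortens it further at the cost of more registers and latency.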