The algorithms for multiplication in Computer science does not have bit restriction either. It is a trade-off of memory usage. Today the multiprecision multiplication algorithm is known to be bound in O(n log n*2^(log * n)), where n is the total bit length of the input numbers: say the product of two 2048 bit long numbers. So you are here talking about the product of number in the range of 2 to power 2048. The time required in this case is: k*4096*20*4, where k is a constant.
The implementation of multiplication in computer, especially on the chip level, usually only implement simple multiplication algorithm with the bit restriction. It is because the complicated multiprecision algorithm requires extra-accounting steps to manage data structures and memory usage and etc. These accounting steps also takes time, therefore, for small number say in the range of 2^64, there is no need for that, and in fact, it will only make the it slower. However, the typical algorithm including the one presented by “屎叫兽" requires O(n^2) steps, so for the number of 2^4096, “屎叫兽"'s method require 4096^2. He's so call speed algorithm only provides a linear speed up, so essentially, it is the trivial shift and add algorithm. It is meaningless.
As for kind,
you can specified in FORTRAN like:
INTEGER(KIND=36) :: a = 123456789123456789, b = 987654321987654321, c
c = a*b
print c
Will do just fine, and in this case, it will be faster than 屎叫兽's method, because typically these type of operation is done using Karatsuba algorithm. For number this large, 屎叫兽 can't handle.