The Problem
After reading From Mathematics to Generic Programming,1 I accumulated some template code related to the RSA algorithm,2 but I tested it only with native integer types. Some recent events required me to use it for calculations that involve data sizes slightly bigger than those of the built-in types. I had to find some big-number libraries to help the calculation.
This is what the code looks like:
template <Integer N> inline bool odd(N n) { return n % 2 == 1; } template <Integer N> inline N half(N n) { return n / 2; } … template <EuclideanDomain E> std::pair<E, E> extended_gcd(E a, E b) { … } template <Regular A, Integer N, SemigroupOperation Op> A power_accumulate_semigroup(A r, A a, N n, Op op) { … } template <Regular A, Integer N, SemigroupOperation Op> A power_semigroup(A a, N n, Op op) { assert(n > 0); while (!odd(n)) { a = op(a, a); n = half(n); } if (n == 1) { return a; } return power_accumulate_semigroup(a, op(a, a), half(n - 1), op); } template <Integer N> struct modulo_multiply { modulo_multiply(const N& i) : modulus(i) {} N operator()(const N& n, const N& m) const { return (n * m) % modulus; } private: N modulus; }; int main() { typedef … int_type; int_type p1 = …; int_type p2 = …; int_type n = p1 * p2; int_type phi = (p1 - 1) * (p2 - 1); int_type pub = …; pair<int_type, int_type> p = extended_gcd(pub, phi); // Check that p.second == 1 int_type prv = p.first; … int_type encrypt = …; cout << "Encryped message is " << encrypt << endl; cout << "Decrypted message is " << power_semigroup(encrypt, prv, modulo_multiply<int_type>(n)) << endl; }
There are a few details omitted here, but the point is that I had already the code that worked when int_type
was defined to int64_t
, and I needed some types that could represent higher precisions and work with the existing code with minimal changes.
The Exploration
NTL
One of the first libraries I tried was NTL,3 which seemed to support the standard mathematical operators well. It did not take me long to make it work with my program, and I was able to get the result successfully. However, I saw several problems that made me think I probably wanted a more mature solution in the long run:
- Its class name for big integers is ‘
NTL::ZZ
’. ‘ZZ
’ looks ugly to me, not aesthetically comfortable as a type name. - It does not provide a make mechanism for Windows. Luckily, it does not require external libraries and it is easy enough to build it manually with GCC.
- Code like ‘
NTL::ZZ pub = 3
’ does not compile, which is a minor annoyance (but ‘NTL::ZZ pub(3)
’ is an easy workaround, anyway). - Code like ‘
NTL::ZZ p1("3440890133")
’ does not work. This is a problem for big integers that cannot be represented by a native integer type. The workaround is usingstd::istringstream
, which would require more lines and clumsiness. - There is no support for getting the input or output in hexadecimal numbers.
CLN
Another library I tried at the same time was CLN.4 It is not friendly to Windows users either, so I simply installed it from Cygwin.5 It seems to be in stark contrast to NTL in some aspects:
- The class name for big integers is more reasonably named ‘
cln::cl_I
’. - Code lines like ‘
cln::cl_I pub = 3
’ and ‘cln::cl_I p1("3440890133")
’ work. - CLN provides support for hexadecimal input (using a special ‘
#x
’ prefix in the number string) and output (using thefprinthexadecimal
function).
However, CLN is quite terrible in its handling of C++ operators:
%
is not overriden, and I have to call themod
function instead.- Division is not implemented on
cl_I
. I have to, in the general case, use a function that returns the {quotient, remainder} pair, or use anexquo
function when I can guarantee that the remainder is zero. Luckily, in my specific case, I can substitute ‘>> 1
’ for ‘/ 2
’. If shifts could not be used, I would have to replace ‘n / 2
’ with something like ‘truncate2(n, 2).quotient
’. Providing a series of division functions that return both the quotient and remainder is good; forcing people to use them is not.
Unlike the immaturity of NTL, it looks like that CLN deliberately made the design choices to be this way. Still, it looks bad enough to me. The API design of CLN shows the hauteur of the authors: Your time is not important to me; read the fucking manual, and do things the correct way we want it to be. This condescending attitude is completely against the trend.
Boost.Multiprecision
Finally, I found out that I should have looked no further than just the famous Boost libraries.6 A multi-precision template library is among Boost’s 100+ libraries, simply named ‘Boost.Multiprecision’.7 I wondered why I missed it in the beginning. But, anyway, it fulfilled all my needs wonderfully:
- Using the basic
cpp_int
type does not require building any libraries. This makes it work on any platform that has a decent C++ compiler. - All needed operators (like
+
,-
,*
,/
, and%
) are implemented. - Initialization from native integer types and C strings works.
- Hexadecimal input and output are implemented in a natural way: inputs can have the ‘
0x
’ prefix, and thehex
manipulator can be used to make the big integer output to iostreams in the hexadecimal form.8 - In addition, it supports the C++11 user-defined literals.9 So, instead of writing something like ‘
cppint encrypt("0xB570BF8E4BDABA4C")
’, you can have more efficient code by writing ‘cppint encrypt(0xB570BF8E4BDABA4C_cppi)
’.
This said, one problem halted me when I first used its cpp_int
type: very weird compilation errors occurred, spanning several screens. Actually, the solution is described in the introduction of the library, as well as in the first answer of its FAQ, so I figured it out the next day (I did not read carefully the documentation on the first night). I needed to either replace expressions like ‘half(n - 1)
’ with explicit type-casts like ‘half(N(n - 1))
’, or simply use an alternative typedef to turn off expression templates—which I did:
typedef boost::multiprecision::number< boost::multiprecision::cpp_int_backend<>, boost::multiprecision::et_off> int_type;
You can read the Boost documentation for more details. It is related to performance. It is also worth noting that with C++11 move semantics, the expression-template-disabled form I use can still have performance close enough (no more than 10% slower) to the expression-template-enabled form. And the first template parameter probably has a bigger impact—GMP can be used as the backend and is considered faster.10
In my humble opinions, Boost.Multiprecision should change the default cpp_int
definition to have et_off
. Developers who want the ultimate performance will always read the documentation, but it does not seem necessary to force other developers to have failures, read documentation, and change their code. In my case, it takes several seconds to compile the program, a small fraction of a second to run the program, but several hours to find the correct library and learn how to use it.
The Critique
I would argue that the following three criteria should be the foremost in choosing (and thus providing) a good library:
- Correctness. I think this is self-evident. All three libraries described here satisfy this criterion.
- Standard and intuitive interface. This is where NTL and CLN fail. CLN does the worst here by intentionally failing to provide
operator/
. Boost.Multiprecision satisfies my needs without requiring me to look up the documentation (mostly), but it is not perfect, in that the default types can cause horrendous error messages and that its iostream routines do not honouruppercase
andnouppercase
.11 - Performance. Yes, performance comes the last among these three. It is still important, but I would argue we should put performance aside when it conflicts with the other two criteria (like treating ‘premature optimization’), and developers can read the documentation and turn performance options back on when they really need it. Boost.Multiprecision is nice to support different backends and the expression template option, but I am not persuaded that expression templates should be enabled by default.
Of course, there are many other criteria, like portability, (lack of) dependency, etc., but they tend to be more subjective and can vary from project to project. The three criteria listed above are the most important to me.
Correctness and developer productivity should be preferred to code performance. This should be true for both scripting languages and traditional compiled languages.
Appendix: Source Listing
The complete RSA sample code that builds with Boost.Multiprecision is listed below for you to play with. We can optimize the code a little bit by substituting the Boost.Multiprecision divide_qr
function for the handwritten quotient_remainder
. That can be the small exercise for you, dear reader. 🙂
#include <assert.h> // assert #include <iostream> // std::cout/endl #include <utility> // std::pair/make_pair #include <boost/multiprecision/cpp_int.hpp> // Concepts #define EuclideanDomain typename #define Integer typename #define Regular typename #define SemigroupOperation typename template <Integer N> inline bool odd(N n) { return n % 2 == 1; } template <Integer N> inline N half(N n) { return n / 2; } template <Integer N> N largest_doubling(N a, N b) { assert(b != 0); for (;;) { N c = b + b; if (a < c) break; b = c; } return b; } template <Integer N> std::pair<N, N> quotient_remainder(N a, N b) { assert(b > 0); if (a < b) { return std::make_pair(N(0), a); } N c = largest_doubling(a, b); N n(1); a = a - c; while (c != b) { c = half(c); n = n + n; if (c <= a) { a = a - c; n = n + 1; } } return std::make_pair(n, a); } template <EuclideanDomain E> std::pair<E, E> extended_gcd(E a, E b) { E x0(1); E x1(0); while (b != E(0)) { // compute new r and x std::pair<E, E> qr = quotient_remainder(a, b); E x2 = x0 - qr.first * x1; // shift r and x x0 = x1; x1 = x2; a = b; b = qr.second; } return std::make_pair(x0, a); } template <Regular A, Integer N, SemigroupOperation Op> A power_accumulate_semigroup(A r, A a, N n, Op op) { assert(n >= 0); if (n == 0) { return r; } for (;;) { if (odd(n)) { r = op(r, a); if (n == 1) { return r; } } n = half(n); a = op(a, a); } } template <Regular A, Integer N, SemigroupOperation Op> A power_semigroup(A a, N n, Op op) { assert(n > 0); while (!odd(n)) { a = op(a, a); n = half(n); } if (n == 1) { return a; } return power_accumulate_semigroup(a, op(a, a), half(n - 1), op); } template <Integer N> struct modulo_multiply { modulo_multiply(const N& i) : modulus(i) {} N operator()(const N& n, const N& m) const { return (n * m) % modulus; } private: N modulus; }; int main() { using namespace std; typedef boost::multiprecision::number< boost::multiprecision::cpp_int_backend<>, boost::multiprecision::et_off> int_type; int_type p1 = 3440890133; int_type p2 = 4006628849; int_type n = p1 * p2; int_type phi = (p1 - 1) * (p2 - 1); int_type pub = 65537; pair<int_type, int_type> p = extended_gcd(pub, phi); if (p.second != 1) { // pub is not coprime with phi cout << "Please choose another public key!" << endl; return 1; } int_type prv = p.first; if (prv < 0) { prv += phi; } cout << "Public key is (" << pub << ", " << n << ")\n"; cout << "Private key is (" << prv << ", " << n << ")\n"; int_type encrypt("0xB570BF8E4BDABA4C"); cout << hex; cout << "Encrypted message is " << encrypt << endl; cout << "Decrypted message is " << power_semigroup(encrypt, prv, modulo_multiply<int_type>(n)) << endl; }
- Alexander A. Stepanov and Daniel E. Rose: From Mathematics to Generic Programming. Addison-Wesley Professional, 2014. ↩
- Wikipedia: RSA (cryptosystem). ↩
- Victor Shoup: NTL: A Library for doing Number Theory. ↩
- Bruno Haible and Richard B. Kreckel: CLN — Class Library for Numbers ↩
- Cygwin. ↩
- Boost. ↩
- John Maddock and Christopher Kormanyos: Boost.Multiprecision. ↩
-
Cppreference.com:
std::hex
. ↩ - Cppreference.com: User-defined literals. ↩
- Free Software Foundation: GMP — The GNU Multiple Precision Arithmetic Library. ↩
-
Cppreference.com:
std::uppercase
. ↩