In embedded applications on low-powered processors, performance is a major
concern. Using either the KCM or the Comba method as described here can
increase speed 4-fold.

To use the super-fast KCM method (for 2048-bit RSA, 1024-bit DH and DSS) or
the Comba method (for 1024-bit RSA and GF(p) elliptic curves) you will need
to create the file mrkcm.c or the file mrcomba.c for inclusion in the MIRACL
library. This is done by inserting 'macros' from a ?.mcs file into the
template file mrkcm.tpl or mrcomba.tpl. The insertion is performed
automatically by the MEX utility.

A c.mcs file is supplied, which contains C macros. Also supplied is c1.mcs,
which uses an alternate approach. See also cs.mcs (read the comments at the
top). If a quad-length type is available (mr_qltype defined in mirdef.h),
use c2.mcs.

However, the best performance is usually achieved by using assembly language
macros. This requires your compiler to support in-line assembly. For example,
the file ms86.mcs inserts Pentium assembly language macros for use with
Microsoft or Borland compilers. The file gcc386.mcs does the same for the gcc
compiler. If your PC supports SSE2 extensions, for example if it is a
Pentium 4, then instead use either sse2.mcs or gccsse2.mcs (see sse2.txt).
The file arm.mcs does the same for the popular 32-bit ARM processor. Other
.mcs files for other processors/compilers may be available. See makemcs.txt
for instructions on creating your own.

New! The files c.mcs and arm.mcs now allow "interleaved" multiplication steps
to facilitate improved processor scheduling - see makemcs.txt.

The macro expansion is carried out automatically by the supplied program
mex.c. You must compile and run this program. If you use the config.c
utility it will advise you on the parameters to use. Note that although
config.c should be compiled and run on the target processor, mex.c can be
compiled and run on any workstation.

For example

c:>mex 6 ms86 mrcomba

creates a file mrcomba.c from mrcomba.tpl and ms86.mcs. The Comba method will
then be optimised for a modulus of 6*32 = 192 bits on a Pentium computer.
Typically this might be used for an implementation of elliptic curves over
GF(p) for p a 192-bit prime. Note that the code generated in mrcomba.c or
mrkcm.c may benefit to a small extent from some manual post-optimisation.
Re-ordering instructions may help for certain processors.
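
To make the idea concrete, here is a minimal portable C sketch of Comba
(column-wise) multiplication for the 6-word, 192-bit case used in the example
above. It is an illustration only and is not the code that mex generates: the
names NWORDS and comba_mul are hypothetical, and the loops are kept for
readability, whereas generated code for a fixed size would presumably be
unrolled and built from the macros in the chosen .mcs file.

```c
#include <stdint.h>

#define NWORDS 6   /* assumed size: 6 words * 32 bits = 192 bits */

/* Comba (column-wise) multiplication sketch: z = x * y.
 * x and y hold NWORDS 32-bit words, least significant word first.
 * z receives 2*NWORDS words and must not overlap x or y. */
void comba_mul(const uint32_t x[NWORDS], const uint32_t y[NWORDS],
               uint32_t z[2 * NWORDS])
{
    uint32_t c0 = 0, c1 = 0, c2 = 0;   /* three-word column accumulator */

    for (int k = 0; k < 2 * NWORDS - 1; k++) {
        /* Column k collects every partial product x[i]*y[j] with i+j == k. */
        int lo = (k < NWORDS) ? 0 : k - NWORDS + 1;
        int hi = (k < NWORDS) ? k : NWORDS - 1;
        for (int i = lo; i <= hi; i++) {
            uint64_t p = (uint64_t)x[i] * y[k - i];
            uint64_t t = (uint64_t)c0 + (uint32_t)p;
            c0 = (uint32_t)t;
            t = (uint64_t)c1 + (uint32_t)(p >> 32) + (t >> 32);
            c1 = (uint32_t)t;
            c2 += (uint32_t)(t >> 32);
        }
        z[k] = c0;                     /* the column is complete: emit a word */
        c0 = c1;                       /* and shift the accumulator down      */
        c1 = c2;
        c2 = 0;
    }
    z[2 * NWORDS - 1] = c0;            /* top word of the product */
}
```

Each output word is written exactly once, after all the partial products in
its column have been accumulated. The inner step (multiply, then add into the
accumulator) is the part that an assembly language macro would replace.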

c:>mex 16 ms86 mrkcm

creates a file mrkcm.c from mrkcm.tpl and ms86.mcs. The KCM method will then
be optimised for moduli of sizes 512, 1024, 2048 bits etc. Typically this
might be used for a fast implementation of RSA, DSS or Diffie-Hellman.
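
In both examples the first argument to mex is a size in computer words: 6
words of 32 bits give 192 bits, and 16 words of 32 bits give 512 bits, the
smallest of the KCM sizes listed. The arithmetic can be sketched as follows.
The function name is hypothetical and is not part of MIRACL; config.c remains
the recommended way to obtain the actual parameters.

```c
/* Hypothetical helper (not part of MIRACL): number of computer words
 * needed to hold a modulus of `bits` bits when each word has
 * `word_bits` bits. Returns 0 if word_bits is 0. */
unsigned words_for_modulus(unsigned bits, unsigned word_bits)
{
    if (word_bits == 0) return 0;
    return (bits + word_bits - 1) / word_bits;   /* round up */
}
```

For instance, words_for_modulus(192, 32) gives 6 and
words_for_modulus(512, 32) gives 16, matching the two commands above.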

For the Comba method only, it is possible to implement special modular
reduction methods for a modulus p of a particular form. Two types of special
modulus are supported: Generalised Mersenne Primes and Pseudo-Mersenne
Primes. To make use of this feature MR_SPECIAL must be defined in mirdef.h.

Generalised Mersenne Primes are also known as Solinas primes. These are of a
form like, for example, 2^224-2^96+1. Note that the exponents are multiples
of a 32-bit word length. Many of the NIST recommended primes are of this form.
In the file mrcomba.tpl code can be found to implement fast reduction with
respect to many different GM primes, and for many different word lengths.
If the particular one you want is not there, it is not hard to implement it
yourself by manually editing the file mrcomba.tpl.

Pseudo-Mersenne Primes are also known as Crandall Primes. These are of a form
like, for example, 2^160-57, where 160 is a multiple of the word length, and
the constant 57 is small enough to fit into one computer word. Moduli of this
form are automatically supported if you define MR_PSEUDO_MERSENNE in mirdef.h.
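
The reason this form is fast can be shown at a small scale. The sketch below
is an assumption-laden illustration, not MIRACL code: it uses the toy modulus
p = 2^32 - 5 so that everything fits in standard C integer types, and the
names PM_C, PM_P and pm_reduce are hypothetical. The same folding idea applies
to a multi-word modulus such as 2^160-57, where the high half of the product
is multiplied by the small constant and added to the low half.

```c
#include <stdint.h>

#define PM_C 5u                          /* the small constant c          */
#define PM_P (0xFFFFFFFFu - PM_C + 1u)   /* toy modulus p = 2^32 - 5      */

/* Reduce a 64-bit value x modulo p = 2^32 - c without any division.
 * Because 2^32 is congruent to c modulo p, the value
 * x = hi*2^32 + lo is congruent to hi*c + lo. */
uint32_t pm_reduce(uint64_t x)
{
    /* First fold: the result is below 6 * 2^32. */
    x = (x >> 32) * PM_C + (uint32_t)x;
    /* Second fold: the high part is now at most 5, so the result
     * is below 2^32 + 25. */
    x = (x >> 32) * PM_C + (uint32_t)x;
    /* A final conditional subtraction brings the value into [0, p). */
    while (x >= PM_P) x -= PM_P;
    return (uint32_t)x;
}
```

The only multiplications are by the one-word constant c, which is why the
constant must be small enough to fit into one computer word.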

As always it is best to use config.c, which guides you through all of this.


You will find it valuable to run through this whole process on a standard PC
using perhaps the Microsoft C/C++ compiler, just to get familiar with the
config.c and mex.c utilities.

If you are embarking on an embedded project using a processor for which a
.mcs file does not exist, you will have to write your own, or be content with
the C macros. Note that this approach is likely to be optimal only on
processors that support an unsigned multiply instruction.
This is probably the case with the majority of embedded processors (e.g. ARM,
68000 variants etc). It is also important that the compiler supports inline
assembly, via something like

asm(" ");

or

__asm
{
}

constructs in C. However, other approaches are possible; for example, using C
"intrinsics" works well for the Itanium - see itanium.mcs.
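
To see why an unsigned multiply instruction matters, consider the basic step
that these methods repeat: multiply two words, add in a word and a carry, and
split the double-length result. The portable C sketch below shows that step
for 32-bit words. The function name mul_add_word is hypothetical and this is
not one of the supplied macros; it only illustrates the kind of operation
that an assembly language macro is expected to perform in a few instructions.

```c
#include <stdint.h>

/* Hypothetical helper: computes a*b + c + *carry for 32-bit words.
 * Returns the low word and stores the high word back in *carry.
 * The sum cannot overflow 64 bits, because
 * (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
uint32_t mul_add_word(uint32_t a, uint32_t b, uint32_t c, uint32_t *carry)
{
    uint64_t t = (uint64_t)a * b + c + *carry;
    *carry = (uint32_t)(t >> 32);
    return (uint32_t)t;
}
```

Without a hardware multiply that yields the full double-length product, the
compiler has to synthesise this step from narrower operations, which is
slower.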

To write your own .mcs file, use c.mcs or arm.mcs as models. For more
background, read the article ftp.computing.dcu.ie/pub/crypto/timings.doc

The macro-expansion mechanism has been designed to make it as easy as
possible for the developer to write optimal code for best performance.
See makemcs.txt.