Cryptography
Gimli: a cross-platform permutation
In a joint paper with Daniel J. Bernstein, Stefan Kölbl, Stefan Lucks, Pedro Maat Costa Massolino, Florian Mendel, Kashif Nawaz, Tobias Schneider, François-Xavier Standaert, Yosuke Todo, and Benoît Viguier, we designed and implemented Gimli, a 384-bit permutation that achieves high performance across a broad variety of platforms. For more details and the software and hardware implementations, see the Gimli website.
Kyber: a module-lattice-based KEM
In a joint paper with Joppe Bos, Léo Ducas, Eike Kiltz, Tancrède Lepoint, Vadim Lyubashevsky, John M. Schanck, and Damien Stehlé, we describe Kyber, a high-speed post-quantum-secure key-encapsulation mechanism (KEM), and (authenticated) key-exchange protocols based on module lattices. The software is available from the Kyber github repository.
A high-speed KEM from NTRU
In a joint paper with Andreas Hülsing, Joost Rijneveld, and John Schanck, we describe a high-speed post-quantum-secure key-encapsulation mechanism (KEM) based on NTRU. The software is available from Joost's website.
TESLA signature software
In a joint paper with Erdem Alkım, Nina Bindel, Johannes Buchmann, and Özgür Dagdelen, we describe a conservative, lattice-based signature scheme called TESLA. To download and build the software, run the following commands (you will have to install Python3 first):
tar xjvf tesla-20160930.tar.bz2
cd tesla-20160930/tesla-416/ref && make
cd ../avx2 && make
cd ../../tesla-768/ref && make
cd ../avx2 && make
This will build binaries called test_sign, testvectors and speed in the test/ subdirectories of each respective implementation. For a brief description of those binaries see the README.
Please note that versions previous to 20160930 had a subtle bug in the rounding. Please upgrade to the latest version.
gfverif: fast and easy verification of finite-field arithmetic
Together with Daniel J. Bernstein I am working on gfverif, a tool to verify the correctness of ECC software. For details please refer to the gfverif website.
"Newhope" Ring-LWE-based key exchange
In a paper with Erdem Alkim, Léo Ducas, and Thomas Pöppelmann, we describe a key-exchange protocol based on the Ring-Learning-with-Errors (Ring-LWE) problem. The paper describes two different software implementations: a C reference implementation and an optimized software implementation targeting Intel Haswell processors. To download and build the software follow these instructions:
tar xjvf newhope-20160815.tar.bz2
cd newhope-20160815/ref && make
cd ../avx2 && make
cd ../torref && make
cd ../toravx2 && make
- ref/test/test_newhope,
- ref/test/test_statistical,
- ref/test/testvectors,
- ref/test/speed,
- avx2/test/test_newhope,
- avx2/test/test_statistical,
- avx2/test/testvectors,
- avx2/test/speed.
- torref/test/test_newhope,
- torref/test/test_statistical,
- torref/test/testvectors,
- torref/test/speed,
- toravx2/test/test_newhope,
- toravx2/test/test_statistical,
- toravx2/test/testvectors, and
- toravx2/test/speed.
The test_newhope binary performs various key exchanges and outputs nothing if they are successful. The test_statistical generates many keys without final hashing and prints statistics on the distribution of ones and zeroes in those keys. The testvectors binary runs many key exchanges with deterministic "random-number" generation to ensure compatibility of independent implemtenations and the speed binary performs benchmarks and prints results.
The torref and toravx2 implementations use a different, constant-time, generation of the polynomial a from a seed as described in the paper. The testvectors generated by those implementations are not compatible with the testvectors generated by ref and avx2.
Aside from the implementations of the key exchange, the software package also contains 5 Python scripts in the scripts subdirectory. For details of those scripts please refer to the paper.
A tiny hardware implementation of NaCl's crypto_box
In a joint paper with Michael Hutter, Jürgen Schilling, and Wolfgang Wieser, we describe an ASIC design for NaCl's crypto_box public-key authenticated encryption routine aiming at IoT devices like WISP nodes. The VHDL source code is in the public domain.
High-speed Curve25519 on embedded microcontrollers
In a joint paper with Michael Düll, Björn Haase, Gesine Hinterwälder, Michael Hutter, Christof Paar, Ana Helena Sánchez, and Peter Schwabe we describe high-speed Curve25519 software for AVR ATmega, TI MSP430, and ARM Cortex-M0 microcontrollers. The software is available on the μNaCl website:
SPHINCS: practical stateless hash-based signatures
In a joint paper with Daniel J. Bernstein, Daira Hopwood, Andreas Hülsing, Tanja Lange, Ruben Niederhagen, Louiza Papachristodoulou, and Zooko Wilcox-O'Hearn we describe a stateless hash-based signature system and software for a particular set of parameters that offers 128 bits of security against attacks that include attacks by quantum computers. The software is included in the SUPERCOP benchmark suite in subdirectory crypto_sign/sphincs256. For more details on SPHINCS, see also the SPHINCS website.Fast multiprecision multiplication on AVR ATmega
In a joint paper with Michael Hutter we describe speed-record-setting multiprecision multiplication software for the AVR ATmega family of embedded microcontrollers. This software uses the Karatsuba multiplication technique and covers input sizes of 48, 64, 96, 128, 160, 192, and 256 bits. To download the software and build test routines for the ATmega2560, do the following:
tar xjvf avrmul-20150101.tar.bz2
cd avrmul-20150101/
make
For running the tests we use an Arduino Mega development board. There are two scripts that handle flashing test software and displaying output; both assume that the Arduino is connected at /dev/ttyACM0. The script test/test_atmega2560.sh uploads the hex file test/test_mul_atmega2560.hex, which runs 1000 tests on pseudorandom inputs for each input size. The script test/stack_atmega2560.sh uploads the hex file test/stack_mul_atmega2560.hex, which measures and prints stack usage of the multiplication routines.
Vectorized genus-2 Diffie-Hellman
In a joint paper with Daniel J. Bernstein, Chitchanok Chuengsatiansup, and Tanja Lange we present speed records for 128-bit secure Diffie-Hellman key exchange using vectorized arithmetic on a Kummer surface. The software is included in the SUPERCOP benchmarking framework in the crypto_scalarmult/kummer subdirectory.
Formally verified Curve25519
In joint work with
Yu-Fang Chen,
Chang-Hong Hsu,
Hsin-Hung Lin,
Ming-Hsien Tsai,
Bow-Yaw Wang,
Bo-Yin Yang, and
Shang-Yi Yang
we formally verified hand-optimized assembly code performing the core arithmetic operation in Curve25519.
More specifically, we verified the fully inlined Montgomery ladderstep of the two AMD64 implementations
of Curve25519 presented in the Ed25519 paper.
For details of the verification, see the paper.
To reproduce the verification procedure, you first need to install OCaml (version 4.x or higher),
Coq, and Boolector with minisat support (note that the version of Boolector included in Debian does not have minisat support; also note that Boolector requires
glibc-static, zlib-static, and libstdc++-static which are not installed by default on Fedora/Redhat):
sudo apt-get install coq
wget http://fmv.jku.at/boolector/boolector-1.6.0-with-sat-solvers.tar.gz
tar xzvf boolector-1.6.0-with-sat-solvers.tar.gz
cd boolector-1.6.0-with-sat-solvers/
make
cd ..
tar xjvf verify25519-20140323.tar.bz2
cd verify25519-20140323/translator
make
cd ..
- "translator" contains our qhasm-to-boolector translator,
- "original-qhasm" contains the original qhasm files,
- "annotated-qhasm" contains the annotated qhasm files that the translator uses as input,
- "coq" contains the input files for the Coq proof system.
cd vc
for f in ./*;do ../../boolector-1.6.0-with-sat-solvers/boolector/boolector -minisat $f;done
coqc Modulo.v
coqtop < proposition-1.v
coqtop < proposition-2.v
If you are only interested in the btor input files and the heuristics as described in the paper, you can simply download the archive of btor files, which also includes a README with explanations.
TweetNaCl
TweetNaCl is a cryptographic library that offers the all the 25 NaCl functions used by applications and uses only 100 tweets. TweetNaCl is joint work by Daniel J. Bernstein, Wesley Janssen, Tanja Lange, and Peter Schwabe. For more information about TweetNaCl, see
- the official TweetNaCl website,
- the TweetNaCl Twitter account with the source code in 100 tweets, and
- the TweetNaCl paper.
NaCl for AVR microcontrollers
If you are interested in using NaCl for AVR, please refer to the most recent version of μNaCl.
The software listed below is still available as additional material accompanying the Africacrypt 2013 paper.
In the paper NaCl on 8-bit AVR microcontrollers we describe implementations of the cryptographic primitives used in the core of the Networking and Cryptography library (NaCl) for AVR microcontrollers. Specifically, the software implements the following functions from the NaCl API:
- crypto_stream (crypto_stream_salsa20)
- crypto_stream_xor (crypto_stream_xor_salsa20)
- crypto_onetimeauth (crypto_onetimeauth_poly1305)
- crypto_onetimeauth_verify (crypto_onetimeauth_verify_poly1305)
- crypto_verify16 (crypto_verify16)
- crypto_verify32 (crypto_verify32)
- crypto_hash (crypto_hash_sha512)
- crypto_scalarmult (crypto_scalarmult_curve25519)
- crypto_scalarmult_base (crypto_scalarmult_base_curve25519)
- crypto_sign_keypair (crypto_sign_keypair_ed25519), insecure, only for testing
- crypto_sign (crypto_sign_ed25519)
- crypto_sign_open (crypto_sign_open_ed25519)
The library comes in two flavors, one optimized for speed, one optimized for small ROM usage. To build the high-speed version, do the following (for the small-ROM version everything works analogously):
tar xjvf avrnacl-20130514.tar.bz2
cd avrnacl-20130514/highspeed
make
This will produce the libnacl.a in the obj/ subdirectory and various tests in the test/ subdirectory for the ATmega2560 microcontroller. To run various unit tests on an ATmega2560, simply run
This will copy the testing binary to an Arduino MEGA (connected on
/dev/ttyACM0) and run the tests. The results of the tests are sent
through the serial device and read and evaluated automatically.
Similary, to perform speed benchmarks, run
and to run stack-usage benchmarks, run
Practical post-quantum (lattice-based) signatures
The paper Software speed records for lattice-based signatures describes an implementation
of lattice-based signatures at a 100-bit security level. The software is available either as
submission to the eBACS benchmarking project or as standalone package.
The link to the eBACS submission is http://cryptojedi.org/crypto/data/lattisigns512avx-20130329-ebacs.tar.bz2
To download and build the standalone package, do the following:
tar xjvf lattisigns512avx-20130329.tar.bz2
cd lattisigns512avx-20130329
make
This will produce two binaries in the test/ subdirectory: test_sign performs various tests and prints zeros if they are successful; speed will perform various performance measurements. Note that in order to build and run the software you will need a processor that supports the AVX vector-instruction set.
Fast cryptography for ARM CPUs with NEON vector instructions
The paper NEON crypto describes high-performance software of various
symmetric and asymmetric cryptographic primitives for the ARM Cortex-A8 processor and other CPUs that support
the NEON vector instruction set. Most of the software (all but the Ed25519 signatures) are included in the
SUPERCOP benchmarking suite since version 20120918.
Different version of the Salsa20 software can be found in the subdirectories
crypto_stream/salsa20/armneon,
crypto_stream/salsa20/armneon2,
crypto_stream/salsa20/armneon3, and
crypto_stream/salsa20/armneon6.
The Poly1305 authentication software is in the subdirectory crypto_onetimeauth/poly1305/neon2.
The Curve25519 ECDH software is in the subdirectory crypto_scalarmult/curve25519/neon2.
The Ed25519 digital-signature software is going to be included in SUPERCOP, soon.
We are planning to include all these implementations in the next release of the Networking and Cryptography Library (NaCl).
ARM11 implementations of all SHA-3 finalists
The paper SHA-3 on ARM11 processors describes optimized assembly implementations of the 256-bit-output versions of all SHA-3 finalists and of SHA-256. These implementations are included in the arm11 subdirectories of the respective hash function implementations in SUPERCOP since version 20111120. For example the implementation of Blake256 is in the subdirectory crypto_hash/blake256/arm11/.
NaCl: Networking and Cryptography library
The NaCl library is designed to offer high-speed high-security cryptographic protection of network communication.
Initial development of the library was done as part of the EU FP7 project
Computer Aided Cryptography Engineering (CACE).
Further development of the library is part of the European Network of Excellence in Cryptology II (ECRYPT II).
For information about the library, see
- http://nacl.cr.yp.to: The official website of the library;
- the paper The security impact of a new cryptographic library describing how NaCl avoids various security disasters of other cryptographic libraries;
- the paper Curve25519: new Diffie-Hellman speed records. describing the key exchange protocol included in the NaCl library;
- Daniel J. Bernstein's website on Salsa20, the stream cipher used by the NaCl library;
- the paper Poly1305 describing the MAC algorithm included in the NaCl library;
- the paper High-speed high-security signatures describing the signatures scheme provided by the NaCl library;
- The corresponding website of the signature software; and
- the paper Faster and Timing-Attack Resistant AES-GCM. describing the AES implementation included in the NaCl library.
Ed25519: high-speed high-security signatures
The paper High-speed high-security signatures describes a public-key signature scheme with an implementation that sets new speed records for (batched) verification of elliptic-curve signatures. We put the software described in the paper into the public domain. It can be found in the SUPERCOP benchmarking suite from version 20110629 in the crypto_sign/ed25519/ subdirectory. The software will be furthermore be included in future versions of the NaCl library. For more details also see Daniel J. Bernstein's Ed25519 website.
Warning: Early versions of the software had a bug in the carry chain after multiplication and squaring in the amd64-64-24k implementation.
Tests with random inputs will not catch the bug, it is triggered with with very low probability.
This bug is in the software contained in SUPERCOP up to version 20130419.
The bug also affects the crypto_scalarmult/curve25519/amd64-64 implementation in SUPERCOP up to version 20130419.
RFSB509: really fast syndrome-based hashing
In the paper Really fast syndrome-based hashing we describe a family of collision-resistant compression functions that can be used as the core building block for fast and secure cryptographic hash functions. The paper also describes a software implementation that uses a particular choice of parameters targeting the 128-bit security level. This implementation of RFSB509 is included in SUPERCOP in the subdirectory crypto_hash/rfsb509.
DCLXVI: Pairing computation on AMD64 processors
In the paper New software speed records for cryptographic pairings we describe
an implementation of cryptographic pairings over a Barreto-Naehrig curve that sets new speed records for cryptographic
pairings on many amd64 processors. We put the software into the public domain.
To download and build the latest version of the software do the following:
tar xjvf dclxvi-20130329.tar.bz2
cd dclxvi-20130329/
make
This will produce 10 binaries: bilintest-check, bilintest-c, bilintest-as, speedtest-check, speedtest-c, and speedtest-as,
test_curvepoint_multiscalar-as, test_curvepoint_multiscalar-check, test_twistpoint_multiscalar-as, test_twistpoint_multiscalar-check.
The binaries ending on -check perform all arithmetic with included overflow check as described in the paper. The files ending on
-c are non-optimized implementations in C and the implementations ending on -as have all speed-critical functions implemented in
qhasm.
The bilintest binaries perform NTESTS bilinearity (and non-degeneracy) tests, the value NTESTS is defined in the Makefile.
The speedtest binaries measure the cycles counts of the most speed-critical functions.
Since version 20130329, DCLXVI also has faster (but not timing-attack protected!) functions for scalar multiplication
and multi-scalar multiplication on the curve and the twist. The test_curvepoint_multiscalar and test_twistpoint_multiscalar
binaries perform simple tests of this multi-scalar multiplication on curve and twist, respectively.
Caution: The software as described in early versions of the paper
(versions 2010-05-28 [pdf] and 2010-04-06 [pdf])
has a bug: The parameter u=1966080 used to construct the curve we
use for the pairing does not generate a Barreto-Naehrig curve; the
order of the group E(F_{p}) is not prime.
We thank Francisco Rodríguez- Henríquez and Jean-Luc Beuchat
for pointing out this bug.
The software from version 20100618 now uses a different curve generated with u=1868033,
the speed of this updated software is similar, actually even slightly faster.
For verification of the performance numbers the old version
of the software is still available as dclxvi-notsecure.tar.bz2;
do not use this software for cryptographic purposes.
Update:
An even faster pairing implementation
has been presented by Jean-Luc Beuchat, Jorge Enrique González Díaz, Shigeo Mitsunari, Eiji Okamoto,
Francisco Rodríquez-Henríquez and Tadanori Teruya.
But our software is a bit (read: one bit) more secure.
FSBday: a parallel implementation of Wagner's generalized birthday attack
In the paper FSBday: Implementing Wagner's generalized birthday attack against the SHA-3 round-1 candidate FSB we describe an implementation of Wagner's generalized birthday attack that runs on a cluster of 8 computers to find a collision in the compression function of the toy version FSB_{48} of the hash function FSB. The requirements to run this code are the following:
- 8 computers with a 64-bit processor, running a 64-bit Linux system,
- 8 GB of RAM on each of the computers,
- a hard disk with a free 700-GB data partition in each of the computers, and
- some network connection between the computers (our attack used switched Gigabit ethernet).
Installing MPICH-2
The implementation uses the MPI implementation
MPICH-2.
We followed this installation guide
to set up MPICH-2 on our cluster machines running Ubuntu 9.04 (aka Jaunty Jackalope).
As a remark: We had problems with the setup of MPI when the hostname was mapped to 127.0.0.1 in /etc/hosts,
we just removed this entry.
Downloading and running our code
To download and compile the code run
tar xjvf fsbday-20091213.tar.bz2
cd fsbday-20091213
make
If the data partition for the attack is not /dev/sda1, change the definition of IODEVICE in params.h accordingly before compiling, otherwise running the attack may destroy data on /dev/sda1!
This will produce a binary called fsbday, which you have to copy to each of the cluster nodes in the same directory. For this you can use the bash script mpicp.sh.
Running Phase 1 (Determine clamping constants)
The first phase of the attack (for the description of the two phases please read the paper) is started through the bash script fsbstart.sh, before running the script you might have to change the value of the HOSTFILE (the path to the MPI host file, set to ./hosts by default) in the script. The script takes as only argument the path to a logfile, so the first phase of the attack can be started with
The only output on stdout is the clamping constants tried and whether they lead to success, more information including the values yielding a collision can be found in the log file.
Running Phase 2 (Compute colliding matrix positions)
To determine the matrix positions from the clamping constants you have to run the program fsbfinal twice, once for the left and once for the right half-tree. The fsbfinal program needs as input the clamping constants used to generate the lists on level 0 and the value on level 3 that leads to success. The clamping constants are output on standard output, the value you will find in the logfile n a line looking like
when you grep for "Result". The first number (before the colon) is the number of the node.
Let's assume the clamping constants in the rightmost quarter-tree are you determined in phase 1 are 4, 4, 0, 0,
and let's assume the value as in the line above, then you first have to convert 54e28b37d58e75 to decimal, yielding 23892985608769141,
same for 781538a2cdf5639d yielding 8652884530953544605 and for
the part 1f8 yielding 504.
You then run fsbfinal as follows
mpirun -l -machinefile hosts -np 8 ./fsbfinal --nlists=8 --startlevel=0 --startlist=8 --cmp=23892985608769141 --remaining=8652884530953544605 --part=504 --node=0 0 0 0 0 0 0 4 4
This will print the positions (relative positions in the according blocks and half-blocks) to stderr.
For information about how to change to code to other attack scenarios please refer to the FSBday website.
A much more efficient way to find collisions in the full hash function FSB_{48} is Pollard's rho algorithm, i.e. a generic attack that works against any hash function. To download and run the code of such a simple attack against the full FSB_{48} hash function do the following:
tar xjvf fsb48-pollard.tar.bz2
cd fsb48-pollard
make
./fsb48-pollard
Cleaned-up implementation of FSB-256
Based on the reference implementation for ebash, I implemented a somewhat cleaned-up, faster version of FSB-256.
Curve25519 for the Cell Broadband Engine
In joint work with Neil Costigan
I implemented a version of the elliptic-curve Diffie Hellman key exchange software curve25519 for the Cell Broadband
Engine. There is a paper describing the details of the implementation.
The software is available for download; it's following the eBATS API.
Cache-timing resistant AES-GCM for 64-bit Intel processors
Emilia Käsper implemented very fast and cache-timing-attack resistant AES in counter mode. I ported this code
to qhasm and we jointly implemented AES-GCM based on this fast AES code. For details of the implementation please refer to the
paper "Faster and Timing-Attack Resistant AES-GCM".
The code follows the
eSTREAM API, the following versions are avaible:
- AES-CTR, constant-time key setup,
- AES-CTR, non-constant-time key setup,
- AES-GCM, constant time,
- AES-GCM, fast (not constant time).
The assembly versions of the code are available on Emilia's website.
The qhasm version uses macros, so you will need
maq, a preprocessor for qhasm, to rebuild the code (the resulting assembly file is included).
You will furthermore need an extended AMD64 machine-description file for qhasm, it just goes to the qhasm directory.
AES implementations
D. J. Bernstein and I implemented AES in Counter mode for different architectures, namely ppc32, x86, sparcv9 and amd64. For several microarchitectures this code is setting
new speed records. Details about how these speed records are achieved can be found in our joint paper "New AES software speed records" (PDF).
The software is now available as part of the
estreambench
toolkit (in the subdirectory /benchmarks/aes/aes-128/schwabe).
We have placed the software into the public domain;
feel free to integrate it into your own AES applications!
HECTOR: fast hyperelliptic-curve-based key exchange
HECTOR is an implementation of a key exchange and a signature scheme based on a hyperelliptic curve over the
binary field F_{2113} for the
eBATS project.
HECTOR is joint work with Peter Birkner.
For details of the implementation please refer to the documentation in pdf format.
To download and build HECTOR, first install the GMP library, then do the following:
tar xjvf hector.tar.bz2
cd hector
make
This builds two binaries, hector_dh and hector_sig, both need random bytes as input, they can for example be started with
./hector_sig < /dev/urandom
respectively.
BN curves
A BN curve is an elliptic curve E: Y^{2}=X^{3}+b defined over F_{p} such that the group of F_{p}-rational points has prime order n. BN curves have embedding degree k=12 with respect to n. The BN family of pairing-friendly curves is parametrized by the following polynomials in the indeterminate u:
p=p(u)=36u^{4}+36u^{3}+24u^{2}+6u+1,
n=n(u)=36u^{4}+36u^{3}+18u^{2}+6u+1.
The trace of Frobenius over F_{p} is given as t(u)=6u^{2}+1. To find a BN curve
just choose integer values of the suitable size for u until both p(u) and n(u) are prime. Then test
values for the coefficient b until the curve has the correct order. A generator of the
group E(F_{p}) is any nonzero F_{p}-rational point, but it can be chosen to be
G=(1,y) for y a square root of b+1.
For more detailed information see [1]. BN curves have sextic twists and are therefore suitable for very efficient pairing
implementation. It is also possible to compute pairings on BN curves in a compressed form, see [2].
[1] | Paulo S. L. M. Barreto, Michael Naehrig, Pairing-Friendly Elliptic Curves of Prime Order, Selected Areas in Cryptography -- SAC'2005, Lecture Notes in Computer Science 3897, Springer-Verlag (2006), pp 319--331. Preliminary version: Cryptology ePrint Archive, Report 2005/133. |
[2] |
Michael Naehrig, Paulo S. L. M. Barreto, Peter Schwabe: On compressible pairings and their computation, Progress in Cryptology - AfricaCrypt 2008 , Lecture Notes in Computer Science 5023, Springer-Verlag (2008), pp 371--388. Cryptology ePrint Archive, Report 2007/429. |
Pairing computation on BN curves
bn256 is an implementation of different cryptographic pairings on a 256 bit BN curve.
In order to build it, you need the GMP library with header files and makedepend, for Debian (Lenny) systems these can be installed with
For speed measurement we include cpucycles written by D. J. Bernstein.
To build bn256, do the following:
tar xjvf bn256-20090824.tar.bz2
cd bn256-20090824
make
After the build process has finished you find a binary called bn256 in the bin/ directory, whose usage should be self-explanatory.
From the corresponding source file src/bn256.c you can see how to call the functions for pairing computation.
We note, that the generation of random points is based on the GMP functions for random number generation and must not be used for cryptographic purposes.
Parts of this code have been developed in the context of a UMIC project at the Institute for Theoretical Information Technology at RWTH Aachen University.