
Quickstart

Installation

$ nimble install -y intops

Add intops to your .nimble file:

requires "intops"

Usage

The most straightforward way to use intops is by importing intops and calling the operations from it:

import intops

echo carryingAdd(12'u64, 34'u64, false)

Output:

(res: 46, carryOut: false)

intops.carryingAdd is a dispatcher. When invoked, it calls the best of the available implementations of this operation for the given environment.

Notice that we call carryingAdd with the uint64 type set explicitly. This is important because the size of int depends on the platform, so intops doesn’t allow this kind of ambiguity.

If you try to call carryingAdd with an int, your code simply won’t compile:

echo not compiles carryingAdd(12, 34, false)

Output:

true

All available dispatchers are listed in the Imports section of the API docs for the intops module.

The operations are grouped into families, each one living in a separate submodule. For example, the carryingAdd operation mentioned above is imported from the intops/ops/add submodule, so that’s where you’ll find its documentation: /apidocs/intops/ops/add.html.

Calling Specific Implementations

You may want to override the dispatcher’s choice. To do that, import a particular implementation from intops/impl directly and call the function from it:

import intops/impl/intrinsics

echo intrinsics.gcc.carryingAdd(12'u64, 34'u64, false)

Output:

(46, false)

Implementations are also grouped into families: pure Nim, C intrinsics, inline C, and inline Assembly. Each family can be further split into subgroups.

For example, the carryingAdd implementation above is based on C intrinsics that are specific to GCC/Clang. All such implementations live in intops/impl/intrinsics/gcc. There’s also a subgroup that provides an implementation based on Intel/AMD specific C intrinsics—intops/impl/intrinsics/x86.

To see all available implementations for a particular operation, find it in the API index.

Caveat

When you use a dispatcher, you can be sure that your code will compile in any environment. It is the dispatcher’s job to provide a code path for any case, so you don’t have to worry about it.

But if you choose to call an implementation manually, it is your job to validate that this implementation is available in the environment you’re using it in. If you attempt to use an implementation that is unavailable, you’ll get a compile-time error.

For example, trying to use an Inline Assembly implementation with the intopsNoInlineAsm flag (more on flags in the section below) will cause a compilation error:

import intops/impl/inlineasm

echo not compiles inlineasm.x86.carryingAdd(12'u64, 34'u64, false)

If compiled with -d:intopsNoInlineAsm, this outputs:

true
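If you want a graceful fallback instead of a hard failure, one possible pattern (a sketch, not an official intops idiom) is to guard the manual call with compiles and fall back to the dispatcher:

```nim
import intops
import intops/impl/inlineasm

# Hypothetical guard: call the inline-asm implementation when it compiles
# in the current environment, otherwise fall back to the dispatcher.
when compiles(inlineasm.x86.carryingAdd(12'u64, 34'u64, false)):
  echo inlineasm.x86.carryingAdd(12'u64, 34'u64, false)
else:
  echo carryingAdd(12'u64, 34'u64, false)
```

This way the code still compiles when built with -d:intopsNoInlineAsm, at the cost of possibly not running the implementation you preferred.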

Compilation Flags

You can forbid dispatchers from picking particular implementation families using these compilation flags:

  • intopsNoIntrinsics
  • intopsNoInlineAsm
  • intopsNoInlineC

For example, to avoid using Inline Assembly implementations, compile your code with -d:intopsNoInlineAsm:

$ nim c -d:intopsNoInlineAsm mycode.nim

Of course, you can combine those flags. For example, if you want to use only pure Nim implementations, pass all three forbidding flags:

$ nim c -d:intopsNoIntrinsics -d:intopsNoInlineAsm -d:intopsNoInlineC mycode.nim

Contributor’s Guide

Optimizing arithmetic operations for a variety of environments and consumers is a never-ending chase. There will hardly ever be a moment when intops is 100% ready: there’s always an older Nim version, a newer GCC version, or an obscure CPU vendor to target.

Because of that, contributors’ help is essential for intops’ development. You can help improve intops by improving the code, improving the docs, and reporting issues.

Library Structure

src
│   intops.nim              <- entrypoint for the public API
│
└───intops
    │   consts.nim          <- global constants for environment detection
    │
    ├───impl                <- implementations of primitives
    │   │   inlineasm.nim   <- entrypoint for the Inline ASM family of implementations
    │   │   inlinec.nim
    │   │   intrinsics.nim
    │   │   pure.nim
    │   │   ...
    │   │
    │   ├───inlineasm
    │   │       arm64.nim   <- implementation in ARM64 Assembly 
    │   │       x86.nim
    │   │       ...
    │   │
    │   └───...
    │
    └───ops                 <- operation families
            add.nim         <- addition flavors: carrying, saturating, etc.
            mul.nim
            sub.nim
            ...

The entrypoint is the root intops module. It’s the library’s public API and exposes all the available primitives.

Each operation family has its own submodule in intops/ops. E.g. intops/ops/add is the submodule that contains various addition flavors.

These submodules contain the dispatchers that pick the best implementation of a given operation for the given CPU, OS, and C compiler. In other words, each operation “knows” its implementations and “decides” which one is best to run.

The actual implementations are stored in submodules in intops/impl. For example, intops/impl/intrinsics contains all primitives implemented with C intrinsics.

API Conventions

  1. intops follows the common Nim convention of naming things in camelCase.
  2. intops prefers pure functions that return values over ones that modify mutable arguments.
  3. Docstrings are mandatory for the dispatchers in intops/ops modules because they are the library’s public API.
  4. Operations that return a wider type than the input type are called widening. For example, wideningMul is a multiplication that takes two 64-bit integers and returns a single 128-bit integer (although represented as a pair of 64-bit integers).
  5. Operations that return a carry or borrow flag are called carrying and borrowing, e.g. carryingAdd.
  6. Operations that return an overflow flag are called overflowing, e.g. overflowingSub.
  7. Operations that return the maximal or minimal type value when a type boundary is hit are called saturating, e.g. saturatingAdd.
  8. Carry, borrow, and overflow flags are booleans.
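To make conventions 5, 7, and 8 concrete, here are simplified pure-Nim sketches of the carrying and saturating semantics. These are illustrations only, not intops’ actual implementations (hence the Sketch suffix):

```nim
# Simplified sketches of the semantics behind the naming conventions.
# NOT intops' actual code; names with the "Sketch" suffix are hypothetical.

# "carrying": return the wrapped sum plus a boolean carry-out flag.
func carryingAddSketch(a, b: uint64, carryIn: bool): tuple[res: uint64, carryOut: bool] =
  let c = if carryIn: 1'u64 else: 0'u64
  let s = a + b + c  # unsigned arithmetic wraps in Nim
  # a carry happened iff the true sum didn't fit into 64 bits
  (res: s, carryOut: s < a or (s == a and carryIn))

# "saturating": clamp to the type's maximum instead of wrapping.
func saturatingAddSketch(a, b: uint64): uint64 =
  if a > high(uint64) - b: high(uint64) else: a + b

echo carryingAddSketch(12'u64, 34'u64, false)   # (res: 46, carryOut: false)
echo saturatingAddSketch(high(uint64), 1'u64)   # 18446744073709551615
```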

Tests

The tests for intops are located in a single file tests/tintops.nim.

These are integration tests that emulate real-life usage of the library and check two things:

  1. every dispatcher picks the proper implementation in every environment
  2. the results are correct no matter the implementation

To run the tests locally, use nimble test command.

With this command, the tests are run:

  • without compilation flags in runtime mode
  • without compilation flags in compile-time mode
  • with each compilation flag separately
  • with all compilation flags

When executed on the CI, the tests are run against multiple OSes, C compilers, and architectures:

  • amd64 + Linux + gcc 13
  • amd64 + Linux + gcc 14
  • amd64 + Windows + clang 19
  • i386 + Linux + gcc 13
  • amd64 + macOS + clang 17
  • arm64 + macOS + clang 17

This is hardly reproducible locally, but you can cover at least some of the cases by passing flags to nimble test. For example, to emulate running the tests on a 32-bit CPU, run nimble --cpu:i386 test. On Windows, you also must pass the path to your GCC installation, e.g. nimble --gcc.path:D:\mingw32\bin\ --cpu:i386 test. To run your tests continuously during development, use monit:

  1. Install monit with nimble install monit.
  2. Create a file called .monit.yml in the working directory:
%YAML 1.2
%TAG !n! tag:nimyaml.org,2016:
---
!n!custom:MonitorConfig
sleep: 1
targets:
  - name: Run tests
    paths: [src, tests]
    commands:
      - nimble test
      - nimble --gcc.path:D:\mingw32\bin\ --cpu:i386 test
    extensions: [.nim]
    files: []
    exclude_extensions: []
    exclude_files: []
    once: true
  3. Run monit run

Benchmarks

Benchmarking is crucial for a library like intops: you can’t make reasonable dispatching improvements unless you can back the changes with numbers.

There are two kinds of benchmarks: latency and throughput.

Latency benchmarks measure how long a particular operation takes to complete. For a latency benchmark, we run the same operation against random input many times, making sure the next iteration doesn’t start before the previous one completes. The result is measured in nanoseconds per operation.

Throughput benchmarks measure how many operations of a particular kind can be executed per unit of time. For a throughput benchmark, we spawn the same operation against random input many times back to back, so that multiple instances of the same operation are executed in parallel. The results are measured in millions of operations per second.
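As a rough sketch of the latency approach (not the actual harness under benchmarks/, which may differ), a dependent-chain timing loop in plain Nim might look like this:

```nim
import std/[monotimes, times]

# Hypothetical latency sketch: each iteration feeds its result into the next,
# so the operations form a dependency chain and can't overlap.
proc latencyNsPerOp(iterations: int): float =
  var acc = 12345'u64
  let start = getMonoTime()
  for i in 0 ..< iterations:
    acc = acc + uint64(i)  # depends on the previous acc
  let elapsedNs = (getMonoTime() - start).inNanoseconds
  doAssert acc != 0  # keep acc live so the loop isn't optimized away
  elapsedNs.float / iterations.float

echo latencyNsPerOp(1_000_000), " ns/op"
```

A throughput benchmark would instead use several independent accumulators so the CPU can execute iterations in parallel.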

Benchmarks are grouped by kind and operation family, e.g. benchmarks/latency/add.nim contains latency benchmarks for the add operations (overflowingAdd, saturatingAdd, etc.).

To run the benchmarks locally, use nimble bench command:

  • run latency and throughput benchmarks for all operations:
$ nimble bench
  • run latency and throughput benchmarks for particular operations:
$ nimble bench add
$ nimble bench sub mul
  • run latency or throughput benchmarks for all operations:
$ nimble bench --kind:latency
$ nimble bench --kind:throughput
  • run particular kind of benchmarks on a particular kind of operations:
$ nimble bench --kind:latency add
$ nimble bench --kind:throughput sub mul

Docs

The docs consist of two parts:

  • the book (this is what you’re reading right now)
  • the API docs

The book is created using mdBook.

The API docs are generated from the source code docstrings.

To build the docs locally, run:

  • nimble book to build the book
  • nimble apidocs to build the API docs
  • nimble docs to build both

Operations

Improving Existing Operations

intops’ public API exposes dispatchers for each available operation.

A dispatcher is a Nim template that contains the logic for selecting the best available implementation of the given operation for the given environment.

For example, carryingAdd mentioned in Quickstart is a dispatcher.

Let’s examine the code of this dispatcher:

template carryingAdd*(a, b: uint64, carryIn: bool): tuple[res: uint64, carryOut: bool] =
  when nimvm:
    pure.carryingAdd(a, b, carryIn)
  else:
    when cpuX86 and compilerMsvc and canUseIntrinsics:
      intrinsics.x86.carryingAdd(a, b, carryIn)
    elif compilerGccCompatible and canUseIntrinsics:
      intrinsics.gcc.carryingAdd(a, b, carryIn)
    elif cpu64Bit and compilerGccCompatible and canUseInlineC:
      inlinec.carryingAdd(a, b, carryIn)
    else:
      pure.carryingAdd(a, b, carryIn)

As you can see, a dispatcher is just a nested when-condition that checks if:

  1. the operation is being called at compile time (when nimvm)
  2. the code is run on a particular CPU (when cpuX86) and with a particular C compiler (and compilerMsvc)
  3. particular compilation flags were passed (and canUseIntrinsics)

Depending on these conditions, a particular implementation is called.

If you want to improve how intops chooses an implementation, find the corresponding dispatcher and modify the branching in it. All dispatchers are defined in intops/ops modules and are grouped by operation. For example, the dispatchers for addition flavors are defined in intops/ops/add.nim.

In the dispatchers, you can use the global constants defined in intops/consts.nim to check for the CPU architecture, C compiler, etc. If necessary, feel free to define new constants.
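Such constants are typically compile-time checks built from the compiler’s defines. A hypothetical sketch of what entries in consts.nim might look like (the names match those used in the dispatcher above, but the exact conditions are assumptions; check the real consts.nim):

```nim
# Hypothetical sketches of environment-detection constants.
# The real definitions live in intops/consts.nim and may differ.
const
  cpuX86* = defined(i386) or defined(amd64)
  cpu64Bit* = sizeof(int) == 8
  compilerMsvc* = defined(vcc)
  compilerGccCompatible* = defined(gcc) or defined(clang)
  canUseInlineAsm* = not defined(intopsNoInlineAsm)
```

Because these are consts, the when-branches in dispatchers are resolved entirely at compile time, and only the selected implementation ends up in the generated code.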

For example, if you want to prioritize inline C implementation over intrinsics, you could modify the dispatcher like so:

template carryingAdd*(a, b: uint64, carryIn: bool): tuple[res: uint64, carryOut: bool] =
  when nimvm:
    pure.carryingAdd(a, b, carryIn)
  else:
-   when cpuX86 and compilerMsvc and canUseIntrinsics:
+   when cpu64Bit and compilerGccCompatible and canUseInlineC:
+     inlinec.carryingAdd(a, b, carryIn)
+   elif cpuX86 and compilerMsvc and canUseIntrinsics:
      intrinsics.x86.carryingAdd(a, b, carryIn)
    elif compilerGccCompatible and canUseIntrinsics:
      intrinsics.gcc.carryingAdd(a, b, carryIn)
-   elif cpu64Bit and compilerGccCompatible and canUseInlineC:
-     inlinec.carryingAdd(a, b, carryIn)
    else:
      pure.carryingAdd(a, b, carryIn)

Adding New Operations

Adding an operation means doing two things:

  1. Adding a pure Nim implementation for the new operation. Pure Nim implementations are the universal fallback for all operations because they are guaranteed to compile everywhere Nim code compiles, regardless of the environment. Pure Nim implementations are defined in intops/impl/pure.nim.
  2. Adding a dispatcher that exposes this implementation. Find the corresponding module in intops/ops (or create a new one) and add the dispatcher there.

For example, let’s define a new addition flavor called magic addition which adds two uint64 integers and adds the number 42 to the sum (this is our magic component).

  1. In intops/impl/pure.nim:
func magicAdd*(a, b: uint64): uint64 =
  a + b + 42
  2. In intops/ops/add.nim:
template magicAdd*(a, b: uint64): uint64 =
  ## Docstring is mandatory for dispatchers.

  pure.magicAdd(a, b)

Adding New Operation Families

If you’re not just adding a new operation to an existing module but adding a new module to intops/ops, you must also expose it in intops.nim so that it becomes part of the public API.

For example, if you’ve added a new module intops/ops/magicadd.nim, do this in intops.nim:

- import intops/ops/[add, sub, mul, muladd, division, composite]
+ import intops/ops/[add, sub, mul, muladd, division, composite, magicadd]

- export add, sub, mul, muladd, division, composite
+ export add, sub, mul, muladd, division, composite, magicadd

Implementations

Improving Existing Implementations

In a perfect world, a pure Nim implementation would be enough to cover every operation: the Nim compiler would generate the optimal C code and the C compiler would generate the optimal Assembly code for every environment.

In reality, this isn’t always so. With so many combinations of Nim version, OS, CPU, and C compiler, there are inevitable performance gaps that need to be filled manually.

This is why most operations in intops have multiple implementations, and why dispatchers exist.

To improve an existing implementation, find its module in intops/impl and modify the code there. Some implementation families are represented as a single module (e.g. intops/impl/inlinec), some are split into submodules (e.g. intops/impl/intrinsics/x86.nim and intops/impl/intrinsics/gcc.nim).

Adding New Implementations

If you want to provide a new implementation for an existing operation:

  1. Add a new function to the corresponding intops/impl submodule (or create a new one).
  2. Update the corresponding dispatcher so it can pick the new implementation.

For example, let’s implement magic addition from the previous chapter in C.

  1. In intops/impl/inlinec.nim:
# This is a guard that prevents compilation in unsupported environments.
# In this example, we explicitly say that this implementation works only with 64-bit CPUs.
# Guards are necessary for the case where a user calls this function directly,
# circumventing the logic in `ops/add.nim`.
when cpu64Bit:
  func magicAdd*(a, b: uint64): uint64 {.inline.} =
    var res: uint64

    {. emit: "`res` = `a` + `b` + ((unsigned __int64)42);" .}

    res
  2. In intops/ops/add.nim:
template magicAdd*(a, b: uint64): uint64 =
  ## Docstring is mandatory for dispatchers.

- pure.magicAdd(a, b)

+ when nimvm:
+   pure.magicAdd(a, b)
+ else:
+   when cpu64Bit and canUseInlineC:
+     inlinec.magicAdd(a, b)
+   else:
+     pure.magicAdd(a, b)