========================================
numpy.i: a SWIG Interface File for NumPy
========================================

:Author:      Bill Spotz
:Institution: Sandia National Laboratories
:Date:        29 March, 2007

.. contents::

Introduction
============

The Simple Wrapper and Interface Generator (or `SWIG
<http://www.swig.org>`_) is a powerful tool for generating wrapper
code for interfacing to a wide variety of scripting languages.
`SWIG`_ can parse header files, and using only the code prototypes,
create an interface to the target language.  But `SWIG`_ is not
omnipotent.  For example, it cannot know from the prototype::

    double rms(double* seq, int n);

what exactly ``seq`` is.  Is it a single value to be altered in-place?
Is it an array, and if so what is its length?  Is it input-only?
Output-only?  Input-output?  `SWIG`_ cannot determine these details,
and does not attempt to do so.

Making an educated guess, humans can conclude that this is probably a
routine that takes an input-only array of length ``n`` of ``double``
values called ``seq`` and returns the root mean square.  The default
behavior of `SWIG`_, however, will be to create a wrapper function
that compiles, but is nearly impossible to use from the scripting
language in the way the C routine was intended.

For `python <http://www.python.org>`_, the preferred way of handling
contiguous (or technically, *strided*) blocks of homogeneous data is
with the module `NumPy <http://numpy.scipy.org>`_, which provides full
object-oriented access to arrays of data.  Therefore, the most logical
`python`_ interface for the ``rms`` function would be::

    def rms(seq):

where ``seq`` would be a `NumPy`_ array of ``double`` values, and its
length ``n`` would be extracted from ``seq`` internally before being
passed to the C routine.  Even better, since `NumPy`_ supports
construction of arrays from arbitrary `python`_ sequences, ``seq``
itself could be a nearly arbitrary sequence (so long as each element
can be converted to a ``double``) and the wrapper code would
internally convert it to a `NumPy`_ array before extracting its data
and length.

`SWIG`_ allows these types of conversions to be defined via a
mechanism called typemaps.  This document provides information on how
to use ``numpy.i``, a `SWIG`_ interface file that defines a series of
typemaps intended to make the type of array-related conversions
described above relatively simple to implement.  For example, suppose
that the ``rms`` function prototype defined above was in a header file
named ``rms.h``.  To obtain the `python`_ interface discussed above,
your `SWIG`_ interface file would need the following::

    %{
    #define SWIG_FILE_WITH_INIT
    #include "rms.h"
    %}

    %include "numpy.i"

    %init %{
    import_array();
    %}

    %apply (double* IN_ARRAY1, int DIM1) {(double* seq, int n)};
    %include "rms.h"

Typemaps are keyed off a list of one or more function arguments,
either by type or by type and name.  We will refer to such lists as
*signatures*.  One of the many typemaps defined by ``numpy.i`` is used
above and has the signature ``(double* IN_ARRAY1, int DIM1)``.  The
argument names are intended to suggest that the ``double*`` argument
is an input array of one dimension and that the ``int`` represents
that dimension.  This is precisely the pattern in the ``rms``
prototype.

Most likely, no actual prototypes to be wrapped will have the argument
names ``IN_ARRAY1`` and ``DIM1``.  We use the ``%apply`` directive to
apply the typemap for one-dimensional input arrays of type ``double``
to the actual prototype used by ``rms``.  Using ``numpy.i``
effectively, therefore, requires knowing what typemaps are available
and what they do.

Note that if the C function signature was in a different order::

    double rms(int n, double* seq);

that `SWIG`_ would not match the typemap signature given above with
the argument list for ``rms``.  Fortunately, ``numpy.i`` has a set of
typemaps with the data pointer given last::

    %apply (int DIM1, double* IN_ARRAY1) {(int n, double* seq)};

Using numpy.i
=============

The ``numpy.i`` file is currently located in the ``numpy/docs/swig``
sub-directory under the ``numpy`` installation directory.  Typically,
you will want to copy it to the directory where you are developing
your wrappers.  If it is ever adopted by `SWIG`_ developers, then it
will be installed in a standard place where `SWIG`_ can find it.

A simple module that only uses a single `SWIG`_ interface file should
include the following::

    %{
    #define SWIG_FILE_WITH_INIT
    %}
    %include "numpy.i"
    %init %{
    import_array();
    %}

Within a compiled `python`_ module, ``import_array()`` should only get
called once.  This could be in a C/C++ file that you have written and
is linked to the module.  If this is the case, then none of your
interface files should ``#define SWIG_FILE_WITH_INIT`` or call
``import_array()``.  Or, this initialization call could be in a
wrapper file generated by `SWIG`_ from an interface file that has the
``%init`` block as above.  If this is the case, and you have more than
one `SWIG`_ interface file, then only one interface file should
``#define SWIG_FILE_WITH_INIT`` and call ``import_array()``.

Available Typemaps
==================

The typemap directives provided by ``numpy.i`` for arrays of different
data types, say ``double`` and ``int``, and dimensions of different
types, say ``int`` or ``long``, are identical to one another except
for the C and `NumPy`_ type specifications.  The typemaps are
therefore implemented (typically behind the scenes) via a macro::

    %numpy_typemaps(DATA_TYPE, DATA_TYPECODE, DIM_TYPE)

that can be invoked for appropriate ``(DATA_TYPE, DATA_TYPECODE,
DIM_TYPE)`` triplets.  For example::

    %numpy_typemaps(double, NPY_DOUBLE, int)
    %numpy_typemaps(int,    NPY_INT   , int)

The ``numpy.i`` interface file uses the ``%numpy_typemaps`` macro to
implement typemaps for the following C data types and ``int``
dimension types:

  * ``signed char``
  * ``unsigned char``
  * ``short``
  * ``unsigned short``
  * ``int``
  * ``unsigned int``
  * ``long``
  * ``unsigned long``
  * ``long long``
  * ``unsigned long long``
  * ``float``
  * ``double``

In the following descriptions, we reference a generic ``DATA_TYPE``, which
could be any of the C data types listed above.

Input Arrays
------------

Input arrays are defined as arrays of data that are passed into a
routine but are not altered in-place or returned to the user.  The
`python`_ input array is therefore allowed to be almost any `python`_
sequence (such as a list) that can be converted to the requested type
of array.  The input array signatures are

  * ``(DATA_TYPE IN_ARRAY1[ANY])``
  * ``(DATA_TYPE* IN_ARRAY1, int DIM1)``
  * ``(int DIM1, DATA_TYPE* IN_ARRAY1)``
  * ``(DATA_TYPE IN_ARRAY2[ANY][ANY])``
  * ``(DATA_TYPE* IN_ARRAY2, int DIM1, int DIM2)``
  * ``(int DIM1, int DIM2, DATA_TYPE* IN_ARRAY2)``

The first signature listed, ``(DATA_TYPE IN_ARRAY[ANY])`` is for
one-dimensional arrays with hard-coded dimensions.  Likewise,
``(DATA_TYPE IN_ARRAY2[ANY][ANY])`` is for two-dimensional arrays with
hard-coded dimensions.

In-Place Arrays
---------------

In-place arrays are defined as arrays that are modified in-place.  The
input values may or may not be used, but the values at the time the
function returns are significant.  The provided `python`_ argument
must therefore be a `NumPy`_ array of the required type.  The in-place
signatures are

  * ``(DATA_TYPE INPLACE_ARRAY1[ANY])``
  * ``(DATA_TYPE* INPLACE_ARRAY1, int DIM1)``
  * ``(int DIM1, DATA_TYPE* INPLACE_ARRAY1)``
  * ``(DATA_TYPE INPLACE_ARRAY2[ANY][ANY])``
  * ``(DATA_TYPE* INPLACE_ARRAY2, int DIM1, int DIM2)``
  * ``(int DIM1, int DIM2, DATA_TYPE* INPLACE_ARRAY2)``

Argout Arrays
-------------

Argout arrays are arrays that appear in the input arguments in C, but
are in fact output arrays.  This pattern occurs often when there is
more than one output variable and the single return argument is
therefore not sufficient.  In `python`_, the convential way to return
multiple arguments is to pack them into a sequence (tuple, list, etc.)
and return the sequence.  This is what the argout typemaps do.  If a
wrapped function that uses these argout typemaps has more than one
return argument, they are packed into a tuple or list, depending on
the version of `python`_.  The `python`_ user does not pass these
arrays in, they simply get returned.  The argout signatures are

  * ``(DATA_TYPE ARGOUT_ARRAY1[ANY])``
  * ``(DATA_TYPE* ARGOUT_ARRAY1, int DIM1)``
  * ``(int DIM1, DATA_TYPE* ARGOUT_ARRAY1)``
  * ``(DATA_TYPE ARGOUT_ARRAY2[ANY][ANY])``

These are typically used in situations where in C/C++, you would
allocate a(n) array(s) on the heap, and call the function to fill the
array(s) values.  In python, the arrays are allocated for you and
returned as new array objects.

Note that we support ``DATA_TYPE*`` argout typemaps in 1D, but not
2D.  This because of a quirk with the `SWIG`_ typemap syntax and
cannot be avoided.  Note that for these types of 1D typemaps, the
python function will take a single argument representing ``DIM1``.

Output Arrays
-------------

The ``numpy.i`` interface file does not support typemaps for output
arrays, for several reasons.  First, C/C++ function return arguments
do not have names, so signatures for ``%typemap(out)`` do not include
names.  This means that if ``numpy.i`` supported them, they would
apply to all pointer return arguments for the supported numeric
types.  This seems too dangerous.  Second, C/C++ return arguments are
limited to a single value.  This prevents obtaining dimension
information in a general way.  Third, arrays with hard-coded lengths
are not permitted as return arguments.  In other words::

    double[3] newVector(double x, double y, double z);

is not legal C/C++ syntax.  Therefore, we cannot provide typemaps of
the form::

    %typemap(out) (TYPE[ANY]);

If you run into a situation where a function or method is returning a
pointer to an array, your best bet is to write your own version of the
function to be wrapped, either with ``%extend`` for the case of class
methods or ``%ignore`` and ``%rename`` for the case of functions.

Other Common Types: bool
------------------------

Note that C++ type ``bool`` is not supported in the list in the
`Available Typemaps`_ section.  NumPy bools are a single byte, while
the C++ ``bool`` is four bytes (at least on my system).  Therefore::

    %numpy_typemaps(bool, NPY_BOOL, int)

will result in typemaps that will produce code that reference
improper data lengths.  You can implement the following macro
expansion::

    %numpy_typemaps(bool, NPY_UINT, int)

to fix the data length problem, and `Input Arrays`_ will work fine,
but `In-Place Arrays`_ might fail type-checking.

Other Common Types: complex
---------------------------

Typemap conversions for complex floating-point types is also not
supported automatically.  This is because `python`_ and `NumPy`_ are
written in C, which does not have native complex types.  Both
`python`_ and `NumPy`_ implement their own (essentially equivalent)
``struct`` definitions for complex variables::

    /* Python */
    typedef struct {double real; double imag;} Py_complex;

    /* NumPy */
    typedef struct {float  real, imag;} npy_cfloat;
    typedef struct {double real, imag;} npy_cdouble;

We could have implemented::

    %numpy_typemaps(Py_complex , NPY_DOUBLE , int)
    %numpy_typemaps(npy_cfloat , NPY_CFLOAT , int)
    %numpy_typemaps(npy_cdouble, NPY_CDOUBLE, int)

which would have provided automatic type conversions for arrays of
type ``Py_complex``, ``npy_cfloat`` and ``npy_cdouble``.  However, it
seemed unlikely that there would be any independent (non-`python`_,
non-`NumPy`_) application code that people would be using `SWIG`_ to
generate a python interface to, that also used these definitions for
complex types.  More likely, these application codes will define their
own complex types, or in the case of C++, use ``std::complex``.
Assuming these data structures are compatible with `python`_ and
`NumPy`_ complex types, ``%numpy_typemap`` expansions as above (with
the user's complex type substituted for the first argument) should
work.

Helper Functions
================

The ``numpy.i`` file containes several macros and routines that it
uses internally to build its typemaps.  However, these functions may
be useful elsewhere in your interface file.

Macros
------

  **is_array(a)**
    Evaluates as true if ``a`` is non-``NULL`` and can be cast to a
    ``PyArrayObject*``.

  **array_type(a)**
    Evaluates to the integer data type code of ``a``, assuming ``a`` can
    be cast to a ``PyArrayObject*``.

  **array_dimensions(a)**
    Evaluates to the integer number of dimensions of ``a``, assuming
    ``a`` can be cast to a ``PyArrayObject*``.

  **array_size(a,i)**
    Evaluates to the ``i``-th dimension size of ``a``, assuming ``a``
    can be cast to a ``PyArrayObject*``.

  **array_is_contiguous(a)**
    Evaluates as true if ``a`` is a contiguous array.  Equivalent to
    ``(PyArray_ISCONTIGUOUS(a))``.

Routines
--------

  **char* pytype_string(PyObject* py_obj)**
    Given a ``PyObject*``, return a string describing its type.


  **char* typecode_string(int typecode)**
    Given a `NumPy`_ integer typecode, return a string describing the type.


  **int type_match(int actual_type, int desired_type)**
    Make sure input has correct `NumPy`_ type.  Allow character and
    byte to match.  Also allow int and long to match.  This is
    deprecated .  You should use ``PyArray_EquivTypenums()`` instead.


  **PyArrayObject* obj_to_array_no_conversion(PyObject* input, int typecode)**
    Given a ``PyObject*``, cast it to a ``PyArrayObject*`` if legal.
    If not, set the python error string appropriately and return
    ``NULL``.


  **PyArrayObject* obj_to_array_allow_conversion(PyObject* input, int typecode, int* is_new_object)**
    Convert the given ``PyObject*`` to a `NumPy`_ array with the given
    typecode.  On Success, return a valid ``PyArrayObject*`` with the
    correct type.  On failure, the python error string will be set and
    the routine returns ``NULL``.


  **PyArrayObject* make_contiguous(PyArrayObject* ary, int* is_new_object, int min_dims, int max_dims)**
    Given a ``PyArrayObject*``, check to see if it is contiguous.  If
    so, return the input pointer and flag it as not a new object.  If
    it is not contiguous, create a new ``PyArrayObject*`` using the
    original data, flag it as a new object and return the pointer.


  **PyArrayObject* obj_to_array_contiguous_allow_conversion(PyObject* input, int typecode, int* is_new_object)**
    Convert a given ``PyObject*`` to a contiguous ``PyArrayObject*``
    of the specified type.  If the input object is not a contiguous
    ``PyArrayObject*``, a new one will be created and the new object
    flag will be set.


  **int require_contiguous(PyArrayObject* ary)**
    Test whether a ``PyArrayObject*`` is contiguous.  If array is
    contiguous, return 1.  Otherwise, set the python error string and
    return 0.


  **int require_dimensions(PyArrayObject* ary, int exact_dimensions)**
    Require the given ``PyArrayObject*`` to have a specified number of
    dimensions.  If the array has the specified number of dimensions,
    return 1.  Otherwise, set the python error string and return 0.


  **int require_dimensions_n(PyArrayObject* ary, int* exact_dimensions, int n)**
    Require the given ``PyArrayObject*`` to have one of a list of
    specified number of dimensions.  If the array has one of the
    specified number of dimensions, return 1.  Otherwise, set the
    python error string and return 0.


  **int require_size(PyArrayObject* ary, int* size, int n)**
    Require the given ``PyArrayObject*`` to have a specified shape.
    If the array has the specified shape, return 1.  Otherwise, set
    the python error string and return 0.


Beyond the Provided Typemaps
============================

There are many C or C++ array/`NumPy`_ array situations not covered by
a simple ``%include "numpy.i"`` and subsequent ``%apply`` directives.

A Common Example
----------------

Consider a reasonable prototype for a dot product function::

    double dot(int len, double* vec1, double* vec2);

The python interface that we want is::

    def dot(vec1, vec2):

The problem here is that there is one dimension argument and two array
arguments, and our typemaps are set up for dimensions that apply to a
single array (in fact, `SWIG`_ does not provide a mechanism for
associating ``len`` with ``vec2`` that takes two python input
arguments).  The recommended solution is the following::

    %apply (int DIM1, double* IN_ARRAY1) {(int len1, double* vec1),
                                          (int len2, double* vec2)}
    %rename (dot) my_dot;
    %inline %{
    double my_dot(int len1, double* vec1, int len2, double* vec2) {
        if (len1 != len2) {
	    PyErr_Format(PyExc_ValueError,
                         "Arrays of lengths (%d,%d) given",
                         len1, len2);
	    return 0.0;
        }
        return dot(len1, vec1, vec2);
    }
    %}

If the header file that contains the prototype for ``dot()`` also
contains other prototypes that you want to wrap, so that you need to
``%include`` this header file, then you will also need a ``%ignore
dot;`` directive, placed after the ``%rename`` and before the
``%include`` directives.

Other Situations
----------------

There are other wrapping situations in which ``numpy.i`` may be
helpful when you encounter them.

  * In some situations, it is possible that you could use the
    ``%numpy_templates`` macro to implement typemaps for your own
    types.  See the `Other Common Types: bool`_ or `Other Common
    Types: complex`_ sections for examples.  Another situation is if
    your dimensions are of a type other than ``int`` (say ``long`` for
    example)::

        %numpy_typemaps(double, NPY_DOUBLE, long)

  * You can use the code in ``numpy.i`` to write your own typemaps.
    For example, if you had a three-dimensional array as a function
    argument, you could cut-and-paste the appropriate two-dimensional
    typemap into your interface file.  The modifications for the third
    dimension would be trivial.

  * Sometimes, the best approach is to use the ``%extend`` directive
    to define new methods for your classes (or overload existing ones)
    that take a ``PyObject*`` (that either is or can be converted to a
    ``PyArrayObject*``) instead of a pointer to a buffer.  In this
    case, the helper routines in ``numpy.i`` can be very useful.

  * Writing typemaps can be a bit nonintuitive.  If you have specific
    questions about writing `SWIG`_ typemaps for `NumPy`_, the
    developers of ``numpy.i`` do monitor the
    `Numpy-discussion <mailto:Numpy-discussion@scipy.org>`_ and
    `Swig-user <mailto:Swig-user@lists.sourceforge.net>`_ mail lists.

A Final Note
------------

When you use the ``%apply`` directive, as is usually necessary to use
``numpy.i``, it will remain in effect until you tell `SWIG`_ that it
shouldn't be.  If the arguments to the functions or methods that you
are wrapping have common names, such as ``length`` or ``vector``,
these typemaps may get applied in situations you do not expect or
want.  Therefore, it is always a good idea to add a ``%clear``
directive after you are done with a specific typemap::

    %apply (double* IN_ARRAY1, int DIM1) {(double* vector, int length)}
    %include "my_header.h"
    %clear (double* vector, int length);

In general, you should target these typemap signatures specifically
where you want them, and then clear them after you are done.

Acknowledgements
================

Many people have worked to glue `SWIG`_ and `NumPy`_ together (as well
as `SWIG`_ and the predecessors of `NumPy`_, Numeric and numarray).
The effort to standardize this work into ``numpy.i`` began at the 2005
`SciPy <http://scipy.org>`_ Conference with a conversation between
Fernando Perez and myself.  Fernando collected helper functions and
typemaps from Michael Hunter, Anna Omelchenko and Michael Sanner.
Their work has made this end result possible.
