Intro
Suffix array construction (or, more generally, suffix sorting) is an important task of finding the lexical order of all sub-strings of a given string. It is heavily used in data indexing and BWT compression, which I've been doing for a long time to date.

First of all, it was surprising to me to discover that the subject is possible. By

**linear time** I mean O(N) asymptotic execution time, where N is the input array size. By

**no extra space** I mean that the only memory required is for the input array (N bytes), suffix array (4N bytes) and some constant storage (O(1) bytes).

Induction sort (SA-IS) is the key. Authors did a great work of exploring it. The major breakthrough was to use the induction to sort LMS sub-strings, in addition to using it afterwards to recover the order of all other strings. The only thing missing was a proper memory requirement for the algorithm. Authors claim 2N is the worst-case additional memory. Let's see how we can narrow this value down.

Proof
For an input string of size N, let's define M to be the total number of LMS-type suffixes. The memory (

*R(N,M)* machine words) required for a recursive call consists of three sections:

- Suffix storage = M
- Data storage = M
- Radix storage =
*K(M)*, where *K(M)* is the number of unique LMS sub-strings

*R(N,M) = M + M + K(M) <= 3M/2*
Now, let's split LMS sub-strings according to their length. There can be LMS of size 1. There are

*L2 *of size 2,

*L3* of size 3, and so on. Let's now define M and N in terms of these numbers:

*M = Sum(i){ Li }*
*N = Sum(i){ i * Li }*
We know that LMS of different size can not be equal. Therefore, we can safely split

*K(M)*:

*K(M) = Sum(i){ K(Li) } = K(L2) + K(L3) + ... + K(Li) + ...*
*K(Li) <= Li*
We know that the maximum number of unique sub-strings of size 2 can not exceed 2^16, which is a rather small number. For the convenience let's name

*a=L2 *and

* b=L3+L4+...= Sum(i>2){ Li }*. It is the time to give a better upper bound to

*R(N,M)*:

*M = a+b*
*N = 2L2 + 3L3 + 4L4 + ... <= 2L2 + 3(L2+L3+...) = 2a + 3b*
*R(N,M) = 2M + K(M) = 2a+2b + K(a) + K(b) <= 2a+2b + 2^16 + b = 2a+3b + 2^16 <= ***N+O(1)**
Well, that's it. By carefully arranging the data in memory and allocating just 2^16 additional words we can perform a recursive call to SA-IS, and, therefore, construct a complete suffix array.

Code
My version of SA-IS is called

A7. Advantages to

Yuta's SA-IS implementation (v2.4.1):

**5N** is the worst-case memory requirement. It doesn't depend on the input.
- No memory allocation/free calls inside the sorting procedure.
- Data squeezing: using smaller data types for recursion if range allows.
- Cleaner C++ code with more comments.