Veikko Keränen
15 June 2002 and 15 June 2003
Rovaniemi Polytechnic, School of Technology
Jokiväylä 11, 96300 Rovaniemi, Finland
veikko.keranen@ramk.fi
http://south.rotol.ramk.fi
Abstract
In 1906 Axel Thue [34] started the systematic study of structures in words. Consequently, he studied basic objects of theoretical computer science long before the invention of the computer or DNA. In 1961 Paul Erdös [13] raised the question whether abelian squares can be avoided in infinitely long words. In 1992, we presented in [19], see also [20
23], an abelian square-free (a-2-free) endomorphism
on the four letter alphabet
= {a, b, c, d}. The size of
, i.e. ![]()
(abcd)
, is equal to 4×85. Until now, all known methods for constructing arbitrarily long a-2-free words on
have been based on the structure of this
; see Arturo Carpi [4
6].
In this paper, we report of a completely new endomorphism
of
, the iteration of which produces an infinite abelian square-free word. The size of
is 4×98, and the image words for letters are constructed, in part, differently from the case of
. For
they were directly obtained by permutating letters cyclically. The endomorphism
is not an a-2-free endomorphism itself, since it does not preserve the a-2-freeness of all words of length 7. However,
can be used together with
to produce a-2-free DT0L-languages of unlimited size. Here DT0L-languages mean deterministic context-independent Lindenmayer languages produced by using compositions of endomorphisms - so called tables; see [32, p.188]. Indeed, by using Carpi's algorithm [4] for prefixes of
(
) and
(
), and a modified version of this algorithm, one can check the following fact: for any a-2-free words
and
, where
does not contain a certain subword pattern,
(
) and
(
) are always a-2-free and avoid all undesirable patterns that would, in the case of
, lead to an abelian-square in the next iteration step.
It is anticipated that later on this new result will lead to a sharper bound for the base number (~1.000021) found in [5], where Carpi showed that the number of a-2-free words over 4 letters of length n is
.
We explain extensive computer aided searches that have been carried out over 11 years to find new ways of constructing a-2-free words over 4 letters. In this process, we have encountered some unintuitive non-linear phenomena which, however, are in accordance with the complex behaviour of simple systems studied by Stephen Wolfram in his long-awaited book [35]. For example, two same looking initial values for prefixes and suffixes of image words for letters may yield 1000-fold running times when searching through all possibilities for proper infixes! We also analyse the structure of
in terms of subwords, and discuss some open problems.
1 Introduction
In theoretical computer science, structures and patterns in words (strings) constitute a natural and central research topic. As mentioned above, the systematic study of word structures, i.e., combinatorics on words, was started by Axel Thue (1863-1922) in [34] at the beginning of the 20th century. One of the remarkable discoveries made by Thue is that the consecutive repetitions of non-empty subwords (squares) can be avoided in infinite words over a three letter alphabet. As a simple example of the square concept, consider the finite words a b a c a b a and a b c d c d a b. The first word does not contain any square, i.e., it is square-free, whereas the second word contains the underlined square c d c d as a subword.
Thue also showed that over a binary alphabet one can construct arbitrarily long words which do not contain any triple repetitions (cubes), i.e., factors occurring three times in succession. Unfortunately, Thue's results were forgotten for a long time, and they have been rediscovered and republished again and again - at least a dozen of times as noted in [9]. Indeed, repetition-free words have been used in various fields of mathematics. For example, in formal languages, in connection with unending games, and in symbolic dynamics (which constitutes a tool for studying chaos). Moreover, repetition-free words have served in many ways in the procedure of solving the famous Burnside problem from 1902. The question asked by Burnside was: is every group finite when it has a finite number of generators and satisfies the identity relation
= 1 for some natural number n? Among other infinite repetition-free words, cube-free words over a binary alphabet were utilized in showing that certain Burnside groups can be infinite [1, pp.32-33].
The above mentioned square-freeness property of words is by no means trivial to prove. The tool which Thue invented for constructing square-free words, namely the concept of a repetition-free morphism, is still today a basic device in the study of avoidable patterns in words. Repetition-free morphisms are mappings between free monoids that preserve the repetition-freeness of words. The iteration of such a (non-trivial) morphism produces a repetition-free infinite word. Morphisms possessing this property have been sharply characterized in [4,8,17,18,25,26]. These results concern different types of repetitions and alphabet sizes. Informally speaking, most of the characterizational results for morphisms say that in order to test the repetition-freeness of a given morphism, one has only to check whether the image words of short repetition-free words are repetition-free. A general survey of these and related results (achieved before 1984) is given in [3]. For a short survey of Thue's results concerning repetition-free words and their applications, see [15]. Fundamental topics are discussed in [27,33].
In a paper from 1961, see [13, p.240], Paul Erdös (1913-1996) raised the question whether abelian squares can be avoided in infinitely long words, i.e., whether there exist infinitely many abelian square-free words over a given alphabet. Here, an abelian square means a non-empty word uv, where u and v are permutations of each other (Niels Henrik Abel 1802-1829). For example, a b c a c b is an abelian square. A word is called abelian square-free, if it does not contain any abelian square as a subword. For example, the word a b a c a b a is abelian square-free, while ab cabdc bcacd ac is not.
Later, in a 1970 paper, Pleasants [31] showed that there exists an infinite abelian square-free word over five letters. In 1991, see Keränen [19] the year after, we managed to show that the same holds true also in the case of four letters. It is easily seen that abelian squares cannot be avoided over a three letter alphabet. Indeed, in this alphabet, each word of length 8 contains an abelian square, cf. Section 9 below. In [12] Entringer et al. showed that every infinite word over a binary alphabet contains arbitrarily long abelian squares. Due to Dekking [10] it is known that abelian fourth [third] power repetitions can be avoided in infinite words over two [three] letters.
An application of Dekking's result was given by Justin et al. in [16], where it was shown that a finitely generated semigroup is uniformly repetitive if and only if it is finite. In [30] Pirillo et al. used similar kind of reasoning when proving, among other results, that the additive semigroup
is not uniformly 4-repetitive. It seems to be an open problem whether
is uniformly 2-repetitive or 3-repetitive. In all these considerations the use of van der Waerden's theorem has been very central. In Lothaire [27, pp.55
62] van der Waerden's theorem was used to show that every morphism from a free semigroup
, where A is finite, to
is repetitive. This means that every long enough sequence on a finite set of integers contains two adjacent segments (not necessarily of the same length) that have the same sum.
The original problem concerning abelian squares has attracted attention also in the research of free partially commutative monoids, see for instance [7,11]. Moreover, abelian square-free words have aroused interest in algorithmic music, see e.g. Laakso [24].
In [4] Carpi gave sufficient conditions for morphisms to preserve abelian kth power-freeness of words. A conjecture is that these conditions yield an effective characterization also for abelian square-free endomorphisms on a four letter alphabet
= {a, b, c, d}. However, new examples of relatively short abelian square-free endomorphisms g of
, possibly with the size
g(abcd)
< 4
85 = 340
provided they exist, have turned out to be extremely hard to find. Thus far, after 1992 (when we presented
in [19]), the only new abelian square-free endomorphisms and substitutions have been found by Carpi [5], cf. also [6, pp.80
81]. However, these mappings are all based on the structure of
. Moreover, the size of these endomorphisms and substitutions is huge, it is around 4
(![]()
1)
, or so [actually, a similar favourable result regarding the size 4
(![]()
85)
is also quoted to be found with the aid of computers]. These substitutions, and the endomorphisms derived from them, are constructed by deleting distinct occurrences of one letter in
(x), x
. Indeed, Carpi showed in [5] that there exists exactly one abelian square-free word w
which can be obtained by deleting an occurrence of a letter in
(d). The letter that can be deleted is the 6349th letter (a) of
(d). Furthermore, it turns out that both
(c) w and w
(c) are abelian square-free. By using these substitutions Carpi then showed that the number of abelian square-free words of each length grows expenentially, and, interestingly enough, that the monoid of (uniform) abelian square-free endomorphisms of
is not finitely generated.
The words
(x), x
, of our morphism are obtained by cyclically permutating the letters in
(a). This method was already used by Pleasants [31] in connection with five letters. Consequently, both of these endomorphisms are uniformly growing. The size of Pleasants' endomorphism is 5
15 = 75. In our case, we could check by using computers that the morphism size 4
85 = 340, in spite of its largeness, is known to be minimal, as far as we consider those abelian square-free endomorphisms which are constructed by using cyclic permutation of letters in
(x), x
.
The structure of
in terms of reappearing subwords - also as cyclic permutations, so called grey words, and mirror images - turns out to be somewhat surprising as we will see in Section 4 below. An analogous analysis for
is not available as yet. The frequency of (ordinary) subwords in D0L words was studied in Frid [14]. For a generalization of abelian squares, see Avgustinovich and Frid [2].
2 Preliminaries
In this section we present most of the terminology necessary for understanding the mathematical structures and reasoning involved in this paper and in literature.
An alphabet Σ is a finite non-empty set of abstract symbols called letters. A word (string) over
is a finite (unless otherwise indicated) string, or sequence, of letters belonging to
. The set of all words [non-empty words] over
is denoted by
[
]. On the
, the associative binary operation of catenation is defined. For words u and v, it is the juxtaposition uv. The empty word, which is the neutral element of catenation, is denoted by
. The algebraic structures
and
are called, respectively, the free monoid and the free semigroup generated by
.
Let w =
,
. The length of the word w, denoted by | w |, is the number of occurrences of letters in w, i.e., | w | = m. Let
= {
, ... ,
}. The number of occurrences of one letter x
in w is denoted by
, or simply by
if x =
. The notation
(w) stands for the Parikh vector of w, i.e.,
(w) = (
,
,
). Usually we will omit the subscript
and write simply
instead of
. Quite interchangeably with the Parikh vector notation also formal sums
(w) =
, with
, are used. For example
(abacaba) = 4a + 2b + c. Thus
(w), with w
, is an element of the abelian free monoid
generated by
. We will also consider differences of Parikh vectors and differences of formal sums. Consequently, these vectors and sums are extended into elements of the abelian free group
generated by
. The neutral element of
is denoted by
.
A word u is called a subword of a word w, if w = p u s for some words p and s. The notation SW(w) stands for the set of all subwords of w. If p =
[s =
], then u is called a prefix [a suffix] of w. By PREF(w) [SUFF(w)] we mean the set of all prefixes [suffixes] of w. The notation pref(w) [suff(w)] denotes an element in PREF(w) [SUFF(w)]. In the case w
λ we write first(w) [last(w)] to denote the first [the last] letter of w.
Let k
2 be a given integer. A k-repetition [an abelian k-repetition] is a non-empty word of the form
[![]()
![]()
, where
for all
, i.e.,
:s are permutations of each other]. Instead of [abelian] 2- and 3-repetitions, terms [abelian] squares and [abelian] cubes are often used. A word or an
-word (explained below) is called k-repetition free [abelian k-repetition free, or k-free in the abelian sense], or in short k-free [a-k-free], if it does not contain any k-repetition [abelian k-repetition] as a subword. A word sequence or a word set is k-free [a-k-free], if all words in it are k-free [a-k-free]. If, for a fixed k, it is possible to construct arbitrarily long (infinite) a-k-free (or other pattern-free) words over a given alphabet
, then we say that abelian k-repetitions (or those patterns) are avoidable over
.
Subsets of
are called languages over
. The catenation of languages L and
, denoted by L
, is the language {uv | u
L, v
}. The notation
, where L is a language and
, is defined as follows:
= {
} and
=
L. Let
=
and
=
. If a language contains only one word, say w, then we sometimes write w instead of {w}; especially,
= {
| i
0} and
= {
| i
1}.
For a language L we write Q(L), where Q is SW, PREF or SUFF, to denote the set
Q(w). Moreover, for
, we write
(L) to denote the words in Q(L) of length = n. The notation
(w) [
(w)] is used for an element in
(w) [
(w)].
A morphism h is a mapping between free monoids
and
with h(uv) = h(u)h(v) for every u and v in
. Especially, h(
) =
. A morphism h:
, being compatible with the catenation of words, is uniquely defined, if the word h(x)
is (effectively) given for each
. If
, we call h an endomorphism (and usually write g instead of h). For a morphism h and a language L we define h(L) = {h(w) |
}. A morphism h is termed uniformly growing, if |h(x)| = |h(y)|
2 for every x and
.
For a given integer k
2, a morphism h:
is called k-free [a-k-free], if h(w) is k-free [a-k-free] for every k-free [a-k-free] word
.
As regards the L-systems (Aristid Lindenmayer 1925
1989), we specify the following concepts. A D0L-system is a triple G = (
, g,
), where
is an alphabet, g:
is an endomorphism, and
, called the axiom, is a word over
. The (word) sequence S(G) generated by G consists of the words
=
(
),
(
),
(
),
(
), ... ,
where
(
) = g(
(
)) for
. The language of G is defined by L(G) = {
(
) |
}. Languages [sequences] defined by a D0L-system are referred to as D0L-languages [D0L-sequences]. D0L-systems provide a very convenient way for defining languages and infinite words. Furthermore, if g and
are k-free [a-k-free], then the iteration of g will yield a k-free [a-k-free] D0L-sequence. An HD0L-system is a 5-tuple
= (
,
, g, h,
), where (
, g,
) is a D0L-system, called the underlying D0L-system of
, Δ is an alphabet, and h:
is a morphism. The HD0L-sequence S(
) generated by
consists of the words
h(
) = h(
(
)), h(
(
)), h(
(
)), h(
(
)), ... ,
and the HD0L-language of
is the set L(
) = {h(
(
)) |
}. A DT0L-system is a triple
= (
, H,
), where H is a finite non-empty set of morphisms (called tables) and (
, h,
) is a D0L-system for every
. The DT0L-language of
is the set L(
) = {w | w =
or w =
, where the compositions
of morphims are constructed from
, ...,
H}. Obviously, a DT0L-system can be regarded as a D0L-system, when H contains only one (endo)morphism. For a thorough discussion of various L-systems we refer the reader to Rozenberg and Salomaa [32].
An
-word is an infinite sequence, from left to right, of letters of an alphabet
. Thus an
-word can be identified with a mapping of
into
. One can construct an
-word, for example, by iterating an endomorphism g:
such that
and g(x) = xw for some
,
. Such a morphism g is called prefix preserving for the reason that
(x) is a proper prefix of
(x) whenever i
0. An
-word is obtained as the "limit" of the sequence
(a); i = 0, 1, 2, ... .
3 Some examples of repetition-free endomorphisms
Below we use Mathematica to produce some examples of the iteration of an endomorphism. The first two examples are due to Axel Thue (1906, 1912), and the last one is our abelian square-free morphism
.
Thue iterated the following prefix preserving endomorphism g: {a, b}
{a, b}
when showing the existence of a cube-free (3-free) infinite word over the binary alphabet {a, b}. We have used extra spaces to make the output more readable:
![productions = {"a" -> "ab ", "b" -> "ba "} ; g[x_] := StringReplace[x, productions]](HTMLFiles/NewAbelianSquare-FreeDT0L-LanguagesOver4Letters_281.gif)
![]()
| a |
| ab |
| ab ba |
| ab ba ba ab |
| ab ba ba ab ba ab ab ba |
| ab ba ba ab ba ab ab ba ba ab ab ba ab ba ba ab |
The iteration of the next endomorphim g: {a, b, c}
{a, b, c}
produces longer and longer square-free (2-free) strings over the three letter alphabet {a, b, c}:
![]()
![]()
| a |
| abc |
| abc ac b |
| abc ac b abc b ac |
| abc ac b abc b ac abc ac b ac abc b |
| abc ac b abc b ac abc ac b ac abc b abc ac b abc b ac abc b abc ac b ac |
Note that this g is not square-free itself, since g(aba) = a b c a c a b c is not square-free !
The abelian square-free endomorphism
: {a, b, c, d}
{a, b, c, d}
which we found in [19] is as follows:
![]()
![]()
![g85b = StringReplace[g85a, cyclicPermutation] ; g85c = StringReplace[g85b, cyclicPermutation] ; g85d = StringReplace[g85c, cyclicPermutation] ;](HTMLFiles/NewAbelianSquare-FreeDT0L-LanguagesOver4Letters_294.gif)
We define:
![]()
and compute the length and the number of occurrences of each letter of
(a) as follows:
![]()
![]()
4 Structures in ![]()
In this section, we give a pictorial representation of the structure of
in terms of reappearing subwords
also as cyclic permutations, grey words (in which all letters occur equally often), and mirror images. In the first part, cyclic permutations of subwords of length 26 turn out to be quite surprising. In a sense, one can explain these phenomena by thinking that reappearing subwords reduce the number of different Parikh vectors of longer successive subwords, and thus make avoiding of all abelian repetitions more plausible. Needless to say that we are still far from understanding these structures precisely.
The underlined letter c in
(a) means the following. As explained in the Introduction, Carpi [5] used substitutions derived from
when showing that the number of abelian square-free words of each length grows expenentially. Basically, he showed that there exists exactly one abelian square-free word w
which can be obtained by deleting an occurrence of a letter in
(b). The letter, that can be deleted, is the 6349th letter (c) of
(b). In this case, both
(a) w and w
(a) are also abelian square-free. This 6349th letter of
(b) lies inside an occurrence of the block
(a) and is at the 59th position of that occurrence (note that 6349 = 74
85 + 59). The 59th letter of
(a) is the underlined c below.
g85(a) = abcacdcbcdcadcdbdaba
cabadbabcbdbcbacbcdcacbabd abacadcbcdcac
dbcbacbcdcacdcbdcdadbdcbca;
g85(b) = bcdbdadcdadbadacabcb
dbcbacbcdcacdcbdcdadbdcbca bcbdbadcdadbd
acdcbdcdadbdadcadabacadcdb;
g85(c) = cdacabadabacbabdbcdc
acdcbdcdadbdadcadabacadcdb cdcacbadabaca
bdadcadabacabadbabcbdbadac;
g85(d) = dabdbcbabcbdcbcacdad
bdadcadabacabadbabcbdbadac dadbdcbabcbdb
cabadbabcbdbcbacbcdcacbabd;
g85(a) = abc acdcbcdcadcdbdabacabadbabcbd
bcbacbc dcacbabd abac adcb cdc acdb cbacbcdcacdcbdcd adbdcbca;
g85(a) = abc acdcbcdcadcdbdabacabadbabcbd
bcbacbcdcacba bdabacadcbcd c acdb cbacbcdcacdcbdcd adbdcbca;
![g85 (a) = abcacdcbcdcadcdbdaba cabadbabcb Overscript[d bcbacbcdcac, 1] babd abaca d cbcdcac Overscript[d bcba cbcdcac, 1] dcbdcdadbdcbca ; g85 (a) = abcacdcbcdcadcd Overscript[bdaba ca, 2] badbabcbdbcbacbcdcacba Overscript[bd abaca, 2] abacad Overscript[cbcdcac d, 3] bcba Overscript[cbcdcacd, 3] cbdcdadbdcbca ; g85 (a) = Overscript[Overscript[abcacdcbc, ==>], 4] dcadcdbdaba cabadbabcbdbcba Overscript[Overscript[cbcdcacba, <==], 4] bd abacadcbcdcac dbcbacbcdcacdcbdcdadbdcbca ; g85 (a) = Overscript[Overscript[abcacdcb, ==>], 5] cd Overscript[cadcdbdab, 6] a cabadbabcbdbcbacbcdcacbab Overscript[Overscript[d abacadc, ==>], 7] bcdcac dbcbacbcdcacdcbdcdadbdcbca ; g85 (c) = Overscript[Overscript[cdacabad, <==], 7] abacbabdbcdc acdcbdcdadbdadca Overscript[Overscript[dabacadc, ==>], 7] d Overscript[Overscript[b cdcacba, <==], 5] dabaca bdadcadabacabadbabcbdbadac ; g85 (b) | g85 (d) = ... aba Overscript[cadcdb | dab, 6] dbc ... ;](HTMLFiles/NewAbelianSquare-FreeDT0L-LanguagesOver4Letters_314.gif)
5 Carpi's characterisation of a-2-free morphisms
Proposition 1 (Carpi [4]). Given an integer
, two alphabets
and
, and a morphism h:
, let us denote by
the subgroup of
generated by
. Then, the morphism h is an abelian nth power-free (a-n-free) provided that the following conditions are satisfied:
(i) h(w) is a-n-free for every a-n-free word
of length 2,
(ii) h is commutatively bijective (i.e. the set
is linearly independent),
(iii) for all
,
PREF( h(
) )
{h(
)} (
) such that
, j = 1, 2 , ... , n
1,
there exists integers
{0,1} such that
=
, j = 1, 2 , ... , n
1.
If Card(
)
6, then an a-n-free morphism h always satisfies conditions (i)
(iii). ![]()
Informally speaking, the condition (iii) above says that, if h(w) contains an abelian repetition (as a subword), then there is an abelian repetition in the underlying word w, too. In our case, we are interested in applying this characterisation when n = 2 (and j = 1). Independently of us, Carpi verified in [4] that, indeed,
satifies conditions (i)
(iii) of Proposition 1, for n = 2, and is therefore a-2-free.
Ville Mattila implemented Carpi's algorithm in [28]. Here, we use the word algorithm, as we wish to emphasise the process aspect of this characterisation. Mattila explains programming viewpoints related to this algorithm also in another paper, entitled Constructing Abelian Square-Free Words and Testing Morphisms with Mathematica, in this IAS 2002 proceedings. In that paper, and in [28], he also explains how to use computer algebra tools for producing a-2-free strings - there the method is to solve character variables in strings by using polynomial inequalities.
6 A new infinite a-2-free D0L
sequence over 4 letters
Define the endomorphism g:
as follows:
g(a) = abcacdcbcdcadbdcbdbabcbdcacbabdbabcabdadcdadbdcbd-
babdbcbacbcdbabdcdbdcacdbcbacbcdcacdcbdcdadbdcbca
and let g(
(x)) =
(g(x)) for all x
, where
:
is the letter-to-letter endomorphism that performs the cyclic permutation
(a) = b,
(b) = c,
(c) = d and
(d) = a. Here | g(a) | = 98 and the Parikh vector of g(a) is
(g(a)) = (18, 28, 27, 25).
One may check that, for this g, g(w) is a-2-free for all a-2-free words w of length
6 in
, and for all a-2-free words w of length = 7, except for acadbca and acadcba and their cyclic permutations. This means that all words in the set
(1) not-a-2-free = g({acadbca, bdbacdb, cacbdac, dbdcabd})
g({acadcba, bdbadcb, cacbadc, dbdcbad})
contain abelian squares. Thus g is not an a-2-free morphism of
.
Let A = g(a), B = g(b), C = g(c), D = g(d), where g is as above. Using (1) and the fact concerning a-2-free words of length
6, we conclude that all words obtained by any cyclic permutation from the words
(2)
A
B
, where
,
,
,
are in g(
),
|-----------------|-----------------| and
(![]()
) =
(![]()
)
![]()
contain abelian squares
with starting and ending boundaries inside the underlined occurrences of words taken from {A, B, C, D} as indicated in (2).
On the other hand, it can be checked from the prefixes of the image words in g(
), by using Carpi's algorithm, that the words of the form (2) and their cyclic permutations, are the only possible subwords in any g(w) that contain abelian squares, provided that w in
is a-2-free. Moreover, the coding properties of g guarantee that if, for any i
{0, 1, 2, 3},
g(w) = P
(A
A
D
B
A) S, with P, S
, and
:s,
:s as in (2),
then w can be factorized as follows:
(3) w = p
(a
a
d
b
a) s, with P = g(p),
= g(
),
= g(
),
= g(
),
= g(
), S = g(s),
g(
a
) =
A
and g(
b
) =
B
.
Moreover, g is commutatively bijective, that is, the vectors
,
,
,
are linearly independent. This implies, together with (3) and the fact
, that
in all the factorizations of w in (3).
Define the endomorphism g
:
by the productions:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Now it follows from above that all words
(w) are a-2-free, whenever
is a-2-free and does not have, for any i, a subword of the form
(4)
(a
a
b
d
a) with
; i = 0, 1, 2, 3.
Indeed, it turns out that no subword of
(w) or
(w) can be of the form (4), for any a-2-free
. This can be checked by modifying Carpi's algorithm, see Proposition 1, for prefixes of
(
) and
(
). In order to illustrate this modified, and also the original Carpi's algorithm, that is, part (iii) in Proposition 1, we consider the following. Let | mark the boundaries of the blocks of imagewords of X , Y, Z in {A, B, C, D}. In the case of
X | Y | Z =
|
|
=
u v
; with u =
, v
, uv as an abelian square;
we have
= pref(X),
= suff(X),
= pref(Y),
= suff(Y),
= pref(Z),
= suff(Z). Thus
(5)
=
=
= ![]()
= ![]()
= ![]()
Since
=
, for an abelian square uv, Carpi's idea in Proposition 1 (iii) is to test whether the linear combinations of the Parikh vectors of prefixes, i.e.,
belong to
(the subgroup of
generated by
, where h is the morphism to be tested). In (2) we had X = A, Y = D, Z = A.
Corresponding to the search of the subword pattern a
a
b
d
a, with
, and the natural cyclic permutations in (4), we denote u = a
a
b, v =
d
a, x = a, left = a, y = b, right = d, z = a (we exclude the natural cyclic permutations for simplicity). From this, and (5), i.e., from
=
, we obtain
(6)
=
=
= ![]()
Thus, instead of testing whether
we now test whether
. When programming this test for a computer, it is best to include the first letter of u, i.e. x (above x = a) as the last letter of
to reduce the number of prefixes to be tested. Of course, it is possible to speed the computation even more, in a similar fashion, for other letters as well, if necessary.
The short image words
(w), for words
of length 2, were in turn tested with the following simple pattern matching Mathematica program which works fast enough for words up to length 300 or so. In the pattern matching part, we do not use many subpatterns, otherwise the computation time may become too long for practical purposes. Below the function s is the same as myParikh defined in Section 3.
![Clear[myPatterns] ; myPatterns[x_String] := Module[{pattString = {}}, Characters[x] /. {___, "a", w1__, "b", w2__, "a", ___} :> q /; If[s[{w1}] - "a" === s[{w2}] - "d", pattString = Append[pattString, StringJoin["a", w1, "b", w2, "a"]]] ; pattString]](HTMLFiles/NewAbelianSquare-FreeDT0L-LanguagesOver4Letters_502.gif)
Because all the tests mentioned above show that no subword of
(w) or
(w) is of the form (4), for any a-2-free
, we now draw the conclusion:
Proposition 2. All words of the form
(w), where
, ... ,
are of the form
(
) or
(
); i = 0,1,2,3; are a-2-free, provided that w is a-2-free and does not contain a subword of the form (4) in the case of
=
(
). ![]()
Thus, instead of a single endomorphism we can now iterate several endomorphisms and obtain completely new infinite a-2-free words and D0L-sequences over four letters. Note that in Proposition 2 one can choose as morphisms
all cyclic permutations of
and
but not all permutations in general.
As mentioned in Section 2, the languages of the form
(w), obtained by using compositions of morphims, are called DT0L-languages. See [32] for more details.
7 Extensive computer aided searches
We have checked very nearly all the possibilites in the size range
4
100 for those endomorphisms g of
, the words
(x), x
, of which are obtained by cyclically permutating the letters in
(a). Moreover, there is no need for any long computations anymore (in checking the remaining few chances), since all possible candidates for image words have already been constructed. The result is
almost certainly
as follows: excluding symmetry (mirror images, cyclic permutations, or renaming letters), there are no other cyclic endomorphisms of
in the size range
4
100, except
and
, that are a-2-free or generate a-2-free L-sequences.
We have also tried many other approaches in which the endomorphisms are not growing uniformly. For example, in the following cases of g of
, and in thousands other (larger) cases as well, g(w) is a-2-free for all a-2-free w
=
. However, no proper g(d) exists for any of the triplets { g(a), g(b), g(c) }:
13 13 8 15 15 9 15 15 25
a
abacabadacdbd a
abacadabdadcdbd a
abcdbcacbdcdadb
b
acdcbcacbcdbd b
abacadcbcadcdbd b
abcbdbadbdcdadb
c
abdbcdbd c
abdbcbabd c
adacadabacbcdcadabacabadb
Here we firstly fixed g(a) and g(b), and then started to search for suffix-prefix pairs (suff, pref) for g(c) and g(d). In this setting, all words of the form suff w pref, where w ∈ {g(a), g(b)}
; with n = 0,1,2,3; have to be a-2-free.
Moreover, we now know of over 90 different cases of g of
, for which g(w) is a-2-free for every a-2-free
=
(i.e. for w of length = 4) but not for all a-2-free
. Some endomorphisms (of size 4
98 and 4
100) are even a-2-free for every a-2-free
but not for all a-2-free
. Thus, we have examples of endomorphisms of
, each of which produce thousands of a-2-free words of length
600 (some words being much longer), but still fail to be useful in our search for new endomorphisms generating a-2-free L-sequences on
.
In one setting, we let g(a) and g(b) be fixed from the set {g
(a), g
(b), g
(c)}, and then checked different possibilities for the remaining g(c) and g(d)
. Here we separated two cases as follows:
Case 1. g(a) = g
(a), g(b) = g
(b), 1
g(c)
85, 1
g(d)
120 (we had to abort this run)
Case 2. g(a) = g
(a), g(b) = g
(c), 1
g(c)
90, 1
g(d)
120 (this checking was relatively fast)
Firstly, we searched through all possibilities for g(c) in the range mentioned above. In more detail, we went through all triplets { g(a), g(b), g(c) } such that g(w) is a-2-free for all a-2-free
=
(here w is of length
7), and g(w) can be extended to the right with a proper prefix of g(d) of length 20 or so. Excluding all symmetrical cases, we found thousands of different triplets. However, one can check that no proper g(d) can be found for any of them, though, in this setting, we found over 20 cases of g, for which g(w) is a-2-free for every a-2-free word
of length = 4. It is very counterintuitive fact that in Case 1 (with
), when searching for g(d), the computation time was over 10 times longer than in Case 2 (with
) ! Even though it is by no means the first time that such a complex non-linear behaviour occurs in our search for strings (timewise or spacewise, and in much shorter and carefully rechecked runs), we plan to recheck Case 1, and maybe extend it, in the near future. Nevertheless, the result file indicates that most likely all possibilities for g(d) were checked already, though the process remained active. Indeed, it seems that sometimes operating systems can cause problems in runs that may take months to complete. For the Cases 1 and 2, we used C++ on a two processor 400 MHz PC running Linux.
In conclusion, we are not very optimistic that in the near future it would be possible to find more examples of abelian square-free endomorphisms, substitutions, or L-sequences of
which would not be based on the structures of
or
.
8 Open problems (by Sami Mäkelä, University of Turku)
Problem 1. Can you avoid abelian squares of the form uv, where
, over three letters? - Computer experiments show that you can avoid these patterns at least in words of length 450.
Problem 2. Can you avoid abelian cubes of the form uvw, where
, over two letters? - You can do this at least for words of length 250.
9 Material on the Internet
We plan to place C/C++ programs and a great number of result files from our long computer runs at
in the near future. This material would provide a pool of long a-2-free strings over 4 letters for possible further investigations. As mentioned in Section 7, some of the endomorphisms are a-2-free for every a-2-free words of length up to 6.
From http://south.rotol.ramk.fi/~keranen/Forssa/UsingTheEmptySet.html (.nb) the reader may find examples of computer algebra programs in which we pay special attention to handling the empty set. Indeed, in our study, the empty set (or the empty word) is many times a natural starting point, and the meticulous use of it supports us to keep the programs clear and reliable. In many cases the computation also ends with the empty set as a result. For example, the reader may try the following Mathematica program which starts, in a sense, from {{}} and ends with {}:
![With[{n = 8, k = 3}, NestList[DeleteCases[Flatten[Map[Table[Append[#, i - 1], {i, k}] &, #], 1], {___, u__, v__} /; Sort[{u}] == Sort[{v}]] &, {{}}, n]]](HTMLFiles/NewAbelianSquare-FreeDT0L-LanguagesOver4Letters_593.gif)
Indeed, this computation shows that abelian squares are unavoidable over 3 letters, since every word of length 8 turns out to contain them. Originally, this style of programming arose in discussions with Stephen Wolfram, and our program above is a modified version of the example that Wolfram presents in [35, p.944].
Acknowledgements
We gratefully acknowledge the participation of the following individuals most of who have been students at the Rovaniemi Polytechnic over the course of this study from 1990 to 2003. These people have made a number of computer programs for searching strings with desirable properties, or helped in a crucial way to set up the computing environments. Starting year is given in parenthesis: Kari Tuovinen (1990); Minna Iivonen, Anja Keskinarkaus, Marko Manninen (1993); Abdeljalil Chabani, Tomi Laakso (1994); Mika Moilanen, Juha Särestöniemi (1996); Juho Alfthan (1999); Olli-Pentti Saira (2000); Marja Kenttä, Ville Mattila (2001); Lauri Autio, and Marianna Mölläri (2002).
Over the years we have extensively used many computers at the School of Technology, Rovaniemi Polytechnic, and quite recently also at the Art and Design Faculty of the University of Lapland. We wish to thank these institutions for allocating computational resources at our disposal.
References
[1] S.I. Adjan. Burnside groups of odd exponent and irreducible systems of group identities. In W.W. Boone et al., editors, Word Problems, 19
37. North-Holland, Amsterdam, 1973.
[2] S.V. Avgustinovich and A.E. Frid. Words avoiding abelian inclusions. To appear in J. Automata, Languages and Combinatorics. Otto-von-Guericke-Univ., Magdeburg.
[3] J. Berstel. Some recent results on square-free words. In M. Fontet and K. Melhorn, editors, Proc. STACS '84, Lecture Notes in Comp. Sci., 166:14
25. Springer-Verlag, Berlin, 1984.
[4] A. Carpi. On abelian power-free morphisms. Int. J. Algebra Comput., 3:151
167. World Sci. Publ. Company, 1993.
[5] A. Carpi. On the number of abelian square-free words on four letters. Discrete Appl. Math., 81:155
167. Elsevier, 1998.
[6] A. Carpi. On abelian squares and substitutions. Theor. Comp. Sci., 218:61
81. Elsevier, 1999.
[7] R. Cori and M.R. Formisano. Partially abelian square-free words. RAIRO Inform. Théor. et Appl., 24:509
520, 1990.
[8] M. Crochemore. Sharp characterizations of square-free morphisms. Theoret. Comp. Sci., 18:221
226, 1982.
[9] J. Currie. Open problems in pattern avoidance. American Mathematical Monthly, 100:790
793, 1993.
[10] F.M. Dekking. Strongly non-repetitive sequences and progression-free sets. J. Combin. Theory Ser. A, 27:181
185, 1979.
[11] V. Dickert. Research topics in the theory of free partially commutative monoids. Bull. Europ. Assoc. Theoret. Comp. Sci., 40:479
491, 1990.
[12] R.C. Entringer and D.E. Jackson and J.A. Schatz. On non-repetitive sequences. J. Combin. Theory Ser. A, 16:159
164, 1974.
[13] P. Erdös. Some unsolved problems. Magyar Tud. Akad. Mat. Kutató Int. Közl., 6:221
254, 1961.
[14] A.E. Frid. On the frequency of factors in a DOL word. J. Automata, Languages and Combinatorics, 3(1):29
41. Otto-von-Guericke-Univ., Magdeburg, 1998.
[15] G.A. Hedlund. Remarks on the work of Axel Thue on sequences. Nordisk Mat. Tidskr., 15:148
150, 1967.
[16] J. Justin and G. Pirillo and S. Varricchio. Unavoidable regularities and finiteness conditions for semigroups. In A. Bertoni and C. Bohm and P. Miglioli, editors, Proc. 3rd Italian Conf. on Theoret. Comp. Sci. '89, 350
355. World Scientific, Singapore, 1989.
[17] V. Keränen. On the k-Freeness of Morphisms on Free Monoids. Annales Academiæ Scientiarum Fennicæ, Ser. A. I. Mathematica Dissertationes, 61, 55 pages. Finnish Science Academy, 1986.
[18] V. Keränen. On the k-freeness of morphisms on free monoids. In F.J. Brandenburg and G. Vidal-Naquet and M. Wirsing, editors, Proc. STACS '87, Lecture Notes in Comp. Sci., 247:180
188. Springer-Verlag, Berlin, 1987.
[19] V. Keränen. Abelian squares are avoidable on 4 letters. In W. Kuich, editor, Proc. ICALP '92, Lecture Notes in Comp. Sci., 623:41
52. Springer-Verlag, Berlin, 1992.
[20] V. Keränen. Mathematica in research of avoidable patterns in strings, In V. Keränen and P.Mitic, editors, Mathematics with Vision, Proc. First International Mathematica Symposium (IMS '95, Southampton, England), 259
266. Computational Mechanics Publications, 1995.
[21] V. Keränen. The avoidability of regularities in strings (in Finnish). Arkhimedes, 47(1):71
79, 1995.
[22] V. Keränen. On abelian repetition-free words. In V. Demidov, editor, Proc. IAS '96, 8
14. Murmansk State Pedagogical Institute, Murmansk, 1996.
[23] V. Keränen. Repetition-free strings and computer algebra (in Finnish). In C. Gefwert and P. Orponen and J. Seppänen, editors, Logic, Mathematics and the Computer, Proc. Finnish Artificial Intelligence Society, 14:250
257. Hakapaino, Helsinki, 1996.
[24] T. Laakso. Musical rendering of an infinite repetition-free string. In C. Gefwert and P. Orponen and J. Seppänen, editors, Logic, Mathematics and the Computer, Proc. Finnish Artificial Intelligence Society, 14:292
297. Hakapaino, Helsinki, 1996.
[25] M. Leconte. A characterization of power-free morphisms. Theoret. Comp. Sci., 38:117
122, 1985.
[26] M. Leconte. Kth power-free codes. In M. Nivat and D. Perrin, editors, Proc. Automata on Infinite Words '84, Lecture Notes in Comp. Sci., 192:172
187. Springer-Verlag, Berlin, 1985.
[27] M. Lothaire. Combinatorics on Words. Addison-Wesley, Reading, Massachusetts, 1983.
[28] V. Mattila. Creating Permutation Repetition-Free Words and Testing Permutation Repetition-Freeness of Morphisms (in Finnish). B.Sc. Thesis. Rovaniemi Polytechnic, 2002.
[29] S. Mäkelä. Patterns in Words (in Finnish). M.Sc. Thesis. Univ. Turku, 2002.
[30] G. Pirillo and S. Varricchio. On uniformly repetitive semigroups. Semigroup Forum, 49:125
129. Springer-Verlag, New York, 1994.
[31] P.A.B. Pleasants. Non-repetitive sequences, Proc. Cambridge Phil. Soc., 68:267
274, 1970.
[32] G. Rozenberg and A. Salomaa. The Mathematical Theory of L-systems. Academic Press, New York, London, Toronto, Sydney, San Fransisco, 1980.
[33] A. Salomaa. Jewels of Formal Language Theory. Computer Science Press, Rockville, Maryland, 1981.
[34] A. Thue. Über unendliche Zeichenreihe. Norske Vid. Selsk. Skr. I. Mat. Nat. Kl. Christiania, 7:1
22, 1906.
[35] S. Wolfram. A New Kind of Science. Wolfram Media, 2002.
Converted by Mathematica (June 15, 2003)