What is the smallest string containing all permutations of a set of n elements for n=14?
"What is the least number of Haruhi episodes that you would have to watch in order to see the original 14 episodes in every order possible?"
Or, more formally, "what is the shortest string containing all permutations of a set of n elements?"
The Problem
- You have an n episode TV series. You want to watch the episodes in every order possible. What is the least number of episodes that you would have to watch?
- Overlapping is allowed. For example, in the case of n=2, watching episode 1, then 2, then 1 again, would fit the criteria.
- The orders must be contiguous. For example, (1,2,1,3) does NOT contain the sequence (1,2,3).
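The rules above amount to a simple membership check; a minimal Python sketch (the function name is ours, not from the thread):

```python
from itertools import permutations

def is_superperm(s, n):
    """True if every ordering of episodes 1..n appears as a contiguous
    substring of s. (Single-digit labels, so this sketch assumes n <= 9.)"""
    symbols = "".join(str(i) for i in range(1, n + 1))
    return all("".join(p) in s for p in permutations(symbols))

# n = 2: watching episode 1, then 2, then 1 again covers both orders
print(is_superperm("121", 2))   # True
# (1,2,1,3) does NOT contain the order (1,2,3) contiguously
print(is_superperm("1213", 3))  # False
```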
Current Collaboration
Algorithm and Bounds
Credit: Anonymous (/sci/)
Four permutations are contained within each n-cycle block (the walkthrough below uses n = 4). So the goal of this algorithm is to generate the sequence systematically, and to show that any other method would be less efficient. In the long run, I'd like to create some axioms/theorems for the proof, such as "more overlap = more efficiency". I think generalizing this to all n will call for some modular arithmetic, but I'm not sure how to do so.
*by follow, I mean use it as a "rule" to tell you the next number in the sequence*
Start with 1 and follow (1 2 3 4) 6 times. We do this by convention. 1,[2,3,4,1,2,3]
Follow (1 4 2 3) 5 times. This group must be chosen, as (23) is in it. 1,2,3,4,1,2,3,[1,4,2,3,1]
Follow (1 2 4 3) 5 times. This is the last group with (3 1) in it. 1,2,3,4,1,2,3,1,4,2,3,1,[2,4,3,1,2]
Since no remaining groups have (1 2) in them, we have to choose one:
A): Following (1 4 3 2) 6 times would end the sequence wrongly: 1,2,3,4,1,2,3,1,4,2,3,1,2,4,3,1,2,[1,4,3,2,1,4]. No group left has (1 4) in it, so this option loses efficiency.
B): Following (1 3 2 4) 6 times would end the sequence wrongly: 1,2,3,4,1,2,3,1,4,2,3,1,2,4,3,1,2,[1,3,2,4,3,2]. No group left has (3 2) in it, so this option loses efficiency.
C): Following (1 3 4 2) 6 times works: 1,2,3,4,1,2,3,1,4,2,3,1,2,4,3,1,2,[1,3,4,2,1,3]
Follow (1 3 2 4) 5 times. This is the last group with (1 3) in it. 1,2,3,4,1,2,3,1,4,2,3,1,2,4,3,1,2,1,3,4,2,1,3,[2,4,1,3,2]
Follow the last group (1 4 3 2) 5 times to complete the sequence: 1,2,3,4,1,2,3,1,4,2,3,1,2,4,3,1,2,1,3,4,2,1,3,2,4,1,3,2,1,4,3,2,1
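The final 33-symbol sequence from the walkthrough can be checked mechanically; a quick Python sanity check (variable names are ours):

```python
from itertools import permutations

# The sequence produced by the walkthrough above, written without commas
seq = "123412314231243121342132413214321"
# Every window of length 4, compared against all 24 permutations of 1234
windows = {seq[i:i + 4] for i in range(len(seq) - 3)}
all_perms = {"".join(p) for p in permutations("1234")}
print(len(seq), all_perms <= windows)  # 33 True
```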
The Lower Bound
I think I have a proof of the lower bound n! + (n-1)! + (n-2)! + n - 3. I'll need to do this in multiple posts. Please look it over for any loopholes I might have missed. As in other posts, let n (lowercase) = the number of symbols; there are n! permutations to iterate through. The obvious lower bound is n! + n - 1. We can obtain this as follows:
Let:
- L = the running length of the string
- P = the number of permutations visited
When you write down the first permutation, L - P is already n - 1. For each new permutation you visit, the length of the string must increase by at least 1. So L - P can never decrease. At the end, P = n!, giving us L >= n! + n - 1.
I'll use similar methods to go further, but first I'll need to explain my terminology...
Edges
I'm picturing the ways to get from one permutation to the next as a directed graph where the nodes correspond to permutations and the edges to ways to get from one to the next. A k-edge is an edge in which you move k symbols from the beginning of the permutation to the end; for example, 12345 -> 45321 would be a 3-edge. Note that I don't include edges like 12345 -> 34512, in which you pass through a permutation in the middle (in this case 23451). This example would be considered two edges: 12345 -> 23451 -> 34512.
From every node there is exactly one 1-edge, e.g. 12345 -> 23451. These take you around in a cycle of length n. A 2-edge moves the first two symbols to the end. A priori, it could either reverse or maintain the order of those two symbols: 12345 -> 34521 or 12345 -> 34512. But the second, as already stated, is not counted as a 2-edge because it is a composition of two 1-edges. So there is exactly one 2-edge from every node.
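The edge rule fits in a one-liner; `k_edge` is a name we introduce here, and the reversed-pair convention matches the 2-edge described above (for k >= 3 other orderings also give valid edges, so the reversal is just the canonical choice used in the loop constructions later on this page):

```python
def k_edge(perm, k):
    """Follow the k-edge: move the first k symbols to the end, reversed.
    (Moving them in order would just be k consecutive 1-edges.)"""
    return perm[k:] + perm[:k][::-1]

print(k_edge("12345", 1))  # 23451 -- the unique 1-edge
print(k_edge("12345", 2))  # 34521 -- the unique 2-edge
```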
1-loops
I call the set of n permutations connected by a cyclic path of 1-edges a 1-loop. There are (n-1)! 1-loops.
The concept of 1-loops is enough to get the next easiest lower bound of n! + (n-1)! + n - 2. That's because to pass from one 1-loop to another, it is necessary to take a 2-edge or higher. Let us define:
- C = the number of 1-loops completed or that we are currently in
The definition of C is a bit more complicated than we need for this proof, but we'll need it later. You might ask, isn't C just one more than the number of completed 1-loops? No! When we have just completed a 1-loop, C is equal to the number of completed 1-loops. In order to increment C, we have to take a 2-edge, which increases L by 2 instead of 1. Therefore L - P - C can never decrease. Since L - P - C starts out with the value n - 2, and we have to complete all (n-1)! 1-loops, we get the lower bound n! + (n-1)! + n - 2.
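A quick way to convince yourself there are (n-1)! 1-loops is to enumerate them for a small n; a minimal sketch (helper name is ours):

```python
from itertools import permutations

def one_loop(perm):
    """Collect the 1-loop through perm by following 1-edges until we return."""
    loop, p = [], perm
    while True:
        loop.append(p)
        p = p[1:] + p[0]          # 1-edge: rotate one symbol to the end
        if p == perm:
            return frozenset(loop)

loops = {one_loop("".join(p)) for p in permutations("1234")}
print(len(loops))  # 6 distinct 1-loops for n = 4, each of size n
```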
2-loops
Suppose we enter a 1-loop, iterate through all n nodes (as is done in the greedy palindrome algorithm), and then take a 2-edge out. The edge we exit by is determined by the entry point. The permutation that the 2-edge takes us to is determined by taking the entry point and rotating the first n-1 characters, e.g.:
12345 is taken by n-1 1-edges to 51234 which is taken by a 2-edge to 23415
If we repeat this process, it takes us around in a larger loop passing through n(n-1) permutations. I call this greater loop a 2-loop.
The greedy palindrome algorithm uses ever-larger loops; it connects (n-k+1) k-loops via (k+1)-edges to make (k+1)-loops. But I haven't been able to prove anything about these larger loops yet.
The tricky thing about 2-loops is that which 2-loop you're in depends on the point at which you entered the current 1-loop. Each of the n possible entry points to a 1-loop gives you a different 2-loop, so there are n*(n-2)! 2-loops, which overlap.
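The 2-loop construction (iterate through a whole 1-loop, exit by a 2-edge, repeat) can be traced directly; this sketch assumes the reversed-pair convention for 2-edges described earlier, and the function name is ours:

```python
def two_loop(entry, n):
    """Walk the 2-loop determined by entering a 1-loop at `entry`:
    visit all n permutations of the 1-loop via 1-edges, exit by the
    2-edge (first two symbols moved to the end, reversed), repeat."""
    seen, p = [], entry
    while True:
        for _ in range(n - 1):
            seen.append(p)
            p = p[1:] + p[0]         # 1-edge
        seen.append(p)               # exit point of this 1-loop
        p = p[2:] + p[1] + p[0]      # 2-edge out
        if p == entry:
            return seen

loop = two_loop("12345", 5)
print(len(loop), len(set(loop)))  # 20 20 -- n(n-1) distinct permutations
```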
Final proof
And now for the proof of the n! + (n-1)! + (n-2)! + n - 3 lower bound...
To review:
- n = alphabet length
- L = running string length
- P = number of permutations visited
- C = number of 1-loops completed or that we are currently in
In order to increase C, you must jump to a new 1-loop -- having completed the one you are leaving. That means the next permutation Q' in the 1-loop (following your exit point Q) is one you have already visited. Either you have at some point entered the 1-loop at Q', or this is the second or greater time you've visited Q. If you have ever entered the 1-loop at Q', leaving at Q by a 2-edge will not take you to a new 2-loop; you will be in the same 2-loop you were in when you entered at Q'. So these are the available ways to enter a 2-loop you've never been in before:
- take a 3-edge or higher
- take a 2-edge but don't increase C
- take a 2-edge from a permutation Q that you were visiting for the second or greater time
In the first two cases, L - P - C increases by 1 in the step under consideration. In the third case, L - P must have increased by 1 in the previous step. Because of this third case, it is convenient to regard any series of edge traversals that takes you through permutations you've already visited as a single step. Then if we define
- D = the number of 2-loops visited
the quantity L - P - C - D does not decrease in any step. Since 2-loops are n(n-1) permutations long, you must visit at least (n-2)! of them. L - P - C - D is initially n - 3, giving us the lower bound n! + (n-1)! + (n-2)! + n - 3. -Written by Anonymous
Loops
Each node in a sequence is part of one loop per shift k; for N = 4 these are the k = 1, k = 2 and k = 3 loops, based on the shift per iteration. For example, if N is 4, then:
k = 1: 0, 1, 2, 3 -> 1, 2, 3, 0 -> 2, 3, 0, 1 -> 3, 0, 1, 2 -> 0, 1, 2, 3 (4 elements in total)
k = 2: 0, 1, 2, 3 -> 2, 3, 1, 0 -> 1, 0, 3, 2 -> 3, 2, 0, 1 -> 0, 1, 2, 3 (4 elements in total)
k = 3: 0, 1, 2, 3 -> 3, 2, 1, 0 -> 0, 1, 2, 3 (2 elements in total)
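The loop lengths can be computed for any n and k by iterating the shift-and-reverse map until it returns to the start; a small sketch (function name is ours):

```python
def loop_size(n, k):
    """Number of nodes in the k-loop: iterate 'move the first k symbols
    to the end, reversed' until we return to the starting arrangement."""
    start = tuple(range(n))
    p, size = start, 0
    while True:
        p = p[k:] + p[:k][::-1]
        size += 1
        if p == start:
            return size

print([loop_size(4, k) for k in (1, 2, 3)])  # [4, 4, 2]
```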
The 0, 1, 2, 3 node is used for simplicity; the same steps can be performed over any node in the sequence. In fact, you can calculate the total number of elements in each loop for any n following the diagram below:
[diagram with the first 30 values not preserved]
Such a nearly random distribution explains the differences in sequence length.
Hardmode
Define H(n) as the number of shortest sequences. Here are a couple of known results:
H(1) = 1
H(2) = 2
H(3) = 6
H(4) = 24
H(5) >= 960 (deviates from the fact(n) pattern; there are at least 8 sequences per initial permutation)
H(6) = good luck
Sequence building
Every new permutation (node) you append to the sequence increases its total size. When building 'optimal' sequences, it is important to append each node as efficiently as possible, adding as few extra digits as possible. Let's explore how this works with examples:
Optimal sequence for n = 3:
0 1 2 0 1 0 2 1 0
+1, +1, +2, +1, +1
0 1 2, 1 2 0 + 1 (Reuses 2 digits)
1 2 0, 2 0 1 + 1 (Reuses 2 digits)
2 0 1, 1 0 2 + 2 (Reuses 1 digit)
1 0 2, 0 2 1 + 1 (Reuses 2 digits)
0 2 1, 2 1 0 + 1 (Reuses 2 digits)
And if we list all of the cases for added digits:
+1: 4
+2: 1
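The '+k' costs can be recovered from any sequence by locating the permutation windows and measuring the gaps between them; a minimal sketch (helper name is ours):

```python
def overlap_costs(seq, n):
    """Recover the '+k' values: for each permutation window after the
    first, how many extra digits were appended to reach it."""
    hits = [i for i in range(len(seq) - n + 1)
            if len(set(seq[i:i + n])) == n]   # windows that are permutations
    return [b - a for a, b in zip(hits, hits[1:])]

print(overlap_costs("012010210", 3))  # [1, 1, 2, 1, 1]
```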
Something interesting happens at n = 5, because there are no longer fact(n) shortest sequences. Combinations like these start to emerge:
0 1 2 3 4 0 1 2 3 0 4 1 2 3 0 1 4 2 3 0 1 2 4 3 0 1 2 0 3 4 1 2 0 3 1 4 2 0 3 1 2 4 0 3 1 2 0 4 3 1 0 2 4 1 3 0 2 4 1 0 3 2 4 1 0 2 3 4 1 0 2 4 3 1 0 4 2 3 1 0 4 3 2 1 0 4 3 1 2 0 1 3 4 2 1 0 3 4 2 1 3 0 4 2 1 3 4 0 2 1 4 3 0 2 1 4 0 3 2 1 4 0 2 3 1 4 0 2 1 3 4 2 0 1 3 2 4 0 1 3 2 0 4 1 3 2 0 1 4 3 2 0 1
+1, +1, +1, +1, +2, +1, +1, +1, +1, +2, +1, +1, +1, +1, +2, +1, +1, +1, +1, +3, +1, +1, +1, +1, +2, +1, +1, +1, +1, +2, +1, +1, +1, +1, +2, +1, +1, +2, +1, +2, +1, +1, +1, +1, +2, +1, +1, +1, +1, +2, +1, +1, +1, +1, +2, +1, +1, +2, +1, +1, +1, +1, +2, +1, +1, +1, +1, +2, +1, +3, +1, +2, +1, +1, +1, +1, +2, +1, +1, +1, +1, +2, +1, +1, +2, +1, +1, +1, +1, +2, +1, +1, +1, +1, +2, +1, +1, +1, +1, +2, +1, +2, +1, +1, +2, +1, +1, +1, +1, +2, +1, +1, +1, +1, +2, +1, +1, +1, +1
+1: 92 (regular one has 96)
+2: 25 (regular one has 18)
+3: 2 (regular one has 4)
+4: 0 (regular one has 1)
As you can see, this one doesn't have a single appended node that requires 4 extra digits when combining. Unfortunately, this doesn't save any digits for n = 5. The current shortest sequence for n = 6 produces the following distribution:
+1: 575
+2: 141
+3: 3
As you can see, a pattern is starting to emerge; for instance, no appended node uses 4 extra digits. Here is the distribution of coefficients for the shortest known sequence for n = 7:
+1: 4196
+2: 826
+3: 17
Defining the number of +1 terms as x, +2 terms as y and +3 terms as z, you get the following equations:
x + y + z = n! - 1 (excluding the beginning node)
y + z >= (n - 1)! - 1 (since you have to change the k = 1 loop every n iterations)
x <= n! - (n - 1)!
If you want to build a sequence with 'a' elements, you also get:
n + x + y * 2 + z * 3 = a
You can calculate the lower bound for a:
n + n! + (n - 1)! - 2
which indicates the minimal size for a sequence.
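Plugging in the distribution reported above for n = 5 (x = 92, y = 25, z = 2) checks all three relations and recovers the 153-digit length:

```python
from math import factorial

# Distribution reported above for the shortest n = 5 sequence
n, x, y, z = 5, 92, 25, 2

assert x + y + z == factorial(n) - 1             # excluding the beginning node
assert y + z >= factorial(n - 1) - 1             # forced k = 1 loop changes
assert x <= factorial(n) - factorial(n - 1)

length = n + x + 2 * y + 3 * z                   # total sequence size a
lower = n + factorial(n) + factorial(n - 1) - 2  # minimal possible a
print(length, lower)  # 153 147
```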
Transforming the problem and finding the solutions
Initial optimizations
By using the knowledge above, we can limit our search to just the first 3 nodes which can be appended to the sequence. This drops the algorithm complexity to just O(9 ^ n!) from O(n! ^ n!).
Further improvements
Another cool observation is that we don't just add all possible nodes to our sequence; the useful ones are extremely specific:
(Example) node: 0123
3 digit overlap: 1230
2 digit overlap: 2310
1 digit overlap: 3210
All other nodes are just not used. This is also true for any n. This means that we just have to rotate our number instead of trying all of the possible nodes. This optimization drops complexity to just O(3 ^ n!), which is a massive improvement.
New scheme
You can now represent any sequence by just the chosen nodes, each from 0 to 2. Let's rewrite the n = 3 sequence provided in the example above in this scheme:
Combination: 0 1 2 0 1 0 2 1 0
Representation: 012 0 0 1 0 0
0 means the chosen node adds 0 + 1 extra digits
1 means the chosen node adds 1 + 1 extra digits
2 means the chosen node adds 2 + 1 extra digits
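The scheme is enough to reconstruct the full sequence: each chosen digit c tells you to take the (c+1)-edge, i.e. rotate and append c + 1 new digits. A minimal decoder sketch (function name is ours):

```python
def decode(start, choices):
    """Rebuild a sequence from the compact scheme: for each chosen digit c,
    follow the (c+1)-edge (rotate with reversal) and append the c+1 new
    digits it contributes to the string."""
    seq, p = start, start
    for c in choices:
        k = c + 1
        p = p[k:] + p[:k][::-1]   # the only useful node with this overlap
        seq += p[-k:]             # the k freshly appended digits
    return seq

print(decode("012", [0, 0, 1, 0, 0]))  # 012010210
```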
Symmetry
By inspecting all of the n = 5 combinations which can be written in this scheme (not including 2 lazy ones), it appears that there are exactly 6 distinct combinations. All of the other ones can be formed by following the decoding procedure described above. Here is the picture of them:
[picture of the 6 distinct combinations not preserved]
By carefully inspecting them we can easily see the centers of symmetry, which are colored in blue. It also looks like they are parts of a bigger sequence if you connect them. The fact that they are all symmetric allows us to further refine the algorithm.
Skipping shortest combinations
To adhere to the symmetry, the shortest nodes sometimes have to be skipped. For instance, when n = 5, there are exactly 4 times you have to skip the shortest node. Here is the picture for the center-symmetrical sequences, which highlights the skipped nodes:
[picture of the center-symmetrical sequences not preserved]
<1> means that the shortest node was skipped
>1< means that the skipped node was used.
You can further improve the search algorithm by picking the initial and end nodes and trying to interconnect them, though this eliminates most of the combinations, which is probably irrelevant if you're only looking for the shortest one.
It looks like k = 1 loops aren't shared between the beginning and end nodes, though there isn't any obvious pattern for k = 2 loops. Since there is always an even number of elements, the end and front nodes would meet by following the k = 2 or k = 3 loop.
This is, however, worse than just bruteforcing nodes, since you have to guess the end node for your initial one. But since they have to meet in the middle, you can reverse the problem and pick the middle nodes instead. This way you are guaranteed to have 2 correct nodes and you simply have to follow them back.
It's pretty funny to note that if you pick the end node by following the k = 3 loop one step back from the initial one, you are already picking the correct node. However, for n > 5 you should also try picking the k = 2 loop, just in case. If a symmetrical shortest sequence exists, this strategy cuts your execution time exponentially.
Sequence graphs for different Ns
[sequence graph images for several values of n not preserved]
Resources and Information
- http://pastebin.com/aNwANugC - Python algorithm by Anonymous
- http://www.notatt.com/permutations.pdf - proof of the upper bound
- https://warosu.org/sci/thread/3751105 - source thread
