## Merging lists in place


### Merging lists in place

I'm told that there is a stable way to merge two sorted lists in place that runs in time better than O(n²), but I haven't been able to find or derive anything about it. Can anyone help?
makes a complete sentence when preceded by itself in quotes

I am often scatterbrained and therefore make frequent use of the edit button. If you're replying within a few minutes of my post, there's a strong chance it changed between your clicking "reply" and "submit".

LucasBrown

Posts: 227
Joined: Thu Apr 15, 2010 2:57 am UTC
Location: Right behind you

### Re: Merging lists in place

You can't merge two lists in place, unless you're splitting the final list across the two input lists. Merging two lists necessarily creates a new list, of size equal to the sum of the input sizes.

Stable merging of two lists in linear time is easy, though. Just set up two list pointers, each pointing to the start of one of the inputs. Every iteration, check which of the two pointers has a smaller value. Copy that value into the result and move that pointer. Repeat until one of the lists is empty, at which point you can just append the entries in the remaining list to the end of the output list. That's a single linear pass through each list.
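In code, that two-pointer loop might look something like this (a C++ sketch; the `merge_sorted` name and the use of int vectors are mine, purely for illustration):

```cpp
#include <cstddef>
#include <vector>

// Stable linear-time merge of two sorted vectors into a new vector, as
// described above: each iteration, copy whichever pointer sees the
// smaller value. Ties take from the first list, preserving stability.
std::vector<int> merge_sorted(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (b[j] < a[i])               // strict <, so equal keys come from a first
            out.push_back(b[j++]);
        else
            out.push_back(a[i++]);
    }
    // One list is exhausted; append the remainder of the other.
    while (i < a.size()) out.push_back(a[i++]);
    while (j < b.size()) out.push_back(b[j++]);
    return out;
}
```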
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))

Xanthir
My HERO!!!

Posts: 4139
Joined: Tue Feb 20, 2007 12:49 am UTC

### Re: Merging lists in place

Xanthir wrote:You can't merge two lists in place, unless you're splitting the final list across the two input lists. Merging two lists necessarily creates a new list, of size equal to the sum of the input sizes.

Stable merging of two lists in linear time is easy, though. Just set up two list pointers, each pointing to the start of one of the inputs. Every iteration, check which of the two pointers has a smaller value. Copy that value into the result and move that pointer. Repeat until one of the lists is empty, at which point you can just append the entries in the remaining list to the end of the output list. That's a single linear pass through each list.

Well now, it depends on how your lists are represented. If by "list" you mean contiguous block of memory, like an array in C, then this is correct. However, if you mean a Lispy (linked) list, then it's quite easy to merge two lists. Just change where all the cdrs (links) point in a similar fashion to your description and you've just merged two sorted lists in place in O(n) time.
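A sketch of that relinking merge, assuming a hypothetical bare-bones Node type (not any standard library type):

```cpp
// Merge two sorted singly linked lists in place: O(n) time, O(1) extra
// space. No nodes are allocated; only the next pointers are rewired.
// Ties take from list a first, so the merge is stable.
struct Node {
    int value;
    Node* next;
};

Node* merge_in_place(Node* a, Node* b) {
    Node dummy{0, nullptr};      // stack-allocated head placeholder
    Node* tail = &dummy;
    while (a && b) {
        if (b->value < a->value) { tail->next = b; b = b->next; }
        else                     { tail->next = a; a = a->next; }
        tail = tail->next;
    }
    tail->next = a ? a : b;      // splice on whichever list remains
    return dummy.next;
}
```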

LucasBrown: could you specify how your lists are represented?

scarecrovv

Posts: 621
Joined: Wed Jul 30, 2008 4:09 pm UTC
Location: Massachusetts

### Re: Merging lists in place

My program isn't really merging two lists... it's all in one contiguous block of memory, and my program is written in such a way that the first half might as well be a completely separate list from the second half. My question is this: is there a way to rearrange the items in the list in a stable, sub-quadratic manner that uses an amount of additional space independent of the size of the lists?

P.S. The only reason I'm posting this is because the piece of hardware my program's on has minuscule memory and the data I'm working with almost fills the memory--I would gladly do the copy-into-new-list thing if it were possible on my hardware.
makes a complete sentence when preceded by itself in quotes

I am often scatterbrained and therefore make frequent use of the edit button. If you're replying within a few minutes of my post, there's a strong chance it changed between your clicking "reply" and "submit".

LucasBrown

Posts: 227
Joined: Thu Apr 15, 2010 2:57 am UTC
Location: Right behind you

### Re: Merging lists in place

LucasBrown wrote:My program isn't really merging two lists... it's all in one contiguous block of memory, and my program is written in such a way that the first half might as well be a completely separate list from the second half. My question is this: is there a way to rearrange the items in the list in a stable, sub-quadratic manner that uses an amount of additional space independent of the size of the lists?

P.S. The only reason I'm posting this is because the piece of hardware my program's on has minuscule memory and the data I'm working with almost fills the memory--I would gladly do the copy-into-new-list thing if it were possible on my hardware.

I think it might be doable.

Consider that you have two adjacent lists, each sorted least-to-most, as in your example. If we don't mind moving items between the lists, I think we can merge, and it should have good runtime IF your lists are of similar magnitude. Keep a pointer to the head of each list. As in a normal merge, choose the smaller of the two heads to put onto the output list. If it comes from list 1 (the one the output list is growing into), great, just advance that pointer. Otherwise, swap the item of list 1 that the output displaces into the spot in list 2 from which you took the item.

Since this could break the sorting invariant of list 2, swap the displaced item down until it is in the correct place. This algorithm has the potential for quadratic runtime, but if your lists are similar it should only require a few swaps per iteration.

I can't think of a better way.
The same as the old Meteorswarm, now with fewer posts!

Meteorswarm

Posts: 980
Joined: Sun Dec 27, 2009 12:28 am UTC
Location: Ithaca, NY

### Re: Merging lists in place

If you've got two sorted arrays in a row, you could "merge" them in-place using heapsort. O(n log n).
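With the STL heap algorithms that's only a couple of lines (a sketch; note that heapsort is not stable):

```cpp
#include <algorithm>
#include <vector>

// "Merge" two adjacent sorted runs by heapsorting the whole block in
// place: O(n log n) time, O(1) extra space. The existing order is
// simply ignored, and equal elements may be reordered (not stable).
void heap_merge(std::vector<int>& v) {
    std::make_heap(v.begin(), v.end());  // O(n) heapify
    std::sort_heap(v.begin(), v.end());  // O(n log n) extraction
}
```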
Goplat

Posts: 490
Joined: Sun Mar 04, 2007 11:41 pm UTC

### Re: Merging lists in place

You can get O(n) time with O(n) memory using a merge, or O(n log n) time with O(1) memory by sorting the whole thing. Or, if you switch to linked lists, you can get O(n) time with O(1) memory using a merge.
It is practically impossible to teach good programming to students who are motivated by money: As potential programmers they are mentally mutilated beyond hope of regeneration.

Berengal
Superabacus Mystic of the First Rank

Posts: 2707
Joined: Thu May 24, 2007 5:51 am UTC
Location: Bergen, Norway

### Re: Merging lists in place

Berengal wrote:You can get O(n) time with O(n) memory using a merge, or O(n log n) time with O(1) memory by sorting the whole thing. Or, if you switch to linked lists, you can get O(n) time with O(1) memory using a merge.

The problem is that in this case he has strictly n+m memory. Converting to linked lists would take something more than that, depending on how many bits he uses for addressing, and conventional merging we've already talked about.

If this merge is the result of you trying to do mergesort, a better option is to just perform heapsort or quicksort and enjoy the better memory usage.
The same as the old Meteorswarm, now with fewer posts!

Meteorswarm

Posts: 980
Joined: Sun Dec 27, 2009 12:28 am UTC
Location: Ithaca, NY

### Re: Merging lists in place

Most of you seem to forget that LucasBrown needs a stable sort. Maybe in that case insertion sort is the best option. It takes O(n²) comparisons and swaps, but it has low overhead so it will generally perform better than strictly in-place merge sort.

There is a more efficient way to use "in-place" merge sort with additional storage, though. Assuming that you have a memory region that consists of two serially aligned subarrays that are to be merged, you can copy just the first subarray to another region. After that, you merge the copy of the first subarray with the original second subarray into the original memory region.
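That half-copy merge might be sketched like so (C++; the `half_copy_merge` name and int-vector representation are illustrative assumptions):

```cpp
#include <cstddef>
#include <vector>

// Stably merge two adjacent sorted runs v[0..mid) and v[mid..n),
// copying only the first run to a temporary buffer (about half the
// extra space of a full merge). Output is written back over v from the
// front; since out <= j always holds, it never overruns the unread
// part of the second run.
void half_copy_merge(std::vector<int>& v, std::size_t mid) {
    std::vector<int> left(v.begin(), v.begin() + mid);  // copy of first run
    std::size_t i = 0, j = mid, out = 0;
    while (i < left.size() && j < v.size()) {
        if (v[j] < left[i]) v[out++] = v[j++];
        else                v[out++] = left[i++];  // ties from the left run: stable
    }
    while (i < left.size()) v[out++] = left[i++];
    // Any remaining elements of the second run are already in place.
}
```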

So it really depends on your possibilities. I don't know of any way to merge in-place with only constant additional storage in better than O(n²) time. If you can really only stand constant additional storage, go for insertion sort. If you can afford to copy the first array, take the in-place merge sort with "modest linear additional storage".

If you think the sort doesn't need to be stable after all, I agree with the above posters that options like quicksort and heapsort are better. If speed is very important and you have time to write twice as much code, or if you're coincidentally using C++, you might be interested in introsort. If you're sorting data with few unique keys you might be interested in 3-way quicksort.
Feel free to call me Julian. J+ is just an abbreviation.
coding and xkcd combined

Jplus

Posts: 1293
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: classified

### Re: Merging lists in place

You can also mimic a stable sort (assuming that you're trying to use it to have things be sub-sorted by other keys) by having your comparison function be sophisticated enough to consider all the keys simultaneously.
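For example, assuming a hypothetical record with a primary and a secondary key, an unstable std::sort with a compound comparator reproduces the order a stable two-pass sort would give:

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Record {          // hypothetical record with two sort keys
    std::string name;    // secondary key (the pre-existing order)
    int age;             // primary key
};

// Instead of stably sorting by age after sorting by name, sort once
// with a comparator that considers both keys simultaneously.
void sort_by_age_then_name(std::vector<Record>& v) {
    std::sort(v.begin(), v.end(), [](const Record& a, const Record& b) {
        if (a.age != b.age) return a.age < b.age;
        return a.name < b.name;  // tie-break on the secondary key
    });
}
```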
The same as the old Meteorswarm, now with fewer posts!

Meteorswarm

Posts: 980
Joined: Sun Dec 27, 2009 12:28 am UTC
Location: Ithaca, NY

### Re: Merging lists in place

Meteorswarm wrote:You can also mimic a stable sort (assuming that you're trying to use it to have things be sub-sorted by other keys) by having your comparison function be sophisticated enough to consider all the keys simultaneously.

Certainly. You just have to watch that the comparisons are fast enough, or your n log n unstable sort gets slower than a stable n² algorithm. Of course, for large data sets n log n will beat n², but I get the feeling that these lists aren't that big.

I suspect that we need more info to discover the best solution here. Why does the program generate 2 lists, why not merge them as they're being created? Are the 2 lists roughly of equal size, and roughly how big are they? How many secondary keys are there, and how hard are they to compare?

PM 2Ring

Posts: 2960
Joined: Mon Jan 26, 2009 3:19 pm UTC
Location: Mid north coast, NSW, Australia

### Re: Merging lists in place

I stumbled across another option: quicksort with a stable partitioning algorithm. The C++ STL provides a stable partitioning algorithm. Their implementation is adaptive: if it can allocate some temporary storage it runs in O(n), just like the normal unstable partition of quicksort, and otherwise it runs in O(n log n) in place. So a quicksort built upon stable_partition would run in O(n log² n) if it has to be in-place, which is a lot better than the O(n²) of insertion sort or in-place merge sort. If you're not using C++ you can still try to peek at their code here (search for "stable_partition").
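A sketch of such a quicksort built on std::stable_partition (the pivot selection and the three-way split are my own illustrative choices, not from any particular library):

```cpp
#include <algorithm>
#include <vector>

// Quicksort whose partition step is std::stable_partition, which keeps
// equal elements in their original relative order; the whole sort is
// therefore stable. stable_partition is O(n) with a buffer, O(n log n)
// without, giving O(n log^2 n) worst-case extra-space-free behavior.
void stable_quicksort(std::vector<int>::iterator first,
                      std::vector<int>::iterator last) {
    if (last - first <= 1) return;
    int pivot = *(first + (last - first) / 2);
    // Partition into <pivot, ==pivot, >pivot, each group keeping its
    // original relative order. The ==pivot group is never empty, so
    // both recursive ranges strictly shrink.
    auto mid1 = std::stable_partition(first, last,
                                      [&](int x) { return x < pivot; });
    auto mid2 = std::stable_partition(mid1, last,
                                      [&](int x) { return !(pivot < x); });
    stable_quicksort(first, mid1);
    stable_quicksort(mid2, last);
}
```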
Feel free to call me Julian. J+ is just an abbreviation.
coding and xkcd combined

Jplus

Posts: 1293
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: classified

### Re: Merging lists in place

Right, but remember that for any given size of inputs, the constant factors in our analysis may be more important than the actual big O terms, so really the best thing to do would be to try various methods and time them all.
The same as the old Meteorswarm, now with fewer posts!

Meteorswarm

Posts: 980
Joined: Sun Dec 27, 2009 12:28 am UTC
Location: Ithaca, NY

### Re: Merging lists in place

Meteorswarm wrote:Right, but remember that for any given size of inputs, the constant factors in our analysis may be more important than the actual big O terms, so really the best thing to do would be to try various methods and time them all.

I've always felt that this isn't emphasized enough. In my CS classes it seemed like all we cared about was the big O of the algorithm but a lot of our practice type situations weren't very large so we were actually taking more time by using a more sophisticated algorithm.
double epsilon = -.0000001;

Dason

Posts: 1274
Joined: Wed Dec 02, 2009 7:06 am UTC
Location: ~/

### Re: Merging lists in place

Dason wrote:
Meteorswarm wrote:Right, but remember that for any given size of inputs, the constant factors in our analysis may be more important than the actual big O terms, so really the best thing to do would be to try various methods and time them all.

I've always felt that this isn't emphasized enough. In my CS classes it seemed like all we cared about was the big O of the algorithm but a lot of our practice type situations weren't very large so we were actually taking more time by using a more sophisticated algorithm.

For pedagogical reasons, it's useful because it's analytically simpler to talk about the big-O complexity than about all the little vagaries that go into computing the actual run time (how does it behave with a cache of various sizes, for example?). There are real-world applications where it matters, too. If you're Google, you probably don't care a whole lot about those constants when you're sure that your n is so huge they're tiny by comparison.
The same as the old Meteorswarm, now with fewer posts!

Meteorswarm

Posts: 980
Joined: Sun Dec 27, 2009 12:28 am UTC
Location: Ithaca, NY

### Re: Merging lists in place

Oh definitely. It's A LOT easier to just talk about the large n case. I just wish they would have made a slight effort to say something like "Hey guys, you know sometimes those constants matter so you gotta think about if that additional overhead makes sense in your particular problem or not". I mean hopefully everybody realizes these kinds of things... but with some of the kids I was taking those classes with I highly doubt they would think about these issues.
double epsilon = -.0000001;

Dason

Posts: 1274
Joined: Wed Dec 02, 2009 7:06 am UTC
Location: ~/

### Re: Merging lists in place

You don't really have to worry. Most people don't pay attention to data structures and algorithms in the first place, so ignoring constants is the least of your problems - they're still doing 100k linear searches over large linked lists rather than using hash tables.
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))

Xanthir
My HERO!!!

Posts: 4139
Joined: Tue Feb 20, 2007 12:49 am UTC

### Re: Merging lists in place

Xanthir wrote:You don't really have to worry. Most people don't pay attention to data structures and algorithms in the first place, so ignoring constants is the least of your problems - they're still doing 100k linear searches over large linked lists rather than using hash tables.

This.

My university keeps buying flashy software that doesn't scale at all, because the test demos all have 100 students. Asymptotic analysis can sometimes mean you're doing overkill on small problems, but it will keep you from doing something dumb later. (Hopefully).
Agent_Irons

Posts: 213
Joined: Wed Sep 10, 2008 3:54 am UTC

### Re: Merging lists in place

Agent_Irons wrote:
Xanthir wrote:You don't really have to worry. Most people don't pay attention to data structures and algorithms in the first place, so ignoring constants is the least of your problems - they're still doing 100k linear searches over large linked lists rather than using hash tables.

This.

My university keeps buying flashy software that doesn't scale at all, because the test demos all have 100 students. Asymptotic analysis can sometimes mean you're doing overkill on small problems, but it will keep you from doing something dumb later. (Hopefully).

It's all about matching your problem to your constraints. If you just need a one-off analysis of 10,000 data items, who cares if it's O(n^4) if you can run it overnight. If you're writing something that scales to Google's data, you'd better optimize the heck out of it.
The same as the old Meteorswarm, now with fewer posts!

Meteorswarm

Posts: 980
Joined: Sun Dec 27, 2009 12:28 am UTC
Location: Ithaca, NY

### Re: Merging lists in place

I recommend one of the in-place versions of quicksort, such as the following. If the call stack on this gets too deep, then you may be stuck using bubble sort.

Code: Select all
void quicksort(ref int[] b, int begin, int end) {
   if (begin >= end) {
      return;
   }
   // If every element in the range is equal, there is nothing to do.
   bool homogenous = true;
   for (int i = begin + 1; i <= end; i++) {
      if (b[i] != b[begin]) {
         homogenous = false;
         break;
      }
   }
   if (homogenous) {
      return;
   }
   int pi = end;   // pivot index
   int ci = begin; // scan index
   // Skip the run of elements equal to the pivot at the end.
   while (b[pi - 1] == b[pi]) {
      pi--;
   }
   while (ci != pi) {
      if (pi > ci) {
         // Scanning rightwards: elements greater than the pivot belong
         // on its right, so exchange them with the pivot.
         if (b[ci] > b[pi]) {   // was b[ci] < b[pi], which sorted descending
            int tb = b[ci]; b[ci] = b[pi]; b[pi] = tb;
            int ti = ci; ci = pi; pi = ti;
         } else {
            ci++;
         }
      } else {
         // Scanning leftwards: elements smaller than the pivot belong
         // on its left.
         if (b[ci] < b[pi]) {   // was b[ci] > b[pi]
            int tb = b[ci]; b[ci] = b[pi]; b[pi] = tb;
            int ti = ci; ci = pi; pi = ti;
         } else {
            ci--;
         }
      }
   }
   quicksort(ref b, begin, pi - 1);
   quicksort(ref b, pi + 1, end);
}
quick-dudley

Posts: 1
Joined: Mon Mar 28, 2011 10:49 am UTC

### Re: Merging lists in place

Jplus wrote:I stumbled across another option: quicksort with a stable partitioning algorithm. The C++ STL provides a stable partitioning algorithm. Their implementation is adaptive; if it can allocate some temporary storage it can run in O(n) just like the normal unstable partition of normal quicksort, and otherwise it can run O(n log n) in-place. So a quicksort built upon stable_partition would run O(n log n log n) if it has to be in-place, which is a lot better than the O(n2) of insertion sort or in-place merge sort. If you're not using C++ you can still try and peek at their code here (search for "stable_partition").

It might not solve the OP's problem, but I can't help pointing out that the C++ STL has std::inplace_merge(). It is guaranteed to be stable, though not to be O(n): library implementors can choose O(n) with a buffer, or O(n log n) without. If you have C++, it might be worth looking at the implementation provided. There's a lot of handy stuff like that in <algorithm> that most C++ coders are unaware of.
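Usage is essentially a one-liner; given one block of memory whose two halves are each sorted (the wrapper function here is just for illustration):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// std::inplace_merge merges two consecutive sorted ranges within one
// container -- exactly the "two sorted halves of one block" situation
// in this thread. With enough temporary memory it does n-1 comparisons;
// otherwise it falls back to an O(n log n) in-place strategy.
void merge_adjacent_runs(std::vector<int>& v, std::size_t mid) {
    std::inplace_merge(v.begin(), v.begin() + mid, v.end());
}
```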
lgw

Posts: 272
Joined: Mon Apr 12, 2010 10:52 pm UTC

### Re: Merging lists in place

It is possible to merge contiguous chunks of memory (input) in-place in O(n) time. You can read about it here: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.8523
dhruvbird

Posts: 9
Joined: Fri Feb 24, 2012 12:59 am UTC

### Re: Merging lists in place

dhruvbird wrote:It is possible to merge contiguous chunks of memory (input) in-place in O(n) time. You can read about it here: http://citeseerx.ist.psu.edu/viewdoc/su ... .1.22.8523

That paper discusses a merge operation that takes O(n log n) operations, just like the algorithm mentioned by lgw. I do think that std::inplace_merge is the solution to the OP's problem.
Feel free to call me Julian. J+ is just an abbreviation.
coding and xkcd combined

Jplus

Posts: 1293
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: classified

### Re: Merging lists in place

Jplus wrote:
dhruvbird wrote:It is possible to merge contiguous chunks of memory (input) in-place in O(n) time. You can read about it here: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.8523

That paper discusses a merge operation that takes O(n log n) operations, just like the algorithm mentioned by lgw. I do think that std::inplace_merge is the solution to the OP's problem.

dhruvbird

Posts: 9
Joined: Fri Feb 24, 2012 12:59 am UTC

### Re: Merging lists in place

They specify "a fixed amount of additional space", which is the same thing std::inplace_merge requires.
All Shadow priest spells that deal Fire damage now appear green.
Big freaky cereal boxes of death.

WarDaft

Posts: 1553
Joined: Thu Jul 30, 2009 3:16 pm UTC

### Re: Merging lists in place

dhruvbird is right: this merge algorithm takes linear time while std::inplace_merge takes log-linear time. However there is a catch: this algorithm isn't stable!

The authors do claim to have a sorting algorithm (but not a merge algorithm) that is stable, in-place and log-linear though... I'm going to look at the paper they refer to, as I'm rather sceptical about that possibility.
Feel free to call me Julian. J+ is just an abbreviation.
coding and xkcd combined

Jplus

Posts: 1293
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: classified

### Re: Merging lists in place

Keep track of 4 lists:
1: source1
2: source2
3: circle -- a circular buffer living near the front of source2 that stores the elements displaced from source1 when we want to copy from source2
4: dest (which eats up the front of source1)

While source1.front() <= source2.front(), we just shove things into dest. Easy.

When source2.front() is smaller than the front of source1 or circle, we pop the front off source2, give that space to circle, stuff the front of source1 into circle, and thereby create space for dest. If circle's front is smaller than or equal to the front of source2, we take source1's head and put it in circle, while taking circle's head out and putting it on the tail of dest.

When we run out of source1, we now have the circle buffer and source2 that we need to merge in-place. If circle is either sane enough, or can be made sane with a sufficiently cheap number of operations, we could argue from recursion here.

Hard part: Keeping circle sane. It is popping elements off, and getting room and new elements inserted, in a pretty insane way.

What properties does circle have to have to make this easy?
1) We want to be able to remove the front element from circle, and stop using that memory slot, while keeping it in a valid state. (this is for the 2nd merge pass)
2) We want to be able to insert a new element, higher than every element in the circle, and end up using 1 more unit of memory at the far end of the circle. (when source2.front() is smaller than circle.front(), we take source1.front() and stuff it into circle, and place source2.front() where it came from, and from there to dest).
3) We want to be able to remove the front element of the circle, and add in a new element that is larger than any element in circle, without changing our memory footprint at all. (when circle.front() is less than source2.front(), we want to take source1.front(), put it in circle while circle.front() goes to source1.front(), and from there to dest).

And we need to do 1, 2 and 3 in a memory-stable way.

---

The reason you often care more about O notation than practical tests is that practical tests on small n are easy to do, and are done all the time. Programs often break down when they work well on "typical" cases, but blow up when fed "extreme" cases. By paying attention to O notation, and making things fast and workable on small cases, you get code that is not always optimal for typical cases, but doesn't break down as often.

And it is really common that an algorithm originally intended to handle X sized cases ends up being used for Z sized cases 5-10 years down the road, as the power of computers makes the other bottlenecks melt away. (Why optimize for more than 10 textures, nobody has that much ram!)

Then again, the code that lives is the code that sucked just badly enough to still work in many cases.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

Yakk
Poster with most posts but no title.

Posts: 10209
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

### Re: Merging lists in place

Jplus wrote:dhruvbird is right: this merge algorithm takes linear time while std::inplace_merge takes log-linear time. However there is a catch: this algorithm isn't stable!

The authors do claim to have a sorting algorithm (but not a merge algorithm) that is stable, in-place and log-linear though... I'm going to look at the paper they refer to, as I'm rather sceptical about that possibility.

The earlier link was to a merge sort idea that does in-place sorting in O(n log n) time. It is not entirely clear to me whether that in-place merge sort is stable in nature.

However, it feels as if the in-place merge can be made stable with a bit of tweaking.

The in-place merge is actually relatively straightforward. I'll try to explain a simpler version here.

1. Given an array of size 2n, sort the two parts so that each is as close in size to the other as possible and a multiple of 2n/log(2n) (it works even if this is not true). Basically, we want a close-to-equal split; as long as the parts are a constant fraction of each other, we are fine.
2. Chunk it up into blocks of size 2n/log(2n). You have log(2n) such chunks.
3. Now, move the last 2n/log(2n) elements of the total sorted sequence to the beginning of the buffer.
4. This can be done by first finding the indexes of the boundary elements in each of the last chunks of size 2n/log(2n) (there are only 2 such chunks). This takes O(n/log n) time.
5. Now, using triple reversal, move these elements to the beginning of the list. This takes O(n) time.
6. Use the same merging technique as shown in the paper. You will be left with an unsorted buffer of size 2n/log(2n) towards the end of the list.
7. Use any in-place O(n log n) sorting algorithm to sort this buffer. This takes only O(n) time, since the buffer is of size O(n/log n).
8. We are done!
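The triple reversal in step 5 is the classic rotate-by-three-reversals trick; a sketch (std::rotate does the same job directly):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Move the last k elements of v to the front (a rotation) by reversing
// each part and then the whole range: three linear passes, O(1) extra
// space, and both parts keep their internal order.
void rotate_last_to_front(std::vector<int>& v, std::size_t k) {
    std::reverse(v.begin(), v.end() - k);  // reverse the leading part
    std::reverse(v.end() - k, v.end());    // reverse the trailing k
    std::reverse(v.begin(), v.end());      // reverse everything
}
```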
dhruvbird

Posts: 9
Joined: Fri Feb 24, 2012 12:59 am UTC

### Re: Merging lists in place

Yes, the merge in isolation is linear and the sorting algorithm overall is in-place, O(n log n) and unstable. They state that very clearly in section 5. In section 5 they also claim to have found a non-merging sorting algorithm that is in-place, O(n log n) and stable, but that one is discussed in another paper. They do consider the latter algorithm to be related to the former.

To be honest, I found the explanation from the paper easier to digest than yours. In fact, you seem to describe a different algorithm, since the paper chopped the array into O(sqrt(n)) blocks while you're chopping it into O(log n) blocks.

@Yakk: it appears that you're explaining a different merge algorithm. Is it a stable one? Where did you get it from? Was it supposed to be an answer to a question, and if so, to what question?
Feel free to call me Julian. J+ is just an abbreviation.
coding and xkcd combined

Jplus

Posts: 1293
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: classified

### Re: Merging lists in place

I was trying to write a sorting algorithm from my head.

You walk over the first and second list in parallel, with the sorted elements being stored in the part of the first list you have already iterated over. The hard part is storing the excess elements in the first half of the second list whenever we take an element from the second list.

So I abstracted out that problem, and described what properties we'd need for the "container" stored there (which I called circle, because the naive non-working solution is a circular buffer) in order for the problem to be solved using this naive method. Probably won't work, because those are some strange requirements. On the other hand, so long as we can do those operations in lg(n) time, we are good, so that gives some space to solve the problem...

Hmm. I might be able to argue that, if we can turn our structure stably into a list in something nice like O(n) or O(n lg n) time, requirement (1) can be done away with, because we only switch from needing (2) and (3) to needing (1) lg(n) times?
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

Yakk
Poster with most posts but no title.

Posts: 10209
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

### Re: Merging lists in place

Hm. Seems like this algorithm is either going to take additional storage or lots of rotations in the circle. Interesting idea though.

Semi-related: I'm going to look up that other algorithm right now.

EDIT: So I dived headlong into it, and I found out that it is a merging sort after all (so I was wrong about that). Again there is a catch: the algorithm only works if no key appears more than approximately sqrt(n) times in the array. It's basically a faster version of an algorithm that was already published in 1977 by Luis Trabb Pardo and which has more relaxed requirements on the key distribution. The older algorithm however still requires that the array has at least about sqrt(n) distinct keys in order to keep its log-linear promise, and even when that works out it has a high constant factor.

Links to the papers (neither should require any special subscriptions):
Huang and Langston 1992
Trabb Pardo 1977
Feel free to call me Julian. J+ is just an abbreviation.
coding and xkcd combined

Jplus

Posts: 1293
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: classified

### Re: Merging lists in place

Jplus wrote:Yes, the merge in isolation is linear and the sorting algorithm overall is in-place, O(n log n) and unstable. They state that very clearly in section 5. In section 5 they also claim to have found a non-merging sorting algorithm that is in-place, O(n log n) and stable, but that one is discussed in another paper. They do consider the latter algorithm to be related to the former.

Jplus wrote:To be honest, I found the explanation from the paper easier to digest than yours. In fact, you seem to describe a different algorithm since the paper chopped the array in O(sqrt(n)) blocks while you're chopping it in O(log n) blocks.

The algorithm I describe is similar to the one in the paper; just the choice of partitions is different. I'm trying to drive home the point that any partitioning scheme that partitions into blocks of size between sqrt(n) and n/log n will work.

The reason is that you get partitions that are between log n and sqrt(n) in number, both of which can be selected in O(n) time. Further, you can use any O(n log n) sort for the blocks and get away with an O(n) overall running time.
dhruvbird

Posts: 9
Joined: Fri Feb 24, 2012 12:59 am UTC

### Re: Merging lists in place

Jplus wrote:Links to the papers (neither should require any special subscriptions):
Huang and Langston 1992: http://comjnl.oxfordjournals.org/content/35/6/643
Trabb Pardo 1977: ftp://db.stanford.edu/pub/public_html/cstr.old/reports/cs/tr/74/470/CS-TR-74-470.pdf

Thanks for the links! I shall be reading these over the week. The first one is not free for all though.
dhruvbird

Posts: 9
Joined: Fri Feb 24, 2012 12:59 am UTC

### Re: Merging lists in place

There's a very conceptually simple O(n·(log n)^0.5) version.

Break the lists into chunks of size sqrt(log n) and sort the chunks by their largest element. Then perform a basic insertion sort starting from the end. No piece has to move more than 2·sqrt(log n) pieces (even adversarially, I'm pretty sure, but certainly on average), as there are only two monotonically increasing lists to choose blocks from. Any given item cannot need to go back past items of the list it itself came from, and from the other list you cannot have more than one chunk to go past, because the chunks are sorted by their largest element. I'm not sure if it's stable.
All Shadow priest spells that deal Fire damage now appear green.
Big freaky cereal boxes of death.

WarDaft

Posts: 1553
Joined: Thu Jul 30, 2009 3:16 pm UTC

### Re: Merging lists in place

Jplus wrote:EDIT: So I dived headlong into it, and I found out that it is a merging sort after all (so I was wrong about that). Again there is a catch: the algorithm only works if no key appears more than approximately sqrt(n) times in the array. It's basically a faster version of an algorithm that was already published in 1977 by Luis Trabb Pardo and which has more relaxed requirements on the key distribution. The older algorithm however still requires that the array has at least about sqrt(n) distinct keys in order to keep its log-linear promise, and even when that works out it has a high constant factor.

Links to the papers (neither should require any special subscriptions):
Huang and Langston 1992: http://comjnl.oxfordjournals.org/content/35/6/643
Trabb Pardo 1977: ftp://db.stanford.edu/pub/public_html/cstr.old/reports/cs/tr/74/470/CS-TR-74-470.pdf

Actually, there is one more catch. The elements of the buffer get randomly permuted, and that is harder to deal with than the \sqrt{n} restriction on the number of unique elements. I haven't been able to understand how Langston ensures that a final stable sort of the buffer restores overall stability; i.e., how the buffer is manipulated throughout so that its elements can be put back stably. It seems hard at first blush.
dhruvbird

Posts: 9
Joined: Fri Feb 24, 2012 12:59 am UTC

### Re: Merging lists in place

The trick is in the composition of the buffer. All keys in the buffer are distinct, and each of the keys was the first of its rank (or the last, depending on which algorithm we're talking about). So when you sort the buffer in the end you don't need to worry about the relative order of the keys within, and when you merge the sorted buffer back into the rest of the array you know that ties should be broken in favour of the buffer (or the other way round, depending on which algorithm we're talking about).

So this is not really a catch, it's just a relatively complex way to achieve something relatively simple.
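The two ingredients described above can be sketched in Python (the names and details are my own illustration, not the papers' actual in-place routines): a single O(n) pass over a sorted list that picks the first occurrence of each distinct key, and a merge that breaks ties in favour of the buffer so that stability is preserved.

```python
def pick_buffer_indices(arr, want):
    """One O(n) scan over a sorted list: indices of the first
    occurrence of each distinct value, up to `want` of them."""
    picks = []
    for i, x in enumerate(arr):
        if not picks or arr[picks[-1]] != x:
            picks.append(i)
            if len(picks) == want:
                break
    return picks

def merge_buffer_back(buffer, rest):
    """Stable merge of the (sorted) buffer into the rest of the
    array. Each buffer key was the first of its rank, so ties are
    broken in favour of the buffer."""
    out, i, j = [], 0, 0
    while i < len(buffer) and j < len(rest):
        if buffer[i] <= rest[j]:          # <= : buffer wins ties
            out.append(buffer[i]); i += 1
        else:
            out.append(rest[j]); j += 1
    return out + buffer[i:] + rest[j:]
```

Because every buffer key is unique, sorting the buffer at the end cannot reorder equal keys, and the tie-breaking rule in the merge puts each buffer key back in front of its duplicates.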
Feel free to call me Julian. J+ is just an abbreviation.
coding and xkcd combined

Jplus

Posts: 1293
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: classified

### Re: Merging lists in place

Jplus wrote:The trick is in the composition of the buffer. All keys in the buffer are distinct, and each of the keys was the first of its rank (or the last, depending on which algorithm we're talking about). So when you sort the buffer in the end you don't need to worry about the relative order of the keys within, and when you merge the sorted buffer back into the rest of the array you know that ties should be broken in favour of the buffer (or the other way round, depending on which algorithm we're talking about).

So this is not really a catch, it's just a relatively complex way to achieve something relatively simple.

So I'm not able to understand: how can you ensure that the buffer contains distinct keys if all the keys are the same?

Even if they aren't, what strategy does the paper suggest in case (say) the largest, 2nd largest, 3rd largest, etc. keys are each repeated 3 times? Let's assume that the buffer holds the last key of its rank for the purpose of this discussion.
dhruvbird

Posts: 9
Joined: Fri Feb 24, 2012 12:59 am UTC

### Re: Merging lists in place

Well, the paper doesn't suggest any strategy for the case where all keys have the same value. That's why there is a lower bound of sqrt(n) on the number of distinct key values. There is no way to make it work for fewer distinct key values, AFAIK. That is the catch I was talking about.

I know, it makes these algorithms of rather limited use.
Feel free to call me Julian. J+ is just an abbreviation.
coding and xkcd combined

Jplus

Posts: 1293
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: classified

### Re: Merging lists in place

Jplus wrote:Well, the paper doesn't suggest any strategy for the case where all keys have the same value. That's why there is a lower bound of sqrt(n) on the number of distinct key values. There is no way to make it work for fewer distinct key values, AFAIK. That is the catch I was talking about.

It seems that the paper does mention (in section 4.4: Other Relevant Details):
In the event that we exhaust the first sublist without filling the internal buffer, we must employ fewer but larger blocks. Specifically, if we obtain only s < \sqrt{n} buffer elements, then we use s blocks, each of size at most \left\lceil \dfrac{n}{\sqrt{s}} \right\rceil.

This makes me think that any arbitrary input is supported by this scheme. Do you think so?
dhruvbird

Posts: 9
Joined: Fri Feb 24, 2012 12:59 am UTC

### Re: Merging lists in place

Oh right, I forgot about that. Yes, you're right that any arbitrary input is supported, but note that the algorithms become less efficient as the block size differs more from sqrt(n). If you deviate much from sqrt(n), the log-log-linear in-place merge sort will be more efficient.

As an aside: I notice that you're quite fond of jsMath, but others do not. AFAICT you don't really need it here.
Feel free to call me Julian. J+ is just an abbreviation.
coding and xkcd combined

Jplus

Posts: 1293
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: classified

Next