3 Methods for Sorting Problems in Algorithms and Data Structure Classes
3 Methods for Sorting Problems in Algorithms and Data Structure Classes
Join the DZone community and get the full member experience.
Join For FreeBuilt by the engineers behind Netezza and the technology behind Amazon Redshift, AnzoGraph^{™} is a native, Massively Parallel Processing (MPP) distributed Graph OLAP (GOLAP) database that executes queries more than 100x faster than other vendors.
#1: Sorting huge files
Sorting big files might not be as simple as just implementing an sort algorithm. As soon as the file does not fit in memory any more smarter implementations have to be applied. One way is to sort the file on the hard disk. We remark that not every algorithm is easily adopted for this kind of task. So your task for the exercise is to decide what kind of alogrithms are good to solve the problem and what approach to handle huge files could be taken?
 Discuss what kind of operations are efficient while retrieving / processing data from the hard disk
 Discuss what kind of operations are needed in the different algorithms
 Create a table to display the results and choose the most apropriate algorithm.
One possible way of implementing this would be to split the file in smaller files which can be sorted in memory and then use a bottom up merge function to merge all those files.
In order to do so you can sort this Snapshot of all wikipedia revisions taken from the German wikipedia 2011. The file is uncompressed 3.1 gigabyte in size and consists of 128 million rows. In particular it already contains a partial order.
#2: Finding the k smallest element in an unsorted List
Your task is to find the k smallest element from an unsorted list. (thanks to Robert Sedgwick for inspiration!)
Obviously one approach would be to sort all the data and then retrieve the kth element. The runtime of this approach would be O(n log(n)) though. We want to achieve this in linear runtime which is possible due to the help of the partition function of quicksort.
After calling the partition function the unordered list is split in two sublists with lenght i and ni. The first list contains the first i elements (not neccessarily sorted). comparing i to k tells you weather to search in the first or second sublist for the element.
 Use this idea to implement findMinK(ArrayList <Integer> array, int k, int l, int r)
 Test the runtime of your implementation against the primitive approach of first sorting. In order to test you can just download the testframe work code below. In your function you should increase the global variable cmpcnt every time the partition function swaps elements in the list
 Write down the recursive equation of your solution and solve it in order to prove that the average case runtime is also theoretically linear.
 Compare this runtime behaviour to quicksort (next exercise) and explain why these approaches are in different complexity classes
import java.util.ArrayList; import java.util.Collections; import java.util.Random; public class mink { static public int cmpcnt = 0; public static void main(String[] args) { testFramework(); } public static int findMinK(ArrayList<Integer> array, int k, int l, int r) { // Implement here } public static int findMinK(ArrayList<Integer> array, int k){ Collections.sort(array); return array.get(k); } private static void testFramework() { ArrayList<Integer> a = new ArrayList<Integer>(); for (int j=2;j<8;j++){ a.clear(); for (int i=0;i<(int)Math.pow(10, j);i++){ a.add(i); } System.out.println("nn"+a.size()+" Elementsnn"); double slow=0; double fast=0; for (int i = 0; i < 10; i++) { cmpcnt = 0; Collections.shuffle(a); int k = (int)(Math.random()*(Math.pow(10, j)1))+1; System.out.println("test run number: " + i + " find: " + k); long start = System.currentTimeMillis(); findMinK(a, k, 0, a.size()1); long end = System.currentTimeMillis(); long smarttime=(endstart); fast = fast + smarttime; System.out.println("SMART ALGO t  time in ms: " + smarttime + " comparisons: " + cmpcnt); start = System.currentTimeMillis(); findMinK(a, k); end = System.currentTimeMillis(); long slowtime = (endstart); System.out.println("WITH SORTING t  time in ms: " + slowtime); System.out.println("sorting is " +(double)slowtime/(double)smarttime + " times slower"); slow = slow + slowtime; } System.out.println("sorting (="+slow+"ms) is " +slow/fast + " times slower than smart algo (="+fast+"ms)"); } } }
#3: Solving recursive equations: Proving runtime of Quicksort
Quicksort is a probabilistic algorithm and its recursive equation is given by the implicit equation

Explain the meaning of the sum in this equation and its connection to stochastics.
Solve the equation. In order to do so. you can use the following equivalences and solve the recursive equation by substitution.
If you like this post, you might like these related posts:
 My Blog guesses your name – Binary Search Exercise for Algorithms and data structures class Binary Search http://en.wikipedia.org/wiki/Binary_search_algorithm is a very basic algorithm in computer...
 balanced binary search trees exercise for algorithms and data structures class I created some exercises regarding binary search trees. This time...
 how to move wordpress directory: Problems with upload_path wpcontent/uploads Recently I was forced to move a wordpress site to...
 Data structure for Social news streams on Graph data bases UPDATE: look at http://www.renepickhardt.de/graphity for a more scientific survey and...
 About Internet Start ups The web already brought up some really cool internet products...
Download AnzoGraph now and find out for yourself why it is acknowledged as the most complete allinone data warehouse for BI style and graph analytics.
Published at DZone with permission of René Pickhardt , DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
{{ parent.title  parent.header.title}}
{{ parent.tldr }}
{{ parent.linkDescription }}
{{ parent.urlSource.name }}