Quicksort For SQR

Posted by Steven Alexander | Feb 8, 2009 | Algorithms | 13 |

We can’t always output information in the proper order by using the DBMS to sort the incoming data. Imagine a corporate roster showing who reports to whom. That’s going to require multiple selects of PS_EMPLOYEES. We could write the data to a scratch table and select it again in order. We could write it to a disk file, use a sort utility, and then read the file.

At moments like this, we face the greatest mystery of our time: why doesn’t SQR have a sort command? My theory is that SQ Software, Sybase, MITI/Sqribe, Brio, Hyperion, and Oracle never added a sort command for our own good, because programming sorts builds character.

(Seriously, does anyone know why there is no sort command in SQR?)

Quicksort is algorithm invented by C.A.R. Hoare in 1962*. It runs fast and small. It’s not optimal in every situation, but it’s usually a good choice.

The challenge for SQR programmers is that Quicksort is recursive and SQR is not. Quicksort’s basic approach is to move the “lesser” data to the first half of the list and the “greater” data to the second half of the list. Next, it calls itself twice, first to repeat the process on the first half of the list, then to repeat the process on the second half of the list.

I searched the Internet for “nonrecursive quicksort,” and found a Basic program by James Haley, dated August 28, 1985. I rewrote it for SQR and generalized it to handle any array. We can put the generic quicksort procedure in an sqc include file and write specific supporting procedures (save_pivot, compare_rows, and swap_rows) for each program.

Here is the generic quicksort procedure. I used the “local” option so that quicksort’s variables wouldn’t interfere with those variables in the main program.

(History quiz – does anyone know why #i and #j are used so often for loop counters?)

begin-procedure quicksort local create-array name=ranges size=32 field=from:integer=0 field=thru:integer=0

clear-array name=ranges move 0 to #range_num put #first_row #last_row into ranges(#range_num)

while #range_num >= 0 get #from #thru from ranges(#range_num) subtract 1 from #range_num if #from < #thru do save_pivot let #i = #from let #j = #thru + 1 while #i < #j move '<' to $relationship while $relationship = '<' and #i < #thru add 1 to #i do compare_rows(#i, $relationship) end-while move '>' to $relationship while $relationship = '>' subtract 1 from #j do compare_rows(#j, $relationship) end-while if #i < #j do swap_rows(#i, #j) end-if end-while do swap_rows(#from, #j) add 1 to #range_num let ranges.from(#range_num) = #from let ranges.thru(#range_num) = #j - 1 add 1 to #range_num let ranges.from(#range_num) = #j + 1 let ranges.thru(#range_num) = #thru end-if end-while end-procedure quicksort

Each SQR program that uses quicksort needs to have its own versions of the save_pivot, compare_rows, and swap_rows procedures. For this example, I imagine a program that needs to sort three times. It uses the $sort variable to control evaluate commands so that quicksort will act on three different arrays (array, employee, and dept) and sort employee in two different ways (“by salary” and “by name”).

Note that save_pivot does not have local variables; it uses global variable $sort to set global variables #value2, or #salary2, or $last_name2 and $first_name2, or department $name2.

begin-procedure save_pivot evaluate $sort when = 'array' let #value2 = array.field(#from) when = 'employee by salary' let #salary2 = employee.salary(#from) when = 'employee by name' get $last_name2 $first_name2 from employee(#from) last_name first_name when = 'dept by name' let $name2 = dept.name(#from) end-evaluate end-procedure save_pivot

Note that compare_rows uses local variables, so it has to refer to $sort and the variables from save_pivot with underscores. It compares to rows from an array and returns the symbols for less than, equal to, or greater than in the $relationship variable. The value of $relationship tells quicksort whether #row1 should come before or after the pivot in the array. If we want to sort in descending order, we would reverse the rules. We can sort by multiple fields (see the employee by name example), with some field ascending and some fields descending.

begin-procedure compare_rows(#row1, :$relationship) evaluate $_sort when = 'array' let #value1 = array.field(#row1) evaluate #value1 when < #_value2 let $relationship = '<' when = #_value2 let $relationship = '=' when > #_value2 let $relationship = '>' end-evaluate when = 'employee by salary' let #salary1 = employee.salary(#row1) evaluate #salary1 when < #_salary2 let $relationship = '<' when = #_salary2 let $relationship = '=' when > #_salary2 let $relationship = '>' end-evaluate when = 'employee by name' get $last_name1 $first_name1 from employee(#row1) last_name first_name evaluate $last_name1 when < $_last_name2 let $relationship = '<' when = $_last_name2 evaluate $first_name1 when < $_first_name2 let $relationship = '<' when = $_first_name2 let $relationship = '=' when > $_first_name2 let $relationship = '>' end-evaluate when > $_last_name2 let $relationship = '>' end-evaluate when = 'dept by name' let $name1 = dept.name(#row1) evaluate $name1 when < $_name2 let $relationship = '<' when = $_name2 let $relationship = '=' when > $_name2 let $relationship = '>' end-evaluate end-evaluate end-procedure compare_rows

Swapping rows is easy when we’re dealing with just one array and just one field. For the general case, we need something this elaborate. Note that we can use the same swapping code for “employee by salary” and “employee by name.”

begin-procedure swap_rows(#row1, #row2) evaluate $_sort when = 'array' let #value1 = array.field(#row1) let #value2 = array.field(#row2) let array.field(#row2) = #value1 let array.field(#row1) = #value2 when = 'employee by salary' when = 'employee by name' get $emplid1 $last_name1 $first_name1 #salary1 from employee(#row1) get $emplid2 $last_name2 $first_name2 #salary2 from employee(#row2) put $emplid1 $last_name1 $first_name1 #salary1 into employee(#row2) put $emplid2 $last_name2 $first_name2 #salary2 into employee(#row1) when = 'dept by name' get $setid1 $deptid1 $name1 from dept(#row1) get $setid2 $deptid2 $name2 from dept(#row2) put $setid1 $deptid1 $name1 into dept(#row2) put $setid2 $deptid2 $name2 into dept(#row1) end-evaluate end-procedure swap_rows

* March 2, 2009 addendum: C.A.R. Hoare is Sir Charles Antony Richard Hoare, a Britsh Computer Scientist. Not only did he invent Quicksort, he invented null pointers in 1965. He now calls null pointers his billion dollar mistake because of programmers like me who have written countless bugs based on null pointers. Thankfully, I haven’t dereferenced a pointer since I started programming in SQR.

Looking Ahead

Sometimes we sort data for use within the program, not just for output. Looking for data in an unsorted array requires a slow, linear search. Looking for data in a sorted array can use a fast, binary search. Next week we’ll look at binary searches and binary/linear searches. (A binary/linear search is my name for searching through sorted, effective-dated data.)

Brain Teaser

Here’s the warm-up: there are two variables, #x and #y. We need to set #min to the lesser of the two and #max to the greater of the two. Avoiding if commands, what are the formulae for #min and #max?

Here’s the challenge: there are three variables, #x, #y, and #z. What is the simplest code to set the values of #min, #mid, and #max?

About The Author

Steven Alexander

13 Comments

Bob Josephson on February 9, 2009 at 5:06 am

To answer your history quiz question (“does anyone know why #i and #j are used so often for loop counters?”): yes, someone knows.

(And one of those someones would be me.)

(But what letter would you use if you had more than 6 nested loops? You can’t use ‘o’ as a loop variable…)

(And what would ancient Roman FORTRAN programmers do if they had more than 5 nested loops?)

By the way, I notice that as soon as you start talking about recursion and the presence or absence of sorts in a language, you stop keeping score between SQR and C. How conveeeenient!
administrator on February 9, 2009 at 11:04 am

You seem to know the answer, so why don’t you explain it for the readers who aren’t as old as you are? If you do, I’ll tell you how Fortran II (yes, it was identified with roman numerals) supported more than six nested loops – for the sake of the readers who aren’t as old as I am.

I stopped keeping score between SQR and C because I demonstrated that recursion is completely unnecessary. Next week, I will debunk pointers (smirk).
Bob Josephson on February 9, 2009 at 12:41 pm

OK: in the mid-1950s (before I was born – just for the record) FORTRAN defined variables as integers if their names started with an I, J, K, L, M, or N. Variables starting with other letters were real numbers.

So ‘I’ was just the first and shortest legal choice for an integer loop counter.

Now, I don’t have encyclopedic knowledge of features introduced in each version of FORTRAN, but at some point it allowed programmers to explicitly declare variables with any name to have any type. Is that what you’re referring to as the solution introduced in FORTRAN II?

(Of course, I was being facetious in the first place about the problem with nested loop variables. Even the original FORTRAN allowed variables with multi-character names. As for the ancient Romans, they used APL, thus causing Servilius Casca to famously complain “it was Greek to me.”)

As for recursion not being necessary: sure, you can, with varying degrees of difficulty, survive without it. But the same is true of the EVALUATE statement, and you were keeping score for that. I detect blatant pro-SQR bias!! 🙂
administrator on February 9, 2009 at 1:41 pm

You’re right on both counts.

First, FORTRAN, the IBM FORmula TRANslating System, had integer variables and floating point variables. Fortran variable names that started with I through N were integers.

Second, Fortran allowed variables with up to five or six character names (I forget which). I don’t know whether that was in the first version; my earliest experience was with Fortran II and most of my experience was with Fortran IV. In Fortran IV, any variable could be declared as an integer.
AdvandyBainia on February 23, 2009 at 10:48 pm

Thank you!
Nathan on February 24, 2009 at 1:45 pm

Regarding the brain teaser:

Why are we avoiding IF? Can we use the cond() function? How about range()? Or are you trying to stick with “math” functions like sign()?
administrator on February 24, 2009 at 2:15 pm

I wanted to avoid IF because the solution with IF isn’t much of a brain teaser. COND() and RANGE() are good solutions, although they merely move the IF operation into the function. There are ways to get MIN and MAX without comparing the inputs. Note that the puzzle uses numeric variables, not string variables.
Nathan on February 25, 2009 at 9:26 pm

Okay, I was thinking you might be trying to go down the “algebra” road….

For the two-variable case, I remembered from years ago the trick that relies on the fact that the average of two numbers is half-way between them, so if you start at the average and then move one-half of the difference between the two numbers you end back at one of the original two numbers — and by controling which direction you move, you will arrive at either the smaller or the larger of the two.

So, simplifying the formula and converting it into SQR code produces:

begin-procedure min_max(#x, #y, :#min, :#max) let #min= ( #x + #y - abs(#x - #y) )/2 let #max= ( #x + #y + abs(#x - #y) )/2 end-procedure

For the three-variable case, I wasn’t able to convince myself that any code would be “simpler” than breaking down the problem into pairs and invoking the two-variable procedure on each pair:

begin-procedure min_mid_max(#x, #y, #z, :#min, :#mid, :#max) do min_max(#x, #y, #lower, #higher) do min_max(#lower, #z, #min, #dummy1) do min_max(#higher, #z, #dummy2, #max) let #mid= #x + #y + #z - #min - #max end-procedure

(We know that at least one of #dummy1 or #dummy2 is going to be #mid, but we don’t know which one, so as long as we are avoiding IF commands we just use addition/subtraction to calculate #mid.)

Did you find anything simpler than that? (Well, other than just using IF, which would clearly be much simpler 🙂 )
administrator on February 26, 2009 at 9:06 am

I think you’ve found the simplest approach, maybe even simpler than using IF statements. I’ve tried to sketch out the code with IF statements in my head and it was surprisingly complicated.
Nathan on February 26, 2009 at 12:04 pm

Well, the thing about using the mathematical method is that I’m always slightly nervous that some math-implementation issue ( e.g. rounding errors ) might show up at some point in the life of the application. Given a choice, I’d lean toward straight MOVEs rather than relying on floating-point arithmetic, even though I know that most applications where SQR is used don’t involve numbers that push any implementation limits.

So, I’d want to at least implement my “min_max” routine above with a simple IF statement (no nesting would be required for a two-variable min/max) instead of using calculations. (That would still leave the LET expression that calculats #mid, but I trust addition more than division. 🙂 )

But really I’d probably want to replace both min_max and min_mid_max with the following:

begin-procedure min_mid_max_if(#x, #y, #z, :#min, :#mid, :#max)

move #x to #min

if #y < #min move #min to #mid move #y to #min else move #y to #mid end-if

if #z < #min move #mid to #max move #min to #mid move #z to #min else if #z < #mid move #mid to #max move #z to #mid else move #z to #max end-if end-if

end-procedure ! min_mid_max_if

Although this routine involves more lines of code than the min_max/min_mid_max combo, I think it is somewhat easier to understand what each line is trying to accomplish (and it involves no arithmetic calculations at all)….
administrator on February 26, 2009 at 2:30 pm

This is better than what I had in mind; you’ve basically written a specialized, three element insertion sort. The advantage of this is that you could extend it to four elements more easily than extending a mathematical solution. The next step would be to try a three element bubble sort.
John on March 30, 2011 at 1:37 pm

There is a problem with the “#from” variable. In the quicksort procedure, this is a local variable. When the save_pivot procedure tries to reference it, it’s not available. I change this to the global variable “#_from” in the quicksort procedure and it seems to be working now.
administrator on April 4, 2011 at 10:32 am

John, thank you for discovering and fixing this. I guess we can take three approaches;

(1) use #_from in the quicksort procedure
(2) pass #from as a parameter to the save_pivot procedure
(3) make quicksort a non-local procedure (I made it local so I could assign variable names freely)

I should change the code, but I need to think about these alternatives.

Quicksort For SQR

Looking Ahead

Brain Teaser

About The Author

Steven Alexander

13 Comments

Recent Posts

Recent Comments

Archives

Categories

Meta

Quicksort For SQR

Looking Ahead

Brain Teaser

About The Author

Steven Alexander

Related Posts

How To Parse A Filename in SQR

Dynamic Views in Application Designer

Rounding Error

6 Lessons Learned From Bad SQL

13 Comments

Recent Posts

Recent Comments

Archives

Categories

Meta