We can’t always output information in the proper order by using the DBMS to sort the incoming data. Imagine a corporate roster showing who reports to whom. That’s going to require multiple selects of PS_EMPLOYEES. We could write the data to a scratch table and select it again in order. We could write it to a disk file, use a sort utility, and then read the file.
At moments like this, we face the greatest mystery of our time: why doesn’t SQR have a sort command? My theory is that SQ Software, Sybase, MITI/Sqribe, Brio, Hyperion, and Oracle never added a sort command for our own good, because programming sorts builds character.
(Seriously, does anyone know why there is no sort command in SQR?)
Quicksort is algorithm invented by C.A.R. Hoare in 1962*. It runs fast and small. It’s not optimal in every situation, but it’s usually a good choice.
The challenge for SQR programmers is that Quicksort is recursive and SQR is not. Quicksort’s basic approach is to move the “lesser” data to the first half of the list and the “greater” data to the second half of the list. Next, it calls itself twice, first to repeat the process on the first half of the list, then to repeat the process on the second half of the list.
I searched the Internet for “nonrecursive quicksort,” and found a Basic program by James Haley, dated August 28, 1985. I rewrote it for SQR and generalized it to handle any array. We can put the generic quicksort procedure in an sqc include file and write specific supporting procedures (save_pivot, compare_rows, and swap_rows) for each program.
Here is the generic quicksort procedure. I used the “local” option so that quicksort’s variables wouldn’t interfere with those variables in the main program.
(History quiz – does anyone know why #i and #j are used so often for loop counters?)
begin-procedure quicksort local
create-array name=ranges size=32
field=from:integer=0
field=thru:integer=0
clear-array name=ranges
move 0 to #range_num
put #first_row #last_row into ranges(#range_num)
while #range_num >= 0
get #from #thru from ranges(#range_num)
subtract 1 from #range_num
if #from < #thru
do save_pivot
let #i = #from
let #j = #thru + 1
while #i < #j
move '<' to $relationship
while $relationship = '<' and #i < #thru
add 1 to #i
do compare_rows(#i, $relationship)
end-while
move '>' to $relationship
while $relationship = '>'
subtract 1 from #j
do compare_rows(#j, $relationship)
end-while
if #i < #j
do swap_rows(#i, #j)
end-if
end-while
do swap_rows(#from, #j)
add 1 to #range_num
let ranges.from(#range_num) = #from
let ranges.thru(#range_num) = #j - 1
add 1 to #range_num
let ranges.from(#range_num) = #j + 1
let ranges.thru(#range_num) = #thru
end-if
end-while
end-procedure quicksort
Each SQR program that uses quicksort needs to have its own versions of the save_pivot, compare_rows, and swap_rows procedures. For this example, I imagine a program that needs to sort three times. It uses the $sort variable to control evaluate commands so that quicksort will act on three different arrays (array, employee, and dept) and sort employee in two different ways (“by salary” and “by name”).
Note that save_pivot does not have local variables; it uses global variable $sort to set global variables #value2, or #salary2, or $last_name2 and $first_name2, or department $name2.
begin-procedure save_pivot
evaluate $sort
when = 'array'
let #value2 = array.field(#from)
when = 'employee by salary'
let #salary2 = employee.salary(#from)
when = 'employee by name'
get $last_name2 $first_name2
from employee(#from) last_name first_name
when = 'dept by name'
let $name2 = dept.name(#from)
end-evaluate
end-procedure save_pivot
Note that compare_rows uses local variables, so it has to refer to $sort and the variables from save_pivot with underscores. It compares to rows from an array and returns the symbols for less than, equal to, or greater than in the $relationship variable. The value of $relationship tells quicksort whether #row1 should come before or after the pivot in the array. If we want to sort in descending order, we would reverse the rules. We can sort by multiple fields (see the employee by name example), with some field ascending and some fields descending.
begin-procedure compare_rows(#row1, :$relationship)
evaluate $_sort
when = 'array'
let #value1 = array.field(#row1)
evaluate #value1
when < #_value2
let $relationship = '<'
when = #_value2
let $relationship = '='
when > #_value2
let $relationship = '>'
end-evaluate
when = 'employee by salary'
let #salary1 = employee.salary(#row1)
evaluate #salary1
when < #_salary2
let $relationship = '<'
when = #_salary2
let $relationship = '='
when > #_salary2
let $relationship = '>'
end-evaluate
when = 'employee by name'
get $last_name1 $first_name1
from employee(#row1) last_name first_name
evaluate $last_name1
when < $_last_name2
let $relationship = '<'
when = $_last_name2
evaluate $first_name1
when < $_first_name2
let $relationship = '<'
when = $_first_name2
let $relationship = '='
when > $_first_name2
let $relationship = '>'
end-evaluate
when > $_last_name2
let $relationship = '>'
end-evaluate
when = 'dept by name'
let $name1 = dept.name(#row1)
evaluate $name1
when < $_name2
let $relationship = '<'
when = $_name2
let $relationship = '='
when > $_name2
let $relationship = '>'
end-evaluate
end-evaluate
end-procedure compare_rows
Swapping rows is easy when we’re dealing with just one array and just one field. For the general case, we need something this elaborate. Note that we can use the same swapping code for “employee by salary” and “employee by name.”
begin-procedure swap_rows(#row1, #row2)
evaluate $_sort
when = 'array'
let #value1 = array.field(#row1)
let #value2 = array.field(#row2)
let array.field(#row2) = #value1
let array.field(#row1) = #value2
when = 'employee by salary'
when = 'employee by name'
get $emplid1 $last_name1 $first_name1 #salary1
from employee(#row1)
get $emplid2 $last_name2 $first_name2 #salary2
from employee(#row2)
put $emplid1 $last_name1 $first_name1 #salary1
into employee(#row2)
put $emplid2 $last_name2 $first_name2 #salary2
into employee(#row1)
when = 'dept by name'
get $setid1 $deptid1 $name1 from dept(#row1)
get $setid2 $deptid2 $name2 from dept(#row2)
put $setid1 $deptid1 $name1 into dept(#row2)
put $setid2 $deptid2 $name2 into dept(#row1)
end-evaluate
end-procedure swap_rows
* March 2, 2009 addendum: C.A.R. Hoare is Sir Charles Antony Richard Hoare, a Britsh Computer Scientist. Not only did he invent Quicksort, he invented null pointers in 1965. He now calls null pointers his billion dollar mistake because of programmers like me who have written countless bugs based on null pointers. Thankfully, I haven’t dereferenced a pointer since I started programming in SQR.
Looking Ahead
Sometimes we sort data for use within the program, not just for output. Looking for data in an unsorted array requires a slow, linear search. Looking for data in a sorted array can use a fast, binary search. Next week we’ll look at binary searches and binary/linear searches. (A binary/linear search is my name for searching through sorted, effective-dated data.)
Brain Teaser
Here’s the warm-up: there are two variables, #x and #y. We need to set #min to the lesser of the two and #max to the greater of the two. Avoiding if commands, what are the formulae for #min and #max?
Here’s the challenge: there are three variables, #x, #y, and #z. What is the simplest code to set the values of #min, #mid, and #max?
To answer your history quiz question (“does anyone know why #i and #j are used so often for loop counters?”): yes, someone knows.
(And one of those someones would be me.)
(But what letter would you use if you had more than 6 nested loops? You can’t use ‘o’ as a loop variable…)
(And what would ancient Roman FORTRAN programmers do if they had more than 5 nested loops?)
By the way, I notice that as soon as you start talking about recursion and the presence or absence of sorts in a language, you stop keeping score between SQR and C. How conveeeenient!
You seem to know the answer, so why don’t you explain it for the readers who aren’t as old as you are? If you do, I’ll tell you how Fortran II (yes, it was identified with roman numerals) supported more than six nested loops – for the sake of the readers who aren’t as old as I am.
I stopped keeping score between SQR and C because I demonstrated that recursion is completely unnecessary. Next week, I will debunk pointers (smirk).
OK: in the mid-1950s (before I was born – just for the record) FORTRAN defined variables as integers if their names started with an I, J, K, L, M, or N. Variables starting with other letters were real numbers.
So ‘I’ was just the first and shortest legal choice for an integer loop counter.
Now, I don’t have encyclopedic knowledge of features introduced in each version of FORTRAN, but at some point it allowed programmers to explicitly declare variables with any name to have any type. Is that what you’re referring to as the solution introduced in FORTRAN II?
(Of course, I was being facetious in the first place about the problem with nested loop variables. Even the original FORTRAN allowed variables with multi-character names. As for the ancient Romans, they used APL, thus causing Servilius Casca to famously complain “it was Greek to me.”)
As for recursion not being necessary: sure, you can, with varying degrees of difficulty, survive without it. But the same is true of the EVALUATE statement, and you were keeping score for that. I detect blatant pro-SQR bias!! 🙂
You’re right on both counts.
First, FORTRAN, the IBM FORmula TRANslating System, had integer variables and floating point variables. Fortran variable names that started with I through N were integers.
Second, Fortran allowed variables with up to five or six character names (I forget which). I don’t know whether that was in the first version; my earliest experience was with Fortran II and most of my experience was with Fortran IV. In Fortran IV, any variable could be declared as an integer.
Thank you!
Regarding the brain teaser:
Why are we avoiding IF? Can we use the cond() function? How about range()? Or are you trying to stick with “math” functions like sign()?
I wanted to avoid IF because the solution with IF isn’t much of a brain teaser. COND() and RANGE() are good solutions, although they merely move the IF operation into the function. There are ways to get MIN and MAX without comparing the inputs. Note that the puzzle uses numeric variables, not string variables.
Okay, I was thinking you might be trying to go down the “algebra” road….
For the two-variable case, I remembered from years ago the trick that relies on the fact that the average of two numbers is half-way between them, so if you start at the average and then move one-half of the difference between the two numbers you end back at one of the original two numbers — and by controling which direction you move, you will arrive at either the smaller or the larger of the two.
So, simplifying the formula and converting it into SQR code produces:
begin-procedure min_max(#x, #y, :#min, :#max)
let #min= ( #x + #y - abs(#x - #y) )/2
let #max= ( #x + #y + abs(#x - #y) )/2
end-procedure
For the three-variable case, I wasn’t able to convince myself that any code would be “simpler” than breaking down the problem into pairs and invoking the two-variable procedure on each pair:
begin-procedure min_mid_max(#x, #y, #z, :#min, :#mid, :#max)
do min_max(#x, #y, #lower, #higher)
do min_max(#lower, #z, #min, #dummy1)
do min_max(#higher, #z, #dummy2, #max)
let #mid= #x + #y + #z - #min - #max
end-procedure
(We know that at least one of #dummy1 or #dummy2 is going to be #mid, but we don’t know which one, so as long as we are avoiding IF commands we just use addition/subtraction to calculate #mid.)
Did you find anything simpler than that? (Well, other than just using IF, which would clearly be much simpler 🙂 )
I think you’ve found the simplest approach, maybe even simpler than using IF statements. I’ve tried to sketch out the code with IF statements in my head and it was surprisingly complicated.
Well, the thing about using the mathematical method is that I’m always slightly nervous that some math-implementation issue ( e.g. rounding errors ) might show up at some point in the life of the application. Given a choice, I’d lean toward straight MOVEs rather than relying on floating-point arithmetic, even though I know that most applications where SQR is used don’t involve numbers that push any implementation limits.
So, I’d want to at least implement my “min_max” routine above with a simple IF statement (no nesting would be required for a two-variable min/max) instead of using calculations. (That would still leave the LET expression that calculats #mid, but I trust addition more than division. 🙂 )
But really I’d probably want to replace both min_max and min_mid_max with the following:
begin-procedure min_mid_max_if(#x, #y, #z, :#min, :#mid, :#max)
move #x to #min
if #y < #min
move #min to #mid
move #y to #min
else
move #y to #mid
end-if
if #z < #min
move #mid to #max
move #min to #mid
move #z to #min
else
if #z < #mid
move #mid to #max
move #z to #mid
else
move #z to #max
end-if
end-if
end-procedure ! min_mid_max_if
Although this routine involves more lines of code than the min_max/min_mid_max combo, I think it is somewhat easier to understand what each line is trying to accomplish (and it involves no arithmetic calculations at all)….
This is better than what I had in mind; you’ve basically written a specialized, three element insertion sort. The advantage of this is that you could extend it to four elements more easily than extending a mathematical solution. The next step would be to try a three element bubble sort.
There is a problem with the “#from” variable. In the quicksort procedure, this is a local variable. When the save_pivot procedure tries to reference it, it’s not available. I change this to the global variable “#_from” in the quicksort procedure and it seems to be working now.
John, thank you for discovering and fixing this. I guess we can take three approaches;
(1) use #_from in the quicksort procedure
(2) pass #from as a parameter to the save_pivot procedure
(3) make quicksort a non-local procedure (I made it local so I could assign variable names freely)
I should change the code, but I need to think about these alternatives.