I’ve written about variables several times (notably in Bottom #Ten Worst Variable Names and SQR Without Literals), but a recent entry on a new blog gave me a new way to think about them.
Psychology of Programming
Mats Stafseng Einarsen writes a blog called “Psychology of Programming” to explore “the part human brains play in the process of computer programming.” His September 9, 2009 post is Variables and the roles they play, which summarizes the work of Jorma Sajaniemi and provides links to various academic papers on the subject.
You should read Einarsen’s article before we continue, but I will summarize his entry and hope to not distort it. We can categorize variables by the roles they take in algorithms, and there are a small number of roles that cover almost all variable use, at least for programming novices. Adapting Einarsen’s list of Sajaniemi’s roles to SQR gives us the following.
- Fixed value (e.g. {max_401k_contribution}): set only once.
- Stepper (e.g. #employee_counter): assumes a predictable succession of values.
- Most-recent (e.g. &ANNUAL_RT): latest value in a succession of unpredictable values.
- Most-wanted (e.g. #max_salary): best value so far.
- Gatherer (e.g. #total_salary): affected by all most-recent values.
- Follower (e.g. $prev_employee): previous value of a most-recent variable.
- One-way flag (e.g. #row_found): Boolean (true/false) that changes value only once.
- Temporary (e.g. #i): a variable used for a short period of time or in a small block of code.
- Organizer (e.g. array): data structure with elements that can be rearranged.
- Container (e.g. string): dynamic data structure.
- Walker (e.g. #index): variable that traverses an array.
This is a thought-provoking taxonomy. We often classify variables by the type of data they store (numbers, strings, pointers, and logical states) or by their scope and persistence. We also differentiate scalars from structures, and read-only from read-and-write. In Sajaniemi’s approach, the first eight roles in the list and the last one (walker) are defined by how we use the variables.
Sajaniemi seems to use this model for insight into the process of teaching and learning programming. Einarsen speculates that this model might help us improve the programming process, perhaps with new language syntax. I wonder if this approach will give us insights into better programming practices.
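As a minimal sketch, here is one loop in which several of the roles from the list appear together. The table ps_job and the column emplid are assumptions chosen for illustration, and a real query would need selection criteria (such as effective-date logic) that are left out here.
move 0 to #employee_counter
move 0 to #total_salary
move 0 to #max_salary
move 0 to #row_found
move '' to $prev_employee
begin-select
emplid
annual_rt
  move 1 to #row_found                  ! one-way flag: changes from 0 to 1 and stays there
  add 1 to #employee_counter            ! stepper: 1, 2, 3, ...
  add &annual_rt to #total_salary       ! gatherer: affected by every most-recent value
  if &annual_rt > #max_salary
    move &annual_rt to #max_salary      ! most-wanted: best value so far
  end-if
  move &emplid to $prev_employee        ! follower: holds this row's id when the next row arrives
from ps_job
end-select
In this sketch &annual_rt and &emplid are the most-recent variables, and each of the other variables is defined by how it reacts to them.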
Fixed Value Variables
My example of a fixed value variable was an SQR substitution variable. The compiler performs a textual find-and-replace before the program executes. Another example would be a regular variable that we set once. The compiler would permit us to change it, but we don’t. How flexible is this concept? Does this qualify? (I’m making up numbers for the 401k maximums. Please do not cut and paste this code.)
evaluate #year
  when <= 2005
    let #max_401k_contribution = 10000
  when = 2006
    let #max_401k_contribution = 11000
  when = 2007
    let #max_401k_contribution = 12000
  when-other
    let #max_401k_contribution = 12000 + 1000 * (#year - 2007)
end-evaluate
This code has four statements that set the value of #max_401k_contribution, but only one of them will be executed. Is that a fixed value?
The #year variable is also a fixed-value variable. That’s clear enough, but it feels incomplete to me. I think I’m using the #year variable in two roles: first it’s a decision-making variable, then it’s an input to computation. Is this a bad practice? Am I misusing the concept of variable role or even creating a different map?
Stepper Variables
Einarsen used a variable named “count” as an example, so I originally thought of “#counter,” as in this loop.
move 0 to #counter
while #counter < {num_iterations}
  print '--*--' ()
  add 1 to #counter
end-while
SQR requires us to use a variable to control looping, but I might not care about the value of the variable. I might be just as happy to write the equivalent of the COBOL block that loops a certain number of times but hides the loop counter (if SQR had that feature).
perform 10 times
…
end-perform
Although loop control variables are stepper variables, Einarsen said “count,” not “counter.” He may be thinking of something like this.
begin-select
sex
  if &sex = 'm'
    add 1 to #count_male
  else
    add 1 to #count_female
  end-if
from ps_personal_data
end-select
Non-negative integers are a predictable, ordered succession of values. We can use them to control program flow or to produce information (How many employees are female?). We can also create our own set of values.
let $states = 'CA TX NY'
let #state_position = 1
while #state_position < length($states)
  let $state = substr($states, #state_position, 2)
  do print_state_report($state)
  add 3 to #state_position
end-while
One Way Flag Variables
This category is very specific: it covers only flags that have exactly two values and that change from the first value to the second only once. It resembles the Boolean value returned by a “where exists” or “where not exists” clause in SQL.
I’ve used flags to store the results of complex Boolean statements, so I could use the results all over the program rather than recalculate them repeatedly. Those flags took new values with every row of data I read. I could call them most-recent variables, but is that the best way to understand them?
I’ve used flags to express user choices from multiple options. I wrote a general ledger interface that could export a payroll or a set of journal entry corrections or yearend accruals. My flag, $export_mode, took the values ‘payroll’, ‘corrections’, and ‘yearend’ to route program flow where my algorithm differed for the different cases. I could call them fixed value, but aren’t they different from a variable like #number_of_states (which has been set to 50 for the USA since 1959)?
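The difference between a strict one-way flag and a flag that is recalculated for every row can be sketched as follows. The names #any_over_limit, #is_over_limit, and {salary_limit} are hypothetical, the substitution variable is assumed to be set with #define elsewhere, and the table and column are used only for illustration.
move 0 to #any_over_limit               ! one-way flag: 0 until the first hit, then 1 for good
begin-select
annual_rt
  if &annual_rt > {salary_limit}
    move 1 to #is_over_limit            ! recalculated for every row, like a most-recent variable
    move 1 to #any_over_limit
  else
    move 0 to #is_over_limit
  end-if
from ps_job
end-select
After the loop, #any_over_limit answers a question about the whole result set, while #is_over_limit only describes the last row read.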
Containers
Other languages have memory allocation and pointers, which can be used to form linked lists, trees, and other exotic data objects. Some versions of SQR have expandable arrays. Load-lookup tables are like read-only hash arrays. I simulate a dynamic data structure by treating a string variable as an array.
Strings can grow and shrink in size up to almost 32,768 characters. The substr() function can read a small string from an “array” of fixed length strings by position. Here is a way to use string “arrays” to map letters to digits on a telephone keypad.
let #digit = substr('22233344455566677778889999',
instr('ABCDEFGHIJKLMNOPQRSTUVWXYZ', $letter, 1), 1)
Alternately, we can delimit the items in the “array” with a special character and read them sequentially by repeated use of the instr() function.
let $fruits = 'apple@banana@cherry@grape@honeydew@lemon@nectarine@orange@pineapple@'
let #start_position = 1
while {true}
  let #end_position = instr($fruits, '@', #start_position)
  if #end_position = 0
    break
  end-if
  let $fruit = substr($fruits, #start_position, #end_position - #start_position)
  let #start_position = #end_position + 1
  print $fruit (+1,1)
end-while
In this example, some variables take multiple roles.
- $fruits is a container
- #start_position is a most-recent variable, a follower variable, and a walker
- #end_position is a most-recent variable, a flag (according to my expanded definition because it controls a loop exit), and a walker
- $fruit is a stepper (controlled by most-recent / walker variables)
Alternate Variable Roles
After writing this far, I realize that Sajaniemi’s roles are based on the history (within a single execution of a program) of setting a variable.
- Fixed values are set only once.
- One-way flags are set only once or twice.
- Followers are set to every value of a most-recent variable.
- Most-wanted are set to certain qualifying values of a most-recent variable.
- A gatherer is not set to a most-recent variable; it is set to a formula that includes itself and the most-recent variable.
Is this a useful way to think about variables? I believe that every system of classification has some insights to offer. Another approach would be to define roles based on the history of using a variable. This may or may not work as well. If we end up with 117 categories and the suspicion that there are many more, what have we learned? I can think of a few categories right away. What else is there?
- to calculate a value: let #profit = #revenue - #expenses
- to make a decision: if #age > 24
- to select data: #index in let $name = emp.name(#index)
- to communicate: print $report_title (1,1)
- to be an adverb (parameter for a command): print 'Employee Report' (1, #column)
What’s The Moral?
Is it me, is it SQR, or is it the truth about roles? As my computer science professor, William Shakespeare, said, it seems that all the program’s a stage, and all variables merely players, and one variable in its time plays many parts.
Is the concept of variable roles normative (i.e. an indicator of good practice) or descriptive? I wonder if we should try to restrict each variable to a specific role or if we should use the concept of roles to standardize our detailed descriptions of algorithms.
If variables take different roles, under my classification by usage, should we encourage restraint? If a flag variable is set to zero or one to denote false or true, we can use it for program control:
if #file_not_found
  show 'Cannot open file ' $filename
  stop
end-if
But should we use it for arithmetic?
let #unknown_depts = #unknown_depts + #dept_not_found
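For comparison, here is a sketch of the same count written without arithmetic on the flag. It assumes #dept_not_found only ever holds 0 or 1; if it could hold any other value, the two versions would give different results.
if #dept_not_found
  add 1 to #unknown_depts               ! the flag only makes the decision; the gatherer does the counting
end-if
This version keeps the flag in a decision-making role and leaves the arithmetic to #unknown_depts, at the cost of two extra lines.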
Didn’t your professor also say “The first thing we do, let’s kill all the programmers”?