I’ve written about variables several times (notably in Bottom #Ten Worst Variable Names and SQR Without Literals), but a recent entry on a new blog gave me a fresh way to think about them.

Psychology of Programming

Mats Stafseng Einarsen writes a blog called “Psychology of Programming” to explore “the part human brains play in the process of computer programming.” His September 9, 2009 post, “Variables and the roles they play,” summarizes the work of Jorma Sajaniemi and links to various academic papers on the subject.

You should read Einarsen’s article before we continue, but I will summarize it here and hope not to distort it. We can categorize variables by the roles they play in algorithms, and a small number of roles covers almost all variable use, at least for novice programmers. Adapting Einarsen’s list of Sajaniemi’s roles to SQR gives us the following.

  1. Fixed value (e.g. {max_401k_contribution}): set only once.
  2. Stepper (e.g. #employee_counter): assumes a predictable succession of values.
  3. Most-recent (e.g. &ANNUAL_RT): latest value in a succession of unpredictable values.
  4. Most-wanted (e.g. #max_salary): best value so far.
  5. Gatherer (e.g. #total_salary): affected by all most-recent values.
  6. Follower (e.g. $prev_employee): previous value of a most-recent variable.
  7. One-way flag (e.g. #row_found): Boolean (true/false) that changes value only once.
  8. Temporary (e.g. #i): a variable used for a short period of time or in a small block of code.
  9. Organizer (e.g. array): data structure with elements that can be rearranged.
  10. Container (e.g. string): dynamic data structure.
  11. Walker (e.g. #index): variable that traverses an array.
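To make the list concrete, here is a minimal sketch of one SQR select loop in which several of these roles appear together. It assumes a hypothetical table named current_salaries with exactly one row per employee and columns emplid and annual_rt; the procedure name show_salary_roles is also invented. The comments label each variable's role as I read Sajaniemi's definitions.

```sqr
! Hypothetical sketch: current_salaries is an invented table with one row
! per employee. &annual_rt is the most-recent value in each iteration.
begin-program
  do show_salary_roles
end-program

begin-procedure show_salary_roles
  let #employee_counter = 0   ! stepper: 1, 2, 3, ...
  let #max_salary = 0         ! most-wanted: best value so far
  let #total_salary = 0       ! gatherer: accumulates every &annual_rt
  let #row_found = 0          ! one-way flag: becomes 1 at the first row
  let $prev_employee = ''     ! follower: previous value of &emplid
begin-select
emplid
annual_rt
  add 1 to #employee_counter
  let #row_found = 1
  if &annual_rt > #max_salary
    let #max_salary = &annual_rt
  end-if
  add &annual_rt to #total_salary
  if &emplid = $prev_employee
    show 'Unexpected duplicate row for ' &emplid
  end-if
  let $prev_employee = &emplid
from current_salaries
order by emplid
end-select
  if #row_found
    show 'Employees: ' #employee_counter ' Max: ' #max_salary ' Total: ' #total_salary
  end-if
end-procedure
```

The follower earns its keep only because it is read before it is overwritten; here it catches a duplicate row that the one-row-per-employee assumption says should never appear.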

This is a thought-provoking taxonomy. We often classify variables by the type of data they store (numbers, strings, pointers, logical states) or by their scope and persistence. We also distinguish scalars from structures and read-only variables from read-write ones. In Sajaniemi’s approach, roles 1 through 8 and 11 are defined by how we use the variables.

Sajaniemi seems to use this model to gain insight into how programming is taught and learned. Einarsen speculates that it might help us improve the programming process, perhaps through new language syntax. I wonder whether this approach will also give us insight into better programming practices.

Fixed Value Variables

My example of a fixed value variable was an SQR substitution variable: the compiler performs a textual find-and-replace before the program executes. Another example would be a regular variable that we set only once. The compiler would let us change it, but we don’t. How flexible is this concept? Does the following qualify? (I’m making up numbers for the 401k maximums. Please do not cut and paste this code.)

! Made-up 401k limits for illustration only
evaluate #year
  when <= 2005
    let #max_401k_contribution = 10000
  when = 2006
    let #max_401k_contribution = 11000
  when = 2007
    let #max_401k_contribution = 12000
  when-other
    ! Any later year: add 1000 for each year after 2007
    let #max_401k_contribution = 12000 + 1000 * (#year - 2007)
end-evaluate

This code has four statements that set the value of #max_401k_contribution, but only one of them will execute. Is that still a fixed value?

The #year variable is also a fixed-value variable. That’s clear enough, but it feels incomplete to me. I think I’m using #year in two roles: first as a decision-making variable, then as an input to a computation. Is this a bad practice? Am I misusing the concept of variable roles, or even creating a different map?
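For contrast with the evaluate block, here is a sketch of the two simpler kinds of fixed value described earlier: a substitution variable and an ordinary variable assigned once. The name max_401k_limit is hypothetical and, like the numbers above, the 12000 is invented.

```sqr
! Hypothetical sketch; the 12000 limit is made up.
! Compile-time fixed value: the compiler replaces {max_401k_limit} with 12000.
#define max_401k_limit 12000

begin-program
  ! Run-time fixed value: an ordinary variable set once and never changed
  let #max_401k_contribution = {max_401k_limit}
  show 'Maximum contribution: ' #max_401k_contribution
end-program
```

Only the first form is enforced; nothing stops a later let from changing #max_401k_contribution except our own restraint.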

Stepper Variables

Einarsen used a variable named “count” as an example, so I originally thought of “#counter,” as in this loop.

move 0 to #counter
while #counter < {num_iterations}
  print '--*--' ()
  add 1 to #counter
end-while

SQR requires us to use a variable to control a counted loop, even when we don’t care about its value. I would be just as happy to write the equivalent of the COBOL block that loops a fixed number of times but hides the loop counter (if SQR had that feature).

perform 10 times
    display '--*--'
end-perform
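SQR has no counted loop that hides its counter, but one way to get close is to bury the counter inside a procedure that takes an argument. This is a minimal sketch; print_separator and #times are hypothetical names. It relies on SQR treating a procedure with arguments as local, so #i is private to the procedure.

```sqr
! Hypothetical sketch: hide the stepper inside a local procedure.
begin-program
  do print_separator(10)
end-program

! Because this procedure takes an argument, SQR treats it as local,
! so the loop counter #i is invisible to the caller.
begin-procedure print_separator(#times)
  let #i = 0
  while #i < #times
    print '--*--' ()
    add 1 to #i
  end-while
end-procedure
```

The caller now reads like the COBOL version; the stepper still exists, it just lives where nobody else has to look at it.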

Although loop control variables are steppers, Einarsen said “count,” not “counter.” He may be thinking of something like this.

! PeopleSoft stores sex codes in upper case (e.g. 'M' and 'F')
begin-select
sex
  evaluate &sex
    when = 'M'
      add 1 to #count_male
    when = 'F'
      add 1 to #count_female
  end-evaluate
 from ps_personal_data
end-select

Non-negative integers are a predictable, ordered succession of values.  We can use them to control program flow or to produce information (How many employees are female?).  We can also create our own set of values.

let $states = 'CA TX NY'
let #state_position = 1
while #state_position < length($states)
  let $state = substr($states, #state_position, 2)
  do print_state_report($state)
  add 3 to #state_position
end-while

One Way Flag Variables

This category is very specific: it covers only flags that have exactly two values and change from the first value to the second only once. It resembles the Boolean value returned by a “where exists” or “where not exists” clause in SQL.
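An SQR version of that “exists” test might look like the following minimal sketch. LOOPS=1 stops the select after the first matching row, so the flag is set at most once. The table dependents, its emplid column, and the procedure name are hypothetical, and $emplid is assumed to hold an employee ID before the procedure runs.

```sqr
! Hypothetical sketch: dependents is an invented table with an emplid column.
begin-procedure check_dependents
  ! One-way flag: starts at 0 and can change only once, to 1
  let #has_dependents = 0
begin-select loops=1
emplid &dep_emplid
  let #has_dependents = 1
from dependents
where emplid = $emplid
end-select
end-procedure
```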

I’ve used flags to store the results of complex Boolean expressions so that I could use the results throughout the program rather than recalculate them repeatedly. Those flags took new values with every row of data I read. I could call them most-recent variables, but is that the best way to understand them?

I’ve used flags to express user choices among multiple options. I wrote a general ledger interface that could export a payroll, a set of journal entry corrections, or year-end accruals. My flag, $export_mode, took the values ‘payroll’, ‘corrections’, and ‘yearend’ to route program flow wherever my algorithm differed among the cases. I could call it a fixed value, but isn’t it different from a variable like #number_of_states (which has been set to 50 for the USA since 1959)?
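A minimal sketch of such a mode variable follows. Here the value comes from SQR's input command; in a real interface it might just as well come from a run control table. The three export procedures are hypothetical stubs, not the code from my interface.

```sqr
! Hypothetical sketch: the export procedures below are invented stubs.
begin-program
  input $export_mode 'Export mode (payroll, corrections, yearend)'
  ! Route program flow on the user's choice
  evaluate $export_mode
    when = 'payroll'
      do export_payroll
    when = 'corrections'
      do export_corrections
    when = 'yearend'
      do export_yearend_accruals
    when-other
      show 'Unknown export mode: ' $export_mode
      stop
  end-evaluate
end-program

begin-procedure export_payroll
  show 'Exporting payroll'
end-procedure

begin-procedure export_corrections
  show 'Exporting journal entry corrections'
end-procedure

begin-procedure export_yearend_accruals
  show 'Exporting year-end accruals'
end-procedure
```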

Containers

Other languages have memory allocation and pointers, which can be used to form linked lists, trees, and other exotic data objects.  Some versions of SQR have expandable arrays.  Load-lookup tables are like read-only hash arrays.  I simulate a dynamic data structure by treating a string variable as an array.
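For the load-lookup idea, here is a minimal sketch. It assumes a hypothetical departments table whose deptid values are unique and whose descr column holds the name; the department ID '10000' is invented. The lookup table is loaded once in the setup section and afterward is only read by key.

```sqr
! Hypothetical sketch: departments is an invented table with unique deptid values.
begin-setup
  load-lookup
    name=dept_names
    table=departments
    key=deptid
    return_value=descr
end-setup

begin-program
  let $deptid = '10000'
  ! Read-only "hash" access by key
  lookup dept_names $deptid $dept_name
  show $deptid ' = ' $dept_name
end-program
```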

Strings can grow and shrink in size up to almost 32,768 characters. The substr() function can read a small string from an “array” of fixed-length strings by position. Here is a way to use string “arrays” to map letters to digits on a telephone keypad.

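! substr() returns a string, so the result goes into a $ variable;
! upper() lets lowercase letters map as well
let $digit = substr('22233344455566677778889999',
              instr('ABCDEFGHIJKLMNOPQRSTUVWXYZ', upper($letter), 1), 1)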
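Applied one character at a time, the same two lookup strings can translate a whole phone word. This is a minimal sketch; $word, #pos, #key, and $digits are names I made up for the example. Note the roles: #pos walks the string, $letter holds the most-recent character, and $digits gathers the result.

```sqr
! Hypothetical sketch: translate a phone word into keypad digits.
begin-program
  let $word = 'FLOWERS'
  let $digits = ''
  let #pos = 1
  while #pos <= length($word)
    let $letter = substr($word, #pos, 1)
    let #key = instr('ABCDEFGHIJKLMNOPQRSTUVWXYZ', upper($letter), 1)
    ! Skip characters that are not letters (instr returns 0)
    if #key > 0
      let $digits = $digits || substr('22233344455566677778889999', #key, 1)
    end-if
    add 1 to #pos
  end-while
  show $word ' -> ' $digits
end-program
```

For 'FLOWERS', $digits ends up as 3569377.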

Alternatively, we can delimit the items in the “array” with a special character and read them sequentially by repeated use of the instr() function.

! Assumes {true} is defined elsewhere, e.g. #define true 1
let $fruits = 'apple@banana@cherry@grape@honeydew@lemon@nectarine@orange@pineapple@'
let #start_position = 1
while {true}
  ! Find the delimiter that ends the current item
  let #end_position = instr($fruits, '@', #start_position)
  if #end_position = 0
    break
  end-if
  let $fruit = substr($fruits, #start_position, #end_position - #start_position)
  ! Move past the delimiter to the start of the next item
  let #start_position = #end_position + 1
  print $fruit (+1,1)
end-while

In this example, some variables play multiple roles.

  • $fruits is a container
  • #start_position is a most-recent variable, a follower variable, and a walker
  • #end_position is a most-recent variable, a flag (according to my expanded definition because it controls a loop exit), and a walker
  • $fruit is a stepper (controlled by most-recent / walker variables)

Alternate Variable Roles

Having written this far, I realize that Sajaniemi’s roles are based on the history of how a variable is set within a single execution of a program.

  • Fixed values are set only once.
  • One-way flags are set only once or twice.
  • Followers are set to every value of a most-recent variable.
  • Most-wanted are set to certain qualifying values of a most-recent variable.
  • A gatherer is not set to a most-recent variable; it is set to a formula that includes both itself and the most-recent variable.

Is this a useful way to think about variables? I believe that every system of classification has some insights to offer. Another approach would be to define roles based on the history of how a variable is used (read) rather than how it is set. This may or may not work as well. If we end up with 117 categories and the suspicion that there are many more, what have we learned? I can think of a few categories right away. What else is there?

  • to calculate a value: let #profit = #revenue - #expenses
  • to make a decision: if #age > 24
  • to select data: #index in let $name = emp.name(#index)
  • to communicate: print $report_title (1,1)
  • to be an adverb (a parameter for a command): #column in print 'Employee Report' (1, #column)

What’s The Moral?

Is it me, is it SQR, or is it the truth about roles?  As my computer science professor, William Shakespeare, said, it seems that all the program’s a stage, and all variables merely players, and one variable in its time plays many parts.

Is the concept of variable roles normative (i.e. an indicator of good practice) or descriptive?  I wonder if we should try to restrict each variable to a specific role or if we should use the concept of roles to standardize our detailed descriptions of algorithms.

If variables take different roles under my classification by usage, should we encourage restraint? If a flag variable is set to zero or one to denote false or true, we can use it for program control:

if #file_not_found
  show 'Cannot open file ' $filename
  stop
end-if

But should we use it for arithmetic?

let #unknown_depts = #unknown_depts + #dept_not_found
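For comparison, here is the same count written so that #dept_not_found only controls flow and never enters arithmetic. It is a sketch of one option, not a rule; whether it is clearer than the one-line version is exactly the open question.

```sqr
! Keeps #dept_not_found in its flag role: it is tested, never added
if #dept_not_found
  add 1 to #unknown_depts
end-if
```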