The adventure begins – finally!  I chose “Load-Lookup and Lookup” as a topic to discuss the wonderful things we can do with these commands.  I realized that not everyone knows everything about these commands, so I’ve written three posts to lay the groundwork for this post. Now, what can we do with load-lookup?

1. Avoid Making The Same Single-Row Query Over And Over.

A report of employees might feature their job titles and department names.  In Peoplesoft Human Resources, the PS_JOB table has their JOBCODE and DEPTID.  One approach is to select a row from PS_JOBCODE_TBL and a row from PS_DEPT_TBL, and print the DESCR field from each table.  If there are five thousand employees, we have to read each table five thousand time and almost definitely read the same rows over and over.  Two load-lookup commands and ten thousand lookup commands execute much faster.

2. Simplify And Expedite Multi-Table Queries

Rather than calling a procedure to select from PS_JOBCODE_TBL and PS_DEPT_TBL, we could join them to PS_JOB when the latter is selected.  That way, the database uses SETID_JOBCODE and JOBCODE from PS_JOB to join with PS_JOBCODE_TBL and retrieve the appropriate DESCR value.  That is faster than five thousand selects of PS_JOBCODE_TBL but slower than not using PS_JOBCODE_TBL at all.

There are risks to this approach.  If there is no matching row in PS_JOBCODE_TBL, the join will discard the corresponding PS_JOB row.  If the where clause is too complex (too many tables or too many conditions), the database’s query optimizer will abandon the indexes and slowly inspect every row of the tables.  Both risks are eliminated when we use the load-lookup command.

3. Map A Key Field To An Array

Arrays are much more useful when it’s easy to find the data we’ve put in them.  Load-lookup can replace a binary search by mapping a meaningful value (EMPLID, ACCT_CD, CUSTOMER_NBR) to an array index.  Here’s an example.

load-lookup
  name=emp_map
  table='(select EMPLID, ROWNUM from PS_PERSONAL_DATA order by EMPLID)'
  key=EMPLID
  return_value=ROWNUM
  rows={num_emps}

ROWNUM is an Oracle DBMS keyword that returns a series of natural numbers (1, 2, 3, etc.).  The syntax would be different for other DBMS.  Note that PS_PERSONAL_DATA is a table in Peoplesoft Human Resources that contains exactly one row for anybody who was ever an employee.  If we don’t need former employees, we could use PS_EMPLOYEES.  Here’s how we use the lookup object.

begin-select
A.EMPLID
A.NAME
  lookup emp_map &A.EMPLID $emp_num
  if $emp_num <> ''
    move $emp_num to #emp_num
    let employee.name(#emp_num) = &A.NAME
  end-if
 from PS_NAMES A

end-select

4. Map Multiple Key Fields To An Array In Order

It’s best to check the results of the SQL that defines the table in the load-lookup.  The ROWNUM may not be in order.  The order of the array doesn’t matter if we always use lookup to access it.  But if we want to read every element of the array in order of the key, the ROWNUM should be in order.

Here’s an example that combines two operations, retrieving an array index and the employee’s name.

load-lookup
  name=emp_to_name
  table='(select EMPLID, NAME, ROWNUM from (select EMPLID, NAME from PS_PERSONAL_DATA order by NAME))'
  key=EMPLID
  return_value='ROWNUM || ''@'' || NAME'
  rows={num_emps}

begin-select
J.EMPLID
  lookup emp_to_name &J.EMPLID $emp_data
  if $emp_data <> ''
    unstring $emp_data by '@' into $emp_num $name
    move $emp_num to #emp_num
    let employee.name(#emp_num) = $name
  end-if
 from PS_JOB J

end-select

Here’s an example of using a lookup object as an array.

load-lookup
  name=num_to_emp
  table='(select EMPLID, NAME, ROWNUM from (select EMPLID, NAME from PS_PERSONAL_DATA order by NAME))'
  key=ROWNUM
  return_value='EMPLID || ''@'' || NAME'
  rows={num_emps}

move 1 to #emp_num
while #emp_num <= {num_emps}
  move #emp_num to $emp_num edit 999999
  lookup num_to_emp $emp_num $personal_data
  unstring $personal_data by '@' into $emplid $name
  let $msg = 'Employee #' || $emp_num || ' has ID ' || $emplid || ' and is named ' || $name
  print $msg (+1,1)
  add 1 to #emp_num
end-while

5. Step Through Two Tables In Parallel

It’s easy to read a table in SQR, one row at a time.  It’s easy to start reading a table, and at some point in the middle, call a procedure to read all of another table.  Is there any way to read the first row of table #1, then the first row of table #2, then the second row of table #1, then the second row of table #2, and so on?

Although SQR can open many table cursors at once, it has to close the latest one before reading more from a previous one.  That means we can be reading rows from one cursor, but any reads from another cursor must be completely in memory by then.

One approach is to define the in-memory “table” with create-array and load it with begin-select.  Alternately, define and load the in-memory “table” with load-lookup.  The first method is more flexible; arrays can be two-dimensional, they support various data types, the data can be altered later, and the data is accessible through several commands.  The second method is easier for simple cases.

The Breakup

I’ve written four love letters to load-lookup.  We’ve come a long way and frankly, my infatuation has flagged.  Maybe I loved not wisely but too well.  (Don’t worry, I won’t go Othello on you.)  Sure, it’s a useful command, but it could be better.

Lookup always takes and returns strings, even if the database data was dates or numbers.  That means we have to convert the key before the lookup and convert the return value after the lookup.

When lookup doesn’t find a key value, it returns null.  When it does find a key value, and the return value is null, it returns null.  This isn’t a common problem in Peoplesoft environment because character columns are always defined as “non-null,” but what about date columns?  Perhaps lookup should have a status flag like file I/O commands (open, read, and write).

We can only retrieve the return values if we know the key values exactly, and an “in-between” key returns no information at all.  That means we need to construct our own sorted array and binary/linear search to work with Peoplesoft effective dates.

The key values are sorted, and that could be a useful resource in itself, but there’s no way to read the entire lookup object in order.

When load-lookup executes, it reports the number of rows it read and the number of duplicate keys in the log file, but it doesn’t tell us what those duplicate keys were.  Nor is any of that information available to the SQR program.

Many of the load-lookup parameters require literal values.  That’s reasonable when load-lookup is in the begin-setup block and is executed during compilation.  It seems like an unnecessary restriction when the compiler isn’t going to validate the SQL anyway.  If the table, key, or return_value parameters are complex, it would be more readable to assemble the strings in a multi-line let statement rather than having them extend one hundred spaces past the right side of our editors’ windows.  Also, it would be enabling to allow variables and expressions in those values.  SQR allows this flexibility with the where parameter, why not elsewhere?

Wouldn’t it be great if the encapsulated sorting and searching features of load-lookup were available for any data, not just for data from external databases and datasources?  Suppose there were a load-lookup mode that took two internal variables or literals, as key and return_value, and added them to a b-tree.  Then lookup could read that b-tree.  We’d have associative arrays!

Request For Information

Do you have some other clever uses for load-lookup?  Do you have some other ideas for improving load-lookup?  Please share them with us. (Also, let me know if you find the bug in my sample code.)

Looking Ahead

Next week, we’ll introduce a new category of topics, Programming Practices, with my list of the worst variable names.  That would seem to be a universal topic, but there are some bad variable names that are unique to SQR.

Brain Teaser

Meanwhile, here is a brain teaser.  Please post the solution as a comment.

We are producing a complicated report with breakpoints and subtotals.  Every time we finish processing a subset of the data we have to initialize ten variables.  Some variables are numeric, some are string.  Most of the numeric variables should be reset to zero, but some have other, arbitrary values.  Similarly, most of the string variable should be reset to null, but some have other, arbitrary values.  Do it in one command.