Parameters
Load-lookup takes four to nine parameters. The syntax looks like this:
load-lookup
name=
table=
key=
return_value=
rows=
extent=
where=
sort=
quiet
The parameters may be in any order. If a parameter appears more than once, its last value is used. There can be spaces and line breaks before and after the equals signs. More than one parameter can be on a line.
The SQL statement formed from the table, key, return_value, and where parameters will not be validated during program compilation. It will be passed to the database management system for validation. Load-lookup does not evaluate dynamic variables.
Name is required. It identifies the lookup object for later reference. It is a literal value with no quote marks.
Table is required. It is the text that follows the keyword “from” in the SQL statement that load-lookup constructs. There are four forms for this parameter.
1. A single database table name, with or without quote marks.
table=PS_DEPT_TBL
OR
table='PS_DEPT_TBL'
2. A single database table name, one or more spaces, and an alias, enclosed in quote marks.
table='PS_DEPT_TBL D'
3. Multiple database table names, with or without aliases, separated by commas, enclosed in quote marks.
table='PS_EMPLOYEES E, PS_DEPT_TBL D'
4. An SQL select statement enclosed in parentheses, enclosed in quote marks.
table='(select EMPLID, ROWNUM from PS_PERSONAL_DATA order by EMPLID)'
Key is required. It will be the first column or expression that follows the keyword “select” in the SQL statement that load-lookup constructs. It must be either a character string, a date, or an integer no longer than 12 digits. There are two forms for this parameter.
1. A single database column name, with or without alias, with or without quote marks.
key=DEPTID
OR
key=D.DEPTID
OR
key='DEPTID'
OR
key='D.DEPTID'
2. Any valid SQL expression that combines one or more database columns, with quote marks. An expression is also the way to use non-integer numbers or integers longer than 12 digits, by converting them to character strings.
key='SETID || DEPTID'
OR
key='SETID || ''-'' || DEPTID'
OR
key='to_char(NET_PAY)'
Return_value is required. It will be the second column or expression that follows the keyword “select” in the SQL statement that load-lookup constructs. There are two forms for this parameter.
1. A single database column name, with or without alias, with or without quote marks.
return_value=DESCR
OR
return_value=D.DESCR
OR
return_value='DESCR'
OR
return_value='D.DESCR'
2. Any valid SQL expression that combines one or more database columns, with quote marks.
return_value='upper(DESCR)'
OR
return_value='DESCR || ''-'' || LOCATION'
Rows is optional. (Grammatical note: “rows” has the letter “s” at the end, but it is the name of one parameter. Therefore, I use the singular version of the verb “to be.”) It is an estimate of the number of rows that the SQL statement will return, to guide load-lookup in allocating memory. If the rows value is not specified, load-lookup will start with an allocation for 100 rows. If load-lookup appears in the setup section, rows must be a literal number without quote marks. Anywhere else, rows may be a literal number, a numeric variable, or a numeric database column.
Extent is optional. If load-lookup receives more rows from the SQL statement than were specified in the rows parameter, it will allocate more memory according to the number of rows specified in the extent parameter. You would think that extent couldn’t be used unless rows is used and that extent should be less than rows, but neither condition is required. If the extent value is not specified, load-lookup will set it to 25% of the value for the rows parameter. If load-lookup appears in the setup section, extent must be a literal number without quote marks. Anywhere else, extent may be a literal number, a numeric variable, or a numeric database column.
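For illustration, here is a sketch of rows and extent in both situations, using the PS_DEPT_TBL columns from the earlier examples; the row counts are invented. In the setup section, only literals are allowed:
begin-setup
load-lookup
name=depts
table=PS_DEPT_TBL
key=DEPTID
return_value=DESCR
rows=500
extent=100
end-setup
Anywhere else, a numeric variable works too:
let #dept_count = 500
load-lookup
name=depts
table=PS_DEPT_TBL
key=DEPTID
return_value=DESCR
rows=#dept_count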
Where is optional. If it is present, the SQL statement will have a where clause and the text of the where parameter will follow the keyword “where” in the SQL statement. If load-lookup appears in the setup section, where must be a literal string, on one line, enclosed in quote marks. Anywhere else, where can be a literal string, on one line, enclosed in quote marks, or a character string variable, or a character string database column. The value of where is limited to 255 characters.
Sort is optional. It can take one of four values without quote marks: DC means that the database sorts the key and is case sensitive (“A” and “a” are different). DI means that the database sorts the key and is case insensitive (“A” and “a” are the same). DI doesn’t work on every database. SC and SI mean that the keys are sorted by the SQR interpreter (SC case sensitive, SI case insensitive).
If load-lookup doesn’t have a sort parameter, SQR looks for the -LL parameter on the command line, which uses the same four codes. Lacking that, the situation turns muddy. The SQR Language Reference says “By default, SQR lets the database sort the data.” It also says, “The default is SC …”. In my experience, on MS Windows and Oracle, the default is SC.
Quiet is optional and does not have an equals sign or a value. If it is present, load-lookup executes without notifying the user. If it is absent, load-lookup displays one or two lines of text in the terminal (DOS) window while the program executes. That text also appears in the log file. The first line of text declares that the program is loading the lookup array and reports the number of rows in the array. The second line of text appears only if there are duplicate values of the key found by the SQL select statement and reports the number of duplicates.
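Putting the parameters together, here is a sketch of one complete command and the lookup command that later reads from the array (lookup takes the array name, a key value, and a variable for the return value). The EFF_STATUS column and its 'A' value are assumptions about PS_DEPT_TBL; adjust to your own tables.
load-lookup
name=deptnames
table='PS_DEPT_TBL D'
key=DEPTID
return_value=DESCR
where='D.EFF_STATUS = ''A'''
rows=500
sort=SC
quiet
! The SQL statement that load-lookup constructs is roughly:
! select DEPTID, DESCR from PS_DEPT_TBL D where D.EFF_STATUS = 'A'
! Later in the program, read a value out of the array by key:
lookup deptnames $deptid $dept_descr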
Duplicate Keys
A load-lookup command can find duplicate keys even if the underlying table doesn’t have duplicates. For example, the PS_JOB table has a unique, multi-column key consisting of EMPLID, EMPL_RCD, EFFDT, and EFFSEQ. If load-lookup selects the entire table and the key is EMPLID, there will be duplicate keys. We can avoid that in three ways:
1. Use all the columns of the unique, multi-column key in the key parameter.
load-lookup
key='EMPLID || ''@'' || EMPL_RCD || ''@'' || EFFDT || ''@'' || EFFSEQ'
2. Write a where clause that returns a single row for each value of the key parameter we want (a complete version of this command appears after the list).
let $where = 'J.EMPL_RCD = (select max(EMPL_RCD) '
|| 'from PS_JOB '
|| 'where EMPLID = J.EMPLID) '
|| 'and J.EFFDT = (select max(EFFDT) '
|| 'from PS_JOB '
|| 'where EMPLID = J.EMPLID '
|| 'and EMPL_RCD = J.EMPL_RCD '
|| 'and EFFDT <= SYSDATE) '
|| 'and J.EFFSEQ = (select max(EFFSEQ) '
|| 'from PS_JOB '
|| 'where EMPLID = J.EMPLID '
|| 'and EMPL_RCD = J.EMPL_RCD '
|| 'and EFFDT = J.EFFDT) '
load-lookup
key=EMPLID
where=$where
3. Something in between options 1 and 2.
let $where = 'J.EMPL_RCD = (select max(EMPL_RCD) '
|| 'from PS_JOB '
|| 'where EMPLID = J.EMPLID) '
|| 'and J.EFFSEQ = (select max(EFFSEQ) '
|| 'from PS_JOB '
|| 'where EMPLID = J.EMPLID '
|| 'and EMPL_RCD = J.EMPL_RCD '
|| 'and EFFDT = J.EFFDT) '
load-lookup
key='EMPLID || ''@'' || EFFDT'
where=$where
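As promised under option 2, here is that command written out in full. The lookup name, the rows estimate, and DEPTID as the return_value are my choices for the sketch; note that the table gets the alias J because the where clause refers to it.
load-lookup
name=current_jobs
table='PS_JOB J'
key=EMPLID
return_value=DEPTID
where=$where
rows=25000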
Looking Ahead
Next week I’ll try to describe the load-lookup syntax for DDO datasources. Since I’ve never used a DDO datasource, this could be funny … or pathetic.
Brain Teaser
Meanwhile, here is a brain teaser. Please post the solution as a comment.
An SQR program is importing data about people who are already in the database. The Social Security numbers (SSN) of those people are in a table (PS_PERS_NID). The people are identified only by their SSN in the import file and some of the SSN have typos. Explain or provide code for identifying the people with bad SSN.
For the brain teaser, I went to Wikipedia and looked up “Social Security number”. From what I read there, there are not many opportunities for SSN validation. There’s no checksum, and any combination of digits is potentially valid. Other than checking for numeric-only characters and the 3-2-4 digit pattern, what can we do?
(I wonder what SSNs would look like if they had been invented today instead of in the 1930s.)
Social Security numbers (SSN) can start with almost any three-digit combination except 666 – seriously. There are still some three-digit prefixes that haven’t been used, but they’re going fast, so it is unwise to try to validate them. But the brain teaser is not about validating SSN; it is about fixing typos. There’s a basic assumption here, that we have a relatively small group of people (employees of one organization) with a random scattering of SSNs. The incoming SSNs are also not random; they should match the ones in our table. I’m also making an assumption about typos.
You know, the thing about SQL is that there is virtually nothing that can replace it.
Does anyone know if a substitute exists for SQL? I mean besides MS SQL and Oracle and all that jazz. Thanks.
There are object-oriented databases and multidimensional databases. They might not use SQL. There is Open Insight from Revelation Software, which I think of as a hierarchical database. It doesn’t use SQL. There may be other relational databases that don’t use SQL. SQR for DDO uses DDO instead of SQL. I’ve heard of all these things, but I’ve worked almost exclusively with Oracle (therefore SQL) for the past 11 years. Can anyone else comment?
I need a clarification regarding this brain teaser.
When you say typo with the SSN in the file, did you mean having special characters or spaces in the SSN? Or is it something like the digits in the SSN being jumbled, like 123-45-6789 vs. 123-54-6789?
If the scenario is having special characters in the SSN, then:
Let $BLANK = ''
Let $SSN_NOSPLCHR = translate($SSN, './-,', $BLANK)
Do Validate-SSN

Begin-Procedure Validate-SSN
! Look for the cleaned-up SSN in PS_PERS_NID, stripping the same characters on the database side.
Let $FOUND = 'N'
Begin-SELECT
NATIONAL_ID
  Let $FOUND = 'Y'
FROM PS_PERS_NID
WHERE REPLACE(TRANSLATE(NATIONAL_ID, './-,', ' '), ' ', '') = $SSN_NOSPLCHR
End-SELECT
IF $FOUND = 'N'
  Show $SSN ' Does Not exist in the Database'
End-IF
End-Procedure Validate-SSN
Part of the brain teaser is to think about what can go wrong in data entry. You’ve identified the same ones I did: a single wrong character (digit, letter, or special) and transposed digits (54 instead of 45). The only other case I’d suggest is a missing digit: eight characters instead of nine. Also, I assumed that there would not be dashes.
Good start, but I should have told you that PS_PERS_NID stores the NATIONAL_ID without special characters. That would have simplified your where clause. Sorry.
I have a stored proc that runs with no parameters, goes over a wire to a linked server outside the network, and builds a global temp table on the local server of all objects inside the linked server. This proc determines changes at the table, column, and type level compared to a local server and seems to be very quick in execution. However, I started to get concerned when I viewed the execution plan. The largest cost was in a Hash Match lookup (73%). I want the best execution possible on the server side. During server down time, a job will execute this proc and rebuild any changes in the schema dynamically, but for 3 seconds it consumes almost 50% of CPU resources on a virtual server with a 2 GHz quad processor and 3 GB of RAM allocated. This concerns me since the record count is about 2300+ rows and I still haven’t imported the data. Besides upgrading the server or modifying allocation, are there any tricks to balance the load? I have looked into multiple solutions and still face the same load issue.
This question is somewhat outside my expertise (stored procedures, database administration, server management), not specific enough (I can’t tune SQL I haven’t seen), and beyond my resources (I don’t have the systems to check it out). But, here goes. You mention that one virtual server (a simulated server that is a fraction of the actual hardware) worked at 50% (yeah, that’s a lot) for 3 seconds during down time (do you mean while it’s not busy?). The first question is: is this really a problem unless it gets worse?
If it’s really a problem that deserves 100 hours of your time to fix, perhaps you should reconsider your algorithm. It sounds like you’re comparing the data dictionaries of two databases to keep one in sync with the adds and deletes of the other. Are you performing full outer joins? That’s very thorough and a lot of extra work if you’re expecting a 99% match. Another approach would be for each database instance to run an SQR to sort the database objects and write them to disk files. Then another SQR program could read both files into arrays and compare the arrays row by row.
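To make the file-compare idea concrete, here is a rough SQR sketch. It assumes each instance has already written a sorted list of object names, one per line; the file names, array size, and 250-byte record length are placeholders.
begin-procedure Compare-Object-Files
create-array name=objs_a size=10000 field=objname:char
create-array name=objs_b size=10000 field=objname:char
! Load the first file into an array.
let #count_a = 0
open 'server_a_objects.txt' as 1 for-reading record=250:vary
while 1 = 1
  read 1 into $line:250
  if #end-file
    break
  end-if
  put $line into objs_a(#count_a) objname
  add 1 to #count_a
end-while
close 1
! (Repeat the same loop for the second file, filling objs_b and #count_b.)
! Walk both sorted arrays in step; anything unmatched is an add or a delete.
let #a = 0
let #b = 0
while #a < #count_a and #b < #count_b
  get $name_a from objs_a(#a) objname
  get $name_b from objs_b(#b) objname
  if $name_a = $name_b
    add 1 to #a
    add 1 to #b
  else
    if $name_a < $name_b
      show 'Only on server A: ' $name_a
      add 1 to #a
    else
      show 'Only on server B: ' $name_b
      add 1 to #b
    end-if
  end-if
end-while
! Anything left over in either array at this point is also an add or a delete.
end-procedure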
Okay, this is kind of a brute force approach, but:
I’m assuming PERS_NID is too big to comfortably load into an array, so we’ll do it with individual selects.
1) Create a temp table mirroring PERS_NID. Run through the file once looking at PERS_NID for exact matches. For each exact match, knock that row out of the temp table. This reduces the chance of a bad guess in the subsequent steps.
2) Create a local procedure – let’s call it Test-All – which takes two input parms: the SSN being tested and the suspect character position. This procedure tries the temp table for each digit 0 through 9 in that position until it finds a match (presumably by calling another procedure, not by using 10 distinct queries).
3) If the SSN being tested is short one character, do a loop to insert ‘x’ in each position where a character could have been missed and call Test-All specifying that position. (e.g. if you have 12345678, call it for x12345678 with suspect position as 1, 1x2345678 with suspect position as 2, and so on through 12345678x.)
4) Parse the SSN being tested for specials or alphas. If one is found, that’s your suspect character position for calling the Test-All procedure.
5) If no specials or alphas were found in step 4, run Test-All for each position 1 through 9 in sequence, looking for a single wrong digit.
6) If still no match, do a loop to parse the SSN under test and swap pairs of adjacent digits. For instance, if the test is 123456789, then the first loop iteration should be 213456789, the second should be 132456789, and so on. Test each against the temp table.
7) If still no match, do a similar loop to swap pairs of nonadjacent digits.
8) If still no match, the SSN has more than one problem. Print it on a report and send an email to the idiots who sent you the bad import file.
Now that I think about it, you could use all the above logic to build one giant dynamic $Where clause variable containing all the possible variations, instead of hitting the temp table multiple times.
Well done. You’ve addressed some error conditions I didn’t consider. My input data always had 9 digits.
I don’t think PERS_NID is too big for an array. We only need the EMPLID and the NATIONAL_ID. That’s less than 20 bytes per person, and most organizations will have fewer than 1 million rows; twenty megabytes on a server that probably has 100 gigabytes of RAM. You should rarely worry about running out of memory. However, SQL has a much better tool for fuzzy searches, the LIKE comparison, so we’ll use the database anyway.
My source for questionable SSN was a disk file, so I worked with one value at a time. I checked for an exact match first, since I expected to find it 99.9% of the time. For the “row not found” cases, I assumed that the most common errors were (1) one bad digit and (2) two digits swapped. Let’s suppose I was searching for an approximate match to 123-45-6789. I created a loop that generated a series of strings:
%23456789
1%3456789
…
12345678%
You can query for all of these in one SQL statement or in nine. Remember that there could be multiple matches to show the user.
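Here is roughly how I would generate and run those nine queries in SQR. The procedure and variable names are mine, and I am relying on PS_PERS_NID storing NATIONAL_ID as nine digits with no punctuation.
begin-procedure Find-One-Bad-Digit($bad_ssn)
let #pos = 1
while #pos <= 9
  ! Put the wildcard in one position at a time: %23456789, 1%3456789, ...
  let $pattern = substr($bad_ssn, 1, #pos - 1) || '%' || substr($bad_ssn, #pos + 1, 9 - #pos)
  do Search-Pattern($pattern)
  add 1 to #pos
end-while
end-procedure

begin-procedure Search-Pattern($pattern)
begin-select
EMPLID
NATIONAL_ID
  show 'Possible match for ' $pattern ': ' &EMPLID ' ' &NATIONAL_ID
from PS_PERS_NID
where NATIONAL_ID like $pattern
end-select
end-procedure
Each call runs nine short queries, and the select shows every row returned, since more than one employee could match a pattern.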
The second round of queries can feature these strings:
213456789
132456789
124356789
These are harder to construct. The general form is like this:
let $test_string = substr($bad_ssn, 1, #n) || substr($bad_ssn, #n + 2, 1) || substr($bad_ssn, #n + 1, 1) || substr($bad_ssn, #n + 3, 7 - #n)
but it doesn’t work for the first, seventh, eighth, or ninth digit. It might be easier to use literals. If you copy and paste, edit very carefully.
let $s1 = substr($bad_ssn, 1, 1)
let $s2 = substr($bad_ssn, 2, 1)
…
let $s9 = substr($bad_ssn, 9, 1)
let $test1 = $s2 || $s1 || $s3 || $s4 || $s5 || $s6 || $s7 || $s8 || $s9
…
let $test8 = $s1 || $s2 || $s3 || $s4 || $s5 || $s6 || $s7 || $s9 || $s8
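Those eight test strings can then go to the database in one statement. A sketch, with the same assumptions about PS_PERS_NID as above:
begin-select
EMPLID
NATIONAL_ID
  show 'Possible transposition: ' &EMPLID ' ' &NATIONAL_ID
from PS_PERS_NID
where NATIONAL_ID in ($test1, $test2, $test3, $test4, $test5, $test6, $test7, $test8)
end-select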