SQL for Breaks in a Sequence

In a previous post, Views You Can Use: Ordinal Numbers, we discussed how to acknowledge children who visited Pee Wee Bees most often.  The little boy who had been here the most (88 times to date) should be in first place.  The little girl who has visited 83 times should be in second place.  It’s easy to count each child’s number of visits; harder to assign ordinal numbers (first, second, third) and recognize ties (two siblings have been here 79 times).

I also have a “Hot Streaks” list that shows the children who have visited Pee Wee Bees at least once per calendar month for at least the past three calendar months.  This was a more challenging query.  A child who visits once in January, once in February, and once in each following month throughout 2012, would have a streak of twelve consecutive months.  A child who visited ten times in September, ten times in October, not once in November, and ten times in December would have a “streak” of one month.

Here is the SQL for the end of February report.  It’s a little more complicated than a report for March through December because those months are at the end of a three month period that is within a single year.

We start by creating a view that combines each child’s multiple visits to one row per month, by grouping  by the child and month.  The NBR_VISITS column shows the number of visits per month.  That’s an interesting number but not relevant to this task.

create view PWB_VISITS_PER_MONTH
as (select CHILD_FIRST_NAME,
           CHILD_LAST_NAME,
           year(VISIT_DATE) as VISIT_YEAR,
           month(VISIT_DATE) as VISIT_MONTH,
           count(*) as NBR_VISITS
      from PWB_WAIVERS
     group by CHILD_FIRST_NAME, CHILD_LAST_NAME, year(VISIT_DATE), month(VISIT_DATE))

This SQL uses SQL Server built-in functions “month” and “year.”  These functions return the number of the month (1 – 12) and the four digit number of the year (e.g. 2012) from a date.

The selection starts by requiring that the child had to have visited in each of the previous three months, the minimum length of a streak I chose to recognize.

select *
  from PWB_VISITS_PER_MONTH A
 where 3 = (select count(*) from PWB_VISITS_PER_MONTH
             where CHILD_FIRST_NAME = A.CHILD_FIRST_NAME
               and CHILD_LAST_NAME = A.CHILD_LAST_NAME
               and ((VISIT_YEAR = 2012 AND VISIT_MONTH = 12) or (VISIT_YEAR = 2013 AND VISIT_MONTH in (1,2)))

It is unusual but legal to have a constant on the left side of the equation, while not every SQL engine will allow a sub-selection on the left side.  The sub-selection counts the number of rows from the PWB_VISITS_PER_MONTH view that occur in the last three months.  The “group by” in that view ensures that each child will have either zero or one row per month.  Requiring the count(*) to equal 3 ensures that the children have visited the store at least once in each of the past three months.  Note that VISIT_MONTH must be January or February, and ignore any March visits, even if the query runs in March.

  and (12 * VISIT_YEAR + VISIT_MONTH) = (select max(12 * VISIT_YEAR + VISIT_MONTH)
                                           from PWB_VISITS_PER_MONTH B
                                          where CHILD_FIRST_NAME = A.CHILD_FIRST_NAME
                                            and CHILD_LAST_NAME = A.CHILD_LAST_NAME
                                            and not exists (select 1 from PWB_VISITS_PER_MONTH
                                                             where CHILD_FIRST_NAME = B.CHILD_FIRST_NAME
                                                               and CHILD_LAST_NAME = B.CHILD_LAST_NAME
                                                               and 12 * VISIT_YEAR + VISIT_MONTH = 12 * B.VISIT_YEAR + B.VISIT_MONTH - 1))

Each qualified child will have a PWB_VISITS_PER_MONTH row for each month of the streak.  It’s easiest to inspect the result set with only one row per child and it’s most useful for that row to represent the first month of the streak.  That indicates when the current streak began, which is what we wish to recognize; not the first month the child ever visited Pee Wee Bees nor the first month of a previous streak.

The first month of any streak is the month with a visit that is one month after a month without a visit.  The first month of the current (or latest) streak is the maximum month with a visit that is one month after a month without a visit.

To keep months in sequence and allow for January to follow December, a month must be represented as a number that reflects the month and year.  The simplest formula for that is 12 times the year (since there are 12 months in a year) plus the month.  January 2000 is 24001, January 2001 is 24013, and June 2012 is 24150.

A child has a streak that starts in June 2012 if that child has:

  1. No PWB_VISITS_PER_MONTH row for 24149.
  2. A PWB_VISITS_PER_MONTH row for 24150.
  3. No case of a two month sequence of {no row, then a row} after 24150.
  4. Rows exist for 24156 (December 2012), 24157 (January 2013), and 24158 (February 2013).

The complete SQL statement is as follows.

select *
  from PWB_VISITS_PER_MONTH A
 where 3 = (select count(*) from PWB_VISITS_PER_MONTH
             where CHILD_FIRST_NAME = A.CHILD_FIRST_NAME
               and CHILD_LAST_NAME = A.CHILD_LAST_NAME
               and ((VISIT_YEAR = 2012 AND VISIT_MONTH = 12) or (VISIT_YEAR = 2013 AND VISIT_MONTH in (1,2)))
  and (12 * VISIT_YEAR + VISIT_MONTH) = (select max(12 * VISIT_YEAR + VISIT_MONTH)
                                           from PWB_VISITS_PER_MONTH B
                                          where CHILD_FIRST_NAME = A.CHILD_FIRST_NAME
                                            and CHILD_LAST_NAME = A.CHILD_LAST_NAME
                                            and not exists (select 1 from PWB_VISITS_PER_MONTH
                                                             where CHILD_FIRST_NAME = B.CHILD_FIRST_NAME
                                                               and CHILD_LAST_NAME = B.CHILD_LAST_NAME
                                                               and 12 * VISIT_YEAR + VISIT_MONTH = 12 * B.VISIT_YEAR + B.VISIT_MONTH - 1))

It is sorted to show the children with the longest streaks first.

Comments are closed.