# Linear interpolation in Postgres using `generate_series`

I like to keep track of how many miles I'm driving in my car. One conceivable way of doing this is to create a table in a Postgres database in which I can track this information.

```CREATE TABLE mileage (
observed_date DATE,
observed_mileage INTEGER
);
```

Unfortunately, I'm not always the most regular data collector. I often collect this data with gaps of days or months between each reading.

```INSERT INTO mileage(observed_date, observed_mileage) VALUES
('2018-05-21', 84088),
('2018-05-26', 84201),
('2018-06-13', 84910);
```

I want to get some sense for how much I'm driving each day, and one reasonable way I might do that is to linearly interpolate the mileage between readings. For instance, if I see a reading of 10,000 on August 1st and a reading of 11,000 on August 10th, I want to see that on average I drove 100 miles each day 1-10 August.

How can we do this in Postgres? First, we pair the data up:

```SELECT LAG(observed_date) OVER (ORDER BY observed_date) AS lag_date
, LAG(observed_mileage) OVER (ORDER BY observed_mileage) AS lag_mi
, observed_date AS obs_date
, observed_mileage AS obs_mi
FROM mileage;
```

which yields result:

``````  lag_date  | lag_mi |  obs_date  | obs_mi
------------+--------+------------+--------
|        | 2018-05-21 |  84088
2018-05-21 |  84088 | 2018-05-26 |  84201
2018-05-26 |  84201 | 2018-06-13 |  84910
``````

Then, we generate a series between each pair of dates:

```WITH paired_dates AS (
SELECT LAG(observed_date) OVER (ORDER BY observed_date) AS lag_date
, LAG(observed_mileage) OVER (ORDER BY observed_mileage) AS lag_mi
, observed_date AS obs_date
, observed_mileage AS obs_mi
FROM mileage
)
SELECT *
FROM paired_dates
, generate_series(lag_date, obs_date, INTERVAL '1 day') days(driven_date)
LIMIT 10;
```

which yields result:

``````  lag_date  | lag_mi |  obs_date  | obs_mi |      driven_date
------------+--------+------------+--------+------------------------
2018-05-21 |  84088 | 2018-05-26 |  84201 | 2018-05-21 00:00:00+00
2018-05-21 |  84088 | 2018-05-26 |  84201 | 2018-05-22 00:00:00+00
2018-05-21 |  84088 | 2018-05-26 |  84201 | 2018-05-23 00:00:00+00
2018-05-21 |  84088 | 2018-05-26 |  84201 | 2018-05-24 00:00:00+00
2018-05-21 |  84088 | 2018-05-26 |  84201 | 2018-05-25 00:00:00+00
2018-05-21 |  84088 | 2018-05-26 |  84201 | 2018-05-26 00:00:00+00
2018-05-26 |  84201 | 2018-06-13 |  84910 | 2018-05-26 00:00:00+00
2018-05-26 |  84201 | 2018-06-13 |  84910 | 2018-05-27 00:00:00+00
2018-05-26 |  84201 | 2018-06-13 |  84910 | 2018-05-28 00:00:00+00
2018-05-26 |  84201 | 2018-06-13 |  84910 | 2018-05-29 00:00:00+00
``````

Note that `2018-05-26` occurs twice in the `driven_date` column. We can fix that by stopping our series just before getting to the later date:

```WITH paired_dates AS (
SELECT LAG(observed_date) OVER (ORDER BY observed_date) AS lag_date
, LAG(observed_mileage) OVER (ORDER BY observed_mileage) AS lag_mi
, observed_date AS obs_date
, observed_mileage AS obs_mi
FROM mileage
)
SELECT *
FROM paired_dates
, generate_series(lag_date, obs_date - INTERVAL '1 minute', INTERVAL '1 day') days(driven_date)
LIMIT 10;
```

Anyway, now we need to calculate the actual number of miles driven on the `driven_date`.

```WITH paired_dates AS (
SELECT LAG(observed_date) OVER (ORDER BY observed_date) AS lag_date
, LAG(observed_mileage) OVER (ORDER BY observed_mileage) AS lag_mi
, observed_date AS obs_date
, observed_mileage AS obs_mi
FROM mileage
)
SELECT driven_date
, (obs_mi - lag_mi)::NUMERIC / (obs_date - lag_date) AS miles_driven
FROM paired_dates
, generate_series(lag_date, obs_date - INTERVAL '1 minute', INTERVAL '1 day') days(driven_date)
LIMIT 10;
```

which yields result:

``````     driven_date     |    miles_driven
---------------------+---------------------
2018-05-21 00:00:00 | 22.6000000000000000
2018-05-22 00:00:00 | 22.6000000000000000
2018-05-23 00:00:00 | 22.6000000000000000
2018-05-24 00:00:00 | 22.6000000000000000
2018-05-25 00:00:00 | 22.6000000000000000
2018-05-26 00:00:00 | 39.3888888888888889
2018-05-27 00:00:00 | 39.3888888888888889
2018-05-28 00:00:00 | 39.3888888888888889
2018-05-29 00:00:00 | 39.3888888888888889
2018-05-30 00:00:00 | 39.3888888888888889
``````

And thus I have achieved the desired result.