Home > Articles > Using DISTINCT ON in Postgres

Using DISTINCT ON in Postgres

Every once in a while, I'll have a need to do a one-to-many join, but keep only a certain row in the "many" table. For instance, say we have a system to track inventory in retail stores:

CREATE TABLE store (
    id SERIAL PRIMARY KEY,
    name TEXT
);

CREATE TABLE inventory (
    store_id INTEGER REFERENCES store,
    quantity INTEGER,
    item_name TEXT,
    PRIMARY KEY (store_id, item_name)
);

INSERT INTO store(name) VALUES
    ('School Supplies R Us'),
    ('Grocery Mart');

INSERT INTO inventory VALUES
    (1, 1, 'Backpack'),
    (1, 12, 'Pencil'),
    (1, 4, 'Pen'),
    (2, 12, 'Egg'),
    (2, 1, 'Flour (lb.)');

We can get the inventory for all stores easily enough.

SELECT name, quantity, item_name
FROM inventory
JOIN store ON inventory.store_id = store.id;
         name         | quantity |  item_name
----------------------+----------+-------------
 School Supplies R Us |        1 | Backpack
 School Supplies R Us |       12 | Pencil
 School Supplies R Us |        4 | Pen
 Grocery Mart         |       12 | Egg
 Grocery Mart         |        1 | Flour (lb.)

But what if we only want to get the item with highest quantity from each store? Fortunately, Postgres has a syntax that makes this easy.

SELECT DISTINCT ON(store_id) name, quantity, item_name
FROM inventory
JOIN store ON inventory.store_id = store.id
ORDER BY store_id, quantity DESC;
         name         | quantity | item_name
----------------------+----------+-----------
 School Supplies R Us |       12 | Pencil
 Grocery Mart         |       12 | Egg

What does DISTINCT ON do? Well, it selects the first row out of the set of rows whose values match for the given columns. The first row is arbitrary unless we pass along an ORDER BY statement. Note that we have to include the columns from the ON() clause in our ORDER BY. If we don't, we get a helpful error message:

SELECT DISTINCT ON (store_id) name, quantity, item_name
FROM inventory
JOIN store ON inventory.store_id = store.id
ORDER BY quantity DESC;
ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: SELECT DISTINCT ON (store_id) name, quantity, item_name
                            ^

If you run into a situation wherein you need to choose a specific row in a group based on some rules, try using DISTINCT ON. For more detail, check out the Postgres documentation.