Manipulating arrays with PostgreSQL

John Carrot — Sat, 12 Oct 2019 08:44:18 +0000

Before we move on to manipulating array values stored in the database, it is useful to first learn the basics.

Representing a list

SELECT '{a,b,c}'::text[];

   text 
----------
 {a,b,c}
(1 row)

We have to specify the type. Else postgres would not know the type since it looks like a string.

Or we could also use the ARRAY[] style

# SELECT ARRAY['a','b','c'];

 array 
-------
 {a,b,c}

`SELECT`ing `FROM` an array

So why doesn’t the following work?

# SELECT * FROM ARRAY['a','b','c']

ERROR:  42601: syntax error at or near "ARRAY"
LINE 1: SELECT * FROM ARRAY['a','b','c'];
                      ^
LOCATION:  scanner_yyerror, scan.l:1134

SELECT FROM expects that the data source is:

a set of rows
OR points to a set of word (table)

ARRAY[] returns an atomic value. Which is why it works with just SELECT but fails when FROM is used.

Convert arrays into rows using `UNNEST`

# SELECT UNNEST(ARRAY['a','b']);

 unnest 
--------
 a
 b
(2 rows)

Convert rows into arrays using `ARRAY()`

Watch out. We are not using ARRAY[]. We are using array().

SELECT array(
  SELECT UNNEST(ARRAY['a','b'])
);

 array 
-------
 {a,b}
(1 row)

To summarize:

UNNEST converts array into a series of rows.

array() converts a bunch of rows into an array.

ARRAY[] used to represent/define an array.

Concatenate arrays using `array_cat()`

# SELECT array_cat('{a,b}'::text[], '{b,c}'::text[]);

 array_cat 
-----------
 {a,b,b,c}
(1 row)

We coerced the argument into text arrays to indicate to array_cat what data types it was dealing with. If we used the ARRAY[] initializer, this would not be required. Example below.

SELECT array_cat(ARRAY['a','b'], ARRAY['b', 'c']);

Notice that the final array contains b twice. We would love to remove duplicates. There are two ways to do it and we'll explore both below.

Removing duplicates from an array using `DISTINCT`

This method works with integers, text, float, etc

DISTINCT helps remove duplicates from all data types. But it requires rows as input. Not arrays.

So first convert an array to rows using UNNEST
Pass it to DISTINCT
And then if you still require an array as output, use array()

SELECT DISTINCT UNNEST(
  '{apple, banana, cat, apple}'::text[]
);

 unnest 
--------
 banana
 cat
 apple
(3 rows)

SELECT DISTINCT UNNEST(
  '{1.245, 1.3, 1.5, 1.2}'::float[]
) ORDER BY 1;

And we have columns. But we need an array back as output. So convert using array()

SELECT array(
  SELECT DISTINCT UNNEST(
    '{apple, banana, cat, apple}'::text[]
  )
);

       array        
--------------------
 {banana,cat,apple}
(1 row)

What if we want it sorted? Use ORDER BY 1

SELECT array(
  SELECT DISTINCT UNNEST(
    '{apple, banana, cat, apple}'::text[]
  ) ORDER BY 1
);

ORDER BY 1 would sort by the first column (We could starting with 1 here). In this case we are working with on-the-fly arrays, without giving them column names. So we use column number references.

Want to try this with a float array? Let's do it.

SELECT array(
  SELECT DISTINCT UNNEST(
    '{1.2, 1.245, 2.3, 1.245, 1.246}'::text[]
  )
);

         array         
-----------------------
 {1.2,1.246,2.3,1.245}
(1 row)

Removing duplicates from integer arrays using `sort` and `uniq`

Caveat: This method only works for integer arrays (yeah so no floats too).

sort and uniq are functions available in the intarray extension. Enable the extension in your database by running the following statement.

CREATE EXTENSION IF NOT EXISTS intarray;

Try using the uniq function.

SELECT uniq(
  '{1,2,1,3}'::int[]
);

   uniq    
-----------
 {1,2,1,3}
(1 row)

Oops. Does not work. uniq only works on a sorted list. So use the sort function to sort it first.

SELECT uniq(
  sort(
    '{1,2,1,3}'::int[]
  )
);

  uniq   
---------
 {1,2,3}
(1 row)

I haven't yet benchmarked in detail which method works the fastest for sorting integers. I used both methods as a part of another benchmarking to update 100k rows (coming soon). For now, all I can say is, both aren't slow. I'll try a more detailed benchmarking another day.

What else?

These are essentials that I find useful to know. There are a lot more array functions that Postgres supports. Go check it out.

I'll be sharing more from my notes on PostgreSQL. Subscribe to get updates.

Generate dummy data with Postgres

John Carrot — Sat, 12 Oct 2019 07:21:21 +0000

Generate 5 records

# SELECT generate_series(1, 5);

 generate_series 
-----------------
               1
               2
               3
               4
               5
(5 rows)

Generate in increments of 5

Start with 1, and generates a series by adding 5 to the previous number.

# SELECT * FROM generate_series(1,25,5)

 generate_series 
-----------------
               1
               6
              11
              16
              21
(5 rows)

Generate a series with date

RedShift, as of OCT-2019 is based on Postgres 8.0. Passing timestamps as intervals is not supported with Redshift. So datetime ranges will not work on Redshift.

SELECT generate_series(
  '2019-10-1',
  '2019-10-05',
  '1 day'::interval
);

      generate_series      
---------------------------
 2019-10-01 00:00:00+05:30
 2019-10-02 00:00:00+05:30
 2019-10-03 00:00:00+05:30
 2019-10-04 00:00:00+05:30
 2019-10-05 00:00:00+05:30
(5 rows)

The third argument for step interval has to be passed, without which Postgres would not know how much to increment the step for date ranges. We got a list of timestamps. Let us coerce that into a date type instead.

SELECT generate_series(
  '2019-10-1',
  '2019-10-05',
  '1 day'::interval
)::date;

 generate_series 
-----------------
 2019-10-01
 2019-10-02
 2019-10-03
 2019-10-04
 2019-10-05
(5 rows)

We could also do weekly intervals instead of days.

SELECT generate_series(
  '2019-12-1',
  '2019-12-31',
  '1 week'::interval
);

      generate_series      
---------------------------
 2019-12-01 00:00:00+05:30
 2019-12-08 00:00:00+05:30
 2019-12-15 00:00:00+05:30
 2019-12-22 00:00:00+05:30
 2019-12-29 00:00:00+05:30
(5 rows)

Try playing with other intervals - 5 weeks, 10 days, 3 months, etc.

Want more columns?

# SELECT generate_series(1, 5), 'hello', '123';

 generate_series | ?column? | ?column? 
-----------------+----------+----------
               1 | hello    | 123
               2 | hello    | 123
               3 | hello    | 123
               4 | hello    | 123
               5 | hello    | 123
(5 rows)

Do more?

SELECT
  number,
  concat('hello', number)
FROM
  generate_series(1, 5) AS number


 number | concat 
--------+--------
      1 | hello1
      2 | hello2
      3 | hello3
      4 | hello4
      5 | hello5
(5 rows)

Forem: John Carrot