The Data Studio

Hive: Set Operators Missing

The set operators are union, intersect, and except (called minus in some databases).

These are very useful in many kinds of queries. I use them extensively in Detecting Changed Data and in Data Quality queries.

In all the implementations I know, the set operators are particularly efficient, running consistently faster than joins (which perform a similar, but importantly different, function).
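To make the "similar, but importantly different" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in because it supports intersect and except; the tables a and b and their contents are invented for illustration. The sketch shows behaviour only and says nothing about speed.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 'x'), (1, 'x'), (2, NULL), (3, 'z');
    INSERT INTO b VALUES (1, 'x'), (2, NULL), (4, 'w');
""")

# Set operator: compares whole rows, treats NULLs as equal to each other,
# and removes duplicate rows from the result.
intersect_rows = conn.execute(
    "SELECT id, name FROM a INTERSECT SELECT id, name FROM b ORDER BY 1"
).fetchall()

# Inner join on the same columns: NULL = NULL is not true, so the NULL row
# is lost, and the duplicate row in a is returned twice.
join_rows = conn.execute(
    "SELECT a.id, a.name FROM a JOIN b ON a.id = b.id AND a.name = b.name "
    "ORDER BY 1"
).fetchall()

# Set difference: rows in a that do not appear in b.
except_rows = conn.execute(
    "SELECT id, name FROM a EXCEPT SELECT id, name FROM b ORDER BY 1"
).fetchall()

print("intersect:", intersect_rows)  # [(1, 'x'), (2, None)]
print("join:     ", join_rows)       # [(1, 'x'), (1, 'x')]
print("except:   ", except_rows)     # [(3, 'z')]
```

The intersect keeps the row with the NULL name and returns each row once, while the plain join drops the NULL row and repeats the duplicate. Those two differences, NULL handling and duplicate removal, are what a join-based substitute has to reproduce by hand.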

Hive does provide a "union" operator, but it does not provide the other set operators: intersect and except (or minus).

This omission makes it significantly more difficult to write certain queries, and therefore increases development costs.
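As a rough idea of the extra work involved, the sketch below emulates intersect and except with joins and checks the results against the native operators. It runs on SQLite through Python's sqlite3 module, not on Hive. The tables, the "present" marker column, and the use of SQLite's null-safe IS comparison are all assumptions made for illustration, and the Hive syntax for a null-safe comparison would need to be checked for the version in use.

```python
import sqlite3

# In-memory database with two small, made-up tables (including NULL keys).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 'x'), (1, 'x'), (2, NULL), (3, 'z'), (NULL, 'n');
    INSERT INTO b VALUES (1, 'x'), (2, NULL), (4, 'w');
""")

def rows(sql):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Native set operators, used here only as the reference result.
native_intersect = rows(
    "SELECT id, name FROM a INTERSECT SELECT id, name FROM b ORDER BY 1, 2")
native_except = rows(
    "SELECT id, name FROM a EXCEPT SELECT id, name FROM b ORDER BY 1, 2")

# Emulated intersect: inner join on every column with a null-safe
# comparison (IS in SQLite), plus DISTINCT to remove duplicates.
emulated_intersect = rows("""
    SELECT DISTINCT a.id, a.name
    FROM a
    JOIN b ON a.id IS b.id AND a.name IS b.name
    ORDER BY 1, 2
""")

# Emulated except: left outer join to b, keeping rows with no match.
# The constant 'present' column marks a match reliably even when the
# real columns of b are themselves NULL.
emulated_except = rows("""
    SELECT DISTINCT a.id, a.name
    FROM a
    LEFT OUTER JOIN (SELECT DISTINCT id, name, 1 AS present FROM b) AS m
      ON a.id IS m.id AND a.name IS m.name
    WHERE m.present IS NULL
    ORDER BY 1, 2
""")

print("intersect matches:", emulated_intersect == native_intersect)
print("except matches:   ", emulated_except == native_except)
```

Each emulation has to list every column in the join condition, handle NULLs explicitly, and remove duplicates, all of which the set operators do implicitly. With wide tables the join conditions grow with the number of columns, which is where the extra development cost comes from.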