GINQ (Groovy-Integrated Query) lets you query in-memory collections using familiar SQL-like syntax. It also works with parsed XML, JSON, YAML, and other formats that produce collections.
The heart of GINQ is its query-like expressions, which can be very simple:
from n in [0, 1, 2]
select n
Or involve multiple clauses:
from s in sales
where s.amount > 50
groupby s.customer as customer into g
having g.count() > 1
orderby g.customer
select g.customer, g.count() as items, g.sum(s -> s.amount) as total
A GINQ expression can include the following clauses:
-
from— the data source (required). Accepts anyIterable,Stream, array, or GINQ result set. -
join— combines data from additional sources (join,leftjoin,rightjoin,fulljoin,crossjoin), matched using anoncondition. -
where— filters rows before grouping, ass.amount > 50does above. -
groupby— groups rows by one or more expressions. Theintoclause binds the group to a variable (likegabove) that supports aggregate methods such ascount(),sum(),min(),max(),avg(), andtoList(). Named keys (viaas) can be accessed as properties, e.g.g.customer. -
having— filters groups after aggregation, asg.count() > 1does above. -
orderby— sorts results in ascending (in asc, the default) or descending (in desc) order, withnullslast/nullsfirstcontrol. -
limit— restricts output to a given size, with optional offset for pagination. -
select— the projection (required). Defines the output columns, with optionalasaliases. -
union,unionall,intersect,minus— set operations that combine two completefrom…selectexpressions.
GINQ also supports window functions and nested subqueries — see Advanced Topics for details.
1. Basics
This section covers how to embed GINQ in your code and walks through each clause: data sources, projection, filtering, joining, grouping with aggregation, sorting, and pagination.
1.1. Integration
Other code in your scripts and classes may use the same names as the SQL-like syntax, such as from, select, where, etc. To avoid conflicts, GINQ expressions must be wrapped in a GQ block, which serves as a marker for the GINQ parser to recognize and transform the SQL-like syntax into the underlying fluent API calls.
A GINQ expression is wrapped in a GQ block and returns a lazy Queryable result:
def result = GQ {
/* GINQ SQL-LIKE EXPRESSION */
}
def stream = result.stream() // get the stream from GINQ result
def list = result.toList() // get the list from GINQ result
Since the result is a standard Queryable/Stream, it composes naturally with other Groovy and JDK library calls:
import java.util.stream.Collectors
def numbers = [0, 1, 2]
assert '0#1#2' == GQ {
from n in numbers
select n
}.stream()
.map(e -> String.valueOf(e))
.collect(Collectors.joining('#'))
Two other markers, GQL and @GQ, are also supported.
GQL is a shorthand that returns a List directly (equivalent to GQ {…}.toList()):
def list = GQL {
/* GINQ SQL-LIKE EXPRESSION */
}
GINQ can also be used as a method annotation — see <<`@GQ` Annotation>> in Advanced Topics:
@GQ
def queryMethod() {
/* GINQ SQL-LIKE EXPRESSION */
}
1.2. Data Source
The data source for GINQ could be specified by from clause, which is equivalent to SQL’s FROM.
Currently GINQ supports Iterable, Stream, array and GINQ result set as its data source:
1.2.1. Iterable Data Source
from n in [1, 2, 3] select n
1.2.2. Stream Data Source
from n in [1, 2, 3].stream() select n
1.2.3. Array Data Source
from n in new int[] {1, 2, 3} select n
1.2.4. GINQ Result Set Data Source
def vt = GQ {from m in [1, 2, 3] select m}
assert [1, 2, 3] == GQ {
from n in vt select n
}.toList()
1.3. Projection
The column names could be renamed with as clause:
def result = GQ {
from n in [1, 2, 3]
select Math.pow(n, 2) as powerOfN
}
assert [[1, 1], [4, 4], [9, 9]] == result.stream().map(r -> [r[0], r.powerOfN]).toList()
The renamed column could be referenced by its new name, e.g. r.powerOfN.
Also, it could be referenced by its index, e.g. r[0]
|
assert [[1, 1], [2, 4], [3, 9]] == GQ {
from v in (
from n in [1, 2, 3]
select n, Math.pow(n, 2) as powerOfN
)
select v.n, v.powerOfN
}.toList()
select P1, P2, …, Pn is a simplified syntax of select new NamedRecord(P1, P2, …, Pn) when and only when n >= 2.
Also, NamedRecord instance will be created if as clause is used.
The values stored in the NamedRecord could be referenced by their names.
|
Construct new objects as column values:
@groovy.transform.EqualsAndHashCode
class Person {
String name
Person(String name) {
this.name = name
}
}
def persons = [new Person('Daniel'), new Person('Paul'), new Person('Eric')]
assert persons == GQ {
from n in ['Daniel', 'Paul', 'Eric']
select new Person(n)
}.toList()
1.3.1. Distinct
distinct is equivalent to SQL’s DISTINCT
def result = GQ {
from n in [1, 2, 2, 3, 3, 3]
select distinct(n)
}
assert [1, 2, 3] == result.toList()
def result = GQ {
from n in [1, 2, 2, 3, 3, 3]
select distinct(n, n + 1)
}
assert [[1, 2], [2, 3], [3, 4]] == result.toList()
1.4. Filtering
where is equivalent to SQL’s WHERE
from n in [0, 1, 2, 3, 4, 5]
where n > 0 && n <= 3
select n * 2
1.4.1. In
from n in [0, 1, 2]
where n in [1, 2]
select n
from n in [0, 1, 2]
where n in (
from m in [1, 2]
select m
)
select n
import static groovy.lang.Tuple.tuple
assert [0, 1] == GQ {
from n in [0, 1, 2]
where tuple(n, n + 1) in (
from m in [1, 2]
select m - 1, m
)
select n
}.toList()
1.4.2. Not In
from n in [0, 1, 2]
where n !in [1, 2]
select n
from n in [0, 1, 2]
where n !in (
from m in [1, 2]
select m
)
select n
import static groovy.lang.Tuple.tuple
assert [2] == GQ {
from n in [0, 1, 2]
where tuple(n, n + 1) !in (
from m in [1, 2]
select m - 1, m
)
select n
}.toList()
1.4.3. Exists
from n in [1, 2, 3]
where (
from m in [2, 3]
where m == n
select m
).exists()
select n
1.4.4. Not Exists
from n in [1, 2, 3]
where !(
from m in [2, 3]
where m == n
select m
).exists()
select n
1.5. Joining
More data sources for GINQ could be specified by join clauses.
from n1 in [1, 2, 3]
join n2 in [1, 3] on n1 == n2
select n1, n2
join is preferred over innerjoin and innerhashjoin as it has better readability,
and it is smart enough to choose the correct concrete join(i.e. innerjoin or innerhashjoin) by its on clause.
|
from n1 in [1, 2, 3]
innerjoin n2 in [1, 3] on n1 == n2
select n1, n2
from n1 in [1, 2, 3]
leftjoin n2 in [2, 3, 4] on n1 == n2
select n1, n2
from n1 in [2, 3, 4]
rightjoin n2 in [1, 2, 3] on n1 == n2
select n1, n2
from n1 in [1, 2, 3]
fulljoin n2 in [2, 3, 4] on n1 == n2
select n1, n2
from n1 in [1, 2, 3]
crossjoin n2 in [3, 4, 5]
select n1, n2
hash join is especially efficient when data sources contain lots of objects
from n1 in [1, 2, 3]
innerhashjoin n2 in [1, 3] on n1 == n2
select n1, n2
from n1 in [1, 2, 3]
lefthashjoin n2 in [2, 3, 4] on n1 == n2
select n1, n2
from n1 in [2, 3, 4]
righthashjoin n2 in [1, 2, 3] on n1 == n2
select n1, n2
from n1 in [1, 2, 3]
fullhashjoin n2 in [2, 3, 4] on n1 == n2
select n1, n2
Only binary expressions(==, &&) are allowed in the on clause of hash join
|
1.6. Grouping
groupby is equivalent to SQL’s GROUP BY, and having is equivalent to SQL’s HAVING.
The into clause binds the grouped result to a named variable, enabling direct
method calls for key access and aggregates. The variable is a GroupResult which
extends Queryable, so all aggregate methods (count(), sum(), toList(), etc.)
are available as real method calls:
assert [[1, 2], [3, 2], [6, 3]] == GQ {
from n in [1, 1, 3, 3, 6, 6, 6]
groupby n into g
select g.key, g.count()
}.toList()
The group variable supports all Queryable aggregate methods:
assert [[1, 2], [3, 6], [6, 18]] == GQ {
from n in [1, 1, 3, 3, 6, 6, 6]
groupby n into g
select g.key, g.sum(n -> n)
}.toList()
For multi-key grouping, individual keys can be named with as and accessed
as properties or via subscript:
def result = GQ {
from e in employees
groupby e.dept as department, e.role as role into g
select g.department, g.role, g.count()
}.toList()
Subscript access (g['department']) and explicit g.get('department') are also supported.
having works with into — the group variable can be used directly in the condition:
assert [[6, 3]] == GQ {
from n in [1, 1, 3, 3, 6, 6, 6]
groupby n into g
having g.count() > 2
select g.key, g.count()
}.toList()
The where clause after groupby…into is reserved for future use;
use having for now.
|
1.6.1. Aggregate Functions
GINQ provides some built-in aggregate functions:
| Function | Argument Type(s) | Return Type | Description |
|---|---|---|---|
count() |
java.lang.Long |
number of rows, similar to |
|
count(expression) |
any |
java.lang.Long |
number of rows for which the value of expression is not |
min(expression) |
java.lang.Comparable |
same as argument type |
minimum value of expression across all non-null values |
max(expression) |
java.lang.Comparable |
same as argument type |
maximum value of expression across all non-null values |
sum(expression) |
java.lang.Number |
java.math.BigDecimal |
sum of expression across all non-null values |
avg(expression) |
java.lang.Number |
java.math.BigDecimal |
the average (arithmetic mean) of all non-null values |
list(expression) |
any |
java.util.List |
the aggregated list of all non-null values |
median(expression) |
java.lang.Number |
java.math.BigDecimal |
value such that the number of non-null values above and below it is the same ("middle" value, not necessarily same as average or mean) |
stdev(expression) |
java.lang.Number |
java.math.BigDecimal |
the statistical standard deviation of all non-null values |
stdevp(expression) |
java.lang.Number |
java.math.BigDecimal |
the statistical standard deviation for the population for all non-null values |
var(expression) |
java.lang.Number |
java.math.BigDecimal |
the statistical variance of all non-null values |
varp(expression) |
java.lang.Number |
java.math.BigDecimal |
the statistical variance for the population for all non-null values |
agg(expression) |
any |
any |
customizes the aggregation logic in expression and returns single value |
from n in [1, 1, 3, 3, 6, 6, 6]
groupby n
select n, count()
from s in ['a', 'b', 'cd', 'ef']
groupby s.size() as length
select length, min(s)
from s in ['a', 'b', 'cd', 'ef']
groupby s.size() as length
select length, max(s)
from n in [1, 1, 3, 3, 6, 6, 6]
groupby n
select n, sum(n)
from n in [1, 1, 3, 3, 6, 6, 6]
groupby n
select n, avg(n)
from n in [1, 1, 3, 3, 6, 6, 6]
groupby n
select n, median(n)
assert [['A', ['APPLE', 'APRICOT']],
['B', ['BANANA']],
['C', ['CANTALOUPE']]] == GQL {
from fruit in ['Apple', 'Apricot', 'Banana', 'Cantaloupe']
groupby fruit[0] as firstChar
select firstChar, list(fruit.toUpperCase()) as fruit_list
}
def persons = [new Person('Linda', 100, 'Female'),
new Person('Daniel', 135, 'Male'),
new Person('David', 122, 'Male')]
assert [['Male', ['Daniel', 'David']], ['Female', ['Linda']]] == GQL {
from p in persons
groupby p.gender
select p.gender, list(p.name)
}
from n in [1, 1, 3, 3, 6, 6, 6]
groupby n
select n, agg(_g.stream().map(r -> r.n).reduce(BigDecimal.ZERO, BigDecimal::add))
_g is an implicit variable for agg aggregate function,
it represents the grouped Queryable object and its record(e.g. r) could reference the data source by alias(e.g. n)
|
from fruit in ['Apple', 'Apricot', 'Banana', 'Cantaloupe']
groupby fruit.substring(0, 1) as firstChar
select firstChar, agg(_g.stream().map(r -> r.fruit).toList()) as fruit_list
Also, we could apply the aggregate functions for the whole GINQ result, i.e. no groupby clause is needed:
assert [3] == GQ {
from n in [1, 2, 3]
select max(n)
}.toList()
assert [[1, 3, 2, 2, 6, 3, 3, 6]] == GQ {
from n in [1, 2, 3]
select min(n), max(n), avg(n), median(n), sum(n), count(n), count(),
agg(_g.stream().map(r -> r.n).reduce(BigDecimal.ZERO, BigDecimal::add))
}.toList()
assert [0.816496580927726] == GQ {
from n in [1, 2, 3]
select stdev(n)
}.toList()
assert [1] == GQ {
from n in [1, 2, 3]
select stdevp(n)
}.toList()
assert [0.6666666666666667] == GQ {
from n in [1, 2, 3]
select var(n)
}.toList()
assert [1] == GQ {
from n in [1, 2, 3]
select varp(n)
}.toList()
1.7. Sorting
orderby is equivalent to SQL’s ORDER BY
from n in [1, 5, 2, 6]
orderby n
select n
in asc is optional when sorting in ascending order
|
from n in [1, 5, 2, 6]
orderby n in asc
select n
from n in [1, 5, 2, 6]
orderby n in desc
select n
from s in ['a', 'b', 'ef', 'cd']
orderby s.length() in desc, s in asc
select s
from s in ['a', 'b', 'ef', 'cd']
orderby s.length() in desc, s
select s
from n in [1, null, 5, null, 2, 6]
orderby n in asc(nullslast)
select n
nullslast is equivalent to SQL’s NULLS LAST and applied by default.
nullsfirst is equivalent to SQL’s NULLS FIRST.
|
from n in [1, null, 5, null, 2, 6]
orderby n in asc(nullsfirst)
select n
from n in [1, null, 5, null, 2, 6]
orderby n in desc(nullslast)
select n
from n in [1, null, 5, null, 2, 6]
orderby n in desc(nullsfirst)
select n
1.8. Pagination
limit is similar to the limit clause of MySQL, which could specify the offset(first argument) and size(second argument) for paginating,
or just specify the only one argument as size
from n in [1, 2, 3, 4, 5]
limit 3
select n
from n in [1, 2, 3, 4, 5]
limit 1, 3
select n
1.9. Set Operations
Set operations combine the results of two complete GINQ expressions.
union returns distinct rows from both queries (duplicates removed):
assert [1, 2, 3, 4, 5] == GQ {
from n in [1, 2, 3]
select n
union
from m in [3, 4, 5]
select m
}.toList()
unionall returns all rows from both queries (duplicates preserved):
assert [1, 2, 3, 3, 4, 5] == GQ {
from n in [1, 2, 3]
select n
unionall
from m in [3, 4, 5]
select m
}.toList()
intersect returns only rows that appear in both queries:
assert [3, 4] == GQ {
from n in [1, 2, 3, 4]
select n
intersect
from m in [3, 4, 5, 6]
select m
}.toList()
minus returns rows from the first query that do not appear in the second:
assert [1, 2] == GQ {
from n in [1, 2, 3, 4]
select n
minus
from m in [3, 4, 5, 6]
select m
}.toList()
Each side of a set operation is a complete from…select expression and can include
where, groupby, orderby, limit, and any other clauses.
The set operation keyword must appear on its own line.
Set operations are also available as methods on any Queryable result:
q1.union(q2), q1.intersect(q2), q1.minus(q2).
|
2. Common Recipes
Recipes for everyday tasks: row numbering, list comprehensions, querying JSON, updating collections, and workarounds for SQL features not yet in GINQ.
2.1. Row Number
_rn is the implicit variable representing row number for each record in the result set. It starts with 0
from n in [1, 2, 3]
select _rn, n
2.2. List Comprehension
List comprehension is an elegant way to define and create lists based on existing lists:
assert [4, 16, 36, 64, 100] == GQ {from n in 1..<11 where n % 2 == 0 select n ** 2}.toList()
assert [4, 16, 36, 64, 100] == GQ {from n in 1..<11 where n % 2 == 0 select n ** 2} as List
assert [4, 16, 36, 64, 100] == GQL {from n in 1..<11 where n % 2 == 0 select n ** 2}
GQL {…} is the abbreviation of GQ {…}.toList()
|
GINQ could be used as list comprehension in the loops directly:
def result = []
for (def x : GQ {from n in 1..<11 where n % 2 == 0 select n ** 2}) {
result << x
}
assert [4, 16, 36, 64, 100] == result
2.3. Query JSON
def json = new JsonSlurper().parseText('''
{
"fruits": [
{"name": "Orange", "price": 11},
{"name": "Apple", "price": 6},
{"name": "Banana", "price": 4},
{"name": "Mango", "price": 28},
{"name": "Durian", "price": 32}
]
}
''')
assert [['Mango', 28], ['Orange', 11], ['Apple', 6], ['Banana', 4]] == GQ {
from f in json.fruits
where f.price < 30
orderby f.price in desc
select f.name, f.price
}.toList()
2.4. Query & Update
This is like update statement in SQL
import groovy.transform.*
@TupleConstructor
@EqualsAndHashCode
@ToString
class Person {
String name
String nickname
}
def linda = new Person('Linda', null)
def david = new Person('David', null)
def persons = [new Person('Daniel', 'ShanFengXiaoZi'), linda, david]
def result = GQ {
from p in persons
where p.nickname == null
select p
}.stream()
.peek(p -> { p.nickname = 'Unknown' }) // update `nickname`
.toList()
def expected = [new Person('Linda', 'Unknown'), new Person('David', 'Unknown')]
assert expected == result
assert ['Unknown', 'Unknown'] == [linda, david]*.nickname // ensure the original objects are updated
2.5. Alternative for with clause
GINQ does not support with clause for now, but we could define a temporary variable to workaround:
def v = GQ { from n in [1, 2, 3] where n < 3 select n }
def result = GQ {
from n in v
where n > 1
select n
}
assert [2] == result.toList()
2.6. Alternative for case-when
case-when of SQL could be replaced with switch expression:
assert ['a', 'b', 'c', 'c'] == GQ {
from n in [1, 2, 3, 4]
select switch (n) {
case 1 -> 'a'
case 2 -> 'b'
default -> 'c'
}
}.toList()
3. Advanced Topics
This section covers window functions, nested subqueries, the classic (pre-into) groupby style,
the @GQ annotation for declaring GINQ methods, parallel querying, and options for customizing
and optimizing GINQ. It also includes the full syntax reference.
3.1. Window Functions
Window can be defined by partitionby, orderby, rows and range:
over(
[partitionby <expression> (, <expression>)*]
[orderby <expression> (, <expression>)*
[rows <lower>, <upper> | range <lower>, <upper>]]
)
-
0used as bound ofrowsandrangeclause is equivalent to SQL’sCURRENT ROW, and negative meansPRECEDING, positive meansFOLLOWING -
nullused as the lower bound ofrowsandrangeclause is equivalent to SQL’sUNBOUNDED PRECEDING -
nullused as the upper bound ofrowsandrangeclause is equivalent to SQL’sUNBOUNDED FOLLOWING
Also, GINQ provides some built-in window functions:
| Function | Argument Type(s) | Return Type | Description |
|---|---|---|---|
rowNumber() |
java.lang.Long |
number of the current row within its partition, counting from |
|
rank() |
java.lang.Long |
rank of the current row with gaps |
|
denseRank() |
java.lang.Long |
rank of the current row without gaps |
|
percentRank() |
java.math.BigDecimal |
relative rank of the current row: (rank - 1) / (total rows - 1) |
|
cumeDist() |
java.math.BigDecimal |
relative rank of the current row: (number of rows preceding or peer with current row) / (total rows) |
|
ntile(expression) |
java.lang.Long |
java.lang.Long |
bucket index ranging from |
lead(expression [, offset [, default]]) |
any [, java.lang.Long [, same as expression type]] |
same as expression type |
returns expression evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead return default (which must be of the same type as expression). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to |
lag(expression [, offset [, default]]) |
any [, java.lang.Long [, same as expression type]] |
same as expression type |
returns expression evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead return default (which must be of the same type as expression). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to |
firstValue(expression) |
any |
same type as expression |
returns expression evaluated at the row that is the first row of the window frame |
lastValue(expression) |
any |
same type as expression |
returns expression evaluated at the row that is the last row of the window frame |
nthValue(expression, n) |
any, java.lang.Long |
same type as expression |
returns expression evaluated at the row that is the nth row of the window frame |
count() |
java.lang.Long |
number of rows, similar to |
|
count(expression) |
any |
java.lang.Long |
number of rows for which the value of expression is not |
min(expression) |
java.lang.Comparable |
same as argument type |
minimum value of expression across all non-null values |
max(expression) |
java.lang.Comparable |
same as argument type |
maximum value of expression across all non-null values |
sum(expression) |
java.lang.Number |
java.math.BigDecimal |
sum of expression across all non-null values |
avg(expression) |
java.lang.Number |
java.math.BigDecimal |
the average (arithmetic mean) of all non-null values |
median(expression) |
java.lang.Number |
java.math.BigDecimal |
value such that the number of non-null values above and below it is the same ("middle" value, not necessarily same as average or mean) |
stdev(expression) |
java.lang.Number |
java.math.BigDecimal |
the statistical standard deviation of all non-null values |
stdevp(expression) |
java.lang.Number |
java.math.BigDecimal |
the statistical standard deviation for the population for all non-null values |
var(expression) |
java.lang.Number |
java.math.BigDecimal |
the statistical variance of all non-null values |
varp(expression) |
java.lang.Number |
java.math.BigDecimal |
the statistical variance for the population for all non-null values |
agg(expression) |
any |
any |
INCUBATING: customizes the aggregation logic in expression and returns single value |
3.1.1. rowNumber
assert [[2, 1, 1, 1], [1, 0, 0, 2], [null, 3, 3, 3], [3, 2, 2, 0]] == GQ {
from n in [2, 1, null, 3]
select n, (rowNumber() over(orderby n)),
(rowNumber() over(orderby n in asc)),
(rowNumber() over(orderby n in desc))
}.toList()
assert [[1, 0, 1, 2, 3], [2, 1, 2, 1, 2], [null, 3, 0, 3, 0], [3, 2, 3, 0, 1]] == GQ {
from n in [1, 2, null, 3]
select n, (rowNumber() over(orderby n in asc(nullslast))),
(rowNumber() over(orderby n in asc(nullsfirst))),
(rowNumber() over(orderby n in desc(nullslast))),
(rowNumber() over(orderby n in desc(nullsfirst)))
}.toList()
| The parentheses around the window function is required. |
3.1.2. rank, denseRank, percentRank, cumeDist and ntile
assert [['a', 1, 1], ['b', 2, 2], ['b', 2, 2],
['c', 4, 3], ['c', 4, 3], ['d', 6, 4],
['e', 7, 5]] == GQ {
from s in ['a', 'b', 'b', 'c', 'c', 'd', 'e']
select s,
(rank() over(orderby s)),
(denseRank() over(orderby s))
}.toList()
assert [[60, 0, 0.4], [60, 0, 0.4], [80, 0.5, 0.8], [80, 0.5, 0.8], [100, 1, 1]] == GQ {
from n in [60, 60, 80, 80, 100]
select n,
(percentRank() over(orderby n)),
(cumeDist() over(orderby n))
}.toList()
assert [[1, 0], [2, 0], [3, 0],
[4, 1], [5, 1],
[6, 2], [7, 2],[8, 2],
[9, 3], [10, 3]] == GQ {
from n in 1..10
select n, (ntile(4) over(orderby n))
}.toList()
3.1.3. lead and lag
assert [[2, 3], [1, 2], [3, null]] == GQ {
from n in [2, 1, 3]
select n, (lead(n) over(orderby n))
}.toList()
assert [[2, 3], [1, 2], [3, null]] == GQ {
from n in [2, 1, 3]
select n, (lead(n) over(orderby n in asc))
}.toList()
assert [['a', 'bc'], ['ab', null], ['b', 'a'], ['bc', 'ab']] == GQ {
from s in ['a', 'ab', 'b', 'bc']
select s, (lead(s) over(orderby s.length(), s in desc))
}.toList()
assert [['a', null], ['ab', null], ['b', 'a'], ['bc', 'ab']] == GQ {
from s in ['a', 'ab', 'b', 'bc']
select s, (lead(s) over(partitionby s.length() orderby s.length(), s in desc))
}.toList()
assert [[2, 1], [1, null], [3, 2]] == GQ {
from n in [2, 1, 3]
select n, (lag(n) over(orderby n))
}.toList()
assert [[2, 3], [1, 2], [3, null]] == GQ {
from n in [2, 1, 3]
select n, (lag(n) over(orderby n in desc))
}.toList()
assert [['a', null], ['b', 'a'], ['aa', null], ['bb', 'aa']] == GQ {
from s in ['a', 'b', 'aa', 'bb']
select s, (lag(s) over(partitionby s.length() orderby s))
}.toList()
assert [[2, 3, 1], [1, 2, null], [3, null, 2]] == GQ {
from n in [2, 1, 3]
select n, (lead(n) over(orderby n)), (lag(n) over(orderby n))
}.toList()
The offset can be specified other than the default offset 1:
assert [[2, null, null], [1, 3, null], [3, null, 1]] == GQ {
from n in [2, 1, 3]
select n, (lead(n, 2) over(orderby n)), (lag(n, 2) over(orderby n))
}.toList()
The default value can be returned when the index specified by offset is out of window, e.g. 'NONE':
assert [[2, 'NONE', 'NONE'], [1, 3, 'NONE'], [3, 'NONE', 1]] == GQ {
from n in [2, 1, 3]
select n, (lead(n, 2, 'NONE') over(orderby n)), (lag(n, 2, 'NONE') over(orderby n))
}.toList()
3.1.4. firstValue, lastValue and nthValue
assert [[2, 1], [1, 1], [3, 2]] == GQ {
from n in [2, 1, 3]
select n, (firstValue(n) over(orderby n rows -1, 1))
}.toList()
assert [[2, 3], [1, 2], [3, 3]] == GQ {
from n in [2, 1, 3]
select n, (lastValue(n) over(orderby n rows -1, 1))
}.toList()
assert [[2, 2], [1, 1], [3, 3]] == GQ {
from n in [2, 1, 3]
select n, (firstValue(n) over(orderby n rows 0, 1))
}.toList()
assert [[2, 1], [1, null], [3, 1]] == GQ {
from n in [2, 1, 3]
select n, (firstValue(n) over(orderby n rows -2, -1))
}.toList()
assert [[2, 1], [1, null], [3, 2]] == GQ {
from n in [2, 1, 3]
select n, (lastValue(n) over(orderby n rows -2, -1))
}.toList()
assert [[2, 3], [1, 3], [3, null]] == GQ {
from n in [2, 1, 3]
select n, (lastValue(n) over(orderby n rows 1, 2))
}.toList()
assert [[2, 3], [1, 2], [3, null]] == GQ {
from n in [2, 1, 3]
select n, (firstValue(n) over(orderby n rows 1, 2))
}.toList()
assert [[2, 2], [1, 1], [3, 3]] == GQ {
from n in [2, 1, 3]
select n, (lastValue(n) over(orderby n rows -1, 0))
}.toList()
assert [[2, 1], [1, 1], [3, 1]] == GQ {
from n in [2, 1, 3]
select n, (firstValue(n) over(orderby n rows null, 1))
}.toList()
assert [[2, 3], [1, 3], [3, 3]] == GQ {
from n in [2, 1, 3]
select n, (lastValue(n) over(orderby n rows -1, null))
}.toList()
assert [['a', 'a', 'b'], ['aa', 'aa', 'bb'], ['b', 'a', 'b'], ['bb', 'aa', 'bb']] == GQ {
from s in ['a', 'aa', 'b', 'bb']
select s, (firstValue(s) over(partitionby s.length() orderby s)),
(lastValue(s) over(partitionby s.length() orderby s))
}.toList()
assert [[1, 1, 2, 3, null], [2, 1, 2, 3, null], [3, 1, 2, 3, null]] == GQ {
from n in 1..3
select n, (nthValue(n, 0) over(orderby n)),
(nthValue(n, 1) over(orderby n)),
(nthValue(n, 2) over(orderby n)),
(nthValue(n, 3) over(orderby n))
}.toList()
3.1.5. min, max, count, sum, avg, median, stdev, stdevp, var ,varp and agg
assert [['a', 'a', 'b'], ['b', 'a', 'b'], ['aa', 'aa', 'bb'], ['bb', 'aa', 'bb']] == GQ {
from s in ['a', 'b', 'aa', 'bb']
select s, (min(s) over(partitionby s.length())), (max(s) over(partitionby s.length()))
}.toList()
assert [[1, 2, 2, 2, 1, 1], [1, 2, 2, 2, 1, 1],
[2, 2, 2, 4, 2, 2], [2, 2, 2, 4, 2, 2],
[3, 2, 2, 6, 3, 3], [3, 2, 2, 6, 3, 3]] == GQ {
from n in [1, 1, 2, 2, 3, 3]
select n, (count() over(partitionby n)),
(count(n) over(partitionby n)),
(sum(n) over(partitionby n)),
(avg(n) over(partitionby n)),
(median(n) over(partitionby n))
}.toList()
assert [[2, 6, 3, 1, 3, 4], [1, 6, 3, 1, 3, 4],
[3, 6, 3, 1, 3, 4], [null, 6, 3, 1, 3, 4]] == GQ {
from n in [2, 1, 3, null]
select n, (sum(n) over()),
(max(n) over()),
(min(n) over()),
(count(n) over()),
(count() over())
}.toList()
assert [[1, 1, 1], [2, 2, 3], [5, 2, 10], [5, 2, 10]] == GQ {
from n in [1, 2, 5, 5]
select n, (count() over(orderby n range -2, 0)),
(sum(n) over(orderby n range -2, 0))
}.toList()
assert [[1, 2, 3], [2, 1, 2], [5, 2, 10], [5, 2, 10]] == GQ {
from n in [1, 2, 5, 5]
select n, (count() over(orderby n range 0, 1)),
(sum(n) over(orderby n range 0, 1))
}.toList()
assert [[1, 2, 3], [2, 2, 3], [5, 2, 10], [5, 2, 10]] == GQ {
from n in [1, 2, 5, 5]
select n, (count() over(orderby n range -1, 1)),
(sum(n) over(orderby n range -1, 1))
}.toList()
assert [[1, 1, 2], [2, 0, 0], [5, 0, 0], [5, 0, 0]] == GQ {
from n in [1, 2, 5, 5]
select n, (count() over(orderby n in desc range 1, 2)),
(sum(n) over(orderby n in desc range 1, 2))
}.toList()
assert [[1, 0, 0], [2, 1, 1], [5, 0, 0], [5, 0, 0]] == GQ {
from n in [1, 2, 5, 5]
select n, (count() over(orderby n in desc range -2, -1)),
(sum(n) over(orderby n in desc range -2, -1))
}.toList()
assert [[1, 3, 12], [2, 2, 10], [5, 0, 0], [5, 0, 0]] == GQ {
from n in [1, 2, 5, 5]
select n, (count() over(orderby n range 1, null)),
(sum(n) over(orderby n range 1, null))
}.toList()
assert [[1, 2, 3], [2, 2, 3], [5, 4, 13], [5, 4, 13]] == GQ {
from n in [1, 2, 5, 5]
select n, (count() over(orderby n range null, 1)),
(sum(n) over(orderby n range null, 1))
}.toList()
assert [[1, 0.816496580927726],
[2, 0.816496580927726],
[3, 0.816496580927726]] == GQ {
from n in [1, 2, 3]
select n, (stdev(n) over())
}.toList()
assert [[1, 1], [2, 1], [3, 1]] == GQ {
from n in [1, 2, 3]
select n, (stdevp(n) over())
}.toList()
assert [[1, 0.6666666666666667],
[2, 0.6666666666666667],
[3, 0.6666666666666667]] == GQ {
from n in [1, 2, 3]
select n, (var(n) over())
}.toList()
assert [[1, 1], [2, 1], [3, 1]] == GQ {
from n in [1, 2, 3]
select n, (varp(n) over())
}.toList()
assert [[1, 4], [2, 2], [3, 4]] == GQ {
from n in [1, 2, 3]
select n,
(agg(_g.stream().map(r -> r.n).reduce(BigDecimal.ZERO, BigDecimal::add)) over(partitionby n % 2))
}.toList()
3.2. Nested GINQ
3.2.1. Nested GINQ in from clause
from v in (
from n in [1, 2, 3]
select n
)
select v
3.2.2. Nested GINQ in where clause
from n in [0, 1, 2]
where n in (
from m in [1, 2]
select m
)
select n
from n in [0, 1, 2]
where (
from m in [1, 2]
where m == n
select m
).exists()
select n
3.2.3. Nested GINQ in select clause
assert [null, 2, 3] == GQ {
from n in [1, 2, 3]
select (
from m in [2, 3, 4]
where m == n
limit 1
select m
)
}.toList()
It’s recommended to use limit 1 to restrict the count of sub-query result
because TooManyValuesException will be thrown if more than one values returned
|
We could use as clause to name the sub-query result
assert [[1, null], [2, 2], [3, 3]] == GQ {
from n in [1, 2, 3]
select n, (
from m in [2, 3, 4]
where m == n
select m
) as sqr
}.toList()
3.3. Classic groupby style
GINQ also supports an older style without the into keyword which looks simpler
for some cases but has some limitations — aggregate functions use a special syntax
rather than real method calls, and the group cannot be accessed as a composable collection.
Each field in any nonaggregate expression in the select clause must be included
in the groupby clause:
from n in [1, 1, 3, 3, 6, 6, 6]
groupby n
select n, count(n)
from n in [1, 1, 3, 3, 6, 6, 6]
groupby n
having n >= 3
select n, count(n)
from n in [1, 1, 3, 3, 6, 6, 6]
groupby n
having count() < 3
select n, count()
The group columns could be renamed with as clause:
from s in ['ab', 'ac', 'bd', 'acd', 'bcd', 'bef']
groupby s.size() as length, s[0] as firstChar
select length, firstChar, max(s)
from s in ['ab', 'ac', 'bd', 'acd', 'bcd', 'bef']
groupby s.size() as length, s[0] as firstChar
having length == 3 && firstChar == 'b'
select length, firstChar, max(s)
3.4. Using the Queryable API directly
The groupByInto method is also available on the Queryable API directly,
which can be useful when building queries programmatically:
def nums = [1, 2, 2, 3, 3, 4, 4, 5]
def result = from(nums)
.groupByInto(e -> e, g -> g.count() > 1)
.select((g, q) -> Tuple.tuple(g.key, g.count()))
.toList()
assert [[2, 2], [3, 2], [4, 2]] == result
3.5. @GQ Annotation
GINQ could be written in a method marked with @GQ:
@GQ
def someGinqMethod() {
/* GINQ CODE */
}
For example,
-
Mark the
ginqmethod as a GINQ method with@GQannotation:
@groovy.ginq.transform.GQ
def ginq(list, b, e) {
from n in list
where b < n && n < e
select n
}
assert [3, 4] == ginq([1, 2, 3, 4, 5, 6], 2, 5).toList()
-
Specify the result type as
List:
import groovy.ginq.transform.GQ
@GQ(List)
def ginq(b, e) {
from n in [1, 2, 3, 4, 5, 6]
where b < n && n < e
select n
}
assert [3, 4] == ginq(2, 5)
GINQ supports many result types, e.g. List, Set, Collection, Iterable, Iterator, java.util.stream.Stream and array types.
|
-
Enable parallel querying:
import groovy.ginq.transform.GQ
@GQ(parallel=true)
def ginq(x) {
from n in [1, 2, 3]
where n < x
select n
}
assert [1] == ginq(2).toList()
3.6. Parallel Querying
Parallel querying is especially efficient when querying big data sources. It is disabled by default, but we could enable it by hand:
assert [[1, 1], [2, 2], [3, 3]] == GQ(parallel: true) {
from n1 in 1..1000
join n2 in 1..10000 on n2 == n1
where n1 <= 3 && n2 <= 5
select n1, n2
}.toList()
As parallel querying will use a shared thread pool, the following code can release resources after all GINQ statements execution are completed, and it will wait util all tasks of threads are completed.
GQ {
shutdown
}
Once shutdown is issued, parallel querying can not work anymore.
|
The following code is equivalent to the above code, in other words, immediate is optional:
GQ {
shutdown immediate
}
Shutdown without waiting tasks to complete:
GQ {
shutdown abort
}
3.7. Customize GINQ
For advanced users, you could customize GINQ behaviour by specifying your own target code generator.
For example, we could specify the qualified class name org.apache.groovy.ginq.provider.collection.GinqAstWalker as the target code generator to generate GINQ method calls for querying collections,
which is the default behaviour of GINQ:
assert [0, 1, 2] == GQ(astWalker: 'org.apache.groovy.ginq.provider.collection.GinqAstWalker') {
from n in [0, 1, 2]
select n
}.toList()
3.8. Optimize GINQ
GINQ optimizer is enabled by default for better performance. It will transform the GINQ AST to achieve better execution plan. We could disable it by hand:
assert [[2, 2]] == GQ(optimize: false) {
from n1 in [1, 2, 3]
join n2 in [1, 2, 3] on n1 == n2
where n1 > 1 && n2 < 3
select n1, n2
}.toList()
3.9. GINQ Syntax Reference
The full GINQ clause structure for reference:
GQ, i.e. abbreviation for GINQ
|__ <query>
| |__ from
| | |__ <data_source_alias> in <data_source>
| |__ [join/innerjoin/leftjoin/rightjoin/fulljoin/crossjoin]*
| | |__ <data_source_alias> in <data_source>
| | |__ on <condition> ((&& | ||) <condition>)* (NOTE: `crossjoin` does not need `on` clause)
| |__ [where]
| | |__ <condition> ((&& | ||) <condition>)*
| |__ [groupby]
| | |__ <expression> [as <alias>] (, <expression> [as <alias>])* [into <group_alias>]
| | |__ [having]
| | |__ <condition> ((&& | ||) <condition>)*
| |__ [orderby]
| | |__ <expression> [in (asc|desc)] (, <expression> [in (asc|desc)])*
| |__ [limit]
| | |__ [<offset>,] <size>
| |__ select
| |__ <expression> [as <alias>] (, <expression> [as <alias>])*
|__ [union/unionall/intersect/minus]*
|__ <query>
[] means the related clause is optional, * means zero or more times, and + means one or more times. Also, the clauses within each query are order sensitive,
so the order of clauses should be kept as the above structure.
ONLY ONE from clause is required per query. Multiple data sources are supported through from and the related joins.
Set operations chain left-to-right, so Q1 union Q2 minus Q3 is evaluated as (Q1 union Q2) minus Q3.
|
3.10. Known Limitations
| Currently GINQ can not work well when STC is enabled. |
4. GINQ Examples
Complete worked examples showing GINQ in action.
4.1. Generate Multiplication Table
from v in (
from a in 1..9
join b in 1..9 on a <= b
select a as f, b as s, "$a * $b = ${a * b}".toString() as r
)
groupby v.s
select max(v.f == 1 ? v.r : '') as v1,
max(v.f == 2 ? v.r : '') as v2,
max(v.f == 3 ? v.r : '') as v3,
max(v.f == 4 ? v.r : '') as v4,
max(v.f == 5 ? v.r : '') as v5,
max(v.f == 6 ? v.r : '') as v6,
max(v.f == 7 ? v.r : '') as v7,
max(v.f == 8 ? v.r : '') as v8,
max(v.f == 9 ? v.r : '') as v9
4.2. More examples
link: the latest GINQ examples
| Some examples in the above link require the latest SNAPSHOT version of Groovy to run. |