Applications of Sequences in SQL


Sequences are one of the most common forms of data, and finite sequences come up frequently in real-world data development. This section starts with the simplest case, a sequence of increasing integers, distills a general method from it, and then extends that method to more general scenarios.
1. Common sequences
1) A simple increasing integer sequence
First, consider a simple increasing integer sequence that:
Starts at 0;
Increases by 1 at each subsequent value;
Ends at 3.
How can we generate a sequence that satisfies these three conditions, namely [0, 1, 2, 3]?
There are several ways to generate this sequence; here we introduce one that is simple and general.
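-- SQL - 1
    select
        t.pos as a_n  -- the 0-based index of each element is the sequence value
    from (
        -- space(3) is a string of 3 spaces; splitting it on one space while keeping
        -- trailing empty strings (third argument false) yields an array of 4 elements
        select posexplode(split(space(3), space(1), false))
    ) t;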

From the SQL above, generating an increasing sequence takes only three steps:
Generate an array of the appropriate length; the element values themselves carry no meaning;
Use the UDTF posexplode to generate an index for each element of the array;
Select the index of each element.
These three steps extend to more general sequences, such as arithmetic sequences and geometric (proportional) sequences. The sections below therefore give the final implementation templates directly.
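To make the three steps concrete outside the database, here is a minimal Python sketch that models them. The helpers posexplode and increasing_sequence are hypothetical stand-ins that only imitate the SQL functions; they are not a MaxCompute API.

```python
def posexplode(values):
    """Hypothetical stand-in for posexplode: pair each element with its 0-based position."""
    return list(enumerate(values))


def increasing_sequence(n):
    """Return [0, 1, ..., n] using the three-step split/posexplode trick."""
    # Step 1: build an array of n + 1 placeholder elements. " " * n holds n
    # spaces, and splitting on a single space keeps every empty string,
    # mirroring split(space(n), space(1), false).
    placeholders = (" " * n).split(" ")
    # Step 2: attach a 0-based index to every element.
    rows = posexplode(placeholders)
    # Step 3: keep only the index and discard the meaningless value.
    return [pos for pos, _val in rows]


print(increasing_sequence(3))  # [0, 1, 2, 3]
```

The array length, not its content, controls the result: n spaces always produce n + 1 elements, so the last index is n.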
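2) Arithmetic sequence
Given the first term a, the common difference d, and the number of terms n, the term at 0-based position pos is a + pos * d, for pos = 0, 1, ..., n - 1. The array is therefore built from n - 1 spaces so that it has exactly n elements.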
SQL implementation:
-- SQL - 2
    -- a (first term), d (common difference), and n (number of terms) are placeholders
    select
        a + t.pos * d as a_n
    from (
        select posexplode(split(space(n - 1), space(1), false))
    ) t;
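As a quick check of the template, the following Python sketch evaluates the same formula outside the database. arithmetic_sequence is a hypothetical helper; a, d, and n play the same roles as the placeholders in SQL - 2, and n is assumed to be at least 1.

```python
def arithmetic_sequence(a, d, n):
    """First n terms (n >= 1) of an arithmetic sequence with first term a and difference d."""
    # n - 1 spaces split on a single space give n placeholders, as in SQL - 2.
    placeholders = (" " * (n - 1)).split(" ")
    # a_n = a + pos * d, where pos is the 0-based position posexplode would emit.
    return [a + pos * d for pos, _ in enumerate(placeholders)]


print(arithmetic_sequence(2, 3, 4))  # [2, 5, 8, 11]
```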
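3) Geometric (proportional) sequence
Given the first term a, the common ratio q, and the number of terms n, the term at 0-based position pos is a * q^pos, for pos = 0, 1, ..., n - 1.
SQL implementation: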
-- SQL - 3
    -- a (first term), q (common ratio), and n (number of terms) are placeholders
    select
        a * pow(q, t.pos) as a_n
    from (
        select posexplode(split(space(n - 1), space(1), false))
    ) t;
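The same kind of check works for the geometric template. geometric_sequence is a hypothetical helper that mirrors a * pow(q, t.pos); note that SQL's pow generally returns a floating-point value, while this sketch keeps integers when a and q are integers.

```python
def geometric_sequence(a, q, n):
    """First n terms (n >= 1) of a geometric sequence with first term a and ratio q."""
    # n - 1 spaces split on a single space give n placeholders, as in SQL - 3.
    placeholders = (" " * (n - 1)).split(" ")
    # a_n = a * q ** pos, the Python counterpart of a * pow(q, t.pos).
    return [a * q ** pos for pos, _ in enumerate(placeholders)]


print(geometric_sequence(1, 2, 5))  # [1, 2, 4, 8, 16]
```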
Tip: In MaxCompute (ODPS), the built-in function sequence can also generate a sequence directly.
-- SQL - 4
    select sequence(1, 3, 1);
-- Result
    [1, 2, 3]
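The [1, 2, 3] result shows that sequence includes its stop value. A hypothetical Python equivalent has to account for that, because range excludes its stop; this sketch assumes a positive step and is not a full model of the MaxCompute function.

```python
def sequence(start, stop, step=1):
    """Hypothetical stand-in for sequence(start, stop, step); assumes step > 0."""
    if step <= 0:
        raise ValueError("this sketch only handles a positive step")
    # range excludes stop, so widen the bound by one to make stop inclusive.
    return list(range(start, stop + 1, step))


print(sequence(1, 3, 1))  # [1, 2, 3]
```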
2. Application scenario examples
1) Recovering the dimension column names for any combination of dimensions
In multidimensional analysis, advanced aggregation features such as CUBE, ROLLUP, and GROUPING SETS are often used to aggregate data and compute statistics across different combinations of dimensions.
Scenario Description
Suppose there is a user access log table, visit_log, in which each row represents one user visit.
-- SQL - 5
    with visit_log as (
        select stack(
            6,
            '2024-01-01', '101', 'Hubei', 'Wuhan', 'Android',
            '2024-01-01', '102', 'Hunan', 'Changsha', 'IOS',
            '2024-01-01', '103', 'Sichuan', 'Chengdu', 'Windows',
            '2024-01-02', '101', 'Hubei', 'Xiaogan', 'Mac',
            '2024-01-02', '102', 'Hunan', 'Shaoyang', 'Android',
            '2024-01-03', '101', 'Hubei', 'Wuhan', 'IOS'
        )
        -- Fields: date, user, province, city, device type
        as (dt, user_id, province, city, device_type)
    )
    select * from visit_log;
Suppose we aggregate on the three dimension columns province, city, and device_type with GROUPING SETS to obtain the number of user visits under different dimension combinations. Questions:
How can we tell which dimension columns a given result row was aggregated on?
How can we output the names of those aggregated dimension columns for downstream uses such as report display?
Solution:
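You can use the GROUPING__ID provided by MaxCompute (ODPS). The core idea is to reverse-map each GROUPING__ID value back to the dimension columns it stands for, in three steps:
① Generate all GROUPING__ID values.
Generate the increasing sequence 0, 1, ..., 2^n - 1, where n is the number of dimensions. 2^n is the number of combinations of all dimensions, and each value represents one GROUPING__ID.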
② Prepare all dimension names.
Generate a string sequence that stores the dimension column names in order, i.e.
{dim_name_1, dim_name_2, ..., dim_name_n}
③ Map each GROUPING__ID to dimension column names.
For each value in the GROUPING__ID sequence, write it in binary, left-padded with zeros to n bits; map each bit, from the most significant, to the same index in the dimension name sequence; then output every dimension name whose bit is 0. For example:
GROUPING__ID: 3 => {0, 1, 1}
Dimension name sequence: {Province, City, Device Type}
Mapping: {0: Province, 1: City, 1: Device Type}
Rows with GROUPING__ID 3 are aggregated on the dimension: Province
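The three steps can be modeled in a few lines of Python to see every mapping at once. This is a sketch under the convention described in step ③ (a bit of 1 means the column was aggregated away, and the first dimension is the most significant bit); all_grouping_ids and dimension_names are hypothetical helpers, not MaxCompute functions.

```python
DIMENSIONS = ["Province", "City", "Device Type"]  # step 2: dimension names in order


def all_grouping_ids(n):
    """Step 1: every GROUPING__ID value for n dimensions, i.e. 0 .. 2**n - 1."""
    return list(range(2 ** n))


def dimension_names(grouping_id, dims):
    """Step 3: names of the dimensions whose bit is 0 (the columns grouped on)."""
    # Left-pad the binary form to len(dims) bits, like lpad(conv(id, 10, 2), n, '0').
    bits = format(grouping_id, "b").zfill(len(dims))
    # Bit i, counted from the most significant, lines up with dims[i].
    return [name for bit, name in zip(bits, dims) if bit == "0"]


for gid in all_grouping_ids(len(DIMENSIONS)):
    print(gid, dimension_names(gid, DIMENSIONS))
```

For example, 3 maps to ["Province"] and 1 maps to ["Province", "City"]. The value 7 maps to an empty list: it would correspond to the grand total (the empty grouping set), which the grouping sets in this example do not request.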
SQL implementation
-- SQL - 6
    -- visit_log is assumed to be available as a table (see SQL - 5)
    with group_dimension as (
        select -- the dimension names corresponding to each grouping
            gb.group_id,
            concat_ws(',', collect_list(case when gb.placeholder_bit = '0' then dim_col.val else null end)) as dimension_name
        from (
            select groups.pos as group_id, pe.*
            from (
                -- all groupings: GROUPING__ID values 0 .. 2^3 - 1
                select posexplode(split(space(cast(pow(2, 3) as int) - 1), space(1), false))
            ) groups
            -- bit information for each grouping: its 3-bit binary form, one row per bit
            lateral view posexplode(regexp_extract_all(lpad(conv(groups.pos, 10, 2), 3, '0'), '(0|1)')) pe as placeholder_idx, placeholder_bit
        ) gb
        left join ( -- all dimension names, indexed by position
            select posexplode(split('province,city,device_type', ','))
        ) dim_col on gb.placeholder_idx = dim_col.pos
        group by gb.group_id
    )
    select
        group_dimension.dimension_name,
        province, city, device_type,
        visit_count
    from (
        select
            grouping_id(province, city, device_type) as group_id,
            province, city, device_type,
            count(1) as visit_count
        from visit_log b
        group by province, city, device_type
        grouping sets (
            (province),
            (province, city),
            (province, city, device_type)
        )
    ) t
    join group_dimension on t.group_id = group_dimension.group_id
    order by group_dimension.dimension_name;
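To see what the query should return on the six sample rows, the following Python sketch imitates GROUPING SETS, grouping_id, and the reverse mapping in memory. It is only a model under the bit convention described above; it uses the column names province, city, and device_type as dimension names (an assumption for illustration), and grouping_id, dimension_name, and grouping_sets_report are hypothetical helpers.

```python
from collections import Counter

# The six rows of visit_log from SQL - 5: (dt, user_id, province, city, device_type).
VISIT_LOG = [
    ("2024-01-01", "101", "Hubei", "Wuhan", "Android"),
    ("2024-01-01", "102", "Hunan", "Changsha", "IOS"),
    ("2024-01-01", "103", "Sichuan", "Chengdu", "Windows"),
    ("2024-01-02", "101", "Hubei", "Xiaogan", "Mac"),
    ("2024-01-02", "102", "Hunan", "Shaoyang", "Android"),
    ("2024-01-03", "101", "Hubei", "Wuhan", "IOS"),
]
DIMS = ["province", "city", "device_type"]  # stored at tuple positions 2, 3, 4
GROUPING_SETS = [("province",), ("province", "city"), ("province", "city", "device_type")]


def grouping_id(grouping_set):
    """Bit is 1 when a dimension is aggregated away; the first dimension is the high bit."""
    gid = 0
    for dim in DIMS:
        gid = (gid << 1) | (0 if dim in grouping_set else 1)
    return gid


def dimension_name(gid):
    """Reverse mapping: comma-joined names of the dimensions whose bit is 0."""
    bits = format(gid, "b").zfill(len(DIMS))
    return ",".join(d for b, d in zip(bits, DIMS) if b == "0")


def grouping_sets_report(rows):
    """Return (dimension_name, province, city, device_type, visit_count) tuples."""
    result = []
    for gset in GROUPING_SETS:
        # Dimensions outside the grouping set become None, like NULL in SQL.
        counts = Counter(
            tuple(row[2 + i] if dim in gset else None for i, dim in enumerate(DIMS))
            for row in rows
        )
        name = dimension_name(grouping_id(gset))
        result.extend((name, *key, count) for key, count in counts.items())
    # Sort by dimension name, like the ORDER BY at the end of the query.
    return sorted(result, key=lambda r: r[0])


for line in grouping_sets_report(VISIT_LOG):
    print(line)
```

The (province) set has grouping ID 3 and yields three rows labeled "province"; (province, city) has ID 1 and yields five rows; the full set has ID 0 and yields one row per distinct (province, city, device_type) combination.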


 
