Common ORACLE Analytic Functions for Everyday Statistics

2021/04/12

The following article is from cc_0101


I. Analytic function syntax

function_name(<argument>,<argument>...) over(<partition_Clause><order by_Clause><windowing_Clause>);

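function_name(): the function name

argument: the function's arguments

over( ): the window clause that turns the function into an analytic function

partition_Clause: the partition clause; divides the rows into groups, similar to group by

order by_Clause: the ordering clause; sorts the rows within each partition, similar to order by

windowing_Clause: the windowing clause; defines the set of rows the function operates on for each current row. There are three ways to define a window: rows, range, and Specifying

Note: a windowing clause must always be accompanied by an order by clause, and the sort columns should identify each row uniquely. If several rows share the same sort values, which of them fall into the window is not deterministic.

This article does not cover the windowing clause; it is explained separately in another article.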

II. Summary of analytic functions

1. count() over(): counts the rows in each partition; partition by is optional, order by is optional

select ename,esex,eage,count(*) over() from emp; --total count
select ename,esex,eage,count(*) over(order by eage) from emp; --running count
select ename,esex,eage,count(*) over(partition by esex) from emp; --count per group
select ename,esex,eage,count(*) over(partition by esex order by eage) from emp; --running count per group
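The four variants differ only in the window each row sees: without order by a row sees its whole partition, while with order by the default window runs from the first row up to the current row, including rows that tie with it on the sort key. The Python sketch below imitates that for count(*); the helper `count_over` and the sample rows are hypothetical, and ascending order is assumed.

```python
from collections import defaultdict
from operator import itemgetter


def count_over(rows, order_key=None, partition_key=None):
    """Imitate COUNT(*) OVER(...) on a list of dicts (hypothetical helper).

    No order_key: each row gets the size of its partition.
    With order_key: default RANGE window, so each row counts the rows in its
    partition whose key is <= its own key (ties with the current row included).
    """
    def part(r):
        return partition_key(r) if partition_key else None

    groups = defaultdict(list)
    for r in rows:
        groups[part(r)].append(r)

    result = []
    for r in rows:
        group = groups[part(r)]
        if order_key is None:
            result.append(len(group))
        else:
            result.append(sum(1 for g in group if order_key(g) <= order_key(r)))
    return result


# Hypothetical emp rows with only the columns the examples use
emp = [
    {"ename": "A", "esex": "M", "eage": 25},
    {"ename": "B", "esex": "F", "eage": 30},
    {"ename": "C", "esex": "M", "eage": 30},
    {"ename": "D", "esex": "F", "eage": 22},
]
age = itemgetter("eage")
sex = itemgetter("esex")
print(count_over(emp))                                     # over()
print(count_over(emp, order_key=age))                      # over(order by eage)
print(count_over(emp, partition_key=sex))                  # over(partition by esex)
print(count_over(emp, order_key=age, partition_key=sex))   # both
```

Rows B and C share eage 30, so the running count gives both of them 4 rather than 3 and 4.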

2. sum() over(): sums the values in each partition; partition by is optional, order by is optional

select ename,esex,eage,sum(salary) over() from emp; --grand total
select ename,esex,eage,sum(salary) over(order by eage) from emp; --running total
select ename,esex,eage,sum(salary) over(partition by esex) from emp; --total per group
select ename,esex,eage,sum(salary) over(partition by esex order by eage) from emp; --running total per group
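The note on unique sort keys in the syntax section matters for running totals. With only order by, Oracle's default window is range between unbounded preceding and current row, so rows that share the same eage all receive the same running total; a rows between unbounded preceding and current row window adds tied rows one at a time instead. The Python sketch below contrasts the two; `running_sum` and the sample numbers are hypothetical.

```python
def running_sum(values, keys, frame="range"):
    """Running total of values ordered by keys ascending (hypothetical helper).

    frame="range": default window when only ORDER BY is given; every row sums
    all rows whose key is <= its own key, so tied rows share one total.
    frame="rows": ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW; tied rows
    are added one by one (here in input order; Oracle does not fix that order).
    """
    n = len(values)
    if frame == "range":
        return [sum(v for v, k in zip(values, keys) if k <= keys[i]) for i in range(n)]
    out = [0] * n
    total = 0
    for i in sorted(range(n), key=lambda j: keys[j]):  # sorted() is stable
        total += values[i]
        out[i] = total
    return out


salary = [3000, 2000, 1000, 4000]
eage = [30, 25, 30, 40]
print(running_sum(salary, eage))                # like sum(salary) over(order by eage)
print(running_sum(salary, eage, frame="rows"))  # ties no longer share a total
```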

3. avg() over(): averages the values in each partition; partition by is optional, order by is optional

select ename,esex,eage,avg(salary) over() from emp; --overall average
select ename,esex,eage,avg(salary) over(order by eage) from emp; --running average
select ename,esex,eage,avg(salary) over(partition by esex) from emp; --average per group
select ename,esex,eage,avg(salary) over(partition by esex order by eage) from emp; --running average per group

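4. min() over(): returns the smallest value in each partition; partition by is optional, order by is optional

max() over(): returns the largest value in each partition; partition by is optional, order by is optional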

select ename,esex,eage,salary,min(salary) over() from emp; --overall minimum
select ename,esex,eage,salary,min(salary) over(order by eage) from emp; --running minimum
select ename,esex,eage,salary,min(salary) over(partition by esex) from emp; --minimum per group
select ename,esex,eage,salary,min(salary) over(partition by esex order by eage) from emp; --running minimum per group


select ename,esex,eage,salary,max(salary) over() from emp; --overall maximum
select ename,esex,eage,salary,max(salary) over(order by eage) from emp; --running maximum
select ename,esex,eage,salary,max(salary) over(partition by esex) from emp; --maximum per group
select ename,esex,eage,salary,max(salary) over(partition by esex order by eage) from emp; --running maximum per group

5. rank() over(): ranking with gaps (tied rows share a rank, and the ranks after a tie are skipped); partition by is optional, order by is required

select ename,eage,rank() over(partition by job order by eage) from emp;
select ename,eage,rank() over(order by eage) from emp;

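6. dense_rank() over(): ranking without gaps (tied rows share a rank, and the next rank follows on directly); partition by is optional, order by is required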

select ename,eage,dense_rank() over(partition by job order by eage) from emp;
select ename,eage,dense_rank() over(order by eage) from emp;

7. row_number() over(): numbers the rows 1, 2, 3, … with no duplicates, even when values tie; partition by is optional, order by is required

select ename,eage,row_number() over(partition by job order by eage) from emp;
select ename,eage,row_number() over(order by eage) from emp;
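The three ranking functions differ only in how they treat ties. A small Python sketch (hypothetical helper `rankings`, ascending order) makes this concrete: for ages 25, 30, 30, 40, rank() gives 1, 2, 2, 4, dense_rank() gives 1, 2, 2, 3, and row_number() gives 1, 2, 3, 4, where Oracle does not guarantee which of the two tied rows receives 2 and which receives 3.

```python
def rankings(keys):
    """Return (rank, dense_rank, row_number) lists for keys ordered ascending."""
    distinct = sorted(set(keys))
    # RANK: 1 + number of rows strictly ahead, so a tie leaves a gap afterwards
    rank = [1 + sum(1 for other in keys if other < k) for k in keys]
    # DENSE_RANK: position among the distinct values, so there are no gaps
    dense_rank = [distinct.index(k) + 1 for k in keys]
    # ROW_NUMBER: position in the sorted order; ties broken by input order here
    row_number = [0] * len(keys)
    order = sorted(range(len(keys)), key=lambda j: keys[j])
    for pos, i in enumerate(order, start=1):
        row_number[i] = pos
    return rank, dense_rank, row_number


print(rankings([25, 30, 30, 40]))
```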

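8. ntile(n) over(): splits the rows into n buckets; partition by is optional, order by is required

n is the number of buckets. The rows of each partition are divided into n groups as evenly as possible, and any leftover rows are handed out one at a time to the first groups in order.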

select ename,salary,ntile(3) over(order by salary desc) from emp;
select ename,salary,ntile(3) over(partition by job order by salary desc) from emp;
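The bucket sizes follow from integer division: with 10 rows and ntile(3), the buckets hold 4, 3 and 3 rows. A Python sketch of that rule (hypothetical helper `ntile`, returning the bucket number for each row position in sort order); when n exceeds the number of rows, each row gets its own bucket and the remaining buckets stay empty.

```python
def ntile(num_rows, n):
    """Bucket number for each of num_rows sorted rows, as NTILE(n) assigns them."""
    base, extra = divmod(num_rows, n)
    buckets = []
    for bucket in range(1, n + 1):
        # the first `extra` buckets each take one of the leftover rows
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets


print(ntile(10, 3))
print(ntile(2, 3))   # more buckets than rows: bucket 3 stays empty
```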

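9. first_value() over(): returns the value of the expression in the first row of the window; partition by is optional, order by is optional

last_value() over(): returns the value of the expression in the last row of the window; partition by is optional, order by is optional. With order by and no windowing clause, the window ends at the current row (and the rows tied with it), so the result is not the last row of the whole partition.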

select ename,first_value(salary) over() from emp;
select ename,first_value(salary) over(order by salary desc) from emp;
select ename,first_value(salary) over(partition by job) from emp;
select ename,first_value(salary) over(partition by job order by salary desc) from emp;
select ename,last_value(ename) over() from emp;
select ename,last_value(ename) over(order by salary desc) from emp;
select ename,last_value(ename) over(partition by job) from emp;
select ename,last_value(ename) over(partition by job order by salary desc) from emp;
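To see the default window of last_value() at work, the Python sketch below imitates last_value(ename) over(order by salary) for ascending order, once with the default window and once with a window covering the whole partition, which in SQL corresponds to adding rows between unbounded preceding and unbounded following. The helper `last_value` and the sample data are hypothetical.

```python
def last_value(names, keys, whole_partition=False):
    """LAST_VALUE(name) OVER(ORDER BY key) for ascending keys (hypothetical helper).

    whole_partition=False: default window (RANGE ... CURRENT ROW), which ends
    at the last row tied with the current row.
    whole_partition=True: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
    """
    order = sorted(range(len(keys)), key=lambda i: keys[i])  # ties keep input order here
    result = []
    for i in range(len(keys)):
        window = order if whole_partition else [j for j in order if keys[j] <= keys[i]]
        result.append(names[window[-1]])
    return result


names = ["A", "B", "C", "D"]
salary = [1000, 2000, 2000, 3000]
print(last_value(names, salary))                        # default window
print(last_value(names, salary, whole_partition=True))  # whole partition
```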

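10. first: from the rows ranked by DENSE_RANK, keeps only the rows ranked first and applies the aggregate to them

last: from the rows ranked by DENSE_RANK, keeps only the rows ranked last and applies the aggregate to them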

select job,max(salary) keep(dense_rank first order by salary desc),
max(salary) keep(dense_rank last order by salary desc) from emp
group by job;
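keep (dense_rank first | last order by …) first narrows each group to the rows that rank first (or last) under the given order and only then applies the aggregate, so in the query above the first column is each job's highest salary and the second its lowest. The Python sketch below reproduces that logic over a few rows taken from the sample EMP data later in this article; `keep_first_last` is a hypothetical helper that hard-codes MAX and a descending sort.

```python
from collections import defaultdict


def keep_first_last(rows, group_col, sort_col, value_col):
    """Per group: (MAX(value) KEEP (DENSE_RANK FIRST ORDER BY sort DESC),
                   MAX(value) KEEP (DENSE_RANK LAST  ORDER BY sort DESC))."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_col]].append(r)
    result = {}
    for g, members in groups.items():
        top = max(m[sort_col] for m in members)      # dense_rank 1 under DESC order
        bottom = min(m[sort_col] for m in members)   # last dense_rank under DESC order
        first = max(m[value_col] for m in members if m[sort_col] == top)
        last = max(m[value_col] for m in members if m[sort_col] == bottom)
        result[g] = (first, last)
    return result


emp = [
    {"ename": "SMITH", "job": "CLERK", "sal": 800},
    {"ename": "ADAMS", "job": "CLERK", "sal": 1100},
    {"ename": "JAMES", "job": "CLERK", "sal": 950},
    {"ename": "MILLER", "job": "CLERK", "sal": 1300},
    {"ename": "SCOTT", "job": "ANALYST", "sal": 3000},
    {"ename": "FORD", "job": "ANALYST", "sal": 3000},
]
print(keep_first_last(emp, "job", "sal", "sal"))
# with ename as the value, MAX decides between tied rows (SCOTT vs FORD)
print(keep_first_last(emp, "job", "sal", "ename"))
```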

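11. lag() over(): returns a value from the row n rows before the current row; partition by is optional, order by is required

lead() over(): returns a value from the row n rows after the current row; partition by is optional, order by is required. In lag(expr, n, default) and lead(expr, n, default), n defaults to 1, and default is returned when no such row exists (NULL if it is omitted).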

select ename,eage,lag(eage,1,0) over(order by salary),
lead(eage,1,0) over(order by salary) from emp;

select ename,eage,lag(eage,1) over(partition by esex order by salary),
lead(eage,1) over(partition by esex order by salary) from emp;
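lag() and lead() simply look n rows back or ahead in the ordered partition. A Python sketch over a list that is assumed to be already sorted by salary; the helpers and values are hypothetical, and None stands in for NULL.

```python
def lag(values, offset=1, default=None):
    """Value offset rows earlier in the (already sorted) list, else default."""
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]


def lead(values, offset=1, default=None):
    """Value offset rows later in the (already sorted) list, else default."""
    n = len(values)
    return [values[i + offset] if i + offset < n else default for i in range(n)]


eage_by_salary = [25, 30, 22]           # eage values ordered by salary
print(lag(eage_by_salary, 1, 0))        # like lag(eage,1,0)
print(lead(eage_by_salary, 1, 0))       # like lead(eage,1,0)
print(lag(eage_by_salary))              # no default: None (NULL) for the first row
```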

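12. ratio_to_report(a) over(partition by b): the ratio of a to the total of a within its group after partitioning by b; a must be a number or a numeric column

partition by is optional; order by is not allowed

select ename,job,salary,ratio_to_report(1) over() from emp; --gives every row the value 1, so each row's share is 1 / (total number of rows), e.g. 0.1 when emp has 10 rows
select ename,job,salary,ratio_to_report(salary) over() from emp; --share of the current row's value in the total over all rows
select ename,job,salary,ratio_to_report(1) over(partition by job) from emp; --gives every row the value 1 and returns its share of the total within its job group
select ename,job,salary,ratio_to_report(salary) over(partition by job) from emp; --share of the current row's value in the total of its job group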
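Each value is divided by the sum of the values in its partition. The Python sketch below (hypothetical helper `ratio_to_report`, no NULL handling) covers both the over() and the over(partition by job) forms.

```python
from collections import defaultdict


def ratio_to_report(values, partitions=None):
    """Share of each value in its partition's total (no NULL handling)."""
    if partitions is None:
        partitions = [None] * len(values)   # over(): a single partition
    totals = defaultdict(int)
    for v, p in zip(values, partitions):
        totals[p] += v
    return [v / totals[p] for v, p in zip(values, partitions)]


print(ratio_to_report([1] * 10))   # ratio_to_report(1) over() on 10 rows
print(ratio_to_report([1000, 3000, 2000], ["CLERK", "CLERK", "MANAGER"]))
```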

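13. percent_rank() over(): partition by is optional, order by is required

(rank of the row within its partition - 1) / (number of rows in the partition - 1), where the rank is the gapped rank produced by rank()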

select ename,job,salary,percent_rank() over(order by salary) from emp;
select ename,job,salary,percent_rank() over(partition by job order by salary) from emp;
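The formula can be checked with a short Python sketch (hypothetical helper, ascending order). Here rank() - 1 is computed as the number of rows with a strictly smaller value, and a partition with a single row is given 0, since the first row of any set has a percent_rank of 0.

```python
def percent_rank(keys):
    """PERCENT_RANK for keys ordered ascending (hypothetical helper)."""
    n = len(keys)
    if n == 1:
        return [0.0]  # a single row is the first row, so its percent_rank is 0
    # rank() - 1 is the number of rows with a strictly smaller key
    return [sum(1 for other in keys if other < k) / (n - 1) for k in keys]


print(percent_rank([800, 950, 950, 1300, 3000]))
```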

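14. cume_dist() over(): partition by is optional, order by is required

position of the row within its partition / number of rows in the partition; for tied rows, the position of the last tied row is used, so the result is the fraction of rows that sort at or before the current row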

select ename,job,salary,cume_dist() over(order by salary) from emp;
select ename,job,salary,cume_dist() over(partition by job order by salary) from emp;
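Compared with percent_rank(), cume_dist() counts the current row and its ties in the numerator and divides by the full row count. A Python sketch (hypothetical helper, ascending order) on the same values shows that both rows with 950 get 3/5, the position of the last tie.

```python
def cume_dist(keys):
    """CUME_DIST for keys ordered ascending (hypothetical helper)."""
    n = len(keys)
    # rows with a key <= the current key, so tied rows share the last tie's position
    return [sum(1 for other in keys if other <= k) / n for k in keys]


print(cume_dist([800, 950, 950, 1300, 3000]))
```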

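15. percentile_cont( x ) within group(order by …) over(): inside over(), partition by is optional and order by is not allowed

x is the requested percentile, a decimal between 0 and 1. The function returns the value at that percentile position; if no row sits exactly at that position, it returns the interpolated value r calculated as follows:

a = 1 + ( x * (N - 1) )    x is the requested percentile, N is the number of rows in the partition

b = ceil( a )    a rounded up

c = floor( a )    a rounded down

r = (b - a) * (value in row c) + (a - c) * (value in row b)    if b = c, the value in row a is returned directly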

select ename,job,salary,percentile_cont(0.5) within group(order by salary) over() from emp;

-- partition by job, order by salary descending, and return the 50th-percentile (median) salary
select ename,job,salary,percentile_cont(0.5) within group(order by salary desc) over(partition by job) from emp;
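The interpolation rule for percentile_cont() translates into a few lines of Python. The helper `percentile_cont` below is a hypothetical sketch without NULL handling; a, b and c keep the names used in the formula.

```python
import math


def percentile_cont(values, x, descending=False):
    """Linear-interpolation percentile in the style of PERCENTILE_CONT (sketch)."""
    data = sorted(values, reverse=descending)
    n = len(data)
    a = 1 + x * (n - 1)          # 1-based position of the requested percentile
    b, c = math.ceil(a), math.floor(a)
    if b == c:                   # the position falls exactly on a row
        return data[c - 1]
    # weight the two neighbouring rows by their distance from position a
    return (b - a) * data[c - 1] + (a - c) * data[b - 1]


sal = [1300, 2450, 5000]
print(percentile_cont(sal, 0.5))                     # the middle row
print(percentile_cont(sal, 0.25))                    # halfway between rows 1 and 2
print(percentile_cont(sal, 0.25, descending=True))   # same, but on a DESC order
```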

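16. percentile_disc( x ) within group(order by …) over(): inside over(), partition by is optional and order by is not allowed

x is the requested percentile, a decimal between 0 and 1. The function returns the first value, in the within group sort order, whose cumulative distribution (as computed by cume_dist) is greater than or equal to x. It therefore always returns an actual value from the data, never an interpolated one.

select ename,job,salary,percentile_disc(0.5) within group(order by salary) over() from emp;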
select ename,job,salary,percentile_disc(0.5) within group(order by salary) over(partition by job) from emp;
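A matching Python sketch for percentile_disc(): it walks the sorted values and returns the first one whose cumulative share of rows reaches x. With the four salaries 800, 950, 1100 and 1300, percentile_disc(0.5) returns 950, whereas percentile_cont(0.5) would interpolate to 1025. The helper is hypothetical, ignores NULLs, and assumes 0 <= x <= 1.

```python
def percentile_disc(values, x, descending=False):
    """First value in sort order whose cumulative distribution is >= x (sketch)."""
    data = sorted(values, reverse=descending)
    n = len(data)
    for position, value in enumerate(data, start=1):
        # position / n is the share of rows up to and including this one
        if position / n >= x:
            return value


sal = [800, 950, 1100, 1300]
print(percentile_disc(sal, 0.5))
print(percentile_disc(sal, 0.5, descending=True))
```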

Sample data for the percentile functions in sections 15 and 16:

-- Prepare the data:

create table EMP
(
EMPNO    NUMBER(4) not null,
ENAME    VARCHAR2(10),
JOB      VARCHAR2(9),
MGR      NUMBER(4),
HIREDATE DATE,
SAL      NUMBER(7,2),
COMM     NUMBER(7,2),
DEPTNO   NUMBER(2)
);
insert into emp (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)
values (7369, 'SMITH', 'CLERK', 7902, to_date('17-12-1980', 'dd-mm-yyyy'), 800.00, null, 20);
insert into emp (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)
values (7499, 'ALLEN', 'SALESMAN', 7698, to_date('20-02-1981', 'dd-mm-yyyy'), 1600.00, 300.00, 30);
insert into emp (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)
values (7521, 'WARD', 'SALESMAN', 7698, to_date('22-02-1981', 'dd-mm-yyyy'), 1250.00, 500.00, 30);
insert into emp (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)
values (7566, 'JONES', 'MANAGER', 7839, to_date('02-04-1981', 'dd-mm-yyyy'), 2975.00, null, 20);
insert into emp (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)
values (7654, 'MARTIN', 'SALESMAN', 7698, to_date('28-09-1981', 'dd-mm-yyyy'), 1250.00, 1400.00, 30);
insert into emp (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)
values (7698, 'BLAKE', 'MANAGER', 7839, to_date('01-05-1981', 'dd-mm-yyyy'), 2850.00, null, 30);
insert into emp (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)
values (7782, 'CLARK', 'MANAGER', 7839, to_date('09-06-1981', 'dd-mm-yyyy'), 2450.00, null, 10);
insert into emp (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)
values (7788, 'SCOTT', 'ANALYST', 7566, to_date('19-04-1987', 'dd-mm-yyyy'), 3000.00, null, 20);
insert into emp (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)
values (7839, 'KING', 'PRESIDENT', null, to_date('17-11-1981', 'dd-mm-yyyy'), 5000.00, null, 10);
insert into emp (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)
values (7844, 'TURNER', 'SALESMAN', 7698, to_date('08-09-1981', 'dd-mm-yyyy'), 1500.00, 0.00, 30);
insert into emp (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)
values (7876, 'ADAMS', 'CLERK', 7788, to_date('23-05-1987', 'dd-mm-yyyy'), 1100.00, null, 20);
insert into emp (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)
values (7900, 'JAMES', 'CLERK', 7698, to_date('03-12-1981', 'dd-mm-yyyy'), 950.00, null, 30);
insert into emp (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)
values (7902, 'FORD', 'ANALYST', 7566, to_date('03-12-1981', 'dd-mm-yyyy'), 3000.00, null, 20);
insert into emp (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)
values (7934, 'MILLER', 'CLERK', 7782, to_date('23-01-1982', 'dd-mm-yyyy'), 1300.00, null, 10);
commit;

-- Example: sort salaries in descending order and return each salary's percentile rank together with the salaries at several percentiles

select e.ename,e.sal,e.deptno,
percent_rank() over(partition by deptno order by sal desc) p_rank, --partition by deptno, order by sal desc: percentile rank of the salary
PERCENTILE_CONT(0) within group(order by sal desc)
over(partition by deptno) max_sal,                                --partition by deptno, order by sal desc: salary at 0% (the highest)
PERCENTILE_CONT(0.25) within group(order by sal desc)
over(partition by deptno) max_sal_25,                             --partition by deptno, order by sal desc: salary at 25%
PERCENTILE_CONT(0.5) within group(order by sal desc)
over(partition by deptno) max_sal_50,                             --partition by deptno, order by sal desc: salary at 50%
PERCENTILE_CONT(0.75) within group(order by sal desc)
over(partition by deptno) max_sal_75                              --partition by deptno, order by sal desc: salary at 75%
from emp e;
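To check what this query returns, Python's standard `statistics.quantiles` with `method="inclusive"` (Python 3.8+) applies the same linear interpolation as percentile_cont, on an ascending order. Because the query sorts by sal desc, its percentile at x corresponds to the ascending percentile at 1 - x, so the 25% and 75% columns swap. The sketch below applies this to the SAL values from the INSERT statements above; the helper `dept_percentiles` is hypothetical.

```python
from statistics import quantiles

# SAL values per DEPTNO, copied from the INSERT statements above
sal_by_dept = {
    10: [2450, 5000, 1300],
    20: [800, 2975, 3000, 1100, 3000],
    30: [1600, 1250, 1250, 2850, 1500, 950],
}


def dept_percentiles(sals):
    """Values the example query reports for one department (ORDER BY sal DESC)."""
    # ascending quartile cut points, same interpolation as PERCENTILE_CONT
    q25, q50, q75 = quantiles(sals, n=4, method="inclusive")
    return {
        "max_sal": max(sals),    # x = 0 on a descending order is the largest value
        "max_sal_25": q75,       # DESC 25% == ASC 75%
        "max_sal_50": q50,
        "max_sal_75": q25,       # DESC 75% == ASC 25%
    }


for dept, sals in sal_by_dept.items():
    print(dept, dept_percentiles(sals))
```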

17. stddev() over(): calculates the sample standard deviation; returns 0 when there is only one row; partition by is optional, order by is optional

stddev_samp() over(): calculates the sample standard deviation; returns null when there is only one row; partition by is optional, order by is optional

stddev_pop() over(): calculates the population standard deviation; partition by is optional, order by is optional

select stddev(stu_age) over() from student; --sample standard deviation over all rows
select stddev(stu_age) over(order by stu_age) from student; --running sample standard deviation
select stddev(stu_age) over(partition by stu_major) from student; --sample standard deviation per group
select stddev(stu_age) over(partition by stu_major order by stu_age) from student; --running sample standard deviation per group

select stddev_samp(stu_age) over() from student; --sample standard deviation over all rows
select stddev_samp(stu_age) over(order by stu_age) from student; --running sample standard deviation
select stddev_samp(stu_age) over(partition by stu_major) from student; --sample standard deviation per group
select stddev_samp(stu_age) over(partition by stu_major order by stu_age) from student; --running sample standard deviation per group

select stddev_pop(stu_age) over() from student; --population standard deviation over all rows
select stddev_pop(stu_age) over(order by stu_age) from student; --running population standard deviation
select stddev_pop(stu_age) over(partition by stu_major) from student; --population standard deviation per group
select stddev_pop(stu_age) over(partition by stu_major order by stu_age) from student; --running population standard deviation per group

18. variance() over(): calculates the sample variance; returns 0 when there is only one row; partition by is optional, order by is optional

var_samp() over(): calculates the sample variance; returns null when there is only one row; partition by is optional, order by is optional

var_pop() over(): calculates the population variance; partition by is optional, order by is optional

select variance(stu_age) over() from student; --sample variance over all rows
select variance(stu_age) over(order by stu_age) from student; --running sample variance
select variance(stu_age) over(partition by stu_major) from student; --sample variance per group
select variance(stu_age) over(partition by stu_major order by stu_age) from student; --running sample variance per group

select var_samp(stu_age) over() from student; --sample variance over all rows
select var_samp(stu_age) over(order by stu_age) from student; --running sample variance
select var_samp(stu_age) over(partition by stu_major) from student; --sample variance per group
select var_samp(stu_age) over(partition by stu_major order by stu_age) from student; --running sample variance per group

select var_pop(stu_age) over() from student; --population variance over all rows
select var_pop(stu_age) over(order by stu_age) from student; --running population variance
select var_pop(stu_age) over(partition by stu_major) from student; --population variance per group
select var_pop(stu_age) over(partition by stu_major order by stu_age) from student; --running population variance per group

stddev() = sqrt( variance() )    sqrt() returns the square root

stddev_samp() = sqrt( var_samp() )

stddev_pop() = sqrt( var_pop() )
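The variants in sections 17 and 18 differ in the divisor (n - 1 for sample, n for population) and in how a single row is treated. The Python sketch below follows the definitions above; the function names mirror the Oracle functions but are hypothetical helpers, None stands in for NULL, and NULL inputs are not handled.

```python
import math


def var_pop(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)          # divide by n


def var_samp(xs):
    if len(xs) < 2:
        return None                                            # NULL for a single row
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)    # divide by n - 1


def variance(xs):
    return 0.0 if len(xs) == 1 else var_samp(xs)               # 0 for a single row


def stddev(xs):
    return math.sqrt(variance(xs))                             # stddev = sqrt(variance)


ages = [20, 22, 24]
print(var_pop(ages), var_samp(ages), variance(ages), stddev(ages))
print(variance([30]), var_samp([30]))   # single row: 0.0 versus None (NULL)
```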

19. covar_samp() over(): returns the sample covariance of a pair of expressions; partition by is optional, order by is optional

covar_pop() over(): returns the population covariance of a pair of expressions; partition by is optional, order by is optional

select covar_samp(stu_age,line) over() from student; --sample covariance over all rows
select covar_samp(stu_age,line) over(order by stu_age) from student; --running sample covariance
select covar_samp(stu_age,line) over(partition by stu_major) from student; --sample covariance per group
select covar_samp(stu_age,line) over(partition by stu_major order by stu_age) from student; --running sample covariance per group

select covar_pop(stu_age,line) over() from student; --population covariance over all rows
select covar_pop(stu_age,line) over(order by stu_age) from student; --running population covariance
select covar_pop(stu_age,line) over(partition by stu_major) from student; --population covariance per group
select covar_pop(stu_age,line) over(partition by stu_major order by stu_age) from student; --running population covariance per group
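Covariance sums the products of the two expressions' deviations from their means and divides by n - 1 (sample) or n (population). Oracle evaluates these functions only over pairs where neither expression is NULL; the Python sketch below mimics that filtering with None. The helpers are hypothetical, and the sample version assumes at least two complete pairs.

```python
def _pairs(xs, ys):
    # keep only pairs where neither value is NULL (None)
    return [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]


def _sum_products(pairs):
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs)


def covar_pop(xs, ys):
    pairs = _pairs(xs, ys)
    return _sum_products(pairs) / len(pairs)          # divide by n


def covar_samp(xs, ys):
    pairs = _pairs(xs, ys)
    return _sum_products(pairs) / (len(pairs) - 1)    # divide by n - 1


age = [1, 2, 3, None]
line = [2, 4, 6, 8]        # the last pair is dropped because age is NULL
print(covar_pop(age, line), covar_samp(age, line))
```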

20. corr() over(): returns the correlation coefficient of a pair of expressions; partition by is optional, order by is optional

select corr(stu_age,line) over() from student; --correlation coefficient over all rows
select corr(stu_age,line) over(order by stu_age) from student; --running correlation coefficient
select corr(stu_age,line) over(partition by stu_major) from student; --correlation coefficient per group
select corr(stu_age,line) over(partition by stu_major order by stu_age) from student; --running correlation coefficient per group
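corr() is the population covariance divided by the product of the two population standard deviations, which gives a value between -1 and 1. A short Python sketch of that definition (hypothetical helper `corr`, no NULL handling):

```python
import math


def corr(xs, ys):
    """Population covariance / (stddev_pop(xs) * stddev_pop(ys))."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


print(corr([1, 2, 3], [2, 4, 6]))   # perfectly increasing
print(corr([1, 2, 3], [6, 4, 2]))   # perfectly decreasing
```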

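21. REGR_ (Linear Regression) Functions: these linear regression functions fit an ordinary-least-squares regression line to a set of number pairs; 9 different regression functions are available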
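As an example of what these functions compute, Oracle defines regr_slope(y, x) as covar_pop(y, x) / var_pop(x) and regr_intercept(y, x) as avg(y) - regr_slope(y, x) * avg(x), with the dependent variable y as the first argument; regr_count(y, x) counts the non-NULL pairs used. The Python sketch below follows those definitions; the helper `regr_line` and the data are hypothetical, with None standing in for NULL.

```python
def regr_line(ys, xs):
    """(regr_count, regr_slope, regr_intercept) for dependent ys, independent xs."""
    # like the REGR_ functions, drop pairs where either value is NULL
    pairs = [(y, x) for y, x in zip(ys, xs) if y is not None and x is not None]
    n = len(pairs)
    avg_y = sum(y for y, _ in pairs) / n
    avg_x = sum(x for _, x in pairs) / n
    covar_pop = sum((y - avg_y) * (x - avg_x) for y, x in pairs) / n
    var_pop_x = sum((x - avg_x) ** 2 for _, x in pairs) / n
    slope = covar_pop / var_pop_x
    intercept = avg_y - slope * avg_x
    return n, slope, intercept


print(regr_line([3, 5, 7, 9, None], [1, 2, 3, 4, 5]))   # y = 2x + 1; last pair ignored
```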
