
Some Thoughts on Gathering Statistics in Oracle

I. Questions

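  1. When Oracle gathers statistics, the default sampling percentage is DBMS_STATS.AUTO_SAMPLE_SIZE. What is the actual value of AUTO_SAMPLE_SIZE?
  2. If the sampling percentage is 10%, how far is the estimated number of distinct values for a single column from the real number?
  3. What sampling algorithms are there?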

II. Experiment

Prepare three test tables, t1, t2 and t3, whose data is completely identical. We gather statistics on them at 100%, 10% and AUTO_SAMPLE_SIZE respectively.

    SQL> begin
           dbms_stats.gather_table_stats(
             ownname          => 'BAO',
             tabname          => 'T1',
             estimate_percent => 100
           );
         end;
         /

    PL/SQL procedure successfully completed.

    SQL> begin
           dbms_stats.gather_table_stats(
             ownname          => 'BAO',
             tabname          => 'T2',
             estimate_percent => 10
           );
         end;
         /

    PL/SQL procedure successfully completed.

    SQL> begin
           dbms_stats.gather_table_stats(
             ownname          => 'BAO',
             tabname          => 'T3',
             estimate_percent => dbms_stats.auto_sample_size
           );
         end;
         /

    PL/SQL procedure successfully completed.

Looking at the statistics of the three tables, the 100% run and the AUTO_SAMPLE_SIZE run report the same SAMPLE_SIZE: both read every row.
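
    SQL> select table_name, num_rows, sample_size from user_tables where table_name in ('T1', 'T2', 'T3');

    TABLE_NAME   NUM_ROWS SAMPLE_SIZE
    ---------- ---------- -----------
    T1             145334      145334
    T2             146190       14619
    T3             145334      145334

The official documentation does not say what value AUTO_SAMPLE_SIZE actually uses, but judging from this experiment it is 100%. This answers the first question. Note that NUM_ROWS for T2 (146190) is extrapolated from the 14619 sampled rows, which is why it differs slightly from the true row count of 145334.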
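
To check which sampling setting a table will use before gathering, one option is to read the ESTIMATE_PERCENT preference with DBMS_STATS.GET_PREFS. The queries below are a small sketch that assumes the BAO schema and the T1/T2/T3 tables from this experiment; the column aliases estimate_percent and sampled_pct are made up for illustration. On an unmodified configuration the first query is expected to report the AUTO_SAMPLE_SIZE constant rather than a number.

```sql
-- Effective ESTIMATE_PERCENT preference for BAO.T3
-- (the table-level preference if one is set, otherwise the global one).
select dbms_stats.get_prefs('ESTIMATE_PERCENT', 'BAO', 'T3') as estimate_percent
from dual;

-- Percentage of rows actually read by the last gather.
-- SAMPLE_SIZE equal to NUM_ROWS means every row was read.
select table_name,
       num_rows,
       sample_size,
       round(sample_size * 100 / nullif(num_rows, 0), 1) as sampled_pct
from user_tables
where table_name in ('T1', 'T2', 'T3');
```

The second query expresses the comparison made above as a percentage, which is easier to read when many tables are checked at once.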
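
Why does Oracle gather statistics at 100% by default? Someone raised exactly this on AskTOM in the thread "DBMS_STATS.AUTO_SAMPLE_SIZE seems to always generate 100%", and the reply was that it is done in order to obtain accurate distinct counts for each column. Next, let's compare the per-column distinct counts from a full gather with those from a partial gather.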

    SQL> select a.column_name, a.num_distinct "t1.num_distinct", b.num_distinct "t2.num_distinct",
                round((a.num_distinct - b.num_distinct) * 100 / a.num_distinct, 1) "diff",
                a.sample_size "t1.sample_size", b.sample_size "t2.sample_size"
         from (select table_name, column_name, num_distinct, sample_size from user_tab_col_statistics where table_name in ('T1')) a,
              (select table_name, column_name, num_distinct, sample_size from user_tab_col_statistics where table_name in ('T2')) b
         where a.column_name = b.column_name and a.num_distinct > 0 order by "diff" desc;

    COLUMN_NAME                    t1.num_distinct t2.num_distinct       diff t1.sample_size t2.sample_size
    ------------------------------ --------------- --------------- ---------- -------------- --------------
    OBJECT_NAME                              64552           10300         84         145334          14619
    SUBOBJECT_NAME                            1015             385       62.1          68251           6856
    TIMESTAMP                                 2585            1240         52         145212          14610
    LAST_DDL_TIME                             2490            1257       49.5         145212          14610
    CREATED                                   2312            1209       47.7         145334          14619
    NAMESPACE                                   21              15       28.6         145212          14610
    OBJECT_TYPE                                 45              39       13.3         145334          14619
    OWNER                                       80              71       11.3         145334          14619
    TEMPORARY                                    2               2          0         145334          14619
    DUPLICATED                                   1               1          0         145334          14619
    STATUS                                       2               2          0         145334          14619
    SHARDED                                      1               1          0         145334          14619
    GENERATED                                    2               2          0         145334          14619
    SECONDARY                                    1               1          0         145334          14619
    SHARING                                      4               4          0         145334          14619
    EDITIONABLE                                  2               2          0          25433           2531
    ORACLE_MAINTAINED                            2               2          0         145334          14619
    APPLICATION                                  1               1          0         145334          14619
    DEFAULT_COLLATION                            1               1          0          16886           1705
    DATA_OBJECT_ID                           77785           78100        -.4          77822           7813
    OBJECT_ID                               145212          146100        -.6         145212          14610

T1 was gathered in full and T2 at 10%. The "diff" column is the percentage by which the T2 estimate falls short of the T1 value. For most columns (13 of the 21 listed) the sampled estimate is within 1% of the full count. SUBOBJECT_NAME, TIMESTAMP, LAST_DDL_TIME and CREATED are off by roughly 48 to 62%, and OBJECT_NAME is the worst: the estimate of 10300 is 84% below the full count of 64552. Let's look at what causes this.

    SQL> select count(*), object_name from t1 group by object_name order by count(*) desc;

      COUNT(*) OBJECT_NAME
    ---------- -----------------
           690 S_AAA_CCD
           690 S_ABA_CED
           690 S_ACA_CCD
           690 S_ADA_CCD
           690 PK_AEA_CED
    ...
             1 GV_$CON_SYSSTAT
             1 GV_$DATAFILE
             1 GV_$TABLESPACE
             1 GV_$ROLLSTAT
             1 GV_$PARAMETER

The data in the OBJECT_NAME column is distributed very unevenly: some names occur 690 times while others occur only once. For a column with such a skewed distribution, the distinct count obtained from a partial sample can differ considerably from the real distinct count. This answers the second question.
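
One way to observe this effect directly, without going through DBMS_STATS, is to count distinct values on a random row sample using the SAMPLE clause. The queries below are only a sketch, assuming the t1 table from the experiment; the aliases (ndv_full, ndv_in_sample, sampled_rows, occurrences, name_count) are made up for illustration. They show how many OBJECT_NAME values happen to appear in roughly 10% of the rows, which is the raw material an estimator has to extrapolate from. This is not the formula DBMS_STATS applies, and the sampled figures change from run to run.

```sql
-- Exact number of distinct values, computed from all rows.
select count(distinct object_name) as ndv_full
from t1;

-- Distinct values that happen to appear in a roughly 10% random row sample.
select count(distinct object_name) as ndv_in_sample,
       count(*)                    as sampled_rows
from t1 sample (10);

-- Shape of the skew: how many names occur 690 times, ..., twice, once.
select occurrences,
       count(*) as name_count
from (select count(*) as occurrences
      from t1
      group by object_name)
group by occurrences
order by occurrences desc;
```

A name that occurs only once has about a 10% chance of landing in the sample, while a name that occurs hundreds of times is almost always seen. The estimator therefore has to guess how many rare values were missed, and that guess is least reliable when the frequencies are very uneven.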
