动态性能视图属于数据字典,它们的所有者为SYS,并且多数动态性能视图只能由特权用户和DBA用户查询。当数据库处于不同状态时,可以访问的动态性能视图有所不同。启动例程时,ORACLE会自动建立动态性能视图;停止例程时,ORACLE会自动删除动态性能视图。数据字典信息是从数据文件中获得,而动态性能视图信息是从SGA和控制文件取得。所以,两者所反映的信息还是有很大差异的。数据库管理员利用这些动态性能视图,可以了解数据库运行的一些基本信息,为我们进行数据库维护以及数据库性能优化提供一些数据上的支持。所有动态性能视图都是以
V_$
开始的,Oracle 为每个动态性能视图提供了相应的同义词(V$开头)
通过查询 V$FIXED_TABLE
,可以列出所有可用的动态性能视图和动态性能表。
SQL> select * from V$FIXED_TABLE where name like 'V$%';
NAME OBJECT_ID TYPE TABLE_NUM
------------------------------ ---------- ----- ----------
V$WAITSTAT 4294950915 VIEW 65537
V$BH 4294951406 VIEW 65537
V$GC_ELEMENT 4294951794 VIEW 65537
V$CR_BLOCK_SERVER 4294951796 VIEW 65537
V$CURRENT_BLOCK_SERVER 4294952095 VIEW 65537
V$POLICY_HISTORY 4294953128 VIEW 65537
V$ENCRYPTED_TABLESPACES 4294952996 VIEW 65537
V$GC_ELEMENTS_WITH_COLLISIONS 4294951798 VIEW 65537
V$FILE_CACHE_TRANSFER 4294951800 VIEW 65537
V$TEMP_CACHE_TRANSFER 4294951802 VIEW 65537
V$CLASS_CACHE_TRANSFER 4294951804 VIEW 65537
V$INSTANCE_CACHE_TRANSFER 4294952151 VIEW 65537
V$LOCK_ELEMENT 4294951408 VIEW 65537
V$BSP 4294951594 VIEW 65537
V$LOCKS_WITH_COLLISIONS 4294951410 VIEW 65537
V$FILE_PING 4294951412 VIEW 65537
V$TEMP_PING 4294951532 VIEW 65537
V$CLASS_PING 4294951414 VIEW 65537
V$LOCK_ACTIVITY 4294951437 VIEW 65537
V$ROWCACHE 4294950916 VIEW 65537
以下是不同类型的指标视图的快速表格比较:
该表的第一行是经典的等待事件和统计视图。以下几行是度量标准视图。度量标准视图是在 Oracle10g
中引入的。
度量视图计算增量和速率,这极大地简化了解决简单问题的能力,比如 “现在我的数据库的I/O速率是多少?” 这个问题,在10g之前,处理起来出奇的乏味。要回答这个问题,你必须查询 v$sysstat
,例如:
Select value from v$sysstat where name='physical reads';
但是仅查询一次 v$sysstat
不能解决问题,而是“自数据库启动以来已完成了多少I / O?”的问题。要回答原始问题,必须两次查询 v$sysstat
并接受两个值之间的增量:
获得这些差值和速率可能是一项艰巨的工作。然后 10gOracle
引入了度量标准表,这些度量表可以在一个查询中解决问题。
等待事件视图为(系统级别)
V$SYSTEM_EVENT
– 自启动以来累积的等待事件V$EVENTMETRIC
- 等待事件增量持续60秒DBA_HIST_SYSTEM_EVENT
– 自启动以来累计的上周按快照(小时)的等待事件等待事件汇总到称为等待类的组中。对于等待类,有以下视图:
V$SYSTEM_WAIT_CLASS
– 自启动以来累积V$WAITCLASSMETRIC
– 持续60秒增量V$WAITCLASSMETRIC_HISTORY
– 最后一小时的60秒增量注意: DBA_HIST_WAITCLASSMETRIC_HISTORY
用于警报或基准,而不是日常值。
其他的就不一一展开了,具体可以参考下文:
http://datavirtualizer.com/wait-event-and-wait-class-metrics-vs-vsystem_event/
cx_Oracle 是一个 Python 扩展模块,可以访问 Oracle 数据库。它符合 Python 数据库API 2.0 规范。
要在 Python 和 Oracle 数据库中使用 cx_Oracle7
,需要满足以下条件:
cx_Oracle
连接到较旧和较新的数据库。(推荐)在 Linux 上安装 cx_Oracle 的一般方法是使用 Python 的 Pip 包从 PyPI 安装 cx_Oracle
:
从 PyPI 安装 cx_Oracle:
python -m pip install cx_Oracle --upgrade
将 Oracle 客户端库添加到操作系统库搜索路径,例如 Linux 的 LDLIBRARYPATH
如果你的数据库位于远程计算机上,请下 适用于你的操作系统体系结构的免费Oracle Instant Client “Basic” 或 “Basic Light” 包
至于具体的 OracleClient
安装,可以参考下文:
https://cx-oracle.readthedocs.io/en/latest/user_guide/installation.html#installing-cx-oracle-on-linux
具体的安装可以参考官方文档:
在 InfluxDB 中创建一个 Telegraf 数据库:
[root@zuozewei ~]# influxConnected to http://localhost:8086 version 1.6.2InfluxDB shell version: 1.6.2> create user "telegraf" with password 'telegraf'> create database telegraf> show databasesname: databasesname----_internaltelegraf
编写 python+sql 脚本以收集 oracle 指标。脚本的输出内容很重要,必须是 InfluxDB line-protocol。该脚本查询 v$ SYSMETRIC
和 v$eventmetric
,获得最后一分钟时,等待类和等待事件指标。
python代码是:
import socket,argparse,subprocess,re,cx_Oracle
fqdn = socket.getfqdn()
class OraStats():
def __init__(self, user, passwd, sid): self.user = user self.passwd = passwd self.sid = sid self.delengine = "none" connstr=self.user+'/'+self.passwd+'@'+self.sid self.connection = cx_Oracle.connect(connstr) cursor = self.connection.cursor() cursor.execute("select distinct(SVRNAME) from v$dnfs_servers") rows = cursor.fetchall()
for i in range(0, cursor.rowcount): self.dengine_ip = rows[i][0] proc = subprocess.Popen(["nslookup", self.dengine_ip], stdout=subprocess.PIPE) lookupresult = proc.communicate()[0].split('\n')
for line in lookupresult: if 'name=' in re.sub(r'\s', '', line): self.delengine = re.sub('\..*$', '', re.sub(r'^.*name=', '', re.sub(r'\s', '', re.sub(r'.$', '', line))))
# 等待类别 def waitclassstats(self, user, passwd, sid, format): cursor = self.connection.cursor() cursor.execute(""" select n.wait_class, round(m.time_waited/m.INTSIZE_CSEC,3) AAS from v$waitclassmetric m, v$system_wait_class n where m.wait_class_id=n.wait_class_id and n.wait_class != 'Idle' union select 'CPU', round(value/100,3) AAS from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2 union select 'CPU_OS', round((prcnt.busy*parameter.cpu_count)/100,3) - aas.cpu from ( select value busy from v$sysmetric where metric_name='Host CPU Utilization (%)' and group_id=2 ) prcnt, ( select value cpu_count from v$parameter where name='cpu_count' ) parameter, ( select 'CPU', round(value/100,3) cpu from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2) aas """) for wait in cursor: wait_name = wait[0] wait_value = wait[1] print ("oracle_wait_class,fqdn={0},delphix={1},db={2},wait_class={3} wait_value={4}".format(fqdn, self.delengine, sid, re.sub(' ', '_', wait_name), wait_value))
# 系统指标 def sysmetrics(self, user, passwd, sid, format): cursor = self.connection.cursor() cursor.execute(""" select METRIC_NAME,VALUE,METRIC_UNIT from v$sysmetric where group_id=2 """) for metric in cursor: metric_name = metric[0] metric_value = metric[1] print ("oracle_sysmetric,fqdn={0},delphix={1},db={2},metric_name={3} metric_value={4}".format(fqdn,self.delengine,sid,re.sub(' ', '_', metric_name),metric_value))
# 在闪回恢复区中有关磁盘配额和当前磁盘使用情况 def fraused(self, user, passwd, sid, format): cursor = self.connection.cursor() cursor.execute(""" select round((SPACE_USED-SPACE_RECLAIMABLE)*100/SPACE_LIMIT,1) from V$RECOVERY_FILE_DEST """) for frau in cursor: fra_used = frau[0] print ("oracle_fra_pctused,fqdn={0},delphix={1},db={2} fra_pctused={3}".format(fqdn,self.delengine,sid,fra_used))
# 磁盘使用状态 def fsused(self): fss = ['/oracle', '/data'] for fs in fss: df = subprocess.Popen(["df","-P",fs], stdout=subprocess.PIPE) output = df.communicate()[0] total = re.sub('%','',output.split("\n")[1].split()[1]) used = re.sub('%','',output.split("\n")[1].split()[2]) pctused = re.sub('%','',output.split("\n")[1].split()[4]) print("oracle_fs_pctused,fqdn={0},fs_name={1} oraclefs_pctused={2},oraclefs_alloc={3},oraclefs_used={4}".format(fqdn,fs,pctused,total,used))
# 等待状态 def waitstats(self, user, passwd, sid, format): cursor = self.connection.cursor() cursor.execute(""" select /*+ ordered use_hash(n) */ n.wait_class wait_class, n.name wait_name, m.wait_count cnt, nvl(round(10*m.time_waited/nullif(m.wait_count,0),3) ,0) avg_ms from v$eventmetric m, v$event_name n where m.event_id=n.event_id and n.wait_class <> 'Idle' and m.wait_count > 0 order by 1""") for wait in cursor: wait_class = wait[0] wait_name = wait[1] wait_cnt = wait[2] wait_avgms = wait[3] print ("oracle_wait_event,fqdn={0},delphix={1},db={2},wait_class={3},wait_event={4} count={5},latency={6}".format(fqdn, self.delengine,sid,re.sub(' ', '_', wait_class), re.sub(' ','_',wait_name),wait_cnt,wait_avgms))
# 表空间使用状态 def tbsstats(self, user, passwd, sid, format): cursor = self.connection.cursor() cursor.execute(""" select /*+ ordered */ tablespace_name, round(used_space), round(max_size-used_space) free_space, round(max_size), round(used_space*100/max_size,2) percent_used from ( select m.tablespace_name, m.used_space*t.block_size/1024/1024 used_space, (case when t.bigfile='YES' then power(2,32)*t.block_size/1024/1024 else tablespace_size*t.block_size/1024/1024 end) max_size from dba_tablespace_usage_metrics m, dba_tablespaces t where m.tablespace_name=t.tablespace_name) """) for tbs in cursor: tbs_name = tbs[0] used_space_mb = tbs[1] free_space_mb = tbs[2] max_size_mb = tbs[3] percent_used = tbs[4] print ("oracle_tablespaces,fqdn={0},delphix={1},db={2},tbs_name={3} used_space_mb={4},free_space_mb={5},percent_used={6},max_size_mb={7}".format(fqdn, self.delengine, sid, re.sub(' ', '_', tbs_name), used_space_mb,free_space_mb,percent_used,max_size_mb))
if __name__ == "__main__": parser = argparse.ArgumentParser() parser.add_argument('-f', '--format', help="Output format, default influx", choices=['kafka', 'influx'], default='influx') subparsers = parser.add_subparsers(dest='stat') parser_all = subparsers.add_parser('ALL', help="Get all database stats") parser_all.add_argument('-u', '--user', help="Username with sys views grant", required=True) parser_all.add_argument('-p', '--passwd', required=True) parser_all.add_argument('-s', '--sid', help="tnsnames SID to connect", required=True)
args = parser.parse_args()
if args.stat == "ALL": stats = OraStats(args.user, args.passwd, args.sid) stats.waitclassstats(args.user, args.passwd, args.sid, args.format) stats.waitstats(args.user, args.passwd, args.sid, args.format) stats.sysmetrics(args.user, args.passwd, args.sid, args.format) stats.tbsstats(args.user, args.passwd, args.sid, args.format) stats.fraused(args.user, args.passwd, args.sid, args.format) stats.fsused()
输出格式化为 InfluxDB line-protocol
[root@localhost tools]# ./oracle.shoracle_wait_class,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=Application wait_value=0oracle_wait_class,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=CPU wait_value=0.003oracle_wait_class,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=CPU_OS wait_value=0.778oracle_wait_class,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=Commit wait_value=0oracle_wait_class,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=Concurrency wait_value=0.001oracle_wait_class,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=Configuration wait_value=0oracle_wait_class,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=Network wait_value=0oracle_wait_class,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=Other wait_value=0oracle_wait_class,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=System_I/O wait_value=0.001oracle_wait_class,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=User_I/O wait_value=0oracle_wait_event,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=Commit,wait_event=log_file_sync count=2,latency=0.122oracle_wait_event,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=Concurrency,wait_event=os_thread_startup count=2,latency=21.595oracle_wait_event,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=Network,wait_event=SQL*Net_message_to_client count=17,latency=0.001oracle_wait_event,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=Other,wait_event=asynch_descriptor_resize count=4,latency=0.001oracle_wait_event,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=System_I/O,wait_event=db_file_parallel_write count=2,latency=0.081oracle_wait_event,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=System_I/O,wait_event=control_file_parallel_write count=24,latency=0.268oracle_wait_event,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=System_I/O,wait_event=control_file_sequential_read count=71,latency=0.716oracle_wait_event,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=System_I/O,wait_event=log_file_parallel_write count=7,latency=0.076oracle_wait_event,fqdn=localhost.localdomain,delphix=none,db=172.16.106.251:1521/orcl,wait_class=User_I/O,wait_event=Disk_file_operations_I/O count=16,laten
定义一个 shell 脚本执行 Python 脚本
#!/usr/bin/env bashpython /home/oracle/scripts/oracle_metrics.sh -f "influx" "ALL" "-u" "system" "-p" "xxxx" "-s" "172.16.106.251:1521/orcl"
在 oracle主机上,配置 telegraf 以60秒的间隔执行 python sh,然后将输出发送到 InfluxDB。编辑 /etc/telegraf/telegraf.conf
配置文件:
# Telegraf configuration
# Telegraf is entirely plugin driven. All metrics are gathered from the# declared inputs, and sent to the declared outputs.
# Plugins must be declared in here to be active.# To deactivate a plugin, comment out the name and any variables.
# Use 'telegraf -config telegraf.conf -test' to see what metrics a config# file would generate.
# Global tags can be specified here in key="value" format.[global_tags] # dc = "us-east-1" # will tag all metrics with dc=us-east-1 # rack = "1a" host="Dprima" collector="telegraf"
# Configuration for telegraf agent[agent] ## Default data collection interval for all inputs interval = "10s" ## Rounds collection interval to 'interval' ## ie, if interval="10s" then always collect on :00, :10, :20, etc. round_interval = true
## Telegraf will cache metric_buffer_limit metrics for each output, and will ## flush this buffer on a successful write. metric_buffer_limit = 10000 ## Flush the buffer whenever full, regardless of flush_interval. flush_buffer_when_full = true
## Collection jitter is used to jitter the collection by a random amount. ## Each plugin will sleep for a random time within jitter before collecting. ## This can be used to avoid many plugins querying things like sysfs at the ## same time, which can have a measurable effect on the system. collection_jitter = "0s"
## Default flushing interval for all outputs. You shouldn't set this below ## interval. Maximum flush_interval will be flush_interval + flush_jitter flush_interval = "60s" ## Jitter the flush interval by a random amount. This is primarily to avoid ## large write spikes for users running a large number of telegraf instances. ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s flush_jitter = "0s"
## Run telegraf in debug mode debug = false ## Run telegraf in quiet mode quiet = false ## Override default hostname, if empty use os.Hostname() hostname = "Dprima"
################################################################################ OUTPUTS ################################################################################
# Configuration for influxdb server to send metrics to[[outputs.influxdb]] urls = ["http://influxgraf:8086"] # required database = "telegraf" # required precision = "s" timeout = "5s"
[[outputs.influxdb]] urls = ["http://localhost:9092"] # required database = "kapacitor" # required precision = "s" retention_policy = "default" timeout = "5s"
#[[outputs.file]]# files=["/home/oracle/scripts/telegraf_debug.txt"]################################################################################ INPUTS ################################################################################
# Oracle metrics[[inputs.exec]] # Shell/commands array commands = ["/home/oracle/scripts/oracle_metrics.sh"] # Data format to consume. This can be "json", "influx" or "graphite" (line-protocol) # NOTE json only reads numerical measurements, strings and booleans are ignored. data_format = "influx" interval = "60s"
################################################################################ SERVICE INPUTS ################################################################################
启动 telegraf:
telegraf -config /etc/telegraf/telegraf.conf
查询 InfluxDB 数据库
[root@localhost log]# influxConnected to http://localhost:8086 version 1.7.4InfluxDB shell version: 1.7.4Enter an InfluxQL query> show databasesname: databasesname----_internaltelegraf> use telegrafUsing database telegraf> show measurementsname: measurementsname----oracle_fra_pctusedoracle_sysmetricoracle_tablespacesoracle_wait_classoracle_wait_event> select * from oracle_sysmetric limit 5name: oracle_sysmetrictime db delphix fqdn host metric_name metric_value---- -- ------- ---- ---- ----------- ------------1554277680000000000 172.16.14.251:1521/orcl none localhost.localdomain localhost.localdomain Active_Parallel_Sessions 01554277680000000000 172.16.14.251:1521/orcl none localhost.localdomain localhost.localdomain Active_Serial_Sessions 11554277680000000000 172.16.14.251:1521/orcl none localhost.localdomain localhost.localdomain Average_Active_Sessions 0.01380294950841554277680000000000 172.16.14.251:1521/orcl none localhost.localdomain localhost.localdomain Average_Synchronous_Single-Block_Read_Latency 0.58751554277680000000000 172.16.14.251:1521/orcl none localhost.localdomain localhost.localdomain Background_CPU_Usage_Per_Sec 0.104149308449>
Grafana 效果图如下:
通过结合 Python 脚本开发的方式,我们可以扩展部分 Telegraf 不支持的监控项,本文简单提供了一种思路。