Oracle COPY Command Explained

2015-07-07 18:22:18 Category: Database

Overview

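Copying data between tables is a routine task for Oracle DBAs, and Oracle offers several ways to do it; the SQL*Plus COPY command is one of them. COPY uses SQL*Net to copy or move data between tables, whether they are on the same server or on different servers. Used in the right situations, it can noticeably improve the performance of data copies.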

COPY Command Syntax

Before explaining the syntax, it is important to be clear that SQL*Plus COPY is not a procedure or a function, nor is it a SQL statement: it is a SQL*Plus command, so it must be run from within SQL*Plus.

Syntax of the SQL*Plus COPY command:

COPY {FROM database | TO database | FROM database TO database} {APPEND|CREATE|INSERT|REPLACE} destination_table [(column, column, column, ...)]

USING query

COPY – the command itself; it declares that a copy operation is to be performed.

FROM database – the source database

TO database – the destination database

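Note that the braces offer three alternatives, separated by "|". If the source and destination tables are in the same schema (and SQL*Plus is connected to it), you can give only FROM database, only TO database, or both, as in the third form. If they are in different schemas, the one SQL*Plus is not currently connected to must be named explicitly, and writing both FROM database and TO database (the third form) is the clearest choice. FROM database and TO database use the same format: USERID/PASSWORD@SID, or in Oracle's own notation username[/password]@connect_identifier, where the part after @ is normally a net service name.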

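Note that at least one of FROM and TO must be specified; otherwise SQL*Plus returns "SP2-0495: FROM and TO clauses both missing; specify at least one". Whichever clause is omitted defaults to the database and schema SQL*Plus is currently connected to.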
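A minimal Python sketch of the FROM/TO rules just described, assuming a connect string of the form user[/password]@connect_identifier. The names parse_connect, resolve_endpoints and CopyError are hypothetical and only model the behavior; they are not part of SQL*Plus or any Oracle API.

```python
from typing import Optional, Tuple

Endpoint = Tuple[str, Optional[str], str]  # (user, password or None, connect identifier)


class CopyError(Exception):
    """Raised where SQL*Plus would refuse the COPY command (modeled only)."""


def parse_connect(spec: str) -> Endpoint:
    """Split 'user[/password]@connect_identifier' into its parts."""
    credentials, sep, identifier = spec.partition("@")
    if not sep or not identifier:
        raise CopyError(f"missing @connect_identifier in {spec!r}")
    user, _, password = credentials.partition("/")
    return user, (password or None), identifier


def resolve_endpoints(current: str, from_db: Optional[str],
                      to_db: Optional[str]) -> Tuple[Endpoint, Endpoint]:
    """Return (source, destination); an omitted clause falls back to the current connection."""
    if from_db is None and to_db is None:
        raise CopyError("SP2-0495: FROM and TO clauses both missing; specify at least one")
    source = parse_connect(from_db if from_db else current)
    destination = parse_connect(to_db if to_db else current)
    return source, destination
```

In real SQL*Plus an omitted password is prompted for interactively; the sketch simply records it as None.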

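The USING clause specifies the data to copy with a SELECT statement. Because it is a query, the data can be the result of a complex query over several tables.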

{APPEND|CREATE|INSERT|REPLACE} – specifies how the data is written to the destination table:

APPEND – appends rows to an existing destination table; if the table does not exist, COPY creates it, in which case APPEND behaves like CREATE.

CREATE – creates the destination table and inserts the rows into it; if the table already exists, COPY returns an error.

INSERT – inserts rows into an existing destination table; unlike APPEND, if the table does not exist, COPY returns an error instead of creating it.

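REPLACE – replaces the existing destination table with the queried data (COPY drops the table and re-creates it with the copied rows); if the table does not exist, COPY creates it.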
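To make the differences among the four modes concrete, here is a toy Python model in which a "database" is just a dict mapping table names to row lists. apply_copy and CopyModeError are hypothetical names; the sketch mirrors the rules above and ignores details such as column matching, modeling REPLACE simply as discarding the old rows.

```python
from typing import Dict, List, Tuple

Row = Tuple
Database = Dict[str, List[Row]]


class CopyModeError(Exception):
    """Raised where the real COPY command would return an error."""


def apply_copy(db: Database, mode: str, table: str, rows: List[Row]) -> None:
    """Write rows into db[table] following the APPEND/CREATE/INSERT/REPLACE rules."""
    mode = mode.upper()
    exists = table in db
    if mode == "CREATE":
        if exists:
            raise CopyModeError(f"table {table} already exists")
        db[table] = list(rows)
    elif mode == "INSERT":
        if not exists:
            raise CopyModeError(f"table {table} does not exist")
        db[table].extend(rows)
    elif mode == "APPEND":
        # Creates the table if missing (same as CREATE), otherwise appends.
        db.setdefault(table, []).extend(rows)
    elif mode == "REPLACE":
        # Old contents are discarded; the table is created if missing.
        db[table] = list(rows)
    else:
        raise ValueError(f"unknown mode: {mode}")
```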

A few examples follow.

Copying data within the same schema of one database

CREATE mode, with only the FROM clause specified

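Note: in the examples below, a "-" at the end of a line is the SQL*Plus continuation character, which lets one command be written across several lines.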

scott@SYBO2SZ> copy from scott/tiger@sybo2sz -
> create tb_emp -
> using select * from emp;

Copying data between different schemas of the same database

The following uses APPEND mode and specifies both the FROM and TO clauses

scott@SYBO2SZ> copy from scott/tiger@sybo2sz to goex_admin/xxx@sybo2sz -
> append tb_emp using select * from emp;

Copying data between different databases

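When copying between databases, the database SQL*Plus is not connected to must be given with its connect string. Here the destination (cnmmbo) is a different database from the current connection, so the TO clause is required.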

scott@SYBO2SZ> copy from scott/tiger@sybo2sz to goex_admin/xxx@cnmmbo -
> append tb_emp using select * from emp;

Copying data between different Oracle versions

The following copies data from Oracle 10g to Oracle 11g

scott@SYBO2SZ> copy from scott/tiger@sybo2sz to scott/tiger@ora11g -
> create tb_emp using select * from emp where deptno=30;
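If you generate COPY commands from scripts, a small helper can assemble the pieces in the order the syntax requires. build_copy_command is a hypothetical Python helper, not an Oracle tool; it produces a single-line command, so no "-" continuation characters are needed.

```python
from typing import Optional, Sequence

MODES = {"APPEND", "CREATE", "INSERT", "REPLACE"}


def build_copy_command(mode: str, table: str, query: str,
                       from_db: Optional[str] = None,
                       to_db: Optional[str] = None,
                       columns: Sequence[str] = ()) -> str:
    """Assemble: COPY [FROM db] [TO db] mode table [(col, ...)] USING query."""
    if from_db is None and to_db is None:
        raise ValueError("specify at least one of from_db and to_db")
    mode = mode.upper()
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    parts = ["COPY"]
    if from_db:
        parts.append(f"FROM {from_db}")
    if to_db:
        parts.append(f"TO {to_db}")
    target = table + (f" ({', '.join(columns)})" if columns else "")
    parts += [mode, target, "USING", query]
    return " ".join(parts)
```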

COPY supports the following datatypes:

CHAR

DATE

LONG

NUMBER

VARCHAR2

Tip: if you run the COPY command through a Java-based client connection, you may be able to copy data types beyond those listed above, such as TIMESTAMP and BLOB.

To enable the copying of data between Oracle and non-Oracle databases, NUMBER columns are changed to DECIMAL columns in the destination table. Hence, if you are copying between Oracle databases, a NUMBER column with no precision will be changed to a DECIMAL(38) column. When copying between Oracle databases, you should use SQL commands (CREATE TABLE AS and INSERT) or you should ensure that your columns have a precision specified.
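As an illustration of the NUMBER-to-DECIMAL rule above, the following Python function maps a source column type to the type COPY would create. destination_type is a hypothetical name; only the unconstrained NUMBER case is documented above, and carrying an explicit precision and scale over to DECIMAL is an assumption. Every other type is returned unchanged.

```python
import re

_NUMBER = re.compile(r"NUMBER(?:\((\d+)(?:,(-?\d+))?\))?")


def destination_type(source_type: str) -> str:
    """Approximate the DECIMAL type COPY creates for a NUMBER column."""
    normalized = re.sub(r"\s+", "", source_type.upper())
    match = _NUMBER.fullmatch(normalized)
    if match is None:
        return source_type  # non-NUMBER types are not modeled here
    precision, scale = match.groups()
    if precision is None:
        return "DECIMAL(38)"  # documented case: NUMBER with no precision
    if scale is None:
        return f"DECIMAL({precision})"  # assumption: precision carried over
    return f"DECIMAL({precision},{scale})"  # assumption: precision and scale carried over
```

In Oracle, DECIMAL(38) is stored as NUMBER(38) with a scale of 0, so as far as we understand it, fractional digits held in an unconstrained NUMBER column would not survive such a copy; specifying a precision (and scale) avoids the surprise.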

The SQL*Plus SET LONG variable limits the length of LONG columns that you copy. If any LONG columns contain data longer than the value of LONG, COPY truncates the data.

Warning:

Including your password in plain text is a security risk. You can avoid this risk by omitting the password, and entering it only when the system prompts for it.

Performance Tuning

arraysize

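The arraysize setting controls how many rows SQL*Plus fetches from the database at a time. The default is 15 and valid values range from 1 to 5000. After fetching arraysize rows, the server stops, returns those rows to the SQL*Plus client, and then continues with the next batch.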

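Each such exchange is counted in the "SQL*Net roundtrips to/from client" statistic. The default of 15 rows causes a problem: a block usually holds more than 15 rows, so with 15 rows per fetch the same block has to be visited again on every fetch, and a single block may be read many times. These repeated visits increase consistent gets and physical reads. The increase in physical reads is easy to understand: the more block visits there are, the greater the chance that some of them require a physical read.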

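consistent gets counts the blocks read in consistent (read-consistent) mode. To keep query results consistent, if a block is modified after a long-running query has started, Oracle uses undo to build a CR (consistent read) copy of the block, which represents the block as it was at an earlier point in time; this is how the query returns consistent data. The more often blocks are revisited, the more consistent gets are recorded, and the more CR copies may have to be built from undo. Each time one fetch ends and the next begins, the current block must be visited again, which adds logical reads; raising arraysize reduces the number of fetches and therefore the number of logical reads.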

In short, an arraysize that is too small increases physical reads, consistent gets, and the number of SQL*Net roundtrips to/from client.

Check the default value:

SYS@anqing2(rac2)> show arraysize

arraysize 15

Change arraysize manually:

SYS@anqing2(rac2)> set arraysize 100
SYS@anqing2(rac2)> show arraysize

arraysize 100

Making the arraysize setting permanent:

You can put the setting in glogin.sql or login.sql so that it takes effect automatically at every login, instead of running SET each time.

Edit glogin.sql:

[oracle@rac2 admin]$ pwd

/u01/app/oracle/product/10.2.0/db_1/sqlplus/admin

[oracle@rac2 admin]$ ls

glogin.sql help iplus libisqlplus.def libsqlplus.def plustrce.sql pupbld.sql

Add the following line to glogin.sql:

set arraysize 5000

-- Log in again and check:

SYS@anqing2(rac2)> show arraysize

arraysize 5000

Related Tests

Check how many blocks the table occupies:

DAVE@anqing2(rac2)> select owner,extents,segment_name,blocks from dba_segments where segment_name='DAVE' and owner='DAVE';

OWNER EXTENTS SEGMENT_NAME BLOCKS
---------- ---------- -------------------- ----------
DAVE 3 DAVE 24

From these figures, 10,000 rows in 24 blocks works out to about 417 rows per block on average, but the actual distribution may be different.

The table structure is simple:

DAVE@anqing2(rac2)> desc dave;

Name Null? Type

----------------------------------------- -------- ----------------------------

ID NUMBER

NAME VARCHAR2(10)

Count how many rows each data block holds:

DAVE@anqing2(rac2)> select  prerid,count(rid) rid from (select  substr(rowid,1,15) prerid,rowid rid from dave) group by  prerid;

PRERID RID
------------------------------ ----------
AAANXzAAHAAAAAa 517
AAANXzAAHAAAAAf 517
AAANXzAAHAAAAAP 517
.......................................
AAANXzAAHAAAAAd 517

20 rows selected.

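Only 20 rows are returned, meaning the table data actually occupies 20 data blocks, with the per-block row counts shown above. Because the rows are very short, each block holds a large number of them.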
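The SUBSTR(ROWID, 1, 15) trick works because an extended ROWID is 18 characters: 6 for the data object number, 3 for the relative file number, 6 for the block number and 3 for the row number, so the first 15 characters identify a block. The Python sketch below applies the same grouping to ROWID strings; rows_per_block is a hypothetical helper and the sample ROWIDs in the test are made up. Inside the database, DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) is the documented way to extract the block number.

```python
from collections import Counter
from typing import Dict, Iterable


def rows_per_block(rowids: Iterable[str]) -> Dict[str, int]:
    """Count rows per block by grouping extended ROWIDs on their first 15 characters.

    Extended ROWID layout: OOOOOO (data object) FFF (file) BBBBBB (block) RRR (row).
    """
    counts: Counter = Counter()
    for rowid in rowids:
        if len(rowid) != 18:
            raise ValueError(f"expected an 18-character extended ROWID, got {rowid!r}")
        counts[rowid[:15]] += 1  # same prefix as SUBSTR(ROWID, 1, 15)
    return dict(counts)
```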

Yet the earlier query showed that the table occupies 24 blocks. The following query explains why:

DAVE@anqing2(rac2)> select extent_id,block_id,blocks from dba_extents where owner='DAVE' and segment_name='DAVE';

EXTENT_ID BLOCK_ID BLOCKS
---------- ---------- ----------
0 9 8
1 17 8
2 25 8

Three extents have been allocated, each made up of 8 blocks, hence 24 blocks. With the default arraysize of 15, reading one block takes 517/15 ≈ 35 fetches, which should mean more consistent gets and physical reads. Let's verify this.

DAVE@anqing2(rac2)> set autot traceonly stat
DAVE@anqing2(rac2)> select * from dave where rownum<518;

Because one data block holds 517 rows, this query reads exactly one block's worth of rows.

517 rows selected.

Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
87 consistent gets
0 physical reads
0 redo size
9354 bytes sent via SQL*Net to client
774 bytes received via SQL*Net from client
36 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
517 rows processed

Note the SQL*Net roundtrips to/from client statistic: we estimated earlier that, with the default arraysize, reading this block would take about 35 round trips; the actual figure is 36.

Now increase arraysize and run the query again:

(the maximum arraysize is 5000)

DAVE@anqing2(rac2)> set arraysize 5000
DAVE@anqing2(rac2)> select * from dave where rownum<518;

517 rows selected.

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
5036 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
517 rows processed

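Comparison:

consistent gets: dropped from 87 to 5.

SQL*Net roundtrips to/from client: dropped from 36 to 2. The larger the data volume, the more noticeable this improvement becomes. For the COPY command, arraysize can be understood as the number of rows fetched in each batch.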
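The two measurements above (36 round trips at arraysize 15 and 2 at arraysize 5000, both for 517 rows) fit a simple approximation: ceil(rows / arraysize) data fetches plus one extra round trip. The Python function below encodes that empirical rule; estimated_roundtrips is a hypothetical name, the rule is fitted to these two data points rather than taken from Oracle documentation, and the extra trip is an assumption.

```python
def estimated_roundtrips(rows: int, arraysize: int) -> int:
    """Empirical fit: ceil(rows / arraysize) fetches plus one extra round trip."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    if not 1 <= arraysize <= 5000:  # SQL*Plus accepts 1..5000
        raise ValueError("arraysize must be between 1 and 5000")
    fetches = -(-rows // arraysize)  # integer ceiling division
    return fetches + 1
```

Applied to the 5,082,500-row test set described below, the same rule gives 338,835 round trips at arraysize 15 versus 1,018 at 5000, which illustrates why the benefit grows with data volume.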

copycommit

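This setting controls how many arrays (batches of arraysize rows) COPY processes between commits. If it is 0, COPY commits only once, after all the data has been copied; if it is non-zero, COPY commits after every arraysize*copycommit rows.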
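A small Python sketch of when commits happen under the copycommit rule above. commit_points is a hypothetical helper that returns the cumulative row counts at which a commit would occur; the final commit for a partial last batch is an assumption about how the tail of the copy is handled. In SQL*Plus, SET COPYCOMMIT accepts values from 0 to 5000.

```python
from typing import List


def commit_points(total_rows: int, arraysize: int, copycommit: int) -> List[int]:
    """Cumulative row counts after which a commit would occur."""
    if not 1 <= arraysize <= 5000 or not 0 <= copycommit <= 5000:
        raise ValueError("arraysize must be 1..5000 and copycommit 0..5000")
    if copycommit == 0:
        return [total_rows]  # a single commit at the very end
    step = arraysize * copycommit  # rows copied between commits
    points = list(range(step, total_rows + 1, step))
    if not points or points[-1] != total_rows:
        points.append(total_rows)  # assumed final commit for the remainder
    return points
```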

Performance test results

Test data:

Rows: 5,082,500

Data volume: 504 MB

Results:

Method                          Time (s)    Undo (MB)    Redo (MB)
Copy command                      520.51            0          592
Insert into … select …            631.64          345         1720
Create Table … as select …        244.79            0          515
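To put the table in relative terms, the short Python script below computes ratios and throughput from the measured figures. Only the numbers come from the test; the results dict and the ratio and rows_per_second helpers are illustrative names.

```python
# Measured figures from the test above: 5,082,500 rows, 504 MB.
ROWS = 5_082_500
ELAPSED, UNDO, REDO = 0, 1, 2  # tuple positions

# method -> (elapsed seconds, undo MB, redo MB)
results = {
    "Copy command": (520.51, 0, 592),
    "Insert into ... select ...": (631.64, 345, 1720),
    "Create Table ... as select ...": (244.79, 0, 515),
}


def ratio(a: str, b: str, metric: int) -> float:
    """How many times larger `metric` is for method a than for method b."""
    return results[a][metric] / results[b][metric]


def rows_per_second(method: str) -> float:
    return ROWS / results[method][ELAPSED]


if __name__ == "__main__":
    for name in results:
        print(f"{name:32} {rows_per_second(name):10,.0f} rows/s")
    print(f"INSERT vs COPY elapsed: {ratio('Insert into ... select ...', 'Copy command', ELAPSED):.2f}x")
    print(f"COPY vs CTAS elapsed:   {ratio('Copy command', 'Create Table ... as select ...', ELAPSED):.2f}x")
```

By these figures INSERT … SELECT took about 1.21 times as long as COPY and produced about 2.91 times as much redo, while COPY took about 2.13 times as long as CREATE TABLE … AS SELECT.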

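CREATE TABLE … AS SELECT … is the fastest and generates the least undo and redo, so use it whenever possible. Its limitation is that the destination table must not already exist: it cannot be used to append rows to an existing table.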

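INSERT INTO … SELECT … is the slowest, generates the most undo and redo, and puts the greatest load on I/O. Its advantages are that it is familiar and simple to use; it suits small amounts of data but is not recommended for large volumes.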

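The COPY command handles the case CREATE TABLE cannot, namely appending rows to an existing table. Compared with INSERT, it is somewhat more efficient, generates less redo and no undo, so COPY is recommended for appending large amounts of data.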
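The recommendations above can be condensed into a rule of thumb. suggest_method is a hypothetical Python helper restating this article's advice, not an Oracle rule, and what counts as a "large volume" is left to the reader's judgment.

```python
def suggest_method(target_exists: bool, large_volume: bool) -> str:
    """Restate the article's advice for choosing a table-copy method."""
    if not target_exists:
        # Fastest in the test, with the least undo and redo.
        return "CREATE TABLE ... AS SELECT"
    if large_volume:
        # Appends to an existing table with no undo and less redo than INSERT.
        return "SQL*Plus COPY (APPEND or INSERT)"
    # Familiar and simple; acceptable for small amounts of data.
    return "INSERT INTO ... SELECT"
```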

fetch size parameter

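Like arraysize, fetch size is a client-side parameter and must be set on the client. arraysize is set in SQL*Plus; when an application connects to the database through a driver, the equivalent parameter is the fetch size, and it serves the same purpose. The default fetch size is 10; changing it to 50 is usually enough, while values that are too large consume more memory. Since this parameter is not the focus of this article, interested readers can refer to the explanation from Oracle below.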

The JDBC fetch size gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed. For large queries that return a large number of objects, you can configure the row fetch size used in the query to improve performance by reducing the number of database hits required to satisfy the selection criteria. Most JDBC drivers (including Oracle) default to a fetch size of 10, so if you are reading 1000 objects, increasing the fetch size to 256 can significantly reduce the time required to fetch the query's results. The optimal fetch size is not always obvious. Usually, a fetch size of one half or one quarter of the total expected result size is optimal. Note that if you are unsure of the result set size, setting the fetch size too large or too small can decrease performance.

In this example application, I print out the default fetch size and then increase it to 50 using the setFetchSize(int) method of a Statement object. When you execute the query, the JDBC driver retrieves the first 50 rows from the database (or all rows if fewer than 50 rows satisfy the selection criteria). As you iterate over the first 50 rows, each time you call rset.next(), the JDBC driver returns a row from local memory; it does not need to retrieve the row from the database. When you try to access the fifty-first row (assuming more than 50 rows satisfy the selection criteria), the JDBC driver again goes to the database and retrieves another 50 rows. In this way, 100 rows are returned with only two database hits.
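The buffering behavior described above can be modeled in a few lines of Python. PrefetchingCursor is a toy class, not the Oracle JDBC driver; it counts one "database hit" each time its local buffer is empty and it has to fetch another batch of fetch_size rows, with a default of 10 to mirror the JDBC default mentioned above.

```python
from collections import deque
from typing import Any, Deque, Iterator, Sequence


class PrefetchingCursor:
    """Toy model of a client-side row buffer (not the Oracle JDBC driver)."""

    def __init__(self, server_rows: Sequence[Any], fetch_size: int = 10) -> None:
        if fetch_size < 1:
            raise ValueError("fetch_size must be at least 1")
        self._server_rows = server_rows
        self._fetch_size = fetch_size
        self._buffer: Deque[Any] = deque()
        self._next_row = 0      # index of the next row still on the "server"
        self.database_hits = 0  # number of batch fetches (round trips)

    def __iter__(self) -> Iterator[Any]:
        return self

    def __next__(self) -> Any:
        if not self._buffer:
            if self._next_row >= len(self._server_rows):
                raise StopIteration
            end = self._next_row + self._fetch_size
            self._buffer.extend(self._server_rows[self._next_row:end])
            self._next_row = end
            self.database_hits += 1
        return self._buffer.popleft()  # served from local memory
```

With 100 rows and a fetch size of 50 the model makes two hits, matching the example above; reading 1000 rows takes 100 hits at the default of 10 but only 4 at 256.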

Separately, you can use the setMaxRows() method to set a limit on the maximum number of rows that any ResultSet can contain. A value of zero, which is the default, means there is no limit. (For setFetchSize(), by contrast, a value of zero means the hint is ignored and the driver uses its own default.)

The following link is an example of configuring the fetch size in JDBC:

http://www.idevelopment.info/data/Programming/java/jdbc/FetchSize.java

References: http://www.askmaclean.com/archives/%E4%BA%86%E8%A7%A3sqlplus%E4%B8%AD%E7%9A%84copy%E5%91%BD%E4%BB%A4.html

http://www.csdn123.com/html/blogs/20130502/8394.htm

http://blog.csdn.net/tianlesoftware/article/details/6579913

http://www.cnblogs.com/xd502djj/archive/2010/08/06/1794207.html

http://docs.oracle.com/cd/E11882_01/server.112/e16604/apb.htm#CHDEAEDE

Reposted from: http://czmmiao.iteye.com/blog/1889438
