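How do I import 1 million rows of data into a database?

I have a data file with about 1 million rows that needs to be inserted into my system's database. If the insert fails partway, rows that were already inserted must not be inserted again on the retry. Is there a fast way to handle this?

23 answers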

#1

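Which database? With Oracle, you can choose whether or not to overwrite existing data.

#2

The DB is Oracle. Thanks to the poster above. Could you go into more detail? Thanks.

#3

One more detail: my data file is not a *.dmp file but a *.csv.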

#4

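Build an index. Remember to create the index on the target table; that can be 10 times faster than without an index. 1 million rows takes roughly 15 minutes.

#5

Oh dear~
Imports and exports are slower when indexes are present.
It is better to import first and build the indexes afterwards.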

#6

Learning. Bumping to help.

#7

focus

#8

Still learning,

#9

My CSV file is formatted like this.
The first line holds the field names:
name,sex,age
wang,1,20
li,0,30
zhang,1,18
........

How can I insert this into the database quickly? It has nothing to do with indexes.

#10

Use Oracle's SQL*Loader.

#11

#12

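[Source:] http://www.xiaomai.org/read.php?id=173

By Fan Sheng (范生)
SQL*LOADER is Oracle's data loading tool, typically used to move operating-system files into an Oracle database. It is the loading method of choice for large data warehouses because it offers the fastest paths (DIRECT, PARALLEL). Setting the theory aside for now, let's use a worked example to get you up to speed with SQL*LOADER quickly.
First, let's get acquainted with SQL*LOADER.
On NT the SQL*LOADER command is SQLLDR; on UNIX it is usually sqlldr/sqlload.
For example, run: d:\oracle>sqlldr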
SQL*Loader: Release 8.1.6.0.0 - Production on Tue Jan 8 11:06:42 2002
(c) Copyright 1999 Oracle Corporation. All rights reserved.
Usage: SQLLOAD keyword=value [,keyword=value,...]
Valid Keywords:
userid -- ORACLE username/password
control -- Control file name
log -- Log file name
bad -- Bad file name
data -- Data file name
discard -- Discard file name
discardmax -- Number of discards to allow (Default all)
skip -- Number of logical records to skip (Default 0)
load -- Number of logical records to load (Default all)
errors -- Number of errors to allow (Default 50)
rows -- Number of rows in conventional path bind array or between direct path data saves
(Default: Conventional path 64, Direct path all)
bindsize -- Size of conventional path bind array in bytes (Default 65536)
silent -- Suppress messages during run (header,feedback,errors,discards,partitions)
direct -- use direct path (Default FALSE)
parfile -- parameter file: name of file that contains parameter specifications
parallel -- do parallel load (Default FALSE)
file -- File to allocate extents from
skip_unusable_indexes -- disallow/allow unusable indexes or index partitions (Default FALSE)
skip_index_maintenance -- do not maintain indexes, mark affected indexes as unusable (Default FALSE)
commit_discontinued -- commit loaded rows when load is discontinued (Default FALSE)
readsize -- Size of Read buffer (Default 1048576)
PLEASE NOTE: Command-line parameters may be specified either by
position or by keywords. An example of the former case is 'sqlload
scott/tiger foo'; an example of the latter is 'sqlload control=foo
userid=scott/tiger'. One may specify parameters by position before
but not after parameters specified by keywords. For example,
'sqlload scott/tiger control=foo logfile=log' is allowed, but
'sqlload scott/tiger control=foo log' is not, even though the
position of the parameter 'log' is correct.
d:\oracle>
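This shows some basic help information. Here I am using the Chinese edition of WIN2000 ADV SERVER.
As we know, SQL*LOADER can only import plain text, so let's walk through its usage with a worked example.
1. A data source, result.csv, already exists and is to be imported into the FANCY user's schema in Oracle.
Contents of result.csv:
1,默认 Web 站点,192.168.2.254:80:,RUNNING
2,other,192.168.2.254:80:test.com,STOPPED
3,third,192.168.2.254:81:thirdabc.com,RUNNING
From this we can see four columns, separated by commas, each a variable-length string.
2. Write the control file result.ctl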
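Contents of result.ctl:
load data
infile 'result.csv'
into table resultxt
(resultid char terminated by ',',
website char terminated by ',',
ipport char terminated by ',',
status char terminated by whitespace)
Notes:
infile names the data source file. Here we have omitted the defaults: discardfile result.dsc and badfile result.bad
into table resultxt defaults to INSERT, which requires the table to be empty; you can also write into table resultxt APPEND to append rows, or REPLACE
terminated by ',' means the field is delimited by a comma
terminated by whitespace means the last field ends at whitespace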
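For a CSV that begins with a header row, such as the name,sex,age file in post #9, the control file can skip that first line and declare the delimiter once for all fields. The following is only a sketch in Python that generates the control-file text; the table name `people` and the file name `people.csv` are hypothetical, and `APPEND` is assumed because the target table may already contain rows.

```python
def build_control_file(table, columns, data_file, skip_rows=1, mode="APPEND"):
    """Return SQL*Loader control-file text for a comma-separated file.

    table, columns and data_file are illustrative inputs, not names taken
    from the thread. skip_rows=1 skips a single header line.
    """
    if mode not in ("INSERT", "APPEND", "REPLACE", "TRUNCATE"):
        raise ValueError(f"unsupported load mode: {mode}")
    lines = [
        f"OPTIONS (SKIP={skip_rows})",  # skip the header row(s)
        "LOAD DATA",
        f"INFILE '{data_file}'",
        mode,
        f"INTO TABLE {table}",
        "FIELDS TERMINATED BY ','",     # one delimiter for every field
        "TRAILING NULLCOLS",            # missing trailing fields load as NULL
        "(" + ", ".join(columns) + ")",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_control_file("people", ["name", "sex", "age"], "people.csv"))
```

The generated text can be saved as a .ctl file and passed to sqlldr through the control keyword, in the same way as result.ctl.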
3. Now we run the load:
D:\>sqlldr userid=fancy/testpass control=result.ctl log=resulthis.out
SQL*Loader: Release 8.1.6.0.0 - Production on Tue Jan 8 10:25:42 2002
(c) Copyright 1999 Oracle Corporation. All rights reserved.
SQL*Loader-941: Error during describe of table RESULTXT
ORA-04043: object RESULTXT does not exist
It reports an error because the database has no matching table.
4. Create the table in the database
create table resultxt
(resultid varchar2(500),
website varchar2(500),
ipport varchar2(500),
status varchar2(500))
/
5. Run the load again
D:\>sqlldr userid=fancy/testpass control=result.ctl log=resulthis.out
SQL*Loader: Release 8.1.6.0.0 - Production on Tue Jan 8 10:31:57 2002
(c) Copyright 1999 Oracle Corporation. All rights reserved.
Commit point reached - logical record count 2
Commit point reached - logical record count 3
Success! We can analyze what happened from the log file. resulthis.out contains:
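SQL*Loader: Release 8.1.6.0.0 - Production on Tue Jan 8 10:31:57 2002
(c) Copyright 1999 Oracle Corporation. All rights reserved.
Control File:   result.ctl
Data File:      result.csv
Bad File:       result.bad
Discard File:   none specified
(Allow all discards)
Number to load: ALL
Number to skip: 0
Errors allowed: 50
Bind array:     64 rows, maximum of 65536 bytes
Continuation:   none specified
Path used:      Conventional
Table RESULTXT, loaded from every logical record.
Insert option in effect for this table: INSERT
Column Name                    Position   Len   Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
RESULTID                       FIRST      *     ,         CHARACTER
WEBSITE                        NEXT       *     ,         CHARACTER
IPPORT                         NEXT       *     ,         CHARACTER
STATUS                         NEXT       *     WHT       CHARACTER
Table RESULTXT:
3 Rows successfully loaded.
0 Rows not loaded due to data errors.
0 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
Space allocated for bind array: 65016 bytes (63 rows)
Space allocated for memory besides bind array: 0 bytes
Total logical records skipped: 0
Total logical records read: 3
Total logical records rejected: 0
Total logical records discarded: 0
Run began on Tue Jan 08 10:31:57 2002
Run ended on Tue Jan 08 10:32:00 2002
Elapsed time was: 00:00:02.70
CPU time was: 00:00:00.10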
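6. Concurrent operations
sqlldr userid=/ control=result1.ctl direct=true parallel=true
sqlldr userid=/ control=result2.ctl direct=true parallel=true
sqlldr userid=/ control=result3.ctl direct=true parallel=true
When loading large volumes of data (roughly more than 10 GB), it is best to suppress logging:
SQL>ALTER TABLE RESULTXT nologging;
This avoids generating redo log and improves efficiency. Then add one line above load data in the control file: unrecoverable
This option must be used together with DIRECT.
For concurrent loads, Oracle claims a throughput of 100 GB of data per hour! In practice, reaching 1-10 GB per hour is probably already good. Start with a file that has the same structure but only a small amount of data, and load the full data set once that succeeds; this avoids wasting time.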
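The advice to rehearse with a small file of the same structure can be followed by cutting a sample off the front of the real file. A minimal sketch in Python, assuming the data file is line-oriented text; the function name `write_sample` and the default of 1000 lines are hypothetical.

```python
from itertools import islice


def write_sample(src_path, dst_path, n_lines=1000, encoding="utf-8"):
    """Copy the first n_lines lines of src_path to dst_path.

    Returns the number of lines actually written, which is smaller than
    n_lines when the source file is shorter.
    """
    written = 0
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in islice(src, n_lines):
            dst.write(line)
            written += 1
    return written
```

Pointing infile in a copy of the control file at the sample makes it possible to check the column mapping and the log output before committing to the full load.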

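That wraps up this crash course in SQL*LOADER. Usages such as conditional branches are things you will pick up in your day-to-day work. My view is to "start using it" first and deepen your knowledge afterwards. Interest is humanity's first teacher, and that is absolutely true. Keeping your passion for technology alive matters more than burying yourself in books. The technical path is not necessarily arduous, but it is lonely; only those who can endure solitude can reap rewards in both career and life.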

SQLLOAD keyword=value [,keyword=value,...]

Valid Keywords:

userid -- ORACLE username/password
control -- Control file name
log -- Log file name
bad -- Bad file name
data -- Data file name
discard -- Discard file name
discardmax -- Number of discards to allow (Default all)
skip -- Number of logical records to skip (Default 0)
load -- Number of logical records to load (Default all)
errors -- Number of errors to allow (Default 50)
rows -- Number of rows in conventional path bind array or between direct path data saves
(Default: Conventional path 64, Direct path all)
bindsize -- Size of conventional path bind array in bytes (Default 65536)
silent -- Suppress messages during run (header,feedback,errors,discards,partitions)
direct -- use direct path (Default FALSE)
parfile -- parameter file: name of file that contains parameter specifications
parallel -- do parallel load (Default FALSE)
file -- File to allocate extents from
skip_unusable_indexes -- disallow/allow unusable indexes or index partitions (Default FALSE)
skip_index_maintenance -- do not maintain indexes, mark affected indexes as unusable (Default FALSE)
commit_discontinued -- commit loaded rows when load is discontinued (Default FALSE)
readsize -- Size of Read buffer (Default 65535)

PLEASE NOTE: Command-line parameters may be specified either by
position or by keywords. An example of the former case is 'sqlload
scott/tiger foo'; an example of the latter is 'sqlload control=foo
userid=scott/tiger'. One may specify parameters by position before
but not after parameters specified by keywords. For example,
'sqlload scott/tiger control=foo logfile=log' is allowed, but
'sqlload scott/tiger control=foo log' is not, even though the
position of the parameter 'log' is correct.

#13

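Right, SQL*Loader will do the job and is very convenient, but you must make sure the data is correct,
otherwise tracking down errors is very painful!

#14

SQL*Loader, studying..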

#15

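Argh, SQL*Loader can't meet the requirement. My fault for not explaining it clearly.
The customer already has a system, and they are not abandoning it.
They are now switching to our system, so part of the old system's data has to be imported into ours,
and more data will be brought over from time to time.
The customer only has usage rights to our system, not administrative rights, and there is no way they can get at the server to manage the database directly.
That is why SQL*Loader can't meet the requirement.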

#16

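Argh, SQL*Loader can't meet the requirement.

Why can't it? Nobody is asking the customer to run SQL*Loader commands themselves.

You can write a batch job and put it on your server, have the customer upload the data periodically, and then run the batch job through the web.

That way the customer only has to upload the data and click an Import button, and the data gets loaded.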

#17

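Write a log. If the import fails, record the number of the current record. On the next start, resume from the record where the error occurred.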
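The log-and-resume idea can be sketched as a small checkpoint file that stores how many rows have been committed so far. This is only an illustration in Python: `insert_batch` is a hypothetical callable that inserts one batch and commits (raising on failure), and the checkpoint file format is an assumption, not something from the thread.

```python
import os


def read_checkpoint(path):
    """Return the number of rows already committed, or 0 if no checkpoint exists."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0


def write_checkpoint(path, count):
    """Record the committed row count; write to a temp file, then rename."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(str(count))
    os.replace(tmp, path)


def load_with_resume(rows, insert_batch, checkpoint_path, batch_size=1000):
    """Feed rows to insert_batch in batches, skipping rows committed earlier.

    insert_batch must raise on failure so the checkpoint is not advanced.
    Returns the total number of rows committed across all runs.
    """
    done = read_checkpoint(checkpoint_path)
    batch = []
    for index, row in enumerate(rows):
        if index < done:
            continue  # already loaded by an earlier run
        batch.append(row)
        if len(batch) == batch_size:
            insert_batch(batch)
            done += len(batch)
            write_checkpoint(checkpoint_path, done)
            batch = []
    if batch:
        insert_batch(batch)
        done += len(batch)
        write_checkpoint(checkpoint_path, done)
    return done
```

A crash between the commit and the checkpoint write could still replay one batch, so a unique key on the target table remains a useful safety net. With SQL*Loader itself, the skip keyword shown in the help listing plays the same role: it skips the given number of logical records at the start of the file.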

#18

The approach from javagems(月是故乡明) is good, but what if the imported data then has to serve as the condition for another import SQL statement?

#19

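SQL*Loader can meet your requirement.
My company once built a card-analysis system for a bank, with requirements much like yours. We had no way at all to touch the database of the bank's business system. Instead, the bank's database administrators exported the agreed data as text files and uploaded them via FTP to a designated directory on our server. Our program scanned the FTP directory on a schedule and loaded the data into our system with SQL*Loader.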
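The scan-and-load loop described here might look roughly like the sketch below, written in Python. The directory layout, the function name, and the choice to move loaded files into a separate folder are all assumptions; the only sqlldr keywords used are userid, control, data and log, which appear in the help listing earlier in the thread.

```python
import shutil
import subprocess
from pathlib import Path


def load_pending_files(inbox, done_dir, control_file, userid, run=subprocess.run):
    """Load every *.csv in inbox with sqlldr, then move it to done_dir.

    run is injectable so the function can be exercised without Oracle
    installed. Returns the names of the files that loaded successfully.
    """
    inbox, done_dir = Path(inbox), Path(done_dir)
    done_dir.mkdir(parents=True, exist_ok=True)
    loaded = []
    for csv_path in sorted(inbox.glob("*.csv")):
        cmd = [
            "sqlldr",
            f"userid={userid}",
            f"control={control_file}",
            f"data={csv_path}",
            f"log={csv_path.with_suffix('.log')}",
        ]
        result = run(cmd)
        if result.returncode == 0:  # exit code 0 means the load succeeded
            shutil.move(str(csv_path), str(done_dir / csv_path.name))
            loaded.append(csv_path.name)
    return loaded
```

Files that fail stay in the inbox for inspection and are retried on the next scan. A scheduler such as cron or the Windows Task Scheduler can call the function periodically; files that are still being uploaded would need separate handling, for example an upload-complete marker file.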

#20

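You might as well write your own program; it is easier to control.

#21

The approach from javagems(月是故乡明) is good, but what if the imported data then has to serve as the condition for another import SQL statement?

Use a temporary table to hold the imported data, then query that temporary table for whatever data you need.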
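A staging table also gives a simple way to avoid re-inserting rows that are already in the target, which was the original concern. The sketch below uses Python's built-in sqlite3 purely as a runnable stand-in for Oracle; the table names `staging` and `target` and the key column `id` are hypothetical. In Oracle the staging table would be filled by SQL*Loader, and a similar INSERT ... SELECT (or a MERGE statement) would be run against it.

```python
import sqlite3


def merge_from_staging(conn):
    """Copy rows from staging into target, skipping ids that already exist.

    Assumes staging itself contains no duplicate ids. Returns the number
    of rows added to target.
    """
    before = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    conn.execute(
        """
        INSERT INTO target (id, name)
        SELECT s.id, s.name
        FROM staging AS s
        WHERE NOT EXISTS (SELECT 1 FROM target AS t WHERE t.id = s.id)
        """
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    return after - before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE staging (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO target VALUES (1, 'wang')")
    conn.executemany(
        "INSERT INTO staging VALUES (?, ?)",
        [(1, "wang"), (2, "li"), (3, "zhang")],
    )
    print(merge_from_staging(conn))
```

Because the filter is evaluated against the target each time, running the statement again after a failed or repeated import adds nothing new, and the staging table can be queried freely to drive any follow-up statements.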

#22

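I haven't run into this problem yet, but I think it is very practical.
So I'm studying it.

#23

I haven't run into this problem yet, but I think it is very practical.
UP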
