I have a single-threaded directory scanning program. For each file it scans, it has to read the file's attribute information and insert it into a database.
To improve performance, I have two questions:
- How can I scan with multiple threads? (The target is the SD card of an Android phone.)
- How can I optimize the batch insert into the DB?
Below is the code listing:
```java
void scan() {
    File file = new File("/mnt/sdcard");
    fun(file);
}

void fun(File file) {
    if (!file.exists()) {
        return;
    }
    if (!file.isDirectory()) {
        // read attribute information and insert to db
        return;
    }
    File[] arr = file.listFiles();
    if (arr == null) {
        return; // listFiles() returns null when the directory cannot be read
    }
    for (File child : arr) {
        fun(child);
    }
}
```
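For the batch-insert part, one common first step is to keep database work out of the recursion: walk the tree, collect each file's attributes into a list, and insert the list afterwards in one batch. Below is a minimal sketch, assuming the attributes of interest are the path, size and last-modified time; `AttributeScan` and `FileRecord` are hypothetical names, not part of the original code.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class AttributeScan {
    // Hypothetical holder for the attributes the question wants to store.
    public static final class FileRecord {
        public final String path;
        public final long size;
        public final long lastModified;

        FileRecord(String path, long size, long lastModified) {
            this.path = path;
            this.size = size;
            this.lastModified = lastModified;
        }
    }

    // Same depth-first walk as fun(), but appends one record per regular file
    // to `out` instead of touching the database inside the recursion.
    public static void collect(File file, List<FileRecord> out) {
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children == null) {
                return; // unreadable directory
            }
            for (File child : children) {
                collect(child, out);
            }
        } else if (file.exists()) {
            out.add(new FileRecord(file.getPath(), file.length(), file.lastModified()));
        }
    }

    public static void main(String[] args) {
        List<FileRecord> records = new ArrayList<>();
        collect(new File("/mnt/sdcard"), records);
        System.out.println(records.size() + " files collected");
        // `records` can now be written with one batched insert inside a single transaction
    }
}
```

For a very large tree the list can be flushed every few thousand records instead of being held in memory all at once.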
3 answers
#1
Yes, you can increase performance with multiple threads: while one thread is doing disk I/O, another can be doing network I/O. I'll write a small example of this.
Here is the example, better than reading before sleep :) With the constructor call ReadThenAll(5) you create 5 threads that explore the folders and subfolders. Have fun!
```java
package foo;

import java.io.File;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class ReadThenAll {
    // files and subfolders still waiting to be explored
    private final BlockingQueue<File> exploreList = new LinkedBlockingQueue<>();
    // entries queued or currently being processed; the scan is finished when this reaches 0.
    // (Checking only exploreList.isEmpty() lets workers quit while another worker is
    // still listing a directory and about to add its children.)
    private final AtomicInteger pending = new AtomicInteger();
    // total number of regular files seen by all workers
    private final AtomicLong counter = new AtomicLong();
    private final int numberOfThreads;

    public ReadThenAll(int numberOfThreads) {
        this.numberOfThreads = numberOfThreads;
    }

    public static void main(String[] args) throws InterruptedException {
        ReadThenAll me = new ReadThenAll(5);
        me.scan("/tmp");
    }

    void scan(String fileName) throws InterruptedException {
        final long start = System.currentTimeMillis();
        // add the first one to the queue before any worker starts
        enqueue(new File(fileName));
        Thread[] workers = new Thread[numberOfThreads];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(new FileExplorer());
            workers[i].start();
        }
        // wait for every worker instead of polling a shared, unsynchronized array
        for (Thread t : workers) {
            t.join();
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("total of files : " + counter.get());
        System.out.println("total time (ms) : " + elapsed);
    }

    private void enqueue(File file) {
        pending.incrementAndGet(); // count the entry before it becomes visible to workers
        exploreList.add(file);
    }

    class FileExplorer implements Runnable {
        @Override
        public void run() {
            // keep going while any entry is queued or still being processed elsewhere
            while (pending.get() > 0) {
                File file;
                try {
                    file = exploreList.poll(10, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (file == null) {
                    continue; // queue momentarily empty; re-check pending
                }
                try {
                    if (file.isDirectory()) {
                        // add the directory's entries to the queue
                        File[] arr = file.listFiles();
                        if (arr != null) { // null if the directory cannot be read
                            for (File child : arr) {
                                enqueue(child);
                            }
                        }
                    } else if (file.exists()) {
                        doThemagic(file);
                    }
                } finally {
                    pending.decrementAndGet(); // this entry (and its listing) is done
                }
            }
        }

        private void doThemagic(File file) {
            System.out.println(file);
            counter.incrementAndGet();
        }
    }
}
```
#2
I don't think using multiple threads is going to help here. Scanning a directory is I/O-bound: even if you use multiple threads, they will all be waiting for the I/O operation in one working thread to finish, so at any time only one thread is actually scanning.
It will only help if the I/O operations on your directory can be parallelized, e.g. across multiple disks.
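Whether extra threads help on a given device is easiest to settle by measuring. Below is a minimal sketch using only standard Java APIs (`java.io.File`, `ForkJoinPool`, `RecursiveTask`): it counts the files under a root directory with a fork/join pool of a given size and times each configuration. The class name `ScanTiming` is hypothetical; the default root `/mnt/sdcard` is taken from the question, and the thread counts tried (1, 2, and `availableProcessors()`) are arbitrary choices.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ScanTiming {
    // Counts non-directory entries under dir; each subdirectory becomes its own subtask.
    // Note: File.isDirectory() follows symbolic links.
    static class CountFiles extends RecursiveTask<Long> {
        private final File dir;

        CountFiles(File dir) {
            this.dir = dir;
        }

        @Override
        protected Long compute() {
            File[] children = dir.listFiles();
            if (children == null) {
                return 0L; // unreadable or not a directory
            }
            long count = 0;
            List<CountFiles> subtasks = new ArrayList<>();
            for (File child : children) {
                if (child.isDirectory()) {
                    subtasks.add(new CountFiles(child));
                } else {
                    count++;
                }
            }
            // run the subdirectory scans in parallel and add up their results
            for (CountFiles task : invokeAll(subtasks)) {
                count += task.join();
            }
            return count;
        }
    }

    // Scans root with the given number of worker threads and returns the file count.
    public static long countFiles(File root, int threads) {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            return pool.invoke(new CountFiles(root));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : "/mnt/sdcard");
        int[] threadCounts = {1, 2, Runtime.getRuntime().availableProcessors()};
        for (int threads : threadCounts) {
            long start = System.nanoTime();
            long files = countFiles(root, threads);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(threads + " thread(s): " + files + " files in " + ms + " ms");
        }
    }
}
```

Run each configuration more than once: the first pass warms the filesystem cache, which can make later passes look faster regardless of the thread count.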
#3
You can follow the design below:
1 - Create a queue that supports multiple readers and a single writer.
2 - Get the number of CPUs on the system where the program will run, because you cannot run more threads than that simultaneously.
3 - I/O is always blocking: if two threads are writing to disk, their writes have to be serialized, unless you have multiple physical storage devices that can be accessed independently.
4 - The queue you created in step 1 can be written to and read from at the same time.
5 - Database operations are also blocking: your thread has to wait until it gets a response from the DB server. Rather than blocking the thread, consider asynchronous processing with a callback mechanism.
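Steps 1, 4 and 5 can be combined into a producer/consumer sketch that also touches on the batch-insert question: scanner threads call `submit()` and never wait on the database, while a single writer thread drains the queue in groups of up to `batchSize` and hands each group to `insertBatch`. `DbWriter`, `submit`, `finish` and `insertBatch` are hypothetical names, and `insertBatch` stands in for the real database code. On Android that callback would typically wrap the whole group in one `SQLiteDatabase` transaction (`beginTransaction()`, the inserts, `setTransactionSuccessful()`, and `endTransaction()` in a `finally` block) so SQLite commits once per group rather than once per row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

public class DbWriter implements Runnable {
    // End-of-input marker. It is compared by identity (==), so a real path
    // with the same text can never be mistaken for it.
    private static final String END = new String("<end>");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final int batchSize;
    // Hypothetical callback that writes one group of records, e.g. in one DB transaction.
    private final Consumer<List<String>> insertBatch;

    public DbWriter(int batchSize, Consumer<List<String>> insertBatch) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.insertBatch = insertBatch;
    }

    // Called by any number of scanner threads; only enqueues, never touches the DB.
    public void submit(String path) {
        queue.add(path);
    }

    // Call once, after every scanner thread has finished submitting.
    public void finish() {
        queue.add(END);
    }

    @Override
    public void run() {
        List<String> batch = new ArrayList<>(batchSize);
        try {
            while (true) {
                batch.add(queue.take());             // block until at least one item arrives
                queue.drainTo(batch, batchSize - 1); // take more already-queued items, without blocking
                // END is the last item ever queued, so if it was taken it is last in the batch
                boolean done = batch.get(batch.size() - 1) == END;
                if (done) {
                    batch.remove(batch.size() - 1);
                }
                if (!batch.isEmpty()) {
                    insertBatch.accept(new ArrayList<>(batch));
                }
                batch.clear();
                if (done) {
                    return;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and stop writing
        }
    }
}
```

Usage: start `new Thread(writer)`, let the scanner threads call `writer.submit(path)`, call `writer.finish()` once they have all been joined, then join the writer thread. The queue is unbounded for simplicity; a bounded `LinkedBlockingQueue` with `put()` would limit memory use if scanning outpaces the database.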