Importing files from Hadoop into a web application

Time: 2022-04-24 16:05:09

I am new to Hadoop. I am trying to build an application in Eclipse that uses data stored in HDFS. If we want to connect to a database from Java, we have JDBC. Similarly, what do I need to do to connect to HDFS directly?

2 solutions

#1


First, make sure Hadoop is up and running. Apache Hadoop provides the Java FileSystem class to access files in HDFS from a Java application. In the example below, /books/pg5000.txt is read using FileSystem and IOUtils.

import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;


public class FileSystemCat {

    public static void main(String[] args) throws Exception {
        // Load the cluster configuration so the client can locate the NameNode.
        Configuration conf = new Configuration();
        conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/core-site.xml"));
        conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/hdfs-site.xml"));

        String uri = "/books/pg5000.txt";
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        InputStream in = null;
        try {
            // Open the file in HDFS and stream its contents to stdout.
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
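To compile and run a client like this outside the cluster, the Hadoop client libraries must be on the application's classpath. With Maven, a dependency along these lines is typically enough (the version shown is an assumption; match it to your cluster's Hadoop version):

```xml
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <!-- assumption: replace with the Hadoop version your cluster runs -->
    <version>3.3.6</version>
</dependency>
```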

#2


An alternative is to access the HDFS files as records (rows), as with any other database. You can configure Hive on top of Hadoop, start HiveServer2, and then use the Thrift API from any application to access the data residing in HDFS as tables.

Reference link: https://cwiki.apache.org/confluence/display/Hive/HiveClient
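As a sketch of that route, Hive's JDBC driver (which talks Thrift to HiveServer2) can be used like any other JDBC driver. This assumes HiveServer2 is listening on its default port 10000, the hive-jdbc driver is on the classpath, and a table exists over the HDFS data; the host, database, and table name "books" here are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveClientExample {
    public static void main(String[] args) throws Exception {
        // Connect to HiveServer2; "localhost:10000/default" is an assumed address.
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "", "");
        Statement stmt = con.createStatement();
        // "books" is a hypothetical Hive table mapped over files in HDFS.
        ResultSet rs = stmt.executeQuery("SELECT * FROM books LIMIT 10");
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
        rs.close();
        stmt.close();
        con.close();
    }
}
```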


A Hive ODBC driver is also available from several popular Hadoop distributions (Cloudera, Microsoft HDInsight, Hortonworks).
