How can I integrate Apache Spark with a Spring MVC web application for interactive user sessions?

Asked: 2021-07-28 23:44:23

I am trying to build a movie recommender system using Apache Spark MLlib. I have written the recommender code in Java, and it works fine when run with the spark-submit command.

My run command looks like this:

bin/spark-submit --jars /opt/poc/spark-1.3.1-bin-hadoop2.6/mllib/spark-mllib_2.10-1.0.0.jar --class "com.recommender.MovieLensALSExtended" --master local[4] /home/sarvesh/Desktop/spark-test/recommender.jar /home/sarvesh/Desktop/spark-test/ml-latest-small/ratings.csv /home/sarvesh/Desktop/spark-test/ml-latest-small/movies.csv

Now I want to use my recommender in a real-world scenario, as a web application in which I can query the recommender and get results.

I want to build a Spring MVC web application that can interact with an Apache Spark context and return results when asked.

My question is: how can I build an application that interacts with Apache Spark running on a cluster, so that when a request comes to the controller, it takes the user's query and fetches the same result that the spark-submit command prints to the console?

As far as I have searched, I found that Spark SQL can be integrated with JDBC, but I did not find any good example.

Thanks in advance.

5 Answers

#1


2  

Just pass the Spark context and session as beans in Spring:

// In a Spring @Configuration class; appName, sparkHome and masterUri
// are assumed to be injected from application properties (e.g. via @Value)
@Bean
public SparkConf sparkConf() {
    SparkConf sparkConf = new SparkConf()
            .setAppName(appName)
            .setSparkHome(sparkHome)
            .setMaster(masterUri);

    return sparkConf;
}

@Bean
public JavaSparkContext javaSparkContext() {
    return new JavaSparkContext(sparkConf());
}

@Bean
public SparkSession sparkSession() {
    // Reuse the same context so the whole web application shares one session
    return SparkSession
            .builder()
            .sparkContext(javaSparkContext().sc())
            .appName("Java Spark Ravi")
            .getOrCreate();
}

The same can be done with XML-based configuration.

Fully working code with Spring and Spark is available here:

https://github.com/ravi-code-ranjan/spark-spring-seed-project
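
For illustration, here is a minimal sketch (not part of the original answer) of a Spring MVC controller that injects the SparkSession bean defined above. The /recommendations endpoint and the recommendations table are hypothetical placeholders; the sketch assumes the recommender job has registered such a table in the shared session.

import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RecommendationController {

    // The SparkSession bean defined in the configuration above
    @Autowired
    private SparkSession sparkSession;

    @GetMapping("/recommendations/{userId}")
    public List<String> recommend(@PathVariable int userId) {
        // Hypothetical: query a table the recommender job registered earlier,
        // e.g. via dataset.createOrReplaceTempView("recommendations")
        Dataset<Row> result = sparkSession.sql(
                "SELECT movieId FROM recommendations WHERE userId = " + userId);
        return result.toJSON().collectAsList();
    }
}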

#2


1  

To interact with the data model (i.e., call its predict method), you could build a REST service inside the driver. This service listens for requests, invokes the model's predict method with input from the request, and returns the result.

http4s (https://github.com/http4s/http4s) could be used for this purpose.
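
http4s is a Scala library; since the question's code is in Java, here is a comparable sketch using the JDK's built-in com.sun.net.httpserver instead. It assumes the MLlib ALS model (MatrixFactorizationModel) is available in the driver; the /recommend endpoint, port, and query format are hypothetical.

import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

import com.sun.net.httpserver.HttpServer;
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel;
import org.apache.spark.mllib.recommendation.Rating;

public class DriverRestService {

    private final MatrixFactorizationModel model;  // trained by the recommender job

    public DriverRestService(MatrixFactorizationModel model) {
        this.model = model;
    }

    public void start() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/recommend", exchange -> {
            // Hypothetical query format: /recommend?user=42
            int userId = Integer.parseInt(
                    exchange.getRequestURI().getQuery().split("=")[1]);
            StringBuilder body = new StringBuilder();
            for (Rating r : model.recommendProducts(userId, 10)) {
                body.append(r.product()).append('\n');
            }
            byte[] bytes = body.toString().getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, bytes.length);
            exchange.getResponseBody().write(bytes);
            exchange.close();
        });
        server.start();  // keeps the driver JVM alive and serving requests
    }
}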

Spark SQL is not relevant here, as it is meant for data analytics (which you have already done) with SQL capabilities.

Hope this helps.

#3


0  

For this kind of situation, a REST interface was developed for launching Spark jobs and sharing their contexts.

Take a look at the documentation here:

https://github.com/spark-jobserver/spark-jobserver
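
Purely as an illustration, a client could trigger a job over plain HTTP like this. It assumes spark-jobserver's default port 8090 and a job binary already uploaded under the hypothetical name "recommender"; note that the job class must implement spark-jobserver's job API, so the class name from the question is only a placeholder here.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class JobServerClient {
    public static void main(String[] args) throws Exception {
        // Run the job synchronously and read the JSON response
        URL url = new URL("http://localhost:8090/jobs"
                + "?appName=recommender"
                + "&classPath=com.recommender.MovieLensALSExtended"
                + "&sync=true");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write("input.user = 42".getBytes());  // job input (Typesafe Config syntax)
        }
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}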

#4


0  

To isolate user sessions and show each user their own results, you may need to use queues bound to a user identity. In case the results take time to compute, this identity lets you deliver the respective results to the right user.
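
A minimal sketch of the idea, with all names hypothetical: long-running Spark jobs publish their results to a queue keyed by the requesting user's identity, and the web layer polls that user's queue.

import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class UserResultQueues {

    // One queue per user identity
    private final ConcurrentHashMap<String, BlockingQueue<List<String>>> queues =
            new ConcurrentHashMap<>();

    // Called when a Spark job finishes computing results for a user
    public void publish(String userId, List<String> result) {
        queues.computeIfAbsent(userId, id -> new LinkedBlockingQueue<>())
              .offer(result);
    }

    // Called by the web layer; returns null if nothing is ready yet
    public List<String> poll(String userId) {
        BlockingQueue<List<String>> q = queues.get(userId);
        return q == null ? null : q.poll();
    }
}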

#5


0  

I am a bit late, but this may help other users. If the requirement is to fetch data from Spark remotely, you can consider using HiveThriftServer2. This server exposes Spark SQL (cached and temporary tables) as a JDBC/ODBC database.

So, you can connect to Spark by using a JDBC/ODBC driver, and access data from the SQL tables.

To do the above:

  1. Include this code in your Spark application:

    A. Create a Spark conf with the following properties:

    config.set("hive.server2.thrift.port","10015");
    config.set("spark.sql.hive.thriftServer.singleSession", "true");
    

    B. Then pass the SQL context to the Thrift server and start it as below:

     HiveThriftServer2.startWithContext(session.sqlContext());
    

This will start the Thrift server with the SQL context of your application, so it will be able to return data from the tables created in this context. (A combined server-side sketch appears after the client example below.)

  2. On the client side, you can use the code below to connect to Spark SQL:

    // Requires the Hive JDBC driver (org.apache.hive:hive-jdbc) on the classpath
    Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10015/default", "", "");

    Statement stmt = con.createStatement();
    ResultSet rs = stmt.executeQuery("select count(1) from ABC");
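
Putting steps A and B together, a minimal server-side sketch could look as follows. The app name and master URL are placeholders, and it assumes a Spark build with Hive support and the Thrift server module; the tables to expose (such as ABC above) must be registered before clients query them.

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2;

public class ThriftServerApp {
    public static void main(String[] args) {
        // Step A: configure the Thrift port and single-session mode
        SparkConf config = new SparkConf()
                .setAppName("thrift-server-example")  // placeholder
                .setMaster("local[4]");               // placeholder
        config.set("hive.server2.thrift.port", "10015");
        config.set("spark.sql.hive.thriftServer.singleSession", "true");

        SparkSession session = SparkSession.builder()
                .config(config)
                .enableHiveSupport()
                .getOrCreate();

        // Register the tables to expose, e.g.
        // dataset.createOrReplaceTempView("ABC"), then...

        // Step B: start the Thrift server with this application's SQL context
        HiveThriftServer2.startWithContext(session.sqlContext());
    }
}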
    
