I have a class like this:
public class Test {
    private static String name;

    public static String getName() {
        return name;
    }

    public static void setName(String name) {
        Test.name = name;
    }

    public static void print() {
        System.out.println(name);
    }
}
In my Spark driver, I set the name like this and then call print():
import org.apache.spark.SparkConf;

public final class TestDriver {
    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf().setAppName("TestApp");
        // ...
        // ...
        Test.setName("TestName");
        Test.print();
        // ...
    }
}
However, I'm getting a NullPointerException. How do I pass a value to the global variable and use it?
2 solutions
#1
20
OK, there are basically two ways to get a value known to the driver over to the executors:
- Put the value inside a closure, which is serialized and sent to the executors to perform a task. This is the most common approach and is very simple and elegant. Sample and doc here.
- Create a broadcast variable with the data. This is good for large immutable data, because it is shipped to each executor only once rather than with every task. It is also good if the same data is used over and over. Sample and doc here.
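To make the first option concrete, here is a minimal sketch in plain Java of what "a closure to be serialized" means. It does not use Spark at all: it only simulates the transfer by serializing a lambda to bytes and rebuilding it, which is roughly what happens when a task is sent to an executor. The class `ClosureShippingSketch`, the interface `SerializableFunction`, and the helper names are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

// Hypothetical demo class: simulates shipping a closure, no Spark involved.
class ClosureShippingSketch {

    // A function type that Java serialization can write out.
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

    // "Driver" side: capture a local value inside a lambda.
    static SerializableFunction<String, String> buildTask(String name) {
        return record -> name + ":" + record;
    }

    // Turn the closure, including the captured value, into bytes.
    static byte[] serialize(Object closure) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(closure);
        }
        return bytes.toByteArray();
    }

    // "Executor" side: rebuild the closure from the received bytes.
    @SuppressWarnings("unchecked")
    static SerializableFunction<String, String> deserialize(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SerializableFunction<String, String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] shipped = serialize(buildTask("TestName"));
        SerializableFunction<String, String> onExecutor = deserialize(shipped);
        System.out.println(onExecutor.apply("row1")); // prints "TestName:row1"
    }
}
```

The captured `name` travels inside the serialized bytes, so the rebuilt function still sees it. A static field would not travel this way, which is why a plain local variable captured by the function is usually the simplest fix.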
There is no need to use static variables in either case. But if you DO want static values available on your executor JVMs, you need to do one of these:
- If the values are fixed, or the configuration is available on the executor nodes (it lives inside the JAR, etc.), then you can use a lazy val, which guarantees initialization happens only once.
- You can call mapPartitions() with code that uses one of the two options above, then store the values in your static variable/object. mapPartitions is guaranteed to run only once for each partition (much better than once per line) and is good for this kind of thing (initializing DB connections, etc.).
Hope this helps!
P.S.: As for your exception: I just don't see how that code sample would produce it; my bet is that it is occurring elsewhere.
Edit for extra clarification: the lazy val solution is plain Scala, no Spark involved:
object MyStaticObject {
  lazy val MyStaticValue = {
    // Call a database, read a file included in the JAR, do expensive initialization computation, etc.
    4
  }
}
Since each executor corresponds to a JVM, each executor gets its own MyStaticObject once the class is loaded there. The lazy keyword guarantees that MyStaticValue is initialized only the first time it is actually requested, and it holds that value from then on.
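Since the question uses Java, here is a rough Java counterpart of the lazy val above, sketched with the initialization-on-demand holder idiom. The names `MyStaticHolder`, `Lazy`, `expensiveInit`, and `INIT_COUNT` are hypothetical; the counter is there only to make the "initialized once" behaviour visible and would not be needed in real code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical Java counterpart of the Scala object above.
class MyStaticHolder {

    // Counts how often the initializer runs, only to make the behaviour visible.
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    private MyStaticHolder() {}

    // The JVM initializes this nested class on the first access to VALUE,
    // exactly once and in a thread-safe way.
    private static final class Lazy {
        static final int VALUE = expensiveInit();
    }

    private static int expensiveInit() {
        INIT_COUNT.incrementAndGet();
        // Placeholder for: call a database, read a file included in the JAR, etc.
        return 4;
    }

    static int getValue() {
        return Lazy.VALUE;
    }

    public static void main(String[] args) {
        System.out.println(getValue());       // prints 4
        System.out.println(getValue());       // prints 4
        System.out.println(INIT_COUNT.get()); // prints 1
    }
}
```

As with the Scala version, this gives one value per JVM, so every executor would run `expensiveInit` for itself the first time it calls `getValue()`. Nothing is copied over from the driver.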
#2
6
The copy of your class in your driver process isn't the copy in your executors. They aren't in the same ClassLoader, or even the same JVM, or even on the same machine. Setting a static variable on the driver does nothing to the other copies, hence you find it null remotely.
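The effect can be imitated inside a single JVM. The sketch below is plain Java with no Spark involved, and all names in it (`StaticVsInstanceSketch`, `Settings`, `roundTrip`) are hypothetical. It serializes an object, then resets the static field to null to stand in for a fresh executor JVM where nobody called the setter, and finally deserializes the object. That reset is an assumption of the simulation, not something Spark does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class: static state stays behind, instance state travels.
class StaticVsInstanceSketch {

    static class Settings implements Serializable {
        private static final long serialVersionUID = 1L;

        static String staticName;   // belongs to the class in one JVM, never serialized
        final String instanceName;  // part of the object, written with it

        Settings(String instanceName) {
            this.instanceName = instanceName;
        }
    }

    static byte[] serialize(Settings settings) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(settings);
        }
        return bytes.toByteArray();
    }

    static Settings deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Settings) in.readObject();
        }
    }

    // Returns {static value, instance value} as seen after the simulated transfer.
    static String[] roundTrip() throws IOException, ClassNotFoundException {
        Settings.staticName = "TestName";                     // "driver" sets the static
        byte[] shipped = serialize(new Settings("TestName"));

        Settings.staticName = null;                           // simulate a fresh executor JVM
        Settings received = deserialize(shipped);

        return new String[] { Settings.staticName, received.instanceName };
    }

    public static void main(String[] args) throws Exception {
        String[] seen = roundTrip();
        System.out.println("static: " + seen[0]);   // prints "static: null"
        System.out.println("instance: " + seen[1]); // prints "instance: TestName"
    }
}
```

Deserialization restores the instance field but leaves the static field untouched, which mirrors why a value set through a static setter on the driver is not visible on the executors.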