Java Streams - Filtering previously filtered values

Date: 2022-04-16 12:03:12

I am experimenting with Java's Streams to find out what they can do and where their strengths and weaknesses lie. Currently I am trying to implement the Sieve of Eratosthenes using a stream, but I cannot find a good way to loop through previously filtered values without storing them in a separate collection.

I want to accomplish something like this:

IntStream myStream = IntStream.range(0,4);
myStream.filter(s -> {
    System.out.print("[filtering "+s+"] ");
    myStream.forEach(q -> System.out.print(q+", "));
    System.out.println();
    return true; //eventually respond to values observed on the line above
});

With the desired output:

[filtering 0] 
[filtering 1] 0, 
[filtering 2] 0, 1, 
[filtering 3] 0, 1, 2, 

Note that while each new value is being filtered, all previously filtered values are observed. This would make the Sieve of Eratosthenes easy to implement: I could filter out all non-prime values and, for each new value, check for divisibility against all numbers that have already passed the prime filter.

However, the above example gives me an error in NetBeans:

local variables referenced from a lambda expression must be final or effectively final

This appears to be because I am referencing myStream within a filter that is already acting on myStream. Is there a good way to work around this error (i.e. making a final copy of the stream containing only the values that have been filtered so far), or is there a better approach to this sort of problem that does not use a separate collection to store values?

4 Answers

#1


3  

I managed to create an infinite Stream of prime numbers using the Sieve of Eratosthenes, but it does not actually use past values. Instead, it removes the multiples of each prime from the tail (lazily, because the tail is infinite), like the original Sieve of Eratosthenes algorithm. For that, I used an Iterator as a helper (because a Stream can only be used once) and implemented a lazyConcat for streams.

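import java.util.PrimitiveIterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.IntConsumer;
import java.util.function.Supplier;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

class StreamUtils {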
    public static IntStream fromIterator(PrimitiveIterator.OfInt it) {
        return StreamSupport.intStream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }

    public static IntStream lazyConcat(Supplier<IntStream> a, Supplier<IntStream> b) {
        return StreamSupport.intStream(new Spliterator.OfInt() {
            boolean beforeSplit = true;
            Spliterator.OfInt spliterator;

            @Override
            public OfInt trySplit() {
                return null;
            }

            @Override
            public long estimateSize() {
                return Long.MAX_VALUE;
            }

            @Override
            public int characteristics() {
                return Spliterator.ORDERED;
            }

            @Override
            public boolean tryAdvance(IntConsumer action) {
                boolean hasNext;
                if (spliterator == null) {
                    spliterator = a.get().spliterator();
                }
                hasNext = spliterator.tryAdvance(action);
                if (!hasNext && beforeSplit) {
                    beforeSplit = false;
                    spliterator = b.get().spliterator();
                    hasNext = spliterator.tryAdvance(action);
                }
                return hasNext;
            }
        }, false);
    }
}

My Sieve of Eratosthenes stream looks like this:

class Primes {
    public static IntStream stream() {
        return sieve(IntStream.iterate(2, n -> n + 1));
    }

    private static IntStream sieve(IntStream s) {
        PrimitiveIterator.OfInt it = s.iterator();
        int head = it.nextInt();
        IntStream tail = StreamUtils.fromIterator(it);
        return StreamUtils.lazyConcat(
                () -> IntStream.of(head),
                () -> sieve(tail.filter(n -> n % head != 0)));
    }
}

Then we can use it this way:

System.out.println(Primes.stream().limit(20).boxed().collect(Collectors.toList()));

Output:

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]

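I think it was a good exercise, but it seems quite inefficient and not stack-friendly at all: every prime found wraps the tail in another filter and another lazyConcat layer, so each later element has to pass through an ever deeper chain of nested calls.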
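
For comparison, here is a minimal sketch of a bounded, array-based sieve that avoids the nesting entirely. It is not the approach of this answer: it assumes an upper bound is known in advance, so it gives up the infinite stream. The class name BitSetSieve and the method primesBelow are made up for this illustration.

```java
import java.util.BitSet;
import java.util.stream.IntStream;

class BitSetSieve {
    // Hypothetical helper: streams all primes strictly below upperBound.
    static IntStream primesBelow(int upperBound) {
        // Bit n is set once n is known to be composite.
        BitSet composite = new BitSet(Math.max(upperBound, 2));
        for (int p = 2; (long) p * p < upperBound; p++) {
            if (!composite.get(p)) {
                // Cross out the multiples of p, starting at p * p.
                for (long m = (long) p * p; m < upperBound; m += p) {
                    composite.set((int) m);
                }
            }
        }
        // The sieve is finished before the stream is built, so the
        // filter only reads the BitSet and never modifies it.
        return IntStream.range(2, upperBound).filter(n -> !composite.get(n));
    }

    public static void main(String[] args) {
        primesBelow(72).forEach(p -> System.out.print(p + " "));
        System.out.println();
    }
}
```

The stack depth stays constant here because the crossing-out happens in plain loops; the stream is only used to read out the result.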

#2


2  

You can't process a Stream more than once, so calling myStream.forEach inside the filter method is not possible.
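
A minimal sketch of what "more than once" means in practice. The class and method names are made up for this illustration, and the behaviour shown is that of the OpenJDK implementation; the Javadoc only says a stream implementation may throw IllegalStateException when it detects reuse.

```java
import java.util.stream.IntStream;

class StreamReuseDemo {
    // Hypothetical helper: returns true if a second terminal operation
    // on the same stream object is rejected.
    static boolean secondUseFails() {
        IntStream stream = IntStream.range(0, 3);
        stream.forEach(i -> { });      // first terminal operation consumes the stream
        try {
            stream.forEach(i -> { });  // second use of the same stream object
            return false;
        } catch (IllegalStateException e) {
            return true;               // reuse was detected and rejected
        }
    }

    public static void main(String[] args) {
        System.out.println("second use rejected: " + secondUseFails());
    }
}
```

The same check applies once an intermediate operation such as filter has been attached, which is why the stream in the question cannot be traversed again from inside its own filter.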

You could create a new IntStream inside the filter.

Note that you will have to add a terminal operation to the outer Stream pipeline for it to be processed:

IntStream myStream = IntStream.range(0,4);
myStream.filter(s -> {
    System.out.print("[filtering "+s+"] ");
    IntStream.range(0,s).forEach(q -> System.out.print(q+", "));
    System.out.println();
    return true; //eventually respond to values observed on the line above
}).forEach(i->{});

This produces:

[filtering 0] 
[filtering 1] 0, 
[filtering 2] 0, 1, 
[filtering 3] 0, 1, 2, 
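
Building on the same idea, the placeholder comment could be filled in with a divisibility check against a fresh inner stream. This is a rough sketch with made-up names (InnerStreamPrimes, primesBelow); it tests each candidate against every smaller number rather than only against earlier primes, so it is plain trial division and not a sieve.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class InnerStreamPrimes {
    // Hypothetical helper: lists all primes strictly below upperBound.
    static List<Integer> primesBelow(int upperBound) {
        return IntStream.range(2, upperBound)
                // A new inner stream is created for every candidate n,
                // so no stream is ever traversed twice.
                .filter(n -> IntStream.range(2, n).noneMatch(d -> n % d == 0))
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(primesBelow(20));
    }
}
```

Because the predicate depends only on n, the filter stays stateless.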

#3


1  

It's debatable whether a stream is the right tool here, but .filter() definitely isn't. Filters are supposed to be stateless, so the idea shouldn't come up in the first place. Based on the example in your answer, a collector might be a feasible solution.

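List<Integer> primes = IntStream.range(2, UPPER_BOUND)
    .collect(ArrayList::new,
        (list, number) -> {
            // list holds the primes found so far, in ascending order
            for (int j = 0; j < list.size(); j++) {
                int prime = list.get(j);

                // no prime divisor up to sqrt(number): number is prime
                if (prime > Math.sqrt(number)) {
                    break;
                }

                // divisible by a smaller prime: skip this number
                if (number % prime == 0) {
                    return;
                }
            }

            list.add(number);
        },
        List::addAll);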

ArrayList::new creates a new list, which the consumer then receives as list. The consumer is called for every element in the stream, with number being that element.

List::addAll is the combiner. It would only be relevant for parallel streams, which can't be used for this algorithm anyway because each check relies on all smaller primes already being in the list.

#4


0  

Other answers have suggested that the approach I had been trying is not possible, and that a separate collection must be used.

To give a more complete answer, I wanted to provide a valid approach to this problem using streams and compare it against a more traditional approach.

Listing primes using streams (in the spirit of the Sieve of Eratosthenes; strictly speaking this is trial division against the primes found so far):

List<Integer> primes = new ArrayList<>();

// range(2, UPPER_BOUND) covers the same values as the loop-based version
IntStream.range(2, UPPER_BOUND)
    .filter(i -> {
        for(int j=0; j<primes.size(); j++) {
            int prime = primes.get(j);

            if(prime > Math.sqrt(i)) {
                break;
            }

            if(i % prime == 0) {
                return false;
            }
        }
        return true;
    })
    .forEach(primes::add);

Traditional, equivalent approach without using streams:

List<Integer> primes = new ArrayList<Integer>();

for(int i=2; i < UPPER_BOUND; i++) {
    boolean isPrime = true;

    for(int j=0; j<primes.size(); j++) {
        int prime = primes.get(j);

        if(prime > Math.sqrt(i)) {
            break;
        }

        if(i % prime == 0) {
            isPrime = false;
            break;
        }
    }

    if(isPrime) {
        primes.add(i);
    }
}

Performance Comparison:

Some experimentation with each function consistently showed that the traditional approach is faster than using streams in this case. Finding all primes under one million took the streams approach about 1.5x as long as the traditional approach (averages of 106 ms and 70 ms respectively on my machine).

This difference could probably be made up if the stream's .parallel() method allowed easy parallelization of the problem. However, parallelization is not easy in this case: ArrayList is not thread-safe, so a parallel run will quickly produce errors and/or inaccurate results.
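
One way around the shared ArrayList, sketched here under the assumption that giving up the "check only against earlier primes" shortcut is acceptable, is to make the predicate stateless and let a collector build the result. The names ParallelPrimes, isPrime and primesBelow are made up for this illustration, and no timing claim is implied.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelPrimes {
    // Stateless trial division: the result depends only on n.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical helper: lists all primes strictly below upperBound.
    static List<Integer> primesBelow(int upperBound) {
        return IntStream.range(2, upperBound)
                .parallel()
                .filter(ParallelPrimes::isPrime)
                .boxed()
                // The collector merges per-thread partial lists itself,
                // so no shared mutable list is touched by the filter.
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(primesBelow(1_000_000).size());
    }
}
```

Each candidate is tested against more divisors than in the list-based versions, so whether this is actually faster would have to be measured.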

Conclusion:

Assuming the other answers are correct, filtering already-filtered data within a filter on that same stream is not possible in Java.

Listing primes can be tackled using streams. However, pending a better solution than my own, it is currently better to stick with a traditional stream-less approach.
