Most efficient way to get the last element of a stream


76

Stream doesn't have a last() method:

Stream<T> stream;
T last = stream.last(); // No such method

What's the most elegant and/or efficient way to get the last element (or null for an empty Stream)?


4
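If you need to find the last element of a Stream, you may want to reconsider your design and whether you really want to use a Stream. Streams are not necessarily ordered or finite. If your Stream is unordered, infinite, or both, the last element has no meaning. In my mind, the point of a Stream is to provide a layer of abstraction between data and how you process it. As such, a Stream itself does not need to know anything about the relative ordering of its elements. Finding the last element in a Stream is O(n). If you had a different data structure, it could be O(1).
Jeffrey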

1
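@jeff the need is real: the situation is roughly adding items to a shopping cart, with each addition returning error info (certain combinations of items are not valid), but only the last addition's error info (when all items have been added and a fair assessment of the cart can be made) is the info wanted. (Yes, the API we're using is broken and cannot be fixed.)
Bohemian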

14
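@BrianGoetz: Infinite streams do not have a well-defined count() either, but Stream still has a count() method. Really, that argument applies to any non-short-circuiting terminal operation on infinite streams.
Jeffrey Bosboom 2014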

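@BrianGoetz I think streams should have a last() method. There could be a survey on April 1st about how it should be defined for infinite streams. I would propose: "It never returns and it uses at least one processor core at 100%. On parallel streams it is required to use all cores at 100%."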
Vojta

If the list contains objects with a natural order, or objects that can be ordered, you can use the max() method, as in stream()...max(Comparator...).
Erk
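A note on the max() suggestion above: max() returns the largest element under the comparator, which matches the last element only when the stream is already sorted by that same comparator. A minimal sketch of the difference; the class and method names are made up for illustration:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class MaxVersusLast {
    // Largest element by natural ordering; unrelated to encounter order.
    static <T extends Comparable<? super T>> Optional<T> largest(List<T> items) {
        return items.stream().max(Comparator.naturalOrder());
    }

    // Last element in encounter order, read directly from the list.
    static <T> Optional<T> last(List<T> items) {
        return items.isEmpty() ? Optional.empty() : Optional.of(items.get(items.size() - 1));
    }

    public static void main(String[] args) {
        List<Integer> sorted = Arrays.asList(1, 2, 3);
        List<Integer> unsorted = Arrays.asList(3, 1, 2);
        System.out.println(largest(sorted) + " " + last(sorted));     // Optional[3] Optional[3]
        System.out.println(largest(unsorted) + " " + last(unsorted)); // Optional[3] Optional[2]
    }
}
```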

Answers:


123

Do a reduction that simply returns the current value:

Stream<T> stream;
T last = stream.reduce((a, b) -> b).orElse(null);
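As a runnable illustration of the reduction above, here is a small sketch; the class name is hypothetical. It keeps the Optional instead of unwrapping it with orElse(null), so an empty stream is visible to the caller. One caveat: Stream.reduce throws a NullPointerException if the result of the reduction is null, so this form does not suit streams whose last element may be null.

```java
import java.util.Optional;
import java.util.stream.Stream;

final class ReduceLast {
    // Always keeps the right-hand operand, so the right-most element survives.
    static <T> Optional<T> last(Stream<T> stream) {
        return stream.reduce((first, second) -> second);
    }

    public static void main(String[] args) {
        System.out.println(last(Stream.of("a", "b", "c")));  // Optional[c]
        System.out.println(last(Stream.<String>empty()));    // Optional.empty
    }
}
```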

2
Would you say this is elegant, efficient, or both?
Duncan Jones

1
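@Duncan I think it's both, but I'm not much of a Java 8 expert yet, and this need came up at work the other day: a junior pushed the stream onto a stack and then popped it, and I thought this looked better, but there may be something even simpler out there.
Bohemian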

19
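For simplicity and elegance, this answer wins. It is also reasonably efficient in the general case; it will parallelize reasonably well. For some stream sources that know their size, there's a faster way, but in most cases it's not worth the extra code to save those few iterations.
Brian Goetz 2014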

1
@BrianGoetz how would this parallelize well? With a parallel stream the last value would be unpredictable.
benez

2
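@BrianGoetz: it's still O(n), even if divided by the number of CPU cores. Since the stream doesn't know what the reduction function does, it still has to evaluate every element.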
Holger
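On the parallelization question raised in these comments: (a, b) -> b is associative, and for an ordered stream reduce() combines partial results in encounter order, so the reduction should still yield the last element when run in parallel. A minimal sketch to check this; the class and method names are made up for illustration:

```java
import java.util.Optional;
import java.util.stream.IntStream;

final class ParallelReduceLast {
    // Reduces an ordered range to its last element, sequentially or in parallel.
    static Optional<Integer> lastOfRange(int endExclusive, boolean parallel) {
        IntStream range = IntStream.range(0, endExclusive);
        if (parallel) {
            range = range.parallel();
        }
        return range.boxed().reduce((a, b) -> b);
    }

    public static void main(String[] args) {
        System.out.println(lastOfRange(1_000_000, false)); // Optional[999999]
        System.out.println(lastOfRange(1_000_000, true));  // Optional[999999]
    }
}
```

Every element is still visited in both modes, which is the O(n) cost Holger points out.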

37

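This heavily depends on the nature of the Stream. Keep in mind that "simple" doesn't necessarily mean "efficient". If you suspect the stream to be very large, to carry heavy operations, or to have a source that knows its size in advance, the following might be substantially more efficient than the simple solution: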

static <T> T getLast(Stream<T> stream) {
    Spliterator<T> sp=stream.spliterator();
    if(sp.hasCharacteristics(Spliterator.SIZED|Spliterator.SUBSIZED)) {
        for(;;) {
            Spliterator<T> part=sp.trySplit();
            if(part==null) break;
            if(sp.getExactSizeIfKnown()==0) {
                sp=part;
                break;
            }
        }
    }
    T value=null;
    for(Iterator<T> it=recursive(sp); it.hasNext(); )
        value=it.next();
    return value;
}

private static <T> Iterator<T> recursive(Spliterator<T> sp) {
    Spliterator<T> prev=sp.trySplit();
    if(prev==null) return Spliterators.iterator(sp);
    Iterator<T> it=recursive(sp);
    if(it!=null && it.hasNext()) return it;
    return recursive(prev);
}

You can illustrate the difference with the following example:

String s=getLast(
    IntStream.range(0, 10_000_000).mapToObj(i-> {
        System.out.println("potential heavy operation on "+i);
        return String.valueOf(i);
    }).parallel()
);
System.out.println(s);

which will print:

potential heavy operation on 9999999
9999999

In other words, it did not perform the operation on the first 9999999 elements, but only on the last one.
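The mechanism behind this is that, for an ordered source, trySplit() returns a spliterator covering a prefix while the original spliterator keeps the suffix. The following is a deliberately simplified sketch of that idea, not the full method above. It assumes a list-backed source whose suffix is never empty after a split, and it skips the fallback to the left side that non-SUBSIZED sources need. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

final class SuffixSplitDemo {
    // Returns {last element or -1 if the list is empty, number of elements traversed}.
    static int[] lastAndVisited(List<Integer> source) {
        Spliterator<Integer> suffix = source.spliterator();
        // Each successful trySplit() hands back a prefix; 'suffix' keeps the tail.
        while (suffix.trySplit() != null) {
            // The prefix is dropped: in an ordered source it cannot hold the last element.
        }
        int[] result = {-1, 0};
        suffix.forEachRemaining(value -> {
            result[0] = value;
            result[1]++;
        });
        return result;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            data.add(i);
        }
        int[] r = lastAndVisited(data);
        System.out.println("last=" + r[0] + ", traversed=" + r[1]);
    }
}
```

Only the small tail left after repeated splitting is traversed, which is why the heavy operation in the example runs for so few elements.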


1
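What is the point of the hasCharacteristics() block? What value does it add that isn't already covered by the recursive() method? The latter already navigates to the last split point. Also, recursive() can never return null, so you can remove the it != null check.
Gili 2015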

1
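The recursive op can handle every case but is only a fallback, as it has the worst case that its recursion depth will match the number of (unfiltered!) elements. The ideal case is a SUBSIZED stream, which can guarantee non-empty split halves, so we never need to go back to the left side. Note that in that case recursive won't actually recurse, as trySplit has already been proven to return null.
Holger 2015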

2
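Of course, the code could have been written differently, and it was. I guess the null check stems from an earlier version, but then I discovered that for non-SUBSIZED streams you have to deal with possibly empty split parts, i.e. you have to iterate to find out whether a part has values. Hence I moved the Spliterators.iterator(…) call into the recursive method, to be able to back up to the left side if the right side is empty. The loop is still the preferred operation.
Holger 2015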

2
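Interesting solution. Note that according to the current Stream API implementation, your stream must be either parallel or directly connected to the source spliterator. Otherwise it will refuse to split for some reason, even if the underlying source spliterator splits. On the other hand, you cannot blindly use parallel(), as this may actually perform some operations (such as sorting) in parallel, unexpectedly consuming more CPU cores.
Tagir Valeev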

2
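@Tagir Valeev: right, the example code uses .parallel(), and indeed it may have an effect on sorted() or distinct(). I don't think there will be an effect on any of the other intermediate operations…
Holger 2015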

6

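This is just a refactoring of Holger's answer because the code, while fantastic, is a bit hard to read and understand, especially for people who were not C programmers before Java. Hopefully my refactored example class is a little easier to follow for those who are not familiar with spliterators, what they do, or how they work.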

public class LastElementFinderExample {
    public static void main(String[] args){
        String s = getLast(
            LongStream.range(0, 10_000_000_000L).mapToObj(i-> {
                System.out.println("potential heavy operation on "+i);
                return String.valueOf(i);
            }).parallel()
        );
        System.out.println(s);
    }

    public static <T> T getLast(Stream<T> stream){
        Spliterator<T> sp = stream.spliterator();
        if(isSized(sp)) {
            sp = getLastSplit(sp);
        }
        return getIteratorLastValue(getLastIterator(sp));
    }

    private static boolean isSized(Spliterator<?> sp){
        return sp.hasCharacteristics(Spliterator.SIZED|Spliterator.SUBSIZED);
    }

    private static <T> Spliterator<T> getLastSplit(Spliterator<T> sp){
        return splitUntil(sp, s->s.getExactSizeIfKnown() == 0);
    }

    private static <T> Iterator<T> getLastIterator(Spliterator<T> sp) {
        return Spliterators.iterator(splitUntil(sp, null));
    }

    private static <T> T getIteratorLastValue(Iterator<T> it){
        T result = null;
        while (it.hasNext()){
            result = it.next();
        }
        return result;
    }

    private static <T> Spliterator<T> splitUntil(Spliterator<T> sp, Predicate<Spliterator<T>> condition){
        Spliterator<T> result = sp;
        for (Spliterator<T> part = sp.trySplit(); part != null; part = result.trySplit()){
            if (condition == null || condition.test(result)){
                result = part;
            }
        }
        return result;      
    }   
}


1

Here is another solution (not that efficient):

List<String> list = Arrays.asList("abc","ab","cc");
long count = list.stream().count();
list.stream().skip(count-1).findFirst().ifPresent(System.out::println);
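A slightly more defensive sketch of the same idea, assuming the source is a Collection that can be streamed twice and yields the same elements in the same order each time. It guards the empty case, because skip(-1) throws an IllegalArgumentException. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Optional;

final class SkipToLast {
    // Counts the elements, then skips all but the last one.
    static <T> Optional<T> last(Collection<T> source) {
        long count = source.stream().count();
        if (count == 0) {
            return Optional.empty(); // skip(-1) would throw IllegalArgumentException
        }
        return source.stream().skip(count - 1).findFirst();
    }

    public static void main(String[] args) {
        System.out.println(last(Arrays.asList("abc", "ab", "cc")));  // Optional[cc]
        System.out.println(last(Collections.<String>emptyList()));   // Optional.empty
    }
}
```

Like the reduction approach, findFirst() throws a NullPointerException if the selected element is null.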

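Interesting... Did you test run this? Because there is no substream method, and even if there were, this wouldn't work because count is a terminal operation. So what's the story behind this?
Lii 2014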

Strange, I don't know which JDK I have, but it does have a substream. I looked at the official javadoc (docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html) and you are right, it doesn't appear there.
panagdu 2014

6
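Of course, you have to check for count==0 first, as Stream.skip doesn't like -1 as input. Besides that, the question didn't say that you can acquire the Stream twice. Nor did it say that acquiring the Stream twice is guaranteed to give the same number of elements.
Holger 2014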

1

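Parallel unsized streams combined with skip() are tricky, and @Holger's implementation gives a wrong answer for them. @Holger's implementation is also a bit slower because it uses iterators.

An optimisation of @Holger's answer: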

public static <T> Optional<T> last(Stream<? extends T> stream) {
    Objects.requireNonNull(stream, "stream");

    Spliterator<? extends T> spliterator = stream.spliterator();
    Spliterator<? extends T> lastSpliterator = spliterator;

    // Note that this method does not work very well with
    // unsized parallel streams when combined with skip():
    // in those cases it returns Optional.empty().

    // Find the last spliterator with an estimated size.
    // Meaningful only on unsized parallel streams.
    if(spliterator.estimateSize() == Long.MAX_VALUE) {
        for (Spliterator<? extends T> prev = spliterator.trySplit(); prev != null; prev = spliterator.trySplit()) {
            lastSpliterator = prev;
        }
    }

    // Find the last spliterator on sized streams.
    // Meaningful only on parallel streams (note that unsized was transformed into sized).
    for (Spliterator<? extends T> prev = lastSpliterator.trySplit(); prev != null; prev = lastSpliterator.trySplit()) {
        if (lastSpliterator.estimateSize() == 0) {
            lastSpliterator = prev;
            break;
        }
    }

    // Find the last element of the last spliterator.
    // Parallel streams perform the operation on only one element.
    AtomicReference<T> last = new AtomicReference<>();
    lastSpliterator.forEachRemaining(last::set);

    return Optional.ofNullable(last.get());
}

Unit tests using JUnit 5:

@Test
@DisplayName("last sequential sized")
void last_sequential_sized() throws Exception {
    long expected = 10_000_000L;
    AtomicLong count = new AtomicLong();
    Stream<Long> stream = LongStream.rangeClosed(1, expected).boxed();
    stream = stream.skip(50_000).peek(num -> count.getAndIncrement());

    assertThat(Streams.last(stream)).hasValue(expected);
    assertThat(count).hasValue(9_950_000L);
}

@Test
@DisplayName("last sequential unsized")
void last_sequential_unsized() throws Exception {
    long expected = 10_000_000L;
    AtomicLong count = new AtomicLong();
    Stream<Long> stream = LongStream.rangeClosed(1, expected).boxed();
    stream = StreamSupport.stream(((Iterable<Long>) stream::iterator).spliterator(), stream.isParallel());
    stream = stream.skip(50_000).peek(num -> count.getAndIncrement());

    assertThat(Streams.last(stream)).hasValue(expected);
    assertThat(count).hasValue(9_950_000L);
}

@Test
@DisplayName("last parallel sized")
void last_parallel_sized() throws Exception {
    long expected = 10_000_000L;
    AtomicLong count = new AtomicLong();
    Stream<Long> stream = LongStream.rangeClosed(1, expected).boxed().parallel();
    stream = stream.skip(50_000).peek(num -> count.getAndIncrement());

    assertThat(Streams.last(stream)).hasValue(expected);
    assertThat(count).hasValue(1);
}

@Test
@DisplayName("getLast parallel unsized")
void last_parallel_unsized() throws Exception {
    long expected = 10_000_000L;
    AtomicLong count = new AtomicLong();
    Stream<Long> stream = LongStream.rangeClosed(1, expected).boxed().parallel();
    stream = StreamSupport.stream(((Iterable<Long>) stream::iterator).spliterator(), stream.isParallel());
    stream = stream.peek(num -> count.getAndIncrement());

    assertThat(Streams.last(stream)).hasValue(expected);
    assertThat(count).hasValue(1);
}

@Test
@DisplayName("last parallel unsized with skip")
void last_parallel_unsized_with_skip() throws Exception {
    long expected = 10_000_000L;
    AtomicLong count = new AtomicLong();
    Stream<Long> stream = LongStream.rangeClosed(1, expected).boxed().parallel();
    stream = StreamSupport.stream(((Iterable<Long>) stream::iterator).spliterator(), stream.isParallel());
    stream = stream.skip(50_000).peek(num -> count.getAndIncrement());

    // Unfortunately, unsized parallel streams do not work very well with skip
    //assertThat(Streams.last(stream)).hasValue(expected);
    //assertThat(count).hasValue(1);

    // @Holger implementation gives wrong answer!!
    //assertThat(Streams.getLast(stream)).hasValue(9_950_000L); //!!!
    //assertThat(count).hasValue(1);

    // This is not a very good answer either
    assertThat(Streams.last(stream)).isEmpty();
    assertThat(count).hasValue(0);
}

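The only solution that supports both scenarios is to avoid detecting the last spliterator on unsized parallel streams. The consequence is that the solution will perform the operation on all elements, but it will always give the right answer.

Note that on sequential streams it will perform the operation on all elements anyway.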

public static <T> Optional<T> last(Stream<? extends T> stream) {
    Objects.requireNonNull(stream, "stream");

    Spliterator<? extends T> spliterator = stream.spliterator();

    // Find the last spliterator with estimate size (sized parallel streams)
    if(spliterator.hasCharacteristics(Spliterator.SIZED|Spliterator.SUBSIZED)) {
        // Find the last spliterator on sized streams (parallel streams)
        for (Spliterator<? extends T> prev = spliterator.trySplit(); prev != null; prev = spliterator.trySplit()) {
            if (spliterator.getExactSizeIfKnown() == 0) {
                spliterator = prev;
                break;
            }
        }
    }

    // Find the last element of the spliterator
    //AtomicReference<T> last = new AtomicReference<>();
    //spliterator.forEachRemaining(last::set);

    //return Optional.ofNullable(last.get());

    // A better one that supports native parallel streams
    return (Optional<T>) StreamSupport.stream(spliterator, stream.isParallel())
            .reduce((a, b) -> b);
}

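Regarding the unit tests for this implementation, the first three tests are exactly the same (sequential and sized parallel). The tests for unsized parallel streams are here: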

@Test
@DisplayName("last parallel unsized")
void last_parallel_unsized() throws Exception {
    long expected = 10_000_000L;
    AtomicLong count = new AtomicLong();
    Stream<Long> stream = LongStream.rangeClosed(1, expected).boxed().parallel();
    stream = StreamSupport.stream(((Iterable<Long>) stream::iterator).spliterator(), stream.isParallel());
    stream = stream.peek(num -> count.getAndIncrement());

    assertThat(Streams.last(stream)).hasValue(expected);
    assertThat(count).hasValue(10_000_000L);
}

@Test
@DisplayName("last parallel unsized with skip")
void last_parallel_unsized_with_skip() throws Exception {
    long expected = 10_000_000L;
    AtomicLong count = new AtomicLong();
    Stream<Long> stream = LongStream.rangeClosed(1, expected).boxed().parallel();
    stream = StreamSupport.stream(((Iterable<Long>) stream::iterator).spliterator(), stream.isParallel());
    stream = stream.skip(50_000).peek(num -> count.getAndIncrement());

    assertThat(Streams.last(stream)).hasValue(expected);
    assertThat(count).hasValue(9_950_000L);
}

Note that the unit tests use the AssertJ library for better fluency.
Tet

2
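The problem is that you are doing StreamSupport.stream(((Iterable<Long>) stream::iterator).spliterator(), stream.isParallel()), taking a detour through an Iterable that has no characteristics at all; in other words, it creates an unordered stream. Hence the result has nothing to do with parallel or with using skip, but only with the fact that "last" has no meaning for an unordered stream, so any element is a valid result.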
Holger
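The point about lost characteristics can be checked directly. The default Iterable.spliterator() wraps the iterator without reporting ORDERED, whereas Spliterators.spliteratorUnknownSize lets the caller declare it. A minimal sketch, with hypothetical class and method names:

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;

final class IterableDetour {
    // Detour through Iterable: the default spliterator() does not report ORDERED.
    static <T> Spliterator<T> viaIterable(Stream<T> stream) {
        Iterable<T> iterable = stream::iterator;
        return iterable.spliterator();
    }

    // Same iterator, but explicitly declared as ORDERED.
    static <T> Spliterator<T> viaOrderedSpliterator(Stream<T> stream) {
        return Spliterators.spliteratorUnknownSize(stream.iterator(), Spliterator.ORDERED);
    }

    public static void main(String[] args) {
        System.out.println(viaIterable(Stream.of(1, 2, 3))
                .hasCharacteristics(Spliterator.ORDERED));   // false
        System.out.println(viaOrderedSpliterator(Stream.of(1, 2, 3))
                .hasCharacteristics(Spliterator.ORDERED));   // true
    }
}
```

An unsized test stream built with the second variant would keep its encounter order, so "last" would remain well defined.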

1

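We needed a last on a Stream in production. I am still not sure we really did, but various members of my team said we did because of various "reasons". I ended up writing something like this: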

 private static class Holder<T> implements Consumer<T> {

    T t = null;
    // needed because null elements could be valid
    boolean set = false;

    @Override
    public void accept(T t) {
        this.t = t;
        set = true;
    }
}

/**
 * when a Stream is SUBSIZED, it means that all children (direct or not) are also SIZED and SUBSIZED;
 * meaning we know their size "always" no matter how many splits are there from the initial one.
 * <p>
 * when a Stream is SIZED, it means that we know its current size, but nothing about its "children",
 * a Set for example.
 */
private static <T> Optional<Optional<T>> last(Stream<T> stream) {

    Spliterator<T> suffix = stream.spliterator();
    // nothing left to do here
    if (suffix.getExactSizeIfKnown() == 0) {
        return Optional.empty();
    }

    return Optional.of(Optional.ofNullable(compute(suffix, new Holder())));
}


private static <T> T compute(Spliterator<T> sp, Holder holder) {

    Spliterator<T> s;
    while (true) {
        Spliterator<T> prefix = sp.trySplit();
        // we can't split any further
        // BUT don't look at: prefix.getExactSizeIfKnown() == 0 because this
        // does not mean that suffix can't be split even more further down
        if (prefix == null) {
            s = sp;
            break;
        }

        // if prefix is known to have no elements, just drop it and continue with suffix
        if (prefix.getExactSizeIfKnown() == 0) {
            continue;
        }

        // if suffix has no elements, try to split prefix further
        if (sp.getExactSizeIfKnown() == 0) {
            sp = prefix;
        }

        // after a split, a stream that is not SUBSIZED can give birth to a spliterator that is
        if (sp.hasCharacteristics(Spliterator.SUBSIZED)) {
            return compute(sp, holder);
        } else {
            // if we don't know the known size of suffix or prefix, just try walk them individually
            // starting from suffix and see if we find our "last" there
            T suffixResult = compute(sp, holder);
            if (!holder.set) {
                return compute(prefix, holder);
            }
            return suffixResult;
        }


    }

    s.forEachRemaining(holder::accept);
    // we control this, so that Holder::t is only T
    return (T) holder.t;

}

And some usages of it:

    Stream<Integer> st = Stream.concat(Stream.of(1, 2), Stream.empty());
    System.out.println(2 == last(st).get().get());

    st = Stream.concat(Stream.empty(), Stream.of(1, 2));
    System.out.println(2 == last(st).get().get());

    st = Stream.concat(Stream.iterate(0, i -> i + 1), Stream.of(1, 2, 3));
    System.out.println(3 == last(st).get().get());

    st = Stream.concat(Stream.iterate(0, i -> i + 1).limit(0), Stream.iterate(5, i -> i + 1).limit(3));
    System.out.println(7 == last(st).get().get());

    st = Stream.concat(Stream.iterate(5, i -> i + 1).limit(3), Stream.iterate(0, i -> i + 1).limit(0));
    System.out.println(7 == last(st).get().get());

    String s = last(
        IntStream.range(0, 10_000_000).mapToObj(i -> {
            System.out.println("potential heavy operation on " + i);
            return String.valueOf(i);
        }).parallel()
    ).get().get();

    System.out.println(s.equalsIgnoreCase("9999999"));

    st = Stream.empty();
    System.out.println(last(st).isEmpty());

    st = Stream.of(1, 2, 3, 4, null);
    System.out.println(last(st).get().isEmpty());

    st = Stream.of((Integer) null);
    System.out.println(last(st).isPresent());

    IntStream is = IntStream.range(0, 4).filter(i -> i != 3);
    System.out.println(last(is.boxed()));

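First, the return type, Optional<Optional<T>>: it looks weird, I agree. If the outer Optional is empty, there are no elements in the Stream; if the inner Optional is empty, the last element was actually null, e.g. Stream.of(1, 2, 3, null) (unlike Guava's Streams::findLast, which throws an exception in that case).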
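To make the two levels concrete, here is a plain sequential sketch that has the same return-type semantics but none of the splitting logic above. It visits every element. The class name and the array-based holders are made up for illustration.

```java
import java.util.Optional;
import java.util.stream.Stream;

final class NestedOptionalLast {
    // Outer Optional: was there any element? Inner Optional: was the last one non-null?
    static <T> Optional<Optional<T>> last(Stream<T> stream) {
        Object[] lastSeen = new Object[1];
        boolean[] sawAny = new boolean[1];
        stream.sequential().forEachOrdered(element -> {
            lastSeen[0] = element;
            sawAny[0] = true;
        });
        if (!sawAny[0]) {
            return Optional.empty();
        }
        @SuppressWarnings("unchecked") // only elements of type T are ever stored
        T last = (T) lastSeen[0];
        return Optional.of(Optional.ofNullable(last));
    }

    public static void main(String[] args) {
        System.out.println(last(Stream.<Integer>empty()));    // Optional.empty
        System.out.println(last(Stream.of(1, 2, 3, null)));   // Optional[Optional.empty]
        System.out.println(last(Stream.of(1, 2)));            // Optional[Optional[2]]
    }
}
```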

I admit my main inspiration came from Holger's answer to a similar question of mine and from Guava's Streams::findLast.

Licensed under cc by-sa 3.0 with attribution required.