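Java 8: Collecting the Elements of a Stream

Collecting the elements of a Stream means converting the Stream into an ordinary data container such as an array, a List, or a Set.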

1. Converting to an Array

Pass an array constructor to Stream::toArray() to specify the array type. Called with no argument, it produces an Object[].

Stream<Integer> nums = Stream.of(1, 2, 3);
//Object[] objectArray = nums.toArray();
Integer[] intArray = nums.toArray(Integer[]::new);
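The difference between the two toArray forms can be checked with a small self-contained sketch. The class name ToArrayDemo and its method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.stream.Stream;

class ToArrayDemo {
    // The no-argument form always yields Object[].
    static Object[] asObjects() {
        return Stream.of(1, 2, 3).toArray();
    }

    // Passing an array constructor reference yields a typed array.
    static Integer[] asIntegers() {
        return Stream.of(1, 2, 3).toArray(Integer[]::new);
    }

    public static void main(String[] args) {
        System.out.println(asObjects().getClass().getSimpleName()); // Object[]
        System.out.println(Arrays.toString(asIntegers()));          // [1, 2, 3]
    }
}
```

Each method builds a new Stream, because a Stream can be consumed only once.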

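2. Converting to an Iterator

Before Java 8, an Iterator was the usual way to traverse the elements of a collection. Calling Stream::iterator() returns an Iterator over the elements of the Stream.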

Stream<String> words = Stream.of("city", "country", "world");
Iterator<String> iterator = words.iterator();
while(iterator.hasNext()) {
  System.out.println(iterator.next());
}

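3. Converting to an Arbitrary Collection Object

Stream::collect(Supplier, BiConsumer, BiConsumer) takes three arguments:

A supplier that creates an instance of the target type, for example the HashSet constructor.
An accumulator that adds an element to the target, for example the add method.
A combiner that merges two target objects into one, for example the addAll method.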
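The three arguments map directly onto method references. The following is a minimal sketch of that pattern; the class name ThreeArgCollectDemo and its method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

class ThreeArgCollectDemo {
    // Supplier: HashSet::new, accumulator: HashSet::add, combiner: HashSet::addAll.
    static Set<String> toHashSet(Stream<String> words) {
        return words.collect(HashSet::new, HashSet::add, HashSet::addAll);
    }

    // The same shape works for a List.
    static List<String> toArrayList(Stream<String> words) {
        return words.collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    public static void main(String[] args) {
        System.out.println(toHashSet(Stream.of("city", "country", "city")).size()); // 2
        System.out.println(toArrayList(Stream.of("city", "country", "world")));     // [city, country, world]
    }
}
```

The combiner is only called when partial results have to be merged, which in practice means parallel Streams.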

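Converting a Stream to a Map (converting to a Set or List works the same way). [Rant] Eclipse has trouble inferring complex types. I was using the latest Eclipse at the time, Version: Mars.2 (4.5.2), yet for the code below it could not infer the types of map and otherMap in the lambda expressions. In Eclipse the casts shown in the commented-out lines were required, otherwise the code would not compile. NetBeans can infer the lambda parameter types in the code below.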

Stream<String> words = Stream.of("city", "country", "world");
HashMap<String, Integer> resultMap = words.collect(() -> {
  return new HashMap<String, Integer>(); // typed constructor instead of the raw HashMap
}, (map, word) -> {
  map.put(word, word.length());
  //Map.class.cast(map).put(word, word.length());
}, (map, otherMap) -> {
  map.putAll(otherMap);
  //Map.class.cast(map).putAll((Map) otherMap);
});

The target does not have to be a collection. It can be a StringBuilder or a data container of your own. Converting a Stream to a StringBuilder:

Stream<String> words = Stream.of("city", "country", "world");
StringBuilder builder = words.collect(StringBuilder::new, (strBuilder, s) -> {
  strBuilder.append(s); // no cast needed: strBuilder is inferred as StringBuilder
}, (strBuilder, otherStrBuilder) -> {
  strBuilder.append(otherStrBuilder);
});
System.out.println(builder.toString());

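4. Converting to a Set or List

Passing three arguments to collect just to build a collection is cumbersome. In practice there is no need to, because Collectors provides factory methods for the common collection types.

Creating a List or a Set: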

stream.collect(Collectors.toList());
stream.collect(Collectors.toSet());

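In the current JDK implementation these produce an ArrayList and a HashSet, although the Javadoc makes no guarantee about the concrete type. To choose the List or Set type explicitly, use toCollection: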

stream.collect(Collectors.toCollection(LinkedList::new));
stream.collect(Collectors.toCollection(TreeSet::new));
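As a runnable sketch of these factory methods on concrete data, the class below collects the same words three ways. The class name CollectorsFactoryDemo and its method names are made up for illustration.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CollectorsFactoryDemo {
    // toList() leaves the concrete List type up to the library.
    static List<String> asList() {
        return Stream.of("world", "city", "country").collect(Collectors.toList());
    }

    // toCollection lets the caller pick the concrete type.
    static LinkedList<String> asLinkedList() {
        return Stream.of("world", "city", "country")
                .collect(Collectors.toCollection(LinkedList::new));
    }

    // A TreeSet drops duplicates and keeps the elements sorted.
    static TreeSet<String> asSortedSet() {
        return Stream.of("world", "city", "country", "city")
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        System.out.println(asList());       // [world, city, country]
        System.out.println(asLinkedList()); // [world, city, country]
        System.out.println(asSortedSet());  // [city, country, world]
    }
}
```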

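5. Joining Strings

This is similar to Joiner in Guava. When every element of the Stream is a String, the elements can be joined into a single string, optionally with a delimiter and with a prefix and suffix for the result.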

List<String> strs = Arrays.asList("Hello", "the", "stream");
// A Stream can be consumed only once, so each collect below starts a new one.

// No delimiter
String hail1 = strs.stream().collect(Collectors.joining());
// Result: Hellothestream

// Specify a delimiter
String hail2 = strs.stream().collect(Collectors.joining(", "));
// Result: Hello, the, stream

// Specify a delimiter, prefix and suffix.
String hail3 = strs.stream().collect(Collectors.joining(", ", "__", "**"));
// Result: __Hello, the, stream**

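6. SummaryStatistics for Ordinary Streams

For primitive Streams (IntStream, LongStream, DoubleStream), calling summaryStatistics() produces the corresponding IntSummaryStatistics, LongSummaryStatistics, or DoubleSummaryStatistics.

An ordinary Stream has no summaryStatistics() method, but summarizingInt, summarizingDouble, and summarizingLong in Collectors produce the corresponding SummaryStatistics object.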

Stream<Beef> beefs = ...;
IntSummaryStatistics beefPriceSummary = 
    beefs.collect(Collectors.summarizingInt(Beef::getPrice));
beefPriceSummary.getAverage();
beefPriceSummary.getSum();
beefPriceSummary.getMax();
beefPriceSummary.getMin();
beefPriceSummary.getCount();
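The snippet above leaves Beef undefined. A self-contained sketch, assuming a hypothetical Beef class whose price is an int, could look like this (the sample prices are invented):

```java
import java.util.IntSummaryStatistics;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class BeefSummaryDemo {
    // Hypothetical value class; the int price is an assumption for this sketch.
    static final class Beef {
        private final int id;
        private final int price;

        Beef(int id, int price) {
            this.id = id;
            this.price = price;
        }

        int getId() { return id; }
        int getPrice() { return price; }
    }

    static IntSummaryStatistics summarize() {
        return Stream.of(new Beef(1, 12), new Beef(2, 8), new Beef(3, 10))
                .collect(Collectors.summarizingInt(Beef::getPrice));
    }

    public static void main(String[] args) {
        IntSummaryStatistics summary = summarize();
        System.out.println(summary.getAverage()); // 10.0
        System.out.println(summary.getSum());     // 30
        System.out.println(summary.getMax());     // 12
        System.out.println(summary.getMin());     // 8
        System.out.println(summary.getCount());   // 3
    }
}
```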

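7. Converting to a Map

The Collectors.toMap method:

Argument 1: a function that produces the map key.
Argument 2: a function that produces the map value.
[Overload] Argument 3: a merge function that decides what to do when several elements have the same key. Without it, a duplicate key throws java.lang.IllegalStateException.
[Overload] Argument 4: a supplier for the kind of Map to create. The default is a HashMap.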

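List<Beef> beefs = Arrays.asList(new Beef(1, 12.5), new Beef(1, 12), new Beef(2, 8.8));
// A Stream can be consumed only once, so each collect below starts a new one.

// Specify map key and map value.
// With the sample data above, id 1 appears twice, so this call throws IllegalStateException.
Map<Integer, Double> map1 = beefs.stream().collect(Collectors.toMap(
		Beef::getId,
		Beef::getPrice));

// Handle two values that have the same key: keep the existing value.
Map<Integer, Double> map2 = beefs.stream().collect(Collectors.toMap(
		Beef::getId,
		Beef::getPrice,
		(existingValue, newValue) -> existingValue));

// Use a TreeMap instead of the default HashMap.
// The merge function throws, so the duplicate id 1 is rejected here as well.
Map<Integer, Double> map3 = beefs.stream().collect(Collectors.toMap(
		Beef::getId,
		Beef::getPrice,
		(existingValue, newValue) -> { throw new IllegalStateException(); },
		TreeMap::new));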
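To see the duplicate-key behaviour concretely, here is a self-contained sketch. The Beef class is a hypothetical stand-in (an int id and a double price are assumptions), and the class name ToMapDemo and its method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class ToMapDemo {
    // Hypothetical value class used only for this sketch.
    static final class Beef {
        private final int id;
        private final double price;

        Beef(int id, double price) {
            this.id = id;
            this.price = price;
        }

        int getId() { return id; }
        double getPrice() { return price; }
    }

    static final List<Beef> BEEFS =
            Arrays.asList(new Beef(1, 12.5), new Beef(1, 12), new Beef(2, 8.8));

    // Two-argument toMap rejects duplicate keys.
    static boolean duplicateKeyThrows() {
        try {
            BEEFS.stream().collect(Collectors.toMap(Beef::getId, Beef::getPrice));
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    // The merge function keeps the first value seen for a key.
    static Map<Integer, Double> keepFirst() {
        return BEEFS.stream().collect(Collectors.toMap(
                Beef::getId, Beef::getPrice, (existing, replacement) -> existing));
    }

    // The fourth argument selects the Map implementation.
    static TreeMap<Integer, Double> keepFirstSorted() {
        return BEEFS.stream().collect(Collectors.toMap(
                Beef::getId, Beef::getPrice, (existing, replacement) -> existing, TreeMap::new));
    }

    public static void main(String[] args) {
        System.out.println(duplicateKeyThrows()); // true
        System.out.println(keepFirst().get(1));   // 12.5
        System.out.println(keepFirstSorted());    // {1=12.5, 2=8.8}
    }
}
```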

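The three toMap overloads above have three matching toConcurrentMap overloads, intended for parallel Streams to improve efficiency. Collectors offers concurrent collectors only for Maps; there are no concurrent counterparts for List and Set. Function.identity() stands for the element itself and is equivalent to (element) -> element.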

Stream<String> words = Stream.of("Sun", "Earth", "Moon");
ConcurrentHashMap<String, Integer> wordMap = 
    words.parallel().collect(Collectors.toConcurrentMap(
        Function.identity(),
        String::length,
        (existingValue, newValue) -> newValue,
        ConcurrentHashMap::new));

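8. Grouping a Stream

All the examples below use Streams of Locale objects, so first a quick look at Locale.

java.util.Locale is a data class that ships with the JDK. The first constructor argument is the Language Code, the second is the Country Code, and the third (the variant) is rarely used and serves as an extension. For mainland China, for example, the Language Code is zh and the Country Code is CN, so new Locale("zh", "CN") constructs a Locale for mainland China. Locale.getAvailableLocales() returns an array of all Locales that Java supports.

Note: Language Codes are defined by ISO 639 and Country Codes by ISO 3166, both standards of the International Organization for Standardization, to ease international exchange. Taiwan is new Locale("zh", "TW") and Hong Kong is new Locale("zh", "HK").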
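A short sketch of how the constructor arguments surface through the Locale accessors. The class name LocaleBasicsDemo is made up for illustration. Newer JDKs mark this constructor as deprecated, but it still works.

```java
import java.util.Locale;

class LocaleBasicsDemo {
    static Locale mainlandChina() {
        return new Locale("zh", "CN");
    }

    public static void main(String[] args) {
        Locale zhCn = mainlandChina();
        System.out.println(zhCn.getLanguage()); // zh
        System.out.println(zhCn.getCountry());  // CN
        System.out.println(zhCn);               // zh_CN

        // The constructor normalizes case: language to lowercase, country to uppercase.
        System.out.println(new Locale("ZH", "cn").equals(zhCn)); // true
    }
}
```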

8.1 Grouping by Country

groupingBy(Function) produces a Map whose keys are the values returned by the Function. In the example below, the Map key is the Country Code.

Stream<Locale> locales = Stream.of(Locale.getAvailableLocales());
Map<String, List<Locale>> countryToLocales = 
    locales.collect(Collectors.groupingBy(Locale::getCountry));
System.out.println(countryToLocales.get("US"));
// Output: [en_US, es_US]

P.S. The groupingByConcurrent() method is meant for parallel Streams and speeds up processing.
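A minimal sketch of groupingByConcurrent on a parallel Stream, grouping invented sample words by their length. The class name GroupingByConcurrentDemo is made up for illustration.

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GroupingByConcurrentDemo {
    // All threads accumulate into one shared ConcurrentMap,
    // so no per-thread maps have to be merged afterwards.
    static ConcurrentMap<Integer, List<String>> byLength() {
        return Stream.of("Sun", "Earth", "Moon", "Mars")
                .parallel()
                .collect(Collectors.groupingByConcurrent(String::length));
    }

    public static void main(String[] args) {
        ConcurrentMap<Integer, List<String>> groups = byLength();
        // The order of elements inside each list is not guaranteed here.
        System.out.println(groups.get(4).size()); // 2
    }
}
```

The trade-off is that this collector is unordered, so it fits cases where the order of elements within a group does not matter.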

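8.2 Splitting by English into Two Groups: English and Non-English

Use Collectors.partitioningBy(Predicate). The Predicate returns a boolean: elements for which it returns true form one group, and those for which it returns false form the other.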

Map<Boolean, List<Locale>> enToLocales = locales.collect(Collectors.partitioningBy(
    locale -> locale.getLanguage().equals("en")));
System.out.print(enToLocales.get(true).size());
// Output: 12
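Because the result of Locale.getAvailableLocales() varies between JDKs, the counts above will differ from machine to machine. The sketch below uses a fixed set of Locale constants so the outcome is predictable. The class name PartitionDemo is made up for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PartitionDemo {
    static Map<Boolean, List<Locale>> englishOrNot() {
        return Stream.of(Locale.US, Locale.UK, Locale.FRANCE, Locale.JAPAN, Locale.CANADA)
                .collect(Collectors.partitioningBy(
                        locale -> locale.getLanguage().equals("en")));
    }

    public static void main(String[] args) {
        Map<Boolean, List<Locale>> parts = englishOrNot();
        System.out.println(parts.get(true));  // [en_US, en_GB, en_CA]
        System.out.println(parts.get(false)); // [fr_FR, ja_JP]
    }
}
```

Unlike groupingBy, the resulting Map always contains both the true and the false key, even when one of the groups is empty.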

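8.3 Processing Each Group While Grouping

In the examples above, the value of each group is a List containing all elements of that group. Supplying a downstream collector gives more flexible control over the grouped value. To keep the code short, the examples below assume a static import of java.util.stream.Collectors.*.

Collecting the Grouped Values into a Set

The default value container is a List. Sometimes a Set is the better choice because it removes duplicates.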

Map<String, Set<Locale>> countryToLocaleSet = 
    locales.collect(groupingBy(Locale::getCountry, toSet()));

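Counting the Elements of Each Group

The counting method counts the elements in each group. The example below counts how many languages are spoken in each country.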

Map<String, Long> countryToLocaleCounts = 
    locales.collect(groupingBy(Locale::getCountry, counting()));

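Summing the Elements, or a Property of the Elements, of Each Group

The summing(Int | Long | Double) methods take a Function and sum up its return values. The example below computes the total population of each State.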

Map<State, Integer> stateToPopulation = 
    cities.collect(groupingBy(City::getState, summingInt(City::getPopulation)));

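Finding the Maximum and Minimum of Each Group

The maxBy and minBy methods take a comparator and find the largest and smallest element. The example below finds the most populous city in each state.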

Map<State, Optional<City>> stateToLargeCity = 
    cities.collect(groupingBy(City::getState, maxBy(Comparator.comparing(City::getPopulation))));
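The City and State types are not defined in this article. The sketch below assumes a hypothetical City class with a name, a state (a plain String here), and a population, filled with invented sample figures, to show summingInt and maxBy as downstream collectors.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class CityGroupingDemo {
    // Hypothetical data class; fields and sample values are assumptions.
    static final class City {
        private final String name;
        private final String state;
        private final int population;

        City(String name, String state, int population) {
            this.name = name;
            this.state = state;
            this.population = population;
        }

        String getName() { return name; }
        String getState() { return state; }
        int getPopulation() { return population; }
    }

    static final List<City> CITIES = Arrays.asList(
            new City("Austin", "TX", 900),
            new City("Dallas", "TX", 1300),
            new City("Miami", "FL", 450));

    // Total population per state.
    static Map<String, Integer> populationByState() {
        return CITIES.stream().collect(Collectors.groupingBy(
                City::getState, Collectors.summingInt(City::getPopulation)));
    }

    // Most populous city per state; maxBy wraps the result in an Optional.
    static Map<String, Optional<City>> largestCityByState() {
        return CITIES.stream().collect(Collectors.groupingBy(
                City::getState,
                Collectors.maxBy(Comparator.comparingInt(City::getPopulation))));
    }

    public static void main(String[] args) {
        System.out.println(populationByState().get("TX"));                      // 2200
        System.out.println(largestCityByState().get("TX").get().getName());     // Dallas
    }
}
```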

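Mapping the Grouped Data

mapping(mapper, downstream) first maps (transforms) each element and then hands the transformed values to the downstream collector. The example below finds the longest city name in each State.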

Map<State, Optional<String>> stateToLongestCityName = 
    cities.collect(
        groupingBy(City::getState, 
        mapping(City::getName, 
            maxBy(Comparator.comparing(String::length)))));

For the earlier task of collecting the set of languages of each country, mapping is the better solution:

Map<String, Set<String>> countryToLanguages = 
    locales.collect(groupingBy(Locale::getCountry, mapping(l -> l.getLanguage(), toSet())));
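A self-contained sketch of the same grouping on a small fixed set of Locales, so the result does not depend on the installed JDK. The class name CountryLanguagesDemo and the sample locales are chosen for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CountryLanguagesDemo {
    static Map<String, Set<String>> languagesByCountry() {
        return Stream.of(
                        Locale.forLanguageTag("en-US"),
                        Locale.forLanguageTag("es-US"),
                        Locale.forLanguageTag("en-CA"),
                        Locale.forLanguageTag("fr-CA"))
                // Group by country code, then keep only the language code of each Locale.
                .collect(Collectors.groupingBy(
                        Locale::getCountry,
                        Collectors.mapping(Locale::getLanguage, Collectors.toSet())));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> result = languagesByCountry();
        System.out.println(result.get("US").contains("es")); // true
        System.out.println(result.get("CA").size());         // 2
    }
}
```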

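Generating SummaryStatistics for Each Group

The summarizingInt, summarizingLong, and summarizingDouble methods produce an IntSummaryStatistics, a LongSummaryStatistics, and a DoubleSummaryStatistics respectively.

With groupingBy, SummaryStatistics can be generated directly from the elements or from one of their properties. The example below produces, for each State, the SummaryStatistics of the populations of its cities.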

Map<State, IntSummaryStatistics> stateToCityPopulationSummary = 
    cities.collect(groupingBy(City::getState, summarizingInt(city -> city.getPopulation())));

They can also be generated after a mapping step. The example below gives the same result as the one above:

Map<State, IntSummaryStatistics> stateToCityPopulationSummary = 
    cities.collect(groupingBy(City::getState, 
                   mapping(City::getPopulation, summarizingInt(p -> p))));

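Reducing the Grouped Data

reducing(binaryOperator): reduces the elements themselves; the result is an Optional
reducing(identity, binaryOperator): reduces starting from the given initial value
reducing(identity, mapper, binaryOperator): starts from the given initial value, applies the map function, and reduces the mapped values

The example below groups cities by their State and joins the city names of each group into one string separated by commas (,).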
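The three overloads can be compared side by side on plain values, without any grouping. This is only an illustrative sketch; the class name ReducingDemo and the sample data are invented.

```java
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ReducingDemo {
    // One argument: no initial value, so the result is an Optional.
    static Optional<Integer> sumWithoutIdentity() {
        return Stream.of(1, 2, 3).collect(Collectors.reducing(Integer::sum));
    }

    // Two arguments: the reduction starts from the initial value 0.
    static Integer sumWithIdentity() {
        return Stream.of(1, 2, 3).collect(Collectors.reducing(0, Integer::sum));
    }

    // Three arguments: map each word to its length, then sum the lengths.
    static Integer totalLength() {
        return Stream.of("city", "country", "world")
                .collect(Collectors.reducing(0, String::length, Integer::sum));
    }

    public static void main(String[] args) {
        System.out.println(sumWithoutIdentity()); // Optional[6]
        System.out.println(sumWithIdentity());    // 6
        System.out.println(totalLength());        // 16
    }
}
```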

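Map<State, String> stateToCityNames = 
    cities.collect(groupingBy(
        City::getState,
        // Skip the separator while the accumulated string is still empty;
        // otherwise every result would start with a comma.
        reducing("", City::getName, (c1, c2) -> c1.isEmpty() ? c2 : c1 + "," + c2)));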
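For string concatenation specifically, an alternative worth considering is mapping combined with Collectors.joining, which handles the separator itself. The sketch below swaps in that technique; the City class is a hypothetical stand-in with a String state, and the sample data is invented.

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CityNamesDemo {
    // Hypothetical data class used only for this sketch.
    static final class City {
        private final String name;
        private final String state;

        City(String name, String state) {
            this.name = name;
            this.state = state;
        }

        String getName() { return name; }
        String getState() { return state; }
    }

    static Map<String, String> namesByState() {
        return Stream.of(
                        new City("Austin", "TX"),
                        new City("Dallas", "TX"),
                        new City("Miami", "FL"))
                // joining(",") puts the separator only between names.
                .collect(Collectors.groupingBy(
                        City::getState,
                        Collectors.mapping(City::getName, Collectors.joining(","))));
    }

    public static void main(String[] args) {
        System.out.println(namesByState().get("TX")); // Austin,Dallas
        System.out.println(namesByState().get("FL")); // Miami
    }
}
```

joining also avoids building a new intermediate String for every element, which repeated concatenation inside reducing does.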

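Summary of Downstream Collectors

Downstream collectors can lead to very complex expressions. Use them only together with groupingBy or partitioningBy to produce the "downstream" map values. In all other cases, simply operate on the Stream directly.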