Java 中 List 集合分批处理的方式汇总

在项目中存在 List 集合数据量过大,需要对这个 List 集合进行分批处理,以下总结几种方式。

测试准备

模拟生成 List

public List<Integer> initList(int num) {
    List<Integer> list = new ArrayList<>();
    for (int i = 0; i < num; i++) {
        list.add(i);
    }
    return list;
}

模拟处理 List

public void print(List<Integer> list) {
    for (int i = 0; i < list.size(); i++) {
        System.out.print(list.get(i) + ",");
    }
    System.out.println();
}

批次最大数量

public int BATCH_NUM = 30;

List 的 subList 分批一

List<Integer> dataList = initList(100);
List<Integer> newList = new ArrayList<Integer>();
for (int i = 0; i < dataList.size(); i++) {// 分批次处理
    newList.add(dataList.get(i));
    if (BATCH_NUM == newList.size() || i == dataList.size() - 1) {
        print(newList);
        newList.clear();
    }
}

List 的 subList 分批二

List<Integer> dataList = initList(100);
Integer size = dataList.size();
if (BATCH_NUM < size) { //判断是否有必要分批
    int part = size / BATCH_NUM;//分批数
    for (int i = 0; i < part; i++) {
        List<Integer> listPage = dataList.subList(0, BATCH_NUM);
        print(listPage);    // 处理List
        dataList.subList(0, BATCH_NUM).clear(); //剔除已处理
    }
    if (!dataList.isEmpty()) {
        print(dataList); //表示最后剩下的数据
    }
} else {
    print(dataList);
}

Stream 流遍历操作

List<Integer> dataList = initList(100);
int limit = (dataList.size() + BATCH_NUM - 1) / BATCH_NUM; // 计算切分次数
List<List<Integer>> splitList = new ArrayList<>();
Stream.iterate(0, n -> n + 1).limit(limit).forEach(i -> {
    splitList.add(dataList.stream().skip(i * BATCH_NUM).limit(BATCH_NUM).collect(Collectors.toList()));
});
for (int i = 0; i < splitList.size(); i++) {
    print(splitList.get(i));
}

Stream 获取分割后的集合

List<Integer> dataList = initList(100);
int limit = (dataList.size() + BATCH_NUM - 1) / BATCH_NUM; // 计算切分次数
List<List<Integer>> splitList = Stream.iterate(0, n -> n + 1).limit(limit).parallel()
        .map(a -> dataList.stream().skip(a * BATCH_NUM).limit(BATCH_NUM).parallel().collect(Collectors.toList()))
        .collect(Collectors.toList());
for (int i = 0; i < splitList.size(); i++) {
    print(splitList.get(i));
}

Google 工具类 Guava

引入依赖:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>30.0-jre</version>
</dependency>

使用 google guava 对 List 进行分割:

List<Integer> dataList = initList(100);
List<List<Integer>> parts = Lists.partition(dataList, BATCH_NUM);
parts.stream().forEach(list -> {
    print(list);
});

Apache 工具类 collections

引入依赖:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-collections4</artifactId>
    <version>4.4</version>
</dependency>

使用 apache common collection 对 List 进行分割:

List<Integer> dataList = initList(100);
List<List<Integer>> parts = ListUtils.partition(dataList, BATCH_NUM);
parts.stream().forEach(list -> {
    print(list);
});