有个朋友跟我吐槽:他负责的运营后台要导出一份百万级用户数据,用POI导出直接OOM了。重启服务、调JVM内存,问题依旧。
后来换了EasyExcel,3行代码搞定,还不占内存。
先说说什么情况下会爆
POI一次性把整个Excel加载到内存,百万数据能占500M往上。服务器内存就那么多,OOM不意外。
EasyExcel不一样,用SAX模式解析XML,读3M文件内存不超过16M。差距就是这么大。
所以当你听到"百万级数据导出",第一反应应该是:别一次性怼进内存。
环境准备
依赖引入:
<dependency> <groupId>com.alibaba</groupId> <artifactId>easyexcel</artifactId> <version>3.3.3</version></dependency>
实体类定义:
@Datapublic class UserData { @ExcelProperty("用户ID") private Long userId; @ExcelProperty("用户名") private String username; @ExcelProperty("手机号") private String phone; @ExcelProperty("邮箱") private String email; @ExcelProperty(value = "注册时间", converter = LocalDateTimeConverter.class) private LocalDateTime registerTime; @ExcelProperty("状态") private String status;}
日期转换器(不写这个,日期读进来直接报错):
public class LocalDateTimeConverter implements Converter<LocalDateTime> { @Override public Class<?> supportJavaTypeKey() { return LocalDateTime.class; } @Override public CellDataTypeEnum supportExcelTypeKey() { return CellDataTypeEnum.STRING; } @Override public LocalDateTime convertToJavaData(ConverterContext context) { return LocalDateTime.parse(context.getCellData().getStringValue(), DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")); } @Override public CellData convertToExcelData(ConverterContext context) { return new CellData<>(context.getValue().format( DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"))); }}
三种方案,按数据量选
方案1:直接导出(10万以内)
@Servicepublic class UserExportService { @Autowired private UserMapper userMapper; /** * 同步导出 - 适合小数据量 */ public void exportSync(HttpServletResponse response) { List<UserData> data = userMapper.selectAllUsers(); EasyExcel.write(response.getOutputStream(), UserData.class) .sheet("用户列表") .head(UserData.class) .doWrite(data); }}
就这几行,能跑。但数据大了肯定爆。
方案2:异步分页导出(10万~100万)
用监听器分批处理,避免OOM:
// 监听器 - 每批处理5000条@Componentpublic class UserDataListener extends AnalysisEventListener<UserData> { private static final int BATCH_SIZE = 5000; private final List<UserData> batchData = new ArrayList<>(BATCH_SIZE); @Override public void invoke(UserData data, AnalysisContext context) { batchData.add(data); if (batchData.size() >= BATCH_SIZE) { saveData(); batchData.clear(); } } @Override public void doAfterAllAnalysed(AnalysisContext context) { if (!batchData.isEmpty()) { saveData(); } } private void saveData() { // 批量保存逻辑 System.out.println("写入一批数据:" + batchData.size() + " 条"); } public List<UserData> getBatchData() { return batchData; }}
导出服务:
@Servicepublic class UserExportService { @Autowired private UserMapper userMapper; // 存储任务进度 private final Map<String, ExportProgress> progressMap = new ConcurrentHashMap<>(); /** * 异步导出 - 适合大数据量 * @param totalCount 总数据量 * @return taskId 用于查询进度 */ public String exportAsync(Long totalCount) { String taskId = UUID.randomUUID().toString(); progressMap.put(taskId, new ExportProgress(0, totalCount)); CompletableFuture.runAsync(() -> { int page = 1; int pageSize = 5000; List<UserData> data; do { // 分页查询 data = userMapper.selectUsersPage(page, pageSize); if (!data.isEmpty()) { // 写入逻辑(可以用队列+异步写入) writeData(data); // 更新进度 updateProgress(taskId, page * pageSize); } page++; } while (data.size() == pageSize); // 完成 finishExport(taskId); }); return taskId; } /** * 分批写入数据 */ private void writeData(List<UserData> data) { // 实际项目中可以写入临时文件,最后合并 // 或者直接写入Excel(追加模式) for (UserData item : data) { // 处理单条数据 } } /** * 更新导出进度 */ private void updateProgress(String taskId, long processedCount) { ExportProgress progress = progressMap.get(taskId); if (progress != null) { progress.setProcessedCount(processedCount); } } /** * 完成导出 */ private void finishExport(String taskId) { ExportProgress progress = progressMap.get(taskId); if (progress != null) { progress.setStatus("COMPLETED"); } } /** * 获取导出进度 */ public ExportProgress getProgress(String taskId) { return progressMap.get(taskId); }}/** * 导出进度 */@Data@AllArgsConstructorclass ExportProgress { private long processedCount; private long totalCount; private String status = "PROCESSING";}
方案3:多线程导出(100万+)
@Servicepublic class MultiThreadExportService { private static final int THREAD_COUNT = 4; private static final int BATCH_SIZE = 10000; @Autowired private UserMapper userMapper; /** * 多线程分段导出 - 适合100万以上数据 */ public String exportMultiThread(Long totalCount) { String taskId = UUID.randomUUID().toString(); String tempFile = System.getProperty("java.io.tmpdir") + "/export_" + taskId + ".xlsx"; ExecutorService executor = Executors.newFixedThreadPool(THREAD_COUNT); CountDownLatch latch = new CountDownLatch(THREAD_COUNT); long perThreadCount = totalCount / THREAD_COUNT; for (int i = 0; i < THREAD_COUNT; i++) { final int threadIndex = i; final long start = i * perThreadCount; final long end = (i == THREAD_COUNT - 1) ? totalCount : (start + perThreadCount); executor.submit(() -> { try { // 每个线程写入不同的sheet List<UserData> data = userMapper.selectUsersByRange(start, end); ExcelWriter excelWriter = EasyExcel.write(tempFile, UserData.class) .withDefaultSheet(threadIndex) .build(); WriteSheet writeSheet = EasyExcel.writerSheet(threadIndex, "数据_" + threadIndex) .head(UserData.class) .build(); excelWriter.write(data, writeSheet); excelWriter.finish(); } finally { latch.countDown(); } }); } executor.shutdown(); return taskId; }}
导入也是同理
百万数据导入,同样用监听器分批处理:
@Servicepublic class UserImportService { @Autowired private UserMapper userMapper; /** * 导入用户数据 * @param file 上传的Excel文件 * @return 导入结果 */ public ImportResult importUsers(MultipartFile file) { List<String> errorMessages = new ArrayList<>(); AtomicInteger successCount = new AtomicInteger(0); AtomicInteger failCount = new AtomicInteger(0); EasyExcel.read(file.getInputStream(), UserData.class, new AnalysisEventListener<UserData>() { private final List<UserData> batchData = new ArrayList<>(500); @Override public void invoke(UserData data, AnalysisContext context) { batchData.add(data); if (batchData.size() >= 500) { saveBatch(batchData, successCount, failCount, errorMessages); batchData.clear(); } } @Override public void doAfterAllAnalysed(AnalysisContext context) { if (!batchData.isEmpty()) { saveBatch(batchData, successCount, failCount, errorMessages); } } }).sheet().doRead(); return new ImportResult(successCount.get(), failCount.get(), errorMessages); } /** * 批量保存数据 */ private void saveBatch(List<UserData> batchData, AtomicInteger successCount, AtomicInteger failCount, List<String> errors) { try { // 批量插入 userMapper.batchInsert(batchData); successCount.addAndGet(batchData.size()); } catch (Exception e) { // 批量插入失败,尝试逐条插入定位错误 for (UserData item : batchData) { try { userMapper.insert(item); successCount.incrementAndGet(); } catch (Exception ex) { failCount.incrementAndGet(); errors.add("导入失败: " + item.getUsername() + ", 原因: " + ex.getMessage()); } } } }}/** * 导入结果 */@Data@AllArgsConstructorclass ImportResult { private int successCount; private int failCount; private List<String> errors;}
容易踩的坑
日期格式转换报错
切记写转换器,不然日期读进来直接崩:
// 实体类要加 converter@ExcelProperty(value = "注册时间", converter = LocalDateTimeConverter.class)private LocalDateTime registerTime;
数字被科学计数法
用户ID、手机号这种长数字,读取时可能被转成科学计数法:
// 解决方案1:指定转换器@ExcelProperty(value = "用户ID", converter = LongConverter.class)private Long userId;// 解决方案2:用字符串接收@ExcelProperty("用户ID")private String userId;
导出时直接查全量OOM
// OOM写法List<User> all = userMapper.selectAll();// 正确做法:分页查询List<User> data = userMapper.selectPage(page, pageSize);
性能对比
说白了,EasyExcel能跑大数据的关键就一个:不要一口气把所有数据怼进内存。分批查、分批写,像流水线一样。
你们项目里导出是怎么处理的?有没有遇到过什么奇葩问题? 👇