Java 获取文件编码格式

发表于 2023-02-14 分类于 Java ， JavaJar 阅读次数： Valine：

Java 在读取文件时容易出现乱码，究其原因是读取文件的编码跟文件本身的编码不一致，那在解析文件前如何获取文件的编码格式呢？

本文主要通过 google 开源的 juniversalchardet 来实现。

介绍

代码：https://code.google.com/archive/p/juniversalchardet/
GitHub：https://github.com/albfernandez/juniversalchardet

引入依赖

<dependency>
    <groupId>com.googlecode.juniversalchardet</groupId>
    <artifactId>juniversalchardet</artifactId>
    <version>1.0.3</version>
</dependency>

或者：

<dependency>
	<groupId>com.github.albfernandez</groupId>
	<artifactId>juniversalchardet</artifactId>
	<version>2.4.0</version>
</dependency>

代码封装

/**
  * 获得文件编码格式
  *
  * @param file
  * @return
  * @throws IOException
  */
 public static String getFileEncode(File file) throws IOException {
     FileInputStream in = new FileInputStream(file);
     String code = "utf-8";
     byte[] buf = new byte[4096];
     UniversalDetector detector = new UniversalDetector(null);

     // (2)
     int nread;
     while ((nread = in.read(buf)) > 0 && !detector.isDone()) {
         detector.handleData(buf, 0, nread);
     }
     // (3)
     detector.dataEnd();

     // (4)
     String encoding = detector.getDetectedCharset();
     if (StringUtils.isNotEmpty(encoding)) {
         code = encoding;
     }

     // (5)
     detector.reset();
     
     return code;
 }