
You probably want to have a list of supported encodings. For each file, try each encoding in turn, maybe starting with UTF-8. Every time you catch the <code>MalformedInputException</code>, try the next encoding.
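Here is a minimal sketch of this approach. It assumes the whole file fits in memory. The class name and the candidate list are hypothetical choices, not a standard API. The list tries UTF-8 first and ISO-8859-1 last, because ISO-8859-1 maps every byte to a character and so acts as a last resort:

```java
import java.io.IOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class FallbackReader {

    // Hypothetical candidate list, tried in order. ISO-8859-1 goes last
    // because it decodes any byte sequence and so never fails.
    static final List<Charset> CANDIDATES =
            Arrays.asList(StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

    /** Returns the file's lines decoded with the first candidate that succeeds. */
    static List<String> readAllLines(Path file, List<Charset> candidates) throws IOException {
        CharacterCodingException lastFailure = null;
        for (Charset cs : candidates) {
            try {
                // Files.readAllLines decodes strictly, so bad bytes throw
                // MalformedInputException, a subclass of CharacterCodingException.
                return Files.readAllLines(file, cs);
            } catch (CharacterCodingException e) {
                lastFailure = e; // this encoding does not fit: try the next one
            }
        }
        if (lastFailure == null) {
            throw new IllegalArgumentException("no candidate charsets given");
        }
        throw lastFailure; // every candidate failed
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            List<String> lines = readAllLines(Paths.get(name), CANDIDATES);
            System.out.println(name + ": " + lines.size() + " lines");
        }
    }
}
```

Note that a charset that decodes without an exception is not necessarily the file's real encoding. The order of the candidates matters.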


I also encountered this exception, with the following error message:

<pre><code>java.nio.charset.MalformedInputException: Input length = 1
at java.nio.charset.CoderResult.throwException(Unknown Source)
at sun.nio.cs.StreamEncoder.implWrite(Unknown Source)
at sun.nio.cs.StreamEncoder.write(Unknown Source)
at java.io.OutputStreamWriter.write(Unknown Source)
at java.io.BufferedWriter.flushBuffer(Unknown Source)
at java.io.BufferedWriter.write(Unknown Source)
at java.io.Writer.write(Unknown Source)
</code></pre>

and found that this strange bug occurs when I use

<pre><code>BufferedWriter writer = Files.newBufferedWriter(Paths.get(filePath));
</code></pre>

to write the String <code>"orazg\t54\n"</code>, built from a generic type in a class:

<pre><code>//key is of generic type &lt;Key extends Comparable&lt;Key&gt;&gt;
writer.write(item.getKey() + "\t" + item.getValue() + "\n");
</code></pre>

This String has length 9 and contains chars with the following code points: 111, 114, 97, 122, 103, 9, 53, 52, 10 (that is, <code>orazg</code>, a tab, <code>54</code> and a newline).

However, if the BufferedWriter in the class is replaced with: 

<pre><code>FileOutputStream outputStream = new FileOutputStream(filePath);
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(outputStream));
</code></pre>

it writes this String successfully, without any exception. In addition, if I write the same String created directly from those characters, it still works:

<pre><code>String string = new String(new char[] {111, 114, 97, 122, 103, 9, 53, 52, 10});
BufferedWriter writer = Files.newBufferedWriter(Paths.get("a.txt"));
writer.write(string);
writer.close();
</code></pre>

I had never before encountered an exception when using the first kind of <code>BufferedWriter</code> to write Strings. It seems to be a strange bug affecting the <code>BufferedWriter</code> created by <code>java.nio.file.Files.newBufferedWriter(path, options)</code>.
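One possible explanation is that another String written through the same writer contained an unpaired UTF-16 surrogate. This is only an assumption, since the code points listed above are all ASCII and cannot fail to encode. <code>Files.newBufferedWriter</code> wraps an encoder obtained from <code>cs.newEncoder()</code>, whose default action is to report malformed input. By contrast, <code>new OutputStreamWriter(outputStream)</code> replaces it. Below is a minimal sketch contrasting the two setups on an in-memory stream; the class name and sample string are hypothetical:

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

public class EncoderActions {

    // Hypothetical input: "a", an unpaired high surrogate, then "b". Not encodable as UTF-8.
    static final String BAD = "a\uD800b";

    /** Same wiring as Files.newBufferedWriter: the encoder keeps its default REPORT action. */
    static byte[] writeStrict(String s) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new BufferedWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_8.newEncoder()))) {
            w.write(s); // only buffered; encoding (and the exception) happens on flush/close
        }
        return out.toByteArray();
    }

    /** Same wiring as new OutputStreamWriter(out, charset): malformed input is replaced. */
    static byte[] writeLenient(String s) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            w.write(s);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(new String(writeLenient(BAD), StandardCharsets.UTF_8)); // a?b
        try {
            writeStrict(BAD);
        } catch (MalformedInputException e) {
            System.out.println(e.getMessage()); // Input length = 1
        }
    }
}
```

With the strict writer, the exception surfaces only when the buffered characters are flushed. That matches the <code>BufferedWriter.flushBuffer</code> frame in the stack trace above.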


Well, the problem is that <code>Files.newBufferedReader(Path path)</code> is implemented like this:

<pre><code>public static BufferedReader newBufferedReader(Path path) throws IOException {
    return newBufferedReader(path, StandardCharsets.UTF_8);
}
</code></pre>

So there is basically no point in specifying <code>UTF-8</code>, except to make your code more explicit. 
If you want to try a "broader" charset, you could try <code>StandardCharsets.UTF_16</code>, but you still can't be 100% sure it will handle every possible character.
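To check this yourself, here is a minimal sketch with hypothetical class and method names. It decodes a byte array with the same kind of strict decoder that <code>Files.newBufferedReader</code> uses: one from <code>cs.newDecoder()</code>, with its default REPORT action. UTF-16 can reject input too, for example an odd number of bytes:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StrictDecode {

    /** True if the bytes decode without error under cs, using a strict (REPORT) decoder. */
    static boolean decodesCleanly(byte[] data, Charset cs) {
        try {
            cs.newDecoder().decode(ByteBuffer.wrap(data)); // default action: REPORT
            return true;
        } catch (CharacterCodingException e) {
            return false; // e.g. MalformedInputException
        }
    }

    public static void main(String[] args) {
        byte[] latin1Cafe = {'c', 'a', 'f', (byte) 0xE9}; // "café" saved as ISO-8859-1
        byte[] oddLength = {'a', 'b', 'c'};                // UTF-16 needs whole 2-byte units
        System.out.println(decodesCleanly(latin1Cafe, StandardCharsets.UTF_8));  // false
        System.out.println(decodesCleanly(oddLength, StandardCharsets.UTF_16));  // false
        // true, but the result is two unrelated CJK ideographs, not "café"
        System.out.println(decodesCleanly(latin1Cafe, StandardCharsets.UTF_16));
    }
}
```

The last line shows the opposite risk: decoding without an exception does not mean the charset is the right one.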


You can try something like this, or just copy and paste the snippet below.

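<pre><code>// Assumes f (File), keyword (String) and values (List&lt;String&gt;) are declared elsewhere,
// and that the enclosing method declares "throws IOException".
List&lt;String&gt; lines = null;
Charset charset = Charset.defaultCharset(); // Try the default one first.
Iterator&lt;Charset&gt; remaining = Charset.availableCharsets().values().iterator();

while (lines == null) {
    try {
        lines = Files.readAllLines(f.toPath(), charset);
    } catch (CharacterCodingException e) {
        // Decoding failed: try the next charset, or give up when none are left
        // (the original loop never terminated once the list was exhausted).
        if (!remaining.hasNext()) {
            throw e;
        }
        charset = remaining.next();
    }
}

// Note: decoding without an exception does not guarantee the charset is the right one.
for (String line : lines) {
    line = line.trim();
    if (line.contains(keyword)) {
        values.add(line);
    }
}
</code></pre>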


Creating a <code>BufferedReader</code> with <code>Files.newBufferedReader</code>

<pre><code>Files.newBufferedReader(Paths.get("a.txt"), StandardCharsets.UTF_8);
</code></pre>

may throw the following exception when the application runs:

<pre><code>java.nio.charset.MalformedInputException: Input length = 1
</code></pre>

But

<pre><code>new BufferedReader(new InputStreamReader(new FileInputStream("a.txt"),"utf-8"));
</code></pre>

works well.

The difference is that the former uses the <code>CharsetDecoder</code> default action:

<blockquote>
 The default action for malformed-input and unmappable-character errors is to report them.
</blockquote>

while the latter uses the REPLACE action.

<pre><code>cs.newDecoder().onMalformedInput(CodingErrorAction.REPLACE).onUnmappableCharacter(CodingErrorAction.REPLACE)
</code></pre>
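If you would rather keep using <code>Files</code> and still get the lenient behaviour, one option is to build the reader yourself with a decoder configured for REPLACE. Below is a minimal sketch; the class and method names are hypothetical. Malformed bytes come out as the decoder's replacement string (U+FFFD for UTF-8) instead of throwing:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LenientReaders {

    /** Like Files.newBufferedReader(path, cs), but malformed or unmappable input is replaced. */
    static BufferedReader newLenientReader(Path path, Charset cs) throws IOException {
        return lenient(Files.newInputStream(path), cs);
    }

    /** Wraps any InputStream in a reader whose decoder uses REPLACE instead of REPORT. */
    static BufferedReader lenient(InputStream in, Charset cs) {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return new BufferedReader(new InputStreamReader(in, decoder));
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {'a', (byte) 0xE9, 'b'}; // 0xE9 followed by 'b' is malformed UTF-8
        try (BufferedReader r = lenient(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            System.out.println(r.readLine().equals("a\uFFFDb")); // true
        }
    }
}
```

The trade-off is that bad bytes are silently turned into replacement characters. Use this only when losing them is acceptable.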


I wrote the following to print to standard out, for every available charset, whether the file can be read with it. It also prints the (0-based) line number at which reading failed, which helps when you are troubleshooting which character is causing the issue.

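<pre><code>public static void testCharset(String fileName) {
    Path path = Paths.get(fileName);
    SortedMap&lt;String, Charset&gt; charsets = Charset.availableCharsets();
    for (Map.Entry&lt;String, Charset&gt; entry : charsets.entrySet()) {
        // Lines read so far. Approximate failure point: the reader decodes ahead
        // in blocks, so the bad byte may lie a few lines further on.
        int line = 0;
        boolean success = true;
        try (BufferedReader b = Files.newBufferedReader(path, entry.getValue())) {
            // readLine() returns null at end of file; ready() only says whether
            // a read would block, so it is not an end-of-file test.
            while (b.readLine() != null) {
                line++;
            }
        } catch (IOException e) {
            success = false;
            System.out.println(entry.getKey() + " failed on line " + line);
        }
        if (success) {
            System.out.println("************************* Success " + entry.getKey());
        }
    }
}
</code></pre>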


ISO-8859-1 is an all-inclusive charset in the sense that every byte value maps to a character. Decoding with it is therefore guaranteed not to throw <code>MalformedInputException</code>. That makes it good for debugging, even if your input is not actually in this charset. So:

<pre><code>req.setCharacterEncoding("ISO-8859-1");
</code></pre>

I had some double-right-quote/double-left-quote characters in my input, and both US-ASCII and UTF-8 threw MalformedInputException on them, but ISO-8859-1 worked.
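Here is a quick check of the "all-inclusive" claim, as a minimal sketch with a hypothetical class name. A strict ISO-8859-1 decoder accepts all 256 byte values and maps byte <code>n</code> to code point <code>U+00nn</code>. The flip side is that it never complains about wrong characters either:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class Latin1Check {

    /** Strict decode: would throw CharacterCodingException on any malformed byte. */
    static String decodeStrict(byte[] data) throws CharacterCodingException {
        return StandardCharsets.ISO_8859_1.newDecoder().decode(ByteBuffer.wrap(data)).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i; // every possible byte value, 0x00..0xFF
        }
        String s = decodeStrict(all); // does not throw
        boolean identity = s.length() == 256;
        for (int i = 0; i < 256; i++) {
            identity &= s.charAt(i) == i; // byte n becomes char U+00nn
        }
        System.out.println(identity); // true

        // Windows-1252 curly quotes are bytes 0x93 and 0x94; here they become C1 control chars.
        String quotes = decodeStrict(new byte[] {(byte) 0x93, 'h', 'i', (byte) 0x94});
        System.out.println((int) quotes.charAt(0) + " " + (int) quotes.charAt(3)); // 147 148
    }
}
```

Suppose the quotes came from a Windows-1252 file (an assumption). ISO-8859-1 then reads them without errors, but as control characters rather than “ and ”. A charset such as windows-1252 would map those bytes to U+201C and U+201D.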


Try this. I had the same issue, and the implementation below worked for me:

<pre><code>Reader reader = Files.newBufferedReader(Paths.get(&lt;yourfilewithpath&gt;), StandardCharsets.ISO_8859_1);
</code></pre>

Then use the <code>Reader</code> wherever you want.

For example:

<pre><code>CsvToBean&lt;anyPojo&gt; csvToBean = null;
try {
    Reader reader = Files.newBufferedReader(Paths.get(csvFilePath),
            StandardCharsets.ISO_8859_1);
    csvToBean = new CsvToBeanBuilder&lt;anyPojo&gt;(reader)
            .withType(anyPojo.class)
            .withIgnoreLeadingWhiteSpace(true)
            .withSkipLines(1)
            .build();
} catch (IOException e) {
    e.printStackTrace();
}
</code></pre>


<code>ISO_8859_1</code> worked for me! I was reading a text file with comma-separated values.


UTF-8 works for me with Polish characters

I'm creating a simple wordcount program in Java that reads through a directory's text-based files.

However, I keep on getting the error:

<pre><code>java.nio.charset.MalformedInputException: Input length = 1
</code></pre>

from this line of code:

<pre><code>BufferedReader reader = Files.newBufferedReader(file,Charset.forName("UTF-8"));
</code></pre>

I know I probably get this because I used a <code>Charset</code> that didn't include some of the characters in the text files, some of which included characters of other languages. But I want to include those characters.

I later learned from the <a href="http://docs.oracle.com/javase/8/docs/api/java/nio/file/Files.html#newBufferedReader-java.nio.file.Path-">JavaDocs</a> that the <code>Charset</code> is optional and only used for more efficient reading of the files, so I changed the code to:

<pre><code>BufferedReader reader = Files.newBufferedReader(file);
</code></pre>

But some files still throw the <code>MalformedInputException</code>. I don't know why.

I was wondering if there is an all-inclusive <code>Charset</code> that will allow me to read text files with many different types of characters?

Thanks.

All inclusive Charset to avoid "java.nio.charset.MalformedInputException: Input length = 1"?
