用fileinput模块parse数据很方便:
1 2 3 4 5 | import fileinput if __name__ == '__main__': for line in fileinput.input(): sys.stdout.write(line) |
---|
但有时候会碰到UnicodeDecodeError:
比如执行:
1 2 3 4 | echo -e "foo\x80bar" |python3 testinput.py ... UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 3: invalid start byte |
---|
这种错误还不好用try .. catch
忽略掉,因为它是在fileinput模块中自己parse的;
Python2的时候很罗嗦,需要自己用codecs去判断之后,才能parse;
Python3总算是引入了一个openhook参数,可以自己hook处理了;
最简单的处理方式:
1 2 3 4 5 6 7 8 | import fileinput import io import sys if __name__ == '__main__': sys.stdin = io.TextIOWrapper(sys.stdin.buffer, errors='replace') for line in fileinput.input(openhook=fileinput.hook_encoded("utf-8")): sys.stdout.write(line) |
---|
参考:
https://stackoverflow.com/questions/24754861/unicode-file-with-python-and-fileinput
https://bugs.python.org/issue26756