Python: Converting Between Integers, Strings, and Byte Strings

Author: 用户7886150

Last modified: 2021-01-08 10:19:03

Column: bit哲学院

Reference: Python string encode

Python's data conversions are very flexible, so this post records how the common ones are used.

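Overview

This post covers conversions among three kinds of data: numbers, strings, and byte strings.

| Function | Purpose | Mnemonic | Notes |
|---|---|---|---|
| chr | Converts a number to the corresponding character | chr looks like char, so it converts to a char | The range is 0~255 in Python 2; in Python 3, chr accepts any Unicode code point (0 through 0x10FFFF) |
| ord | Converts a single character to its numeric code | The last letter, d, stands for digit | |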
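The two functions in the table are inverses of each other. The following is a minimal sketch, assuming Python 3, where code points above 255 are also accepted; the variable names are only illustrative.

```python
# chr maps an integer code point to a one-character string; ord is its inverse.
letter = chr(97)       # 'a'
code = ord('a')        # 97

# In Python 3 the code point may be far above 255.
han = chr(0x4E2D)      # '中'
han_code = ord('中')   # 20013

# Round trip: ord(chr(n)) == n for any valid code point.
round_trip = [ord(chr(n)) for n in (0, 65, 255, 0x10FFFF)]
print(letter, code, han, han_code)
```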

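Base conversion

Decimal to hexadecimal:

hex(16) ==> '0x10'

Hexadecimal to decimal:

int(STRING, BASE) converts the string STRING, written in base BASE, to a decimal int. The first argument must be a string.

int('0x10', 16) ==> 16

Similarly, oct() converts to octal and bin() converts to binary.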
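A short sketch of the same conversions side by side, assuming Python 3 (where octal literals use the 0o prefix). The value 255 and the variable names are chosen only for illustration.

```python
n = 255

# Each built-in returns a string with a base prefix.
hex_text = hex(n)   # '0xff'
oct_text = oct(n)   # '0o377'
bin_text = bin(n)   # '0b11111111'

# int() with the matching base reverses each conversion; the prefix is allowed.
back = [int(hex_text, 16), int(oct_text, 8), int(bin_text, 2)]

# format() produces the digits without a prefix, optionally zero-padded.
bare_hex = format(n, 'x')       # 'ff'
padded_hex = format(n, '04x')   # '00ff'
print(hex_text, oct_text, bin_text, back, bare_hex, padded_hex)
```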

Hexadecimal string to binary string

hex_str = '00fe'

bin(int('1' + hex_str, 16))[3:]  # keeps leading zeros

# result: '0000000011111110'

bin(int(hex_str, 16))[2:]  # drops leading zeros

# result: '11111110'
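The leading-zero trick works because the extra '1' digit keeps the zeros from being dropped, and the [3:] slice then removes both the '0b' prefix and that extra bit. As a hedged alternative, the same result can be obtained by padding to four bits per hex digit; the names below are illustrative only.

```python
hex_str = '00fe'

# Sentinel approach: prepend '1', then strip '0b' and the sentinel bit.
with_sentinel = bin(int('1' + hex_str, 16))[3:]

# Padding approach: each hex digit corresponds to exactly 4 bits.
width = 4 * len(hex_str)
with_format = format(int(hex_str, 16), '0{}b'.format(width))
with_zfill = bin(int(hex_str, 16))[2:].zfill(width)

print(with_sentinel, with_format, with_zfill)
```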

Binary string to hexadecimal string

bin_str = '0b0111000011001100'

hex(int(bin_str, 2))

# result: '0x70cc'

String to integer

Decimal string:

int('10') ==> 10

Hexadecimal string:

int('10', 16) ==> 16

# or

int('0x10', 16) ==> 16
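When the base is not known in advance, int() can infer it from the prefix if the base argument is 0. The sketch below wraps this in a hypothetical helper, parse_int, which is not part of the original post, and returns None for invalid input.

```python
def parse_int(text):
    """Parse text as an integer literal; return None if it is not valid."""
    try:
        # Base 0 infers the base from a 0x / 0o / 0b prefix, else decimal.
        return int(text, 0)
    except ValueError:
        return None

results = [parse_int(s) for s in ('10', '0x10', '0b10', '0o10', 'ten')]
print(results)
```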

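Byte string to integer

Use the struct module, which is commonly used for network packets and is compatible with C data structures.

The formats that struct supports are listed in the table below.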

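| Format | C Type | Python Type | Bytes | Notes |
|---|---|---|---|---|
| x | pad byte | no value | 1 | |
| c | char | string of length 1 | 1 | |
| b | signed char | integer | 1 | |
| B | unsigned char | integer | 1 | |
| ? | _Bool | bool | 1 | |
| h | short | integer | 2 | |
| H | unsigned short | integer | 2 | |
| i | int | integer | 4 | |
| I | unsigned int | integer or long | 4 | |
| l | long | integer | 4 | |
| L | unsigned long | long | 4 | |
| q | long long | long | 8 | In native mode, available only if the platform C compiler supports long long |
| Q | unsigned long long | long | 8 | In native mode, available only if the platform C compiler supports long long |
| f | float | float | 4 | |
| d | double | float | 8 | |
| s | char[] | string | repeat count (default 1) | |
| p | char[] | string | repeat count (default 1) | Pascal string |
| P | void * | long | machine-dependent | Pointer; native mode only |

The sizes are the standard sizes used with the <, >, ! and = prefixes; native sizes may differ. The Python types are the Python 2 names: in Python 3, integer and long are both int, and string is bytes.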

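Byte order and alignment: the character goes in the first position of the format string.

| Character | Byte order | Size | Alignment |
|---|---|---|---|
| @ | native | native | native |
| = | native | standard | none |
| < | little-endian | standard | none |
| > | big-endian | standard | none |
| ! | network (= big-endian) | standard | none |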
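The effect of the byte-order prefix is easiest to see by packing the same value in different ways. This is a small sketch using only struct.pack and struct.calcsize; the value 0x0102 and the variable names are illustrative, and the native size shown in the comment is typical rather than guaranteed.

```python
import struct

value = 0x0102

little = struct.pack('<H', value)    # low byte first
big = struct.pack('>H', value)       # high byte first
network = struct.pack('!H', value)   # network order is big-endian

# With an explicit byte-order prefix, sizes are standard on every platform.
sizes = {fmt: struct.calcsize('<' + fmt) for fmt in 'bhilqfd'}

# Standard mode has no padding; native mode may align the int.
packed_size = struct.calcsize('<bi')   # 1 + 4 bytes
native_size = struct.calcsize('@bi')   # often 8, platform-dependent

print(little, big, network, sizes, packed_size, native_size)
```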

Interpret as short integers:

struct.unpack('<hh', b'\x01\x00\x00\x00') ==> (1, 0)

Interpret as a long integer:

struct.unpack('<L', b'\x01\x00\x00\x00') ==> (1,)
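For a single integer, Python 3 also offers int.from_bytes as an alternative to struct.unpack. The sketch below compares the two on the same four bytes; it assumes little-endian data, and the variable names are illustrative.

```python
import struct

data = b'\x01\x00\x00\x00'

# struct.unpack always returns a tuple, one item per format character.
shorts = struct.unpack('<hh', data)
(as_long,) = struct.unpack('<L', data)

# int.from_bytes reads the whole byte string as one integer.
via_from_bytes = int.from_bytes(data, byteorder='little')
big_endian = int.from_bytes(data, byteorder='big')

# signed=True interprets the bytes as two's complement.
signed = int.from_bytes(b'\xff\xff', byteorder='little', signed=True)

print(shorts, as_long, via_from_bytes, big_endian, signed)
```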

Integer to byte string

Pack each integer into two bytes:

struct.pack('<HH', 1, 2) ==> b'\x01\x00\x02\x00'

Pack each integer into four bytes:

struct.pack('<LL', 1, 2) ==> b'\x01\x00\x00\x00\x02\x00\x00\x00'
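In Python 3 the int.to_bytes method is another way to produce a fixed-width byte string from one integer. This is a hedged sketch comparing it with struct.pack; the widths and values are chosen only for illustration.

```python
import struct

two = (1).to_bytes(2, byteorder='little')
four = (1).to_bytes(4, byteorder='little')

# Concatenating two 2-byte fields matches packing them together with struct.
same_as_struct = two + (2).to_bytes(2, byteorder='little') == struct.pack('<HH', 1, 2)

# A value that does not fit in the requested width raises OverflowError.
try:
    (256).to_bytes(1, byteorder='little')
    overflowed = False
except OverflowError:
    overflowed = True

print(two, four, same_as_struct, overflowed)
```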

Integer to string

Use the built-in function directly:

str(100)

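String to byte string

The difference between bytes, str, and unicode

Python 3 has two types that represent sequences of characters: bytes and str. Instances of bytes contain raw 8-bit values; instances of str contain Unicode characters.

Python 2 also has two types that represent sequences of characters: str and unicode. Unlike Python 3, instances of str contain raw 8-bit values, while instances of unicode contain Unicode characters.

There are many ways to represent Unicode characters as binary data (that is, raw 8-bit values). The most common encoding is UTF-8. However, neither Python 3 str instances nor Python 2 unicode instances are tied to a particular binary encoding. To convert Unicode characters to binary data, you must use the encode method. To convert binary data to Unicode characters, you must use the decode method.

When writing Python programs, do the encoding and decoding at the outermost boundary of your interfaces. The core of the program should use Unicode character types (str in Python 3, unicode in Python 2) and should assume nothing about character encodings. This lets the program accept text in many encodings (such as Latin-1, Shift JIS, and Big5) while producing output in a single encoding (ideally UTF-8).

Because the character types differ, two situations come up often in Python code:

You want to operate on raw 8-bit values that represent characters encoded as UTF-8 (or some other encoding).

You want to operate on Unicode characters that have no specific encoding.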
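One common way to handle both situations is a pair of small helpers that normalize input at the program boundary. The sketch below assumes Python 3 and UTF-8 as the default encoding; to_str and to_bytes are hypothetical helper names, not functions from the standard library.

```python
def to_str(value, encoding='utf-8'):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

def to_bytes(value, encoding='utf-8'):
    """Return value as bytes, encoding it first if it is str."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value

incoming = '中文'.encode('utf-8')   # bytes arriving from a file or socket
text = to_str(incoming)             # the program core works with str
outgoing = to_bytes(text)           # encode again only when writing out

print(text, len(incoming), outgoing == incoming)
```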

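The difference between decode and encode

Encode a string to bytes:

'12abc'.encode('ascii') ==> b'12abc'

From a list of numbers or character codes:

bytes([1, 2, ord('1'), ord('2')]) ==> b'\x01\x0212'

From a hexadecimal string:

bytes.fromhex('010210') ==> b'\x01\x02\x10'

From a string containing hex escapes:

bytes(map(ord, '\x01\x02\x31\x32')) ==> b'\x01\x0212'

From a list of hexadecimal values:

bytes([0x01, 0x02, 0x31, 0x32]) ==> b'\x01\x0212'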
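The examples above use only ASCII characters, so the choice of encoding does not change the result. The following sketch shows what happens with a non-ASCII character, assuming Python 3; the sample text is illustrative only.

```python
text = 'café'

utf8 = text.encode('utf-8')       # é becomes two bytes
latin1 = text.encode('latin-1')   # é becomes one byte

# ASCII cannot represent é, so a strict encode raises UnicodeEncodeError.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# errors='replace' substitutes '?' instead of raising.
replaced = text.encode('ascii', errors='replace')

# bytes(map(ord, s)) only works for code points up to 255,
# which makes it equivalent to Latin-1 encoding.
same_as_latin1 = bytes(map(ord, text)) == latin1

print(utf8, latin1, ascii_ok, replaced, same_as_latin1)
```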

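Byte string to string

Decode bytes to a string:

b'\x31\x32\x61\x62'.decode('ascii') ==> '12ab'

Byte string to escaped hex representation, with printable ASCII left as is:

str(b'\x01\x0212')[2:-1] ==> \x01\x0212

Byte string to hex representation, always two characters per byte (requires import binascii):

str(binascii.b2a_hex(b'\x01\x0212'))[2:-1] ==> 01023132

Byte string to a list of hex strings:

[hex(x) for x in b'\x01\x0212'] ==> ['0x1', '0x2', '0x31', '0x32']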
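Slicing the repr with [2:-1] works, but Python 3.5 and later also provide bytes.hex(), which returns the hex text directly. This is a minimal sketch of that alternative alongside binascii.hexlify; the variable names are illustrative.

```python
import binascii

data = b'\x01\x0212'

hex_text = data.hex()                                  # two hex digits per byte
via_binascii = binascii.hexlify(data).decode('ascii')  # same text via binascii
round_trip = bytes.fromhex(hex_text)                   # back to the original bytes

# Zero-padded variant of the hex list, so every item has the same width.
padded = ['{:#04x}'.format(b) for b in data]

print(hex_text, via_binascii, round_trip, padded)
```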

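Question: when should a string literal be prefixed with 'r', 'b', or 'u'? The official documentation covers this. Note that 'r' and 'b' are not equivalent in Python 2: 'b' is ignored, while 'r' creates a raw string in which backslashes are kept literally.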

The Python 2.x documentation:

A prefix of ‘b’ or ‘B’ is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A ‘u’ or ‘b’ prefix may be followed by an ‘r’ prefix.

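In Python 2, a 'b' prefix on a string literal is ignored. It exists only for compatibility with Python 3, where it makes the literal a bytes object (a sequence of values from 0 to 255).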

The Python 3.3 documentation states:

Bytes literals are always prefixed with ‘b’ or ‘B’; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.

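Bytes literals always carry the 'b' prefix, and the literal itself may contain only ASCII characters; byte values of 128 or greater must be written as escapes.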
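The prefixes can be compared directly in an interpreter. The sketch below assumes Python 3.3 or later, where the 'u' prefix is accepted again and has no effect; the variable names are illustrative.

```python
raw = r'\n'       # raw string: a backslash followed by 'n'
cooked = '\n'     # ordinary string: a single newline character

data = b'abc'     # bytes literal
first = data[0]   # indexing bytes yields an int

high = b'\xe9'            # values of 128 or more need an escape
raw_bytes = rb'\x00'      # raw bytes: the four characters \, x, 0, 0
unicode_prefixed = u'abc' # 'u' is accepted and yields an ordinary str

print(len(raw), len(cooked), first, high[0], len(raw_bytes), unicode_prefixed)
```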

Below is an answer from Stack Overflow that I found helpful and want to share.

In Python 2.x

Pre-3.0 versions of Python lacked this kind of distinction between text and binary data. Instead, there was:

unicode = u’…’ literals = sequence of Unicode characters = 3.x str

str = ‘…’ literals = sequences of confounded bytes/characters

Usually text, encoded in some unspecified encoding.

But also used to represent binary data like struct.pack output.

Python 3.x makes a clear distinction between the types:

str = ‘…’ literals = a sequence of Unicode characters (UTF-16 or UTF-32, depending on how Python was compiled)

bytes = b’…’ literals = a sequence of octets (integers between 0 and 255)

Like this: