Q: Is there a way to separate the chains belonging to each biological assembly in a PDB file? (Python script)

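Stack Overflow user

Asked 2019-11-11 06:58:48

Answers: 2 · Views: 463 · Followers: 0 · Votes: 0

I want to separate the chains in a PDB file that belong to a particular biological assembly. As an example, PDB ID 1BRS has 3 biological assemblies:
Biological assembly 1: chains A and D
Biological assembly 2: chains B and E
Biological assembly 3: chains C and F

Is there a way (a Python script) to separate the chain IDs belonging to each biological assembly, e.g. 1BRS_A:D, 1BRS_B:E, 1BRS_C:F? I do not need to extract the chain coordinates; knowing the chain names is enough. Thanks in advance.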

python

python-3.x

bioinformatics

biopython

2 Answers

Stack Overflow user

Accepted answer

Posted 2019-11-11 10:34:44

The PDBx/mmCIF file format contains this information in the _pdbx_struct_assembly_gen category.

loop_
_pdbx_struct_assembly_gen.assembly_id 
_pdbx_struct_assembly_gen.oper_expression 
_pdbx_struct_assembly_gen.asym_id_list 
1 1 A,D,G,J 
2 1 B,E,H,K 
3 1 C,F,I,L
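
To make the layout of this category concrete, here is a minimal sketch that parses the excerpt above with plain Python and no mmCIF library. The function name parse_simple_loop is hypothetical, and the sketch assumes a simple loop in which every value is a single token without quotes or multi-line text fields; real files are better read with a proper parser.

```python
def parse_simple_loop(text):
    """Parse a simple mmCIF loop_ block into a list of row dictionaries.

    Assumption: every value is a single whitespace-free token
    (no quoted strings, no ';'-delimited text fields).
    """
    columns = []
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "loop_":
            continue
        if line.startswith("_"):
            # Keep only the item name after the category prefix
            columns.append(line.split(".", 1)[1])
        else:
            rows.append(dict(zip(columns, line.split())))
    return rows


EXCERPT = """
loop_
_pdbx_struct_assembly_gen.assembly_id
_pdbx_struct_assembly_gen.oper_expression
_pdbx_struct_assembly_gen.asym_id_list
1 1 A,D,G,J
2 1 B,E,H,K
3 1 C,F,I,L
"""

# Map each assembly ID to the list of chain IDs it is built from
assemblies = {
    row["assembly_id"]: row["asym_id_list"].split(",")
    for row in parse_simple_loop(EXCERPT)
}
print(assemblies)
```

Each row of the loop is one assembly, and asym_id_list is a comma-separated string, which is why the answer's code splits it on ",".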

These files can be read with, for example, Biotite (https://www.biotite-python.org/), a package I am developing. The categories can be read in a dictionary-like manner:

import biotite.database.rcsb as rcsb
import biotite.structure as struc
import biotite.structure.io.pdbx as pdbx

ID = "1BRS"

# Download structure
file_name = rcsb.fetch(ID, "pdbx", target_path=".")

# Read file
file = pdbx.PDBxFile()
file.read(file_name)
# Get 'pdbx_struct_assembly_gen' category as dictionary
assembly_dict = file["pdbx_struct_assembly_gen"]
for asym_id_list in assembly_dict["asym_id_list"]:
    chain_ids = asym_id_list.split(",")
    print(f"{ID}_{':'.join(chain_ids)}")

The output is

1BRS_A:D:G:J
1BRS_B:E:H:K
1BRS_C:F:I:L

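Chains G-L contain only water molecules.

Edit:

To include only the chain IDs that belong to polymers (e.g. proteins or nucleotides), you can use the entity_poly category: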

loop_
_entity_poly.entity_id 
_entity_poly.type 
_entity_poly.nstd_linkage 
_entity_poly.nstd_monomer 
_entity_poly.pdbx_seq_one_letter_code 
_entity_poly.pdbx_seq_one_letter_code_can 
_entity_poly.pdbx_strand_id 
_entity_poly.pdbx_target_identifier 
1 'polypeptide(L)' no no 
;AQVINTFDGVADYLQTYHKLPDNYITKSEAQALGWVASKGNLADVAPGKSIGGDIFSNREGKLPGKSGRTWREADINYTS
GFRNSDRILYSSDWLIYKTTDHYQTFTKIR
;
;AQVINTFDGVADYLQTYHKLPDNYITKSEAQALGWVASKGNLADVAPGKSIGGDIFSNREGKLPGKSGRTWREADINYTS
GFRNSDRILYSSDWLIYKTTDHYQTFTKIR
;
A,B,C ? 
2 'polypeptide(L)' no no 
;KKAVINGEQIRSISDLHQTLKKELALPEYYGENLDALWDALTGWVEYPLVLEWRQFEQSKQLTENGAESVLQVFREAKAE
GADITIILS
;
;KKAVINGEQIRSISDLHQTLKKELALPEYYGENLDALWDALTGWVEYPLVLEWRQFEQSKQLTENGAESVLQVFREAKAE
GADITIILS
;
D,E,F ?

This is the updated Python code:

import biotite.database.rcsb as rcsb
import biotite.structure as struc
import biotite.structure.io.pdbx as pdbx

ID = "1BRS"

# Download structure
file_name = rcsb.fetch(ID, "pdbx", target_path=".")

# Read file
file = pdbx.PDBxFile()
file.read(file_name)

# Get 'entity_poly' category as dictionary
# to find out which chains are polymers
poly_chains = []
for chain_list in file["entity_poly"]["pdbx_strand_id"]:
    poly_chains += chain_list.split(",")

# Get 'pdbx_struct_assembly_gen' category as dictionary
for asym_id_list in file["pdbx_struct_assembly_gen"]["asym_id_list"]:
    chain_ids = asym_id_list.split(",")
    # Filter chains that belong to a polymer
    chain_ids = [chain_id for chain_id in chain_ids if chain_id in poly_chains]
    print(f"{ID}_{':'.join(chain_ids)}")

This is the output:

1BRS_A:D
1BRS_B:E
1BRS_C:F
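
The filtering step can also be tried offline. The sketch below hard-codes the values shown in the two category excerpts for 1BRS instead of downloading the file; format_assemblies is a hypothetical helper name, not part of Biotite.

```python
def format_assemblies(pdb_id, asym_id_lists, strand_id_lists):
    """Return one 'ID_chain:chain' label per assembly, polymers only."""
    # Collect all chain IDs that belong to a polymer entity
    poly_chains = set()
    for strand_ids in strand_id_lists:
        poly_chains.update(strand_ids.split(","))

    labels = []
    for asym_id_list in asym_id_lists:
        # Keep the original chain order, drop non-polymer chains
        chain_ids = [c for c in asym_id_list.split(",") if c in poly_chains]
        labels.append(f"{pdb_id}_{':'.join(chain_ids)}")
    return labels


# Values copied from the excerpts above (assumed to match the real 1BRS file)
labels = format_assemblies(
    "1BRS",
    ["A,D,G,J", "B,E,H,K", "C,F,I,L"],  # pdbx_struct_assembly_gen.asym_id_list
    ["A,B,C", "D,E,F"],                 # entity_poly.pdbx_strand_id
)
print("\n".join(labels))
```

One caveat, stated as my understanding rather than something from the answer: pdbx_strand_id lists author chain IDs, while asym_id_list uses the mmCIF label chain IDs. They coincide for the polymer chains of 1BRS, but for entries where the two naming schemes differ this simple membership test may miss chains.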

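Votes: 2

Stack Overflow user

Posted 2022-07-13 09:57:17

Thanks for the code! It works well for entries with multiple assemblies. However, if an entry has only one assembly, it is not identified correctly. Here is an updated version: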

import json

import biotite.database.rcsb as rcsb
import biotite.structure.io.pdbx as pdbx

# Entries tried while testing: "3AV2", "1BRS", "2k6d", "1TBE", "1HT2", "1HTI"
ID = "1HT2"

# Download structure
file_name = rcsb.fetch(ID, "pdbx", target_path=".")

# Read file
file = pdbx.PDBxFile()
file.read(file_name)

# Get 'entity_poly' category as dictionary
# to find out which chains are polymers.
# With a single polymer entity the value is a plain string, not an array.
poly_chains = []
if isinstance(file["entity_poly"]["pdbx_strand_id"], str):
    poly_chains = file["entity_poly"]["pdbx_strand_id"].split(",")
else:
    for chain_list in file["entity_poly"]["pdbx_strand_id"]:
        poly_chains += chain_list.split(",")

# Map the 1-based row number of each assembly to its polymer chains
biolAssemblyDict = {}
if isinstance(file["pdbx_struct_assembly_gen"]["asym_id_list"], str):
    # Single assembly: the value is one comma-separated string
    index = 0
    asym_id_list = file["pdbx_struct_assembly_gen"]["asym_id_list"]
    chain_ids = asym_id_list.split(",")
    chain_ids = [chain_id for chain_id in chain_ids if chain_id in poly_chains]
    biolAssemblyDict[index + 1] = ",".join(chain_ids)
else:
    # Several assemblies: iterate over the rows of the category
    for index, asym_id_list in enumerate(file["pdbx_struct_assembly_gen"]["asym_id_list"]):
        chain_ids = asym_id_list.split(",")
        # Filter chains that belong to a polymer
        chain_ids = [chain_id for chain_id in chain_ids if chain_id in poly_chains]
        biolAssemblyDict[index + 1] = ",".join(chain_ids)
print(json.dumps(biolAssemblyDict, indent=4, sort_keys=True))
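
The two isinstance branches do the same work, so one way to shorten the script is to normalize the value before looping. This is a minimal sketch, assuming (as the code above does) that a single-row category yields a plain string and a multi-row category yields a sequence of strings. The names as_list and polymer_assemblies are hypothetical, and the inputs at the bottom are made-up illustrations, not values from a real entry.

```python
def as_list(value):
    """Wrap a lone string in a list; return any other sequence as a list."""
    return [value] if isinstance(value, str) else list(value)


def polymer_assemblies(asym_id_lists, strand_ids):
    """Map the 1-based row number of each assembly to its polymer chains."""
    poly_chains = {c for s in as_list(strand_ids) for c in s.split(",")}
    result = {}
    for index, asym_id_list in enumerate(as_list(asym_id_lists), start=1):
        chain_ids = [c for c in asym_id_list.split(",") if c in poly_chains]
        result[index] = ",".join(chain_ids)
    return result


# Single-row case: both values arrive as plain strings
single = polymer_assemblies("A,B,C", "A,B")
# Multi-row case: both values arrive as sequences of strings
multi = polymer_assemblies(["A,D,G,J", "B,E,H,K"], ["A,B,C", "D,E,F"])
print(single)
print(multi)
```

With this shape, the single-assembly and multi-assembly cases go through the same loop, so a fix to the filtering logic only has to be made once.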

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/58796813
