
Open-Source Cheminformatics Library: ScaffoldGraph

Author: DrugAI
Published: 2021-02-01 10:20:27
Column: DrugAI

ScaffoldGraph is an open-source cheminformatics library, built on RDKit and NetworkX, for generating and analyzing scaffold networks and scaffold trees.

1

Features

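  • Scaffold network generation (Varin, 2011): explores scaffold space by iteratively removing available rings, generating all possible sub-scaffolds for a set of input molecules. The output is a directed acyclic graph of molecular scaffolds.
  • HierS network generation (Wilkens, 2005): explores scaffold space by iteratively removing available rings, generating all possible sub-scaffolds without dissecting fused ring systems.
  • Scaffold tree generation (Schuffenhauer, 2007): explores scaffold space by iteratively removing the least characteristic ring from a molecular scaffold. The output is a tree of molecular scaffolds.
  • Murcko fragment generation (Bemis, 1996): generates the set of Murcko fragments for a molecule by iteratively removing available rings.
  • Compound set enrichment (Varin, 2010, 2011): identifies active chemical series from primary screening data.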
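To make the difference between a scaffold network and a scaffold tree concrete, here is a toy sketch that does not use ScaffoldGraph. As a simplifying assumption, a scaffold is modelled as a frozenset of ring labels and every ring counts as removable, whereas the real algorithms work on RDKit molecules and apply chemical rules about which rings may be removed. The network builder removes every ring in turn at each level; the tree builder removes only one ring per step, picked by a hypothetical `priority` score standing in for the "least characteristic ring" rules.

```python
# Toy model (assumption): a scaffold is a frozenset of ring labels and any
# ring may be removed. Real implementations operate on RDKit molecules.


def build_network(scaffold):
    """Network style: remove every ring in turn at each level.

    Returns a dict mapping each scaffold to the set of sub-scaffolds that have
    exactly one ring fewer (the edges of a directed acyclic graph).
    """
    edges = {}
    stack = [scaffold]
    while stack:
        current = stack.pop()
        if current in edges:
            continue  # already expanded via another parent
        if len(current) <= 1:
            edges[current] = set()  # a single ring cannot be reduced further
            continue
        children = {current - {ring} for ring in current}
        edges[current] = children
        stack.extend(children)
    return edges


def build_tree_path(scaffold, priority):
    """Tree style: remove only one ring per step, the one with the lowest score.

    `priority` is a hypothetical scoring function standing in for the
    "least characteristic ring" rules; ties are broken by ring label.
    """
    path = [scaffold]
    while len(scaffold) > 1:
        ring = min(scaffold, key=lambda r: (priority(r), r))
        scaffold = scaffold - {ring}
        path.append(scaffold)
    return path


if __name__ == "__main__":
    full = frozenset({"A", "B", "C"})
    network = build_network(full)
    print(len(network))  # 7 scaffolds: every non-empty subset of the rings
    scores = {"A": 2, "B": 1, "C": 3}  # hypothetical ring scores
    print([sorted(s) for s in build_tree_path(full, scores.get)])
    # [['A', 'B', 'C'], ['A', 'C'], ['C']]
```

With three rings the network already holds all 7 non-empty ring subsets, while the tree keeps a single path of three scaffolds, which is why networks grow much faster than trees as ring counts increase.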

2

Comparison with existing software

  • Scaffold Network Generator (SNG) (Matlock, 2013)
  • Scaffold Hunter (SH) (Wetzel, 2009)
  • Scaffold Tree Generator (STG) (predecessor of the Scaffold Hunter CLI)

3

Installation

ScaffoldGraph currently supports Python 3 only.

Install with conda:

conda config --add channels conda-forge
conda install -c uclcheminformatics scaffoldgraph

Install with pip:

pip install scaffoldgraph
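After installing, a quick way to confirm that ScaffoldGraph and the other libraries used in this article (RDKit, NetworkX, matplotlib) can be imported is a small standard-library check. This is a convenience sketch, not part of ScaffoldGraph: `check_dependencies` is a hypothetical helper that only looks up top-level module names with `importlib.util.find_spec`, without importing them or checking versions.

```python
import importlib.util


def check_dependencies(names=("scaffoldgraph", "rdkit", "networkx", "matplotlib")):
    """Map each top-level module name to whether it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```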

4

ScaffoldGraph example: scaffold networks and scaffold trees

Import the libraries

import scaffoldgraph as sg

import networkx as nx
import matplotlib.pyplot as plt
from rdkit.Chem import Draw
from rdkit import Chem
import random
import os

Load the data and draw a few molecules

# Example SDF file (200 PubChem compounds); assumes the repository layout, where examples/ sits next to the package directory
sdf_file = os.path.join(os.path.dirname(os.path.dirname(sg.__file__)), 'examples', 'example.sdf')
supplier = Chem.SDMolSupplier(sdf_file)

peek = 6
Draw.MolsToGridImage([supplier[x] for x in range(peek)])
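The example file is plain SDF. In that format each record is a molfile whose first header line is the molecule's title, and records end with a line containing only `$$$$`; RDKit exposes the title as the `_Name` property. Since the molecule nodes later in this article are PubChem IDs, we assume (not verified here) that ScaffoldGraph takes them from these titles. The sketch below lists the titles with the standard library only; `sdf_record_titles` is a hypothetical helper, not a ScaffoldGraph function.

```python
def sdf_record_titles(text):
    """Return the title (first header line) of every record in an SDF string.

    Each SDF record is a molfile plus optional data fields, terminated by a
    line containing only "$$$$".
    """
    titles = []
    record_start = True
    for line in text.splitlines():
        if record_start:
            titles.append(line.strip())  # first line of a record = its title
            record_start = False
        if line.strip() == "$$$$":
            record_start = True  # the next line begins a new record
    return titles


if __name__ == "__main__":
    # Usage with the example file:
    # with open(sdf_file) as handle:
    #     print(sdf_record_titles(handle.read())[:6])
    toy = "mol_1\n\n\nM  END\n$$$$\nmol_2\n\n\nM  END\n$$$$\n"
    print(sdf_record_titles(toy))  # ['mol_1', 'mol_2']
```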

Scaffold network

Generate the scaffold network

network = sg.ScaffoldNetwork.from_sdf(sdf_file, progress=True)
# We can access the number of molecule nodes and scaffold nodes in the graph
n_scaffolds = network.num_scaffold_nodes
n_molecules = network.num_molecule_nodes

print('\nGenerated scaffold network from {} molecules with {} scaffolds\n'.format(n_molecules, n_scaffolds))
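Every node in a ScaffoldGraph graph carries a `type` attribute (`'scaffold'` or `'molecule'`), which the later examples rely on. As a cross-check of `num_scaffold_nodes` and `num_molecule_nodes`, you can tally that attribute yourself. This is a sketch of the idea, not how ScaffoldGraph computes those properties; `count_node_types` is a hypothetical helper.

```python
from collections import Counter


def count_node_types(node_data):
    """Count nodes by their 'type' attribute.

    `node_data` is an iterable of (node, attribute-dict) pairs, the shape that
    networkx returns for graph.nodes(data=True).
    """
    return Counter(attrs.get("type") for _, attrs in node_data)


if __name__ == "__main__":
    # Toy input; with a real graph use count_node_types(network.nodes(data=True))
    toy = [
        ("c1ccncc1", {"type": "scaffold"}),
        ("c1ccc(-c2ccncc2)cc1", {"type": "scaffold"}),
        ("mol_1", {"type": "molecule"}),
    ]
    print(count_node_types(toy))
```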

Draw a few scaffolds from the network

scaffolds = list(network.get_scaffold_nodes())
print(scaffolds[0:5])

# Visualize a few of the scaffolds
sample = 6
Draw.MolsToGridImage([Chem.MolFromSmiles(x) for x in scaffolds[:sample]])

Scaffold distribution across hierarchy levels

counts = network.get_hierarchy_sizes()  # returns a collections Counter object
lists = sorted(counts.items())
x, y = zip(*lists)

# Plot sizes as bar chart
plt.figure(figsize=(8, 6))
plt.bar(x, y)
plt.xlabel('Hierarchy')
plt.ylabel('Scaffold Count')
plt.title('Number of Scaffolds per Hierarchy (Network)')
plt.show()
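If matplotlib is unavailable (for example on a headless server), the `Counter` returned by `get_hierarchy_sizes()` can be rendered as a text bar chart instead. `hierarchy_histogram` is a hypothetical helper sketched for this article; it accepts any mapping from hierarchy level to scaffold count.

```python
def hierarchy_histogram(counts, width=40):
    """Render {hierarchy level: scaffold count} as lines of a text bar chart.

    Bars are scaled so the most populated level spans `width` characters;
    every listed level gets at least one '#'.
    """
    largest = max(counts.values(), default=0)
    if largest <= 0:
        return []
    lines = []
    for level, n in sorted(counts.items()):
        bar = "#" * max(1, round(width * n / largest))
        lines.append(f"{level:>3} | {bar} {n}")
    return lines


if __name__ == "__main__":
    # With ScaffoldGraph: counts = network.get_hierarchy_sizes()
    counts = {1: 2, 2: 1, 3: 3}  # toy counts for illustration
    print("\n".join(hierarchy_histogram(counts, width=6)))
```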

Scaffold matching and highlighting

query_smiles = 'c1ccncc1'  # pyridine; let's use this sub-scaffold as the query
query_mol = Chem.MolFromSmiles(query_smiles)

next_scaffolds = []
for succ in network.successors(query_smiles):
    if network.nodes[succ]['type'] == 'scaffold':
        next_scaffolds.append(succ)

print('Found {} scaffolds in hierarchy 2 containing {}:'.format(len(next_scaffolds), query_smiles))

mols = [Chem.MolFromSmiles(x) for x in next_scaffolds[:6]]
Draw.MolsToGridImage(mols, highlightAtomLists=[mol.GetSubstructMatch(query_mol) for mol in mols])

Molecule matching and highlighting

molecules = []
for succ in nx.bfs_tree(network, query_smiles, reverse=False):
    if network.nodes[succ]['type'] == 'molecule':
        molecules.append(succ)

print('Found {} molecules containing scaffold, {}\n'.format(len(molecules), query_smiles))

# Molecule nodes are PubChem IDs, so let's look up their SMILES and view some of the molecules

smiles = [network.nodes[pid]['smiles'] for pid in molecules]
mols = [Chem.MolFromSmiles(smi) for smi in smiles]

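# Slice to 9 so the molecules, highlights and legends stay aligned; legends must be strings
Draw.MolsToGridImage(mols[:9], highlightAtomLists=[mol.GetSubstructMatch(query_mol) for mol in mols[:9]],
                     legends=[str(pid) for pid in molecules[:9]], maxMols=9)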
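Unlike the direct-successor loop in the previous step, `nx.bfs_tree` collects every node reachable from the query scaffold, so molecules several hierarchy levels deeper are found too. The standard-library sketch below reproduces that idea on a plain adjacency dictionary; the toy graph and the helper name `molecules_containing` are hypothetical. Each node is visited once, so a molecule reachable through several scaffolds is reported only once.

```python
from collections import deque


def molecules_containing(adjacency, node_types, query):
    """Breadth-first search from `query`; return the molecule nodes reached.

    `adjacency` maps a node to the nodes its outgoing edges point to, and
    `node_types` maps a node to "scaffold" or "molecule".
    """
    if query not in adjacency and query not in node_types:
        raise KeyError(f"{query!r} is not in the graph")
    seen = {query}
    queue = deque([query])
    found = []
    while queue:
        node = queue.popleft()
        if node_types.get(node) == "molecule":
            found.append(node)
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:  # visit each node once, even via several paths
                seen.add(nxt)
                queue.append(nxt)
    return found


if __name__ == "__main__":
    # Toy graph: edges point from a scaffold to larger scaffolds and molecules.
    adjacency = {"P": ["A", "B", "mol_1"], "A": ["mol_2"], "B": ["mol_2", "mol_3"]}
    node_types = {"P": "scaffold", "A": "scaffold", "B": "scaffold",
                  "mol_1": "molecule", "mol_2": "molecule", "mol_3": "molecule"}
    print(molecules_containing(adjacency, node_types, "P"))
    # ['mol_1', 'mol_2', 'mol_3']
```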

Scaffold tree

tree = sg.ScaffoldTree.from_sdf(sdf_file, progress=True)
# access the number of molecule nodes and scaffold nodes in the graph
n_scaffolds = tree.num_scaffold_nodes
n_molecules = tree.num_molecule_nodes

print('\nGenerated scaffold tree from {} molecules with {} scaffolds\n'.format(n_molecules, n_scaffolds))

# The output is a forest structure (multiple trees)
print('Graph is a Forest:', nx.is_forest(tree))
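`nx.is_forest` checks that the graph contains no cycle once edge directions are ignored. A minimal union-find version of that check is sketched below; it illustrates the idea rather than NetworkX's implementation, and it expects every edge endpoint to appear in `nodes`.

```python
def is_forest(nodes, edges):
    """Return True if the graph has no cycle when edge directions are ignored.

    Union-find: an edge whose two endpoints already share a root would close
    a cycle, so the graph cannot be a forest.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False
        parent[root_u] = root_v  # merge the two components
    return True


if __name__ == "__main__":
    print(is_forest("abcd", [("a", "b"), ("b", "c")]))             # True
    print(is_forest("abc", [("a", "b"), ("b", "c"), ("a", "c")]))  # False
    # With NetworkX: is_forest(tree.nodes, tree.edges)
```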

Draw the scaffold tree of a single molecule

random_pubchem_id = random.choice(list(tree.get_molecule_nodes()))
print('PubChem ID:', random_pubchem_id)
predecessors = nx.bfs_tree(tree, random_pubchem_id, reverse=True)

# We can validate that one molecule's scaffold set forms a tree structure
print('Predecessors of {} is Tree: {}'.format(random_pubchem_id, nx.is_tree(predecessors)))

# Draw these scaffolds
predecessors_list = list(predecessors)
predecessors_list[0] = tree.nodes[predecessors_list[0]]['smiles']  # [0] is the PubChem ID (the BFS source), so swap in its SMILES
Draw.MolsToGridImage([Chem.MolFromSmiles(x) for x in predecessors_list])
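Because every node in a scaffold tree has at most one parent, the scaffolds above a molecule form a single chain, which is why the `nx.is_tree` check above is expected to report a tree. Assuming that single-parent property, the chain can be recovered simply by following parent links. `lineage` and the `parent_of` mapping are hypothetical; with a NetworkX graph whose edges point from parent to child, the mapping could be built as `{child: parent for parent, child in tree.edges}`.

```python
def lineage(parent_of, start):
    """Follow single-parent links from `start` until a node with no parent.

    `parent_of` maps each node to its one parent. A repeated node means the
    links contain a cycle, so the structure cannot be a tree.
    """
    chain = [start]
    seen = {start}
    node = start
    while node in parent_of:
        node = parent_of[node]
        if node in seen:
            raise ValueError(f"cycle detected at {node!r}")
        seen.add(node)
        chain.append(node)
    return chain


if __name__ == "__main__":
    # Hypothetical chain: a molecule, its scaffold, then ever smaller parents.
    parent_of = {"mol_1": "S3", "S3": "S2", "S2": "S1"}
    print(lineage(parent_of, "mol_1"))  # ['mol_1', 'S3', 'S2', 'S1']
```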
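This article is part of the Tencent Cloud self-media syndication program and was shared from the DrugAI WeChat official account.
Originally published: 2020-04-07. In case of infringement, contact cloudcommunity@tencent.com for removal.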