3. Obtain the reaction database files and analyze them
$ git clone git@github.com:open-reaction-database/ord-schema.git
$ cd ord-schema
$ pip install -r requirements.txt
$ conda install -c rdkit rdkit
$ python setup.py install
Only a selected portion is shown here.
4. One of the reaction datasets is selected here for analysis
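Before running the analysis, it can help to confirm that the required packages are importable. The following is a minimal sketch using only the standard library. `missing_packages` is a hypothetical helper name, and the package list is an assumption based on the imports used by the analysis code in this post.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level import names used by the analysis code in this post (assumed list).
required = ["ord_schema", "rdkit", "pandas", "numpy", "matplotlib", "wget"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If anything is reported as missing, re-run the corresponding install command in the same Python environment.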
# import everything
import ord_schema
from ord_schema import message_helpers, validations
from ord_schema.proto import dataset_pb2
import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import wget
from rdkit import Chem
from rdkit.Chem import AllChem
from glob import glob
# Load the chemical reaction dataset and process it
# Note: adjust this path to match your local copy of ord-data [4]
pb_path = './ord-data/data/48/ord_dataset-488402f6ec0d441ca2f7d6fabea7c220.pb'
data = message_helpers.load_message(pb_path, dataset_pb2.Dataset)
data_l = list(data.reactions)
reactions_l = [i.identifiers[0].value for i in data_l]
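# Select some synthesis reactions (1 + 1 = 2) as SMARTS strings, removing duplicates
# Note: count('.') == 2 keeps reactant sides with three dot-separated components;
# use count('.') == 1 to keep reactions with exactly two reactants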
reactant_l_2 = list(set([i for i in reactions_l if i.split('>')[0].count('.') == 2]))
reaction_smarts_uni_mols = [AllChem.ReactionFromSmarts(i) for i in reactant_l_2]
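The reactant-count filter relies on the layout of a reaction SMILES string, `reactants>agents>products`, with `.` separating the molecules on each side. The counting step can be sketched with the standard library alone. The example strings are made up for illustration (they are not taken from the ORD dataset), and `count_reactants` and `select_by_reactant_count` are hypothetical helper names, not part of ord-schema or RDKit.

```python
def count_reactants(reaction_smiles: str) -> int:
    """Count dot-separated components on the reactant side of a reaction SMILES."""
    reactant_side = reaction_smiles.split(">")[0]
    if not reactant_side:
        return 0
    return reactant_side.count(".") + 1


def select_by_reactant_count(reaction_smiles_list, n_reactants):
    """Return the unique reaction SMILES with exactly n_reactants components, sorted."""
    return sorted(
        {s for s in reaction_smiles_list if count_reactants(s) == n_reactants}
    )


# Made-up examples for illustration only.
examples = [
    "CC(=O)O.OCC>>CC(=O)OCC.O",           # two reactants
    "CC(=O)O.OCC>>CC(=O)OCC.O",           # duplicate of the previous entry
    "CC(=O)Cl.OCC.CCN(CC)CC>>CC(=O)OCC",  # three components on the reactant side
    "CCO>>CC=O",                          # one reactant
]
two_reactant = select_by_reactant_count(examples, 2)
three_component = select_by_reactant_count(examples, 3)
```

Under this counting, a condition such as `count('.') == 2` corresponds to three dot-separated components on the reactant side. Salts and other multi-fragment species written with `.` are counted as separate components, so the count is only an approximation of the number of distinct reactants.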
References:
[1] Kearnes, S. M., Maser, M. R., Wleklinski, M. et al. The Open Reaction Database (2021).
[2] Coley, C. W. et al. A graph-convolutional neural network model for the prediction of chemical reactivity. Chemical Science 10, 370–377 (2019). doi: 10.1039/c8sc04228d
[3] https://github.com/open-reaction-database/ord-schema
[4] https://github.com/open-reaction-database/ord-data
[5] https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html