https://visualstudio.microsoft.com/ja/downloads/
Python 3.7.9
Create the deepchem virtual environment
conda create -n deepchem python=3.7
Activate the virtual environment and install the dependencies
conda activate deepchem
(deepchem) >conda install tensorflow
(deepchem) >conda install tensorflow-probability
(deepchem) >conda install pandas joblib scikit-learn numpy
Install Git
(deepchem) >conda install git
Clone deepchem
(deepchem) >git clone https://github.com/deepchem/deepchem.git
Build and install deepchem
(deepchem) >cd deepchem
(deepchem) >python setup.py install
(deepchem) >conda list deepchem
# Name       Version                        Build     Channel
deepchem     2.4.0rc1.dev20210105175316     pypi_0    pypi
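Before moving on, it can help to confirm that the packages installed above are importable from the active environment. The following is a minimal sketch using only the standard library; the helper name check_packages is hypothetical, and the import names sklearn and tensorflow_probability are the usual import names for scikit-learn and tensorflow-probability.

```python
import importlib.util

# Import names of the packages installed in the steps above.
# scikit-learn is imported as "sklearn" and tensorflow-probability
# as "tensorflow_probability".
REQUIRED = ["tensorflow", "tensorflow_probability", "pandas", "joblib",
            "sklearn", "numpy", "deepchem"]


def check_packages(names):
    """Return a dict mapping each import name to True if it can be located."""
    status = {}
    for name in names:
        # find_spec returns None when a top-level module cannot be found.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, found in check_packages(REQUIRED).items():
        print("%-24s %s" % (name, "OK" if found else "MISSING"))
```

Run it inside the activated deepchem environment; any line marked MISSING points to a package that still needs to be installed.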
import os
import deepchem as dc

current_dir = os.path.dirname(os.path.realpath("__file__"))
dataset_file = "medium_muv.csv.gz"
full_dataset_file = "muv.csv.gz"

# We use a small version of MUV to make online rendering of notebooks easy.
# Replace with full_dataset_file in order to run the full version of this notebook.
dc.utils.download_url("https://s3-us-west-1.amazonaws.com/deepchem.io/datasets/%s" % dataset_file, current_dir)

dataset = dc.utils.load_from_disk(dataset_file)
print("Columns of dataset: %s" % str(dataset.columns.values))
print("Number of examples in dataset: %s" % str(dataset.shape[0]))
Columns of dataset: ['MUV-466' 'MUV-548' 'MUV-600' 'MUV-644' 'MUV-652' 'MUV-689' 'MUV-692' 'MUV-712' 'MUV-713' 'MUV-733' 'MUV-737' 'MUV-810' 'MUV-832' 'MUV-846' 'MUV-852' 'MUV-858' 'MUV-859' 'mol_id' 'smiles']
Number of examples in dataset: 10000
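To inspect a gzip-compressed CSV such as this one without DeepChem or pandas, the standard library is enough. The sketch below is only an illustration: the helper summarize_csv_gz and the tiny sample file are hypothetical stand-ins, not the real MUV data.

```python
import csv
import gzip
import os
import tempfile


def summarize_csv_gz(path):
    """Return (column_names, row_count) for a gzip-compressed CSV file."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        columns = next(reader)              # header row
        row_count = sum(1 for _ in reader)  # remaining data rows
    return columns, row_count


# Hypothetical two-row sample standing in for medium_muv.csv.gz.
sample_path = os.path.join(tempfile.mkdtemp(), "sample_muv.csv.gz")
with gzip.open(sample_path, "wt", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["MUV-466", "MUV-548", "mol_id", "smiles"])
    writer.writerow(["", "0", "CID0001", "CCO"])
    writer.writerow(["0", "", "CID0002", "c1ccccc1"])

columns, row_count = summarize_csv_gz(sample_path)
print("Columns of dataset: %s" % columns)
print("Number of examples in dataset: %d" % row_count)
```

Pointing summarize_csv_gz at the downloaded medium_muv.csv.gz should report the same columns and example count as the output above.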
from rdkit import Chem
from rdkit.Chem import Draw
from itertools import islice
from IPython.display import Image, display, HTML

def display_images(filenames):
    """Helper to pretty-print images."""
    for filename in filenames:
        display(Image(filename))

def mols_to_pngs(mols, basename="test"):
    """Helper to write RDKit mols to png files."""
    filenames = []
    for i, mol in enumerate(mols):
        filename = "MUV_%s%d.png" % (basename, i)
        Draw.MolToFile(mol, filename)
        filenames.append(filename)
    return filenames

num_to_display = 12
molecules = []
for _, data in islice(dataset.iterrows(), num_to_display):
    molecules.append(Chem.MolFromSmiles(data["smiles"]))
display_images(mols_to_pngs(molecules))
MUV_tasks = ['MUV-692', 'MUV-689', 'MUV-846', 'MUV-859', 'MUV-644',
             'MUV-548', 'MUV-852', 'MUV-600', 'MUV-810', 'MUV-712',
             'MUV-737', 'MUV-858', 'MUV-713', 'MUV-733', 'MUV-652',
             'MUV-466', 'MUV-832']

featurizer = dc.feat.CircularFingerprint(size=1024)
loader = dc.data.CSVLoader(
    tasks=MUV_tasks, smiles_field="smiles", featurizer=featurizer)
dataset = loader.featurize(dataset_file)
Loading raw samples now.
shard_size: 8192
About to start loading CSV from medium_muv.csv.gz
Loading shard 1 of size 8192.
Featurizing sample 0
Featurizing sample 1000
Featurizing sample 2000
Featurizing sample 3000
Featurizing sample 4000
Featurizing sample 5000
Featurizing sample 6000
Featurizing sample 7000
Featurizing sample 8000
TIMING: featurizing shard 0 took 25.886 s
Loading shard 2 of size 8192.
Featurizing sample 0
Featurizing sample 1000
TIMING: featurizing shard 1 took 5.656 s
TIMING: dataset construction took 31.964 s
Loading dataset from disk.
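The log shows the 10000 examples being processed in two shards with shard_size 8192. The chunking arithmetic behind that can be sketched as follows; shard_sizes is a hypothetical helper for illustration, not DeepChem's implementation.

```python
def shard_sizes(num_rows, shard_size=8192):
    """Split num_rows into consecutive shards of at most shard_size rows."""
    sizes = []
    remaining = num_rows
    while remaining > 0:
        size = min(shard_size, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


# 10000 examples with shard_size 8192: one full shard plus a remainder.
print(shard_sizes(10000))  # [8192, 1808]
```

This matches the log: the first shard reports progress up to sample 8000, while the second, smaller shard only reaches sample 1000.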
# The splitter is constructed without the dataset; the dataset is passed to train_valid_test_split.
splitter = dc.splits.RandomSplitter()
train_dataset, valid_dataset, test_dataset = splitter.train_valid_test_split(
    dataset)
# NOTE THE RENAMING:
valid_dataset, test_dataset = test_dataset, valid_dataset
Computing train/valid/test indices
TIMING: dataset construction took 0.639 s
Loading dataset from disk.
TIMING: dataset construction took 0.371 s
Loading dataset from disk.
TIMING: dataset construction took 0.263 s
Loading dataset from disk.
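A random train/valid/test split amounts to shuffling the example indices and cutting the shuffled list at two points. The sketch below illustrates the idea with the standard library, assuming 80/10/10 fractions (to my understanding, DeepChem's defaults); random_split_indices and its seed parameter are hypothetical and not DeepChem's API.

```python
import random


def random_split_indices(n, frac_train=0.8, frac_valid=0.1, seed=None):
    """Shuffle range(n) and cut it into train/valid/test index lists."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    train_cutoff = int(frac_train * n)
    valid_cutoff = int((frac_train + frac_valid) * n)
    return (indices[:train_cutoff],
            indices[train_cutoff:valid_cutoff],
            indices[valid_cutoff:])


train_idx, valid_idx, test_idx = random_split_indices(10000, seed=0)
# Mirror the renaming above: swap the validation and test subsets.
valid_idx, test_idx = test_idx, valid_idx
print(len(train_idx), len(valid_idx), len(test_idx))  # 8000 1000 1000
```

Since both held-out subsets come from the same shuffle and have the same size here, swapping them changes only which one is called validation and which one test.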
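Original-content notice: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.
In case of infringement, please contact cloudcommunity@tencent.com for removal.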