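I have this code, but for some reason it does not work the way I want. Basically, I read a .geno file, take from possib the fraction of values I want to keep, and set the rest to 9. The first round is perfect, but the second one is wrong, as if the 9s were accumulating from one round to the next. I read the array in again on every loop iteration, so I don't understand why it doesn't work. The .ind and .geno files for testing are at this link.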
https://drive.google.com/drive/folders/15VqGbVib41a4bDSVuPiqy_hkcCGK7H4H?usp=sharing
import pandas as pd
import numpy as np
from time import process_time
import random
from argparse import ArgumentParser
ind = pd.read_csv("test_bckg100.ind", delimiter=r"\s+", header=None)
ind.columns = ['ID', 'Sex', 'Pop']
sample_size = [1] * len(ind)
sample_list = list(ind['ID'])
# make a list for all p values
possib = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05]
t1_start = process_time()
geno = pd.read_fwf("test_bckg100.geno", widths=sample_size, header=None, dtype=np.uint8)
geno_arr = geno.to_numpy(dtype=np.uint8)
# geno_arr[0,:] columns
# geno_arr[:,0] rows
for p in possib:
    random.seed(1)
    needed_nine = round(len(geno_arr[:, 0]) * (1 - p))
    geno_temp = geno.to_numpy(dtype=np.uint8)
    for b in range(0,len(sample_list)):
        index = np.random.choice(np.arange(geno_temp.shape[0]), needed_nine, replace=False)
        geno_temp[index,b] = 9
    test_geno_arr = geno_temp[:, 0]
    good_pos = np.where(test_geno_arr == 9)
    print(needed_nine)
    print(len(good_pos[0]))
    #with open("BACKGROUND" + "_1_" + str(int(p*100))+".geno", 'w') as fout:
    #    np.savetxt(fout, geno_temp, delimiter="", fmt='%d')
t1_stop = process_time()
print("Finished genotype files: ", t1_stop - t1_start)
Posted on 2021-06-09 09:19:20
The solution is to change
geno_temp = geno.to_numpy(dtype=np.uint8)
to
geno_temp = np.copy(geno_arr)
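A likely reason this helps: `DataFrame.to_numpy` defaults to `copy=False`, so when every column already has the requested dtype it may return a view onto the DataFrame's own data instead of a fresh array. Writing 9s into that view would then also change `geno`, and the next round would start from an array that already contains 9s. Below is a minimal sketch of the per-round copy on a small random matrix. The function name `mask_genotypes`, the toy data, and the use of `np.random.default_rng` are assumptions made for illustration and are not part of the original code.

```python
import numpy as np

def mask_genotypes(geno_arr, p, rng):
    """Return a masked copy of geno_arr where a fraction (1 - p) of each column is 9."""
    n_rows, n_cols = geno_arr.shape
    needed_nine = round(n_rows * (1 - p))
    # Independent copy: writes below never reach geno_arr.
    geno_temp = np.copy(geno_arr)
    for b in range(n_cols):
        # Pick needed_nine distinct row indices for this column.
        index = rng.choice(n_rows, size=needed_nine, replace=False)
        geno_temp[index, b] = 9
    return geno_temp, needed_nine

# Toy stand-in for the real .geno matrix (assumption: values 0, 1, 2 only).
geno_arr = np.random.default_rng(0).integers(0, 3, size=(100, 5), dtype=np.uint8)

# One seeded NumPy generator; Python's random.seed() does not seed NumPy.
rng = np.random.default_rng(1)

results = {}
for p in (0.95, 0.9, 0.5):
    masked, needed_nine = mask_genotypes(geno_arr, p, rng)
    nines_in_first_column = int((masked[:, 0] == 9).sum())
    results[p] = (needed_nine, nines_in_first_column)
    print(p, needed_nine, nines_in_first_column)
```

With a fresh copy per round, the two printed counts should agree for every `p`, and `geno_arr` itself never contains a 9. Passing `copy=True` to `to_numpy` should have the same effect as `np.copy`, at the cost of converting the DataFrame again on each round.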
https://stackoverflow.com/questions/67886020