我需要从csv读取数据,如果和小于或等于5,则需要添加两行或多行,并将其保存到一个新的csv文件中。
预期输出模式:
Application time_taken
Serv6, Serv2 2.2
Serv1 2.3
Serv9 3.3
Serv4 4.4
Serv8 4.6
Serv10 5.0
Serv7 5.5
Serv3 6.0
Serv5 8.0错误:
Traceback (most recent call last):
File "<ipython-input-24-fc02128770c1>", line 1, in <module>
runfile('C:/Users/prasad.namala/Documents/Python Scripts/Programs/New_Csv.py', wdir='C:/Users/prasad.namala/Documents/Python Scripts/Programs')
File "C:\Users\prasad.namala\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
execfile(filename, namespace)
File "C:\Users\prasad.namala\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/prasad.namala/Documents/Python Scripts/Programs/New_Csv.py", line 36, in <module>
final_csv = value_checker(input_data)
File "C:/Users/prasad.namala/Documents/Python Scripts/Programs/New_Csv.py", line 32, in value_checker
return value_checker(new_df)
File "C:/Users/prasad.namala/Documents/Python Scripts/Programs/New_Csv.py", line 20, in value_checker
print(f"{df.loc[i,['Application']].item()}, {df.loc[i+1,['Application']].item()}")
File "C:\Users\prasad.namala\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1494, in __getitem__
return self._getitem_tuple(key)
File "C:\Users\prasad.namala\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 868, in _getitem_tuple
return self._getitem_lowerdim(tup)
File "C:\Users\prasad.namala\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 988, in _getitem_lowerdim
section = self._getitem_axis(key, axis=i)
File "C:\Users\prasad.namala\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1913, in _getitem_axis
return self._get_label(key, axis=axis)
File "C:\Users\prasad.namala\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 141, in _get_label
return self.obj._xs(label, axis=axis)
File "C:\Users\prasad.namala\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py", line 3585, in xs
loc = self.index.get_loc(key)
File "C:\Users\prasad.namala\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2659, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 987, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 993, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 5**Data in input csv file:**
Application time_taken
Serv6 1.0
Serv2 1.2
Serv1 2.3
Serv9 3.3
Serv4 4.4
Serv8 4.6
Serv10 5.0
Serv7 5.5
Serv3 6.0
Serv5 8.0
**Here is the code:**
import pandas as pd
def value_checker(sf):
col = []
col1 = []
sf.sort_values(by='time_taken', inplace= True)
sf.reset_index(drop = True, inplace = True)
df = sf[sf['time_taken']<5]
print(sf)
if df.shape[0]>2:
for i in range(0, df.shape[0], 2):
if df.loc[i:i+1,['time_taken']].sum().item() <= 5:
print(f"{df.loc[i,['Application']].item()}, {df.loc[i+1,['Application']].item()}")
col.append(f"{df.loc[i,['Application']].item()}, {df.loc[i+1,['Application']].item()}")
# print(df.loc[i,['time_taken']].item() + df.loc[i+1,['time_taken']].item())
col1.append(df.loc[i,['time_taken']].item() + df.loc[i+1,['time_taken']].item())
print(col1)
else:
print(f"in else block {i}th iter ")
col.extend([df.loc[i,['Application']].item(), df.loc[i+1,['Application']].item()])
col1.extend([df.loc[i,['time_taken']].item(), df.loc[i+1,['time_taken']].item()])
new_df = pd.DataFrame(data={'Application':col, 'time_taken': col1})
new_df = pd.concat([new_df, sf[sf['time_taken']>=5]])
if new_df[new_df['time_taken']<5].shape[0]>=2:
return value_checker(new_df)
return sf
input_data = pd.read_csv('inputfile.csv')
final_csv = value_checker(input_data)
final_csv.to_csv("Outputfile.csv",index=False)
print("Task Completed")注意:忽略打印语句
如上文所述,预期的产出应该是模式的。如果任意两行的和小于或等于5,则再次添加这些数据,以此类推。
发布于 2019-07-03 20:50:21
新的行会被添加到数据挖掘的底部吗?还是在满足条件的两个值之间插入?
如果满足条件,第一列中的第一列的第一行值是多少?
在等待您的回复时,首先,我认为您必须在列表中进行二进制搜索。
现在编辑我的答案:))。试试这个:
# Placing your 2nd column's values in a list.
time_taken_vals = df.iloc[:,1].values.tolist()
# Doing the search for each entry and creating a new list of values for the result to be used as your 3rd column's values.
sum_result = []
for i in range(0, len(time_taken_vals)):
if time_taken_vals[i] + time_taken_vals[i+1] <= 5 and i != len(time_list)-1:
sum_result.append(time_taken_vals[i] + time_taken_vals[i+1])
else:
sum_result.append('')
sum_result.append('') # to equate the length of the two lists
# Now all you need to do is use the sum_result as value of your new column
df['Sum of time taken'] = sum_result因此,从您最初的DataFrame:
Application time_taken
Serv6 1.0
Serv2 1.2
Serv1 2.3
Serv9 3.3
Serv4 4.4
Serv8 4.6
Serv10 5.0
Serv7 5.5
Serv3 6.0
Serv5 8.0其结果将是:
Application time_taken Sum of time taken
Serv6 1.0 2.2
Serv2 1.2 3.5
Serv1 2.3
Serv9 3.3
Serv4 4.4
Serv8 4.6
Serv10 5.0
Serv7 5.5
Serv3 6.0
Serv5 8.0 因为只有1.0 + 1.2和1.2 + 2.3满足您的条件。
这是你要找的解决方案吗?无论如何,我希望这能给你一个解决问题的主意,祝你好运,别忘了提我的答案,谢谢:)
https://stackoverflow.com/questions/56876656
复制相似问题