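I'm more comfortable with SQL, but in an interview I was asked a question that stumped me. Here is the gist: there is a flat file with two columns, 'Course' and 'Student_id', and a few rows. Course: Science, Math, Science, History, Science, Math. Student_id: 101, 103, 102, 101, 103, 101
Using base Python with no packages or libraries, how would you group students by course, return the number of students in each course, return the number of students enrolled in "Science", and return the student_id of every student enrolled in "Math"?
I know how to do this in SQL and in pandas, but not in base Python without packages or libraries. Please help.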
Posted on 2018-07-12 04:52:05
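Edit:
It seems I misread your description of the file format. This solution works if the file has two lines, each holding a label followed by comma-separated values, not many lines that each hold two comma-separated values.
I am leaving it in as an MCVE for the file format I thought you were facing.
You could do it like this: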
data = """Course: Science, Math, Science, History, Science, Math
Student_id: 101, 103, 102, 101, 103, 101"""
fn = "data.txt"
# write file
with open(fn, "w") as f:
    f.write(data)
With that file, you can:
# read file
d = {}
with open(fn, "r") as f:
    for line in f:
        # split "label: v1, v2, ..." into the label and its value list
        c, cc = line.split(":")
        d[c] = [x.strip() for x in cc.split(",")]
# create a (course,student)-tuple list
tups = list(zip(d["Course"], d["Student_id"]))
# create a dict of course : student_list
# you can streamline this using defaultdict from collections but that needs an import
courses = {}
for course, student in tups:  # iterate, create course:pupillist dict
    if course in courses:
        courses[course].append(student)
    else:
        courses[course] = [student]
# print all (including Science) with amount of pupils
for k in courses:
    print(k, len(courses[k]))
# print Math + StudentIds
print("Math:", courses["Math"])
Output:
Science 3
Math 2
History 1
Math: ['103', '101']
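For the other layout mentioned in the edit (one course and one student id per line), the same grouping idea carries over. What follows is only a minimal sketch under assumptions: the sample text, the comma separator, the header row, and the names `rows_text`, `group_by_course` and `has_header` are all hypothetical and not taken from the question. It uses the built-in `dict.setdefault`, which avoids both the if/else branch and the `defaultdict` import.

```python
# Hypothetical sample in a row-per-record layout: one "Course,Student_id" pair per line.
rows_text = """Course,Student_id
Science,101
Math,103
Science,102
History,101
Science,103
Math,101"""


def group_by_course(lines, has_header=True):
    """Map each course name to the list of student ids enrolled in it."""
    # drop blank lines, then the header row if one is assumed to be present
    rows = [line.strip() for line in lines if line.strip()]
    if has_header:
        rows = rows[1:]
    courses = {}
    for row in rows:
        course, student = [part.strip() for part in row.split(",", 1)]
        # setdefault creates the empty list the first time a course is seen
        courses.setdefault(course, []).append(student)
    return courses


courses = group_by_course(rows_text.splitlines())
# number of students per course
counts = {course: len(students) for course, students in courses.items()}

print(counts)             # {'Science': 3, 'Math': 2, 'History': 1}
print(counts["Science"])  # 3
print(courses["Math"])    # ['103', '101']
```

Because `group_by_course` only iterates over lines, an open file object could be passed in place of `rows_text.splitlines()`, assuming the file really has this layout.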
https://stackoverflow.com/questions/51293820