
Generating a tree from a list of URLs

Stack Overflow user
Asked on 2019-08-09 18:44:55
2 answers, 419 views, 0 followers, score 1

I have a list of URLs. Some directories contain several files with different extensions, and so on. Example:

    List = [
         "http://www.example.com/folder1",
         "http://www.example.com/folder1",
         "http://www.example.com/folder1/folder2",
         "http://www.example.com/folder1/folder2/folder3",
         "http://www.example.com/folder1/folder2",
         "http://www.example.com/folder1/folder2/image1.png",
         "http://www.example.com/folder1/folder2/image2.png",
         "http://www.example.com/folder1/folder2/file.txt",
         "http://www.example.com/folder1/folder2/folder3",
         "http://www.example.com/folder1/folder2/folder3/file1.txt",
         "http://www.example.com/folder1/folder2/folder3/file2.txt",
         "http://www.example.com/folder1/folder2/folder3/file3.txt",
         ...
    ]

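What I'm trying to do is filter these URLs so that I end up with a list containing only the folder URLs and one URL for each distinct extension. Like this: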

    List = [
         "http://www.example.com/folder1",
         "http://www.example.com/folder1/folder2",
         "http://www.example.com/folder1/folder2/image1.png",
         "http://www.example.com/folder1/folder2/file.txt",
         "http://www.example.com/folder1/folder2/folder3",
         "http://www.example.com/folder1/folder2/folder3/file1.txt",
         ...
    ]

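Right now I'm looking into how to build some kind of tree from it, so that I can traverse it and remove the duplicate files.

I've tried a few different approaches, but I'm still fairly new to Python.

Thanks :)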


2 Answers

Stack Overflow user

Accepted answer

Answered on 2019-08-09 19:13:05

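If your URLs follow this simple format, you can filter the list with a dict that tracks which extensions have already been seen in each directory: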

import os

List = [
     "http://www.example.com/folder1",
     "http://www.example.com/folder1",
     "http://www.example.com/folder1/folder2",
     "http://www.example.com/folder1/folder2/folder3",
     "http://www.example.com/folder1/folder2",
     "http://www.example.com/folder1/folder2/image1.png",
     "http://www.example.com/folder1/folder2/image2.png",
     "http://www.example.com/folder1/folder2/file.txt",
     "http://www.example.com/folder1/folder2/folder3",
     "http://www.example.com/folder1/folder2/folder3/file1.txt",
     "http://www.example.com/folder1/folder2/folder3/file2.txt",
     "http://www.example.com/folder1/folder2/folder3/file3.txt",
     # ...
]

dirnames = {}
filtered = []

for url in List:
    dirname = os.path.dirname(url)
    dirnames.setdefault(dirname, {})
    extension = os.path.splitext(url)[1]

    if extension not in dirnames[dirname]:
        dirnames[dirname][extension] = True
        filtered.append(url)

print(filtered)
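
One caveat with the loop above: every URL without an extension shares the empty-string "extension", so two sibling folders in the same directory would collapse into one entry. The following is a minimal sketch of one way around that, not the answerer's code. The function name `filter_urls` is hypothetical, and it assumes that a last path segment without an extension is a folder.

```python
import posixpath
from urllib.parse import urlsplit


def filter_urls(urls):
    """Keep every distinct folder URL and the first file per (host, directory, extension)."""
    seen_folders = set()
    seen_extensions = set()
    kept = []
    for url in urls:
        parts = urlsplit(url)
        directory, name = posixpath.split(parts.path)
        extension = posixpath.splitext(name)[1]
        if not extension:
            # Assumption: no extension on the last segment means a folder.
            if url not in seen_folders:
                seen_folders.add(url)
                kept.append(url)
        else:
            key = (parts.netloc, directory, extension)
            if key not in seen_extensions:
                seen_extensions.add(key)
                kept.append(url)
    return kept
```

Calling `filter_urls(List)` on a list like the one in the question should keep each folder once and the first `.png` and `.txt` URL in each directory, in their original order.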
Score: 0

Stack Overflow user

Answered on 2019-08-09 19:12:58

You can use itertools.groupby with recursion:

import itertools, re
data = ['http://www.example.com/folder1', 'http://www.example.com/folder1', 'http://www.example.com/folder1/folder2', 'http://www.example.com/folder1/folder2/folder3', 'http://www.example.com/folder1/folder2', 'http://www.example.com/folder1/folder2/image1.png', 'http://www.example.com/folder1/folder2/image2.png', 'http://www.example.com/folder1/folder2/file.txt', 'http://www.example.com/folder1/folder2/folder3', 'http://www.example.com/folder1/folder2/folder3/file1.txt', 'http://www.example.com/folder1/folder2/folder3/file2.txt', 'http://www.example.com/folder1/folder2/folder3/file3.txt']
def group(d, path = []):
   new_d = [[a, [j for _, *j in b]] for a, b in itertools.groupby(sorted(d, key=lambda x:x[0]), key=lambda x:x[0])]
   for a, c in new_d:
      _d, _fold, _path = [i[0] for i in c if len(i) == 1], [], []
      for i in _d:
        if not re.findall(r'\.\w+$', i):
          if i not in _fold:
             yield '/'.join(path+[a]+[i])
             _fold.append(i)
        else:
           if i.split('.')[-1] not in _path:
              yield '/'.join(path+[a]+[i])
              _path.append(i.split('.')[-1])
      r = [i for i in c if len(i) != 1]
      yield from group(r, path+[a])

_data = [[a, *b.split('/')] for a, b in map(lambda x:re.split(r'(?<=\.com)/', x), data)]
print(list(group(_data)))

Output:

['http://www.example.com/folder1', 
 'http://www.example.com/folder1/folder2', 
 'http://www.example.com/folder1/folder2/folder3', 
 'http://www.example.com/folder1/folder2/image1.png', 
 'http://www.example.com/folder1/folder2/file.txt', 
 'http://www.example.com/folder1/folder2/folder3/file1.txt']
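
If an explicit tree is what you are after, as the question mentions, nested dicts are one simple representation. This is only a sketch under assumptions: `build_tree` and `walk` are hypothetical names, and each URL is assumed to split cleanly into a host plus `/`-separated path segments. Duplicate URLs collapse on their own because they map to the same keys.

```python
from urllib.parse import urlsplit


def build_tree(urls):
    """Nest URL path segments into dicts: {host: {segment: {...}}}."""
    tree = {}
    for url in urls:
        parts = urlsplit(url)
        node = tree.setdefault(parts.netloc, {})
        for segment in parts.path.split("/"):
            if segment:  # skip empty pieces from leading or doubled slashes
                node = node.setdefault(segment, {})
    return tree


def walk(tree, depth=0):
    """Yield one indented line per node, in insertion order."""
    for name, children in tree.items():
        yield "  " * depth + name
        yield from walk(children, depth + 1)
```

Printing `"\n".join(walk(build_tree(data)))` gives an indented outline of the hierarchy, which can then be traversed to decide which files to keep in each folder.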
Score: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/57435627
