Q: How to join 2 CSV files using only Python, without Scala or creating temporary tables

Stack Overflow user

Asked 2018-08-16 10:15:05

Answers: 1, Views: 31, Followers: 0, Votes: -1

I am very new to working with funnels and web properties in Python, and I am trying to find a solution to my problem. I have two CSV files (urls and visits):

url.csv
short_code
full_url
time_created
user_id
premium_user
country

visits.csv
short_code
visit_time
browser_type
version
platform
ipaddress
country

I am writing a piece of Python code to get the following:

1. Return urls which only have visitors from the same country as the url was created from

2. Get the URL with the shortest time between when the URL was created and when the first visit was recorded

3. Gets a count of visits to each short code by each unique visitor

Below is my code. So far it only imports the data from my cloud storage.

Links to my files: https://www.dropbox.com/s/u193iv6ybeges92/url.csv?dl=0 https://www.dropbox.com/s/u3vmdra41p3qjgv/visits.csv?dl=0

import csv
import requests

from pprint import pprint


def same_country_only(visits, urls):
    """Return urls which only have visitors from the same country as the url was created from"""
    pass


def shortest_first_visit(visits, urls):
    """Get the URL with the shortest time between when the URL was created and when the first visit was recorded"""
    pass


def unique_visitors(visits, urls):
    """Gets a count of visits to each short code by each unique visitor"""
    pass


if __name__ == '__main__':
    urls_response = requests.get('<<my_url>>').text
    urls_dr = csv.DictReader(urls_response.splitlines(), delimiter=',')    
    urls = [dict(url) for url in urls_dr]
    pprint(urls[0]) # example format

    print('\n' + '*' * 60 + '\n')

    visits_response = requests.get('<<my_url>>').text
    visits_dr = csv.DictReader(visits_response.splitlines(), delimiter=',')    
    visits = [dict(visit) for visit in visits_dr]
    pprint(visits[0]) # example format

    print('\n' + '*' * 60 + '\n')

    pprint(same_country_only(visits, urls))
    pprint(shortest_first_visit(visits, urls))
    pprint(unique_visitors(visits, urls))

Any help is greatly appreciated.

Sample CSV (the first row is the header)

url.csv

id  short_cod   long_url    created_ti  creator_id  premium country
1   GTq6Bl  https://w   2018-07-2   78  FALSE   CA
2   EmazTI  https://as  2018-07-2   124 FALSE   GB
3   tT54Bl  https://bi  2018-07-2   97  FALSE   GB
4   6ZTSle  https://gi  2018-07-2   98  FALSE   US
5   3akWjJ  https://e   2018-07-2   11  FALSE   JP
6   m7NoUy  https://bl  2018-07-2   34  TRUE    JP
7   lszSBy  https://m   2018-07-2   90  FALSE   US
8   PnTavE  https://ha  2018-07-2   1   FALSE   GB
9   QkXxbV  https://d   2018-07-2   109 FALSE   CN

visits.csv

browser_t   visit_time  short_cod   country platform    ip_address
Chrome  2018-07-2   GTq6Bl  IT  Windows 78.110.51.215
Firefox 2018-07-2   GTq6Bl  IT  Linux   27.243.245.232
Chrome  2018-07-2   GTq6Bl  JP  Mac OS  97.155.155.73
Chrome  2018-07-2   GTq6Bl  RU  Linux   85.201.130.148
Chrome  2018-07-2   GTq6Bl  GB  Linux   26.90.189.168
Chrome  2018-07-2   GTq6Bl  CN  Android 58.203.242.175
Edge    2018-07-2   GTq6Bl  KR  Windows 84.11.120.228
Safari  2018-07-2   GTq6Bl  KR  iOS 46.72.81.132
Firefox 2018-07-2   GTq6Bl  IT  Linux   30.47.125.89
Safari  2018-07-2   GTq6Bl  CA  iOS 85.245.10.160
Firefox 2018-07-2   GTq6Bl  RU  Windows 43.13.144.48
Chrome  2018-07-2   GTq6Bl  IT  Android 65.74.182.22

python

python-3.x

python-2.7

1 Answer

Stack Overflow user

Accepted answer

Answered 2018-08-19 00:48:47

Use pandas for this. pandas can join two CSV files on a common column; merge is the function that performs the join.
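As a rough illustration of the merge approach, here is a minimal sketch rather than a definitive solution. It assumes the column names from the question's first listing (short_code, time_created, visit_time, ipaddress, country); the real files may use different headers. The inline rows are made up so the example runs on its own, and treating ipaddress as the identity of a "unique visitor" is an assumption.

```python
import io

import pandas as pd

# Made-up stand-ins for url.csv and visits.csv (illustrative data only;
# column names are assumed from the question's description).
URLS_CSV = """short_code,full_url,time_created,country
aaa111,https://example.com/a,2018-07-01 10:00:00,CA
bbb222,https://example.com/b,2018-07-01 11:00:00,GB
"""
VISITS_CSV = """short_code,visit_time,ipaddress,country
aaa111,2018-07-01 10:05:00,10.0.0.1,CA
aaa111,2018-07-01 10:30:00,10.0.0.1,CA
bbb222,2018-07-01 11:01:00,10.0.0.2,GB
bbb222,2018-07-01 12:00:00,10.0.0.3,IT
"""

urls = pd.read_csv(io.StringIO(URLS_CSV), parse_dates=["time_created"])
visits = pd.read_csv(io.StringIO(VISITS_CSV), parse_dates=["visit_time"])

# Join on the shared column. Both files have a "country" column, so the
# suffixes keep the visitor's country and the URL's country apart.
merged = visits.merge(
    urls, on="short_code", how="inner", suffixes=("_visit", "_url")
)

# 1. URLs whose visits all come from the country the URL was created in.
same = merged["country_visit"] == merged["country_url"]
all_same = same.groupby(merged["short_code"]).all()
same_country_only = sorted(all_same[all_same].index)

# 2. URL with the shortest gap between creation and first recorded visit.
first_visit = merged.groupby("short_code")["visit_time"].min()
created = urls.set_index("short_code")["time_created"]
gap = first_visit - created
shortest_first_visit = gap.idxmin()

# 3. Visits per short code per visitor (assumption: visitor == ipaddress).
unique_visitors = merged.groupby(["short_code", "ipaddress"]).size()

print(same_country_only)
print(shortest_first_visit)
print(unique_visitors)
```

With the real data, the two io.StringIO arguments would be replaced by the file paths or by the text already downloaded with requests in the question. An inner join drops URLs that have no visits at all; how="left" with urls on the left would keep them if that matters for the first task.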

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/51868723
