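I am running a script on an AWS (Ubuntu) EC2 instance. It is a web scraper that uses selenium/chromedriver and headless Chromium to scrape some web pages. I have run this script before without any problems, but today I got an error. Here is the script: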
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--no-sandbox')
options.add_argument('--window-size=1420,1080')
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-gpu')
options.add_argument("--disable-notifications")
options.binary_location = '/usr/bin/chromium-browser'
# Note: chrome_options= is deprecated in newer Selenium releases in favor of options=
driver = webdriver.Chrome(chrome_options=options)
# Set base url (SAN FRANCISCO)
base_url = 'https://www.bandsintown.com/en/c/san-francisco-ca?page='
events = []
for i in range(1, 90):
    # cycle through pages in range
    driver.get(base_url + str(i))
    pageURL = base_url + str(i)
    print(pageURL)
When I run this script from Ubuntu, I get the following error:
Traceback (most recent call last):
File "BandsInTown_Scraper_SF.py", line 91, in <module>
driver = webdriver.Chrome(chrome_options=options)
File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
self.service.start()
File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/common/service.py", line 76, in start
stdin=PIPE)
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1295, in _execute_child
restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
I confirmed that I am running matching versions of ChromeDriver and the Chromium browser:
ChromeDriver 79.0.3945.130 (e22de67c28798d98833a7137c0e22876237fc40a-refs/branch-heads/3945@{#1047})
Chromium 79.0.3945.130 Built on Ubuntu , running on Ubuntu 18.04
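It is worth noting that I have run this script on a Mac, and I do have multiple web scraping scripts like this one running on the same EC2 instance (only 2 scripts so far, so not that many).
Update
When I try to run this script on Ubuntu, I now also get these errors: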
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 141, in _new_conn
(self.host, self.port), self.timeout, **extra_kw)
File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 60, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/usr/lib/python3.6/socket.py", line 745, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 346, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 852, in _validate_conn
conn.connect()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 284, in connect
conn = self._new_conn()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 150, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 639, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 398, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.bandsintown.com', port=443): Max retries exceeded with url: /en/c/san-francisco-ca?page=6 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "BandsInTown_Scraper_SF.py", line 39, in <module>
res = requests.get(url)
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.bandsintown.com', port=443): Max retries exceeded with url: /en/c/san-francisco-ca?page=6 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
Finally, here is my current monthly AWS usage, which does not show any memory quota being exceeded.
Posted on 2020-03-07 21:02:41
This error message...
restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
...implies that the operating system was unable to allocate memory to start/spawn a new session.
Moreover, this error message...
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.bandsintown.com', port=443): Max retries exceeded with url: /en/c/san-francisco-ca?page=6 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
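...implies that your program iterated successfully through page 5 and hit this error on page 6.
I don't see any issue in your code block as such. I took your code, made some minor adjustments, and here is the execution result:
Deep dive
This error originates from subprocess.py: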
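Since "Temporary failure in name resolution" is often transient, one option is to retry the failing request a few times with a growing delay. The following is only a minimal sketch and not part of the original script: `fetch_with_retry` and all of its parameters are hypothetical names, and `fetch` stands in for whatever performs the request.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, base_delay=1.0,
                     retry_on=(OSError,), sleep=time.sleep):
    """Call fetch(url), retrying on the given exceptions with exponential backoff.

    Every name here is illustrative; `fetch` is any callable that takes a URL.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, then 4x ... before the next try.
            sleep(base_delay * 2 ** (attempt - 1))
```

With requests this could be called as `fetch_with_retry(requests.get, url)`; `socket.gaierror` and `requests.exceptions.ConnectionError` both derive from `OSError`, so the default `retry_on` covers them. Retrying will not help if the machine is genuinely out of memory.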
self.pid = _posixsubprocess.fork_exec(
args, executable_list,
close_fds, tuple(sorted(map(int, fds_to_keep))),
cwd, env_list,
p2cread, p2cwrite, c2pread, c2pwrite,
errread, errwrite,
errpipe_read, errpipe_write,
restore_signals, start_new_session, preexec_fn)
However, according to the discussion in "OSError: [Errno 12] Cannot allocate memory", this error, OSError: [Errno 12] Cannot allocate memory, is related to RAM/swap.
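Errno 12 is the ENOMEM code, so a script can tell this failure apart from other OSError cases. A minimal sketch, where `is_out_of_memory` is a hypothetical helper name and the error is simulated rather than provoked:

```python
import errno
import os


def is_out_of_memory(exc):
    """Return True if exc is an OSError carrying errno ENOMEM (12 on Linux)."""
    return isinstance(exc, OSError) and exc.errno == errno.ENOMEM


# Simulate the error from the traceback instead of exhausting real memory.
simulated = OSError(errno.ENOMEM, os.strerror(errno.ENOMEM))
print(errno.ENOMEM, is_out_of_memory(simulated))
```

In the scraper, a check like this could wrap the `webdriver.Chrome(...)` call in a try/except so that a failed fork is logged with a clear message.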
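Swap space
Swap space is an area on the system's hard drive that is designated as a place for the operating system to temporarily store data it can no longer hold in RAM. This lets you increase the amount of data your programs can keep in working memory. Swap on the hard drive is used mainly when there is no longer enough room in RAM for in-use application data. Reading and writing swap is significantly slower than RAM, so the operating system prefers to keep running application data in memory and moves older data out to swap. Deploying swap space as a fallback for when RAM is depleted is a safety measure against out-of-memory issues on systems with non-SSD storage.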
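To see how much room is left before a new process would fail to start, the kernel's own figures can be read from /proc/meminfo. This is a sketch under the assumption of a Linux host; `parse_meminfo`, `headroom_kib`, and `read_headroom` are hypothetical helper names.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value}.

    Most values are in kB; a few fields (such as HugePages_Total) are counts.
    """
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def headroom_kib(info):
    """Memory the kernel estimates is available, plus unused swap, in kB."""
    return info.get("MemAvailable", 0) + info.get("SwapFree", 0)


def read_headroom(path="/proc/meminfo"):
    """Read the live figures (Linux only)."""
    with open(path) as fh:
        return headroom_kib(parse_meminfo(fh.read()))
```

Calling `read_headroom()` before starting Chromium gives a rough idea of whether the launch is likely to succeed; what counts as "enough" is a judgment call for the workload.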
System check
To check whether the system already has swap space available, run the following command:
$ sudo swapon --show
If you get no output, your system currently has no swap space available. You can also verify that there is no active swap with the free utility, as follows:
$ free -h
If there is no active swap in the system, you will see output like the following:
Output
              total        used        free      shared  buff/cache   available
Mem:           488M         36M        104M        652K        348M        426M
Swap:            0B          0B          0B
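The same check can be scripted by reading /proc/swaps, which lists the active swap areas on Linux. A minimal sketch, with `parse_swaps` and `active_swaps` as hypothetical helper names:

```python
def parse_swaps(text):
    """Parse /proc/swaps-style text into a list of dicts (sizes in KiB)."""
    entries = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) >= 5:
            entries.append({
                "path": fields[0],
                "type": fields[1],
                "size_kib": int(fields[2]),
                "used_kib": int(fields[3]),
            })
    return entries


def active_swaps(path="/proc/swaps"):
    """Return the active swap areas on this machine (Linux only)."""
    with open(path) as fh:
        return parse_swaps(fh.read())
```

An empty list from `active_swaps()` corresponds to `swapon --show` printing nothing.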
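Creating a swap file
In this case you need to allocate space for swap. Swap can live on a separate partition devoted to the task, but you can also create a swap file that resides on an existing partition. To create a 1 GB file, run the following command: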
$ sudo fallocate -l 1G /swapfile
You can verify that the correct amount of space was reserved by running the following command:
$ ls -lh /swapfile
#Output
-rw-r--r-- 1 root root 1.0G Mar 08 10:30 /swapfile
This confirms that the swap file was created with the correct amount of space reserved.
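The size check can also be done programmatically. This is a small sketch with a hypothetical helper name, `reserved_gib`; note that `fallocate -l 1G` reserves 1 GiB (1024^3 bytes).

```python
import os

GIB = 1024 ** 3  # bytes in one GiB, the unit fallocate's "G" suffix refers to


def reserved_gib(path):
    """Apparent file size in GiB, the figure `ls -lh` rounds for display."""
    return os.stat(path).st_size / GIB
```

For the file created above, `reserved_gib("/swapfile")` should be 1.0 if the allocation succeeded.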
Enabling the swap space
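Once a file of the correct size is available, it has to be turned into swap space. First, lock down the file's permissions so that only users with specific privileges can read its contents; letting unintended users read the file would have significant security implications. So you need to follow these steps:
Make the file accessible only to root:
$ sudo chmod 600 /swapfile
Mark the file as swap space:
$ sudo mkswap /swapfile
Enable the swap file so that the system can start using it:
$ sudo swapon /swapfile
Finally, check the output of the free utility again to verify the settings:
$ free -h
#Sample output
              total        used        free      shared  buff/cache   available
Mem:           488M         37M         96M        652K        354M        425M
Swap:          1.0G          0B        1.0G
Conclusion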
Once the swap space has been set up successfully, the underlying operating system will start using it when necessary.
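The permission lock-down described above can be verified from Python as well. A minimal sketch, assuming the intended mode is 0o600 (owner read/write only); `has_mode_600` is a hypothetical helper name, and it checks the mode bits only, not the owner.

```python
import os
import stat


def has_mode_600(path):
    """True if the permission bits are exactly 0o600 (owner read/write only)."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o600
```

A result of False for the swap file would mean that users other than the owner can still read or write it.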
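Posted on 2020-03-07 16:30:43
What probably happened is that Chromium was updated and now needs more memory (or, worse, it may be leaking memory; you didn't say how many URLs it got through before it died).
As a workaround, launch a larger instance size. You don't say which instance size you are using, but if you have a t3, try a t3.medium.
Here is an easy-to-read chart: https://www.ec2instances.info/?region=eu-west-1
If you already have an instance launched and want to resize it without rebuilding from scratch, use the console to set its state to stopped, change the size, and then start it again.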
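The stop, resize, start sequence can also be scripted instead of clicking through the console. This is only a sketch under assumptions: `resize_instance` is a hypothetical helper, `ec2` is expected to be a boto3 EC2 client, and the instance is assumed to be EBS-backed so that it can be stopped.

```python
def resize_instance(ec2, instance_id, new_type):
    """Stop an instance, change its type, and start it again.

    `ec2` is expected to behave like a boto3 EC2 client.
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    # The type can only be changed once the instance is fully stopped.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": new_type},
    )
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```

With boto3 installed and credentials configured, this could be called as `resize_instance(boto3.client("ec2"), "<your-instance-id>", "t3.medium")`. Unless an Elastic IP is attached, the public IP address usually changes after a stop and start.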
https://stackoverflow.com/questions/60579270