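I am currently developing a Python program, part of which uses headless Chrome with Selenium to run a repetitive process. My goal is to run this program on Lambda.
The whole program has about 1 GB of dependencies, so the standard approach of a .zip archive (containing all function code and dependencies) is not an option, because the total unzipped size of the function and all layers cannot exceed the unzipped deployment package size limit of 250 MB.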
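To see how far over the .zip limit a dependency folder is, a rough size check can help. This is only a sketch: the helper names are hypothetical, and it assumes the 250 MB limit is counted as 250 * 1024 * 1024 bytes.

```python
import os

# Assumption: the 250 MB limit is interpreted as 250 * 1024 * 1024 bytes.
ZIP_UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def directory_size_bytes(root):
    """Sum the sizes of all regular files below root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_zip_limit(root, limit=ZIP_UNZIPPED_LIMIT_BYTES):
    """Return True when the unpacked folder is within the given limit."""
    return directory_size_bytes(root) <= limit
```

Pointing `fits_zip_limit` at the folder that holds the installed packages gives a quick yes/no before deciding between a .zip package and a container image.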
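This is where the new AWS Lambda container image support comes in (I followed the linked tutorial to develop the whole implementation, so please read it if you need more information). It lets me package and deploy the Lambda function as a container image of up to 10 GB in size.
I am using an AWS-provided base image from ECR that runs Amazon Linux 2. First, in my Dockerfile I:
Finally, I install Chrome (87.0.4280.88 at the time of writing) and ChromeDriver (87.0.4280.88).
This could be where the problem lies, but I highly doubt it, since both versions are the same and ChromeDriver uses the same version number scheme as Chrome.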
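The version comparison can be written as a small check. This is a minimal sketch with hypothetical helper names. It assumes that a Chrome and ChromeDriver pair is compatible when the first three dotted components (major, minor, build) agree.

```python
def build_prefix(version):
    """Return the (major, minor, build) part of a dotted version string."""
    parts = version.strip().split(".")
    return tuple(int(part) for part in parts[:3])


def versions_match(chrome_version, driver_version):
    """Assumed compatibility rule: major.minor.build must be identical."""
    return build_prefix(chrome_version) == build_prefix(driver_version)


# The two versions quoted in the question.
print(versions_match("87.0.4280.88", "87.0.4280.88"))  # prints True
```

By this rule the two versions in the question match, which supports the suspicion that the crash has a different cause.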
Here is my Dockerfile:
# 1) DOWNLOAD BASE IMAGE.
FROM public.ecr.aws/lambda/python:3.8
# 2) DEFINE GLOBAL ARGS.
ARG MAIN_FILE="main.py"
ARG ENV_FILE="params.env"
ARG REQUIREMENTS_FILE="requirements.txt"
ARG FUNCTION_ROOT="."
ARG RUNTIME_VERSION="3.8"
# 3) COPY FILES.
# Copy The Main .py File.
COPY ${MAIN_FILE} ${LAMBDA_TASK_ROOT}
# Copy The .env File.
COPY ${ENV_FILE} ${LAMBDA_TASK_ROOT}
# Copy The requirements.txt File.
COPY ${REQUIREMENTS_FILE} ${LAMBDA_TASK_ROOT}
# Copy Helpers Folder.
COPY helpers/ ${LAMBDA_TASK_ROOT}/helpers/
# Copy Private Folder.
COPY priv/ ${LAMBDA_TASK_ROOT}/priv/
# Copy Source Data Folder.
COPY source_data/ ${LAMBDA_TASK_ROOT}/source_data/
# 4) INSTALL DEPENDENCIES.
RUN --mount=type=cache,target=/root/.cache/pip python3.8 -m pip install --upgrade pip
RUN --mount=type=cache,target=/root/.cache/pip python3.8 -m pip install wheel
RUN --mount=type=cache,target=/root/.cache/pip python3.8 -m pip install urllib3
RUN --mount=type=cache,target=/root/.cache/pip python3.8 -m pip install -r requirements.txt --default-timeout=100
# 5) DOWNLOAD & INSTALL CHROME + CHROMEDRIVER.
#RUN yum -y upgrade
RUN yum -y install wget unzip libX11 nano xorg-x11-xauth xclock xterm
# Install Chrome
RUN wget https://intoli.com/install-google-chrome.sh
RUN bash install-google-chrome.sh
# Install Chromedriver
RUN wget https://chromedriver.storage.googleapis.com/87.0.4280.88/chromedriver_linux64.zip
RUN unzip ./chromedriver_linux64.zip
RUN rm ./chromedriver_linux64.zip
RUN mv -f ./chromedriver /usr/local/bin/chromedriver
RUN chmod 755 /usr/local/bin/chromedriver
# 6) SET CMD OF HANDLER.
CMD [ "main.lambda_handler" ]
This always builds without problems and creates my image as expected.
My docker-compose.yml file:
version: "3.7"
services:
  lambda:
    image: tbg-lambda:latest
    build: .
    ports:
      - "8080:8080"
    env_file:
      - ./params.env
So, now that the image is built, I can test it locally with cURL. Here I pass an empty JSON payload:
curl -XPOST "http://localhost:8080/2015-03-31/functions/function/invocations" -d '{}'
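The same local invocation can be scripted from Python with only the standard library. This is a sketch of the cURL call above, not part of the original setup. The function names are hypothetical and the timeout value is an assumption.

```python
import json
import urllib.request

# Endpoint exposed by the locally running container, as used in the cURL call.
LOCAL_INVOKE_URL = "http://localhost:8080/2015-03-31/functions/function/invocations"


def build_invoke_request(payload, url=LOCAL_INVOKE_URL):
    """Build a POST request whose body is the JSON-encoded event payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def invoke_local(payload, timeout=900):
    """Send the event to the local container and return the response text."""
    request = build_invoke_request(payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the container running, `invoke_local({})` sends the same empty event as the cURL command.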
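The whole program runs perfectly from start to finish using headless Chrome, with no errors.
So, great: the Docker container works locally as expected.
Let's upload it to ECR so that I can use it with the Lambda function (the ECR details have been changed for security):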
aws ecr create-repository --repository-name tbg-lambda --image-scanning-configuration scanOnPush=true
docker tag tbg-lambda:latest 123412341234.dkr.ecr.sa-east-1.amazonaws.com/tbg-lambda:latest
aws ecr get-login-password | docker login --username AWS --password-stdin 123412341234.dkr.ecr.sa-east-1.amazonaws.com
docker push 123412341234.dkr.ecr.sa-east-1.amazonaws.com/tbg-lambda:latest
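Everything goes as expected. I then create a new Lambda function, choose "Container" as the function option, and attach an IAM role with all the permissions I need.
I set the memory to the maximum to make sure that is not the problem.
OK, so on to the point of failure.
I invoke the function through the console using a test event.
Everything runs perfectly until it hits the code that creates the web driver with Chrome: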
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# logPath is defined elsewhere in the program (see the note below).
options = Options()
options.add_argument('--no-sandbox')
options.add_argument('--headless')
options.add_argument('--single-process')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--remote-debugging-port=9222')
options.add_argument('--disable-infobars')
driver = webdriver.Chrome(
    service_args=["--verbose", "--log-path={}".format(logPath)],
    executable_path="/usr/local/bin/chromedriver",
    options=options
)
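PS: logPath is just another folder in the project directory. The log is written there as expected, and its contents are shown below.
Here is the part of the CloudWatch logs that highlights the error: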
Caught WebDriverException Error: unknown error: Chrome failed to start: crashed.
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
END RequestId: 7c933bca-5f0d-4458-9529-db28da677444
REPORT RequestId: 7c933bca-5f0d-4458-9529-db28da677444 Duration: 59104.94 ms Billed Duration: 59105 ms Memory Size: 10240 MB Max Memory Used: 481 MB
RequestId: 7c933bca-5f0d-4458-9529-db28da677444 Error: Runtime exited with error: exit status 1 Runtime.ExitError
And here is the full ChromeDriver log file:
[1608748453.064][INFO]: Starting ChromeDriver 87.0.4280.88 (89e2380a3e36c3464b5dd1302349b1382549290d-refs/branch-heads/4280@{#1761}) on port 54581
[1608748453.064][INFO]: Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
[1608748453.064][INFO]: /dev/shm not writable, adding --disable-dev-shm-usage switch
[1608748453.679][SEVERE]: CreatePlatformSocket() failed: Address family not supported by protocol (97)
[1608748453.679][INFO]: listen on IPv6 failed with error ERR_ADDRESS_UNREACHABLE
[1608748454.432][INFO]: [13826d22c628514ca452d1f2949eb011] COMMAND InitSession {
"capabilities": {
"alwaysMatch": {
"browserName": "chrome",
"goog:chromeOptions": {
"args": [ "--no-sandbox", "--headless", "--single-process", "--disable-dev-shm-usage" ],
"extensions": [ ]
},
"platformName": "any"
},
"firstMatch": [ {
} ]
},
"desiredCapabilities": {
"browserName": "chrome",
"goog:chromeOptions": {
"args": [ "--no-sandbox", "--headless", "--single-process", "--disable-dev-shm-usage" ],
"extensions": [ ]
},
"platform": "ANY",
"version": ""
}
}
[1608748454.433][INFO]: Populating Preferences file: {
"alternate_error_pages": {
"enabled": false
},
"autofill": {
"enabled": false
},
"browser": {
"check_default_browser": false
},
"distribution": {
"import_bookmarks": false,
"import_history": false,
"import_search_engine": false,
"make_chrome_default_for_user": false,
"skip_first_run_ui": true
},
"dns_prefetching": {
"enabled": false
},
"profile": {
"content_settings": {
"pattern_pairs": {
"https://*,*": {
"media-stream": {
"audio": "Default",
"video": "Default"
}
}
}
},
"default_content_setting_values": {
"geolocation": 1
},
"default_content_settings": {
"geolocation": 1,
"mouselock": 1,
"notifications": 1,
"popups": 1,
"ppapi-broker": 1
},
"password_manager_enabled": false
},
"safebrowsing": {
"enabled": false
},
"search": {
"suggest_enabled": false
},
"translate": {
"enabled": false
}
}
[1608748454.433][INFO]: Populating Local State file: {
"background_mode": {
"enabled": false
},
"ssl": {
"rev_checking": {
"enabled": false
}
}
}
[1608748454.433][INFO]: Launching chrome: /usr/bin/google-chrome --disable-background-networking --disable-client-side-phishing-detection --disable-default-apps --disable-dev-shm-usage --disable-hang-monitor --disable-popup-blocking --disable-prompt-on-repost --disable-sync --enable-automation --enable-blink-features=ShadowDOMV0 --enable-logging --headless --log-level=0 --no-first-run --no-sandbox --no-service-autorun --password-store=basic --remote-debugging-port=0 --single-process --test-type=webdriver --use-mock-keychain --user-data-dir=/tmp/.com.google.Chrome.xgjs0h data:,
mkdir: cannot create directory ‘/.local’: Read-only file system
touch: cannot touch ‘/.local/share/applications/mimeapps.list’: No such file or directory
/usr/bin/google-chrome: line 45: /dev/fd/62: No such file or directory
/usr/bin/google-chrome: line 46: /dev/fd/62: No such file or directory
prctl(PR_SET_NO_NEW_PRIVS) failed
[1223/183429.578846:FATAL:zygote_communication_linux.cc(255)] Cannot communicate with zygote
Failed to generate minidump.[1608748469.769][INFO]: [13826d22c628514ca452d1f2949eb011] RESPONSE InitSession ERROR unknown error: Chrome failed to start: crashed.
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
[1608748469.769][DEBUG]: Log type 'driver' lost 0 entries on destruction
[1608748469.769][DEBUG]: Log type 'browser' lost 0 entries on destruction
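In a log this long, the failure lines are easy to miss. The sketch below picks out ChromeDriver lines at SEVERE level and Chrome's own FATAL lines. The function name and both regular expressions are hypothetical and only assume the two line formats visible in the log above.

```python
import re

# ChromeDriver lines look like "[1608748453.679][SEVERE]: message".
CHROMEDRIVER_LINE = re.compile(r"^\[(\d+\.\d+)\]\[(\w+)\]: (.*)$")
# Chrome's own lines look like "[1223/183429.578846:FATAL:file.cc(255)] message".
CHROME_FATAL = re.compile(r"^\[[^\]]*:FATAL:[^\]]*\] (.*)$")


def failure_lines(log_text):
    """Return (level, message) pairs for SEVERE and FATAL lines, in order."""
    hits = []
    for line in log_text.splitlines():
        driver = CHROMEDRIVER_LINE.match(line)
        if driver and driver.group(2) == "SEVERE":
            hits.append(("SEVERE", driver.group(3)))
            continue
        fatal = CHROME_FATAL.match(line)
        if fatal:
            hits.append(("FATAL", fatal.group(1)))
    return hits
```

Run on the log above, this would surface the `CreatePlatformSocket() failed` line and the `Cannot communicate with zygote` line. Lines without a prefix, such as the read-only file system messages, still need to be read by hand.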
I think one possible cause is that Lambda runs this container differently from how I run it locally.
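One difference the ChromeDriver log hints at is the file system: the line `mkdir: cannot create directory '/.local': Read-only file system` suggests that something tried to write under the home directory. The sketch below points `HOME` at a writable directory before the browser is started. The function names are hypothetical. It assumes that `/tmp` is writable on Lambda, and it is not a confirmed fix for the zygote crash.

```python
import os


def is_writable_dir(path):
    """Return True when path is an existing directory the process can write to."""
    return os.path.isdir(path) and os.access(path, os.W_OK)


def use_writable_home(candidate="/tmp", environ=None):
    """Set HOME to candidate if it is writable; report whether it was changed."""
    environ = os.environ if environ is None else environ
    if is_writable_dir(candidate):
        environ["HOME"] = candidate
        return True
    return False
```

Calling `use_writable_home()` at the top of the handler, before `webdriver.Chrome(...)`, would at least show whether the `/.local` messages disappear from the log.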
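Many people advise against running Chrome as root. So does Lambda run the container as root, and is that the cause? If so, how do I tell Lambda or Docker to run the code as a non-root user?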
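Whether the code runs as root can be checked from inside the function rather than guessed. This is a minimal diagnostic sketch. The function name and the choice of fields are assumptions. It relies on `os.geteuid`, which is available on Linux.

```python
import os


def describe_runtime_user():
    """Collect basic facts about the user and directories the process runs with."""
    uid = os.geteuid()
    return {
        "uid": uid,
        "is_root": uid == 0,
        "home": os.environ.get("HOME"),
        "cwd": os.getcwd(),
        "cwd_writable": os.access(os.getcwd(), os.W_OK),
    }
```

Logging the result once at the start of `lambda_handler`, both locally and on Lambda, would show whether the two environments differ in user or in writable directories.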
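This is mentioned here: https://github.com/heroku/heroku-buildpack-google-chrome/issues/46#issuecomment-484562558
I have been fighting this error ever since AWS announced Lambda containers, so if I have missed anything, please ask for more information!
Thanks in advance.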
Posted on 2021-01-08 19:17:22
Python v3.6 works well. I have a bin directory containing chromedriver v2.41 (linux64.zip) and headless-chrome v68.0.3440.84 (https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-53/stable-headless-chromium-amazonlinux-2017-03.zip).
Below is my Dockerfile, in which I copy chromedriver and headless-chrome from the source bin directory to the destination bin directory. The reason for the destination bin directory is explained below.
FROM public.ecr.aws/lambda/python:3.6
COPY app.py ${LAMBDA_TASK_ROOT}
COPY requirements.txt ${LAMBDA_TASK_ROOT}
RUN --mount=type=cache,target=/root/.cache/pip python3.6 -m pip install --upgrade pip
RUN --mount=type=cache,target=/root/.cache/pip python3.6 -m pip install -r requirements.txt
RUN mkdir bin
ADD bin bin/
CMD [ "app.handler" ]
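In my Python script, I copy the files from the bin directory (Docker) to the /tmp/bin directory (Amazon Linux 2) with 775 permissions, because when the Lambda executes, /tmp is the only directory in Amazon Linux 2 where files can be written.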
import logging
import os
import shutil

logger = logging.getLogger()

BIN_DIR = "/tmp/bin"
CURR_BIN_DIR = os.getcwd() + "/bin"

def _init_bin(executable_name):
    # Create the writable target directory on first use.
    if not os.path.exists(BIN_DIR):
        logger.info("Creating bin folder")
        os.makedirs(BIN_DIR)
    logger.info("Copying binaries for " + executable_name + " in /tmp/bin")
    currfile = os.path.join(CURR_BIN_DIR, executable_name)
    newfile = os.path.join(BIN_DIR, executable_name)
    shutil.copy2(currfile, newfile)
    logger.info("Giving new binaries permissions for lambda")
    os.chmod(newfile, 0o775)
In the handler function, use the options below to avoid a few exceptions raised by chromedriver.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def handler(event, context):
    # Copy both binaries into /tmp/bin before starting the browser.
    _init_bin("headless-chromium")
    _init_bin("chromedriver")
    options = Options()
    options.add_argument("--headless")
    options.add_argument("--disable-gpu")
    options.add_argument("--no-sandbox")
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument('--disable-gpu-sandbox')
    options.add_argument("--single-process")
    options.add_argument('window-size=1920x1080')
    # The extra inner quotes around the user-agent switch were dropped so Chrome can parse it.
    options.add_argument(
        'user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36')
    options.binary_location = "/tmp/bin/headless-chromium"
    browser = webdriver.Chrome(
        "/tmp/bin/chromedriver", options=options)
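Posted on 2021-03-12 05:34:49
It is now known that native Chrome can work on Lambda, so you no longer need serverless-chrome. The latest version in my repository uses native Chrome, but you can also find serverless-chrome used in older versions.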
https://github.com/umihico/docker-selenium-lambda/
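The latest versions I could find so far are as follows.
Posted on 2021-01-17 16:25:33
This builds on Sandeep Kumar's solution (I tried to improve it, but that did not work because I am a new user).
This is a minimal setup for running Selenium in a container-based Lambda.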
selenium==3.14.0
FROM public.ecr.aws/lambda/python:3.6
COPY app.py ${LAMBDA_TASK_ROOT}
COPY requirements.txt ${LAMBDA_TASK_ROOT}
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
RUN mkdir bin
ADD bin /bin/
RUN chmod 755 /bin/chromedriver
CMD [ "app.handler" ]
from selenium import webdriver

def handler(event, context):
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--disable-gpu")
    options.add_argument("--no-sandbox")
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument('--disable-gpu-sandbox')
    options.add_argument("--single-process")
    options.add_argument('window-size=1920x1080')
    # The extra inner quotes around the user-agent switch were dropped so Chrome can parse it.
    options.add_argument(
        'user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36')
    options.binary_location = "/bin/headless-chromium"
    browser = webdriver.Chrome(
        executable_path="/bin/chromedriver", options=options)
    # Load a page, print its title, and close the browser.
    browser.get("https://feng.lu")
    print(browser.title)
    browser.quit()
Note: I did not copy the contents to the /tmp/bin folder as Sandeep did. I just use the /bin folder and update the chmod permissions in the Dockerfile.
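Since this variant depends on the permissions set in the Dockerfile, a quick check at the start of the handler can confirm that the binaries are executable. This is a sketch with hypothetical helper names, using only the standard library.

```python
import os
import stat


def is_executable_file(path):
    """Return True when path is a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def mode_string(path):
    """Return the permission bits of path as an octal string such as '0o755'."""
    return oct(stat.S_IMODE(os.stat(path).st_mode))
```

For example, `is_executable_file("/bin/chromedriver")` returning False points at a permissions problem before Selenium raises a less specific error.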
https://stackoverflow.com/questions/65429877