Selenium Docker container runs on EC2 but not on AWS Lambda

Stack Overflow user
Asked on 2021-09-03 15:31:50
Answers: 1 · Views: 716 · Followers: 0 · Votes: 2

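I want to run a Selenium script on AWS Lambda from a Docker container.

I build the container on an AWS EC2 instance and then test it locally there with the RIE (Runtime Interface Emulator). Once the test passes, the container is pushed to ECR so that AWS Lambda can use it.

Although the local RIE test on EC2 always succeeds, I cannot get the function to work on Lambda. The Lambda test currently always fails with the following error message: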

{
  "errorMessage": "Message: session not created\nfrom tab crashed\n  (Session info: headless chrome=93.0.4577.63)\n",
  "errorType": "SessionNotCreatedException",
  "stackTrace": [
    "  File \"/var/task/app.py\", line 32, in handler\n    driver = webdriver.Chrome(\n",
    "  File \"/var/task/selenium/webdriver/chrome/webdriver.py\", line 76, in __init__\n    RemoteWebDriver.__init__(\n",
    "  File \"/var/task/selenium/webdriver/remote/webdriver.py\", line 157, in __init__\n    self.start_session(capabilities, browser_profile)\n",
    "  File \"/var/task/selenium/webdriver/remote/webdriver.py\", line 252, in start_session\n    response = self.execute(Command.NEW_SESSION, parameters)\n",
    "  File \"/var/task/selenium/webdriver/remote/webdriver.py\", line 321, in execute\n    self.error_handler.check_response(response)\n",
    "  File \"/var/task/selenium/webdriver/remote/errorhandler.py\", line 242, in check_response\n    raise exception_class(message, screen, stacktrace)\n"
  ]
}

Here is all the code I am actually using:

Dockerfile

FROM public.ecr.aws/lambda/python:3.8

#Download and install Chrome
RUN curl https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm > ./google-chrome-stable_current_x86_64.rpm
RUN yum install -y ./google-chrome-stable_current_x86_64.rpm
RUN rm ./google-chrome-stable_current_x86_64.rpm

#Download and install chromedriver
RUN yum install -y unzip
RUN curl http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zip > /tmp/chromedriver.zip
RUN unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/
RUN rm /tmp/chromedriver.zip
RUN yum remove -y unzip

#Upgrade pip and install python dependencies
RUN pip3 install --upgrade pip
RUN pip3 install selenium --target "${LAMBDA_TASK_ROOT}"

#Copy app.py
COPY app.py ${LAMBDA_TASK_ROOT}

CMD ["app.handler"]

app.py

import json  # needed for json.dumps in the handler below

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

def handler(event, context):

    chrome_options = Options()
    
    chrome_options.add_argument("--allow-running-insecure-content")
    chrome_options.add_argument("--ignore-certificate-errors")
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--disable-dev-tools")
    chrome_options.add_argument("--no-zygote")
    chrome_options.add_argument("--v=99")
    chrome_options.add_argument("--single-process")

    chrome_options.binary_location = '/usr/bin/google-chrome-stable'

    capabilities = webdriver.DesiredCapabilities().CHROME
    capabilities['acceptSslCerts'] = True
    capabilities['acceptInsecureCerts'] = True
        
    driver = webdriver.Chrome(
        executable_path='/usr/local/bin/chromedriver',
        options=chrome_options,
        desired_capabilities=capabilities)

    if driver:
    
        response = {
            "statusCode": 200,
            "body": json.dumps("Selenium Driver Initiated")
        }
    
        return response

Local container test with the RIE

$ docker run -p 9000:8080 aws-scraper
  results in > time="2021-09-03T15:24:13.269" level=info msg="exec '/var/runtime/bootstrap' (cwd=/var/task, handler=)"

$ curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'
  results in > {"statusCode": 200, "body": "\"Selenium Driver Initiated\""}
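The same invocation can be scripted instead of typed as a curl command, which is handy when the local test is repeated after every image build. The following is a minimal sketch using only the standard library; the helper names `build_invocation` and `invoke` are hypothetical, and the host and port are assumed to match the `docker run -p 9000:8080` mapping used in the test.

```python
import json
import urllib.request

# Path that the Lambda Runtime Interface Emulator exposes for invocations.
RIE_PATH = "/2015-03-31/functions/function/invocations"


def build_invocation(event, host="localhost", port=9000):
    """Build the same POST request that the curl command sends to the RIE."""
    url = f"http://{host}:{port}{RIE_PATH}"
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def invoke(event, host="localhost", port=9000, timeout=60):
    """Send the event to a running container and return the decoded JSON reply."""
    request = build_invocation(event, host, port)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

With the container running, `invoke({})` should return the handler's response dictionary. Note that a passing local run only shows that the handler works inside Docker on EC2; it does not reproduce Lambda's memory limits or filesystem restrictions.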

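I really cannot figure this out. I also tried following the question "Selenium works on AWS EC2 but not on AWS Lambda", without success.

Any help would be greatly appreciated. Thank you in advance.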

1 Answer

Stack Overflow user

Accepted answer

Posted on 2021-09-15 14:45:53

I solved this by borrowing the Dockerfile and the Chrome options from this repo: https://github.com/rchauhan9/image-scraper-lambda-container.git

The Dockerfile now looks like this:

# Define global args
ARG FUNCTION_DIR="/home/app/"
ARG RUNTIME_VERSION="3.9"
ARG DISTRO_VERSION="3.12"


# Stage 1
FROM python:${RUNTIME_VERSION}-alpine${DISTRO_VERSION} AS python-alpine

RUN apk add --no-cache \
    libstdc++

# Stage 2
FROM python-alpine AS build-image

RUN apk add --no-cache \
    build-base \
    libtool \
    autoconf \
    automake \
    libexecinfo-dev \
    make \
    cmake \
    libcurl

ARG FUNCTION_DIR
ARG RUNTIME_VERSION

RUN mkdir -p ${FUNCTION_DIR}

RUN python${RUNTIME_VERSION} -m pip install awslambdaric --target ${FUNCTION_DIR}


# Stage 3
FROM python-alpine as build-image2

ARG FUNCTION_DIR

WORKDIR ${FUNCTION_DIR}

COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}

RUN apk update \
    && apk add gcc python3-dev musl-dev \
    && apk add jpeg-dev zlib-dev libjpeg-turbo-dev

COPY requirements.txt .

RUN python${RUNTIME_VERSION} -m pip install -r requirements.txt --target ${FUNCTION_DIR}

# Stage 4
FROM python-alpine

ARG FUNCTION_DIR

WORKDIR ${FUNCTION_DIR}

COPY --from=build-image2 ${FUNCTION_DIR} ${FUNCTION_DIR}

RUN apk add jpeg-dev zlib-dev libjpeg-turbo-dev \
    && apk add chromium chromium-chromedriver

ADD https://github.com/aws/aws-lambda-runtime-interface-emulator/releases/latest/download/aws-lambda-rie /usr/bin/aws-lambda-rie

RUN chmod 755 /usr/bin/aws-lambda-rie

COPY app/* ${FUNCTION_DIR}
COPY entry.sh /

ENTRYPOINT [ "/entry.sh" ]

CMD [ "app.handler" ]

app.py now looks like this:

import json

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

def handler(event, context):

    chrome_options = Options()
    
    chrome_options.add_argument('--autoplay-policy=user-gesture-required')
    chrome_options.add_argument('--disable-background-networking')
    chrome_options.add_argument('--disable-background-timer-throttling')
    chrome_options.add_argument('--disable-backgrounding-occluded-windows')
    chrome_options.add_argument('--disable-breakpad')
    chrome_options.add_argument('--disable-client-side-phishing-detection')
    chrome_options.add_argument('--disable-component-update')
    chrome_options.add_argument('--disable-default-apps')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--disable-domain-reliability')
    chrome_options.add_argument('--disable-extensions')
    chrome_options.add_argument('--disable-features=AudioServiceOutOfProcess')
    chrome_options.add_argument('--disable-hang-monitor')
    chrome_options.add_argument('--disable-ipc-flooding-protection')
    chrome_options.add_argument('--disable-notifications')
    chrome_options.add_argument('--disable-offer-store-unmasked-wallet-cards')
    chrome_options.add_argument('--disable-popup-blocking')
    chrome_options.add_argument('--disable-print-preview')
    chrome_options.add_argument('--disable-prompt-on-repost')
    chrome_options.add_argument('--disable-renderer-backgrounding')
    chrome_options.add_argument('--disable-setuid-sandbox')
    chrome_options.add_argument('--disable-speech-api')
    chrome_options.add_argument('--disable-sync')
    chrome_options.add_argument('--disk-cache-size=33554432')
    chrome_options.add_argument('--hide-scrollbars')
    chrome_options.add_argument('--ignore-gpu-blacklist')
    chrome_options.add_argument('--ignore-certificate-errors')
    chrome_options.add_argument('--metrics-recording-only')
    chrome_options.add_argument('--mute-audio')
    chrome_options.add_argument('--no-default-browser-check')
    chrome_options.add_argument('--no-first-run')
    chrome_options.add_argument('--no-pings')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--no-zygote')
    chrome_options.add_argument('--password-store=basic')
    chrome_options.add_argument('--use-gl=swiftshader')
    chrome_options.add_argument('--use-mock-keychain')
    chrome_options.add_argument('--single-process')
    chrome_options.add_argument('--headless')

    chrome_options.add_argument('--user-data-dir={}'.format('/tmp/user-data'))
    chrome_options.add_argument('--data-path={}'.format('/tmp/data-path'))
    chrome_options.add_argument('--homedir={}'.format('/tmp'))
    chrome_options.add_argument('--disk-cache-dir={}'.format('/tmp/cache-dir'))
        
    driver = webdriver.Chrome(
        executable_path='/usr/bin/chromedriver',
        options=chrome_options)

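    if driver:
        print("Selenium Driver Initiated")

    # `html` was never assigned in the posted snippet; the current page
    # source is used here as an assumption so that the handler runs.
    html = driver.page_source

    response = {
        "statusCode": 200,
        "body": json.dumps(html, ensure_ascii=False)
    }

    return response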

To be honest, I still do not understand why these changes work. Any thoughts on that are very welcome!
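One difference between the two versions that may matter, offered as an unconfirmed guess: the working options point Chrome's profile, data, and cache directories at /tmp. On Lambda the filesystem is read-only apart from /tmp, whereas a container started with a plain `docker run` is writable, so a Chrome that tries to write elsewhere could start locally and still fail on Lambda. The sketch below isolates just those flags; the helper name `writable_chrome_args` is hypothetical, and it deliberately avoids importing Selenium so it can be tried on its own.

```python
import os
import tempfile


def writable_chrome_args(base_dir=None):
    """Return Chrome flags that keep its profile and cache under a writable dir."""
    # Assumption: only the temp directory (/tmp on Lambda) is writable.
    base_dir = base_dir or tempfile.gettempdir()
    args = []
    for flag, sub in (
        ("--user-data-dir", "user-data"),
        ("--data-path", "data-path"),
        ("--disk-cache-dir", "cache-dir"),
    ):
        path = os.path.join(base_dir, sub)
        # Create the directory up front so Chrome does not have to.
        os.makedirs(path, exist_ok=True)
        args.append(f"{flag}={path}")
    args.append(f"--homedir={base_dir}")
    return args
```

Each returned string can be passed to `chrome_options.add_argument(...)` in the handler. Other differences, such as the switch from Google Chrome on the AWS base image to Alpine's `chromium` package, may contribute as well.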

Thanks again, everyone, for your help and support.

Votes: 0
Original page content provided by Stack Overflow; translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/69047401
