Jsoup download error: it says I must log in, but there is no login

Content sourced from Stack Overflow; translated and used under the CC BY-SA 3.0 license.


Specs: my company's server runs Jsoup and downloads PDFs from the links I give it.

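I sometimes run into this problem: a site has a file (a PDF or something else) that I can normally download from my browser, but when I fetch it through my scraping software, it returns an error like this: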

    Something went wrong.
    Oh no! Something is not right! Try to log in again. If you continue to see this error, please contact us at support@agendapal.com
    Error description:
    Message: Invalid URI: The Authority/Host could not be parsed.
    TargetSite: Void CreateThis(System.String, Boolean, System.UriKind)
    StackTrace:
       at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind)
       at SWPalInc.WebHost.Controllers.DController.F(String u, String n)
       at lambda_method(Closure , ControllerBase , Object[] )
       at System.Web.Mvc.ReflectedActionDescriptor.Execute(ControllerContext controllerContext, IDictionary`2 parameters)
       at System.Web.Mvc.ControllerActionInvoker.InvokeActionMethod(ControllerContext controllerContext, ActionDescriptor actionDescriptor, IDictionary`2 parameters)
       at System.Web.Mvc.ControllerActionInvoker.<>c__DisplayClass15.b__12()
       at System.Web.Mvc.ControllerActionInvoker.InvokeActionMethodFilter(IActionFilter filter, ActionExecutingContext preContext, Func`1 continuation)
       at System.Web.Mvc.ControllerActionInvoker.InvokeActionMethodWithFilters(ControllerContext controllerContext, IList`1 filters, ActionDescriptor actionDescriptor, IDictionary`2 parameters)
       at System.Web.Mvc.ControllerActionInvoker.InvokeAction(ControllerContext controllerContext, String actionName)
       at System.Web.Mvc.Controller.ExecuteCore()
       at System.Web.Mvc.ControllerBase.Execute(RequestContext requestContext)
       at System.Web.Mvc.MvcHandler.<>c__DisplayClass6.<>c__DisplayClassb.b__5()
       at System.Web.Mvc.Async.AsyncResultWrapper.<>c__DisplayClass1.b__0()
       at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
       at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
    Data: System.Collections.ListDictionaryInternal
    InnerException:
    Source: System
    Click here and try to login again

I get that error when my company's server tries to pull a PDF from a link like this one:

https://meetings.municode.com/d/f?u=https://agendapalncus.blob.core.windows.net/paonia-pubu/MEET-Agenda-e11f135d48564ad983c6c46949e34894.pdf&n=Agenda-Regular%20Town%20Board%20Meeting-February%2026,%202019%206.30%20PM.pdf
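The stack trace shows the failure inside DController.F(String u, String n) at System.Uri.CreateThis, so the server apparently could not turn a string (most likely the u parameter) into a URI. A quick way to check what the link actually carries is to decode its query string. The following is a minimal JDK-only sketch; the class name WrapperLink and the method queryParams are hypothetical, and reading n as the download file name is an assumption based on its value.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits the query string of a wrapper link such as
// https://meetings.municode.com/d/f?u=...&n=... into decoded parameters.
public class WrapperLink {

    public static Map<String, String> queryParams(String link) {
        // URI.create throws an unchecked IllegalArgumentException on malformed input.
        String rawQuery = URI.create(link).getRawQuery();
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null) {
            return params; // no query string at all
        }
        for (String pair : rawQuery.split("&")) {
            // Split on the first '=' only, so values containing '=' stay intact.
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        String link = "https://meetings.municode.com/d/f?u=https://agendapalncus.blob.core.windows.net"
                + "/paonia-pubu/MEET-Agenda-e11f135d48564ad983c6c46949e34894.pdf"
                + "&n=Agenda-Regular%20Town%20Board%20Meeting-February%2026,%202019%206.30%20PM.pdf";
        Map<String, String> params = queryParams(link);
        // "u" should be an absolute URL with a host; that is what the server's Uri call needs.
        System.out.println("u = " + params.get("u"));
        System.out.println("n = " + params.get("n"));
    }
}
```

If the u value printed here is a well-formed absolute URL, the problem is more likely in how the scraper rewrites or re-encodes the link before sending it than in the link itself.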

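I have tried going through a proxy server, but I hit the same problem when I scrape through it. Does anyone know a solution to this, or has anyone seen it before?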

Answer

When I try to parse this URL with Jsoup, it throws:

    Exception in thread "main" org.jsoup.UnsupportedMimeTypeException: Unhandled content type.
    Must be text/*, application/xml, or application/xhtml+xml.

So it appears to throw a proper, explicit exception. Try catching and handling that exception. This is how I do it in Java:

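    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.UnsupportedMimeTypeException;
    import org.jsoup.nodes.Document;
    try {
        Document doc = Jsoup.connect(url).get();
        // (...) work with the parsed document
    } catch (UnsupportedMimeTypeException ex) {
        // Not HTML/XML (e.g. a PDF); ex.getMimeType() reports the content type
    } catch (IOException ex) {
        // Other network/HTTP errors; caught after the subclass above
    }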
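The exception message states the rule: Jsoup's get() only parses text/*, application/xml and application/xhtml+xml responses, so getting UnsupportedMimeTypeException suggests the link returned the file itself (e.g. as application/pdf) rather than an HTML page. If the goal is to save the file instead of parsing it, Jsoup can do that by calling ignoreContentType(true) on the Connection and reading execute().bodyAsBytes(); maxBodySize(0) removes the default body-size limit for larger PDFs. The sketch below uses only the JDK's java.net.http client so it runs without extra dependencies. The class name PdfFetcher, the looksParseable helper (a simplified approximation of the rule quoted in the message, not Jsoup's own check) and the default output file name are assumptions for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical downloader: saves non-text responses (e.g. application/pdf) as raw
// bytes and refuses to save text/XML responses, which are likely error pages.
public class PdfFetcher {

    // Simplified version of the rule in Jsoup's exception message:
    // text/*, application/xml or application/xhtml+xml.
    public static boolean looksParseable(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=utf-8" and normalise case.
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mime.startsWith("text/")
                || mime.equals("application/xml")
                || mime.equals("application/xhtml+xml");
    }

    public static void download(String url, Path target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());

        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
        String contentType = response.headers().firstValue("Content-Type").orElse(null);
        if (looksParseable(contentType)) {
            // A page where a file was expected may be an error page like the one in the question.
            throw new IOException("Expected a file but got " + contentType + " from " + url);
        }
        Files.write(target, response.body());
    }

    // Usage: java PdfFetcher <url> [output-file]
    public static void main(String[] args) throws Exception {
        download(args[0], Path.of(args.length > 1 ? args[1] : "download.pdf"));
    }
}
```

Rejecting text responses instead of writing them to disk keeps an HTML error page from being stored under a .pdf name, which makes failures like the one in the question visible right away.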
