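Given a URL, I want to extract the domain name (it should not include the 'www' part). The URL may contain http/https. Here is the Java code I wrote. It seems to work fine, but is there a better approach, or are there edge cases where it could fail?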
public static String getDomainName(String url) throws MalformedURLException {
    if (!url.startsWith("http") && !url.startsWith("https")) {
        url = "http://" + url;
    }
    URL netUrl = new URL(url);
    String host = netUrl.getHost();
    if (host.startsWith("www")) {
        host = host.substring("www".length() + 1);
    }
    return host;
}
Output: google.com
Posted on 2014-02-27 12:50:16
import java.net.*;
import java.io.*;

public class ParseURL {
    public static void main(String[] args) throws Exception {
        URL aURL = new URL("http://example.com:80/docs/books/tutorial"
                + "/index.html?name=networking#DOWNLOADING");
        System.out.println("protocol = " + aURL.getProtocol());   // http
        System.out.println("authority = " + aURL.getAuthority()); // example.com:80
        System.out.println("host = " + aURL.getHost());           // example.com
        System.out.println("port = " + aURL.getPort());           // 80
        System.out.println("path = " + aURL.getPath());           // /docs/books/tutorial/index.html
        System.out.println("query = " + aURL.getQuery());         // name=networking
        System.out.println("filename = " + aURL.getFile());       // /docs/books/tutorial/index.html?name=networking
        System.out.println("ref = " + aURL.getRef());             // DOWNLOADING
    }
}
Posted on 2013-11-24 02:23:00
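I wrote a method (see below) that extracts a URL's domain name using simple string matching. What it actually does is extract the part between the first "://" (or index 0 if there is no "://") and the first subsequent "/" (or index String.length() if there is no subsequent "/"). Any remaining leading "www(_)*." part is stripped. I'm sure there will be cases where this isn't good enough, but it should be good enough for most cases!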
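Mike Samuel's post above says that the java.net.URI class can do this (and is preferred over the java.net.URL class), but I ran into problems with the URI class. Notably, URI.getHost() returns null if the URL does not include the scheme, i.e. the "http(s)" part.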
/**
 * Extracts the domain name from {@code url}
 * by means of String manipulation
 * rather than using the {@link URI} or {@link URL} class.
 *
 * @param url is non-null.
 * @return the domain name within {@code url}.
 */
public String getUrlDomainName(String url) {
    String domainName = url;
    int index = domainName.indexOf("://");
    if (index != -1) {
        // keep everything after the "://"
        domainName = domainName.substring(index + 3);
    }
    index = domainName.indexOf('/');
    if (index != -1) {
        // keep everything before the '/'
        domainName = domainName.substring(0, index);
    }
    // check for and remove a preceding 'www'
    // followed by any sequence of characters (non-greedy)
    // followed by a '.'
    // from the beginning of the string
    domainName = domainName.replaceFirst("^www.*?\\.", "");
    return domainName;
}
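The URI behaviour mentioned in this answer can be checked with a few lines. This is only a minimal sketch: the class name `UriHostCheck` and the helper `hostOf` are made up for illustration, and it uses `URI.create`, which throws an unchecked `IllegalArgumentException` on malformed input instead of the checked `URISyntaxException`.

```java
import java.net.URI;

class UriHostCheck {
    // Illustrative helper (not from the answer): returns URI.getHost(), which may be null.
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    public static void main(String[] args) {
        // With a scheme, the authority is parsed and the host is available.
        System.out.println(hostOf("http://www.google.com/search")); // www.google.com
        // Without a scheme, the whole string is treated as a relative path,
        // so there is no authority and getHost() returns null.
        System.out.println(hostOf("www.google.com/search"));        // null
    }
}
```

This is why a scheme-less input has to be given a scheme (or be handled with plain string operations, as in the method above) before `getHost()` becomes useful.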
Posted on 2014-11-04 19:13:07
I do a little processing before and after creating the URI object:
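// The enclosing method name and signature are illustrative; the original snippet was a bare fragment.
public static String getDomainName(String url) throws URISyntaxException {
    // Repair a single-slash "http:/" prefix; otherwise assume the scheme is missing.
    // Note: as written, an "https://" URL also takes the else branch and gets "http://" prepended.
    if (url.startsWith("http:/")) {
        if (!url.contains("http://")) {
            url = url.replaceAll("http:/", "http://");
        }
    } else {
        url = "http://" + url;
    }
    URI uri = new URI(url);
    String domain = uri.getHost();
    // Strip a leading "www." if present
    return domain.startsWith("www.") ? domain.substring(4) : domain;
}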
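The snippet above prepends "http://" to anything that does not start with "http:/", which includes https URLs, and it assumes getHost() never returns null. One possible variant that covers both cases is sketched below. It is not the answer author's code: `DomainNameSketch` and `domainOf` are hypothetical names, and treating any input without "://" as scheme-less is an assumption of this sketch.

```java
import java.net.URI;

class DomainNameSketch {
    // Hypothetical helper: returns the host without a leading "www.", or null if no host is found.
    static String domainOf(String url) {
        String candidate = url.trim();
        // Assumption: an input without "://" has no scheme, so a default one is prepended.
        // Inputs that already carry a scheme (http, https, ...) are left unchanged.
        if (!candidate.contains("://")) {
            candidate = "http://" + candidate;
        }
        // URI.create throws IllegalArgumentException if the string is not a valid URI.
        String host = URI.create(candidate).getHost();
        if (host == null) {
            return null; // no parsable host in the authority
        }
        return host.startsWith("www.") ? host.substring(4) : host;
    }

    public static void main(String[] args) {
        System.out.println(domainOf("https://www.google.com/search?q=java")); // google.com
        System.out.println(domainOf("www.example.com/docs"));                 // example.com
    }
}
```

Checking for "www." including the dot, as the answer does, avoids cutting into hosts that merely begin with the letters "www".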
https://stackoverflow.com/questions/9607903