Extract domain name using Java regular expression

Friday, 17 June 2011 10:10


This sample uses Java regular expressions to extract domain names from a block of text.

Java method to extract domains

Let's define the regular expression pattern (written as a Java string literal, so every backslash is doubled):

[a-z0-9\\-\\.]+\\.(com|org|net|mil|edu|(co\\.[a-z].))

Pattern            Description
[a-z0-9\\-\\.]+    One or more letters, digits, hyphens or dots
\\.                A literal dot
(                  Start of group #1
com                .com domain
|                  Alternation (or)
org                .org domain
|                  Alternation (or)
net                .net domain
|                  Alternation (or)
mil                .mil domain
|                  Alternation (or)
edu                .edu domain
|                  Alternation (or)
(                  Start of group #1.1
co\\.[a-z].        "co." followed by a letter and then any one character; meant for country-code second-level domains (e.g. england.co.uk)
)                  End of group #1.1
)                  End of group #1
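To make the group numbering in the table concrete, here is a minimal sketch that prints what each group captures. The class name DomainGroupsDemo and its two helper methods are made up for this illustration; only the pattern itself comes from the article. In java.util.regex terms, group #1 is m.group(1) and the nested group #1.1 is m.group(2).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: the name and helpers are illustrative only.
class DomainGroupsDemo {
    // Same pattern as in the article, written as a Java string literal.
    static final Pattern DOMAIN = Pattern.compile(
            "[a-z0-9\\-\\.]+\\.(com|org|net|mil|edu|(co\\.[a-z].))",
            Pattern.CASE_INSENSITIVE);

    // Text captured by group #1 (the suffix) of the first match, or null if nothing matches.
    static String suffixOf(String text) {
        Matcher m = DOMAIN.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Text captured by the nested group #1.1; null when there is no match
    // or when the match ended in com/org/net/mil/edu instead.
    static String countrySuffixOf(String text) {
        Matcher m = DOMAIN.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(suffixOf("www.google.com"));          // com
        System.out.println(suffixOf("www.google.co.in"));        // co.in
        System.out.println(countrySuffixOf("www.google.com"));   // null
        System.out.println(countrySuffixOf("www.google.co.in")); // co.in
    }
}
```

A group that does not take part in the match returns null, which is why the nested group is null for a plain .com domain.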

Java regex extract multiple domain names


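import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Assumption: NullArgumentException is the Apache Commons Lang 2.x class
import org.apache.commons.lang.NullArgumentException;

private List<String> extractDomains(String value) {
    // Reject null input early
    if (value == null) throw new NullArgumentException("domains to extract");
    List<String> result = new ArrayList<String>();
    String domainPattern = "[a-z0-9\\-\\.]+\\.(com|org|net|mil|edu|(co\\.[a-z].))";
    // Case-insensitive, so upper-case host names are matched too
    Pattern p = Pattern.compile(domainPattern, Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(value);
    // Collect every non-overlapping match found in the input
    while (m.find()) {
        result.add(m.group());
    }
    return result;
}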
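One thing to keep in mind: the tail of the pattern, [a-z]. , accepts a letter followed by any character at all, so a trailing comma or space can end up inside a match. The sketch below compares the article's pattern with a stricter variant that requires exactly two letters after "co.". The STRICT pattern, the class name StrictDomainExtractor and the extract helper are assumptions added for illustration, and the null check uses the standard IllegalArgumentException so the example runs without extra libraries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison class, not part of the original sample.
class StrictDomainExtractor {
    // Pattern from the article.
    static final Pattern ORIGINAL = Pattern.compile(
            "[a-z0-9\\-\\.]+\\.(com|org|net|mil|edu|(co\\.[a-z].))",
            Pattern.CASE_INSENSITIVE);

    // Assumed stricter variant: exactly two letters after "co."
    static final Pattern STRICT = Pattern.compile(
            "[a-z0-9\\-\\.]+\\.(com|org|net|mil|edu|(co\\.[a-z]{2}))",
            Pattern.CASE_INSENSITIVE);

    // Collects every non-overlapping match of the given pattern.
    static List<String> extract(Pattern pattern, String value) {
        if (value == null) {
            throw new IllegalArgumentException("value must not be null");
        }
        List<String> result = new ArrayList<String>();
        Matcher m = pattern.matcher(value);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "example.co.u, www.google.co.uk";
        // Prints [example.co.u,, www.google.co.uk] (the first match includes the comma)
        System.out.println(extract(ORIGINAL, text));
        // Prints [www.google.co.uk]
        System.out.println(extract(STRICT, text));
    }
}
```

Compiling each pattern once into a static field also avoids recompiling it on every call, which can matter if the method runs over many inputs.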

Extracting the domain using our Pattern

Suppose we run our method on the following content:

www.subdomain.domain.com www.google.co www.google.co.in www.google.com www.facebook.com  www.google.co.tw

The following sample code executes our method:


String content = " www.subdomain.domain.com www.google.co www.google.co.in www.google.com www.facebook.com www.google.co.tw ";
List<String> domains = extractDomains(content);
for (String domain : domains) {
    System.out.println("domain :" + domain);
}

Regex domain extraction result

domain :www.subdomain.domain.com
domain :www.google.co.in
domain :www.google.com
domain :www.facebook.com
domain :www.google.co.tw
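Note that www.google.co is missing from the result: ".co" on its own is not one of the suffixes in the pattern, which only accepts "co." followed by two more characters. As a rough way to check which entries are skipped, the sketch below splits the content on whitespace and tests each token against the whole pattern with matches(). The class name SkippedTokensDemo and the skippedTokens helper are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original sample.
class SkippedTokensDemo {
    // Pattern from the article.
    static final Pattern DOMAIN = Pattern.compile(
            "[a-z0-9\\-\\.]+\\.(com|org|net|mil|edu|(co\\.[a-z].))",
            Pattern.CASE_INSENSITIVE);

    // Returns the whitespace-separated tokens that the pattern does not match in full.
    static List<String> skippedTokens(String content) {
        List<String> skipped = new ArrayList<String>();
        for (String token : content.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input produces one empty token
            }
            if (!DOMAIN.matcher(token).matches()) {
                skipped.add(token);
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        String content = " www.subdomain.domain.com www.google.co www.google.co.in "
                + "www.google.com www.facebook.com www.google.co.tw ";
        // Prints: skipped :[www.google.co]
        System.out.println("skipped :" + skippedTokens(content));
    }
}
```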
