Extract URLs using Java regular expressions
In this sample we use Java regular expressions to extract URLs from a block of text.
Java method to extract URLs
Let's define the regular expression pattern (written as it appears in a Java string literal, so every backslash is doubled):
((https?|ftp|gopher|telnet|file):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)
| Pattern | Description | Reference |
|---|---|---|
| ( | Start of group #1 (the whole URL) | |
| ( | Start of group #2 (the protocol) | |
| https? | http, optionally followed by s | Literal |
| \| | Alternation (or) | |
| ftp | ftp protocol | Literal |
| \| | Alternation (or) | |
| gopher | gopher protocol | Literal |
| \| | Alternation (or) | |
| telnet | telnet protocol | Literal |
| \| | Alternation (or) | |
| file | file protocol | Literal |
| ) | End of group #2 | |
| : | Colon separator | Literal |
| ( | Start of group #3 | |
| ( | Start of group #4 | |
| // | Double slash | Literal |
| ) | End of group #4 | |
| \| | Alternation (or) | |
| ( | Start of group #5 | |
| \\\\ | A backslash (escaped once for the regular expression and once more for the Java string) | Literal |
| ) | End of group #5 | |
| )+ | End of group #3, repeated one or more times | |
| [ | Start of a simple character class | |
| \\w | Any word character | Predefined character classes |
| \\d | Any digit | Predefined character classes |
| : | Colon character | Literal |
| #@%/;$()~_?\\+-= | Number sign, at symbol, percent sign, slash, semicolon, dollar sign, parenthesis, tilde, underscore, question mark, plus sign, minus sign or equal sign. Inside a character class, \\+-= is read as the range from the plus sign to the equal sign, so it also accepts the comma and the less-than sign | Literal |
| \\\\ | A backslash | Literal |
| \\.& | A dot or an ampersand | Literal |
| ]* | End of the character class, repeated zero or more times | Character class |
| ) | End of group #1 | |
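To see how the capturing groups line up with a match, here is a minimal sketch that reads group 2, the protocol alternation, counting opening parentheses from the left. The class name `UrlGroupsDemo` and the method name `protocols` are assumptions made for this illustration, not part of the original sample.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; its name and methods are assumptions for this sketch.
final class UrlGroupsDemo {

    // The pattern from the text, written as a Java string literal.
    static final Pattern URL_PATTERN = Pattern.compile(
            "((https?|ftp|gopher|telnet|file):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)");

    // Returns the protocol (capturing group 2) of every URL found in the text.
    static List<String> protocols(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = URL_PATTERN.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String content = "http://www.ubiteck.com/test/mypage.jsf?param1=ok "
                + "file://simpleFileUrl.txt file:\\\\backslashUrl.txt";
        Matcher matcher = URL_PATTERN.matcher(content);
        while (matcher.find()) {
            // Group 1 spans the whole URL, group 2 only the protocol name.
            System.out.println("group 1 (whole URL): " + matcher.group(1));
            System.out.println("group 2 (protocol) : " + matcher.group(2));
        }
    }
}
```

Because group 1 wraps the entire expression, `matcher.group(1)` and `matcher.group()` return the same text here.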
Extracting the URLs using our pattern
Suppose we execute our method on the following content (backslashes are doubled, as in a Java string literal):
http://www.ubiteck.com/test/mypage.jsf?param1=ok file://simpleFileUrl.txt file:\\\\backslashUrl.txt
Executing our method on this content prints the following:
url :http://www.ubiteck.com/test/mypage.jsf?param1=ok
url :file://simpleFileUrl.txt
url :file:\\backslashUrl.txt
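The output above could be produced by something like the following minimal sketch. The class name `UrlExtractor` and the method name `extractUrls` are assumptions made for this illustration; only the pattern and the sample content come from the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; the names used here are assumptions for this sketch.
final class UrlExtractor {

    // The pattern from the text, compiled once and reused for every call.
    private static final Pattern URL_PATTERN = Pattern.compile(
            "((https?|ftp|gopher|telnet|file):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)");

    // Returns every URL found in the content, in order of appearance.
    static List<String> extractUrls(String content) {
        List<String> urls = new ArrayList<>();
        if (content == null) {
            return urls; // nothing to scan
        }
        Matcher matcher = URL_PATTERN.matcher(content);
        while (matcher.find()) {
            urls.add(matcher.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        // In Java source, "\\\\" denotes two backslash characters.
        String content = "http://www.ubiteck.com/test/mypage.jsf?param1=ok "
                + "file://simpleFileUrl.txt file:\\\\backslashUrl.txt";
        for (String url : extractUrls(content)) {
            System.out.println("url :" + url);
        }
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call, which matters if the method is run over many documents.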