Let's take the following text as our example:
(leak from java heap) sucks ( rocks!)
We are interested in:
leak from java heap
 rocks!
People usually propose a simple regexp:
\(.+?\)
1. ( and ) must be escaped with \ because they have a special meaning in regexps.
2. The reluctant quantifier +? is used rather than the greedy + because we are interested in the shortest possible match. A greedy \(.+\) would run from the first ( to the last ) and return the whole string.
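A small Java sketch that runs the proposed regexp with java.util.regex and, for contrast, its greedy counterpart \(.+\). The class and method names are hypothetical, chosen only for this illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaiveParenMatch {
    // Reluctant: .+? stops at the first ")" it can reach
    static final Pattern RELUCTANT = Pattern.compile("\\(.+?\\)");
    // Greedy: .+ runs on and backtracks only to the last ")" in the input
    static final Pattern GREEDY = Pattern.compile("\\(.+\\)");

    // Collects every non-overlapping match of the pattern in the text
    static List<String> findAll(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "(leak from java heap) sucks ( rocks!)";
        System.out.println(findAll(RELUCTANT, text)); // [(leak from java heap), ( rocks!)]
        System.out.println(findAll(GREEDY, text));    // [(leak from java heap) sucks ( rocks!)]
    }
}
```

The reluctant version gives two matches, but both still carry their parentheses.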
However, the proposed regexp returns:
(leak from java heap)
( rocks!)
We do not want to include the ( and ). We want to get rid of those ugly parentheses in a single regexp pass. This can be done with the look-behind operator. As its name says, the operator only examines what precedes our target, but does not include it in the match:
(?<=\()[\w\s!]+
It says:
Find ( [do not include it] and go through the following characters until you encounter something other than \w, \s or !. The ( can be replaced with any desired delimiter (escaped with \ if it has a special meaning).
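A minimal Java sketch of the look-behind version, again with hypothetical class and method names. Each find() starts right after a (, and the ( itself never becomes part of the match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookBehindMatch {
    // (?<=\()  : the match must start right after "(", which is checked but not consumed
    // [\w\s!]+ : then take word characters, whitespace and '!' up to the first other character
    static final Pattern INSIDE_PARENS = Pattern.compile("(?<=\\()[\\w\\s!]+");

    // Collects every match of the look-behind pattern in the text
    static List<String> findAll(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = INSIDE_PARENS.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Prints [leak from java heap,  rocks!]: no parentheses in either match
        System.out.println(findAll("(leak from java heap) sucks ( rocks!)"));
    }
}
```

Note that the second match keeps its leading space, since \s is part of the character class.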
Now, let's try a more difficult example, one I had to deal with at work: I had to replace the whitespace characters inside '...' with _.
Finding the whitespace characters within '...':
(?<='[\w\s]{0,100})\s
It says:
Find ', followed by any \w and \s characters [do not include them], and when you reach a \s, return it.
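Plugged into String.replaceAll, the pattern performs the whole replacement in one pass. A minimal Java sketch (the class and method names are hypothetical):

```java
class QuotedWhitespace {
    // A whitespace char preceded by ' and then at most 100 word/whitespace chars.
    // The {0,100} bound gives the look-behind a finite maximum length.
    static final String QUOTED_SPACE = "(?<='[\\w\\s]{0,100})\\s";

    // Replaces every whitespace character matched by the pattern with "_"
    static String underscoreQuotedSpaces(String text) {
        return text.replaceAll(QUOTED_SPACE, "_");
    }

    public static void main(String[] args) {
        System.out.println(underscoreQuotedSpaces("x = 'java heap leak';")); // x = 'java_heap_leak';
    }
}
```

One thing worth checking on real data: a closing ' also satisfies the look-behind (a ' followed by zero \w or \s characters), so whitespace after a closed literal, as in 'a b' c, is replaced too unless some other character such as ; or = comes first.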
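{0,100} is used because Java requires a look-behind to have an obvious (finite) maximum length, and unbounded quantifiers such as +? or * do not have one. As a consequence, whitespace more than 100 characters after the ' is not found. Using +? or * resulted in:
"Look-behind group does not have an obvious maximum length".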
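If the {0,100} cap is a problem, one alternative that avoids look-behind entirely is to match each whole '...' literal and replace whitespace only inside it. This is a different technique from the single-pass look-behind described here (it runs a second replaceAll per literal), and the sketch assumes quotes are balanced and never escaped. Class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedWhitespaceNoLookBehind {
    // Matches one complete '...' literal (assumes balanced, unescaped quotes)
    private static final Pattern QUOTED = Pattern.compile("'[^']*'");

    static String underscoreQuotedSpaces(String text) {
        Matcher m = QUOTED.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Replace whitespace only inside the current quoted literal
            String fixed = m.group().replaceAll("\\s", "_");
            // quoteReplacement keeps '$' or '\' in the literal from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out); // copy whatever follows the last literal
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: say 'one_two' and 'three_four' now
        System.out.println(underscoreQuotedSpaces("say 'one two' and 'three four' now"));
    }
}
```

Because the quoted literal is consumed as a whole, there is no length limit, and text outside the quotes is left untouched.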