Just thoughts

Friday, May 21, 2010

Regex matching code comments - Java

Okay. So I hate regex for some reason, yet I have to use it cos... cos I have to. I spent my whole day trying to figure out a regex pattern that would match any code comment. I want to strip comments out of CSS files (i.e. replaceAll(pattern, "")) cos they're useless: they take up unnecessary space and bandwidth.

In plain English, I want this:


/*This is my comment
spanning across multiple lines*/
body {background-color: #000}

to become this:


body {background-color: #000}

Easy, isn't it? As it turned out, it's not that easy. I came up with this pattern first. I think it was the child of my own brain, but after all the hours spent trying to find a working pattern, I really don't remember:

(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)
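To see what each piece of that first pattern does, here's a small sketch that writes the same regex out with Java's Pattern.COMMENTS flag (whitespace in the pattern is ignored and # starts a comment) and runs it on the example from above. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

final class FirstPatternExplained {
    // Same regex as in the post, spread out with Pattern.COMMENTS so each
    // piece can carry its own # comment. Whitespace here is ignored.
    static final Pattern COMMENT = Pattern.compile(
        "  /\\*            # opening /*\n" +
        "  (?: [^*]        # any character that is not an asterisk\n" +
        "    | \\*+ [^*/]  # or a run of asterisks not followed by a slash\n" +
        "  )*              # repeated, roughly one iteration per character\n" +
        "  \\*+/           # closing */ (extra asterisks allowed before it)\n" +
        "| //.*            # or a line comment up to the end of the line\n",
        Pattern.COMMENTS);

    // Hypothetical helper: remove every match of the pattern.
    static String strip(String css) {
        return COMMENT.matcher(css).replaceAll("");
    }

    public static void main(String[] args) {
        String css = "/*This is my comment\nspanning across multiple lines*/\n"
                   + "body {background-color: #000}";
        System.out.println(strip(css));
    }
}
```

On the example above this prints a blank line followed by body {background-color: #000}; the newline after the comment is kept because the pattern stops at the */.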

Hmm. Looks OK, doesn't it? It's not OK. Take this multiline comment:


/* This is my comment
*  spanning across multiple lines
*  and having asterisks at every new line
*  cos that's cool
*/

If I run (?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*) over the comment block above, as shown below, I get an awesome StackOverflowError (read: a nice HTTP 500 on Ant):


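public static String compress(String s){
  // Strip /* ... */ block comments and // line comments in one replaceAll.
  // This is the version that throws the StackOverflowError.
  s = s.replaceAll("(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)", "");
  return s;
}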

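My first guess was an infinite loop, but it's actually the stack: Java's regex engine handles a repeated group that contains an alternation, like (?:[^*]|(?:\\*+[^*/]))*, by recursing on every iteration, and here that means roughly one iteration per character of the comment. A long enough comment (or a CSS file full of them) simply runs out of stack.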
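One way around the overflow, sketched here under my own assumptions rather than as a fix tested on every stylesheet, is the well-known "unrolled loop" form of a C-style comment regex. Plain runs of non-asterisks are eaten by [^*]*, which Java matches in a simple loop, so the repeated group only runs once per run of asterisks instead of once per character. It handles /* */ comments only, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class UnrolledCommentPattern {
    // "Unrolled loop" block-comment regex:
    //   /\*        opening /*
    //   [^*]*      any run of non-asterisks (matched iteratively, no recursion)
    //   \*+        one or more asterisks
    //   (?:[^/*][^*]*\*+)*   more text + asterisks, once per asterisk run
    //   /          the slash that closes the comment
    static final Pattern BLOCK_COMMENT =
        Pattern.compile("/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/");

    static String stripBlockComments(String css) {
        return BLOCK_COMMENT.matcher(css).replaceAll("");
    }

    public static void main(String[] args) {
        String css = "/* This is my comment\n"
                   + "*  spanning across multiple lines\n"
                   + "*  and having asterisks at every new line\n"
                   + "*  cos that's cool\n"
                   + "*/\n"
                   + "body {background-color: #000}";
        System.out.println(stripBlockComments(css));

        // One 100,000-character comment with no inner asterisks.
        String huge = "/*" + "x".repeat(100_000) + "*/body{}";
        System.out.println(stripBlockComments(huge));
    }
}
```

Each comment line that starts with an asterisk still costs one iteration of the group, so in theory a comment with a huge number of such lines could still go deep, but that's a very different scale from one level per character.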

So I need a better pattern, or else I'll get SO errors every once in a while. I'm not sure where I found it, but I got this pattern:

//.*|(\"(?:\\\\[^\"]|\\\\\"|.)*?\")|(?s)/\\*.*?\\*/

This one works surprisingly well for now. I guess it also has limitations I'm not aware of yet, but at least I don't get SO errors.
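My reading of that pattern (an assumption, since I don't remember where it came from): group 1 captures a double-quoted string so it can be written back by using "$1" as the replacement instead of "". With "", a declaration like content: "/* hi */" loses its string as well. A sketch with a made-up class name:

```java
import java.util.regex.Pattern;

final class FoundPatternWithStrings {
    // The pattern from the post, unchanged. Its three alternatives:
    //   //.*                        a // line comment
    //   ("(?:\\[^"]|\\"|.)*?")      group 1: a double-quoted string
    //   (?s)/\*.*?\*/               a block comment; (?s) lets . match newlines
    static final Pattern COMMENTS_OR_STRINGS =
        Pattern.compile("//.*|(\"(?:\\\\[^\"]|\\\\\"|.)*?\")|(?s)/\\*.*?\\*/");

    static String compress(String css) {
        // "$1" writes group 1 back, so strings survive; for comment matches
        // group 1 didn't participate, and they are replaced with nothing.
        return COMMENTS_OR_STRINGS.matcher(css).replaceAll("$1");
    }

    public static void main(String[] args) {
        String css = "a:before {content: \"/* not a comment */\"} /* a real one */";
        System.out.println(compress(css));
    }
}
```

Java's replaceAll substitutes an empty string for a group that didn't take part in the match, which is why the real comment still disappears.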

Update:
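Okay, so it strips protocol-relative URLs as well: in something like url(//example.com/bg.png), the //.* alternative matches from the // to the end of the line. Awesome :/ And CSS doesn't even have // comments, only /* */, so that part of the pattern does nothing useful here.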
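If regex keeps fighting back, another option (not what this post settled on, just a sketch) is to skip regex entirely and walk the CSS one character at a time: copy quoted strings as-is, drop everything from /* to the next */, and leave // alone since CSS has no such comments. The class and method names are hypothetical, and it doesn't try to understand anything else about CSS.

```java
final class CssCommentStripper {
    // Removes /* ... */ comments in one left-to-right pass. Quoted strings
    // ("..." or '...') are copied verbatim, backslash escapes included.
    // An unterminated comment is dropped up to the end of the input.
    static String stripComments(String css) {
        StringBuilder out = new StringBuilder(css.length());
        int n = css.length();
        int i = 0;
        while (i < n) {
            char c = css.charAt(i);
            if (c == '"' || c == '\'') {
                int start = i++;
                while (i < n && css.charAt(i) != c) {
                    if (css.charAt(i) == '\\') {
                        i++; // skip the escaped character as well
                    }
                    i++;
                }
                i = Math.min(i + 1, n); // include the closing quote, if any
                out.append(css, start, i);
            } else if (c == '/' && i + 1 < n && css.charAt(i + 1) == '*') {
                int end = css.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2; // jump past the closing */
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String css = "/* header */\n"
                   + "body {background: url(//example.com/bg.png)}\n"
                   + "a:before {content: \"/* kept */\"}";
        System.out.println(stripComments(css));
    }
}
```

Since it's a single pass with no backtracking, comment length doesn't matter: there's no stack to blow.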
