This simple JavaScript function attempts to format lines within a <pre> tag. I needed a way to make blocks of code more readable when publishing snippets on this site, and had seen that many other sites do the same.
At the moment my utility is very simple, but without much more work it could be extended to format different languages.
Below, I'm using the code formatter to highlight its own code, so no separate demo is needed! However, if you wish to use the code you should download it from here rather than try to copy it from below, as I did have to modify the code for it to appear correctly. This was necessary because the code outputs HTML elements and is itself being displayed within an HTML page. The code formatter isn't that smart, yet.
The code considers all <pre> tags within the page and checks each one for the class 'code'. If it is present then the tag is passed across to the formatCode function for formatting, and the tag's 'lang' (language) parameter decides which language rules are applied.
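A minimal sketch of this selection step, written against plain objects rather than real DOM nodes so that it can run outside a browser. The helper name selectCodeBlocks and the object shape are assumptions made for illustration; the class check mirrors the one in the listing, which matches on the class name 'code' and then reads 'lang'.

```javascript
// Hypothetical helper: picks out the blocks the formatter would process.
// Each "element" is a plain object standing in for a <pre> tag.
function selectCodeBlocks(elements) {
  var selected = [];
  for (var i = 0; i < elements.length; i++) {
    var elem = elements[i];
    // Case-insensitive class check, as in the listing
    if ((elem.className || "").toLowerCase() === "code") {
      selected.push({
        element: elem,
        lang: (elem.lang || "").toLowerCase() // "" when no language is given
      });
    }
  }
  return selected;
}

var blocks = selectCodeBlocks([
  { className: "code", lang: "Java" },
  { className: "", lang: "" },
  { className: "Code", lang: "html" }
]);
```

Here only the first and third objects are selected, with their languages normalised to lower case.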
The contents are split into individual lines and any line break characters removed. These are no longer needed as the final output will comprise a pair of spans for each line: one to hold a sequential line number and a second to hold the formatted content.
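To illustrate the splitting step, here is a small sketch. The function name splitLines is hypothetical, and the pattern is a slight variant of the one in the listing: it treats a Windows-style \r\n pair as a single break, which a plain /\r|\n/ split would turn into an extra empty line.

```javascript
// Hypothetical helper: split raw text into lines with no break characters left.
function splitLines(text) {
  // \r\n is listed first so that it is consumed as one separator
  return text.split(/\r\n|\r|\n/);
}

var lines = splitLines("int a;\r\nint b;\nint c;");

// For comparison: splitting on single characters yields an empty entry
// between the \r and the \n.
var naive = "int a;\r\nint b;".split(/\r|\n/);
```

With this input, lines holds three entries, whereas naive holds three entries for only two lines of code, the middle one being empty.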
Some general replacements covering HTML elements and quoted strings are processed followed by language-specific items. At the moment this is simply a list of language keywords for Java-style languages, comments and HTML tags.
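As a rough illustration of one general replacement, the sketch below colours double-quoted strings on a single line. The names QUOTE_COLOUR and colourQuoted are made up for this example. Note that it is a variant rather than the listing's own pattern: the listing uses the greedy (".+"), which treats everything from the first quote to the last as one string, while this version uses [^"]* so that two strings on a line get separate spans.

```javascript
var QUOTE_COLOUR = "#a24"; // same value as fc_quot in the listing

// Hypothetical helper: wrap each double-quoted string in a coloured span.
function colourQuoted(line) {
  return line.replace(
    /("[^"]*")/g,
    '<span style="color: ' + QUOTE_COLOUR + ';">$1</span>'
  );
}

var coloured = colourQuoted('a="x"+"y";');
```

The $1 in the replacement text stands for the matched string, quotes included, so the original characters are preserved inside the span.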
The formatted line number and content spans are accumulated and once all lines are processed, written directly back to the original container tag.
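The accumulation step might look something like the sketch below, which builds the final HTML string from lines that have already been formatted. The function name buildOutput and the class names ln and src are assumptions for illustration only; the listing uses inline styles instead.

```javascript
// Hypothetical helper: join formatted lines into one HTML string,
// pairing each line with a right-aligned, three-character line number.
function buildOutput(formattedLines) {
  var html = "";
  for (var i = 0; i < formattedLines.length; i++) {
    var num = ("  " + (i + 1)).slice(-3); // pads 1 to "  1", 12 to " 12"
    html += '<span class="ln">' + num + '</span>' +
            '<span class="src">' + formattedLines[i] + '</span><br />';
  }
  return html;
}

var output = buildOutput(["a", "b"]);
```

Because slice(-3) keeps only the last three characters, this padding approach is only suitable for snippets of fewer than 1000 lines.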
A function called colourKeywords does exactly as its name suggests, matching a line of code with a list of keywords and styling as needed by encapsulating the keyword within a span tag.
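The listing matches a keyword only when a space follows it. The sketch below is a variant, not the author's code, that uses word boundaries on both sides instead, so that "else{" is coloured and the "int" inside "print" is not. The names KEYWORD_COLOUR and colourWholeWords are hypothetical.

```javascript
var KEYWORD_COLOUR = "#008"; // same value as fc_kwds in the listing

// Hypothetical variant: wrap each whole-word keyword in a coloured span.
// keywords is a "|"-separated list, e.g. "public|int|void".
function colourWholeWords(keywords, codeline) {
  var wordre = new RegExp("\\b(" + keywords + ")\\b", "g");
  return codeline.replace(
    wordre,
    '<span style="color: ' + KEYWORD_COLOUR + ';">$1</span>'
  );
}

var highlighted = colourWholeWords("public|int|void", "public int print;");
```

In this example "public" and "int" are each wrapped in a span while "print" is left alone.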
//Colour constants
var fc_cmt="#888";
var fc_html="#11a";
var fc_quot="#a24";
var fc_kwds="#008";
//Language keywords
var fc_java_kwds="public|int|float|double|private|new|"+
"void|synchronized|if|for|byte|break|else";
//Format every PRE element marked with the class "code"
var pres=document.getElementsByTagName("pre");
for (var a=0; a<pres.length; a++) {
  var elem=pres[a];
  if (elem.className.toLowerCase()=='code') formatCode(elem);
}
function formatCode(precode) {
  var lang=precode.lang.toLowerCase();
  //Split on any line ending; \r\n counts as a single break
  var textlines=precode.innerHTML.split(/\r\n|\r|\n/);
  var linecount=1;
  var newcode="";
  //Process each line of text
  for (var b=0; b<textlines.length; b++) {
    var code=textlines[b];
    //Remove line/form feed characters
    code=code.replace(/\f|\n/g,"");
    //Decode special HTML entities: less than, greater than, ampersand
    //(ampersand last, so "&amp;lt;" is not decoded twice)
    if (lang=="html") code=code.replace(/&lt;/g,'<')
      .replace(/&gt;/g,'>').replace(/&amp;/g,'&');
    //Double quoted string
    code=code.replace(/(".+")/g,
      "<span style=\"color: "+fc_quot+";\">$1</span>");
    //Single quoted string
    code=code.replace(/('.+')/g,
      "<span style=\"color: "+fc_quot+";\">$1</span>");
    //HTML
    if (lang=="html") {
      //tags (angle brackets are re-encoded so they display as text)
      code=code.replace(/<(\S.*?)>/g,
        "<span style=\"color: "+fc_html+";\">&lt;$1&gt;</span>");
      //comments (the markers were re-encoded by the tag step above)
      code=code.replace(/&lt;!--/g,
        "<span style=\"color: "+fc_cmt+";\">&lt;!--");
      code=code.replace(/--&gt;/g,"--&gt;</span>");
    }
    //Java
    if (lang=="java") {
      //comments
      code=code.replace(/(\/\/.*)/,
        "<span style=\"color: "+fc_cmt+";\">$1</span>");
      //keywords
      code=colourKeywords(fc_java_kwds,code);
    }
    //Accumulate line numbers and reformatted text
    //(line number right-aligned in three characters)
    var formatline=("  "+linecount).slice(-3);
    newcode+="<span style=\"background: #bbb; color: #000; "+
      "border-right: solid 2px #2b2; padding-right: 7px;\">"+
      formatline+"</span>"+code+"<br />";
    linecount++;
  }
  //Assign formatted text back to PRE element
  //The outerHTML is used for IE so that
  //whitespace is retained.
  if ("outerHTML" in precode) {
    precode.outerHTML="<pre class='code'>"+newcode+"</pre>";
  } else {
    precode.innerHTML=newcode;
  }
}
function colourKeywords(keywords,codeline) {
  //\b stops a keyword matching the tail of a longer word, e.g. "int" in "print"
  var wordre=new RegExp("\\b("+keywords+") ","gi");
  return codeline.replace(wordre,
    "<span style=\"color: "+fc_kwds+";\">$1 </span>");
}
Hopefully you can see how support for additional languages could easily be added. It would also be useful if it catered for mixed/multiple languages, e.g. HTML code embedded within JavaScript quoted strings.
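One possible way to add languages, sketched here purely as an assumption about how the code might be extended, is a lookup table with one entry per language. The names LANGUAGE_RULES and rulesFor, the lineComment field and the Python keyword list are all invented for this example; only the Java keyword list comes from the listing.

```javascript
// Hypothetical rule table: one entry per supported language.
var LANGUAGE_RULES = {
  java: {
    keywords: "public|int|float|double|private|new|" +
              "void|synchronized|if|for|byte|break|else",
    lineComment: "//"
  },
  python: {
    keywords: "def|class|return|if|else|for|import", // illustrative subset
    lineComment: "#"
  }
};

// Returns the rules for a language name, or null if it is not supported.
function rulesFor(lang) {
  var key = (lang || "").toLowerCase();
  return Object.prototype.hasOwnProperty.call(LANGUAGE_RULES, key)
    ? LANGUAGE_RULES[key]
    : null;
}

var pythonRules = rulesFor("Python");
var unknownRules = rulesFor("cobol");
```

With a table like this, formatCode could look up the rules for the 'lang' value once and apply the same comment and keyword steps for every language, rather than needing a separate if block for each.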