TIDY up XML source

SciTE includes TIDY for cleaning up HTML/XML documents (rewrapping lines, normalizing spacing, etc.). If you’ve never used Scintilla-based products before (and you have if you’ve used SEPY, FlashDevelop, or Scite|Flash), Scintilla is a great text editing library, and SciTE is the application layer wrapped around it. Try out one of the installers they include; they’re ultra lightweight and really nice to have around.

The parameters TIDY accepts are listed in the quick reference below. The default cleanup settings never really worked with how I use the app, so I’ve modified the html.properties file in the Scintilla Text Editor folder:


command.name.1.$(file.patterns.xml)=Save and Indent XML
command.1.$(file.patterns.xml)=tidy -xml -indent -modify "$(FilePath)"
command.is.filter.1.$(file.patterns.xml)=1
command.save.before.1.$(file.patterns.xml)=1
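
If the list of switches grows, the same settings can be kept in a Tidy configuration file and passed with -config instead. The following is a minimal sketch, not the setup I actually run: the file name tidy-xml.cfg is hypothetical, and the three options are simply the configuration-file equivalents of -xml, -indent and -modify.

```
// tidy-xml.cfg (hypothetical file name)
// Equivalent of: tidy -xml -indent -modify
input-xml: yes
indent: auto
write-back: yes
```

With that file saved somewhere TIDY can read it, the command line in html.properties would shrink to something like tidy -config "tidy-xml.cfg" "$(FilePath)", with the config path adjusted to wherever you put the file.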

TIDY Quick Reference

File manipulation

-output <file>, -o <file>

write output to the specified <file> (output-file: <file>)

-config <file>

set configuration options from the specified <file>

-file <file>, -f <file>

write errors and warnings to the specified <file> (error-file: <file>)

-modify, -m

modify the original input files (write-back: yes)

Processing directives

-indent, -i

indent element content (indent: auto)

-wrap <column>, -w <column>

wrap text at the specified <column>. 0 is assumed if <column> is missing. When this option is omitted, the default of the configuration option “wrap” applies. (wrap: <column>)

-upper, -u

force tags to upper case (uppercase-tags: yes)

-clean, -c

replace FONT, NOBR and CENTER tags by CSS (clean: yes)

-bare, -b

strip out smart quotes and em dashes, etc. (bare: yes)

-numeric, -n

output numeric rather than named entities (numeric-entities: yes)

-errors, -e

show only errors and warnings (markup: no)

-quiet, -q

suppress nonessential output (quiet: yes)

-omit

omit optional end tags (hide-endtags: yes)

-asxml, -asxhtml

convert HTML to well formed XHTML (output-xhtml: yes)

-ashtml

force XHTML to well formed HTML (output-html: yes)

-access <level>

do additional accessibility checks (<level> = 0, 1, 2, 3). 0 is assumed if <level> is missing. (accessibility-check: <level>)

Character encodings

-raw

output values above 127 without conversion to entities

-latin0

use ISO-8859-15 for input, US-ASCII for output

-latin1

use ISO-8859-1 for both input and output

-iso2022

use ISO-2022 for both input and output

-utf8

use UTF-8 for both input and output

-win1252

use Windows-1252 for input, US-ASCII for output

-ibm858

use IBM-858 (CP850+Euro) for input, US-ASCII for output

-utf16le

use UTF-16LE for both input and output

-utf16be

use UTF-16BE for both input and output

-utf16

use UTF-16 for both input and output

-shiftjis

use Shift_JIS for both input and output

-language <lang>

set the two-letter language code <lang> (for future use) (language: <lang>)

Miscellaneous

-version, -v

show the version of Tidy

-help, -h, -?

list the command line options

-xml-help

list the command line options in XML format

-help-config

list all configuration options

-xml-config

list all configuration options in XML format

-show-config

list the current configuration settings
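
As an illustration of how the switches above combine, here is a sketch of a second SciTE command that converts an HTML file to indented XHTML. It is an assumption rather than something from my own html.properties: the command number 2 may already be taken in your install, and it relies on the file.patterns.web pattern that SciTE's stock html.properties defines.

```properties
# Hypothetical second Tools menu entry; pick a free command number
command.name.2.$(file.patterns.web)=Convert to XHTML
# -asxhtml: convert to XHTML, -wrap 0: no line wrapping, -quiet: less console noise
command.2.$(file.patterns.web)=tidy -asxhtml -indent -wrap 0 -utf8 -quiet -modify "$(FilePath)"
# The command rewrites the file on disk, so SciTE should reload it afterwards
command.is.filter.2.$(file.patterns.web)=1
# Save the buffer automatically before running the command
command.save.before.2.$(file.patterns.web)=1
```

Because -modify overwrites the original file, it is worth trying this on a copy first.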

About Bela Korcsog

Proud father of two children, happy husband to one wife. I've been programming in various technologies and leading the development of huge projects for most of the last ten years. I've picked up some specific likes and dislikes through my experience in the web site business, but generally I'm pretty straightforward about them. I'm not a huge fan of the latest and greatest shiny toy (it took me four years to show an interest in Flash), but I'm more than happy to code in any language that comes along (ActionScript is just so darn fun).