Recently I completed the upgrade of BlogCFC 4.0 (Beta) to the latest 5.9.8. AFAIK, there is no upgrade tool to facilitate a move across such a wide gap (effectively, it's about a 5 or 6 year gap in the code base, and there are some differences in the database schema as well). The impetus for the upgrade was so that I could change hosting providers (I'm now with Hostek, and love it!). As part of that change, I decided to switch from MSSQL to MySQL too. (BTW, I love the iPad client for MySQL, and Hostek has an app to access its control panel too.)
During the upgrade process, I created some ColdFusion scripts to export from MSSQL as WDDX, then import that data into the new BlogCFC schema that resides in MySQL. The data export was a simple SELECT * from each table, fed into CFWDDX with action="cfml2wddx". That let me archive the backed-up data locally as XML files (1 file per table).
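For reference, the export step might look roughly like the sketch below. The datasource name (oldBlogMSSQL), the table name, and the output path are placeholders I'm assuming for illustration; substitute your own.

```cfml
<!--- Hypothetical datasource and table names; substitute your own. --->
<cfquery name="q" datasource="oldBlogMSSQL">
    SELECT * FROM tblBlogEntries
</cfquery>

<!--- Serialize the query object into a WDDX packet (an XML string). --->
<cfwddx action="cfml2wddx" input="#q#" output="packet">

<!--- Archive the packet locally, one file per table. --->
<cffile action="write" file="/path/to/data/blog_entries.wddx" output="#packet#" charset="utf-8">
```

Repeat the same three steps for each table you want to carry over.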
What surprised me, however, was that during the data import I couldn't simply reverse the process by deserializing the WDDX XML file, running wddx2cfml, and inserting into the MySQL database. The CFWDDX tag threw a parse error referring to Unicode characters. The deserialization error baffled me as the initial serialization worked just fine.
Here's an abbreviated version of the error:
coldfusion.wddx.WddxDeserializationException: WDDX packet parse error at line 957, column 31. An invalid XML character (Unicode: 0x14) was found in the element content of the document..
    at coldfusion.wddx.DeserializerWorker.throwSAXException(DeserializerWorker.java:359)
    at coldfusion.wddx.DeserializerWorker.fatalError(DeserializerWorker.java:245)
    at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
    at coldfusion.wddx.DeserializerWorker.deserialize(DeserializerWorker.java:268)
    at coldfusion.wddx.WddxDeserializer.deserialize(WddxDeserializer.java:96)
    at coldfusion.tagext.lang.WddxTag.deserialize(WddxTag.java:266)
    at coldfusion.tagext.lang.WddxTag.doStartTag(WddxTag.java:145)
As you can see, it's essentially a SAX parser error. I tested a straightforward xmlparse() as well, and it produced the same exception, although with a much smaller stack trace.
    at coldfusion.xml.XmlProcessor.parse(XmlProcessor.java:208)
    at coldfusion.runtime.CFPage.XmlParse(CFPage.java:248)
After some thought, I realized that the error message was identifying the offending character by its hexadecimal Unicode code point. I found some lookup tables where I could translate the hex value into the equivalent ASCII decimal value. In the case of "Unicode: 0x14", this was the same as the ASCII 20 character ("Device Control 4", whatever that is). I updated my script to replace ASCII 20 with an empty string, and the parsing got a bit further but then started to hit other characters. What I found was that every one of them was a non-printable control character in the ASCII 0 to 31 range (printable characters start at ASCII 32 with the "Space" character).
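If you'd rather not chase these one parse error at a time, a small diagnostic along these lines can do the hex conversion and list the control characters that actually occur in the file. This is only a sketch; the file path is the same placeholder used elsewhere in this post.

```cfml
<!--- Convert the hex value from the error message (0x14) to decimal. --->
<cfset code = inputBaseN("14", 16)>
<cfoutput>0x14 in decimal: #code#<br></cfoutput>

<!--- Read the WDDX file and report each control character (1-31) it contains. --->
<cffile action="read" file="/path/to/data/blog_entries.wddx" variable="f">
<cfloop from="1" to="31" index="i">
    <cfif find(chr(i), f)>
        <cfoutput>chr(#i#) (hex #formatBaseN(i, 16)#) found<br></cfoutput>
    </cfif>
</cfloop>
```

Expect tab (9), line feed (10) and carriage return (13) to show up in most files. XML 1.0 allows those three, so they are not what the parser is complaining about.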
Finally, I realized that I could just iterate over the local WDDX file prior to deserialization and strip out every ASCII control character below 32. That did the trick, and deserialization occurred correctly.
<cffile action="read" file="/path/to/data/blog_entries.wddx" variable="f">

<!--- replace lower, non-printable ascii chars --->
<cfloop from="1" to="31" index="i">
    <cfset f = replace(f,chr(i),"","all")>
</cfloop>

<!--- fix mixed occurrences of ampersands by converting all to entities --->
<!--- first collapse existing entities to bare ampersands, then escape every ampersand --->
<cfset f = replace(f,'&amp;',"&","all")>
<cfset f = replace(f,"&","&amp;","all")>

<!--- deserialize the xml/wddx back to a query object --->
<cfwddx action="wddx2cfml" input="#f#" output="q">
<!--- <cfset x = xmlparse(f)> --->
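One caveat: the loop above strips every character from 1 to 31, which includes tab (9), line feed (10) and carriage return (13). XML 1.0 permits those three, so if you want to keep whitespace that appears literally in the file, a variant like the following leaves them alone. Treat it as a sketch of the same idea with a skip list added, not something I ran as part of the migration.

```cfml
<cffile action="read" file="/path/to/data/blog_entries.wddx" variable="f">

<!--- Remove control characters, but keep tab, line feed and carriage return. --->
<cfloop from="1" to="31" index="i">
    <cfif not listFind("9,10,13", i)>
        <cfset f = replace(f, chr(i), "", "all")>
    </cfif>
</cfloop>

<!--- Deserialize the cleaned packet back to a query object. --->
<cfwddx action="wddx2cfml" input="#f#" output="q">
```

Whether this matters depends on your data. If formatting in your entries looks off after the import, this is the first thing I'd check.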




