Unofficial Workaround for CFMX Crashes When Indexing PDF documents on Solaris.

If you've ever been interested in checking out the Verity tool "mkvdk", or if you are currently experiencing JVM crashes when indexing PDF files on Solaris with CFMX 6.0 or 6.1, then you may want to have a look at a custom tag, cf_mkvdk, created as a replacement for CFINDEX.

In the ColdFusion documentation the mkvdk utility is described, briefly, as:

"The mkvdk utility is an indexing application, provided with other Verity utilities, that you can use to create and maintain collections. It is a command-line utility that you can use within other applications or shell scripts to provide more sophisticated scheduling and other capabilities."

This custom tag is used in much the same way that CFINDEX is used to build and index collections. The tag is passed attribute values for the collection name, the directory to index (recursively), and an optional filter for file type. Additional options let you build a shell script that does the same thing from the command line, and save the output generated both by cfexecute (which calls mkvdk) and by mkvdk itself.

The bug that this tag was developed to work around occurs when indexing multiple PDF documents via CFINDEX on CFMX and Solaris. The crash may occur at seemingly random points while indexing, sometimes after just 30 documents, sometimes after 100. When it happens, CFMX crashes with a signal 11, creates a core file, and creates a hotspot crash log (hs_err_pidNNNNN.log) which shows:

An unexpected exception has been detected in native code outside the VM.
Unexpected Signal : 11 occurred at ....
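
If you want to check whether a crash you are seeing matches this signature, a small script can scan a directory for hotspot crash logs that contain the signal 11 line. This is only a sketch in Python, not part of cf_mkvdk: the function name and the directory you pass in are my own assumptions, and only the file name pattern and the message text come from the log shown above.

```python
import glob
import os


def find_signal11_logs(directory):
    """Return the hs_err_pid*.log files in `directory` that report signal 11.

    `directory` is wherever your JVM writes its crash logs; that location
    is an assumption and varies by installation.
    """
    matches = []
    pattern = os.path.join(directory, "hs_err_pid*.log")
    for path in sorted(glob.glob(pattern)):
        # Crash logs can contain odd bytes, so replace undecodable characters.
        with open(path, "r", errors="replace") as handle:
            text = handle.read()
        if "Unexpected Signal : 11" in text:
            matches.append(path)
    return matches
```

Calling find_signal11_logs with the directory that holds your crash logs returns a sorted list of the matching log paths, or an empty list if none match.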

The crash occurs in one of the built-in cfx tags that ColdFusion uses as an intermediary to Verity. By bypassing the cfx tags and calling mkvdk through cfexecute, you can build Verity collections from PDF documents without the crash. Although cf_mkvdk will index documents of all types just as CFINDEX would, you may wish to keep using CFINDEX for all non-PDF document types.

Alternatively, if you just want to see how mkvdk is used on any platform (Windows, Solaris, or Linux), the tag can save a script that you can run manually. You can also pass options to cf_mkvdk that write the cfexecute output and the mkvdk logging output to files in the system temp directory. The mkvdk logging can be enabled at various levels, and can be quite verbose. If CFINDEX is having a small issue that you're trying to solve, you could use cf_mkvdk with logging enabled to see more details about what's going on.

The custom tag can also save the "bulk file", which is passed to mkvdk as standard input or as a filespec. It is a structured list of the files to index, along with the key and url fields for each file. The bulk file format is not covered in the ColdFusion Verity documentation; I found it described in a presentation on the web. A small example of a bulk file used in ColdFusion to index 2 documents is shown here:

CF_TITLE: Expressions.pdf
CF_URL: /pdfs/subdir/Expressions.pdf
CF_KEY: C:\CFusionMX\wwwroot\pdfs\subdir\Expressions.pdf
VdkVgwKey: C:\CFusionMX\wwwroot\pdfs\subdir\Expressions.pdf
CF_TITLE: Graphing.pdf
CF_URL: /pdfs/subdir/Graphing.pdf
CF_KEY: C:\CFusionMX\wwwroot\pdfs\subdir\Graphing.pdf
VdkVgwKey: C:\CFusionMX\wwwroot\pdfs\subdir\Graphing.pdf
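
To make the layout of the bulk file more concrete, here is a minimal sketch in Python that walks a directory and writes out entries with the same four fields as the example above. It is not the code inside cf_mkvdk. The function name and its parameters (root_dir, url_prefix, extension) are hypothetical, and the assumption that the title is simply the file name is taken from the two sample entries. The url_prefix can be either a root-relative path such as /pdfs or a full URL with protocol, domain, and port.

```python
import os


def build_bulk_entries(root_dir, url_prefix, extension=".pdf"):
    """Return bulk-file text for every matching file under root_dir.

    Each file yields four lines: CF_TITLE, CF_URL, CF_KEY, VdkVgwKey.
    Pass extension=None to include files of every type.
    """
    lines = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            if extension and not name.lower().endswith(extension.lower()):
                continue
            full_path = os.path.join(dirpath, name)
            # URL paths always use forward slashes, even on Windows.
            relative = os.path.relpath(full_path, root_dir).replace(os.sep, "/")
            url = url_prefix.rstrip("/") + "/" + relative
            lines.append("CF_TITLE: " + name)
            lines.append("CF_URL: " + url)
            lines.append("CF_KEY: " + full_path)
            lines.append("VdkVgwKey: " + full_path)
    return "\n".join(lines) + "\n" if lines else ""
```

For a directory laid out like the example, build_bulk_entries(r"C:\CFusionMX\wwwroot\pdfs", "/pdfs") would produce entries in the same shape as those shown above. The text could then be saved to a file for mkvdk to read.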

Once the collection is created and indexed with cf_mkvdk, it can be searched with CFSEARCH as usual. The CFSEARCH result will contain the records searched, records matched, and the score, key, title, url, and summary for each match. If you would like to control or fine-tune the summary that is generated for the documents in the collection, see this TechNote and this document. The cf_mkvdk tag has some additional flexibility here: it can generate full urls with the protocol, domain, and port, or just a root-relative path similar to the output of cgi.script_name.

The Verity tool used for searching is rcvdk. I had been hoping to develop a complete alternative for indexing and searching using the Verity tools mkvdk and rcvdk, in order to give users of ColdFusion on platforms where Verity is unsupported (such as RH 7.3, 8, and 9) the ability to build and search collections.

However, the rcvdk utility requires interaction, and I've not found any way of passing it all the necessary commands through standard input or a control file. All the research that I've done on rcvdk indicates that the tool must be run interactively, and a simple bat file, shell script, or single command line execution won't cut it. There are ways to call it without human intervention, and I would imagine that the ColdFusion intermediate (libCFXNeo.dll) has the ability to do that. I have found one document that describes how to run rcvdk programmatically with Windows Script Host, but that won't solve the Linux problem. If you know how to do this on Linux and would like to share, then please add a comment and let me know.

Until then, you can get the cf_mkvdk tag here. It is commented well and comes with several examples of how to call it. Lastly, this custom tag is not supported by Macromedia, but if you have any suggestions or find a bug in it then please let me know. Thanks!

Update: I've completed a fully self-contained test application with its own variety of documents to index and search. This can be used as a standard of comparison if you're having problems using CFINDEX to build a collection. The test app shows the mkvdk logs and output in the client, so you don't have to dig through the file system.