Steven Erat's Blog Steven Erat Photography
 
 
 
 

TalkingTree  CF_MKVDK: Programmatically Index Documents with Verity's Utility mkvdk

 

Unofficial Workaround for CFMX Crashes When Indexing PDF documents on Solaris.

If you've ever been interested in checking out the Verity tool "mkvdk", or if you are currently experiencing JVM crashes when indexing PDF files on Solaris with CFMX 6.0 or 6.1, then you may want to have a look at the custom tag cf_mkvdk, created as a replacement for CFINDEX.

In the ColdFusion documentation the mkvdk utility is described, briefly, as:

"The mkvdk utility is an indexing application, provided with other Verity utilities, that you can use to create and maintain collections. It is a command-line utility that you can use within other applications or shell scripts to provide more sophisticated scheduling and other capabilities."

This custom tag is used in much the same way that CFINDEX is used to build and index collections. The tag is passed attribute values for the collection name, the directory to index (recursively), and an optional filter for file type. There are additional options that allow you to build a shell script that performs the same work from the command line, and to save the output generated both by cfexecute (which calls mkvdk) and by mkvdk itself.

The bug that this was developed to work around occurs when indexing multiple PDF documents via CFINDEX on CFMX and Solaris. The crash may occur at seemingly random points, sometimes after just 30 docs, sometimes after 100. When it happens, CFMX crashes with a signal 11, creates a core file, and creates a hotspot crash log (hs_err_pidNNNNN.log) that shows:

An unexpected exception has been detected in native code outside the VM.
Unexpected Signal : 11 occurred at ....

The crash occurs in one of the built-in cfx tags used by ColdFusion as an intermediary to Verity. By circumventing the cfx tags and using cfexecute to run mkvdk, Verity collections can be built from PDF documents without the crash. Although cf_mkvdk will index documents of all types just as CFINDEX would, you may wish to keep using CFINDEX for all non-PDF document types.

Alternatively, if you wish to see how mkvdk is used on any platform (Windows, Solaris, or Linux), the tag can be set to save a script which can be run manually. You can pass options to cf_mkvdk which write the cfexecute output and the mkvdk logging output to files in the system temp directory. The mkvdk logging can be enabled at various levels, and can be quite verbose. If CFINDEX is having a small issue that you're trying to solve, you could use cf_mkvdk and enable logging to see more details about what's going on.

The custom tag can also save the "bulk file", which is used as standard input or filespec for mkvdk: a structured list of which files to index, along with the key and url fields for each file. The format of the bulk file is not covered in the ColdFusion Verity documentation; I found it described in a presentation on the web. A small example of the format of a bulk file used in ColdFusion to index 2 documents is shown here:

CF_TITLE: Expressions.pdf
CF_CUSTOM1:
CF_CUSTOM2:
CF_URL: /pdfs/subdir/Expressions.pdf
CF_KEY: C:\\CFusionMX\\wwwroot\\pdfs\\subdir\\Expressions.pdf
VdkVgwKey: C:\\CFusionMX\\wwwroot\\pdfs\\subdir\\Expressions.pdf
<<EOD>>
CF_TITLE: Graphing.pdf
CF_CUSTOM1:
CF_CUSTOM2:
CF_URL: /pdfs/subdir/Graphing.pdf
CF_KEY: C:\\CFusionMX\\wwwroot\\pdfs\\subdir\\Graphing.pdf
VdkVgwKey: C:\\CFusionMX\\wwwroot\\pdfs\\subdir\\Graphing.pdf
<<EOD>>
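
The record layout above can be generated mechanically. As a rough sketch (in Python rather than CFML, and not the code inside cf_mkvdk itself), the following walks a directory and emits one record per matching file. The function names, the url_prefix parameter, and the choice to double backslashes in the key (copied from the example above) are assumptions made for illustration only.

```python
import os


def bulk_entry(title, url, file_path):
    """Return one bulk-file record in the layout shown in the example above."""
    # Assumption: backslashes are doubled in the key fields, mirroring the
    # Windows paths in the example. Paths without backslashes are unchanged.
    key = file_path.replace("\\", "\\\\")
    lines = [
        f"CF_TITLE: {title}",
        "CF_CUSTOM1:",
        "CF_CUSTOM2:",
        f"CF_URL: {url}",
        f"CF_KEY: {key}",
        f"VdkVgwKey: {key}",
        "<<EOD>>",
    ]
    return "\n".join(lines) + "\n"


def build_bulk_file(root_dir, url_prefix, extension=".pdf"):
    """Recursively collect files under root_dir and return the bulk-file text."""
    records = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        dirnames.sort()  # walk subdirectories in a predictable order
        for name in sorted(filenames):
            if not name.lower().endswith(extension.lower()):
                continue
            full_path = os.path.join(dirpath, name)
            # Build the url from the path relative to the indexed root.
            rel = os.path.relpath(full_path, root_dir).replace(os.sep, "/")
            url = url_prefix.rstrip("/") + "/" + rel
            records.append(bulk_entry(name, url, full_path))
    return "".join(records)
```

Calling build_bulk_file with a document root and a url prefix such as "/pdfs" returns text that can be written to a file and compared against a bulk file saved by the tag.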

Once the collection is created and indexed with cf_mkvdk, it can be searched with CFSEARCH as usual. The CFSEARCH result will contain the records searched, records matched, and the score, key, title, url, and summary for each match. If you would like to control or fine-tune the summary that is generated for the documents in the collection, see this TechNote and this document. The cf_mkvdk tag has some additional flexibility and can generate either full urls with the protocol, domain, and port, or just a root-relative path similar to the output of cgi.script_name.

The Verity tool used for searching is rcvdk. I had been hoping to develop a complete alternative for indexing and searching using the Verity tools mkvdk and rcvdk, in order to give users of ColdFusion on platforms where Verity is unsupported (such as RH 7.3, 8, and 9) the ability to build and search collections.

However, the rcvdk utility requires interaction, and I've not found any way of passing it all the necessary commands through standard input or a control file. All the research that I've done on rcvdk indicates that the tool must be run interactively, and a simple bat file, shell script, or single command line execution won't cut it. There are ways to call it without human intervention, and I would imagine that the ColdFusion intermediate (libCFXNeo.dll, libCFXNeo.so) has the ability to do that. I have found one document that describes how to run rcvdk programmatically with Windows Script Host, but that won't solve the Linux problem. If you know how to do this on Linux and would like to share, please add a comment and let me know.

Until then, you can get the cf_mkvdk tag here. It is well commented and comes with several examples of how to call it. Lastly, this custom tag is not supported by Macromedia, but if you have any suggestions or find a bug in it, please let me know. Thanks!

Update: I've completed a fully self-contained test application with its own variety of documents to index and search. This can be used as a standard of comparison if you're having problems using CFINDEX to build a collection. The test app displays the mkvdk logs and output in the browser, so you don't have to dig through the file system.

DOWNLOAD TEST APPLICATION

 


Comments

Hey,


First I want to say, this is pretty cool... looks to be some fun hacking. I have a question... I am working with Verity K2 server on Solaris 8. I seem to be crashing the server with just several successive searches: suddenly JRun is dead and k2server is still happily processing. Max threads on CFMX are 20, and Listeners and Threads in k2server.ini are 25 (should be plenty, but I've tried lots of settings for these to no avail).


We do have a lot of pdfs... Does this indexing 'crash' corrupt the data in our collection somehow, so that as I search it I am seeing a ghost of this same problem?


Any ideas you have would be helpful,


Joshua



I'm trying to use this tag on Unix.
I noticed that sometimes the collection folder is created and sometimes it's deleted.
I'm not sure what's going on here.
Can you please let me know.
Thanks


Hi Pushpa,

I really don't support this tag. Since you have the source, you're free to rewrite it to suit your needs.

In short:
- it's called as a custom tag
- it does a recursive directory search for matching file names
- it creates a specially formatted bulk file with Verity input information
- it executes mkvdk utility and passes the bulk file as input

The mkvdk utility can create a Verity collection file structure using the -create switch but ColdFusion won't know about it. The way I wrote the custom tag, it tries to create a collection using CFCOLLECTION so that ColdFusion then knows about it and it shows up in the CFAdmin.
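
For anyone who wants to try that last step by hand, here is a hedged sketch in Python of assembling the mkvdk command line. Only the -create switch is mentioned above; -collection, -bulk, and -insert are written from memory of the Verity documentation and should be treated as assumptions to verify against the documentation for your version. The function names and the default executable path are hypothetical, and this is not the code the custom tag runs.

```python
import subprocess


def mkvdk_command(collection_dir, bulk_file, create=False, mkvdk_path="mkvdk"):
    """Build the argument list for one mkvdk indexing run."""
    args = [mkvdk_path, "-collection", collection_dir]
    if create:
        # Only needed when mkvdk itself should create the collection files;
        # ColdFusion will not know about a collection created this way.
        args.append("-create")
    # Assumption: -bulk tells mkvdk to treat the filespec as a bulk file,
    # and -insert adds the listed documents to the collection.
    args += ["-bulk", "-insert", bulk_file]
    return args


def run_mkvdk(args):
    """Run mkvdk and return its exit code and combined output for logging."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr
```

Keeping the argument list separate from the call that runs it makes it easy to print the command first and run it manually, much like the script-saving option in the tag.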

I recommend that you consider modifying the code to alter any behavior you currently see.

By the way, Blackstone (CF 7) will come with a new version of Verity that should largely eliminate lingering problems like this because Verity will no longer use Native Code that may be prone to crash.


Steven,

Do you have any more status on this problem? We use Verity k2e to index, not cfindex. But we do get at least 5 hotspot crashes every day on random searches to these collections using cfsearch.


Tony,

While you might be using the K2 server with ColdFusion, the K2 server only does searching, not indexing. Indexing in CF5, CFMX 6.x, and CFMX 7 is handled by the utility mkvdk which is called by an intermediate libCFXNeo.so. The JVM runs java code which then calls libCFXNeo.so as "native code", and that in turn calls mkvdk.

After I got started building a custom tag to call mkvdk directly, it looked useful for overcoming the specific problem of the JVM crashing *when indexing pdf files in a loop*. That was the one known scenario where calls to Verity would crash the JVM.

So if you're not indexing pdf files in a loop, then you have a different problem. It could be this one, which occurs on Linux since the 1.4 JVMs:
Hotspot crashes on Linux




Thanks for your work! We have been suffering from this bug too and your tag solves the problem perfectly.


 

 

 
Copyright © 2010 Steven Erat. All rights reserved.
This is a personal weblog. The opinions expressed here represent my own and not those of my employer