Steven Erat's Blog
 
 
 
 

TalkingTree  Usage of Verity's vspider utility with CFMX 7 on Unix or Linux

 

An article was published today that provides a fix for the Verity spidering utility, vspider, when used with ColdFusion MX 7. The article provides some additional style files that help populate a collection's metadata fields, such as Title, URL, and Size. It also comes with an example of using vspider on Windows.

ColdFusion MX 7: Additional files for using Verity Spider

Note that when running vspider on Unix or Linux, the LD_LIBRARY_PATH environment variable must also include the location of the core Verity binary files. It's often useful to create a script that sets up all the vspider commands, and in that script you can set LD_LIBRARY_PATH to include the Verity bin directory.

An example of running vspider without first adjusting LD_LIBRARY_PATH follows. Note that it fails with a missing dependency even though the dependency library is in the same directory.

bash-2.03# pwd
/opt/coldfusionmx7/verity/k2/_ssol26/bin

bash-2.03# ./vspider
ld.so.1: ./vspider: fatal: libvdk30.so: open failed: No such file or directory
Killed

bash-2.03# ls -l libvdk30.so
-rwxrwxr-x 1 nobody other 3893632 Sep 24 2004 libvdk30.so

bash-2.03# ldd vspider
libvdk30.so => (file not found)
libvdiag.so => (file not found)
libsocket.so.1 => /usr/lib/libsocket.so.1
libnsl.so.1 => /usr/lib/libnsl.so.1
libresolv.so.2 => /usr/lib/libresolv.so.2
libthread.so.1 => /usr/lib/libthread.so.1
libc.so.1 => /usr/lib/libc.so.1
libdl.so.1 => /usr/lib/libdl.so.1
libm.so.1 => /usr/lib/libm.so.1
libCrun.so.1 => /usr/lib/libCrun.so.1
libw.so.1 => /usr/lib/libw.so.1
libmp.so.2 => /usr/lib/libmp.so.2
/usr/platform/SUNW,UltraAX-i2/lib/libc_psr.so.1
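As a quick way to spot the unresolved libraries in a long ldd listing, here is a minimal sketch. The function name missing_libs is a made-up helper for illustration, not part of Verity or ColdFusion, and it assumes the linker reports unresolved entries with the words "not found", as in the output above.

```shell
#!/bin/sh
# missing_libs is a hypothetical helper name, not part of Verity or ColdFusion.
# It reads ldd output on stdin and prints the name of each library that the
# dynamic linker could not resolve (any line containing "not found").
missing_libs() {
    awk '/not found/ { print $1 }'
}
```

Run from the Verity bin directory as: ldd ./vspider | missing_libs. Against the ldd output above it would print libvdk30.so and libvdiag.so. Once LD_LIBRARY_PATH includes the Verity bin directory, it should print nothing.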

An example script is shown below that runs the vspider utility to spider localhost. This example restricts the spidering to the web documents under /vspider_target/, which is a test directory containing a mixture of files of various extensions and content. Note the usage of the -start and -include switches to contain the spidering activity. I personally like to append the results to an output file (>> out.txt) as a convenient record of events. Note also how LD_LIBRARY_PATH is set to contain the Verity bin directory.

#!/bin/sh
CFVERITY=/opt/coldfusionmx7/verity;export CFVERITY
CFMXPORT=8501;export CFMXPORT
PATH=$PATH:$CFVERITY/k2/_ssol26/bin;export PATH
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CFVERITY/k2/_ssol26/bin;export LD_LIBRARY_PATH
# set the collection name as an input argument and pass to the COLL variable
COLL=$1
# the vspider call below is one command; the trailing backslash continues it onto the next line
vspider -style $CFVERITY/Data/stylesets/ColdFusionVspider -collection $CFVERITY/collections/$COLL \
-include "*/vspider_target*" -start "http://localhost:$CFMXPORT/vspider_target/" >> out.txt

For Linux, use _ilnx21 in place of _ssol26.
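One optional refinement, offered as a sketch rather than anything from the Technote: if LD_LIBRARY_PATH starts out empty, appending with a colon leaves a leading colon in the value, which some dynamic linkers treat as the current directory. The helper below, with the made-up name append_path, adds the Verity bin directory only when it is not already listed and avoids the stray colon.

```shell
#!/bin/sh
# append_path is a hypothetical helper name, not part of Verity or ColdFusion.
# It prints a colon-separated path list with one directory appended,
# unless that directory is already in the list.
append_path() {
    # $1 = current value (may be empty), $2 = directory to add
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;      # already listed: leave unchanged
        "::")     printf '%s\n' "$2" ;;      # list was empty: no leading colon
        *)        printf '%s\n' "$1:$2" ;;   # otherwise append after a colon
    esac
}

VERITY_BIN=/opt/coldfusionmx7/verity/k2/_ssol26/bin   # use _ilnx21 on Linux
LD_LIBRARY_PATH=$(append_path "${LD_LIBRARY_PATH:-}" "$VERITY_BIN")
export LD_LIBRARY_PATH
```

The same helper could be used for PATH, since both variables are colon-separated lists.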

The script is made executable while logged in as root with chmod u+x, and then executed:

bash-2.03# ./run_vspider.sh newSpiderTest1

The contents of the output file are shown here. Observe the "Inserting" line for each file, and the summary at the end:

bash-2.03# cat out.txt
vspider - Verity, Inc. Version K5.5.0 (_ssol26, Sep 24 2004)
2005/03/31 14:53:52 Info: [vspider] (ind006000) Message database loaded from [/opt/coldfusionmx7/verity/k2/common/ind.msg].
2005/03/31 14:53:55 Info: [vspider] (ind006001) License loaded from [/opt/coldfusionmx7/verity/k2/common/runtime.lic].
2005/03/31 14:53:55 Info: [vspider] (ind005005) Licensed for local spidering.
2005/03/31 14:53:55 Info: [vspider] (ind005008) Not licensed for remote spidering.
2005/03/31 14:53:55 Progress: [vspider] (ind031000) Inserting [http://localhost:8501/vspider_target/].
2005/03/31 14:53:55 Progress: [vspider] (ind031000) Inserting [http://localhost:8501/vspider_target/CodeSweeper.log].
2005/03/31 14:53:55 Progress: [vspider] (ind031000) Inserting [http://localhost:8501/vspider_target/bar.cfm].
2005/03/31 14:53:55 Progress: [vspider] (ind031000) Inserting [http://localhost:8501/vspider_target/baz.cfm].
2005/03/31 14:53:55 Progress: [vspider] (ind031000) Inserting [http://localhost:8501/vspider_target/foo.cfm].
2005/03/31 14:53:56 Progress: [vspider] (ind031000) Inserting [http://localhost:8501/vspider_target/foo.htm].
2005/03/31 14:53:56 Progress: [vspider] (ind031000) Inserting [http://localhost:8501/vspider_target/foo.html].
2005/03/31 14:53:56 Progress: [vspider] (ind031000) Inserting [http://localhost:8501/vspider_target/foo.pdf].
2005/03/31 14:53:56 Progress: [vspider] (ind031000) Inserting [http://localhost:8501/vspider_target/foo.doc].
2005/03/31 14:53:56 Progress: [vspider] (ind031000) Inserting [http://localhost:8501/vspider_target/foo.txt].
2005/03/31 14:53:56 Progress: [vspider] (ind031000) Inserting [http://localhost:8501/vspider_target/qux.cfm].
2005/03/31 14:53:56 Progress: [vspider] (ind031000) Inserting [http://localhost:8501/vspider_target/search.cfm].
2005/03/31 14:54:03 Progress: [vspider] (ind002115) Optimizing VDK collection [/opt/coldfusionmx7/verity/collections/newSpiderTest1].
Progress: [vspider] (ind010020) Vspider summary: Submitted 12 documents for insert,0 documents for deletion,0 documents for update;
Progress: [vspider] (ind010021) Vspider summary: Indexed 12 documents, Deleted 0 documents, 0 bad documents;
Progress: [vspider] (ind010022) Vspider summary: Skipped 1 keys, including 0 duplicate documents rejected;
Progress: [vspider] (ind010023) Vspider summary: Failed to fetch 0 keys.
vspider done
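If you only want the headline number from a long log, a small sketch like the following can pull it out. The function name indexed_count is hypothetical, and the pattern assumes the summary line is worded exactly as in the output above.

```shell
#!/bin/sh
# indexed_count is a hypothetical helper name, not part of Verity or ColdFusion.
# It prints the N from the "Vspider summary: Indexed N documents" line of a
# vspider log. Because the log is appended to on every run, only the most
# recent summary (the last matching line) is reported.
indexed_count() {
    # $1 = path to the log file
    sed -n 's/.*Vspider summary: Indexed \([0-9][0-9]*\) documents.*/\1/p' "$1" | tail -n 1
}
```

Called as indexed_count out.txt, it prints the indexed-document count for the last run recorded in the file.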

When indexing, vspider creates the collection if it does not already exist. Looking at the collection files it created, you'll see they are owned by the user and group root/other. Collections created and indexed through the ColdFusion Administrator alone are instead owned by the ColdFusion runtime user, in this case nobody/nobody for the bookclub collection:

bash-2.03# ls -l /opt/coldfusionmx7/verity/collections/
total 14
drwxr-xr-x 12 nobody nobody 512 Mar 30 13:54 bookclub
-rwxrwxr-x 1 nobody other 0 Jun 5 2003 empty.txt
drwxr-xr-x 12 root other 512 Mar 31 14:53 newSpiderTest1

The permissions here should generally be OK, but if you find any problems in the CFAdmin, then chown -R the collection directory so that it is owned by the ColdFusion runtime user.
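Before changing ownership, it can help to see which files actually differ. This is a minimal sketch using only standard find options. The function name not_owned_by is made up for illustration, and the runtime user nobody is assumed from the listing above.

```shell
#!/bin/sh
# not_owned_by is a hypothetical helper name, not part of Verity or ColdFusion.
# It lists every file under a directory that is NOT owned by the given user.
not_owned_by() {
    # $1 = user name or numeric uid, $2 = directory to inspect
    find "$2" ! -user "$1" -print
}
```

For example, not_owned_by nobody /opt/coldfusionmx7/verity/collections/newSpiderTest1 lists the files that vspider created as root. If the CFAdmin does have trouble with them, running chown -R nobody:nobody on that collection directory as root would hand them to the runtime user, assuming nobody/nobody is the right user and group on your system.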

The vspider utility does not update the ColdFusion configuration files to make ColdFusion aware of the new collection. Following along with the Technote instructions, enter the CFAdmin and add a collection having the same name. The collection directory will not be overwritten or touched, but doing this will create a corresponding entry in the config file neo-verity.xml so that your application can reference the collection by name. According to the Technote instructions, do not run any other operations on the collection from the CF Admin, such as indexing, repairing, or purging.

The vspider collection is now searchable via CFSEARCH from your application.

A word of caution: Do not turn on Directory Browsing in your webserver when getting started with vspider. If you don't restrict vspider's search scope properly with the -start and -include options, you could easily end up having vspider index and run every single document on your webserver. Since I work in tech support, I have a multitude of files that do destructive operations like deleting files or deleting records from a table, or annoying operations like sending out lots of email to myself. I made this mistake early on and ended up causing all kinds of havoc on my system and also managed to spam myself pretty good.

 


Comments

Zuh? Ouch, my brain... must drink beer to take away pain...


Good to know the technical proficiency of my blog's readership is so exhaustive and comprehensive!


See also this usage guide for the search utility rcvdk

http://www.talkingtree.com/blog/index.cfm?mode=alias&alias=CFMX7rcvdk


There's a very well written tutorial for getting started with VSpider on MonkeyFlash blog:
http://www.monkeyflash.com/archives/2006/11/05/ind...


 

 


About This Site

 
I live west of Boston and work for Adobe with ColdFusion and Flex, and specialize in Linux. I'm also interested in travel and science, and I'm studying photography at CDIA.

 
Copyright © 2008 Steven Erat. All rights reserved.
This is a personal weblog. The opinions expressed here represent my own and not those of my employer