PICList Thread
'[PIC]: how to index pdf files?'
2003\06\13@133544 by Tom Messenger

Problem: over 4000 pdf files - mostly PIC datasheets and apnotes along with
other component data sheets - with non-descriptive filenames like
39272.pdf, etc.  How to find the one I want?

Question: does anyone know of pdf indexing/management programs?  I would
like to see something perhaps like MS Windows Explorer format - columns
with filename followed by title and other info of interest.

With knowledge of the pdf format, perhaps this would be easy to write in
some hll, perhaps not. Any ideas?

Thanks!
Tom M.

--
http://www.piclist.com hint: The PICList is archived three different
ways.  See http://www.piclist.com/#archives for details.

2003\06\13@135428 by Harold Hallikainen

I use the Webinator from http://www.webinator.com as a search engine on my website. It supports a variety of file formats, including pdf.  They will index a site for you up to some number of pages.

However, this, I believe, requires the files to be on a web server, not a local file or MS network file.

But, it's a great program!

Harold


FCC Rules Online at http://www.hallikainen.com/FccRules/




2003\06\13@143442 by Frank Uzzolino

Try the Open Source DocSearcher at:

http://sourceforge.net/projects/docsearcher/

-Frank

----- Original Message -----
From: Tom Messenger <.....kristKILLspamspam@spam@THEGRID.NET>
Date: Friday, June 13, 2003 1:35 pm
Subject: [PIC]: how to index pdf files?

{Quote hidden}


2003\06\13@175241 by Jochen Feldhaar

Hi Harold and the list,

I have the same problem!
I have about 70000 PDFs on my PC (23GBytes, yeah!), and I also would
like to look behind some of the cryptic names for the files, looking for
a certain text string. ST Thomson semi for example has only four-digit
file names, with no correlation at all to the component in the data
sheet, TI is not much better (Microchip has 5 digits and a letter, hmmm).
Is there a program that will either:
- list the text contained in a PDF (or a subdirectory full of PDFs) in
one or more separate files, to be searched with a text editor?
- directly search a PDF (or a subdirectory full of them) for a text string?

I would like any information you have, even if just a faint hope....

TIA

Jochen Feldhaar DH6FAZ

Harold Hallikainen wrote:

{Quote hidden}


2003\06\13@180816 by Patrick B. Murphy

Hi Tom,

I don't have "PaperPort Pro 9 Office" but according to PC Magazine
(June 17, 2003, and probably on the web at <http://www.pcmag.com>), PaperPort
will allow you to search for text within PDFs. The review says,

 "If you have ScanSoft's OmniPage OCR software installed, you can
 even index PDF text for PaperPort's new SimpleSearch feature, which
 locates documents quickly."

$199.99 direct; Windows only

--
Best regards,
Patrick Murphy
James Valley Colony


2003\06\13@182308 by Igor Pokorny

Guys, what are you talking about? I save every downloaded file into a
specified directory with a long name describing it. It's the easiest method
I know.

Igor


{Original Message removed}

2003\06\13@185254 by Tom Messenger

At 12:22 AM 6/14/03 +0200, you wrote:
>Guys, what are you talking about? I save every downloaded file into a
>specified directory with a long name describing it. It's the easiest method
>I know.
>
>Igor

That's a good idea, Igor, especially if only downloading one or two.
But...  I have one directory with the entire 2CD set from Microchip in it
with 1311 pdf files, all needing a descriptive name. These were copied off
the CD directly, not downloaded.  To name them, I would have to open each
one up, look at its "real" name or subject matter, close it, then rename
it.  One down, 1310 to go.

Text search is not what I'm after either.  What would work well is an
indexing program that produces a list of pdf files and their "common names".

Harold's suggestion is interesting to me; perhaps I'll set up a spare old
slow pc as a server and use his idea.

Thanks to all who made suggestions.  And if anyone at Microchip is
listening, think about naming files "PIC18F452 Data Sheet" instead of
39564B.PDF!!!  ;) ;) ;)

Tom M.


2003\06\13@203712 by M. Adam Davis

Fortunately for you, Microchip has been responsible, and set the Title
field of the document properties in the PDF (at least the one I'm looking
at).

You can get the PDF format from http://wotsit.org/ and there are several
utilities 'out there' for PDF manipulation, which should allow you to
index all those pesky PDFs, pulling titles where they exist, and
speculating on content where they don't.

It really shouldn't be hard to whip up a simple program that only gets
the title property from the document to see how common it is to have
them set correctly.
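
A minimal sketch of such a title-only program, in Python. It assumes the Title is stored as an uncompressed literal string such as `/Title (PIC18F452 Data Sheet)` in the file's Info dictionary; titles stored as hex or UTF-16 strings, or inside compressed object streams, will be missed. The names `pdf_title` and `list_titles` are made up for illustration, and escapes such as `\n` are not interpreted.

```python
import re
import sys
from pathlib import Path
from typing import Optional

# Matches a literal-string Title entry, e.g. /Title (PIC18F452 Data Sheet).
# Backslash-escaped characters such as "\)" are allowed inside the parentheses.
TITLE_RE = re.compile(rb"/Title\s*\(((?:\\.|[^\\()])*)\)", re.DOTALL)
ESCAPE_RE = re.compile(rb"\\(.)", re.DOTALL)


def pdf_title(data: bytes) -> Optional[str]:
    """Return the first literal /Title string found in raw PDF bytes, or None."""
    match = TITLE_RE.search(data)
    if match is None:
        return None
    # Assumption: keep the escaped character itself, so "\(" becomes "(".
    raw = ESCAPE_RE.sub(rb"\1", match.group(1))
    return raw.decode("latin-1").strip() or None


def list_titles(folder: str) -> list:
    """Return (file name, title) pairs for every .pdf/.PDF file in folder."""
    rows = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() == ".pdf":
            title = pdf_title(path.read_bytes())
            rows.append((path.name, title or "(no title found)"))
    return rows


if __name__ == "__main__":
    # Usage: python pdftitles.py <folder>   (script name is hypothetical)
    for name, title in list_titles(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(f"{name:<20} {title}")
```

Run against a folder, it prints one line per PDF with the file name followed by the title, which also shows how often the Title field is actually set.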

-Adam

Tom Messenger wrote:

{Quote hidden}


2003\06\13@204332 by M. Adam Davis

Try http://www.wotsit.org/

Unfortunately too many people don't alias the plain domain and require
the www dot before the URL - technically correct, but 4 unneeded
characters still.

-Adam

M. Adam Davis wrote:

{Quote hidden}


2003\06\13@211723 by M. Adam Davis

Try pdfindex http://www.pdfzone.com/toolbox/tool_PDFindex.html

Also, go to http://www.pdfzone.com/ for a fairly comprehensive list of
PDF tools, including many (MANY) for indexing, archival, retrieval and
search.

They even let you search by OS and license type (free, commercial,
shareware, etc).

I did take a gander at the PDF format, and then browsed a pdf file in a
text editor.  It's not trivial, but it's not hard to deal with either.
No wonder everyone zips them up, it's all text.

-Adam

Tom Messenger wrote:

{Quote hidden}


2003\06\16@040317 by Nigel Orr

pic microcontroller discussion list <> wrote on Friday, June 13, 2003 10:41
PM:

> letter, hmmm). Is there a file that will either:
> - list the text contained in a PDF (or a subdirectory full with PDFs)
> in one or more separate files, to be searched with a text editor?

pdf2asc is the Unix way, available in Windows using cygwin.  I'm sure some
of the Adobe Acrobat tools would do the same, and there is bound to be
native Windows pdf conversion freeware out there.
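
Once the text has been dumped to plain-text files by whichever converter is used, the search step itself is simple. A rough Python sketch, assuming each PDF was converted to a `.txt` file with the same base name somewhere under one folder (the function name `find_in_text_files` and that layout are assumptions for illustration only):

```python
from pathlib import Path


def find_in_text_files(folder: str, needle: str) -> list:
    """Return (file name, line number, line) for each case-insensitive match."""
    hits = []
    wanted = needle.lower()
    # rglob also descends into subdirectories of the folder.
    for path in sorted(Path(folder).rglob("*.txt")):
        # latin-1 never fails to decode, which suits mixed old datasheets.
        text = path.read_text(encoding="latin-1")
        for number, line in enumerate(text.splitlines(), start=1):
            if wanted in line.lower():
                hits.append((path.name, number, line.strip()))
    return hits
```

The file name in each hit maps straight back to the cryptic PDF name, so a search for a part number gives the datasheet file to open.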

> - directly search a PDF (or a subdirectory full of them) for a text
> string

Windows search does this, doesn't it?  It certainly has worked for me in
the past.

Like Igor, I try to give each pdf a descriptive title when I allow it space
on my hard disc.  Usually partname, description and then the original
datasheet name.
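
That naming scheme can be sketched in Python as below. The separator, the set of characters replaced, and the function name `descriptive_name` are assumptions chosen for illustration; the idea is only to join part name, description and original datasheet name into one file name that is safe on Windows.

```python
import re


def descriptive_name(part: str, description: str, original: str) -> str:
    """Build 'part - description - original.pdf', dropping unsafe characters."""
    # Strip a trailing .pdf/.PDF so the extension is not duplicated.
    stem = original[:-4] if original.lower().endswith(".pdf") else original
    cleaned = []
    for piece in (part, description, stem):
        # Replace characters that Windows forbids in file names.
        piece = re.sub(r'[\\/:*?"<>|]+', " ", piece)
        piece = re.sub(r"\s+", " ", piece).strip()
        if piece:
            cleaned.append(piece)
    return " - ".join(cleaned) + ".pdf"
```

For example, `descriptive_name("PIC18F452", "Data Sheet", "39564B.PDF")` gives `PIC18F452 - Data Sheet - 39564B.pdf`, which keeps the original document number searchable.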

As most datasheets are distributed on CDs these days, and there still isn't
a real multiplatform 'standard' for long file names (Joliet, Rock Ridge etc
are each available natively on only some platforms), maybe there's still a
reason for 8.3 filenames!

Nigel
--
Nigel Orr, Design Engineer                 .....nigelKILLspamspam.....axoninstruments.co.uk
Axon Instruments Ltd., Wardes Road,Inverurie,Aberdeenshire,UK,AB51 3TT
              Tel:+44 1467 622332 Fax:+44 1467 625235
                  http://www.axoninstruments.co.uk

--
http://www.piclist.com hint: PICList Posts must start with ONE topic:
[PIC]:,[SX]:,[AVR]: ->uP ONLY! [EE]:,[OT]: ->Other [BUY]:,[AD]: ->Ads

2003\06\16@045623 by Alan B. Pearce

>Problem: over 4000 pdf files - mostly PIC datasheets and apnotes along
>with other component data sheets - with non-descriptive filenames like
>39272.pdf, etc.  How to find the one I want?

<grin> when I download files I alter the filename as I save them so that
they become something more relevant. Microchip files get the model of PIC in
them as well as the document number that Microchip uses. Saves a lot of
looking later on. App notes get the appnote title along with the number.


2003\06\16@053608 by Fredrik Axtelius

Microsoft Index Server (included in 2000 and maybe XP) handles PDFs if
you install a PDF plugin (free) from Adobe.

/frax

Quoting Nigel Orr <EraseMEnigelspam_OUTspamTakeThisOuTAXONINSTRUMENTS.CO.UK>:

{Quote hidden}


2003\06\16@064129 by Reinaldo Alvares

Adobe Catalog is included with Acrobat 5.0 (not the Reader).
Check under the tools tab. It's very efficient and you can make separate
indexes for different folders if you want.
Best regards
RA
{Original Message removed}

2003\06\16@091442 by Bob Ammerman

Use Acrobat Catalog, which is a component of the full Adobe Acrobat. This is a
full text indexing tool.

Bob Ammerman
RAm Systems

{Original Message removed}

2003\06\22@073337 by

I find it far easier to let the *producers* take
care of the cataloging and indexing of the PDFs!

With 70,000 PDFs, how do you know that you are actually
looking at the most recent version each time you need
it, without looking it up at the producer's site anyway?

If you actually look at 10 PDFs a day, 365 days/year, you'll
need close to 20 years to read them all. I'll bet that at
least *some* of the PDFs will be outdated by then :-)

And about the Microchip CDs, don't they have some kind of
search tool to read them? Can't that tool be copied to
your hard disk together with the PDFs?

Jan-Erik.


Jochen Feldhaar wrote:
> I have about 70000 PDFs on my PC (23GBytes, yeah!),...

--
http://www.piclist.com hint: The list server can filter out subtopics
(like ads or off topics) for you. See http://www.piclist.com/#topics

2003\06\22@135821 by Alan Melia

If you really have the need then check out
http://www.adobe.com/support/salesdocs/fa82.htm

Adobe have an IFilter module for the Windows NT 4 Server/ 2000 Server /XP
Index Server component.  Once you have it installed just head off to bed and
hope...

Alan Melia
Melmac Solutions Ltd.
http://www.melmac.co.uk
{Original Message removed}
