With -print placed before -exec, the pathname will be printed by find before the respective pdfinfo prints its output. The presence of -exec suppresses the default action. If you prefer the pathname after the output of the respective pdfinfo then you may try -exec … -print, but note that in this case -print will be performed iff -exec succeeds. In general one uses -exec … -print when -exec is used as a test. In the context of your question I personally prefer the pathname first, so -print -exec ….

Use the fact that every pathname printed by find . starts with a dot. grep -E '^(\.|Pages)' matches lines with a literal dot at the beginning or the string Pages at the beginning.

find . -name \*.pdf -print -exec pdfinfo {} \; | grep -E '^(\.|Pages)'

Consider -type f as the first test in case some non-regular file matches -name \*.pdf by chance. This will avoid calling pdfinfo on directories and such.

There are a couple of bug fixes for pdf-parser and pdfid.

And 2 new features in pdf-parser, inspired by a private training on maldoc analysis I gave last week. I often get good ideas from my students, and sometimes, even I get a good idea in class.

Option -o can now be used to select multiple objects: separate the indices by a comma.

There’s a new environment variable, PDFPARSER_OPTIONS, that can be used to provide extra options you want to include with each execution of pdf-parser.py. This is useful for option -O, an option to parse stream objects. It’s actually best to always parse stream objects. But I decided not to make this an option that is on by default, so that the behavior of pdf-parser would remain unchanged. I consider this important for the many people that rely on a predictable behavior of pdf-parser, like teachers and students of infosec trainings where my tools are used/mentioned. However, always including option -O is tedious and error prone. So now you can have the best of both worlds, by defining an environment variable with name PDFPARSER_OPTIONS and value -O.

And finally, I started to add a man page (option -m), like I do with many of my other tools. This is a work in progress: for the moment, it points to my free PDF analysis e-book that explains the use of pdfid and pdf-parser.

I got hold of a phishing PDF where the /URI is hiding inside a stream object (/ObjStm).

First I start the analysis with pdfid.py. There is no /URI reported, but note that the PDF contains 5 stream objects (/ObjStm). In the past, I would search and decompress these stream objects with pdf-parser.py, and then pipe the result through pdfid.py, in order to detect /URIs (or other objects that require further analysis).

Since pdf-parser.py version 0.7.0, I prefer another method: using option -O to let pdf-parser.py extract and parse the objects inside stream objects. With option -a (here combined with option -O), I can get statistics and keywords just like with pdfid.

Now I can see that there is a /URI inside the PDF (object 43). Thus I can use option -k to get the value of /URI entries, combined with option -O to look inside stream objects. From this output, we also see that object 43 is inside stream object 16.

Remark: if you use option -O on a PDF that does not contain stream objects (/ObjStm), pdf-parser will behave as if you didn’t provide this option. Hence, if you want, you can always use option -O to analyze PDFs.
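To illustrate why a /URI inside a compressed stream object (/ObjStm) does not show up in a plain keyword scan, here is a minimal Python sketch. It is not how pdfid.py or pdf-parser.py are implemented: the function names, the regular expression and the sample bytes are all made up for illustration, and a real PDF needs a proper parser.

```python
import re
import zlib

# Simplified assumption: stream data sits between "stream\n" and "\nendstream".
# Real PDFs may use \r\n line endings and rely on the /Length entry instead.
STREAM_RE = re.compile(rb"stream\n(.*?)\nendstream", re.DOTALL)


def count_keyword(data: bytes, keyword: bytes) -> int:
    """Count literal occurrences of a keyword such as b'/URI' in raw bytes."""
    return data.count(keyword)


def inflate_streams(pdf_bytes: bytes) -> bytes:
    """Return the concatenated content of every stream that zlib can decompress."""
    chunks = []
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            chunks.append(zlib.decompress(match.group(1)))
        except zlib.error:
            continue  # not Flate-compressed (or damaged): skip this stream
    return b"\n".join(chunks)


def build_sample() -> bytes:
    """Build a tiny, hypothetical fragment: object 43 lives inside stream object 16."""
    inner = b"43 0 << /S /URI /URI (https://example.com/) >>"
    return (
        b"16 0 obj\n<< /Type /ObjStm /N 1 /First 5 /Filter /FlateDecode >>\n"
        b"stream\n" + zlib.compress(inner) + b"\nendstream\nendobj\n"
    )


if __name__ == "__main__":
    sample = build_sample()
    print("/ObjStm in raw bytes:", count_keyword(sample, b"/ObjStm"))
    print("/URI in raw bytes:", count_keyword(sample, b"/URI"))
    print("/URI after inflating:", count_keyword(inflate_streams(sample), b"/URI"))
```

On this made-up sample, the plain scan finds /ObjStm but no /URI. The /URI only becomes visible once the stream has been decompressed, which is the situation option -O deals with.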