Input file for pdftk

ramack
Posts: 500
Joined: 2008-01-28 15:31
Location: Centennial, CO
Has thanked: 6 times

Input file for pdftk

#1 Post by ramack »

This is a question for those who use pdftk. I've been using it for about the last year to combine and manipulate PDFs, and it is awesome. Until today, I've mostly used it on fewer than five files at a time. Today I have been working on a manual at work which consists of about 100 separate files with a total page count of 167. The pages are a mix of 8.5x11 (inch) and 11x17 (inch). The pages needed to be in a specific order, and rather than doing it all in the CLI, the only way I could think of was to create a text file containing the sequence of files. To make it easier for me to read and group, I first put each file on a separate line.

chapter1_separation_page.pdf
file1.pdf
file2.pdf
file3.pdf

chapter2_separation_page.pdf
file4.pdf
file5.pdf
file6.pdf
and so on

Then once I had everything in the correct order, I had to remove the line breaks and replace them with spaces, then copy the text and paste it into the CLI.

Code: Select all

 pdftk chapter1_separation_page.pdf file1.pdf file2.pdf file3.pdf chapter2_separation_page.pdf file4.pdf file5.pdf file6.pdf [a whole bunch more] cat output output.pdf 
Can a separate input file be specified and read? Or is there a better way to list all the files to be combined? I'm trying to keep the file sequence somewhat easy to read and follow. For a few files, this isn't bad, but if I have 100+, it gets cluttered quickly.

Rich
homemade AMD64, Acer AspireOne 150, Asus eeePC 900, i386; Testing
i386,Dell Vostro 1000 AMD64, Dell Inspiron 1100; Sid
XFCE on all.

jw013
Posts: 161
Joined: 2009-08-18 21:00

Re: Input file for pdftk

#2 Post by jw013 »

I checked the pdftk manpage and there doesn't seem to be anything about reading file names from anywhere other than the command line.

I'd recommend using shell features as a workaround for this. If you are using bash and have all the file names in the correct order in a file like "list.txt" do something like

Code: Select all

pdftk $(< list.txt) output output.pdf

jmtd
Posts: 35
Joined: 2010-10-14 22:21

Re: Input file for pdftk

#3 Post by jmtd »

jw013 wrote:I'd recommend using shell features as a workaround for this. If you are using bash and have all the file names in the correct order in a file like "list.txt" do something like

Code: Select all

pdftk $(< list.txt) output output.pdf
I'd second this recommendation. If you aren't using bash (but are using something sh-compatible), the slightly more verbose

Code: Select all

pdftk $(cat list.txt) output output.pdf
Will also work.
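
A small sketch of why either form copes with the grouped, one-name-per-line list from the first post: an unquoted command substitution is split on spaces, tabs, and newlines, so the blank lines vanish and each name becomes its own argument. The temporary file and the placeholder names are assumptions for the demo, and the arguments are captured in an array and printed rather than passed to pdftk, so it runs without pdftk or real PDFs.

```shell
#!/usr/bin/env bash
# Build a small grouped list like the one in the first post.
# The names are placeholders; the files do not need to exist for this demo.
list=$(mktemp)
cat > "$list" <<'EOF'
chapter1_separation_page.pdf
file1.pdf
file2.pdf

chapter2_separation_page.pdf
file3.pdf
EOF

# Unquoted $(cat ...) is split on whitespace, including newlines,
# so the blank grouping line contributes no argument at all.
args=( $(cat "$list") )

echo "argument count: ${#args[@]}"
printf '<%s>\n' "${args[@]}"

# With real PDFs in place, the equivalent call would be:
# pdftk $(cat "$list") cat output output.pdf
rm -f "$list"
```

The same splitting is what breaks names that contain spaces, so this approach assumes simple file names.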

ramack
Posts: 500
Joined: 2008-01-28 15:31
Location: Centennial, CO
Has thanked: 6 times

Re: Input file for pdftk

#4 Post by ramack »

Yes, I had tried this, but it reads the input file, expecting a PDF format. I'm going to play around a bit more with some scripting. I think it can be done, I just haven't figured out how, ha. I may end up contacting PDF Labs for some input as well.

Rich

jw013
Posts: 161
Joined: 2009-08-18 21:00

Re: Input file for pdftk

#5 Post by jw013 »

If you use either of the suggested command snippets it should work. It really is that simple, no need to play around with scripting. The only possible issues arise if any of the file names in your list have problematic names (contain shell characters, spaces, etc). Can you be more specific as to what you are seeing that is not working?

jmtd
Posts: 35
Joined: 2010-10-14 22:21

Re: Input file for pdftk

#6 Post by jmtd »

Actually, I think we've got our pdftk command lines wrong (missing the operation). The manpage leads me to believe that they need to include an operation (cat):

Code: Select all

pdftk $(< list.txt) cat output output.pdf

ramack
Posts: 500
Joined: 2008-01-28 15:31
Location: Centennial, CO
Has thanked: 6 times

Re: Input file for pdftk

#7 Post by ramack »

The format for combining multiple PDFs is:

pdftk pdf1.pdf pdf2.pdf pdf3.pdf cat output output.pdf

I've made a little headway by filling an array with the list of PDFs.

Code: Select all

array=(pdf1.pdf pdf2.pdf pdf3.pdf)

Code: Select all

pdftk ${array[*]} cat output output.pdf
So now all I should have to do is fill the array with the contents of list.txt.

ramack
Posts: 500
Joined: 2008-01-28 15:31
Location: Centennial, CO
Has thanked: 6 times

Re: Input file for pdftk

#8 Post by ramack »

jw013 wrote:If you use either of the suggested command snippets it should work. It really is that simple, no need to play around with scripting. The only possible issues arise if any of the file names in your list have problematic names (contain shell characters, spaces, etc). Can you be more specific as to what you are seeing that is not working?
Thanks for all of your inputs!

I have created a test input file, list.txt, which contains:

Code: Select all

 $ cat list.txt 
pdf1.txt pdf2.pdf pdf3.pdf pdf4.pdf
And if I:

Code: Select all

$ pdftk $(< list.txt) cat output output.pdf
pdf1.txt not found as file or resource.
Error: Failed to open PDF file: 
   pdf1.txt
Errors encountered.  No output created.
Done.  Input errors, so no output created.
Now if I create an array with the contents of list.txt:

Code: Select all

$ array=(pdf1.pdf pdf2.pdf pdf3.pdf pdf4.pdf)

$ echo ${array[*]}
pdf1.pdf pdf2.pdf pdf3.pdf pdf4.pdf
Now inserting the array into the pdftk command:

Code: Select all

pdftk ${array[*]} cat output output.pdf
No errors, and the combined PDF, output.pdf is created.

I can get my results if I use an array. So now, if I fill my array with the contents of list.txt, this should work.

jw013
Posts: 161
Joined: 2009-08-18 21:00

Re: Input file for pdftk

#9 Post by jw013 »

ramack wrote:

Code: Select all

 $ cat list.txt 
pdf1.txt pdf2.pdf pdf3.pdf pdf4.pdf
It's probably safer to put 1 file name per line, but that should be fine.
ramack wrote: And if I:

Code: Select all

$ pdftk $(< list.txt) cat output output.pdf
pdf1.txt not found as file or resource.
Error: Failed to open PDF file: 
   pdf1.txt
Errors encountered.  No output created.
Done.  Input errors, so no output created.
Now the problem here is not the command itself, it's the inclusion of pdf1.txt, since it's not a pdf file. Did you mean to type pdf1.pdf?
I notice you have pdf1.pdf in the array version, but pdf1.txt here.

Also, arrays are another perfectly good way to do it. It's just that you still need to put the whole list of names on the command line by hand to assign it to an array, or you could write a little bit of shell to read a file one line at a time and make an array of lines.
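
One way the "read a file one line at a time and make an array of lines" idea might look in bash is sketched below. The temporary list file and its placeholder names are assumptions for the demo, and the pdftk call is left as a comment so the snippet runs without pdftk installed.

```shell
#!/usr/bin/env bash
# Demo list with a blank line separating two "chapters" (placeholder names).
list=$(mktemp)
printf '%s\n' \
    'chapter1_separation_page.pdf' 'file1.pdf' \
    '' \
    'chapter2_separation_page.pdf' 'file4.pdf' > "$list"

files=()
# IFS= and -r keep each line exactly as written; the extra test after ||
# still picks up a final line that has no trailing newline.
while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue   # skip the blank lines used for grouping
    files+=("$line")
done < "$list"

echo "${#files[@]} files"
printf '<%s>\n' "${files[@]}"

# With real PDFs in place, the array can be handed to pdftk like this:
# pdftk "${files[@]}" cat output output.pdf
rm -f "$list"
```

Quoting the expansion as "${files[@]}" passes each line as exactly one argument, whereas ${array[*]} unquoted is split on whitespace again.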

ramack
Posts: 500
Joined: 2008-01-28 15:31
Location: Centennial, CO
Has thanked: 6 times

Re: Input file for pdftk

#10 Post by ramack »

jw013 wrote:Did you mean to type pdf1.pdf?
Oh man! Thanks for catching my typo mistake. Yes, pdf1.txt should be pdf1.pdf. When I corrected the error, it works correctly as you suggest. Thank you for the suggestions AND catching my mistake!
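
A typo like this can be caught before pdftk runs by checking that every name in the list is an existing file. The sketch below is only an illustration: the scratch directory, the empty placeholder files, and the `missing` array are all assumptions made for the demo.

```shell
#!/usr/bin/env bash
# Demo setup: a scratch directory with two placeholder files and a list
# containing one mistyped name (pdf1.txt instead of pdf1.pdf).
work=$(mktemp -d)
touch "$work/pdf1.pdf" "$work/pdf2.pdf"
printf '%s\n' pdf1.txt pdf2.pdf > "$work/list.txt"

missing=()
while IFS= read -r name; do
    [ -z "$name" ] && continue                  # ignore blank grouping lines
    [ -f "$work/$name" ] || missing+=("$name")  # remember names that are not regular files
done < "$work/list.txt"

if [ "${#missing[@]}" -gt 0 ]; then
    printf 'not found: %s\n' "${missing[@]}"
else
    echo "all files present"
fi
rm -rf "$work"
```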

ramack
Posts: 500
Joined: 2008-01-28 15:31
Location: Centennial, CO
Has thanked: 6 times

Re: Input file for pdftk

#11 Post by ramack »

I just implemented the changes that all of you suggested on my manual at work. It works awesome! And it's much much easier to read when I'm able to group the individual pages by sections/chapters.

Thank you!

Rich

PDA123
Posts: 78
Joined: 2021-04-24 01:55
Been thanked: 1 time

Re: Input file for pdftk

#12 Post by PDA123 »

Back from the dead....

This topic has been very helpful. I'm having one problem. If the files listed in the array are located in the folder where the pdftk command is executed, then they are nicely cat'd into the output file.

However, in my case, if the files to be cat'd into the pdf output file are located in another folder I have no idea how to exactly list them in the array.txt file. Even when I use the following format I always get the same error message- file not found.

Example of one of the files listed in array.txt: /home/Mine/My Documents/The Three Stooges Episodes.pdf

Even if I use this format the same error message appears (file not found);

/home/Mine/My Documents/The\ Three\ Stooges\ Episodes.pdf

I've tried a lot of variations of the filename to no success.

Any ideas?

P.S. I don't need a list of Stooges episodes- I have them memorized.

CwF
Global Moderator
Posts: 3073
Joined: 2018-06-20 15:16
Location: Colorado
Has thanked: 63 times
Been thanked: 254 times

Re: Input file for pdftk

#13 Post by CwF »

PDA123 wrote: 2023-01-06 22:38Any ideas?
Nope, not really.
More of a scripting question here...
If done only once I think I'd paste together a paragraph of a command line for the purpose.
Depending on the content type of the pdf's I would maybe import them into a CherryTree file, with its Import/Export abilities and tight single file format. If it's actually episode data in text form I'd maybe use AtomicParsley to embed them into the episode video!

I'd assume the full path of each file would work for the array.
Mottainai

PDA123
Posts: 78
Joined: 2021-04-24 01:55
Been thanked: 1 time

Re: Input file for pdftk

#14 Post by PDA123 »

The array text file contains the name and location of pdf's that I want cat'd into one pdf.

The full path of each pdf file contained within the array.txt file....that's the problem. Terminal returns the error of "file not found" when referencing the pdfs in the array.txt file. Therefore, I'm not using the correct syntax or whatever it's called.

CwF
Global Moderator
Posts: 3073
Joined: 2018-06-20 15:16
Location: Colorado
Has thanked: 63 times
Been thanked: 254 times

Re: Input file for pdftk

#15 Post by CwF »

PDA123 wrote: 2023-01-07 01:13 "file not found"
Try quoting each full path; are there spaces in names; are the locations within permissions?
Mottainai

PDA123
Posts: 78
Joined: 2021-04-24 01:55
Been thanked: 1 time

Re: Input file for pdftk

#16 Post by PDA123 »

CwF wrote: 2023-01-07 01:34
PDA123 wrote: 2023-01-07 01:13 "file not found"
Try quoting each full path; are there spaces in names; are the locations within permissions?
Here's the filename in the array;

/home/hello/work_area/howdy 2024.pdf

here's the error message;

$ pdftk $(cat files.txt) cat output out.pdf
Error: Unable to find file.
Error: Failed to open input PDF file:
/home/hello/work_area/howdy
Error: Unable to find file.
Error: Failed to open input PDF file:
2024.pdf
Errors encountered. No output created.
Done. Input errors, so no output created.

CwF
Global Moderator
Posts: 3073
Joined: 2018-06-20 15:16
Location: Colorado
Has thanked: 63 times
Been thanked: 254 times

Re: Input file for pdftk

#17 Post by CwF »

PDA123 wrote: 2023-01-07 01:53 /home/hello/work_area/howdy 2024.pdf
There is a space 'y 2' so it needs quotes

Code: Select all

 "/home/hello/work_area/howdy 2024.pdf"
Mottainai

PDA123
Posts: 78
Joined: 2021-04-24 01:55
Been thanked: 1 time

Re: Input file for pdftk

#18 Post by PDA123 »

CwF wrote: 2023-01-07 01:57
PDA123 wrote: 2023-01-07 01:53 /home/hello/work_area/howdy 2024.pdf
There is a space 'y 2' so it needs quotes

Code: Select all

 "/home/hello/work_area/howdy 2024.pdf"
Same error message except it now includes the quotes

CwF
Global Moderator
Posts: 3073
Joined: 2018-06-20 15:16
Location: Colorado
Has thanked: 63 times
Been thanked: 254 times

Re: Input file for pdftk

#19 Post by CwF »

PDA123 wrote: 2023-01-07 01:53 $ pdftk $(cat files.txt) cat output out.pdf
Also place quotes in the command...I think...

Code: Select all

$ pdftk $("cat files.txt") cat output out.pdf
echo $(cat files.txt) returns what?
Mottainai

PDA123
Posts: 78
Joined: 2021-04-24 01:55
Been thanked: 1 time

Re: Input file for pdftk

#20 Post by PDA123 »

CwF wrote: 2023-01-07 02:17
PDA123 wrote: 2023-01-07 01:53 $ pdftk $(cat files.txt) cat output out.pdf
Also place quotes in the command...I think...

Code: Select all

$ pdftk $("cat files.txt") cat output out.pdf
echo $(cat files.txt) returns what?
$ echo $(cat files.txt)
"/home/hello/work_area/howdy 2024.pdf" /home/hello/work_area/2020.pdf /home/hello/work_area/2021.pdf /home/hello/work_area/2022.pdf /home/hello/work_area/2023.pdf


$ pdftk $("cat files.txt") cat output out.pdf
bash: cat files.txt: command not found
bash: $: command not found
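
For names with spaces, the underlying issue is that the result of an unquoted $(cat files.txt) is split on every space, and quote characters or backslashes stored in the file are not re-interpreted by the shell; they arrive as literal characters. A hedged sketch of one way around this, assuming bash 4 or later (for the mapfile builtin) and a list with one plain, unquoted path per line: read the lines into an array and expand it quoted. The temporary list below is an assumption for the demo, and the pdftk call is commented out so it runs without pdftk or the real files.

```shell
#!/usr/bin/env bash
# Demo list: one path per line, written exactly as the file is named,
# with no quotes and no backslashes.
list=$(mktemp)
cat > "$list" <<'EOF'
/home/hello/work_area/howdy 2024.pdf
/home/hello/work_area/2020.pdf
EOF

# Unquoted substitution: the space splits the first path into two arguments.
split=( $(cat "$list") )
echo "unquoted: ${#split[@]} arguments"   # prints 3

# mapfile -t stores each line as one array element, spaces intact.
mapfile -t files < "$list"
echo "mapfile:  ${#files[@]} arguments"   # prints 2
printf '<%s>\n' "${files[@]}"

# With the real PDFs in place, the call would be:
# pdftk "${files[@]}" cat output out.pdf
rm -f "$list"
```

The double quotes around "${files[@]}" are what keep "howdy 2024.pdf" together as a single argument.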
