History of Changes

Version 0.7.1 (04/10/2005)

  • add[ 1119408 ] Support named target for Bookmark extraction.(BJL)
  • addCreated Resources/PDFBox_External_Fonts.properties to create a mapping for non-embedded fonts(BJL)
  • addAdded implementation for PDF page articles(BJL)
  • addCreated TextToPDF command line application(BJL)
  • addCreated ImageToPDF example(BJL)
  • update[ 1119420 ] Extract and Update the Meta-Information as XML(BJL)
  • update[ 1119410 ] Extract text in/between bookmarks(BJL)
  • update[ 1164476 ] XFDFImport should fail with non XFDF document(BJL)
  • update**API Change** Renamed PDField.getName() to PDField.getPartialName(), added method getFullyQualifiedName() (BJL)
  • update**API Change** Renamed PDWidget to PDAnnotationWidget for naming consistency(BJL)
  • updateText is now extracted from embedded form xobjects.(BJL)
  • updateDeployed site to new hosting vendor.(BJL)
  • updatecommitted code for PDFHighlighter to highlight words in a PDF document.(BJL)
  • updateAdded command line application org.pdfbox.PDFToImage(BJL)
  • updateImplemented runlength decoding(BJL)
  • updateAdded patch from Jorge Hernández Sellés to append content streams to existing page.(BJL)
  • update**API Change** Renamed package from pdmodel.graphics.image to pdmodel.graphics.xobject(BJL)
  • update**API Change** Removed PDRadioButton, should use PDCheckbox instead(BJL)
  • update**API Change** COSStream now extends COSDictionary instead of containing a dictionary(BJL)
  • update[ 1021241 ] Text extraction should follow PDF article divisions(BJL)
  • fix[ 1170068 ] text field is not found(BJL)
  • fixfixed NPE issue where an image did not have any applied filters(BJL)
  • fixFixed issue where extra spaces were being added during text extraction for type3 fonts(BJL)
  • fixfixed parsing of header where a trailing % exists(BJL)
  • fix[ 1110029 ] Character ">" not quoted in COSName::writePDF(BJL)
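
The run-length decoding entry above refers to the PDF RunLengthDecode filter. As a rough illustration only, the sketch below decodes that format: a length byte of 0 to 127 means "copy the next length + 1 bytes", 129 to 255 means "repeat the next byte 257 - length times", and 128 marks end of data. The class name RunLengthSketch is hypothetical and is not PDFBox's filter class.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper for illustration; not the PDFBox implementation.
final class RunLengthSketch {
    static byte[] decode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < in.length) {
            int len = in[i++] & 0xFF;
            if (len == 128) {
                break; // end-of-data marker
            } else if (len < 128) {
                // Literal run: copy the next len + 1 bytes (clamped to the input size).
                int count = Math.min(len + 1, in.length - i);
                out.write(in, i, count);
                i += count;
            } else if (i < in.length) {
                // Repeated run: write the next byte 257 - len times.
                int b = in[i++];
                for (int k = 0; k < 257 - len; k++) {
                    out.write(b);
                }
            }
        }
        return out.toByteArray();
    }
}
```

Truncated input is tolerated here by clamping; a stricter decoder might prefer to report an error instead.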

Version 0.7.0 (1/22/2005)

  • addAdded implementation for PDF Bookmarks(BJL)
  • addAdded implementation for PDF Destinations(BJL)
  • updatecommitted [ 1097913 ] Enhance LucenePDFDocument streams(thanks to Olivier Parent)(BJL)
  • updateUpdated website for better format for documentation(BJL)
  • fixNow ExportFDF and ExportXFDF will default output files to pdfname.fdf and pdfname.xfdf(BJL)
  • fix[ 1046278 ] ClassCastException when doing FDF/XFDF(BJL)
  • fixExtractText now allows you to extract text if you decrypt with the owner password(BJL)
  • fixAdded PDF 1.5 Object Stream support(BJL)
  • fixAdded pdmodel.common.PDStream to represent COSStream(BJL)
  • fixchanged PDPage.getContents to use PDStream instead of COSStream(BJL)
  • fixUpdated LucenePDFDocument Javadoc to tell which Lucene fields it populates(BJL)
  • fixmoved HelloWorld example from persistence to pdmodel and updated to use new PD Model features(BJL)
  • fixRefactored PDFStreamEngine based on contributions from Christophe Huault(BJL)
  • fixPDFStreamEngine no longer uses a gigantic if/else statement for all of the operators; they are now defined as properties when instantiating the class(BJL)
  • fixUpdated AFM resources to be ones released on Adobe's site, include AFM license as well(BJL)
  • fixAdded ability to embed TTF fonts, only WinAnsiEncoding is supported at this time(BJL)
  • fixAdded ability to extract images, thanks to contributions by Brigitte Mathiak(BJL)
  • fixCOSWriter now generates the document id if it does not already exist(BJL)
  • fiximproved performance for text extraction(BJL)
  • fix[ 1058693 ] TextPosition does not take account of tz operator(BJL)
  • fixupgraded to log4j-1.2.9(BJL)
  • fixinclude package-list for javadocs(BJL)
  • fix[ 1037145 ] Infinite loop in PDFParser.parseObject(BJL)
  • fixfixed error where spaces before integers were causing parse errors(BJL)
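
The PDFStreamEngine entries above describe replacing one large if/else chain with operators that are configured when the engine is created. A minimal sketch of that general idea is a lookup table from operator name to handler. Everything here (OperatorDispatchSketch, the two handlers, the inline registration) is an assumption for illustration and does not reproduce the actual PDFBox classes or its properties file.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical engine: operators are looked up in a table instead of an if/else chain.
final class OperatorDispatchSketch {
    private final Map<String, Consumer<List<Object>>> handlers = new HashMap<>();
    final StringBuilder text = new StringBuilder();

    OperatorDispatchSketch() {
        // Assumption: a real engine would load this mapping from configuration.
        handlers.put("Tj", args -> text.append(args.get(0))); // show a string
        handlers.put("T*", args -> text.append('\n'));        // move to the next line
    }

    /** Runs the handler for the operator; returns false if none is registered. */
    boolean process(String operator, List<Object> args) {
        Consumer<List<Object>> handler = handlers.get(operator);
        if (handler == null) {
            return false;
        }
        handler.accept(args);
        return true;
    }
}
```

With this layout, supporting a new operator means registering one more entry rather than editing a central conditional.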

Version 0.6.7 (10/09/2004)

  • addAdded the following command line applications (BJL)
  • fixRevamped the way character spacing and font information is obtained(BJL)
  • fixImproved location information about a character drawn on the screen.(BJL)
  • fixChanged the PDFStreamEngine.showString to showCharacter to support the newly improved location information. This will now only show one character at a time.(BJL)
  • fixFixed bug in PDDocument.isOwnerPassword and isUserPassword that was using the wrong length for the encryption key(BJL)
  • fixUpgraded to ant 1.6.2(BJL)
  • fixUpgraded to checkstyle-3.4(BJL)
  • fixUpgraded to JUnit-3.8.1(BJL)
  • fixUpgraded to lucene-1.4.2(BJL)
  • fixIntegrated patch(1016603) for issue 943319 to fix parsing of open office documents(BJL)
  • fixPatch:985347 No longer throw exception for "No 'ToUnicode' and no 'Encoding' for Font"(BJL)
  • fixPatch:996191 Fixed case statement with missing break(BJL)
  • fixPatch:996781 Fixed null pointer exception in acroform fields(BJL)
  • fixRenamed DecryptDocument to DocumentEncryption to support encryption and decryption(BJL)
  • fixAdded load/save/encrypt/decrypt convenience methods on the PDDocument class(BJL)
  • fixCOSWriter now attempts to keep object numbers from parsed documents and writes 'free' entries in the xref if necessary(BJL)
  • fixAdded the ability to set the word separator on the PDFTextStripper(BJL)
  • fixFixed issue where PDFBox would throw an IOException if a PDF was incorrectly missing an endobj tag(BJL)
  • fixFixed 918220 where PDFBox would freeze when parsing certain cmap files(BJL)
  • fixAdded initial colorspace support(BJL)
  • fixFixed issue where AppendDoc was throwing ClassCastException(BJL)
  • fixFixed 1013163 Can't parse filters that use filter abbreviation(BJL)
  • fixFixed 1011244 Where encrypting then decrypting was causing a problem(BJL)
  • fixrenamed TextPosition.getWidth to TextPosition.getCombinedHorizontalDisplacement to better reflect its actual value(BJL)
  • fixFixed 919215 PDFBox now supports stream replacement(BJL)
  • fixFixed 955043 Added support for 'ETenms-B5-H' encoding(BJL)
  • fixFixed 996050 Class Cast exception when importing(BJL)
  • fixAdded support for Font descriptors(BJL)
  • fixFixed spacing issues when doing textfield FDF import(BJL)
  • fixFixed 1017175 Large number converted when re-written(BJL)
  • fixFixed 1029873 PDFBox now allows for multiple xref sections(BJL)
  • fixAdded support for document Viewer Preferences(BJL)
  • fixMade currentDocument and pdfDocument protected in util.Splitter to allow easier subclassing(BJL)
  • fixFixed 1034427 After Splitting page orientation is lost(BJL)

Version 0.6.6 (07/20/2004)

  • fixImproved support for setting of checkbox fields(FDF import)(BJL)
  • fixAdded the org.pdfbox.PDFSplit utility to split a single document into many documents(BJL)
  • fixPDFBox now ignores the Length field that is associated with a stream, because it has been found to be wrong in some documents(BJL)
  • fixFixed bug when writing out PDF documents and the document contained a non-alphabetic character such as ( or )(BJL)
  • fixFixed bug in PDFont where dictionary encodings were not being processed correctly(BJL)
  • fixFixed bug in COSDocument.isEncrypted which was comparing COSNull to the wrong object(BJL)
  • fixIntegrated patch for supporting multiple lines in the appearance stream(BJL)
  • fixUpgraded to lucene-1.4-final(BJL)
  • fixorg.pdfbox.ExtractText now uses the system encoding as the default encoding instead of ISO-8859-1(BJL)
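
The entry above about ignoring the stream Length field implies that the end of the stream data has to be found some other way. One plausible approach, sketched below under that assumption, is to scan forward for the endstream keyword and drop the single end-of-line marker that precedes it. StreamScanSketch and readStreamBody are hypothetical names, and this is not necessarily how PDFBox locates the end of a stream.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: finds the stream body without trusting /Length.
final class StreamScanSketch {
    private static final byte[] END = "endstream".getBytes(StandardCharsets.US_ASCII);

    /** Returns the bytes from start up to the "endstream" keyword, or null if it is missing. */
    static byte[] readStreamBody(byte[] data, int start) {
        for (int i = start; i + END.length <= data.length; i++) {
            if (matches(data, i)) {
                int end = i;
                // Drop one trailing EOL (LF or CRLF) that separates data from the keyword.
                if (end > start && data[end - 1] == '\n') end--;
                if (end > start && data[end - 1] == '\r') end--;
                return Arrays.copyOfRange(data, start, end);
            }
        }
        return null;
    }

    private static boolean matches(byte[] data, int at) {
        for (int k = 0; k < END.length; k++) {
            if (data[at + k] != END[k]) return false;
        }
        return true;
    }
}
```

A keyword scan like this can be fooled by binary data that happens to contain the bytes "endstream", so it trades one failure mode (a wrong Length) for a rarer one.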

Version 0.6.5 (03/08/2004)

  • fixFixed bug in revision 3 encryption algorithm(BJL)
  • fixadded support for CIDFontType0 glyph widths, which fixed issue with spaces being during text extraction(BJL)
  • fixFixed infinite loop when parsing a corrupt content stream(BJL)
  • fixAdd characterspacing + wordspacing when determining the width of a space character(BJL)
  • fixAdded support for more font types(BJL)
  • fixrefactored the pdmodel.interactive package, form fields use object delegation instead of inheritance for the widget, see PDField.getWidget and PDField.getKids(BJL)
  • fixFixed bug where an inheritable cropbox would cause stackoverflow exception(BJL)
  • fixChanged usage of PDField/PDWidget to look like object delegation instead of inheritance by adding a PDField.getWidget instead of extending PDWidget(BJL)
  • fixrefactored interactive package, this will break any existing code that uses the PDField/PDAnnotation classes. You will need to adjust your package names!!(BJL)
  • fixNow uses StandardEncoding as the default encoding(BJL)
  • fixBug in AppendDoc example that did not take into account groups of pages(BJL)
  • fixPDFont now also tries the bootstrap classloader when loading AFM resources(BJL)
  • fixadded -startPage and -endPage command line options to org.pdfbox.ExtractText(BJL)
  • fixAdded support for corrupt PDFs with garbage before the header(BJL)
  • fixFixed bug where there was whitespace instead of garbage characters in front of the first object(BJL)
  • fixperformance improvements for the Matrix implementation(BJL)
  • fixupgraded to lucene 1.3(BJL)
  • fixfixed bug in cmap parser for cmap files that all ended in 'def'(BJL)
  • fixRemoved createObject method from COSDocument, COSWriter will handle all object references for you(BJL)
  • fixUpdated AppendDoc to use PDDocument instead of COSDocument and a couple bug fixes(BJL)
  • fixPDFParser now closes the document if there were parse errors(BJL)
  • fixTextPosition now has the PDFont that is associated with the piece of text(BJL)
  • fixAdded initial version of org.pdfbox.PDFViewer, a GUI application to view the internal structure of a PDF document. This can be used for debugging purposes at this time but may end up being an Adobe Reader-like application if there is enough interest(BJL)
  • fixChanged COSNumber/COSInteger/COSFloat interface to have both intValue and longValue(BJL)
  • fixAdded methods isUserPassword & isOwnerPassword to PDDocument(BJL)
  • fixAdded cmap files for CJK languages, please give me some feedback(BJL)
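
The interactive package entries above switch form fields from inheriting widget behaviour to delegating to a widget object. The sketch below shows the shape of that change with hypothetical stand-in classes (FieldSketch, WidgetSketch); only the accessor names getWidget and getKids come from the entries, and the fields and constructor are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the visual annotation of a field.
final class WidgetSketch {
    final String rectangle;

    WidgetSketch(String rectangle) {
        this.rectangle = rectangle;
    }
}

// Hypothetical stand-in for a form field: it HAS a widget rather than extending one.
final class FieldSketch {
    private final String name;
    private final WidgetSketch widget; // may be null for a field that only groups kids
    private final List<FieldSketch> kids = new ArrayList<>();

    FieldSketch(String name, WidgetSketch widget) {
        this.name = name;
        this.widget = widget;
    }

    String getName() { return name; }

    WidgetSketch getWidget() { return widget; }

    List<FieldSketch> getKids() { return kids; }
}
```

Callers that used to treat a field as a widget would instead ask the field for its widget, which also lets a parent field exist without any visual representation.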

Version 0.6.4 (11/02/2003)

  • fixFixed bug which caused infinite loop(BJL)
  • fixFixed bug in encoding where DictionaryEncoding kept a reference instead of making a copy leading to encoding problems(BJL)
  • fixAdded PDFTextStripper.(get|set)PageSeparator, which will allow the user to output a string after every page(BJL)
  • fixrefactored text stripping code to separate the logic processing of PDF operators and the logic of extracting text(BJL)
  • fixran findbugs on source code and fixed a couple minor issues(BJL)
  • fixRefactored font functionality to PDFont, some API methods are no longer available in COSObject(BJL)
  • fixchanged name of org.pdfbox.Main to org.pdfbox.ExtractText(BJL)
  • fixadded contribution of org.pdfbox.Overlay from Mario Ivankovits(BJL)
  • fixadded log.isDebugEnabled checks to log4j calls(BJL)
  • fixadded better escaping when writing COSNames(BJL)
  • fixfixed bug where encryption dictionary is sometimes set to COSNull instead of not being present(BJL)
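
For the COSName escaping entry above, the PDF syntax allows any byte in a name to be written as # followed by two hexadecimal digits, which is needed for whitespace, delimiters, and the # character itself. The sketch below applies that rule. NameEscapeSketch and writeName are hypothetical names, and the exact set of characters PDFBox escapes may differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not the COSName.writePDF implementation.
final class NameEscapeSketch {
    private static final String DELIMITERS = "()<>[]{}/%";

    /** Writes a name with a leading slash, escaping non-regular bytes as #xx. */
    static String writeName(String name) {
        StringBuilder sb = new StringBuilder("/");
        for (byte raw : name.getBytes(StandardCharsets.ISO_8859_1)) {
            int c = raw & 0xFF;
            boolean regular = c >= 33 && c <= 126 && c != '#' && DELIMITERS.indexOf(c) < 0;
            if (regular) {
                sb.append((char) c);
            } else {
                sb.append('#').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }
}
```

Under these rules a name such as "A>B c" would be written with the ">" and the space replaced by their hex escapes.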

Version 0.6.3 (09/13/2003)

  • fixNow contains the ability to import/set FDF data thanks to a contribution from Stefan Uldum Grinsted(BJL)
  • fixNo longer throw an error when stream is not followed by 0A or 0D0A to allow more PDFs to be parsed(BJL)
  • fixAdded -encoding argument to org.pdfbox.Main to control the encoding of the output(BJL)
  • fixRemove Prev entry from trailer if it exists because PDFBox automatically clears all old entries, only an issue when modifying/saving an existing PDF document(BJL)
  • fixFixed bug in master password encryption algorithm for Revision 3 encrypted documents(BJL)
  • fixCOSString no longer uses UTF-8 when encoding the byte array(BJL)
  • fixAdded PDDocument.getPageCount()(BJL)
  • fixFixed bug in PDFEncryption where(BJL)
  • fixNow enforces text extraction permissions(BJL)

Version 0.6.2 (4/18/2003)

  • addAdded required libraries to CVS(BJL)
  • addAdded log4j logging(BJL)
  • addAdded automated tests and test data for text extraction(BJL)
  • updateSignificant text extraction work(BJL)
  • fixModified build so that build.properties settings are no longer required(BJL)
  • fixAdded automatic handling of files encrypted with the empty password(BJL)
  • fixRemoved unimplemented decoders from filters test(BJL)
  • fixFixed several LZW decode bugs introduced after 0.5.6(BJL)
  • fixFixed bugs relating to processing out-of-spec PDFs with bad # escaping in the name ("java.io.IOException: Error: expected hex number" bug)(BJL)
  • fixFixed Lucene UID generation bug(BJL)
  • fixFixed GetFontWidths null pointer exception bug(BJL)

Version 0.6.1 (3/9/2003)

  • fixFixed bug in parsing stream objects which led to "Unexpected end of ZLIB input stream"(BJL)
  • fixChanged license from LGPL to BSD to allow pdfbox to be used easily in Apache projects(BJL)

Version 0.6.0 (3/5/2003)

  • addAdded PDF document summary fields to the lucene document(BJL)
  • fixMassive improvements to memory footprint(BJL)
  • fixMust call close() on the COSDocument(LucenePDFDocument does this for you)(BJL)
  • fixReally fixed the bug where small documents were not being indexed(BJL)
  • fixFixed bug where no whitespace existed between obj and start of object. Exception in thread "main" java.io.IOException: expected='obj' actual='obj<</Pro(BJL)
  • fixFixed issue with spacing where textLineMatrix was not being copied properly(BJL)
  • fixFixed 'bug' where parsing would fail with some pdfs with double endobj definitions(BJL)

Version 0.5.6 (11/28/2002)

  • addFixed bug in LucenePDFDocument where stream was not being closed and small documents were not being indexed (BJL)
  • addFixed a spacing issue for some PDF documents (BJL)
  • addFixed error while parsing the version number (BJL)
  • addFixed NullPointer in persistence example (BJL)
  • addCreate example lucene IndexFiles class which models the demo from lucene (BJL)
  • addFixed bug where garbage at the end of file caused an infinite loop (BJL)
  • addFixed bug in parsing boolean values with stuff at the end like "true>>" (BJL)

Version 0.5.5 (10/03/2002)

  • addAdded example of printing document signature(BJL)
  • addAdded example to print out form fields values(BJL)
  • fixFixed bug when appending documents(BJL)
  • fixVarious other bug fixes(BJL)

Version 0.5.4 (09/17/2002)

  • fixFixed bug in text output where '?' was output instead of the proper character(BJL)
  • fixFixed bug where sections of text were not being output at all(BJL)

Version 0.5.3 (09/13/2002)

  • fixFixed bug in 128 bit encryption(BJL)

Version 0.5.2 (09/06/2002)

  • updateCatch all NumberFormatExceptions and wrap them with IOExceptions(BJL)
  • fixFixed bug where FDF documents could not be appended to PDF Documents(BJL)
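
The NumberFormatException entry above describes a common pattern: convert an unchecked parsing failure into the checked exception that callers of a parser already handle. A minimal sketch, with the hypothetical names NumberParseSketch and parseInt and an assumed error message:

```java
import java.io.IOException;

// Hypothetical helper for illustration; not taken from the PDFBox parser.
final class NumberParseSketch {
    /** Parses an integer token, reporting malformed input as an IOException. */
    static int parseInt(String token) throws IOException {
        try {
            return Integer.parseInt(token);
        } catch (NumberFormatException e) {
            // Keep the original exception as the cause so the stack trace is not lost.
            throw new IOException("Error: expected a number, got '" + token + "'", e);
        }
    }
}
```

Callers then need only one catch block for both I/O failures and malformed numbers.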

Version 0.5.1 (09/04/2002)

  • addNow supports unicode for the document summary(BJL)
  • updateBetter support for Type0 fonts(BJL)
  • fixFixed bug with an empty LZW stream(BJL)
  • fixFixed parsing error for ID operator(BJL)

Version 0.5.0 (08/31/2002)

  • addNow supports unicode for the document summary(BJL)
  • updateBetter support for Type0 fonts(BJL)
  • fixFixed bug with an empty LZW stream(BJL)
  • fixFixed parsing error for ID operator(BJL)

Version 0.4.1 (07/25/2002)

  • fixFixed bug where .notdef was being output as document text(BJL)

Version 0.4.0 (07/23/2002)

  • addAdded extract text ant task(BJL)
  • addImplemented AFM(Adobe Font Metrics) resource loading(BJL)
  • updateChanged project from pdfparser to pdfbox to better reflect future needs(BJL)
  • fixFixed numerous bugs submitted by users(BJL)

Version 0.3.0 (07/09/2002)

  • addAdded indexer for the lucene project(BJL)
  • fixInitial implementation of PDF encryption(not working yet)(BJL)

Version 0.2.0 (06/03/2002)

  • addAdded support for the various encodings(BJL)
  • fixImproved the accuracy of the text output(BJL)

Version 0.1.0 (05/25/2002)

  • addInitial Version(BJL)