Friday, September 11, 2009

C++ API for PDF to Word

Solid Framework core features like PDF to Word conversion can now be used outside of the .NET environment by native C++ projects.


In truth, this has always been possible, but until now Solid Documents had not published these interfaces. To avoid dependencies between customer product versions and Solid Framework versions, we've published the source code for this interface (rather than just .h and .lib files). The linking is done at run time using LoadLibrary/GetProcAddress, so it does not depend on project-specific settings such as structure alignment.

You can always get the latest version of the lightweight C++ wrapper by downloading solidframeworknative.zip. This API will work with the current version of the regular Solid Framework download.

Getting Started
Getting started is surprisingly simple:

1. Create a new C++ project in Visual Studio. Make sure the following C++ options are set (these are the defaults anyway):
  • Use Unicode Character Set
  • Use Multi-threaded DLL Runtime Library
2. Add the .cpp and .h files from solidframeworknative.zip to your project.
3. #include "solidframeworknative.h" in each source file that uses the native API.
4. Unzip the contents of SolidFramework.dll to the executable folder of your project (typically .\Debug or .\Release). For more information on how to extract the contents of this DLL, read this earlier blog post about extracting the contents of SolidFramework.
5. Write some code that calls the Solid Framework APIs.
6. Run.

For step 5, you could use the sample code below to get started (obviously you'll need to set the source file to a PDF on your machine). Here is the entire sample project (without the Solid Framework files). It is a VS 2005 project, so it will import correctly into later versions such as VS 2008 and Express.


More Samples
We've named the classes, methods and properties in the C++ API to correspond very closely to their C# counterparts in Solid Framework. Enum values for option properties have the same order and naming as in C#.
This was done deliberately so that all the C# documentation and samples apply equally to the C++ interface. As with the C# interfaces, we've made a conscious effort to follow Microsoft's .NET component design guidelines. The classes follow the usual create-set-call pattern:
  • create object
  • set properties
  • do action
  • use results
In addition, we've made sure that the naming conventions are intuitive. Often the Object Browser and IntelliSense in Visual Studio are enough to answer the obvious "what now?" questions.
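To make the create-set-call pattern concrete, here is a minimal C++ sketch. The ToyConverter class, its ToyMode enum and the RunToyConversion helper are hypothetical stand-ins invented for illustration, not the actual classes declared in solidframeworknative.h; only the shape of the calls is meant to carry over.

```cpp
#include <string>
#include <utility>

// Hypothetical option enum and result type; NOT part of the Solid Framework API.
enum class ToyMode { Fast, Accurate };

struct ToyResult {
    bool success;
    std::string message;
};

// Hypothetical converter that only illustrates the create-set-call shape.
class ToyConverter {
public:
    // Step 2: set properties.
    void SetSourceFile(std::string path) { source_ = std::move(path); }
    void SetMode(ToyMode mode) { mode_ = mode; }

    // Step 3: do the action. This toy version only validates its inputs.
    ToyResult Convert() const {
        if (source_.empty())
            return {false, "no source file set"};
        return {true, "converted " + source_ +
                          (mode_ == ToyMode::Fast ? " (fast)" : " (accurate)")};
    }

private:
    std::string source_;
    ToyMode mode_ = ToyMode::Fast;
};

// create object -> set properties -> do action -> use results
inline ToyResult RunToyConversion(const std::string& pdf) {
    ToyConverter converter;                 // 1. create object
    converter.SetSourceFile(pdf);           // 2. set properties
    converter.SetMode(ToyMode::Accurate);
    return converter.Convert();             // 3. do action; the caller uses the result (4)
}
```

The caller then inspects the returned result (step 4), much as the C# samples inspect a conversion result after calling the converter.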

Limitations
Based on customer demand, this initial release implements:
  • License class
  • PdfToWordConverter class (including supporting enums and event handlers)
  • PdfToExcelConverter class
  • PdfToTextConverter class
  • PdfToPdfAConverter class
  • ConversionResult class

Friday, September 4, 2009

Convert PDF pages to Image files

Another common question we get by email is whether Solid Framework can convert PDF pages into image files. It can, and we use this feature ourselves to create the page thumbnail images and the main page view in PDF Navigator. Here is a diagram of how this works:


You can download the sample project [zip file] to see this in action yourself. The project contains both Visual Studio 2005 and Visual Studio 2008 solutions. Those without Microsoft Visual Studio can use Visual C# 2008 Express Edition for free to work with the sample project.

Earlier we talked about using a C# class library to let you use the scripting functionality of Solid PDF Tools Scan to PDF from the command line. We use that class again here to parse the command-line arguments we need to convert the pages into image files:


  Arguments CommandLine = new Arguments(args);
  if (CommandLine["f"] == null)
  {
    ShowUsage();
    return -1;
  }
  else
    pdfFile = CommandLine["f"];

  if (CommandLine["p"] != null)
    password = CommandLine["p"];

  if (CommandLine["o"] == null)
  {
    ShowUsage();
    return -2;
  }
  else
    outputfolder = CommandLine["o"];

  // Note: We default to 96 dpi if the parameter was not provided.
  if (CommandLine["d"] != null)
    dpi = Convert.ToInt32(CommandLine["d"]);

  if (CommandLine["t"] != null)
  {
    switch (CommandLine["t"].ToUpper())
    {
      case "TIF":
      case "TIFF":
        imagetype = ImageType.TIFF;
        break;
      case "BMP":
        imagetype = ImageType.BMP;
        break;
      case "JPEG":
      case "JPG":
        imagetype = ImageType.JPG;
        break;
      case "PNG":
      default:
        imagetype = ImageType.PNG;
        break;
    }
  }

  if (CommandLine["r"] != null)
  {
    pagerange = CommandLine["r"];
  }

  DoConversion(pdfFile, password, outputfolder, dpi, pagerange, imagetype);

The code above sets up the arguments to hand off to DoConversion. Say we have a PDF file at c:\mypdfs\pdftest.pdf that is encrypted with the user password "mypassword", and we want JPEG images of pages 1-5, 7 and 8 at 127 dpi, saved in c:\myimages. The command line would look like this:

  PDFtoImage.exe -f:c:\mypdfs\pdftest.pdf -p:mypassword -o:c:\myimages -d:127 -t:JPG -r:1-5,7,8


Note: -p, -d, -t and -r are optional. If -p is missing, no password is used. The DPI defaults to 96 and the image type defaults to PNG. If -r is missing, images are made for all pages.
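For readers following along in C++, similar argument handling might look roughly like the sketch below. ParseArgs, ParseImageType, Options and BuildOptions are hypothetical names, and the parser is a simplified assumption rather than a port of the Arguments class: it only accepts -key:value (or /key:value) and splits at the first colon, which is what keeps drive letters such as c:\ intact in the -f and -o values. The defaults match the ones described above (96 dpi, PNG).

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical parser for -key:value arguments. Only the FIRST colon separates
// key from value, so "-f:c:\mypdfs\pdftest.pdf" keeps its drive letter.
std::map<std::string, std::string> ParseArgs(const std::vector<std::string>& args) {
    std::map<std::string, std::string> result;
    for (const std::string& arg : args) {
        if (arg.size() < 2 || (arg[0] != '-' && arg[0] != '/'))
            continue;                                   // not a switch; ignore it
        std::size_t colon = arg.find(':');
        if (colon == std::string::npos)
            result[arg.substr(1)] = "";                 // bare switch such as -v
        else
            result[arg.substr(1, colon - 1)] = arg.substr(colon + 1);
    }
    return result;
}

enum class ImageType { TIFF, BMP, JPG, PNG };

// Mirrors the switch in the C# sample: case-insensitive, PNG is the default.
ImageType ParseImageType(std::string t) {
    std::transform(t.begin(), t.end(), t.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    if (t == "TIF" || t == "TIFF") return ImageType::TIFF;
    if (t == "BMP") return ImageType::BMP;
    if (t == "JPG" || t == "JPEG") return ImageType::JPG;
    return ImageType::PNG;
}

struct Options {
    std::string pdfFile, password, outputFolder, pageRange;
    int dpi = 96;                                       // documented default
    ImageType imageType = ImageType::PNG;               // documented default
};

// Returns false when a required argument (-f or -o) is missing.
bool BuildOptions(const std::vector<std::string>& args, Options& out) {
    auto a = ParseArgs(args);
    if (!a.count("f") || !a.count("o"))
        return false;
    out.pdfFile = a["f"];
    out.outputFolder = a["o"];
    if (a.count("p")) out.password = a["p"];
    if (a.count("d")) out.dpi = std::stoi(a["d"]);
    if (a.count("t")) out.imageType = ParseImageType(a["t"]);
    if (a.count("r")) out.pageRange = a["r"];
    return true;
}
```

With the example command line above, BuildOptions yields 127 dpi, JPG and the page range 1-5,7,8; leaving out -f or -o makes it return false, which is the point where the C# code calls ShowUsage.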

The DoConversion function is the meat of the project. First, it sets the trial license:

  // Set up the license
  SolidFramework.License.ActivateDeveloperLicense();

It then loads the PDF file, using the password if one was supplied:

  // Load up the document
  SolidFramework.Pdf.PdfDocument doc =
    new SolidFramework.Pdf.PdfDocument(file, password);

  doc.Open();

After the document is open, we check to see if the output folder exists, and if it doesn't, we create it:

  // Set up the output folder
  if (!Directory.Exists(folder))
  {
    Directory.CreateDirectory(folder);
  }
  // Set up the base file name: output folder + PDF name without extension.
  string filename = folder + Path.DirectorySeparatorChar +
    Path.GetFileNameWithoutExtension(file);
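The output naming scheme used here, the output folder plus the PDF name without its extension, followed later by -<page number>.<extension>, can be sketched in C++17 with std::filesystem. ImagePathFor is a hypothetical helper; like the C# code, it creates the output folder if it does not already exist.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: builds <folder>/<pdf name without extension>-<page>.<ext>,
// the same naming scheme the C# sample uses, creating the folder if needed.
fs::path ImagePathFor(const fs::path& folder, const fs::path& pdfFile,
                      int pageNumber, const std::string& extension) {
    fs::create_directories(folder);   // does nothing if the folder already exists
    std::string name = pdfFile.stem().string() + "-" +
                       std::to_string(pageNumber) + "." + extension;
    return folder / name;
}
```

Using operator/ instead of concatenating a separator character by hand lets the library pick the right separator for the platform.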

Next, it walks the Pages dictionary and finds the page items by following the references:

  // Get our pages.
  List<SolidFramework.Pdf.Plumbing.PdfPage> Pages =
    new List<SolidFramework.Pdf.Plumbing.PdfPage>(doc.Catalog.Pages.PageCount);

  SolidFramework.Pdf.Catalog catalog =
    (SolidFramework.Pdf.Catalog)SolidFramework.Pdf.Catalog.Create(doc);

  SolidFramework.Pdf.Plumbing.PdfPages pages =
    (SolidFramework.Pdf.Plumbing.PdfPages)catalog.Pages;

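  // ProcessPages (defined in the sample project) walks the page tree
  // and adds the page items it finds to the Pages list.
  ProcessPages(ref pages, ref Pages);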
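ProcessPages itself is part of the sample project and isn't shown here. The structure it walks is the PDF page tree: the catalog's /Pages node has a /Kids array whose entries are either further /Pages nodes or /Page leaves, and a depth-first, left-to-right walk visits the leaves in page order. The sketch below models that tree with a hypothetical PageTreeNode struct, using shared_ptr children in place of the indirect object references a real PDF uses; it is not the Solid Framework plumbing API.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical in-memory model of a PDF page tree (not Solid Framework types).
// Interior nodes stand for /Type /Pages with a /Kids array; leaves for /Type /Page.
struct PageTreeNode {
    bool isPagesNode = false;                           // true for /Pages, false for /Page
    std::string label;                                  // identifies a leaf in this sketch
    std::vector<std::shared_ptr<PageTreeNode>> kids;    // only used by /Pages nodes
};

// Depth-first, left-to-right walk: leaves are appended in document page order.
void CollectPages(const PageTreeNode& node, std::vector<const PageTreeNode*>& out) {
    if (!node.isPagesNode) {
        out.push_back(&node);                           // a /Page leaf
        return;
    }
    for (const auto& kid : node.kids)
        CollectPages(*kid, out);                        // recurse into each kid in order
}
```

Recursion is the simplest fit here because page trees are usually shallow; a very deep tree could be walked with an explicit stack instead.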

Then, if a page range was specified, the argument is parsed into page numbers and an image is created for each of those pages; if no range was given, an image is created for every page:

  // Check for page ranges
  PageRange ranges = null;
  bool bHaveRanges = false;
  if (!string.IsNullOrEmpty(pagerange))
  {
    bHaveRanges = PageRange.TryParse(pagerange, out ranges);
  }

  if (bHaveRanges)
  {
    int[] pageArray = ranges.ToArray();
    foreach (int number in pageArray)
    {
      CreateImageFromPage(Pages[number], dpi, filename, number, extension, format);
      Console.WriteLine("Processed page {0} of {1}", number, Pages.Count);
    }
  }
  else
  {
    // For each page, save off a file.
    int pageIndex = 0;
    foreach (SolidFramework.Pdf.Plumbing.PdfPage page in Pages)
    {
      // Update the page number.
      pageIndex++;

      CreateImageFromPage(page, dpi, filename, pageIndex, extension, format);
      Console.WriteLine("Processed page {0} of {1}", pageIndex, Pages.Count);
    }
  }
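PageRange.TryParse comes from the sample's supporting code, and its implementation isn't shown. As an assumption about what such a parser has to do, here is a hypothetical C++ TryParsePageRange that expands a string like 1-5,7,8 into individual page numbers and rejects reversed or out-of-range entries. It returns 1-based page numbers. The post doesn't say whether PageRange.ToArray yields 1-based or 0-based values; with 1-based numbers, indexing a zero-based list such as Pages would need number - 1.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Parses a decimal page number; returns -1 unless the string is 1-6 digits.
static int ParsePageNumber(const std::string& s) {
    if (s.empty() || s.size() > 6) return -1;
    for (char c : s)
        if (c < '0' || c > '9') return -1;
    return std::stoi(s);
}

// Hypothetical parser (not the sample's PageRange class). Expands "1-5,7,8"
// into 1-based page numbers in the order given; returns false on malformed,
// reversed or out-of-range entries, or when no pages result.
bool TryParsePageRange(const std::string& text, int pageCount, std::vector<int>& pages) {
    pages.clear();
    std::stringstream ss(text);
    std::string part;
    while (std::getline(ss, part, ',')) {
        std::size_t dash = part.find('-');
        int first = ParsePageNumber(part.substr(0, dash));
        int last = (dash == std::string::npos) ? first
                                               : ParsePageNumber(part.substr(dash + 1));
        if (first < 1 || last < first || last > pageCount)
            return false;
        for (int p = first; p <= last; ++p)
            pages.push_back(p);
    }
    return !pages.empty();
}
```

Checking the range against the page count up front means a typo such as 1-50 on a 10-page document fails cleanly instead of causing an out-of-range index later.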

For each requested page, CreateImageFromPage asks the page object for a bitmap and then has the bitmap save itself to a file in the output directory, using the requested ImageFormat:

  private static void CreateImageFromPage(SolidFramework.Pdf.Plumbing.PdfPage page,
    int dpi, string filename, int pageIndex, string extension,
    System.Drawing.Imaging.ImageFormat format)
  {
    // Create a bitmap from the page at the requested dpi; the using block
    // disposes of it even if saving fails.
    using (Bitmap bm = page.DrawBitmap(dpi))
    {
      // Set up the file name: <filename>-<pageIndex>.<extension>.
      // Passing filename as an argument (not as part of the format string)
      // avoids a FormatException if the path contains '{' or '}'.
      string filepath = string.Format("{0}-{1}.{2}", filename, pageIndex, extension);

      // If the file already exists, delete it (i.e. overwrite it).
      if (File.Exists(filepath))
        File.Delete(filepath);

      // Save the file.
      bm.Save(filepath, format);
    }
  }
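The dpi argument determines the size of the bitmap. PDF user space defaults to 1/72 inch per unit, so, assuming DrawBitmap scales the page in the usual way, a page w points wide rendered at d dpi comes out at about w * d / 72 pixels wide. The hypothetical PixelsForPage helper below does that arithmetic; rounding to the nearest pixel is an assumption, since a renderer may round differently. For a US Letter page (612 x 792 points) it gives 816 x 1056 pixels at the default 96 dpi and 1080 x 1397 pixels at the 127 dpi used in the example command line.

```cpp
#include <cmath>

struct PixelSize {
    int width;
    int height;
};

// Hypothetical helper. A page measured in points (1/72 inch) covers
// points / 72 inches; multiplying by dpi gives pixels, rounded to the
// nearest whole pixel (an assumption about the renderer).
PixelSize PixelsForPage(double widthPt, double heightPt, int dpi) {
    return { static_cast<int>(std::lround(widthPt * dpi / 72.0)),
             static_cast<int>(std::lround(heightPt * dpi / 72.0)) };
}
```

Because both dimensions scale linearly with dpi, doubling the dpi roughly quadruples the pixel count and the memory needed for each bitmap.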

And there you have it. The requested images should now be in the specified output directory. Because we are using the free developer trial license, each page image will have a watermark at the bottom of the page. To remove this watermark, read more about an annual license for the Solid Framework Tools Edition here ($250 or $500 per year depending on distribution, no royalties).

Have any thoughts that you'd like to share? Please contact us with your feedback.