[Ipe-discuss] Re: Plot data extraction from PDF

T T t34www at googlemail.com
Tue Feb 16 18:40:11 CET 2010


On 28 December 2009 15:49, T T <t34www at googlemail.com> wrote:
> Hi,
>
> I'm in need of a tool that can extract XY data from (vector!) plots in
> PDF documents. Failing to find such a tool I'm considering writing one
> and Ipe looks attractive, because:
> - it can read PDF files (with pdftoipe tool)
> - it is scriptable with Lua and I happen to know this language a little bit
>
> As I'm unfamiliar with implementation and object hierarchy of Ipe, I
> would appreciate some guidance in how to implement a plug-in with the
> following methods:
> (1) Coordinate system: establish the plot coordinate system from two
> selected marks (if not given use canvas coordinate system)
> (2) Line plot: get points from selected path(s), transform them to the
> plot coordinate system and display in the text box
> (3) Scatter plot: given a selected path, find all other paths that
> differ only by translation and display "center of mass" points of
> those paths in the text box.

I implemented (1) and (2) from the list above a couple of weeks ago but
had no time to write about it earlier. Although the implementation is
quite basic at this point, I think it is already quite usable. I attach
the resulting ipelet in the hope that others will find it useful as
well.
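
To illustrate the idea behind (1) and (2), here is a minimal sketch in
Python. It is not the code of the attached ipelet. It assumes linear,
axis-aligned plot axes and two marks whose plot values are known; the
names make_transform and to_plot are made up for this example.

```python
def make_transform(canvas_a, plot_a, canvas_b, plot_b):
    """Build a canvas -> plot mapping from two reference marks.

    Assumption: both axes are linear and aligned with the canvas axes,
    so x and y can be scaled independently.
    """
    (xa, ya), (xb, yb) = canvas_a, canvas_b
    (ua, va), (ub, vb) = plot_a, plot_b
    if xa == xb or ya == yb:
        raise ValueError("the two marks must differ in both x and y")
    sx = (ub - ua) / (xb - xa)  # plot units per canvas unit along x
    sy = (vb - va) / (yb - ya)  # plot units per canvas unit along y

    def to_plot(point):
        x, y = point
        return (ua + (x - xa) * sx, va + (y - ya) * sy)

    return to_plot


# Hypothetical marks: canvas (100, 200) is plot (0, 0),
# canvas (300, 400) is plot (10, 1).
to_plot = make_transform((100, 200), (0.0, 0.0), (300, 400), (10.0, 1.0))

# Vertices of a selected path, in canvas coordinates.
path = [(100, 200), (200, 300), (300, 400)]
data = [to_plot(p) for p in path]
```

If no marks are given, the identity mapping (canvas coordinates) would
be the natural fallback. Logarithmic axes would need the same mapping
applied to the logarithms of the plot values.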

I would like to thank the author of Ipe for providing such a flexible
tool. If not for the Lua plug-in system, I would never have ventured into
writing this sort of extension -- it would have been just too time-consuming.
My thanks also go to Jan Hlavacek for his 'plots' ipelet, which proved
very helpful in implementing the coordinate system transformation.

Although 'digitize-plot' is intended mainly for vector plots, it can
also be used for raster plots. Just insert an image of the plot into an
Ipe document, put markers at the points of interest, and draw a path
connecting them; the coordinates of this path can then be digitized as
in the vector plot case. This is not an automated process, but most
other software for this purpose is no different in this regard.

Currently, only straight-line plots are supported. I would like to add
support for scatter plots as well when I find some spare time, but I'm
still stuck on how to implement (3). How can I check whether two
objects/paths have the same structure and attributes except for their
coordinates? It seems that this would require a separate comparison
routine for each object type, as I couldn't find any generic method for
traversing the object structure. Is that right? Also, what types of
objects can I expect after pdf -> ipe conversion? I would like to focus
only on those objects at first.
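
To make the question about (3) concrete, here is a rough sketch in
Python of the comparison I have in mind. It does not use the Ipe API.
It assumes each path has already been reduced to a plain list of (x, y)
vertices, that matching symbols list their vertices in the same order,
and that attributes and segment types are ignored. The function names
and the tolerance are invented for the example, and "center of mass" is
taken to be the plain average of the vertices.

```python
def centroid(points):
    """Average of the vertices (a simple stand-in for the center of mass)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def same_shape(path_a, path_b, tol=1e-6):
    """True if path_b equals path_a shifted by a single translation.

    Assumption: both paths start at the corresponding vertex and list
    their vertices in the same order.
    """
    if not path_a or len(path_a) != len(path_b):
        return False
    # The offset between the first vertices must hold for all the others.
    dx = path_b[0][0] - path_a[0][0]
    dy = path_b[0][1] - path_a[0][1]
    return all(
        abs(bx - ax - dx) <= tol and abs(by - ay - dy) <= tol
        for (ax, ay), (bx, by) in zip(path_a, path_b)
    )


def scatter_points(template, candidates, tol=1e-6):
    """Centroids of all candidate paths that are translates of template."""
    return [centroid(p) for p in candidates if same_shape(template, p, tol)]


# Hypothetical data: a square marker, a shifted copy, a triangle,
# and a larger square.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
others = [
    [(10, 5), (12, 5), (12, 7), (10, 7)],
    [(0, 0), (1, 0), (0, 1)],
    [(0, 0), (4, 0), (4, 4), (0, 4)],
]
points = scatter_points(square, others)
```

The resulting centroids are still in canvas coordinates and would have
to be mapped to the plot coordinate system afterwards. The tolerance
matters because coordinates coming out of a PDF conversion are unlikely
to match exactly.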

Cheers,

Tomek
-------------- next part --------------
A non-text attachment was scrubbed...
Name: digitize-plot.lua
Type: application/octet-stream
Size: 7528 bytes
Desc: not available
Url : http://lists.science.uu.nl/pipermail/ipe-discuss/attachments/20100216/f9828ae9/attachment.obj 

