marshal vs pickle - Python
-
marshal vs pickle
The documentation for marshal makes it clear that there are no
guarantees about being able to correctly deserialize marshalled data
structures across Python releases. It also implies that marshal is not
a general "persistence" module. On the other hand, the documentation
seems to imply that marshalled objects act more or less like pickled
objects.
Can anyone elaborate on the difference between marshal and
pickle? Under what conditions would using marshal be unsafe? If one can
guarantee that the marshalled objects will be created and read by the
same version of Python, is that enough?
--
Evan Klitzke <evan@yelp.com>
-
Re: marshal vs pickle
Evan Klitzke wrote:
> Can anyone elaborate on the difference between marshal and
> pickle? Under what conditions would using marshal be unsafe? If one
> can guarantee that the marshalled objects will be created and
> read by the same version of Python, is that enough?
Just use pickle. From the docs:
| The marshal module exists mainly to support reading and writing
| the ``pseudo-compiled'' code for Python modules of .pyc files.
| Therefore, the Python maintainers reserve the right to modify the
| marshal format in backward incompatible ways should the need
| arise. If you're serializing and de-serializing Python objects,
| use the pickle module instead.
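
For reference, a minimal sketch of the pickle round trip the docs recommend. It assumes Python 3 (where cPickle has been folded into pickle); the `settings` dictionary and the file name are made up for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical data to persist; any picklable object would do.
settings = {"name": "example", "sizes": [1, 2, 3], "ratio": 0.5}

# Write to a throwaway location so the sketch leaves the working directory alone.
path = os.path.join(tempfile.mkdtemp(), "settings.pickle")

# Pickle files must be opened in binary mode.
with open(path, "wb") as f:
    pickle.dump(settings, f, protocol=2)

with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded == settings)
```

Protocol 2 is the one discussed later in this thread; newer protocols exist and are usually the default in current Python releases.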
Regards,
Björn
--
BOFH excuse #421:
Domain controller not responding
-
Re: marshal vs pickle
On Oct 31, 3:31 am, "Evan Klitzke" <e...@yelp.com> wrote:
> Can anyone elaborate on the difference between marshal and
> pickle? Under what conditions would using marshal be unsafe? If one can
> guarantee that the marshalled objects will be created and read by the
> same version of Python, is that enough?
Yes, I think that's enough. I like to use
marshal a lot because it's absolutely the fastest
way to store and load data to/from Python. Furthermore,
because marshal is "stupid", the programmer has complete
control. A lot of the overhead that makes pickle
generally much slower than marshal comes from the
cleverness by which pickle recognizes shared objects
and all that junk. When I serialize, I generally don't
need that because I know what I'm doing.
For example, both gadfly SQL
http://gadfly.sourceforge.net
and nucular full text/fielded search
http://nucular.sourceforge.net
use marshal as the underlying serializer. Using cPickle
would probably make serialization more than 2x slower.
This is one of the 2 or 3 key tricks that make these
packages as fast as they are.
-- Aaron Watters
===
http://www.xfeedme.com/nucular/gut.p...TEXT=halloween
-
Re: marshal vs pickle
On Oct 31, 6:45 am, Aaron Watters <aaron.watt...@gmail.com> wrote:
> I like to use
> marshal a lot because it's absolutely the fastest
> way to store and load data to/from Python. Furthermore,
> because marshal is "stupid", the programmer has complete
> control. A lot of the overhead that makes pickle
> generally much slower than marshal comes from the
> cleverness by which pickle recognizes shared objects
> and all that junk. When I
> serialize,
I believe this FUD is somewhat out of date. Marshalling
has become smarter about repeated and shared objects. The
pickle module (using protocol 2) has an implementation similar
to marshal's, and both use the same tricks, but pickle is
much more flexible in the range of objects it can handle
(e.g. sets became marshalable only recently, while deques
can be pickled but not marshalled).
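
A small sketch of the two points above, assuming Python 3: pickle keeps track of shared objects, and it accepts a deque where marshal does not. The variable names are illustrative only.

```python
import collections
import marshal
import pickle

# The same list object is referenced twice inside the container.
shared = ["x", "y"]
container = [shared, shared]

# pickle's memo notices the repeat, so the copy still shares one inner list.
copy = pickle.loads(pickle.dumps(container, protocol=2))
same_object = copy[0] is copy[1]

# A deque pickles without any extra work...
queue = collections.deque([1, 2, 3])
queue_copy = pickle.loads(pickle.dumps(queue, protocol=2))

# ...but marshal refuses it, because deque is not one of its supported types.
try:
    marshal.dumps(queue)
    marshal_accepts_deque = True
except ValueError:
    marshal_accepts_deque = False

print(same_object, queue_copy == queue, marshal_accepts_deque)
```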
Users are almost always better off
using pickle, which is version independent, fast, and
can handle many more types of objects than marshal.
Also FWIW, in most applications of pickling/marshalling,
the storage or transmission times dominate computation
time. I've gotten nice speed-ups by zipping the pickle
before storing, transmitting, or sharing (RPC apps,
for example).
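
One way to read "zipping the pickle" is to compress the pickled bytes with zlib before storing or sending them. This is only a sketch of that idea, not necessarily the setup used above; the `payload` data is invented and deliberately repetitive.

```python
import pickle
import zlib

# Hypothetical payload with a lot of repetition, which compresses well.
payload = {"rows": [("spam", i % 10) for i in range(5000)]}

raw = pickle.dumps(payload, protocol=2)
packed = zlib.compress(raw)

# The receiving side reverses the two steps in the opposite order.
restored = pickle.loads(zlib.decompress(packed))

print(len(raw), len(packed), restored == payload)
```

How much this helps depends on the data and on how slow the storage or network really is, so it is worth measuring before adopting it.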
Raymond
-
Re: marshal vs pickle
On Oct 31, 1:37 pm, Raymond Hettinger <pyt...@rcn.com> wrote:
> On Oct 31, 6:45 am, Aaron Watters <aaron.watt...@gmail.com> wrote:
>
> > I like to use
> > marshal a lot because it's the absolutely fastest
> > way to store and load data to/from Python....
>
> I believe this FUD is somewhat out of date. Marshalling
> has become smarter about repeated and shared objects. The
> pickle module (using protocol 2) has an implementation similar
> to marshal's
Raymond: happy days! We are both right!
I just ran some tests from the test suite for
http://nucular.sourceforge.net with marshalling
and pickling switched in and out, and to my
surprise I didn't find much difference
on the "load" end (marshalling 10% faster),
but for "bigLtreeTest.py" I found that
the build ("dump") process was about 1/3
slower with cPickle (protocol 2/python2.4). For
the more complex tests (mondial and gutenberg)
I found that the speed-up from using marshal was
in the 1-2% range (and sometimes inverted,
I think because of processor load on a shared
hosting machine).
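
A rough way to repeat this kind of comparison on your own data, timing the "dump" and "load" sides separately. This is not the nucular test suite: the sample data and the `bench` helper are made up, and it assumes Python 3, where pickle uses its C implementation automatically when available.

```python
import marshal
import pickle
import timeit

# Hypothetical sample data: a list of small dicts, invented for illustration.
data = [{"id": i, "name": "item%d" % i, "tags": ("a", "b", "c")}
        for i in range(2000)]


def bench(dumps, loads, repeat=20):
    """Return (dump_seconds, load_seconds) for `repeat` runs of each side."""
    blob = dumps(data)
    dump_time = timeit.timeit(lambda: dumps(data), number=repeat)
    load_time = timeit.timeit(lambda: loads(blob), number=repeat)
    return dump_time, load_time


results = {
    "marshal": bench(marshal.dumps, marshal.loads),
    "pickle": bench(lambda obj: pickle.dumps(obj, protocol=2), pickle.loads),
}

for name, (dump_time, load_time) in sorted(results.items()):
    print("%-8s dump %.4fs  load %.4fs" % (name, dump_time, load_time))
```

The numbers vary with the interpreter version, the shape of the data, and machine load, so treat any single run as indicative only.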
I'm pretty sure things were much worse for cPickle
many moons ago. Nice to see that some things
get better. It makes sense that the
"dump" side would be slower because that's
where you need to remember all the objects
in case you see them again...
Anyway, since it's easy and makes sense, I think
the next version of nucular will have a
switchable option between marshal and cPickle
for persistent storage.
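
A switchable option like this could be as small as a lookup table of dump/load pairs. The sketch below is hypothetical and is not nucular's code; `SERIALIZERS` and `get_serializer` are names invented here, and it assumes Python 3.

```python
import marshal
import pickle

# Hypothetical registry mapping an option name to a (dumps, loads) pair.
SERIALIZERS = {
    "marshal": (marshal.dumps, marshal.loads),
    "pickle": (lambda obj: pickle.dumps(obj, protocol=2), pickle.loads),
}


def get_serializer(name="pickle"):
    """Return the (dumps, loads) pair registered under `name`."""
    try:
        return SERIALIZERS[name]
    except KeyError:
        raise ValueError("unknown serializer: %r" % (name,))


# The calling code stays the same whichever backend is selected.
dumps, loads = get_serializer("marshal")
record = {"key": "value", "numbers": [1, 2, 3]}
print(loads(dumps(record)) == record)
```

Data written with one backend cannot be read with the other, so the chosen name would need to be stored alongside the data.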
Thanks! -- Aaron Watters
===
The pursuit of hypothetical performance
improvements is the root of all evil.
-- Bill Tutt
http://www.xfeedme.com/nucular/pydis...?FREETEXT=tutt
-
Re: marshal vs pickle
On Oct 31, 12:27 pm, Aaron Watters <aaron.watt...@gmail.com> wrote:
> Anyway, since it's easy and makes sense, I think
> the next version of nucular will have a
> switchable option between marshal and cPickle
> for persistent storage.
Makes more sense to use cPickle and be done with it.
FWIW, I've updated the docs to be absolutely clear on the subject:
'''
This is not a general "persistence" module. For general persistence and
transfer of Python objects through RPC calls, see the modules :mod:`pickle`
and :mod:`shelve`. The :mod:`marshal` module exists mainly to support reading
and writing the "pseudo-compiled" code for Python modules of :file:`.pyc`
files. Therefore, the Python maintainers reserve the right to modify the
marshal format in backward incompatible ways should the need arise. If you're
serializing and de-serializing Python objects, use the :mod:`pickle` module
instead -- the performance is comparable, version independence is guaranteed,
and pickle supports a substantially wider range of objects than marshal.

.. warning::

   The :mod:`marshal` module is not intended to be secure against erroneous or
   maliciously constructed data. Never unmarshal data received from an
   untrusted or unauthenticated source.

Not all Python object types are supported; in general, only objects whose value
is independent from a particular invocation of Python can be written and read by
this module. The following types are supported: ``None``, integers, long
integers, floating point numbers, strings, Unicode objects, tuples, lists,
dictionaries, and code objects, where it should be understood that tuples, lists
and dictionaries are only supported as long as the values contained therein are
themselves supported; and recursive lists and dictionaries should not be written
(they will cause infinite loops).

.. warning::

   Some unsupported types such as subclasses of builtins will appear to marshal
   and unmarshal correctly, but in fact, their type will change and the
   additional subclass functionality and instance attributes will be lost.

.. warning::

   On machines where C's ``long int`` type has more than 32 bits (such as the
   DEC Alpha), it is possible to create plain Python integers that are longer
   than 32 bits. If such an integer is marshaled and read back in on a machine
   where C's ``long int`` type has only 32 bits, a Python long integer object
   is returned instead. While of a different type, the numeric value is the
   same. (This behavior is new in Python 2.2. In earlier versions, all but the
   least-significant 32 bits of the value were lost, and a warning message was
   printed.)
'''
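
A sketch of the subclass warning quoted above, using `collections.OrderedDict` as a stand-in for "a subclass of a builtin". The behaviour differs between interpreter versions: the docs text describes marshal silently returning the base type, while the Python 3 releases I am aware of reject the subclass with ValueError. The code therefore handles both outcomes rather than assuming one.

```python
import collections
import marshal
import pickle

# OrderedDict is a dict subclass from the standard library.
od = collections.OrderedDict([("a", 1), ("b", 2)])

# pickle records the type, so the subclass survives the round trip.
pickle_type = type(pickle.loads(pickle.dumps(od, protocol=2))).__name__

# marshal either degrades the object to its base type or refuses it outright,
# depending on the interpreter version.
try:
    marshal_type = type(marshal.loads(marshal.dumps(od))).__name__
except ValueError:
    marshal_type = None  # this interpreter refuses to marshal the subclass

print(pickle_type, marshal_type)
```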
-
Re: marshal vs pickle
On Wed, 31 Oct 2007 19:10:48 -0300, Raymond Hettinger <python@rcn.com> wrote:
> FWIW, I've updated the docs to be absolutely clear on the subject:
While you are at it, the list of supported types should be updated too:
> The following types are supported: ``None``, integers, long
> integers, floating point numbers, strings, Unicode objects, tuples,
> lists, dictionaries, and code objects,
Booleans, complex numbers, sets and frozensets are missing.
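
A quick check that these four types round-trip through marshal with their types intact. The sketch assumes Python 3; the sample values are arbitrary.

```python
import marshal

# One arbitrary value for each of the types missing from the documented list.
samples = [True, 3 + 4j, {1, 2, 3}, frozenset(["a", "b"])]

round_tripped = [marshal.loads(marshal.dumps(value)) for value in samples]

for before, after in zip(samples, round_tripped):
    print(type(before).__name__, type(after).__name__, before == after)
```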
--
Gabriel Genellina
-
Re: marshal vs pickle
Raymond Hettinger <python@rcn.com> writes:
> ''' This is not a general "persistence" module. For general
> persistence and transfer of Python objects through RPC calls, see
> the modules :mod:`pickle` and :mod:`shelve`.
That advice should be removed since Python currently does not have a
general persistence or transfer module in its stdlib. There's been an
open bug/RFE about it for something like 5 years. The issue is that
any sensible general purpose RPC mechanism MUST make reasonable
security assertions that nothing bad happens if you deserialize
untrusted data. The pickle module doesn't make such guarantees and in
fact its documentation explicitly warns against unpickling untrusted
data. Therefore pickle should not be used as a general RPC
mechanism.
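
For context, the pickle documentation describes one partial mitigation: subclass `pickle.Unpickler` and override `find_class` to control which globals a pickle may reference. The sketch below forbids all of them, so only plain built-in data loads. `NoGlobalsUnpickler` and `restricted_loads` are names invented here, and this does not make pickle a safe RPC format in general.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses every reference to a class or function."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            "global %s.%s is forbidden" % (module, name))


def restricted_loads(data):
    """Like pickle.loads, but only plain built-in data types can be loaded."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


# Plain containers of numbers and strings load as usual.
plain = restricted_loads(pickle.dumps({"a": [1, 2, 3]}))

# A pickle that refers to a global (here the builtin len) is rejected.
try:
    restricted_loads(pickle.dumps(len))
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(plain, blocked)
```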
-
Re: marshal vs pickle
On Nov 1, 12:04 am, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> Raymond Hettinger <pyt...@rcn.com> writes:
> > ''' This is not a general "persistence" module. For general
> > persistence and transfer of Python objects through RPC calls, see
> > the modules :mod:`pickle` and :mod:`shelve`.
>
> That advice should be removed since Python currently does not have a
> general persistence or transfer module in its stdlib. There's been an
> open bug/RFE about it for something like 5 years. The issue is that
> any sensible general purpose RPC mechanism MUST make reasonable
> security assertions that nothing bad happens if you deserialize
> untrusted data. The pickle module doesn't make such guarantees and in
> fact its documentation explicitly warns against unpickling untrusted
> data. Therefore pickle should not be used as a general RPC
> mechanism.
This is absolutely correct. Marshal is more secure than pickle
because marshal *cannot* execute code automatically whereas pickle
does. The assertion that marshal is less secure than pickle is
absurd.
This is exactly why the gadfly server mode uses marshal and not
pickle.
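
A harmless illustration of "pickle executes code automatically": an object's `__reduce__` can name any importable callable, and `pickle.loads` will call it. The `Surprise` class is made up, and the builtin `sorted` stands in for whatever an attacker would pick. Note that the marshal docs quoted earlier in this thread still warn against unmarshalling untrusted data, so this is not a claim that marshal is safe for that purpose.

```python
import marshal
import pickle


class Surprise(object):
    """Hypothetical class whose pickle asks the loader to call sorted()."""

    def __reduce__(self):
        # (callable, args): pickle.loads will call sorted([3, 1, 2]).
        return (sorted, ([3, 1, 2],))


# Loading does not rebuild a Surprise instance; it returns whatever the
# named callable returned.
result = pickle.loads(pickle.dumps(Surprise()))

# marshal has no such hook and does not accept arbitrary instances at all.
try:
    marshal.dumps(Surprise())
    rejected = False
except ValueError:
    rejected = True

print(result, rejected)
```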
-- Aaron Watters
===
why do you hang out with that sadist?
beats me! -- kliban
-
Re: marshal vs pickle
On Oct 31, 6:10 pm, Raymond Hettinger <pyt...@rcn.com> wrote:
> On Oct 31, 12:27 pm, Aaron Watters <aaron.watt...@gmail.com> wrote:
>
> Makes more sense to use cPickle and be done with it.
>
> FWIW, I've updated the docs to be absolutely clear on the subject:
>
> '''
> This is not a general "persistence" module. For general persistence
> and...
Alright already. Here is the patched file you want:
http://nucular.sourceforge.net/kisstree_pickle.py
This will make all your nucular indices portable across Python
versions and machine architectures. I'll add this to the
next release with a bunch of other stuff too.
By the way there is another module that uses marshal for
strictly temporary storage in http://nucular.sourceforge.net
-- but if I change that one the build time for nucular indices
fully DOUBLES!! That's too much pain for me. Sorry.
Also, it's always been a mystery to me why Python can't
keep the marshal module backwards compatible and portable.
You folks seem like pretty smart programmers to me. If
you need help, let me know. It's a damn shame Python doesn't
have a serialization module with the safety, speed, and
simplicity of marshal and also the portability of pickle.
I guess I have to live with it.
-- Aaron Watters
===
Wow, do you play basketball?
No, do you play miniature golf?
-- seen in Newsweek years ago