MSXML - LINEAR cost taking a SINGLE step of IXMLDomNodeListenumerator?? - XML SOAP

This is a discussion on MSXML - LINEAR cost taking a SINGLE step of IXMLDomNodeListenumerator?? - XML SOAP ; When walking a large DomNodeList using the IEnumVariant interface (I've included VB code below, but obviously it's the same in C++ using COM enumerators), the cost of going to the next element seems to be proportional to the TOTAL length ...

+ Reply to Thread
Results 1 to 2 of 2

MSXML - LINEAR cost taking a SINGLE step of IXMLDomNodeListenumerator??

  1. Default MSXML - LINEAR cost taking a SINGLE step of IXMLDomNodeListenumerator??

    When walking a large DomNodeList using the IEnumVariant interface
    (I've included VB code below, but obviously it's the same in C++ using
    COM enumerators), the cost of going to the next element seems to be
    proportional to the TOTAL length of the list!! Is this right?

    I haven't found this mentioned anywhere, maybe I'm just not googling
    right

    i.e. walking the first 1000 nodes of a 2000 node list is massively
    cheaper than walking the first 1000 nodes of a 20000 node list. - in
    practice this means if you're traversing the whole list your cost is
    n^2, which on the large lists I'm dealing with is causing a bit of
    pain....

    This is pretty confusing - it doesn't look like the list is being
    generated lazily (and even if it was I'd not expect linear behaviour
    like this for accessing a prefix) - it almost looks like it's re-
    evaluating the predicate each iteration step!

    Anyone know anything about this? Behaviour doesn't seem to differ
    between msxml4 / 6...

    Thanks,

    Lee.

    -----8<---------- sample VBA to demo, needs ref to msxml4/6.

    Private Declare Function QueryPerformanceCounter& Lib
    "kernel32.dll" (l As pbdw)
    Private Declare Function QueryPerformanceFrequency& Lib
    "kernel32.dll" (l As pbdw)

    Private Type pbdw
    bottom As Long
    top As Long
    End Type

    Public Function tickspers() As Double
    Static l As Long
    If l = 0 Then
    Dim ll As pbdw
    Call QueryPerformanceFrequency(ll)
    l = ll.bottom
    End If
    ticksperms = l
    End Function

    Public Function ticks() As Double
    Dim ll As pbdw
    Call QueryPerformanceCounter(ll)
    ticks = ll.bottom
    End Function

    Private Sub add_node(ByRef s As String)
    s = s & "<node><a>1</a><b>2</b></node>"
    End Sub

    Private Sub check_timing(ByVal s As String)
    s = s & "</doc>"

    Dim d As DOMDocument
    Set d = New DOMDocument40 ' or domdocument60,
    If Not d.loadXML(s) Then Call Err.Raise(10101, , "Didn't load")

    Dim n As IXMLDOMNode
    Set n = d.documentElement
    Dim dn As IXMLDOMNodeList

    Set dn = n.selectNodes("node")
    If dn.Length < 1000 Then Call Err.Raise(10102, , "Odd...")

    Dim dstart As Double, dstop As Double
    dstart = ticks()

    Dim i As Long
    i = 0
    Dim count As Long
    For Each n In dn
    i = i + 1
    count = count + Len(n.Text)
    If (i = 1000) Then
    GoTo breakout
    End If
    Next n
    breakout:
    dstop = ticks()
    Debug.Print Join(Array(dn.Length, (dstop - dstart) / tickspers,
    count), " , ")
    End Sub

    Public Sub test()
    Dim doc As String
    doc = "<doc>"
    Dim l As Long
    For l = 1 To 1000
    Call add_node(doc)
    Next l

    Dim step_ As Long
    step_ = 1000

    Dim stepstr As String
    Dim x As Long
    For x = 1 To step_
    Call add_node(stepstr)
    Next x

    For l = 1000 To 40000 Step step_
    Call check_timing(doc)
    doc = doc & stepstr
    Next l
    End Sub



    ----8<---------- sample output #nodes in total list, #s to iterate
    first 1000, #chars in first 100 nodes

    1000 , 1.12989220697044E-02 , 2000
    2000 , 2.13613741411269E-02 , 2000
    3000 , 3.05105816521374E-02 , 2000
    4000 , 4.01073320771215E-02 , 2000
    5000 , 4.96102158235195E-02 , 2000
    6000 , 5.94773916796688E-02 , 2000
    7000 , 6.90878309952801E-02 , 2000
    8000 , 7.86281496670666E-02 , 2000
    9000 , 8.94758970763044E-02 , 2000
    10000 , 9.86164442687548E-02 , 2000
    11000 , 0.107572331120296 , 2000
    12000 , 0.118514783303465 , 2000
    13000 , 0.128915267163843 , 2000
    14000 , 0.137097033282163 , 2000
    15000 , 0.147874939412691 , 2000
    16000 , 0.15577845787663 , 2000
    17000 , 0.167114814871723 , 2000
    18000 , 0.180319007024636 , 2000
    19000 , 0.185794842640615 , 2000
    20000 , 0.196314336039916 , 2000
    21000 , 0.206365613506745 , 2000
    22000 , 0.215159747956793 , 2000
    23000 , 0.233987280506321 , 2000
    24000 , 0.2455398661003 , 2000
    25000 , 0.2592050106927 , 2000
    26000 , 0.282794880354905 , 2000
    27000 , 0.318379849952997 , 2000
    28000 , 0.307542997783238 , 2000
    29000 , 0.345976374092238 , 2000
    30000 , 0.35645787383592 , 2000
    31000 , 0.349666787259275 , 2000
    32000 , 0.394941535865592 , 2000
    33000 , 0.407140013605081 , 2000
    34000 , 0.371464250344667 , 2000
    35000 , 0.395082615248586 , 2000
    36000 , 0.420603177219451 , 2000
    37000 , 0.48740580157534 , 2000
    38000 , 0.500694361992935 , 2000
    39000 , 0.527952574978105 , 2000
    40000 , 0.522535685401357 , 2000

  2. Default Re: MSXML - LINEAR cost taking a SINGLE step of IXMLDomNodeList enumerator??


    "lab27" <lee.benfield@gmail.com> wrote in message
    news:795f1bf0-544a-481f-ba38-2649a6f67728@p69g2000hsa.googlegroups.com...
    > When walking a large DomNodeList using the IEnumVariant interface
    > (I've included VB code below, but obviously it's the same in C++ using
    > COM enumerators), the cost of going to the next element seems to be
    > proportional to the TOTAL length of the list!! Is this right?
    >
    > I haven't found this mentioned anywhere, maybe I'm just not googling
    > right
    >
    > i.e. walking the first 1000 nodes of a 2000 node list is massively
    > cheaper than walking the first 1000 nodes of a 20000 node list. - in
    > practice this means if you're traversing the whole list your cost is
    > n^2, which on the large lists I'm dealing with is causing a bit of
    > pain....
    >
    > This is pretty confusing - it doesn't look like the list is being
    > generated lazily (and even if it was I'd not expect linear behaviour
    > like this for accessing a prefix) - it almost looks like it's re-
    > evaluating the predicate each iteration step!
    >
    > Anyone know anything about this? Behaviour doesn't seem to differ
    > between msxml4 / 6...
    >
    > Thanks,
    >
    > Lee.
    >
    > -----8<---------- sample VBA to demo, needs ref to msxml4/6.
    >
    > Private Declare Function QueryPerformanceCounter& Lib
    > "kernel32.dll" (l As pbdw)
    > Private Declare Function QueryPerformanceFrequency& Lib
    > "kernel32.dll" (l As pbdw)
    >
    > Private Type pbdw
    > bottom As Long
    > top As Long
    > End Type
    >
    > Public Function tickspers() As Double
    > Static l As Long
    > If l = 0 Then
    > Dim ll As pbdw
    > Call QueryPerformanceFrequency(ll)
    > l = ll.bottom
    > End If
    > ticksperms = l
    > End Function
    >
    > Public Function ticks() As Double
    > Dim ll As pbdw
    > Call QueryPerformanceCounter(ll)
    > ticks = ll.bottom
    > End Function
    >
    > Private Sub add_node(ByRef s As String)
    > s = s & "<node><a>1</a><b>2</b></node>"
    > End Sub
    >
    > Private Sub check_timing(ByVal s As String)
    > s = s & "</doc>"
    >
    > Dim d As DOMDocument
    > Set d = New DOMDocument40 ' or domdocument60,
    > If Not d.loadXML(s) Then Call Err.Raise(10101, , "Didn't load")
    >
    > Dim n As IXMLDOMNode
    > Set n = d.documentElement
    > Dim dn As IXMLDOMNodeList
    >
    > Set dn = n.selectNodes("node")
    > If dn.Length < 1000 Then Call Err.Raise(10102, , "Odd...")
    >
    > Dim dstart As Double, dstop As Double
    > dstart = ticks()
    >
    > Dim i As Long
    > i = 0
    > Dim count As Long
    > For Each n In dn
    > i = i + 1
    > count = count + Len(n.Text)
    > If (i = 1000) Then
    > GoTo breakout
    > End If
    > Next n
    > breakout:
    > dstop = ticks()
    > Debug.Print Join(Array(dn.Length, (dstop - dstart) / tickspers,
    > count), " , ")
    > End Sub
    >
    > Public Sub test()
    > Dim doc As String
    > doc = "<doc>"
    > Dim l As Long
    > For l = 1 To 1000
    > Call add_node(doc)
    > Next l
    >
    > Dim step_ As Long
    > step_ = 1000
    >
    > Dim stepstr As String
    > Dim x As Long
    > For x = 1 To step_
    > Call add_node(stepstr)
    > Next x
    >
    > For l = 1000 To 40000 Step step_
    > Call check_timing(doc)
    > doc = doc & stepstr
    > Next l
    > End Sub
    >
    >



    The culprit is this line in your check_timing routine:-

    count = count + Len(n.Text)

    Specifically the use of n.Text. It also isn't related to the number of
    nodes selected try this change:-

    Set dn = n.selectNodes("node[position() < 1001]")

    You'll notice that dn.Length is consitently 1000 yet the growth in time
    taken is still proportional to the full set of nodes. IOW, the cost of
    retrieving .Text is proportional to the size of document that the node
    belongs to. You can try this change to keep it to a single simple element:-

    Set dn = n.selectNodes("node[position() < 1001]/a")

    It's still bad. Take it a step further lets get the text node itself:-

    Set dn = n.selectNodes("node[position() < 1001]/a/text()")

    Nope still just as bad.

    Now make this change

    count = count + Len(n.nodeValue)

    Suddenly it much quicker and consistent. Make these changes:-

    Set dn = n.selectNodes("node[position() < 1001]/a")

    count = count + Len(n.firstChild.nodeValue)

    Again consistent, lets go back to selecting the node and build the text
    string ourselves.

    Set dn = n.selectNodes("node[position() < 1001]")

    count = count + Len(n.firstChild.firstChild.nodeValue &
    n.firstChild.nextSibling.nodeValue)

    Ugly but still a couple of orders of magnitude faster than .Text and
    consistant despite document size.

    Finally remove the position predicate. Just as fast.


    Why would using the Text property be so costly? I haven't got the foggiest,
    very odd and annoying.


    --
    Anthony Jones - MVP ASP/ASP.NET



+ Reply to Thread

Similar Threads

  1. Re: Pyrex installation on windows XP: step-by-step guide
    By Application Development in forum Python
    Replies: 0
    Last Post: 09-26-2007, 03:53 AM
  2. Replies: 0
    Last Post: 01-12-2007, 02:55 AM
  3. Replies: 1
    Last Post: 09-17-2006, 12:12 AM
  4. Replies: 1
    Last Post: 09-07-2004, 10:50 PM
  5. Re: Step by step procedure to install Exchange in cluster
    By Application Development in forum Microsoft Exchange
    Replies: 0
    Last Post: 10-01-2003, 05:20 PM